Welcome, everybody, to the session on ROSiE, the Robot Operating System integrating Erlang, by Natalia Chechina and Luca Succi. We are glad that they could join us today. So, without further delay, handing over to Natalia and Luca.

Right, thank you, Shish. Hello, everybody. I'm Natalia Chechina, and with me is Luca Succi, and today we'll talk about ROSiE, the Robot Operating System integrating Erlang. We actually started this work in 2015, when I was a postdoc at Glasgow University, and we explored whether Erlang is actually a good fit for robotics. Last year we took this initiative up with Peer Stritzinger, collaborated with the University of Milan, and got funding from the Erlang Ecosystem Foundation, which you already heard about from Francesco today and which I'll say a couple of words about later, and the project began to live.

I will start with who we are. Both Luca and I are developers, and we both have a past in academia. I used to be an academic and worked at Glasgow, Heriot-Watt and Bournemouth universities, and Luca recently graduated from the University of Milan. I'm a developer at Erlang Solutions, and Luca is a developer at Peer Stritzinger GmbH.

So what is ROSiE? ROSiE is about bringing the scalability, fault tolerance and hot code swapping of state-of-the-art distributed systems into open source robotics. There are a couple of questions we will answer today before we jump into explaining ROSiE, what exactly was done, and demonstrating what it has today. The first two questions are about what we are dealing with. The first is: what are robots? The second is: why do robots need ROSiE, why is life not good without it? So what is a robot? The question is, is my washing machine a robot?
So what makes a robot? What distinguishes a computer, or any mechanism with a processor or a board in it, from a robot? The answer is that it has to do two things: it should sense and it should act. So if your washing machine can sense that you have unwashed laundry, load it, wash it and give it back to you, then your washing machine is a robot. Otherwise, I'm sorry to say, it's just a washing machine.

The breadth and variety of robots is huge. We have humanoid robots, we have military robots, we have robots that focus on walking, on recognizing, on communicating with humans. The robots we are interested in are complex robots. Not a primitive, simple thing that you can code in a day or two so that it starts moving around, but really complex robots that have multiple parts to them. Even the legs, for example, are very complex: they have lots of joints, and they need to make sure the robot is stable and doesn't fall over. These robots are capable of various things. They don't just move a hand back and forth; they can lift things and carry them around, or walk around. Even more, they can see with cameras, hear with microphones, speak with speakers, and move with some kind of joints. So we are interested in very, very complex robots.

They are also capable of communicating with various devices. This is interesting because, as we'll discuss, modern robotics is modular: a robot has different components, they communicate, and this communication is very interesting to us.
Ultimately, what we want is a collection of robots doing something together. When I talk about communication in robotics, communication can mean different things. It can mean communicating like humans, like you and I: you see my emotions, you see my hand movements, you hear me talking, and that's how we communicate. But what I mean by communication between robots is that they send data to each other, like machines. So it's not through recognition of emotions, which is what is called collaborative robotics, but through some kind of machine data transfer, which is more in the area of cooperative robotics.

What is out of scope, what we are not interested in, are primitive robots. It's not that we are completely uninterested in them, and maybe there will be some application later, but at the moment we are not targeting them. Swarm robotics is an example: the robots are very primitive, and the behavior is about the collective, so complex behavior emerges from the primitive behavior of single elements. For example, these robots may only be able to move one step forward, but when all of them start moving forward, or some of them stay and some move forward, we can see complex behavior emerging from the swarm. So these are out of scope for us, at the moment at least.

The next question, especially from those who work with distributed systems, is: why are robots different? Why can't we just use the breadth of knowledge that we have in distributed systems? We know how to do fault tolerance, scalability and hot code swapping in distributed systems. Does it really need something new, or can we just reuse these mechanisms? Actually, we need to think a lot, and we need to come up with new methods very specific to robots.
The reason is, first of all, that communication is not stable. When we think about a data center, for example, we think about very stable connections: if a connection fails, then there is a problem. Or when we talk about a collection of computers in a room or in a building, communication is still quite stable; they won't just lose it randomly every other second. In robots, we have very unstable communication because they move all the time. They go out of range, they come back, maybe the signal is not strong enough, all sorts of things. So we work in an environment where communication fails all the time.

Then we deal with limited resources. Depending on the type of robot and on what it does, we don't have terabytes of memory, and we don't have super powerful processors with hundreds or millions of cores on the robot. It may be something fairly powerful, 8 cores, 16 cores, 24 cores, whatever that is, but it won't be super powerful. It will just be a computer with some resources, and usually these resources are not enough. The robot needs much more than it has, because it needs to move and it needs to be compact. So we have to rely on these limited resources.

Then there is limited power. Our data centers are just connected to the mains; we don't have this luxury. We have a battery, the battery is limited, and the more powerful the computer on the robot, the more powerful the battery we need and the more energy it consumes. Whatever we do, we still don't have enough energy, and we need to make sure that when the battery goes down the performance remains the same. It should not be that the battery goes below, for example, 40% and our sensors get disconnected; and even if this happens, we need to act on it and do something about it.

And then each robot is a network.
As I said, robots are modular. They are built using a middleware, so the system is not a monolith; it has different components, and those can fail. So already in a single robot we are dealing with a network of all those nodes, those components, those elements. For example, we have sensors and cameras, we have different joints, and all of them are independent and very often have separate processors and separate boards that deal with them. So, again, we already have a network even within a single device.

The final things are threat and risk. Threat, because, especially with larger robots, if they have hands they can swing them, they can poke, and things like that. So it's not necessarily safe if something goes wrong, and we need mechanisms so that if something goes wrong in terms of software, the robot doesn't go wild and start swinging its hands around, or jump, or fall. And then risk: again, if something happens, the robot shouldn't attempt to destroy itself, jump from a bridge or under a train, or something like that.

So, from what you can see already, there are lots of failures. Software can misbehave, things can be lost, components may misbehave, robots can hit somebody or hit themselves. There are lots of failures that we need to deal with.

Robotics has ROS, and ROS is huge. It's the Open Source Robotics Foundation that developed it, and it's the Robot Operating System. It's not actually an operating system, it's a middleware, but ROS is really, really big; it's maybe like Linux in operating systems. It has been around since 2007, and it's a standard. It's an amazing tool, and the reason for creating it was that no single lab, no single group or institute can handle robotics.
It's just too big; no one organization can do sensible things with robotics alone. So they decided to come up with a middleware that everybody can use, that everybody can quickly get started with, and then focus on their own part. One of the principles of ROS is that you can use as little or as much of ROS as you want. You start with it, and off you go, wherever you want to work.

This is quite an old picture, but it is the only picture that the ROS foundation provides, so I just keep using it. I used it in 2015, I use it in 2022, and when they come up with a better picture, I will share a new one. But even this picture demonstrates that ROS is used a lot. To show just how much: Boston Dynamics have this amazing robot, probably many of you have seen it, it can jump, it can resist. It's just amazing. And they released a ROS package so that academics and industry who buy this robot can participate, maybe not in open source, but in research and in developing things for it. So ROS is so huge that companies like Boston Dynamics develop packages that enable researchers and other industries to use their robots. It's a communication tool, and it's used to enable partners, even ones who work with different middleware and different software, to talk to each other and develop things that are available to everybody.

So this is an overview of ROS. There is also ROS 2, but I will just introduce ROS for now. It's modular: we have elements, and each element has a node. For example, we have a camera node that connects to the camera and an image processing node, and they don't have to be on a single computer; they can be on different computers. As demonstrated here, they use pub/sub, publishing and subscription: if you're interested in a certain thing, you can subscribe to it. So ROS is huge, right.
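The node-and-topic model described here can be sketched in a few lines. The following is a toy, in-process illustration in Python and not the ROS API: the `Bus` class, the topic names and the brightness computation are all hypothetical stand-ins for a camera node publishing frames and an image processing node subscribing to them.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class Bus:
    """Toy topic-based publish/subscribe bus (hypothetical, not the ROS API)."""

    def __init__(self) -> None:
        # topic name -> callbacks registered for that topic
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver message to every subscriber of topic; return how many got it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def make_image_processing_node(bus: Bus) -> Callback:
    """Assumed processing step: republish the mean pixel value of each frame."""

    def on_frame(frame: List[int]) -> None:
        bus.publish("image/brightness", sum(frame) / len(frame))

    return on_frame


if __name__ == "__main__":
    bus = Bus()
    bus.subscribe("camera/image", make_image_processing_node(bus))
    bus.subscribe("image/brightness", print)
    # The "camera node" publishes one fake three-pixel frame.
    bus.publish("camera/image", [10, 20, 30])
```

The publisher never refers to the subscriber directly; both only know the topic name, which is what lets real nodes live on different computers.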
There is ROS 2 now, but it's not as popular, because ROS has lots of legacy and is very well developed. ROS 2 is making its way now, but ROS is still huge.

ROS has this master node, and all nodes need to be registered with it. That's a single point of failure. The communication itself doesn't go through the ROS master, but if you want to communicate with other nodes, if you want to subscribe, publish, and have other nodes subscribe to you, you need to be registered with the ROS master. The problem with that was that if something disconnects and you can't connect to the master, then the whole robot may fail. So a single node that doesn't carry much in it can take the whole thing down. In ROS 2 they removed this single point of failure, and they replaced the communication with DDS, which is considered fault tolerant and reliable and can transfer lots of data.

When we talk about reliability and fault tolerance, that's an interesting thing, and it's the kind of language we need to be aware of when we come into another area. For distributed systems, fault tolerance means that if something fails, you can quickly detect that it has failed and then autonomously and automatically, without human intervention, restart it. That's fault tolerance. In robotics, a component is fault tolerant if, when it fails, it doesn't cause the rest of the robot to fail. That's fault tolerance there. And that's why we thought about ROSiE: because this is a rather ancient understanding of reliability, and we have so much more powerful tools in distributed systems. That's why we bring them in.

ROS has quite a number of limitations from the distributed systems perspective. One of them is that it was never designed for scalability, and it was never designed for fault tolerance. It has a purpose. It's amazing for roboticists. It uses the right tools.
A huge community of roboticists uses it, and it does lots and lots of things right for them. But fault tolerance and scalability were just never in the picture. And then we came, and we have Erlang, for example, which is scalable, which is fault tolerant, which was designed specifically for that. ROSiE is about tackling exactly those things. ROSiE is about leaving the nodes as they are in ROS, ROS 2 specifically, because there we don't have a single point of failure. So we still have these ROS nodes and all the breadth of expertise of the roboticists who developed them. But when we do communication between the nodes, that's where Erlang comes in, and it brings scalability, it brings the ability to recover from failures, and it brings hot code swapping.

The initial findings date from 2015, when I was a postdoc at Glasgow University and had an amazing, amazing student. With her and my colleagues from Glasgow University, Phil Trinder and Gerardo, we did these experiments just to see whether it would work. We had face recognition, and that's Andreea's face being recognized, and we implemented two systems. One was pure ROS as it is. The other was the same, but with Erlang handling the communication. It was very primitive, you couldn't actually use it; it was more for experimenting, to see whether there is a future in this. And the main thing we explored here is "let it crash". As I said, one system is in ROS and one in Erlang, and if you're interested in having a look at the paper, it's over here.

What we've got here, before this purple line, is the system working: it recognizes the faces. We have this number of frames, and it just recognizes them. The fact that it goes up and down is absolutely fine. So up to here they are both fine.
Then we start introducing failures. We have two types of failure: a soft failure, sending a signal, and a brutal kill. What happened is that with ROS, even at 10% of failures, the robot just couldn't recover; it eventually failed to go on and handle them. With Erlang, even when we killed 50% of the processes, it was troubled, in the sense that it was lagging. What happens on a failure is that we kill the process that handles the frame, and the system tries to catch up: maybe your face is here, the frame is still there, and it lags behind while it tries to catch up. And then it recovered and it was fine. So what we found is that Erlang doesn't just reduce component downtime; it actually mitigates the negative impact of failures. That work, done in 2015 with the paper published in 2016, gave us the idea that yes, this is what we need to work on.

Then last year, as I said, we collaborated with the University of Milan, and we got our amazing Luca, who started the implementation, and he will tell you about it.

Okay. Hi, everyone. Thank you for this introduction. Basically, one year ago I was tasked with giving an Erlang solution to the ROS 2 problem. At the beginning, I started by studying what we had to implement. We focused on the client library, which is the core feature of ROS. ROS has many packages and many other things, but the most important is the library for communication. Here you have the software stack. ROS 2 modernizes a lot over ROS 1, because it decided to provide two bindings, one in C++ and one in Python, and under them there is a common layer implemented in C called the ROS Client Library (RCL). Below that, you can see multiple RMW implementations; RMW stands for ROS Middleware interface, and it lets ROS choose between different DDS implementations.
To describe this stack: at the top you have the language bindings. Language bindings are important when you have to manage concurrency, because each language has its own way of using threads and resources in the system. For example, the ROS client library in C++ lets you plug in third-party executors, so that you can have multiple ways to deploy threads and different scheduling for ROS messages, services and actions. The RCL layer implements the shared logic, so they save on code repetition. And these layers basically don't care about the distribution part, which is the most complex aspect: how data is actually replicated, how we send a message from this process on this machine to that other process on that other machine on the network. DDS takes care of that.

But when we come to implement a solution, we want to use all the beautiful features of Erlang that Natalia just introduced, so we need a fully Erlang implementation. We could call C code from Erlang and replace just a piece of the stack; we could implement, let's say, RCL in Erlang and then reuse the layers underneath, down to DDS. Instead, we decided to go fully committed and implement an Erlang version of, let's say, the whole stack.

So, the next slide, Natalia, if you can change it. We propose a model that looks simpler, which is a good thing, because now you don't program an application on top of a library; you program an application on top of other applications. As you can see, you have an RCL application and a DDS application, and everything runs on the Erlang virtual machine. This is a big plus, because on the left you have to think about threads, concurrency and so on, while on the right you have applications that are collections of processes organized in supervision trees, so everything is supervised, everything is under control. And that enables you to implement fault tolerance as described before.
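The idea of supervised processes and "let it crash" can be illustrated with a small sketch. This is a single-threaded Python toy under stated assumptions, not OTP and not ROSiE code: the `Supervisor` class, the `start_frame_counter` worker and the restart limit are hypothetical. The worker contains no error handling of its own; when it crashes, the supervisor detects the failure and starts a fresh worker automatically.

```python
from typing import Any, Callable, Iterable, List

Worker = Callable[[Any], Any]


class Supervisor:
    """Restarts a crashed worker with fresh state (toy model of supervision)."""

    def __init__(self, start_worker: Callable[[], Worker], max_restarts: int = 3) -> None:
        self._start_worker = start_worker
        self._max_restarts = max_restarts
        self.restarts = 0

    def run(self, jobs: Iterable[Any]) -> List[Any]:
        results: List[Any] = []
        worker = self._start_worker()
        for job in jobs:
            try:
                results.append(worker(job))
            except Exception:
                # Let it crash: drop the failed job and restart the worker.
                self.restarts += 1
                if self.restarts > self._max_restarts:
                    raise  # too many crashes: give up and escalate
                worker = self._start_worker()
        return results


def start_frame_counter() -> Worker:
    """Hypothetical worker: counts frames and crashes on a corrupt one."""
    seen = 0

    def handle(frame: Any) -> int:
        nonlocal seen
        if frame is None:
            raise ValueError("corrupt frame")
        seen += 1
        return seen

    return handle


if __name__ == "__main__":
    supervisor = Supervisor(start_frame_counter)
    print(supervisor.run(["f1", "f2", None, "f3"]), supervisor.restarts)
```

After the corrupt frame, the counter starts again from a clean state, so the rest of the system keeps running instead of failing with the worker.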
So I implemented the DDS app over the past year. It's an experimental version, of course, because DDS is huge. DDS itself is bigger than ROS 2, because ROS 2 only uses a subset of the DDS specification. The specification of DDS, which stands for Data Distribution Service, is managed by the OMG, the Object Management Group, the same group that manages the UML specification.

Okay, we can change slide, I think. In implementing this, the first requirement was of course to remain interoperable, because otherwise it's useless. So the most important piece of software was the RTPS library, which implements the Real-Time Publish-Subscribe protocol and allows us to speak the same language as the native ROS implementation over UDP. So this is where we are right now. By the way, our code is open source, so you can check it out on GitHub. We are interoperable with rclpy and rclcpp, so basically any available ROS robot is able to communicate with us. The same goes for all the visualization tools. For example, RViz is a tool that lets you visualize what a ROS robot is publishing on the network, so you can see what it is doing. There are also introspection tools that allow you to debug the network. The official ROS introspection tools cannot tell the difference between the official implementation and ours, so we are fully transparent in that respect.

By the way, this slide only talks about communication, but you have to take into consideration that at a higher level data is transferred as messages, and ROS has a custom language to describe these messages. So we also had to implement compilers that compile the ROS message language into Erlang modules. We also developed a plugin for rebar3, the build tool for Erlang, which extends the build tool and allows you to put ROS packages as dependencies of your Erlang application.
So you can basically use the ROS package ecosystem, at least for now just for interfaces. You can pull the original interfaces, compile them and depend on them, so you literally speak the same language as ROS at all levels. So you are fully interoperable.

Okay, now, the code is working. One thing I did in my thesis was to demonstrate a cool feature of Erlang. This is the scenario. This is a photo of the Perseverance rover, and we thought: what if the Perseverance rover were implemented in ROS? The rover may be running code that is vital for its survival. What happens if we spot a bug that is going to kill the robot and we need to upgrade it, but we cannot afford any downtime? We don't want to stop the robot, and we don't want to stop communication with the robot. The robot has to remain stable while it upgrades itself, because it is alone: we cannot go there and remove a part or format some drive. So we need to save the robot, and the robot basically has to save itself. We just send the upgrade, saying, okay, upgrade yourself. And we are going to demonstrate this. Of course, we don't have a rover on Mars, so we did it on our PC.

Before starting the video, I want to explain what you're seeing. In the top left corner you have the RViz window, which is the visualization tool. RViz is from the ROS distribution, and it's going to visualize what our Perseverance rover is publishing. My implementation is basically publishing messages regarding its state, and the visualization tool is configured to show that in a 2D scene. One important note: the data you're going to see are true NASA data that we took from a published bundle from the MEDA station, the Mars Environmental Dynamics Analyzer. So those are true data from Mars.
In the bottom left corner you have the shell that is running the node. Later I will switch to the other shell. On the top right you have the file system of the robot. And on the top left, you have the file that is version 0.1 of the software. What will be running right now is version 0.2.0, and we're going to upgrade to 0.2.1 to fix a bug. So we can start the video.

What you're going to see now... Okay, basically I run the start script for the application, and I start it in foreground. It starts the robot process, and RViz subscribes to the topics of our robot and visualizes them. There is clearly a problem, because the robot is bouncing around, the arrows that measure the wind are going up and down, and the temperature is red hot, while it's cold on Mars. So I create a directory for the upgrade and paste the upgrade there. I'm just transferring data, and up to this point it's pretty standard stuff: you're basically upgrading a package with a new release. The release contains a fix, of course.

Now I start another shell, and I use the same script I used to start the robot, but with another command: instead of foreground, I say upgrade, and I provide the version number I'm targeting. This script runs as another process, not the robot process, and it sends a remote procedure call. And on the left, as you can see, the robot stabilizes itself. Now it's not bouncing, it's going straight, the temperature is blue, and the arrows are horizontal, as they should be, because the wind direction is horizontal. What happened is that our script launched a remote procedure call that says, hey, it's time to upgrade. What performed the upgrade was the robot itself, the application that was running, basically the virtual machine that was handling this robot.
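The essence of this upgrade, replacing the running code while the process and its state stay alive, can be sketched as follows. This is a Python toy under loud assumptions, not an Erlang release upgrade: `RoverNode`, the two `publish_*` functions and the sign-flip bug are invented for illustration, since the talk does not say what the real bug was. In OTP the comparable hook for migrating state during an upgrade is a behaviour's `code_change` callback; how ROSiE configures it is not shown in the talk.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
Behaviour = Callable[[State, float], Dict[str, Any]]


def publish_v0_2_0(state: State, celsius: float) -> Dict[str, Any]:
    """Buggy release (invented bug): the temperature sign is flipped."""
    state["published"] += 1
    return {"seq": state["published"], "temperature": -celsius}


def publish_v0_2_1(state: State, celsius: float) -> Dict[str, Any]:
    """Fixed release: the temperature is published unchanged."""
    state["published"] += 1
    return {"seq": state["published"], "temperature": celsius}


class RoverNode:
    """Long-lived node whose behaviour can be swapped without a restart."""

    def __init__(self, behaviour: Behaviour, state: State) -> None:
        self._behaviour = behaviour
        self.state = state

    def handle(self, celsius: float) -> Dict[str, Any]:
        return self._behaviour(self.state, celsius)

    def upgrade(
        self,
        new_behaviour: Behaviour,
        migrate: Optional[Callable[[State], State]] = None,
    ) -> None:
        # Optionally convert the old state to the new version's layout,
        # then switch code; the node itself is never stopped.
        if migrate is not None:
            self.state = migrate(self.state)
        self._behaviour = new_behaviour


if __name__ == "__main__":
    node = RoverNode(publish_v0_2_0, {"published": 0})
    print(node.handle(-60.0))      # buggy reading
    node.upgrade(publish_v0_2_1)   # swap code, keep state
    print(node.handle(-60.0))      # corrected reading, sequence continues
```

The sequence number keeps counting across the swap, which is the point: subscribers see corrected data but no gap or restart.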
And the robot did it without lag, without interruption, without downtime, with no side effects. Everything went fine, because of course we configured the upgrade procedure. I'm not showing any of that, as it's a rather complex procedure to explain. But once you have the tar file ready, as you can see, it's a matter of an instant: a millisecond swap, imperceptible, totally transparent. Okay, I'm done. I think we have just 10 minutes now, so it would be a good time for questions and answers.

Right, so I'll go very quickly. You've already heard about the Erlang Ecosystem Foundation, and this project has been funded by it. If you have interesting ideas that you think would benefit the Erlang community, please don't hesitate to get in touch and submit an application. You can find all the information on the Erlang Ecosystem Foundation website. If it's beneficial and everything fits, your project will be funded and shown.

Another thing is the Erlang Workshop. If you have something that you've already implemented, it's interesting and it has academic insight, please consider submitting to the Erlang Workshop. It's not just about Erlang. It's about functional programming, and about anything that can in any way be related to scalability, fault tolerance and the BEAM, in Erlang or any other language. There are dates and deadlines for submission and presentation. If you're new to writing papers, the workshop provides support, so an experienced academic can help you write your paper. So if you have ideas, please consider it.

GRiSP 2: this work has been done with GRiSP 2. Yes, there it is, Luca is showing it. If you're interested, please consider it; they're available, and that's the website. It's developed by Peer Stritzinger. And Erlang Solutions is hiring. I'm sure Francesco already talked about it. There will be more Erlang developers.
So if you're interested, we have a number of offices and amazing A-class developers, and knowledge of Erlang is not compulsory. That's it from us. Thank you very much, and if you have any questions, we'll be very happy to answer.

Thank you, Natalia and Luca, for sharing your experience with us today, and thank you for your time.