So hi, everyone. My name is Enrico Carbagnani, and today I will show you how to create an architecture, a platform, for effective human-robot interaction. I will start with our main goals, then the main architecture that we use, then how we implemented this architecture for the NAO and the robot's fields of application. There was a recorded video, but the audio doesn't work, so it will play at the end without any audio; forgive me. I will then show you the problems that remain, the next steps, the conclusions, and finally a demo of the robot actually doing something.

OK. What are the main goals? Our main goal is to create a robot able to interact with humans, understanding speech, gestures and facial expressions. The other is to build a modular software infrastructure, so we have the possibility of integrating different robotic platforms and of adding, and also removing, software really easily. This is really useful, especially for testing purposes and performance assessment: we can swap software in and out whenever we want. How was it achieved? We use high-level Python libraries for spoken language processing, sentiment analysis and vision, together with artificial intelligence applications.

The main architecture is actually pretty simple. Everything is based on ROS Indigo. Why ROS Indigo? Because the SMACH library that we use for finite-state machines runs only on Indigo, so we are stuck with that. The state machines run as separate processes, separate Python scripts, and they communicate through ROS. Now, what are the main benefits of this architecture? Thanks to the state machines running as different programs, we get asynchronous computing and real multiprocessing. It's not threading, it's different: they are just different programs using different cores.
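The process-per-state-machine idea can be sketched in plain Python. Here a multiprocessing.Queue stands in for a ROS topic, and the two "state machines" are made-up illustrations; the real system uses rospy publishers and subscribers with SMACH on ROS Indigo.

```python
# A minimal sketch of the process-per-state-machine idea. A
# multiprocessing.Queue stands in for a ROS topic here; the real
# system uses rospy publishers/subscribers and SMACH.
from multiprocessing import Process, Queue

def sensing(topic):
    # Each state machine is its own program, so it gets its own core
    # and runs asynchronously from the others.
    topic.put("heard: who am I?")

def sight(topic, result):
    msg = topic.get()  # blocks until sensing "publishes"
    result.put("face_recognition triggered by %r" % msg)

if __name__ == "__main__":
    topic, result = Queue(), Queue()
    workers = [Process(target=sensing, args=(topic,)),
               Process(target=sight, args=(topic, result))]
    for w in workers:
        w.start()
    print(result.get())
    for w in workers:
        w.join()
```

Because the workers are full processes rather than threads, they are not limited by the interpreter lock and can genuinely run on different cores.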
Since the structure is modular, which was one of our main goals, software components can be added and removed without problems. Also, since it's already ROS-based, basically every component that uses ROS in any way can be added without problems. So, now, where do we actually use the NAO? Mainly in entertainment, education, field robotics, home and companion robotics, hospitality and robot-assisted therapy. You would have seen this in the video, but too bad.

Now, robotic platforms. The way we designed this architecture, it can be used on virtually every robot, ROS-based or not, because in fact we are not actually using ROS to communicate with the robot; we are just using ROS to communicate between the state machines. Also, if you want, you could use the architecture for non-robotic projects; it's up to you. Why did we decide to use the NAO robot? Well, it's actually one of the best commercially available humanoid robots, and there are not many other options; the other one is the Pepper, which is its evolution. Also, since we need to interact with humans, with a humanoid robot a human can actually connect on an emotional level, so the interaction is a lot more effective.

Now, how did we actually implement our architecture for the NAO? As I already said, we use ROS just for communication between the state machines, and then we use the NAOqi library, which is a Python-based library, to send all of the commands to the robot. We use the PyAIML interpreter for natural human-robot dialogue. So what is AIML? AIML is basically a markup language that lets you create chatbots. At the moment this is kind of the only way to give the robot an effective way to communicate with a human, with answers that sound human, since you programmed them.
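To give a taste of AIML: a chatbot is just a set of categories, each pairing a pattern (matched against the recognized sentence) with a template (the canned reply the robot speaks). The category below is a made-up example in AIML 1.0 syntax, not part of our actual rule set:

```xml
<aiml version="1.0">
  <category>
    <!-- pattern: the normalized user sentence to match -->
    <pattern>DO YOU LIKE BROCCOLI</pattern>
    <!-- template: the reply the robot speaks -->
    <template>No, I run on electricity, not on vegetables.</template>
  </category>
</aiml>
```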
Also, we heavily use cloud-based solutions, especially for voice recognition and language translation, because the local equivalents are not as effective, and the cloud-based ones are a lot faster and a lot more accurate. Why language translation? Because if we add some kind of automatic translation, we can just go to the config file and say: OK, I don't want Italian anymore, I want Polish, and the robot now speaks another language. We also use neural networks, at the moment for object recognition and face recognition.

Now, this is the actual map of the state machines. The main Python script is called humanoid. When you call humanoid, it launches five state machines: sensing, sight, thinking, acting-speak and acting-move. Now you might ask: why are acting-speak and acting-move connected to sensing? That doesn't seem to make any sense. Basically, when the robot talks, if we didn't do anything it would keep hearing itself and more or less go crazy, so we need to mute it while it talks and while it moves.

Now, this is the inside of the sight state machine. There are different states, and you can have only one state active at a time; you never have two working at the same time. The default one is idle, and when I talk to the robot it changes to whatever I ask for. For example, if I ask it "who am I?", it goes to face recognition, and when it's done it goes back to idle. Now, this is the actual brain. Don't be fooled: for the sake of readability we cut the diagrams down, otherwise you couldn't see anything, it would be just a mess. It's already a mess right now; imagine it with five or six extra machines, you couldn't read anything. And it's pretty self-explanatory. We also have movement control, which is pretty simple, I guess. OK, now one thing.
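The one-state-at-a-time behaviour of the sight machine can be sketched as a toy dispatcher. This is not the actual SMACH code, and the command and state names below are invented for illustration:

```python
# Toy version of the sight state machine: exactly one state is active
# at a time, and every finished state falls back to "idle". Command
# and state names are invented; the real machine is built from SMACH
# states.
TRANSITIONS = {
    "who am i": "face_recognition",
    "what is this": "object_recognition",
}

def step(state, command=None):
    """Return the next active state given the current one."""
    if state == "idle" and command in TRANSITIONS:
        return TRANSITIONS[command]
    return "idle"  # whatever was running is done: back to idle

state = "idle"
state = step(state, "who am i")  # now in face_recognition
state = step(state)              # work finished, back to idle
```

Since `step` can only ever return one state name, two states can never be active at once, which is exactly the property described above.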
We went for a cloud-based solution rather than a local one, so we can reduce the CPU load a lot. Basically, if we use something like a laptop, everything can still run fast, since the heavy work is in the cloud and who cares. Oh yeah, you also need a fast internet connection, which might be a problem; if you don't have one, you're basically stuck. And also, obviously, the tools provided by the big companies are usually the best available and the most reliable.

OK, so wait, what? For some reason these slides are inverted; they should not be. So OK, what are the main problems with this? ROS Indigo uses Python 2, and obviously if you have a piece of code that runs on Python 3 you have a compatibility problem. To solve this problem, what we do is simple: OK, I need to run a Python 3 script; very easy. When you get to the relevant state, you just execute that Python 3 script somewhere else, as its own process, then get the result back and make the robot speak or do whatever you need it to do. Also, right now, when we change robotic platform we have a really, really big problem, because every time you need to search through all the code that you wrote and say: OK, here the robot speaks and uses this kind of command, so I need to change it. And you need to do this for basically everything that interacts with the robot.
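The Python 2/3 workaround can be sketched as follows: the ROS Indigo side shells out to a separate interpreter and reads the result from stdout. Here `sys.executable` is a stand-in; on the robot you would point at an explicit Python 3 interpreter path.

```python
# Sketch of the Python 2 / Python 3 bridge: run the Python 3 code as
# a separate process and treat its stdout as the result the state
# machine acts on.
import subprocess
import sys

def run_py3(code):
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()

# e.g. a recognition result comes back as a plain string that the
# state machine can hand to the speech component
answer = run_py3("print('hello from the other interpreter')")
```

The state machine never imports the Python 3 code; it only sees the string that comes back, which is why the version mismatch stops mattering.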
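The planned fix for the platform-change problem is a config file. A sketch with the standard library's configparser might look like this; every section and key name below is invented for illustration:

```python
# Sketch of a robot config file: platform-specific handles live in
# one file instead of being scattered through the code, so swapping
# robot, language or sensor means editing the file, not the software.
import configparser

ROBOT_INI = """
[speech]
engine = naoqi
language = it

[sensors]
microphone = external_usb
camera = /dev/video0
"""

config = configparser.ConfigParser()
config.read_string(ROBOT_INI)

print(config["speech"]["language"])   # prints "it"
```

Switching the robot to another language, as described earlier, would then be a one-line change under `[speech]`.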
Also, the NAO is not really powerful enough for our needs, especially in the sensor department. For example, its microphone is really, really weak, so we actually need to use an external one to make it work. The same goes for the actuator department: its limbs are not powerful enough, so if you ask it to, I don't know, pick something up, it will fail most of the time. The grasp is also really basic; it's just a pinch, as if there were two fingers, even though there are three.

So what are the future steps to correct all of these problems? As you could understand, this is a project that we are working on with a company, so unfortunately a lot of this stuff is closed source and we can only show you how it works. But we want to make the skeleton of the main architecture available to everyone, to make it open source. So if you want to start a new robotics project, you just download the architecture, put your robot's commands in the config file that we will create, and then you can add and remove code really easily and have a working robot pretty quickly. Also, since the NAO is obviously not enough for our needs, we need to change it for something newer and hopefully more powerful. And we are adding a config file: as I said, today, when you need to change something robot-specific, you need to change all the software, so instead we create a config file: OK, here is the webcam, here is the actuator for the left arm. You do everything there and you don't need to modify all of the software. We might also add a companion unit to the NAO. This is just a temporary solution: it will house all of the necessary hardware, basically a desktop computer, plus all of the sensors that replace the NAO's own ones, because they just don't work, they're really bad. I know it's kind of early for conclusions, but don't worry, we also
have a lot more stuff, especially with this guy, and also we have a video. So with this robotic platform we have been able to achieve the main goals that we initially set, which is great, thank God, and we are also seeing promising results with our NAO embodiment. And obviously Python is at the heart of everything, and we are exploiting the batteries-included philosophy: we try to integrate everything that is already available.

Now let's enjoy the demo and, hopefully, the video. OK, now let's make it start. Can you see everything? Yes, roscore started. So basically, as you can see, I have little space to actually show you everything, but for every state machine there is a log in an xterm, so you can actually see what it's doing. Also, forgive the spelling mistakes. And here you can see what it's doing in real time; the problem is it might crash. OK, it doesn't. Also, forgive it, everything is really slow, because this is just not a really powerful computer. The speak log is not so important, sight maybe... whatever, something is missing... here it is. OK, basically you can see what it's actually hearing. One small problem is that the NAO for the moment can only speak Italian, so if you are an English speaker, I am really sorry. So, for example, first let me see if it can actually recognize me. Whatever, it should work, but it doesn't, so too bad. For example, I could ask it something. It heard a lot of stuff: as you can see, it's trying to understand everything that I said and it's now trying to recognize it. I said a lot of stuff and it's taking quite a bit of time, actually. One of the main problems is the cloud-based architecture: the speech recognition runs in the cloud, so sending all of the audio, getting it analyzed and getting the result back takes a lot of time, especially if you say a lot of words. "Mi ascolti? Mi ascolti? Come stai? Come stai?" ("Are you listening to me? How are you?") Or, for example, I could ask it if it likes broccoli: "Ti piacciono i broccoli?" ("Do you like broccoli?") Obviously it doesn't work every time; we still need to improve everything, it's still a pretty young project. "Ti piacciono i broccoli?" What? [garbled Italian exchange with the robot about broccoli] That's the way it works. So let's try to ask it something else. It's picking up too much stuff, and the speakers are really not the best, so you can barely hear anything. "Chi è George Bush?" ("Who is George Bush?") It doesn't really work as you'd think. "Chi è George Bush?" Come on, do it. Oh no, I see, so that's the problem. OK, so as you can see, the performance is really slow, because I'm using an i3 laptop, really slow; usually we use a much more powerful desktop. So we have a lot of processes working and it's really, really slow. Also, the internet connection is not really the best. Also, it's hearing everything: this is a pretty sensitive microphone, and when I talk to the speaker, it can hear everything. We could ask it for some object recognition, maybe. It doesn't really work well, because, as you can see, the camera is not really the best, but let's try anyway. Let's try with a water bottle, because why not? Come on, stop it, it recognizes everything. "Che cos'è?" ("What is it?") "Dimmi di più." ("Tell me more.") "Che cos'è?" "Cerca di pensare. Vai avanti, cerca di pensare." ("Try to think. Go on, try to think.") As you can see, it's actually working. "Siamo ancora sullo stesso argomento. Direi proiettore oppure pettorale." ("We're still on the same subject. I'd say projector, or chest.") "Se devo scegliere, prendo pettorale." ("If I have to choose, I'll take chest.") OK, so basically, as you can see here, it tries to understand what the object is. Also, there is that light, which is really messing with its camera.
And obviously object recognition, especially at this level, if you know neural networks, is actually kind of hard. We could try something else if you want: you can ask me to make it recognize, I don't know, a glass, a napkin, a coin. If you want, you can ask me. Also, I could show you, I don't know, a dance maybe. It doesn't usually end well, but let's try it. OK. Oh yeah, actually, let me do this. "Leggimi una storia." ("Read me a story.") OK. "Leggimi una storia." "Pinocchio." So basically, if you give it books, it can read them; that's actually pretty easy to do, nothing super hard. But why is it not working? No, it heard "lucho", not "Pinocchio". Why? There are a lot of problems, as you can see. "Pinocchio." As you can see, it got to the read state machine, and it's actually reading the book from a simple txt file. You can stop it by touching it. Let's try tai chi, why not? Did you understand it? Or not? On this PC? No, this PC is melting. "Taichi. Taichi." Yes. And basically you can make behaviors work. It will probably fall, so don't worry, it's normal. It actually worked; that's impressive. And also, I don't know... "Dammi la mano." ("Give me your hand.") "Dammi la mano." Basically, if it understood, in theory it should follow me. "Dammi la mano." If you just take its arm, like a joystick, it can follow me around. Yeah, it's slow. As you can see, and as I told you, the actuators are not the best and we kind of need to replace them. I can stop it. OK, fantastic. Go back to idle. Face recognition... it doesn't... Also, I don't know how well it can actually see me here, so... probably that gigantic reflector is not the best for its camera. I mean, it's really not that powerful. Come on. Did it actually run face recognition there or something? I think there is... Let me try this. That's right: "I don't recognize you." At least it got to face recognition and says, oh, I don't know you. In theory, if it knows you, it says, oh, hi, Enrico. Obviously, since it's a demo, it must not work. Or...
Oh yeah, actually, it can also read, but... maybe... Is this big enough? I don't know. Can I try with this? Just that... it means it didn't recognize anything. "Leggi." ("Read.") Oh, it's actually saying it. "Leggi." Usually it works; you just need to have really big letters. That's kind of the only problem: you need big letters, because the resolution of the camera, and also the focal point, is kind of crap. If you had a decent camera, you could actually use this even with really small text; you could just take a book, put it there, say OK, read, and it would read whatever is on the page. If only I had some kind of big text here, but too bad. Also, you can do a lot more stuff. At the moment, I could ask, for example: "Domande. Domande." ("Questions.") "Ulivo", for example. "Ulivo." ("Olive tree.") It understood "WWW". "Ulivo." I don't know, for some reason it went to Google; I have no clue why, but yay. Basically, as you can see, I could ask it some really hard things. Consider that this is kind of one of the best speech recognition systems that you can actually use; right now it's not at its best, because there is all the echo from the microphone and it gets confused pretty easily. So basically you could ask it anything. For example, I could ask about a microphone, and it should tell me what a mic is. It works, actually. I don't know, there is more stuff. Stop. Stop, please. It started some behavior; it's too intelligent. Oh my god. Stop. "Fermo, basta." ("Stop, enough.") "Ferma la sua pazzia." ("Stop its madness.") It understands what it wants to understand. "Fermo, basta." "Fermo, basta." Let's see. There, it did it. Oh yeah, now it's idle, thank god. Fantastic. Yes, it exited. So basically, to make it leave a state machine, you say "basta", "fermo", or whatever you want (there is a config file for that) and it goes back to idle. I don't know, we still have some time, so do you want to see it dance a little bit more or not? OK, so... "Ballo sequenza." ("Dance sequence.") "Ballo sequenza."
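The escape-word mechanism just described can be sketched as a check against a configurable stop list. The words below are the ones used in the demo; the function name and the in-code list are invented, since in the real system the list comes from a config file:

```python
# Sketch of the escape-word mechanism: any utterance found in a
# configurable stop list bounces the active state machine back to
# idle. In the real system STOP_WORDS is read from a config file.
STOP_WORDS = {"basta", "fermo", "stop"}

def next_state(current, utterance):
    if utterance.strip().lower() in STOP_WORDS:
        return "idle"
    return current
```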
OK, this one is pretty hard. Yeah, it's... oh, OK, thank god. Pretty nice; you should see the fall manager kick in. Basically, there is a state machine that has priority over everything else and says: OK, if it has fallen, then get back up. And it works, as you can see. And there you have it, good as new. I don't know, should I try to make it recognize something? It depends on what you prefer. We have time for a few questions, or demonstrations if you want. I don't know, tell me what you want to do: if you want, I can do Q&A; if you want, I can continue showing stuff that it can do. Maybe one question, somebody? Yes. What? Behind, in the back. The question is: what happens if it falls on the other side? If you want, I can show you; we can do it softly. Any other questions or challenges for the robot? One from the audience, about the camera: is the color image we are seeing the same thing that the robot sees? Why not, for instance, use black and white and some sort of infrared light, so you are not dependent on the environmental light? Any other questions? Raise your hands. One question: can it pick up an object on the ground? Please speak into the microphone, I think it will be easier for everybody. So, as it is right now, it can kind of take objects from the ground. Basically, you can create an animation for the robot and say: OK, when you are here, in this spot, run the animation to pick up the object. But it's not actually a smart object pickup. It doesn't say: OK, I recognize that object and then I will pick it up. It just goes there and executes a fixed behavior: go down, close the fingers, get back up.
We actually do it, but not in a smart way. If you want, I can show you; there are some things. Oh yeah, let me try. "Prendi." ("Take.") Yes, it does it. Great. "Dammi." ("Give me.") "Dammi." "Dammi." It looks hard to get the napkin back from it, but it's not. Oh man, this is so beautiful. Yeah, it's "dammi", no? Yes. Let's try again. "Dammi." Why is it not executing? Well, yeah: you're taking it out of its hands, you're stealing from a robot. Yeah, basically it's kind of a cheap way to do it, but it works. Great anyway. Well, thank you. Thank you very much for showing us that robots don't like broccoli. And please give him a warm applause for his work with his robots. Thank you.