Maybe just hold on to it. OK, very good morning to everyone. My name is Yaldanand. I go by Yalu for short. And today I'm excited to talk about a personal research project of mine. This is something I'm curious about, and it's still in a very experimental stage. And the idea is: can we map in digital twins for autonomous navigation in the real world?

A little bit about myself before I begin. As Marco mentioned, I'm a software engineer on the Open Robotics team at Intrinsic, and I help develop, maintain, and release several core packages in ROS 2 as well as Open-RMF. And yes, it's really my privilege to be here today to talk about my interests. Before I start, can I get a show of hands? How many of us here have worked with mobile robots before, or even better, done any kind of mapping or navigation? Oh, wow, that's a lot of people. OK, OK. So I'll try not to blow too much smoke and try to be factual. Yeah, let's get into it.

So today I'll be talking about a few things. We'll start with a high-level overview of what autonomous navigation is and the role of mapping in this process; some of the practical challenges with mapping in the real world; and this concept of digital twins, how we generate them and how we build them. And then we'll get into mapping and navigation, and then we'll talk about some pitfalls I've observed along the way.

So it sounds like most people here already know what autonomous navigation is, but for the sake of completeness: in robotics or self-driving, autonomous navigation is basically the process of getting an agent to move from one point to another, where your only input is telling it where you need it to go or arrive at, and the agent is able to figure out how to get there. Along the way it's avoiding obstacles, it's finding an optimal path to get there, and it's respecting certain constraints you give it: how fast it can go, how fast it can turn, et cetera.

So there are several components that go into autonomous navigation. The first is you need an agent, and typically this agent needs to be able to sense the world around it. Let's talk about mobile robots for today. Sensors come in a variety of forms: you have lidars, cameras, stereo cameras, et cetera, and these help the agent develop an understanding of the world around it. We have our eyes, stereo vision that helps us perceive the world, and we can also infer depth from this; similarly, in robotics, we have sensors that can do the same. The second is we need a map. This is generally true for indoor navigation: we need a datum to help the agent know where it is in space and how it can navigate around the space. So a map is basically a representation of the world that the agent is moving around in. The next important component is localization. Once we have this map, this representation of the world, an important part of navigation is for the agent to figure out where it currently is within that representation. This process is called localization: the agent is able to infer where it is right now, given this representation of the world. And finally, there's a lot of planning and control that goes into navigation. Once you give it a goal, it needs to plan a collision-free path to get to the destination, and the wheels have to move to get it to follow this path. That's where the control comes in. So I would say this is a very high-level overview of what autonomous navigation is. So what's mapping here? I think we talked about it briefly.
Mapping, or what's commonly practiced, is simultaneous localization and mapping, or SLAM. And this is really solving the problem where we need to construct this digital representation of the world for this autonomous agent, and we need to do this while the space is initially unknown. So we just throw a robot in the space. It doesn't know anything about it. It needs to construct some representation that it can then use to firstly localize, then plan, and then navigate. And this typically relies on a combination of different sensors to perceive the world.

So lidars are quite popular in the field of robotics and self-driving cars. These are basically laser-based sensors: they shoot out a beam of a certain frequency, a light wave, it reflects off a surface and comes back, and that can tell you how far away that hit was. And sometimes we can also infer the intensity of the light that came back, so we're able to tell whether this is a solid surface, a glass surface, shiny, matte, et cetera. Lidars are getting really advanced these days. We can do 360-degree scanning, so it's not just a single point that goes out; it rather scans around in 360 degrees, and it has a field of view in both the vertical and horizontal domains. There are IMUs, inertial measurement units, that keep track of the current kinematic state of the robot: how fast it's moving, how fast it's accelerating, et cetera. And there are cameras, a whole range of them, from monocular cameras to stereo vision to depth sensors, et cetera.

And typically, the output of this mapping process is this representation of the world. It sounds pretty abstract, but from a practical standpoint, the output is typically generated in one of two formats. For 2D representations of the world, we use something called an occupancy grid, and that's what you see on the right-hand side here. You can think of it as a PNG image, and it's just grayscale: if a pixel is black, it means there's an obstacle; if it's white, it means it's obstacle-free; and gray areas usually mean unexplored or uncertain. It's a very simple representation, but you can think of it as generating a grid of the world. Imagine this 3D space: we just project it down at a certain height. So I choose a level, maybe one meter above the ground, imagine I cut a plane through this room right now, and if you project that down, that's what this image looks like. So it's generated at a specific height from the ground. Another common storage format is dense 3D point clouds. If you have sensors set up that can generate a 3D description of the environment, with 3D lidars or other kinds of sensors, you'd probably store it in a point cloud format. So it's just a whole bunch of 3D points that get stored; it can go up to millions or maybe even tens of millions of points. There are other forms I haven't mentioned here. You can even do visual SLAM, and in that case you're storing keyframes. Say you're looking at this image right now, and I want to store this corner of the table, or certain key features that I see in this image. This will typically be stored as a bag of words or something else, but those are some details that we can ignore for today.
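Just as a concrete reference, this is roughly what the metadata for such an occupancy grid looks like when it's saved out in the common ROS map format, a grayscale PGM or PNG plus a small YAML file; the file name and numbers below are placeholders, not values from this project:

```yaml
# Occupancy grid metadata in the usual ROS map_server style (all values are placeholders).
image: office.pgm             # grayscale image: dark pixels = occupied, light = free, grey = unknown
resolution: 0.05              # metres per pixel
origin: [-10.0, -10.0, 0.0]   # x, y, yaw of the lower-left pixel in the map frame
negate: 0
occupied_thresh: 0.65         # pixels darker than this fraction are treated as obstacles
free_thresh: 0.25             # pixels lighter than this fraction are treated as free space
```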
So as a robotics engineer, I've spent a lot of time working with mobile robots, and typically when we want to deploy a robot somewhere, mapping is the first thing that you do. And personally, I've been very frustrated by this process, because it requires you to manually drive the robot around, either teleoperating it or sometimes, if it's a big enough robot, riding it, to drive it around the space and generate this representation of the world. And it takes time, especially in really, really large environments. Something like this room can be done in a few minutes, but if you want to map this entire institute, for example, it's going to take a lot of time. Furthermore, if the layout changes, for example we close off a certain section, we add new furniture, or something changes physically, you probably have to remap, which means you need to come down and do this process all over again.

And so this gave me this idea to explore, which was basically: can we map in simulation, take the output of that, and then run it directly in the physical world? And yeah, if you're like me and play a lot of video games, one reason I do that is, although I like the graphics in the real world, the gameplay often isn't as fun. And I feel the same way with mapping, right? It's something that's annoying in the physical world, but I think the gameplay can be improved in simulation.

And this is where Gazebo comes in. Gazebo is another open source tool that we distribute; I'm sure a lot of you are familiar with it. It's a physics-based simulator that does physics and rendering, and it has a whole bunch of plugins that try to emulate the physical world as closely as possible. We call this high-fidelity simulation: you're not just able to reproduce visuals from the world, but also interactions, contact forces, and various other physics elements.

So how does Gazebo help? Like I mentioned, it's a simulator, and with simulators you can define the rate at which the simulation runs. There are some constraints, of course, depending on the hardware that you have, but you can typically run the simulated world at a faster rate than the real world. So I can say: run the simulation at 10 times real-time rate. If I'm sending a command to move the robot for one second, it might move the robot as if it was getting that command for 10 seconds in simulation. So you can move the robot faster, and if we can move the robot faster, it means theoretically we should be able to map faster, if we're able to process all that data at this higher rate. And there are some other benefits too. Like I said, we can replicate the physics in simulation. And specifically why Gazebo: because we can generate these very realistic worlds, right? There's a cool talk I've linked here by my colleague, Cole, who talks about photorealistic simulations, so how we can build simulation environments that look very similar to the real world. Another cool part about Gazebo is all of these plugins. A plugin, you can think of it as basically a component that emulates a physical system. There are a lot of model plugins that emulate various kinds of sensors, and in this case we're interested in lidar sensors. There are some images here of lidar sensors in Gazebo; you can see the rays hitting an object and detecting an obstacle. And the cool thing is these sensor plugins are very customizable. You can define the type of noise that you want: this sensor has Gaussian noise, with this mean and this standard deviation, which you can typically find in a datasheet.
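Just to make that concrete, a lidar sensor in a Gazebo (Classic) model description looks roughly like the sketch below, with the noise block I'm describing. All the values here are placeholders rather than the ones from my robot, and in practice you'd also attach a gazebo_ros plugin to actually publish the scan to ROS:

```xml
<!-- Minimal sketch of a 2D lidar sensor in SDF for Gazebo Classic.
     All values are placeholders; real ones come from the sensor's datasheet. -->
<sensor name="front_lidar" type="ray">
  <update_rate>10</update_rate>
  <ray>
    <scan>
      <horizontal>
        <samples>720</samples>
        <min_angle>-2.356</min_angle>   <!-- roughly a 270 degree field of view -->
        <max_angle>2.356</max_angle>
      </horizontal>
    </scan>
    <range>
      <min>0.1</min>
      <max>12.0</max>                   <!-- metres -->
    </range>
    <noise>
      <type>gaussian</type>
      <mean>0.0</mean>
      <stddev>0.01</stddev>             <!-- per-beam range noise, from the datasheet -->
    </noise>
  </ray>
</sensor>
```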
So being able to model that noise is really important, because you don't want to get super accurate results from this simulation, right? No sensor is that accurate in the real world. And like I said, one of the challenges is dealing with environments that change. In simulation, if we can quickly rebuild that simulation world, deploy the robot, and do this mapping at this faster rate, I think that solves a lot of the annoyances that I talked about earlier.

So what's my plan here? This is the idea that came to mind and that I started exploring. Can I build a photorealistic digital twin of a real environment? That is, start with a Gazebo representation of a space that actually exists. Can I run all my SLAM algorithms and ROS nodes and all of that with the data from the simulation? The sensors in the simulation produce data, and I'm using that data to generate this map. And then I take this map, which is the occupancy grid that I talked about before, give it to a real robot, and have the real robot move in the real space. So this is kind of the research question that I'm trying to explore here.

So the first question you may have is: okay, how do I build these digital worlds quickly? And a tool that I want to share is something called the RMF traffic editor. This comes from the Open-RMF project that Marco alluded to earlier, and it's a great tool that helps you very quickly build 3D environments. The only input you need is a 2D floor plan. So you start off with a floor plan, such as a PNG image you can download from the building website or get from your landlord, I don't know. And then this tool lets you annotate this 2D floor plan. You're basically annotating features that you see in the floor plan: here's where a wall is, here's where a door is, et cetera. And then you're very easily able to take this annotated floor plan and quickly generate a 3D environment that has all the physics and everything in it. This slide covers some of the features of traffic editor. There are utilities to add walls; in the first GIF here, we're sort of highlighting the walls in the floor plan. In the second GIF, we're adding doors, and we support different kinds of doors, from swing to sliding to telescopic, most of the kinds that we've seen at least in Singapore. You can annotate where the floor is, and you can even give different textures to floors: this is a marble floor, this is a carpeted floor, et cetera. And then these models, that's the most important part, because we need to have the models of the things that we see in the real world. There are very quick ways to drag and drop these thumbnails of models that will eventually get generated as the physical model. I won't go into too many details on how to use traffic editor. We did a ROSCon workshop last October, and there are some slides from that workshop that go into more detail on how to use this tool. We also have documentation linked here, so that's something worth checking out later.

So once you have this annotated floor plan, we have another tool called the building map generator. This takes the annotated file, which is a YAML file, and basically auto-generates the Gazebo file format, which is a .world file. So it's just as simple as running the script. What we typically do is we even automate running the script: when we build a map file, we have some hooks that will automatically run it.
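For reference, the generator comes from the rmf_building_map_tools package, and the invocation looks roughly like this; the file names here are just placeholders:

```bash
# Generate a Gazebo world (and its models) from the traffic-editor annotation file.
# File names are placeholders for illustration.
ros2 run rmf_building_map_tools building_map_generator gazebo office.building.yaml office.world office_models/
```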
So it's a very seamless process. You're just having fun with your imagination and annotating this map, and then you save, you build, and then you can immediately open this 3D world in simulation. So that's what I did here. This is a floor plan of our office in Singapore. I started with the floor plan, I took some pictures of the real world, and then, using my limited imagination, I tried to reproduce, or annotate, that floor plan with the different elements. And here are the results, right? I'd love to get feedback on this. All of this is open source, by the way; it's in our RMF demos, and possibly there's a link to that later. But I'd love to get feedback on how close you think it is to some of the images. It's hard to tell if you haven't been to our office, but to me, I think it's pretty close. Maybe I'm biased.

All right, so the next thing is: okay, we have this digital world. How can we now generate that map? The cool thing about ROS and Gazebo is that whatever code you want to run on the real robot, you can run in simulation. There are no differences. Maybe you just have to say: use the time from my simulation instead of the time from my computer. But the rest of the code, everything's the same. It's the same nodes, the same logic, getting the data on the same topics. So it's not like this is just a visual simulation and then, when you have to run something in the real world, that's completely different code. It's not; it's the same code. So whatever code I would use to map the real world, I'm using the same nodes, the same launch files here, to start up the different nodes that I need for mapping.

But before I can do that, I need to spawn this robot in simulation, and there are a couple of tweaks I make to benefit from simulation. Firstly, I have to make sure that the lidar model in simulation is comparable to the one in the real world. But then I also do some hacks, some cheat codes. I increase the range of the lidar, because my real robot has a range of maybe 12 meters, and that means if I want to map a wall that's beyond 12 meters, I have to drive closer to that wall. But in simulation, I can just say: no, my lidar has the same noise, but it can actually hit 20 meters or 100 meters ahead. So that's really cool; I have to drive less now in simulation. That's one of the hacks that I do. I also increase the field of view. Maybe my real lidar has, I don't know, a 270-degree horizontal field of view. In simulation, I can just say I have a 360-degree lidar. Yeah, I'm sure lidar manufacturers hate this, but that's one quick tip that I just wanted to share. And then the most important thing here is that I need to keep the lidar at the same height as it is on my physical robot. I mentioned that the way this 2D SLAM works is we cut a plane through the 3D world at a certain height and project it down, so I want to make sure that my lidar in simulation is at the same height as the lidar on the real robot. That's just adjusting some of the model values to make it the same height. And then you run the SLAM algorithm and then you save the map. Saving the map, again, is just saving that occupancy grid.
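To give a sense of what that looks like in practice, here's one way to run it, assuming slam_toolbox as the SLAM package (nothing in this approach is tied to a particular one) and the standard Navigation2 map saver; the map name is a placeholder:

```bash
# Run SLAM against the simulated sensor data, using the simulation clock.
ros2 launch slam_toolbox online_async_launch.py use_sim_time:=true

# Once coverage looks good, save the occupancy grid (writes office_map.pgm + office_map.yaml).
ros2 run nav2_map_server map_saver_cli -f office_map
```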
So here's a cool video montage; I should have had it playing while I was explaining. This is the robot spawned in simulation, and all the blue lines you see are the lidar hits. And as you can see, I've cheated there, using a longer-range lidar and a wider horizontal field of view. So I can map really fast, right? And I'm probably moving the robot at a much faster rate too. So I'm done, and that's the map that gets generated, right? Yeah? Yeah. Do you have friction on the wheels depending on the type of material, right? Yeah, that's a great question. So as part of the model description, you can specify the friction parameters of the contact points. There is a certain representation of this robot, right, and it has wheels. And depending on the physics engine, we use ODE, which is an open source physics engine, you can give it certain contact parameters, like the coefficient of static friction and the coefficient of dynamic friction. There are some other contact parameters that you can tune as well.

Continuing on: how good is this map, right? So the image on the right side is the one I got from the simulation, and the image on the left side is what I generated with a physical robot in the real space. So I think it's pretty close. Obviously, the real robot has a bit more noise, so maybe I didn't model the noise as well as I should have, and maybe there's a bit more uncertainty in certain areas; the lidar hits were much cleaner in the simulation. But that's fine. Even when you drive in the real world, the real world is never gonna be the same as when you mapped it. There are always gonna be humans walking around; there are gonna be some differences. So we rely on this localization a lot to counteract some of these differences and still maintain an understanding of where we are right now. So I think it's not bad. I think it's not bad.

Okay, so the proof is in the pudding. Does it really work? I'm gonna play this video. What's happening in this video is that the camera feed on the bottom left is from the camera mounted on the real robot. I sent the map to the real robot and then we're running the navigation stack; this is Navigation2 running. And the first thing you need to do is localize the robot. So you have to give it some help: okay, where are you initially starting off in this giant world? So I kind of know where it is in space and then I localize it there. You can see a whole bunch of green dots around it. This is the probabilistic localization we do with AMCL. There's an initial belief of where we are, and eventually that belief updates: as the robot moves around, the algorithm updates to firm up its understanding of where the robot is. So as the robot moves, you see those green points kind of converge to strengthen the belief of where the robot is in this world. So I'll play the video. At some point I had the thought of also opening up the digital twin, to get a cool side-by-side of the real camera feed and the digital twin. All right, just play the video. So this is the robot that's initially localized. I give it a few goals to see if it can move. Just waiting for that belief, and see if we have the camera. And then I give it some goals that are initially nearby to see if things work. It was pretty surprising that it worked. And then I give it a few other goals. So here's a robot autonomously navigating in the real world with the map that we generated in simulation. And this is the digital world that we used for simulation. So right now we're in this kind of corridor that's facing this area here, and then I'm giving it other goals. I thought that was pretty cool.
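For reference, handing the simulation-generated map over to the real robot's navigation stack, like in that video, essentially comes down to pointing the stack at the map file; something along these lines, with the path as a placeholder, and the initial pose then set from RViz, which publishes it to AMCL:

```bash
# Bring up Navigation2 on the real robot with the map that was produced in simulation.
# The map path is a placeholder for illustration.
ros2 launch nav2_bringup bringup_launch.py map:=/path/to/office_map.yaml
```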
Yeah, here's a cooler image of that localization. So there's a lot of lidar noise at first, and it doesn't know where it is, but then it immediately localizes. You can see the lidar scan as AMCL immediately snaps into place and the robot figures out where it is. And then it's able to drive around, and see, that's my desk in the office. So again, this is very early work on a concept that's personal to me and that I've been trying to explore, and I think there's great potential. A lot of things can be improved, of course. And I'm really hoping to talk to more people here who have experience with mapping and find ways we can maybe even make this part of ROS. How are we at time? We have time for a couple more slides. Okay, yeah. This isn't the Oscars, where they start playing the music to walk you off.

Yeah, so a couple of gotchas I experienced during this process. Obviously, one of the big limitations is the models. To build this digital twin, you need to have models of the things that exist in the real world, and that takes effort. Honestly, that's the most challenging part of this process. But the team at Open Robotics, and even the community, has contributed a lot of models, and we host a lot of these models on Fuel. That's like our Dropbox for models. Everything that I used to build the digital twin is out there, so that's a great place to start. And if you end up building models and you want to contribute, please do upload them so others can also use them.

There are some nuances, right? Because Gazebo runs a physics engine, and physics is important for sensors. So one thing I realized was, initially, when I used a certain lidar model in simulation, I was using this sensor type called ray. And I found out later that it just uses your CPU alone to figure out where the ray hits. Each of these lines is kind of a ray, a laser beam, that emanates from the sensor and hits some point in the world, right? And with this type of sensor, I realized that, oh, it's actually just hitting the collision meshes of the models. Every model you can think of as having two components: there are two different meshes for every model, a collision mesh and a visual mesh. The visual mesh helps you render what it looks like. The collision mesh can be the same as the visual mesh, but something we do to improve the performance of our simulation is we use these simplistic collision meshes. These are like primitives, just boxes that encompass the bounding box of the model, and this really, really helps speed up simulation. In the RMF project, we're simulating really big worlds, so this is important to do. And I realized that, actually, our technical artists had helped us craft these really nice simplified collision meshes, but it's a problem with this type of plugin, because the rays just hit the collision mesh. And then if you map with this, you end up with these kinds of boxes: this box here is the back of this chair. That's actually the collision mesh. So it's not gonna be accurate, and the robot could not localize in this kind of map. All I had to do was change that one line: from ray to gpu_ray, right? And now it's using my GPU, and with the GPU we're able to do all kinds of fancy ray casting. Now you can see that the rays actually penetrate the collision mesh and they're hitting the visual mesh.
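For reference, that one-line change is just the sensor type in the model's SDF; the GPU variant casts rays against the rendered, visual geometry instead of the collision geometry. The sensor name here is just a placeholder:

```xml
<!-- The one-line change: switch the sensor type so rays are cast on the GPU
     against the rendered (visual) geometry instead of the collision meshes.
     Everything else in the sensor definition stays the same. -->
<sensor name="front_lidar" type="gpu_ray">
  <!-- ... same <ray>, <range>, and <noise> blocks as before ... -->
</sensor>
```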
So that helps generate a more accurate representation of the world. And yeah, now I'm at the end of my talk and I'm happy to take any questions. I'll just leave this playing. Yeah, please go ahead.

So with that, you have built a digital twin of the world. What would happen if you took that digital twin, sliced it digitally, and converted the resulting mesh directly into an occupancy grid? Would that work, instead of running the simulation? Yeah, I think that would work. So if you export this entire digital world as a single mesh and then you take a slice of that mesh and somehow generate an image from that view of the world, the problem there is you're not accounting for the sensor noise, and that's the most important thing with mapping. But if you had a way to factor that in, that would be pretty cool. Oh, sorry, I'm new to this; I should repeat the questions. That was a good question, yeah? Yeah, so the sensor noise does affect the output of the occupancy grid. It definitely does, yeah. If you look at the map that's generated from the real robot, even these walls here come out as a kind of wavy line, because the sensor has a certain noise; it can't be super confident about exactly where the laser hit. So there's always this noise that's important to model, and the goal is to do that in simulation. I think that's the challenging part. And the cool thing is that Gazebo has a way to model this kind of noise.

There's a question? Yeah. I'm just wondering, in the situation of a deployment, right, why do you want a noisy map? If you can do a perfect 2D slice, you get an ideal map. Will the navigation stack do better, even if there's noise in the real world? Yeah, the problem is your localization algorithm needs to be better in that case. If you have a very perfect slice as this occupancy grid, when you run your robot's navigation stack with that map, what you're doing essentially is getting live sensor data, right? The laser is actually hitting all these points in the world and you're getting live sensor input. The goal is to take that sensor input and find out which part of the map it actually matches, right? So if your model is super perfect but your sensor on the real robot has some noise, that localization is fighting extra hard to just snap to the right position. So ideally, if you spend more money on great sensors and you have really good odometry on your robot, you could get away with that. But I think that's pretty costly. Albin, yes?

Yeah, thank you. Could the robot potentially be, potentially, okay, I mean, a drone? Something that flies, something like Terminator, so we have flying drones around. Second question is: we took a ride by bikes to Land Post One; it's somewhere near us. Oh yeah, the sticker with all the bicycles, yeah. That's it, that's it, yeah. Okay, on the way back, we had an intersection that the ride leader said is not yet on Google Maps. And if we were sending, okay, let's call it a driverless car, as your robot, would it end up on the sidewalk or would it be able to navigate its way around that? Yeah, thanks for the question, Albin. So two questions there. One is: can we do the same with a drone, a flying type of robot? And I think it's definitely possible. We have drones that can be fitted with the same type of sensors, lidars predominantly, like I used, that can then map the world.
So people are already using this type of technology for drone navigation indoors, so I think it's certainly possible to do that. It's just that the representation of the world will probably be different: here we're dealing with 2D occupancy grids; maybe there you're storing a 3D point cloud, or maybe you're storing a feature map or a bag of words like I talked about on the visual SLAM side. The second question is about, I think, navigating in unexplored territories. I think this really depends on the navigation algorithm itself. For most autonomous cars, you do need a map, but you're also relying on things like GPS, et cetera, to update your belief of where you are in the world. So I think it could be possible; it really depends on whether the algorithm allows the agent to go into unexplored territory that hasn't been mapped.

Okay, so I've seen that there are two extremes that you can go to. One is making it more like the gaming simulation, where you have an absolutely amazing, kind of photorealistic digital twin to work with. The other extreme is you just go with the floor plan, without any simulation, like the gentleman mentioned, where you just take a cut of it to figure out the map. Which way would you foresee this project going? I'm definitely leaning towards the simulation approach, because it doesn't restrict you to just relying on 2D occupancy maps. Sure, the slice approach works with 2D maps, but the simulation approach even lets you try other, more sophisticated navigation stacks, right? So what if you're doing visual navigation, just pure vision-based navigation that's taking in two stereo images and figuring out where you are? A lot of the indoor food delivery robots, for example, do this: they look upwards, they look at the ceiling to figure out where the robot is right now. That's called star-gazing navigation. So if you want to do stuff like that, you want to try that out in simulation, and you want to have that same kind of representation as in the real world. So I think there are more benefits to keeping it general, but if you want to save computation, I think that's also another approach. If you want to try 3D SLAM, this would also work. And if you want to test some perception algorithms, maybe you're doing some perception-based deep learning inference, or you're doing something else, you can use the image from the simulation and then run your inference. So I think there are some benefits to this. Yeah, so we don't have more time for questions.