So, welcome everyone. My name is Marco and I'm going to present a new library that we started recently. I'm a PhD student in robotics at the University of San Martín, Spain, and I'm currently here at ASTAR on a research scholarship to finish my PhD. I'm doing a little bit of language research here, but my main topic is computer vision; I do the computer vision for robotics at the robot lab. First, I wanted to ask: have any of you worked with computer vision related stuff? Can I get a hands up? Anyone — OpenCV, the Point Cloud Library, anything like that? Okay, cool, because there's a lot of background in what I'll talk about. So, I'll tell you the story of how it got started. It's pretty new, as I said, at a very early stage, but I wanted to talk about it to get an idea of what you might think of it and how it fits people's needs.

I come from this robotics lab in Spain. One very cool thing, and not that cool at the same time, is that we do the whole thing on the robots; we do everything ourselves. For example, we'd even be cutting the metal to make the robot's face. That's very cool when you talk about it, but when you have to do it and you're the computer vision guy, you're like, why the hell am I cutting metal? So it's a pain, because we have to deal with all the problems in robotics, and that's actually a big problem: in the end, for someone doing a PhD like me, you end up with not that many publications, so I don't fit that well in the system. But anyway, it's cool, you get to touch a little bit of everything. Luckily, I'm not that into the hardware part, because we have people who deal with that, but you still get to learn a lot.

My role in the lab is the computer vision part. We have our own framework; as I said, we do everything ourselves, so we don't use the frameworks most people use, like ROS — we wrote our own from scratch. So we have to build our own tools for the robots, and my main task is to provide the team with the tools to deal with the vision part of the robots.

This is what RoboComp, the framework, looks like. For those who haven't programmed robots, it's mainly component-oriented programming: components that talk to each other through interfaces, where each component does a certain task. Say I have a component that grabs images from the camera, and another component that processes those images and detects tables with an algorithm like RANSAC. So we have components, the interfaces to talk to the components, and then some other files — documentation, CMake files — and here we have the classes and libraries. We build our own classes and libraries to make it easier to develop components, right? By that time there were a lot of libraries, and one was missing: a computer vision library specific to the framework. So one summer I thought, why don't we make a computer vision library specific for robots?
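As a minimal sketch of that component pattern — with hypothetical names (ImageSource, CameraComponent, TableDetectorComponent), not RoboComp's actual interfaces — the idea is roughly:

    // Minimal sketch of component-oriented robotics programming as described
    // above. All names here are hypothetical illustrations, not RoboComp's
    // real interfaces.
    #include <cstdint>
    #include <vector>

    struct Image {
        int width = 0, height = 0;
        std::vector<std::uint8_t> pixels;  // e.g. interleaved RGB data
    };

    // The interface the two components agree on.
    class ImageSource {
    public:
        virtual ~ImageSource() = default;
        virtual Image getImage() = 0;
    };

    // Component 1: grabs images from the camera.
    class CameraComponent : public ImageSource {
    public:
        Image getImage() override {
            // A real component would grab a frame from the device here.
            return Image{640, 480, std::vector<std::uint8_t>(640 * 480 * 3)};
        }
    };

    // Component 2: consumes the interface and runs an algorithm such as RANSAC.
    class TableDetectorComponent {
    public:
        explicit TableDetectorComponent(ImageSource& src) : src_(src) {}
        void step() {
            Image img = src_.getImage();
            // ... fit a table plane with RANSAC and publish the result ...
            (void)img;
        }
    private:
        ImageSource& src_;
    };

    int main() {
        CameraComponent camera;
        TableDetectorComponent tableDetector(camera);
        tableDetector.step();  // one tick of the processing component
    }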
I mean, there's a lot out there, but I found that I was coding the same stuff every time to deal with tables and objects, it would take a fair amount of time, and I was mostly using the same thresholds and hard-coded numbers. So I wanted a layer on top, so I don't have to deal with all that. In the end, I'm mostly just detecting cups or bottles or something — the same stuff — just so I can make my robot move, right? So, with this idea, and with a little, well, actually a really good help from the Google Summer of Code program and a really good student that I got for this project, we managed to start a library for this framework.

So, why am I doing this? Where does this detection library fit? There are computer vision libraries already — OpenCV and PCL are the main ones, and they're really good, actually. But I wanted something specific to the tasks we do on the robots. We wanted it to be very easy to use: just a few lines and you can start detecting stuff, detect something and move on. We found it was good for beginners, because it's very easy to try things out with just a few lines. And it's good for computer vision researchers, because it lets you apply different approaches to one goal and try them out very, very fast.

Here's a bit of where it stands. It's a library that sits, as we say in research, on the shoulders of giants: we have OpenCV and PCL underneath, the main libraries that we use. The point is that you don't have to care about having two different libraries; you go through one common API with some presets and start detecting stuff very fast. That was the whole point of it.

If anyone wants to jump in and get started with it, it's fairly easy. Installing the dependencies is a pain, I won't lie, because it's research software, so we try to use the latest versions of everything: you'll probably have to install the latest OpenCV and PCL 1.8, the trunk version. You don't strictly need the CUDA extensions, but they're highly recommended, and you might need the contrib package from OpenCV. Getting all that compiled takes a lot of time; after that you just clone it, make it, and install — fairly straightforward, as with any library.

I'll talk a bit about the basic structures of the library. Actually, this talk was going to be just 30 minutes, but the speaker before dropped out, so I got an extra 30 minutes, so I'll try to write some code and show you how this works later. I hope it's not too boring.

So, first the basic structures. We divided detection into two main parts: a training part and a detection part. We have trainers, so you can train your model with data, and then you use that in the detection part. There are different trainers and different detectors, for 2D or 3D data and so on. So, first of all, you train something: a trainer implements the train function, and all trainers inherit from a common trainer base class.
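As a rough sketch of that trainer/detector split — assuming hypothetical class names, not the library's exact API — the shape is something like this:

    // Sketch of the trainer/detector split: a virtual base class for trainers,
    // which concrete trainers inherit and implement, plus a matching detector
    // base. Names are illustrative assumptions, not the library's exact API.
    #include <string>
    #include <vector>

    struct Scene { /* an image or a point cloud to detect in */ };

    struct Detection {
        std::string label;        // what was detected
        double confidence = 0.0;  // only some algorithms provide this
    };

    class Trainer {
    public:
        virtual ~Trainer() = default;
        virtual void train() = 0;  // every concrete trainer implements train()
        void setTrainingInputLocation(const std::string& p) { input_ = p; }
    protected:
        std::string input_;
    };

    class Detector {
    public:
        virtual ~Detector() = default;
        // Runs over a scene and returns whatever was found.
        virtual std::vector<Detection> detect(const Scene& scene) = 0;
        void setTrainingDataLocation(const std::string& p) { trained_ = p; }
    protected:
        std::string trained_;
    };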
The base is a virtual class, and you instantiate a trainer, give it the data you want to use for training, and go ahead and train. Then you use a detector. There's a bunch of detectors — not many yet, because this is really new stuff — but once you've trained with the matching trainer, you can use the corresponding detector over a scene, which is the data structure representing the scene you want to detect in. There are two kinds of detection: the regular one that runs over a single object, and one that runs over the whole scene, so it's kind of pre-segmented versus not segmented, right?

And then what you get out of a detector is a detection — that's what you get from the act of detecting something. It's a structure, again, that contains the image, the pose, the label, everything, even the confidence. You don't get a confidence for all the detection algorithms, but for some of them you get a confidence level for the detection that just happened.

Here are a few of the detectors we have already implemented. We have 2D detectors: HOG, cascade, a face recognizer, and 2D detection with local features, like SIFT and SURF, those kinds of features. And then we have 3D detectors based on global features; I'll show you the demo of this later on.

Another basic structure we have is the frame generator. You can feed data directly to the detectors and trainers, but you might want to generate the data from an input source instead. An input source can be images from a camera, point clouds from an RGBD sensor like the Kinect, or image files or point cloud data from files. I'll show you how this is implemented, but it's basically there to grab the image or point cloud from either a device or data on the hard drive.

So, how does it look in code? This is an example of a frame generator that will generate frames — point clouds — from the Kinect or any other RGBD device. Basically, we declare a frame, then we declare the generator, and then we get the frame from the generator. Pretty straightforward. These are the types of generators we can use. It's template-based, so when you create a frame generator you set the kind of scene you want to grab, either image or point cloud, and the input you want to use, either files or devices. With those two things, you can start grabbing stuff to process.

Then you have the trainer. How do you train? You declare a trainer and set it up — not all trainers need everything, but here you set up positive samples, negative samples, the number of features, and some flags, and then you go ahead and train. That generates the trained data that gets used in detection. Once you have the trained data, you declare a detector of the kind of detection you want to use, give it the trained data location, and go ahead and detect. To obtain the detection, you run the detector, get the detection back, and then you can access it.
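Put together, the flow described here looks roughly like the fragment below, assuming trainer, detector, and scene are concrete instances in the spirit of the previous sketch; the method names follow the talk but are assumptions, not a verified API:

    // Rough end-to-end flow: configure a trainer, train, then point the
    // matching detector at the trained data and detect over a scene.
    trainer.setPositiveSamples("data/pos/");  // not every trainer needs all of this
    trainer.setNegativeSamples("data/neg/");
    trainer.setNumFeatures(2000);
    trainer.train();                          // writes trained data to disk

    detector.setTrainingDataLocation("data/trained/");
    std::vector<Detection> detections = detector.detect(scene);

    for (const Detection& d : detections) {
        // A detection bundles the image region, pose, label and, for some
        // algorithms, a confidence value.
        std::cout << d.label << " (confidence " << d.confidence << ")\n";
    }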
Accessing the detection means getting the information out: what we're doing here is getting the OpenCV image from that detection, and then you can show it.

Sorry, is the example code also findable on GitHub? — These samples are part of the documentation on opendetection.com. There's also a bunch of examples; there are actually examples of everything inside the library, so you can just go through the code. I'll go through a few examples later, and they're actual examples from the library, so you can find them there.

Actually, now I'm going to show you how to build a frame generator. It's going to be fairly easy, and I hope it's not too boring. Let me switch over here and build the example. Can you see it from the back? Good. Let's start.

One thing we don't have yet in the library is visualizers. For visualizing stuff, we need to use either the PCL visualizers for point clouds or OpenCV for images. This year we got into Google Summer of Code, luckily, and hopefully one of the projects will be building our own visualization tool, so we won't have to do this anymore. So, this is going to be our visualizer. Then I'm going to create a frame, which is going to be of point cloud type. If I save this, I'll get some syntax errors; I'm making it an example of the library. We have the frame here. Now I'm going to create the frame generator, which is going to grab this type of scene — a point cloud scene — and it's going to get it from the device; I'm going to use a Kinect device for this. Remember, as I said, when you create a frame generator you give it the type of data and the source, and the source now is going to be a device, so this is just the generator type for devices.

Now I'm going to check that the frame generator is valid, and while it's working, I'm going to grab a frame and show it with the PCL visualizer we created. I remove the last point cloud, to make sure it's clean, and I add a new point cloud of type PointXYZRGBA, which is the frame we got. And I have to get the point cloud out of the frame, because these frames carry more information than just the point cloud, so we have to pull the cloud out of it to show it. Hopefully, when we get our own visualizers, we won't have to do that — the visualizer will take care of everything, so it's easier for everybody. Then we spin the visualizer, so we can move around and it updates, and then I delete the frame. That's basically it. Maybe the font is too big, but I hope you can see it. That's the main thing for getting a frame.

I'm going to add it now to the examples. There's actually a frame generator example already — yeah, there's one here — so I'm just going to copy it and add mine... oh, we missed it. At least it worked. Let's see if it compiles... no, it didn't. Yeah, I forgot the include; it's the frame generator header from the library. Let's see how it goes now. For those of you who don't know what this is, this is just a Kinect device, the same one as on the Xbox.
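Reconstructed from what was typed on screen, that frame-grabbing example looks roughly like this; the od::* names follow the talk but are assumptions and may not match the released OpenDetection API exactly, while the PCL calls are standard:

    // Grab point clouds from a Kinect through a frame generator and display
    // them with a PCL visualizer (the library has no visualizers of its own
    // yet). The od::* names are reconstructed from the talk.
    #include <pcl/point_types.h>
    #include <pcl/visualization/pcl_visualizer.h>
    // OpenDetection headers omitted; include paths depend on the install.

    int main() {
        pcl::visualization::PCLVisualizer viewer("frame generator");

        // Template parameters pick the scene type (a PointXYZRGBA cloud) and
        // the input source (a live device, here the Kinect).
        od::ODFrameGenerator<od::ODScenePointCloud<pcl::PointXYZRGBA>,
                             od::GENERATOR_TYPE_DEVICE> frameGenerator;

        while (frameGenerator.isValid() && !viewer.wasStopped()) {
            auto* frame = frameGenerator.getNextFrame();  // grab one frame
            viewer.removePointCloud("frame");             // drop the previous cloud
            // The frame carries more than the raw cloud, so pull the cloud out.
            viewer.addPointCloud<pcl::PointXYZRGBA>(frame->getPointCloud(), "frame");
            viewer.spinOnce();                            // redraw, allow mouse moves
            delete frame;                                 // done with this frame
        }
    }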
The Kinect gives you the 3D and the color. It's got a camera on it, it's got a projector, and it's got the receiver for that projector. The projector projects a pattern of points, the receiver picks that pattern back up, and from the deformation of that pattern in space it calculates the distance to those points. That's how it gets the 3D, which it then matches with the color. So, hopefully, with the tool we just created, we should be able to see something — fingers crossed — and yeah, that's it. I don't know if you can see it properly; it's upside down, because it's always upside down. This is you guys. This is the point cloud we're getting, so this is how we grab the information here; that's how the grabbers work. It's fairly easy to change it to the camera; we would only have to... I don't know how much time we have, so I'll try to make it quick. Actually, I'll just go to the next example, because the next example uses the camera anyway.

So, what I'm going to do next is a classifier to detect faces. That's actually a fairly easy, very common application, but it's very cool to see how easy it is with this new library, even though you might not have too many options to tune it yet.

I'm going to go ahead and add the includes here and save it — as easy as that. I need the training data for the cascade detector, so I'm just going to grab it from the input arguments; the first argument is the training data location. I'm going to build the detector first; we have the training already. One very cool thing we do is provide the training data for these examples, so you can just go ahead and start using them. This is a 2D detector, so it's under the namespace g2d, where g means global, because it's using global features. It's going to be a cascade detector, and I'm going to just name it detector. I set the location of the training data on the detector and go ahead and start the detection. That's it: up to this point, we're already detecting faces; you just give the detector the data.

The same way we did with the frame generator, I'm going to generate frames from the camera and pass them to the detector, and it will detect faces as it gets the frames. So, let's get the scenes from the generator. This time it's not going to be a point cloud but an image scene, which is the kind of image we get from the camera. The input is again going to be generator type device, because we're going to use the camera, and we're going to use the first camera device — the first device it finds — because we don't have any other.

And as we said, we don't have the visualization part yet, so we're going to create one using OpenCV. I'm going to go ahead the same way as with the frame generator: while it's valid... and I'm going to add this waitKey call that you're always doing in OpenCV. I actually don't know why, but if I don't add it, it doesn't work properly.
So, I just add it; I'd be happy if someone has an explanation for that. Okay. So, while the generator is valid, we get a scene — the scene image, call it scene — so we go frame generator, get next frame. Now we pass that frame to the detector. We're detecting over the whole image, so we use the whole-scene detection function and pass the scene there. And, as I said before, what we get out of the detector is a detection — Detections2D — so we get the detections. Now, if the size of the detections is more than zero, we show on the image what we detected, in a window named Detections, getting the information from the detection. If not, there's no detection to show, so we just show the regular image coming out of the frame generator — the scene — and we get the CV image from it for this visualizer. Again, we delete the scene, because we don't want it anymore. And that's it.

That line should contain the image function, shouldn't it? — Here? Yeah, that's misspelled, right? Thank you. I think that should be it.

We're going to go ahead and add it to the examples again; I'll copy this one, and the name is cascade. It depends on the od common library and the od global image detector, so that should be enough. So, this is all we need: frame generation, the detector, and showing the result — we show it because we have the training data already. Let's see if that compiles, and hopefully it should work. Wait... okay. So, this is the code, and we're going to check if it works. My example should be under object detector, the cascade one. As we said before, I pass in the training data that I downloaded from the repository on the website, and it takes care of grabbing it and passing it to the detector, so it should work. Wait... so it didn't fail. Oh, right, there's something wrong with the camera grabbing, probably a typo somewhere. There: this is the detector working, just detecting my face with those few lines.

The basic point is that in robotics this happens a lot: we need these kinds of applications very often, so otherwise we have to code everything — well, not everything from scratch, because the existing libraries are very nice — but it still takes a lot of time to develop. I'll actually show you. This is the cascade example we just made, and I purposely downloaded somewhere, if I can find it — I think I did — the equivalent version from OpenCV; it's right on their website. I just want to compare how it looks. We have internet, right? Is it on? Yes. The details are on the paper — oh, right, the speaker details are at the top. Okay, I'll just go ahead and open it. The funny thing is that this one was actually made by another student I had in Google Summer of Code. If you have a look at the code, the amount of code you need to get a classifier working there is much larger: you need many more lines than we do here. Ours is about 20 lines, and it's maybe 200 there, so it's a bit more messy and it takes a bit more time to do.
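Reconstructed from the live session, the whole face-detection example comes out to roughly the following; the od::* identifiers (the g2d namespace, the cascade detector, the whole-scene detect call) are assumptions from what was shown, while the OpenCV calls are standard:

    // Cascade face detection over camera frames, shown with plain OpenCV.
    // The od::* names are reconstructed from the talk and may differ from
    // the released OpenDetection API.
    #include <opencv2/highgui/highgui.hpp>
    // OpenDetection headers omitted; include paths depend on the install.

    int main(int argc, char* argv[]) {
        if (argc < 2) return 1;            // first argument: training data location

        od::g2d::ODCascadeDetector detector;   // hypothetical name
        detector.setTrainingDataLocation(argv[1]);

        // 2D image scenes, grabbed from the first camera device found.
        od::ODFrameGenerator<od::ODSceneImage,
                             od::GENERATOR_TYPE_DEVICE> frameGenerator(0);

        while (frameGenerator.isValid() && cv::waitKey(10) != 27) {  // Esc quits
            auto* scene = frameGenerator.getNextFrame();
            auto* detections = detector.detectOmni(scene);  // whole-image detection

            if (detections->size() > 0)    // draw what was detected on the image
                cv::imshow("Detections",
                           detections->renderMetainfo(*scene).getCVImage());
            else                           // nothing found: show the raw frame
                cv::imshow("Detections", scene->getCVImage());

            delete detections;
            delete scene;
        }
    }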
So, yeah, now I'm going to go ahead and start with the final demo. I don't know if I have time to write it all — yeah, I don't think I do — so I'll just go through the code a bit and show you how it works. Basically, what we're going to do here is a 3D detection. I'm sure you can't see this properly at the back, right? So, basically, again, we get a trainer — a different trainer this time — we give it an input data directory and we start training. Then we have a detector: we create it here, give it the trained data location, and initialize the detector. We grab the frame, fairly easy, this time from the Kinect, as we did at the beginning: same thing, frame generator, if it's valid we grab the frame. We clean up the visualizer and we have the point cloud. Then we get the detections happening on the frame; for this, we pass the frame to the detector. This is a global feature detection, so it will try to detect every object in the scene based on some CAD models that we trained on. Then we go through every detection — there can be multiple detections at the same time — so we go through each of them. This is done with PCL, as I said, since we don't have our own visualization here. We give each detection a color and a name, we grab its position, and the rest is some visualization stuff that's not very important — the name of the cluster and so on — just adding the detection, and it shows up fairly easily.

Basically, the idea is that you can detect stuff the same way we did before: same structure, you don't have to learn much of anything new. My colleagues — not me, because I'm mainly focused on computer vision — can just use the same stuff and go ahead and run it on the robot, so it's very easy for them to detect a bunch of things.

So, I'm just going to go ahead and run this version that I have already compiled. I'm going to try to put this somewhere here; that thing on the right should be a robot. Let me check how it looks; for that, I can actually use the visualizer that we created before, under examples — the frame generator example. What this does is grab that image here, and it's going to look for the dominant plane; that's what this detector does, which is probably going to be this one. I'm going to put some stuff there. It's not going to be perfect, because I'm using CAD models as training data, but I'll show you how it looks very quickly. So, it's loading the training data — the training is fine, because I tried it before — and it should show up with some cool CAD things. The point cloud should be somewhere; sometimes it gets lost... yeah, there it is. So, same as before, we can just turn it around. It's getting a bit confused with the labels, as you can see: it's calling all of them "energy drink", because those are the models I have; I don't have the model of this particular thing. But we can keep adding stuff, and then, probably — I don't know if it will resolve it.
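For reference, the demo code walked through above looks roughly like this, again with the od::* identifiers being assumptions reconstructed from the talk and the PCL calls being standard:

    // 3D global-feature detection over Kinect frames, drawing each detected
    // cluster in a PCL visualizer.
    #include <pcl/point_types.h>
    #include <pcl/visualization/pcl_visualizer.h>
    #include <sstream>
    // OpenDetection headers omitted; include paths depend on the install.

    int main() {
        pcl::visualization::PCLVisualizer viewer("3D detections");

        od::ODFrameGenerator<od::ODScenePointCloud<pcl::PointXYZRGBA>,
                             od::GENERATOR_TYPE_DEVICE> frameGenerator;

        // Detector trained beforehand on CAD models with the matching trainer.
        od::g3d::ODCADDetector3DGlobal detector;   // hypothetical name
        detector.setTrainingDataLocation("data/trained_cad/");
        detector.init();

        while (frameGenerator.isValid() && !viewer.wasStopped()) {
            auto* frame = frameGenerator.getNextFrame();
            viewer.removeAllPointClouds();         // clean up the previous frame
            viewer.addPointCloud<pcl::PointXYZRGBA>(frame->getPointCloud(), "scene");

            auto* detections = detector.detectOmni(frame);  // every object at once
            for (std::size_t i = 0; i < detections->size(); ++i) {
                std::ostringstream name;
                name << "detection_" << i;         // unique id for the visualizer
                // Show the detected cluster; the label and pose also live on the
                // detection and would be drawn as text in the real demo.
                viewer.addPointCloud<pcl::PointXYZRGBA>(
                    detections->at(i)->getMetainfoCluster(), name.str());
            }
            viewer.spinOnce();
            delete frame;
        }
    }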
Is this something different? — No, it depends on whether it matches a model or not: if we have a cup model, it will detect a cup, and so on. This is actually a lot of code if you have to do it by yourself, but getting the position and the label of an object is a very common task in robotics, and it's very easy to do with this library. That's the main goal of the whole thing.

Let's go back for the last part. Here's a bit of the roadmap. This is what we have; there's not much more — a few other detectors that you can try out and play with — but, as we said, it's very new, so we don't have that much yet. We have a few ideas we want to implement. We want to have CNN-based algorithms; that's pretty new now, very much the state of the art, they work pretty well, and that's something we want to implement. We want to improve the framework and do some other stuff; we might want to add some semantics, some language, to the library. We'll see.

For those of you who are interested in this: both projects, OpenDetection and RoboComp, are in Google Summer of Code, and you can go ahead and apply for the ideas. It's very cool: you get paid and you get the opportunity to work on real code and do amazing stuff. And that's pretty much it. I just want to get your feedback, possibly during the pub crawl or something; if you see me around, just talk to me, I'll be happy to chat with you. Tell me if this is a crazy idea, whether it works or not, what your thoughts are, whether you need it — everything. So, thank you very much, and I hope I wasn't too boring.