Welcome, everybody. This is the first session we have this morning. We have a very busy morning at the UPF, because we have different time slots for the different TFGs — how can we call it in English? The final project for the degree. We have three winners and three short presentations, so we will learn a lot about the three students we have here. And then, at 11, we have the individual presentations of the different master's programs, so you will split, I guess, into the different sessions, and you will meet the coordinators of the different programs. But first of all, let me introduce myself. I'm Michael Liber, the director of the department. I will meet some of you in one of the programs I am teaching, which is Wireless Communications — a new master's that we are very pleased to start together with the UPC. But now, until 10, Anders, who is responsible for the organization of this event, will explain a little bit how he has organized the session and will give the floor to the different people we have at the table. Okay, so Anders, please.

So I will be very, very brief. This is the second edition of this award. The idea of the award is to promote excellence among bachelor's or undergraduate students in Spain, especially those, of course, in an ICT-related area, which is the focus of our department. Last year, actually, there was no award ceremony, but this year we thought that we should give some prominence to the winners, so we invited them to come here and give a short presentation of their work. There has also been a new modality this year, which Xavier will talk a little bit about: the reproducibility award, which we funded through the María de Maeztu Excellence Program that the department was awarded this year. So maybe you want to say something about this?
Yeah, so our department this year received a recognition given by the Spanish government, which is the María de Maeztu Award. It basically recognizes excellence in university departments and also promotes the idea of strategic programs for these departments. We are really proud to be the only engineering department in Spain that has such an award. And as part of the strategic program that we have put together, one of the core ideas is to promote reproducibility.

Reproducibility is becoming a fundamental aspect of good research, and of research that aims to have as much impact as possible. The idea is that when you publish something, it is very important — if you want this publication to be relevant, to have some impact, and to be used by other people — that what you have done can be reproduced. In the area of ICT, what this means is that, apart from the publication, with a well-described process and an explanation of what you have done so that someone else can also do it, you have to supply the software that you used for your work, in a way that is well explained, so that people can actually compile and use that code. Also, in a lot of the work we do, we use data, so ideally we should also provide the data — ideally open, so that people can use it. And amazingly enough, that is still not that common. In a lot of the work — a lot of the undergraduate theses that many of you have done, and even master's theses, PhD theses, and research done by faculty — we still do not follow this type of practice, which by now, I think, is quite clearly what we should do. So anyway, we decided to promote this also at the level of the undergraduate thesis, and so we organized this prize, within these awards, specifically for the idea of reproducibility at the undergraduate thesis level.
But again, I think that is a very important concept that I would like you to take with you and to use in your work — in this case in the master's, or in your future research. Okay.

Okay, so the winners of this year's awards. Actually, the bachelor's thesis award was shared between two people: Natalia Delgado, who will present her work, titled "Developing a Software for Automatic Synchronization between Tonalities and Colors in Audiovisual Music Therapy", and Arancha Casanova, with "Fusion of 3D Data for Interactive Applications". And the winner of the María de Maeztu reproducibility award is Adrià Garriga. His work is titled "Solving Montezuma's Revenge with Planning and Reinforcement Learning". These prizes have a cash part, but part of the prize is also to pay the tuition of the master's, should they decide to enroll in a master's at UPF. And I believe both Natalia and Arancha will attend the master's programs this year. I think you have decided that Arancha will go first. Okay, so we will invite you all up here to give you the prizes before you give your presentations. So if all of you can step up here. Arancha is the first one, right? — the winner of the thesis award. Okay, so here you have it. Natalia Delgado is the co-winner. And last but not least, Adrià, with the María de Maeztu award. Okay, so Arancha is going to give the talk now, right?

Before I dive into the project, I need to explain what Kinect devices can obtain. Apart from normal color data, we can also obtain a depth image, where for every pixel we get the distance from that pixel to the camera. Combining both pieces of information, we can obtain point clouds in 3D. This is, for example, a frontal view of a point cloud of a room with several objects, and this is a rotated view of the same point cloud. Later we are going to see it better in the video.
If we work with just one Kinect, we have a limited working area. The objective of this work is to increase this working area by using two Kinects. We need, of course, an overlapping area to detect similar points. To have both Kinects working together, we need to register both sensors, and the way to do this is with registration techniques. In this work, I explore the classical registration using a chessboard pattern, and a new method using a finger detector provided by Exibol Studio. So, to sum up, the idea is to combine both point clouds into a single one. This one is captured by one sensor; this is the other one, captured by a second sensor. If we fuse them together, we have an increased span of the room. To validate this registration, we have an application using the hand tracking provided by Exibol Studio. The idea is to increase the interaction range using both registered sensors, compared to using just a single sensor.

Here I will briefly explain the workflow of the registration, which consists in taking RGB plus depth data, converting it to point clouds, and then applying three different registration methods. The first one is the classical method. The second one, which I am not going to explain today, is just an intermediate phase. And the last one, our proposed method, is the automatic finger detection. Finally, all these methods go through a refinement block to obtain better registered point clouds as an output.

The first registration method consists in having key points — similar points — in the overlapping viewing area of both sensors. In this case, we consider a chessboard in a still position, captured by the two different sensors. Here we have the different points of view, and the key points we identify are the corners of the chessboard.
If we have these corners, we can extract a transformation matrix that relates the camera coordinates, where we have our point cloud, to world coordinates, taking as origin a point of the new coordinate system. If we do this with both sensors, we now have the two different point clouds fused together in the same reference system. This was done following the work by Zhang; here I put the reference. This is the result I showed you before in the video: here we have one point cloud from Kinect one, the second point cloud from Kinect two, and finally the fusion.

Now, our method follows a different approach. We don't detect chessboard corners; instead, we detect the finger as a key point. Imagine that I have two different Kinects pointing at me. In the overlapping area, I draw a path — for example, a square, as we see here with two different squares, or just a spiral. For every timestamp at which we detect a finger point, we consider it a 3D point, and we construct a whole point cloud using these detections. Here we have an example: this is a point cloud where I drew two squares at different depths. The orange one is captured by Kinect number one and the red one by Kinect number two. If we calculate the transformation between these two point clouds, we obtain the registration matrix that we will later use. To see that it works, look at the green cloud, which is the result of applying the estimated transformation to the orange cloud. So now we have the correspondences matched.

Finally, we have a refinement block. We have used Iterative Closest Point (ICP) as the method — I'm not going to explain it, I don't have time; here is the reference. As an initial alignment, for example, maybe the floor is not well aligned and the chessboard is not well aligned either; when we refine the cloud, we see that this is now fixed. To validate the registration, we improved the hand-tracking application. Here is the workflow.
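The core of both registration methods — estimating the rigid transform that maps one Kinect's matched key points (chessboard corners, or fingertip detections over time) onto the other's — can be sketched with the standard SVD-based least-squares solution. This is a minimal illustration under assumed names and a NumPy setup, not the thesis code:

```python
import numpy as np

def rigid_transform(src, dst):
    """Estimate rotation R and translation t that best map src onto dst.

    src, dst: (N, 3) arrays of matched 3-D key points, e.g. the same
    chessboard corners (or fingertip detections) seen by both Kinects.
    Uses the SVD-based least-squares (Kabsch) solution.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c = src - src_mean                # center both point sets
    dst_c = dst - dst_mean
    H = src_c.T @ dst_c                   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

The ICP refinement step then repeats this same estimation while re-matching each point to its current nearest neighbor in the other cloud — which is why a decent initial alignment from the key points matters.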
Having two sensors capturing in real time, we put a detector on each one of them, and the output of one of them needs to be registered to the other sensor, so we apply the registration matrix we obtained before. Now we can have a fused interaction space. All this is fed to the tracker. As a first example, I draw a rectangle through the whole interaction space. We see in green the detected points from Kinect one, and in red those from Kinect two. We can see that the drawn rectangle transitions smoothly between both Kinects. A second example, with another shape — here the result briefly appears. If we used just one Kinect, we could have either red or green points; in this case, we have both. A final example: I draw my name, just using the hand. There are zones where both green and red points are present, because it is the overlapping area, and other zones where only red or only green points appear.

So, now to the conclusion. Comparing both registration methods, we see that when it comes to error and computational time we have similar results, but using a finger detector instead of a chessboard gives us a main advantage: it is user-friendly. We only need a user; we don't need extra material. And it is very easy, because we only need maybe five seconds or less to draw a square, and we have both sensors registered. And as for the application, we see that it has been improved, because the interaction range has almost doubled. So, thank you. That's all.

Hello. I am Natalia, and my project consisted in developing a software that automatically synchronizes tonalities and colors for audiovisual music therapy. This project started during my Erasmus experience in Holland, where I met a PhD student who was developing a new method for music therapy, in which she proposed visualizing changes in the harmony of music through colors. When I met her, she was trying this in an orchestra concert.
What she would do was analyze the musical piece, decide when the changes would take place, and then change the colors manually with an application. Now, this had some problems. It was mainly manual work, which turned out to be inefficient and would probably turn into a product that would not be very accessible, since it would be quite expensive. So a solution was proposed: providing efficiency through automation. This would result in a software, which would be the final result of this project. So, this project is centered on the design and development of this solution.

Even though usually we start explaining the beginning and then explain the end, I thought it would be interesting to see the final result first, so we understand better what this project is aiming at. So, I'm going to show a demo. Okay. In this short extract, you can see that at some points the changes in color were very small — like from blue to purple — and this shows there is a quite stable moment in the music. And then there are moments when, for example, we have red and then blue. This drastic change shows a musical tension. What is meant by musical tension? It is a sensation generated in a person when you feel the urge of the music to fall into a more liberated state.

So, well, I studied engineering and I love music, and this is basically why I thought it would be interesting to develop this project. To achieve the main objective of this project, which was providing efficiency through automation, I divided it into four main objectives. The first one is making an automatic extraction of the harmony — this is obtaining the chords from the musical piece. After this, these chords are expressed as intervals, and then these intervals are mapped into colors. After this, both the colors and the music are reproduced, and the idea is that the harmony changes synchronously with the colors.
And finally, since an external software was employed, an evaluation of that software was also made.

So, the first block has the objective of obtaining a list of chords from the MIDI file. How do we do this? Well, first of all, MIDI files look this way: they are written in binary, and they have to be interpreted into a list of commands with a timestamp, a note-on or note-off command, and their frequency values. These values were then represented as frequencies on a time axis, and they were divided into time frames, which are bars. Why do we do this? Well, to define a chord, it is important to define when the chord starts and when it finishes, so we have to use the basic time unit in music, which is the beat. The problem is that beat tracking is not an easy task, and these results were not accurate, which would mean not having accurate results for the chords either. So an external software was employed: Find MIDI Chords. After using this software, we had a list of chords, and these chords were interpreted as intervals, which are the jumps from one chord to the next in terms of half steps.

These intervals were then mapped into colors. How was this mapping done? The idea is that the maximum tension is represented with complementary colors, and the rest — the middle tensions — are represented by the colors in between. Once we have the list of colors, they would be executed as changes in the colors of light, and the music would be played synchronously, as we saw before. The idea is that the changes occur synchronously. But in reality this wouldn't happen, since we had, again, a problem with beat tracking: to go back from a list of colors to the moments in time when the changes happen, we need beat tracking again. So, to offer a quick solution for the PhD student, another software was created where the values in time could be introduced manually, and demos could be done.
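The interval-to-color mapping she describes — maximum tension shown with complementary colors, middle tensions with the hues in between — can be illustrated by rotating around the hue wheel, where the complementary color sits 180 degrees away. The tension scores and base hue below are made-up placeholders for illustration, not the actual scale from the music-therapy model:

```python
import colorsys

# Hypothetical tension score per interval (in half steps, 0-11);
# the real scale comes from the music-therapy model, not from here.
TENSION = {0: 0.0, 7: 0.2, 5: 0.3, 4: 0.4, 9: 0.4, 3: 0.5, 8: 0.5,
           2: 0.7, 10: 0.7, 1: 0.9, 11: 0.9,
           6: 1.0}  # tritone assumed to carry maximum tension

def interval_to_color(interval, base_hue=0.6):
    """Map an interval (half steps between consecutive chords) to RGB.

    Tension 0 keeps the base hue; tension 1 rotates the hue by 180
    degrees (0.5 in hue space), i.e. the complementary color. Middle
    tensions land on the hues in between, matching the idea above.
    """
    tension = TENSION[interval % 12]          # intervals wrap at the octave
    hue = (base_hue + 0.5 * tension) % 1.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

A stable harmonic moment (small intervals, low tension) then yields nearby hues like blue to purple, while a tense jump snaps to the complementary color, as in her demo.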
Also, a performance evaluation was done of the external software employed, Find MIDI Chords. To do this, I based "chordarization" on speaker diarization. Speaker diarization is based on dividing an audio recording into homogeneous regions and deciding which region corresponds to which speaker. In the case of chordarization, we divide the piece into regions, which are bars, and we say that each chord corresponds to a different region. This is the task that Find MIDI Chords is doing. To evaluate, we compare it to a ground truth that was made manually, and we evaluate with the diarization error rate, so the objective is to minimize this value. And these were the results: the mean diarization error rate was 29.11%, which means around 70% correct answers. We thought this was good enough.

And finally, well, conclusions and objectives. Most of the objectives were achieved, even though some of them were not obtained the way we initially thought they would be. I like to think of this project as the design of a product and the development of just its foundation, so I think it would be nice to continue it and change some things as future work: beat tracking, or using real-time synchronization; unifying the software, since I used different pieces of software and put them together, and maybe making a better interface; and extending the range of musical styles, since we designed this just for classical music. So, that's all. Thank you.

Hello. My name is Adrià Garriga, and I'm going to present my thesis, which is "Solving Montezuma's Revenge with Planning and Reinforcement Learning". It was done here at UPF, and my supervisor is Anders Jonsson. So, why did I focus on this game? A while ago, in 2015, DeepMind, the AI company, released an algorithm called Deep Q-Learning, which basically attached neural networks to Q-learning and was very successful in playing Atari games, such as this one.
It was very successful with a lot of them, but with some of them it was really, really bad — like zero-score bad. The problem is that, at the beginning, this algorithm doesn't know anything about the game or about the domain; it only knows that there is some vector of values that comes in, that some actions give rewards and some do not, and that it can move the controls. So what it does is take random actions until there is some reward, and then it says: oh, I should follow up from there. And it does that. The problem with this game and others is that the rewards don't come, except with a really, really small probability. For example, in this game here, we have this character that has to get the key, and he has to go down the outlined path. That takes more than six seconds, which is more than 60 actions, which, with a branching factor of eight, gives this really low probability of reaching a reward. So you're not going to get anywhere by random exploration.

So what I did was focus on this game, and, just by working on it, I made the algorithm play well. This is the game: the character has to explore all this pyramid, and the problem, as before, is that you have to explore a lot. The previous best scores are this 3,500, with a novelty baseline that tells the algorithm that some states are new, so it gives extra reward for exploring more, which is what we want — but it wasn't there at the time of starting the work — and the other one is the one I am basing my work on, which uses planning instead of learning: looking basically at all the possible futures and taking the best one. If we look at enough futures, we can reach the key. So, the first thing I did was reverse-engineer the game and look at the memory. This is the memory of the Atari console: it is only 128 bytes, and some of them mean things, and that is used in the rest of the work.
This is the algorithm I based my work on, which is Iterated Width (IW). It does a breadth-first search on the graph of all possible actions and states, and stops searching in the states that are not novel enough — and states that are not novel enough are those that do not feature a RAM address with a different value. Using that algorithm, which got that pretty nice score, the character only explored these places in the first screen. As you can see, for example, there are no two points with both the same x and y — because here it expands thanks to other RAM addresses, and down here it doesn't go to the left. It doesn't get to the key; it doesn't explore enough. So what I did was, instead of pruning on each of the RAM values, I started pruning only when the position is the same. That was basically the key to reaching the key, and it gave a much better baseline score. The original algorithm calls for doing this not just for the (room, x, y) position triplet, but for each of the addresses — but there is just not enough computational power in anything I can have access to, to do that. So instead, I did this. And with some other improvements that are listed here, the algorithm ended up working pretty well and getting a 15,000 score, which is a step up from 500.

One of the innovations I added for better exploration was screen pruning, which means that, with a certain probability — well, I don't see the pointer... oh, there — with a certain probability, when we arrive at a new screen, we mark it as explored and we don't go there: we prune every state that gets in there. What that gains us is exploring faster and further. For example, here we have 13 exploration steps. At each step, what the character does is explore all the possible states around it, pruning by position; then it moves a little, and then it does the same again, and it moves a little more.
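The position-based pruning just described — expanding states breadth-first and cutting any successor whose (room, x, y) position has already been seen, as a cheaper relaxation of IW's per-RAM-address novelty test — can be sketched like this. The state encoding, function names, and toy interface are all assumptions; the real system reads these values from fixed addresses in the Atari's 128-byte RAM:

```python
from collections import deque

def position(state):
    # Assumed state encoding: the state IS its (room, x, y) triplet.
    # In the real system, these values come from fixed RAM addresses.
    return state

def novelty_bfs(start, successors, is_goal):
    """Breadth-first search pruned by position novelty.

    successors(state) yields (action, next_state) pairs;
    is_goal(state) -> bool.  Returns the action sequence reaching a
    goal, or None once no novel positions remain to expand.
    """
    seen = {position(start)}
    queue = deque([(start, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            pos = position(nxt)
            if pos in seen:        # not novel: prune this branch
                continue
            seen.add(pos)
            queue.append((nxt, plan + [action]))
    return None
```

Pruning on the full per-address novelty instead would keep far more states alive, which is the computational cost he mentions not being able to afford.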
As you can see, this is a succession of states: first it is in the orange blob, then the blue blob, then the other blue blob, the red blob, et cetera. In the first frame it explores the nearest parts, then a bit farther, and farther, and farther still, and it reaches the next reward — in this case, from here to here — in 13 steps of exploration. But with screen pruning, because we sometimes randomly prevent access to another screen, it reaches it in only four steps. What happens here is that the first step, the orange one, explores all these screens here, but it doesn't explore this screen, because it has been randomly pruned. That sometimes happens; it's just luck. But over several explorations, we often get lucky enough that we get really fast to the possible goals.

So, that's what I did for planning — all these crutches for the algorithm so that it could work and get some reward. Then I focused on the other approach, which is learning: as I said before, instead of looking at all the possible futures, we take some actions and then learn from what happens. In this case, I didn't use the fancy neural networks of Deep Q-Networks. Instead, I just used a table with all the possible values, and by taking only some of the memory addresses, I could construct a small enough table of only 800 megabytes of RAM. But the problem of no rewards remained, even in the first screen: if it did random stuff, it wouldn't arrive at the goal. So I gave it a potential-based shaping function, which basically means that when the character climbs the gradient on this screen, it gets more reward. I made these two gradients, and basically the character correctly follows them to the key and then gets out to the right. Using learning, in 20,000 episodes we get to a really nice reward of more than three, which means that the character gets the key and gets out the door, as it should.
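The tabular Q-learning with a potential-based shaping function that he describes can be sketched as follows. Everything here — the toy environment interface, the potential, the hyperparameters — is an assumption for illustration; only the shaping term r + γ·φ(s') − φ(s) is the standard technique, which is known to leave the optimal policy unchanged:

```python
import random
from collections import defaultdict

def q_learning(env, potential, episodes=2000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning with potential-based reward shaping.

    The agent learns from the shaped reward r + gamma*phi(s') - phi(s),
    which pays it for climbing the potential gradient (e.g. toward the
    key) without changing which policy is optimal.  `env` is an assumed
    toy interface: reset() -> state, step(a) -> (state, reward, done),
    plus a list `env.actions`.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection over the Q-table
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # potential-based shaping: extra reward for climbing phi
            shaped = r + gamma * potential(s2) - potential(s)
            best_next = 0.0 if done else max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (shaped + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

In his setting, the state would be built from the selected RAM addresses and the potential from the hand-made screen gradients; any monotone potential toward the goal plays the same role here.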
Adding options — which are predetermined sequences of actions that are deemed good, and which I handcrafted — it is even faster. So, with more domain knowledge, it always gets better.

So, well, conclusions. Domain knowledge is very useful in both settings, and with more domain knowledge — of the game itself, not just of games in general — the algorithm performs much better. Exploration is very important in these settings. Unfortunately, what I did here is too specific to this one game to be useful for more, but it should not be too difficult to generalize, which is what I'm doing now. And reward shaping in learning also really helps with sparse rewards. As for future work, that's basically what I'm doing: generalizing the approach to other similar domains — Atari games that rely on position — and maybe using some really nice machine learning to automate parts of the algorithm. For example, for the novelty calculation, instead of looking at RAM values directly, we can use learning on them; or we can plan with a frame predictor instead of the emulator to explore, et cetera.

These are the references, and if you now allow me, I'm going to show a short demo. I made it a bit faster, so that we don't have to wait a lot. With planning, the character really takes the optimal actions to get to the goal. So, yeah. Maybe here the algorithm fails a little, because it does not take this part into account, so it just moves randomly for a while until it gets out. But it works. And in the end, it gets the promised 15,000 score — if the full screen starts working and we can see the score at the top... if it ever works. Yeah, okay. Now you can see it: the 15,000 score, which is really nice. And so, that was my TFG.