Okay, welcome once again to this seminar. As you know, this is a seminar to spread the word about what people, faculty of our department, are doing, and so on. And today it is my great pleasure to introduce Marcelo Bertalmío — okay, nice picture of Marcelo's kids. So I'll go briefly over Marcelo's short bio, even though he has been doing many, many things. So Marcelo, I think, grew up in Montevideo and in Madrid, right? Okay. He got his bachelor's degree in Uruguay and then he got the PhD degree in Electrical and Computer Engineering at the University of Minnesota; that's in 2001. I think that at that time you moved to Barcelona — I think we started together here in 2001. He's been a professor since 2006. In this time he did great work on image processing. He got many awards; I'm quite impressed, I'm not going to go over the whole list, okay. But most recently he is the recipient of the very prestigious ERC Starting Grant, for a project that is basically the theme of the talk. He's also the recent author of a book — I was not aware; again, you didn't give me the title to read, Marcelo, but I have it here — Image Processing for Cinema. I have to say that he not only does the image processing part, but he does cinema as well. So Marcelo has directed some movies — I've seen two, I don't know if there are more — that I recommend you look at: Día de Ana and Ruidos, okay, wonderful movies filmed in Uruguay, right? Okay, great. Just to finish, some interesting bit that I got this morning, I think from a little bird, or from Javier, that I read in the short bios that Marcelo has in some of his papers, so I'll read it: his interests are image processing and computer vision for digital cinema, although he prefers the films of Mankiewicz and Berlanga, which are analog, okay? So you see, if you have a rich soul, you have these contradictions, and Marcelo has a very rich soul. He will tell us more. Okay, Marcelo.

Thank you very much, Héctor, for this very fine introduction. I'm very pleased to be here. And I'm hearing myself double. Now it's perfect. So, without further ado, I will tell you about what we're doing in our team: image processing for enhanced cinematography. When I talk about — I'm sorry, I'm going to move. You hear me properly, right? Yes? More or less? Okay, sorry. So, when I say cinema, I'm talking about digital cinema, which most people just assume is the standard now. It's been a very swift — I'm sorry, I have this trouble because when I move, I keep hearing myself, so it's a bit distracting, so I'll probably end up here, okay? Yes. You don't hear what I hear; my voice is in my head. So, anyway, digital cinema. Most cinema is digital now; it's been a very swift transition. It started in the late 90s, when what was digitized was the last stage — or actually the middle stage. So movies were shot on film, they were digitized to perform the color correction and post-production there, and then they were transferred again to film, okay? That started around '98 with this Coen brothers movie. So these are some post-production studios where all the color correction, color grading, and editing is performed in digital form. Then the next thing to change was the projectors. So projectors became digital, probably with the onslaught of 3D movies, like seven years ago.
So that was the excuse that distributors used to force exhibitors to switch to digital projectors, which are probably the norm now. But still, most movies that you are aware of in the past 10 years, major movies, were usually shown — sorry, shot — on film still. That changed with the advent of — okay, so digital cinema screens, this is going up. This year all screens are digital in the US. Anyway, I was going to mention: what about film? So film cameras have been gradually displaced by digital cinema cameras, like this one here, or this one. With this kind of camera, you can shoot movies. This is a photo camera, it costs like 3,000 euros. And you can shoot movies with this, and you can project them on a big screen, and they really look, if they're shot properly, they really look like cinema. There's a famous movie called Rubber, which I highly recommend; it was shot on this camera, and I saw it and I thought it was shot on 35 millimeter film, it was amazing. So you can get fantastic stuff done with this very portable, small camera. And this is the de facto standard for cinema cameras now. So most productions nowadays, commercial productions, use this type of digital cinema camera. So the transition has been rather fast, and, well, practically everything is digital from beginning to end, with a few exceptions, like this movie, which I will not mention, which is opening on December 18th — I will not go into details, but that was shot on film. Okay, so these are the latest Oscars, these are the nominees for cinematography; you can see that, well, out of five movies, four of them were shot with these digital cinema cameras. Likewise, for Best Picture, like eight out of 10 were shot in digital. Okay.

So, there's a lot of image processing going on before you get to see the movie, and it starts in the camera. So cameras — not only digital cinema cameras, but your mobile phone cameras, for those of you who have a mobile phone with a camera — that camera is doing a lot of image processing, and I really mean a lot. So, for instance: this is a scheme of the sensor of your camera, and this is what it captures. The sensor captures one single value, let's say, per pixel, and the camera has to create the other two channels to create a full color RGB picture. That's called demosaicking, and it's an interpolation process; if you do not do it properly, you get this kind of artifact, very annoying, okay? So the camera does that. The camera also performs some sort of, a little bit of, denoising. There's always noise present in cameras, so the camera does a bit of denoising. It does compression, especially if you're shooting video, because the amount of data is huge, so the camera has to do compression. It also does stabilization — this feature can be performed in camera or in post-production — if there's some camera shake. These are artifacts produced by the type of sensor. You can also do slow motion; this is most commonly done in post-production. In post-production, a very important part is the color grading, where colors are adjusted so that, well, the people who are making the movie are happy with how it looks. And this is mostly a manual process assisted by software, but it's mostly done by skilled artists. So anyway, there's a lot of image processing going on, and what we do — okay, I'm hearing a lot of, maybe because I'm, yes, okay, when I move my head like this, my voice increases.
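To make the demosaicking step concrete, here is a minimal sketch of the simplest variant, bilinear interpolation on an RGGB Bayer mosaic. It is only an illustration of the idea, not what any particular camera actually runs (real cameras use edge-aware methods precisely to avoid the artifacts mentioned above); the function name and kernels are mine.

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(raw):
    """Bilinear demosaicking of an RGGB Bayer mosaic (2D float array).
    Each output channel is obtained by averaging the available samples of
    that color around every pixel; borders are approximated by mirroring."""
    H, W = raw.shape
    r_mask = np.zeros((H, W)); r_mask[0::2, 0::2] = 1.0   # red sites
    b_mask = np.zeros((H, W)); b_mask[1::2, 1::2] = 1.0   # blue sites
    g_mask = 1.0 - r_mask - b_mask                         # green sites

    # Interpolation kernels: exact at measured sites, averages elsewhere.
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 4.0
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], float) / 4.0

    r = convolve(raw * r_mask, k_rb, mode='mirror')
    g = convolve(raw * g_mask, k_g,  mode='mirror')
    b = convolve(raw * b_mask, k_rb, mode='mirror')
    return np.stack([r, g, b], axis=-1)
```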
So what we want to do in our team, our goal, is what it says there. We want to develop image processing algorithms so that when you look at the movie on a screen, you get to see the same details and colors that you would get to see if you were present when the movie was shot — or, to be more precise, if you were present at the post-production suite where the movie was actually finished. So this is what we want. It's like the basic problem of image processing, of color processing: you want fidelity in your displayed image.

And why is this hard? Well, this is hard for several reasons. For instance, the dynamic range of the camera. What's the dynamic range? The dynamic range of a scene is the ratio between the brightest and the darkest points in that scene, okay? So if in a scene you have very bright and at the same time very dark places, the ratio is very high. And usually common natural scenes have a very high dynamic range of, let's say, five, six, seven orders of magnitude, which is much higher than the dynamic range available for cameras. Camera sensors, what they do is they transform light into, well, electrical charge, and then into digital values. And this transform process is bounded below by the noise level, so you cannot actually record real black, there's always some noise; and above by the amount of charge that the sensor is able to store. So when the sensor reaches its capacity, that's the maximum white that you can record, okay? So that range is limited. It's usually two or three orders of magnitude, although camera makers are always striving to make that range larger. But in any case, cameras cannot record the full range of light intensities arriving at them in common situations.

This is one of the reasons why professional shoots have artificial lights in them. It's not only for that, of course; professional shoots have lights for artistic reasons, like in this case, or in this case, or this one, or that one — do you know which movie this is? This is Blade Runner, you get three million points. This is the scene with the kind of Turing test at the beginning. Anyway, so you always have lights for artistic reasons, but you often have lights not for artistic reasons, just so that you can see what's there in the scene, like here, or here, okay? So these are cases where the lights are there because it's dark, and you could say, well, okay, just boost the gain. Well, if you boost the gain, then noise becomes apparent, okay? So it's not as easy as that. Here, a detail of that. But also in daylight you have artificial lights. Why? People do not need them. Well, it's because the camera does need them. So these are several examples where you have artificial lights in daytime, even on the beach, okay? People are wearing hats.

Okay, so this is just a picture taken with a certain exposure time, which is the time during which you're actually taking the picture. This was a very short time, so you can only see outside. If you take another picture with a longer exposure time, you get to see more outside — I'm sorry, more inside — and then there's too much light outside. So this is a scene where no exposure time gives you the possibility to see both outside and inside at the same time, okay? And this highlights the limited dynamic range of the camera, even though a person here would be able to see both outside and inside. Okay, this is just a slide showing the dynamic range of cameras, which here is called latitude.
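Just to pin down the quantities used throughout the talk (the notation here is mine, not from the slides): the dynamic range is the ratio of the brightest to the darkest luminance, and it can be expressed either in orders of magnitude or in photographic stops,

```latex
\mathrm{DR} = \frac{L_{\max}}{L_{\min}}, \qquad
\text{stops} = \log_{2}\mathrm{DR}, \qquad
\text{orders of magnitude} = \log_{10}\mathrm{DR}.
```

so the two or three orders of magnitude of a typical sensor correspond to roughly 7 to 10 stops (since 2^10 ≈ 10^3), while a scene spanning six orders of magnitude needs about 20 stops.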
Well, you don't see that well, but let me just tell you that in a few years the dynamic range of professional cinema cameras has gone from around 10 stops, which is three orders of magnitude, to around 14 stops, which is over four orders of magnitude. So it's continually increasing, okay? Why is that? Because the larger the dynamic range is, the easier it is to shoot without artificial lights, and to shoot some scenes which would require a lot of work if you needed to put artificial lights in them. Usually, when shooting a movie, setting up artificial lights takes more time than it actually takes for the actors to play out the scene. So this is what you get: okay, this is a frame from a digital cinema camera and you can see that the outdoors is overexposed. And here, there's a window there. So how to deal with this? Well, there are several ways. This is how you see the frame on a monitor on the set. What's usually done is you record this in, let's say, raw format, without doing any image processing in the camera. And then afterwards you play with the values to tweak the image — there's a skilled technician doing that, this color grading — so that you can see both outside and inside. Another very common option is to just add lights. At shooting time, you see this and you say, okay, I want to see both outside and inside the car, so I put some lights here, okay, and that takes time.

All right, so how are high dynamic range images created in practice? This is probably the first — sorry, it's not the first, it's the second — work proposing a practical method to create high dynamic range images using just standard cameras. And what they do — I apologize for this and apologize for that — anyway, what they do is the following. You have differently exposed images, like the ones I showed before of this dining room, and you combine them. In order to combine them, you have to do something first, which is to invert the camera response function. The camera, when it takes a picture, applies a nonlinear correction function, which has to be estimated, and this method does that. And once you invert that, you have the original images but, let's say, calibrated, so they have the same sort of intensity. And then you do an average, which is shown here — I also apologize for this, of course. So let's say you take all these low dynamic range pictures with a regular camera, and with this method you combine them and you can reconstruct what is called the radiance map: an image which tells you, at each pixel, the intensity of the light arriving there. And this is the image. Why is it green? What is that? There was no green. Well, this is a high dynamic range image, so I cannot display it properly, because cameras have a reduced or low dynamic range, but displays also have a low dynamic range — like this projector, or any of the displays that we own. I know about this because if you want a high dynamic range projector or display, you need to shell out like 30,000 euros, okay? So I'm guessing you're not owners of a high dynamic range display. Therefore, you have to do some trick like this. So this shows — there's a difference of like five orders of magnitude or even more, right? No, six orders of magnitude between the brightest and the darkest.
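As a rough illustration of the combination step just described: a minimal sketch under the assumption that the exposures are already aligned and the camera response has already been inverted (which is precisely what the estimation part of that method provides). The weighting function and names are mine, not the paper's exact formulation.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Combine linearized, aligned exposures of the same scene into a radiance map.
    images: list of float arrays with values in [0, 1]; exposure_times: seconds."""
    def weight(z):
        # Hat-shaped weights: trust mid-tones, distrust near-black (noisy)
        # and near-white (saturated) pixels.
        return 1.0 - np.abs(2.0 * z - 1.0)

    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        w = weight(img)
        num += w * (img / t)   # each exposure's estimate of the scene radiance
        den += w
    return num / np.maximum(den, 1e-6)
```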
So you have to figure something out if you want to display this picture, which is called tone mapping, and I will discuss that later. All right, so that method works if nothing is moving in the scene when you take the pictures. If something is moving, you get these creepy artifacts called ghosting, okay? Scary. Even so, that method is used in a professional cinema camera, the RED camera, okay? And this is used in actual major movies. So what they do is they take two exposures, one with a longer exposure time and one with a shorter exposure time, okay? Like these two, in succession, and then they combine them, okay? And this is what they get. Of course, if there's motion, you have that ghosting problem and you have to deal with it in post-production. And professionally, many people prefer not to use this option, because dealing with the ghosting artifacts afterwards can be really, really annoying. There's another possibility, not yet commercial but getting there, of shooting with two cameras, okay? Arranged like this. So two cameras, perfectly aligned, and there's a semi-transparent mirror there. So most of the light from the scene gets to this camera and some of it to the other camera. So you're able to get these two pictures which are perfectly aligned and synchronized, so you don't have this problem of motion. People can move freely there. And then you have to merge them. So this solves the problem of creating high dynamic range video. Of course, you need two cameras and this is a very delicate setup, so it's not yet practical, but it's getting there because it's a really relevant problem.

But still, what do you do with that? When you merge that, how do you display it, okay? So this is the problem. You get this, you get that, and then you have to do a transform in order to display it. If you just do a linear scaling — let's say I take this kind of picture and I say, well, the smallest value I paint as black and the largest I paint as white — if you do that, you get this kind of image, almost everything dark like that, okay? So you have to apply a process called tone mapping, and you get this. And the tone mapping process, you can see that what it's doing is actually spreading the histogram. So this is the histogram, and this is the histogram of that image. Okay, so you have to do tone mapping. Tone mapping is also not a trivial problem to solve. If you look for high dynamic range images on Flickr — all of the ones I'm showing I downloaded from Flickr — most of them will have been tone mapped with a sort of algorithm which produces this kind of result, which might be artistic, but is not really realistic. This is not the best you can do with tone mapping, okay? There are much better tone mapping algorithms. But still, it's not a trivial problem at all, because you have to deal with temporal artifacts, you have to deal with changes in color, and also it's very hard to evaluate how good the results are. So you may think of a tone mapping algorithm, but then in order to evaluate it properly you have to do some user tests, and that is really, really tricky. So this is a case where the tone mapping output wasn't so good, so they did some post-processing. Okay, cameras always have noise, as I mentioned before. So this is an image where the noise is uniform, but it's more noticeable in the dark areas.
This is just a perceptual issue. But there are many sources of noise in the camera: electrical noise coming from the circuits in the camera, from the outside temperature, from defects in the sensor. The point is that it's always there. So cameras always do some sort of denoising, which is very simple, because you have to keep in mind that these cameras do a lot of stuff, so they cannot choose the best possible algorithm for each of the stages. They have to compromise. So they do okay denoising, but they also have to do okay demosaicking, okay compression, and so on. And yeah, they also deal differently with luminance noise and chroma noise. And this is a case where — this is a frame from a digital cinema camera and it's noisy, okay. So the denoising in cameras is usually not enough — well, on many occasions it's not enough — so you have to deal with that.

Okay, one very important challenge is that of color correction. Color correction — you try to do as much of it as possible on the set. So on the set you calibrate the cameras with some devices, and then in post-production you choose the colors using this software tool, so you manually tweak the colors of the scene to make it look as you want. That's a manual process. It's very error prone, and it's a global process, meaning that you just change color values regardless of where they are in the picture. So you may end up with a problem like: you're fixing some color here and then you're screwing up some color there, okay.

All right, and then what happens? Well, if you have to shoot with two cameras, or with one camera changing its parameters automatically, the colors change even if you move just a little, like here; or in a video, like here, you go from there to there, and the colors change and the intensity changes; or here you have two cameras. Sometimes you cannot just choose to use one camera. Okay, in some situations you can, but in others you have to use several cameras, like here. So in broadcast there's usually one technician, or several, in charge just of adjusting these knobs so that when you cut from one camera to another there are no jumps in color. This is to show that different camera models have different color properties. These should ideally be identical standard colors; they're not, and sometimes they're not on purpose, so that some camera maker can claim 'my camera is great for blue skies,' okay, and that happens. All right, so even if you take the same camera model with the same settings, you get differences in color, like here, okay. These are a pair from a professional camera — same model, consecutive serial numbers — and still you get differences in color. And sometimes you have 120 cameras, okay. So anyway, that's challenging.

Also, differences in color gamut. What's the color gamut? This is a diagram that represents all the colors that we can perceive. Each point there represents a color that we can perceive. Common displays are based on the trichromatic property, which tells you that you can create any color just by a combination of three primary colors, okay. So you combine red, green, and blue and with this mixture you get any color that you like. Well, it's not any color. It's any color inside this triangle whose vertices are the red, green, and blue primaries of your display. So your display has primaries, which are physically made by some light-emitting material, and those primaries have those colors. So these are the colors that your display can reproduce, okay.
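As a small aside on that triangle: deciding whether a given chromaticity can be reproduced is just a point-in-triangle test in the xy diagram. A minimal sketch, with the standard Rec.709 ("standard TV") primaries as the example triangle; the function and the test point are mine, not from the talk.

```python
import numpy as np

def inside_gamut_triangle(xy, primaries_xy):
    """True if chromaticity xy lies inside the triangle spanned by the display's
    red, green and blue primaries (all given as CIE xy pairs)."""
    p = np.asarray(xy, float)
    r, g, b = (np.asarray(v, float) for v in primaries_xy)
    # Barycentric coordinates with respect to the triangle: inside iff all >= 0.
    T = np.column_stack([g - r, b - r])
    l1, l2 = np.linalg.solve(T, p - r)
    return l1 >= 0.0 and l2 >= 0.0 and (l1 + l2) <= 1.0

rec709 = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]   # R, G, B primaries
print(inside_gamut_triangle((0.20, 0.50), rec709))    # a saturated green: False
```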
It will always be a triangle. So there are many colors that you can record with this camera and cannot reproduce on your display. This triangle inside is the standard for TV. Different devices have different gamuts. This is a gamut representing most naturally occurring colors, and many of them are outside this triangle, which represents the gamut of standard TV. Anyway, again, this color grading stuff. When you finish with your movie, at the last stage, after everything is done, there's also a skilled technician that goes scene by scene, shot by shot, manually tweaking the colors so that you can see the movie properly on your TV, okay. And he or she has to do that again for cinema, okay. And that process is manual, it's error prone, it's very tedious. And why is it manual? Because people making movies have extremely high quality standards. So if you want this done properly, you have to have a human do it, because automatic methods are really not there yet.

So one way you could just deal with the problem would be clipping. Clipping is: okay, if the color falls outside, I'll just move it to the boundary of my triangle, and that's it. So if you do that — let's say this is the original picture, shown here with the proper gamut, and then I want to reduce the gamut so that it fits into the smaller gamut of, you know, older projectors, say. This is what you get by clipping. So automatically you get this, and this is awful. You lose all the details, all these nuances in the face and so on. And you might think, oh, well, okay, this is for crappy projectors, right? Well, let me show you this. This is a clip from an image that you can see properly on a digital cinema projector, okay, but here, with this projector — sorry, that projector — you're seeing all these artifacts, okay? Because this image has colors outside the gamut of this projector, okay? Right, and again, evaluation of gamut mapping processes is really, really hard, okay? So that makes developing gamut mapping algorithms harder. Okay, and finally — as you can see, many challenges; I still haven't talked about our work; yeah, only half an hour — okay, visual perception.

So since the beginning, or rather the first third, of the 20th century, we have had very good, rather accurate, predictive models of how we perceive colors. With these models, I can predict that most of you will see this as being red, okay? And most of you will see this as being green. I'm not asking which of you does see this as green, because there might be some color-deficient people who nonetheless drive cars here, and they don't want to be outed — like, oh, I shouldn't have this license. So most of us will see this as blue, okay? And that's predicted by colorimetry, dating to the beginning of the 20th century. This one — well, I don't have a good name for it, maybe mustard, okay? Mustard; most of us will see this as mustard. Right, so what about this? I will just be quick. I see this as greenish and this as pinkish, okay? Nonetheless, if you put them side by side, they have the same value, and you might not believe me, because you're using your visual perception in order to assess this. And my visual perception is probably like yours and tells me: no way, this is different from that, you played some trick here, okay? No: if you just click here and get the red, green, and blue triplet values, they're the same, here and here and there. It's just because of the surround that we see them differently.
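A minimal sketch of the naive clipping strategy just criticized, assuming colors are expressed in the target display's linear RGB, where out-of-gamut simply means having components below 0 or above 1 (the names and the diagnostic helper are mine):

```python
import numpy as np

def clip_to_gamut(rgb_linear):
    """Naive gamut mapping by clipping: every out-of-gamut color is pushed onto
    the boundary of the target RGB cube.  All out-of-gamut shades of a region
    collapse onto the same boundary color, which is why nuances get destroyed."""
    return np.clip(rgb_linear, 0.0, 1.0)

def out_of_gamut_fraction(rgb_linear):
    """Fraction of pixels the target display cannot reproduce, a quick way to
    see whether a smarter (e.g. contrast-based) gamut mapping is needed."""
    outside = np.any((rgb_linear < 0.0) | (rgb_linear > 1.0), axis=-1)
    return float(outside.mean())
```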
So the way we see colors depends on many factors, including what's around the object we're looking at. Okay, and this triplet is actually that one, okay? So color is a perceptual quantity. It's not inherent to the light; it's not inherent to the wavelength or any physical property. It's not a physical property. It's something inside our heads. So inside our heads, this is mustard, this is mustardish, and this is greenish, and this is pinkish, okay? The same light. Okay, another example, even crazier. I see these as dark and light gray, but actually they are the same gray value. If you measure them, you have the same gray value. I know you don't believe me, but that's how it is. This is a famous illusion by Adelson. Also, depending on how you arrange things, they look as if they have a different color, or you can create the illusion of transparency here. So these gray squares now look as if they are, you know, red overlaid by green, and so on.

Now, if you look at this dot — please look at the dot, all the time, fixate your eyes on the dot, and do not move your head, because if you move it, you will miss it. So please look at the dot. While I hypnotize you, look at the dot. Look at the dot. You're not feeling sleepy. You're looking at the dot. And please look at the dot, and now blink. And I hope you see some colors there, okay? And it's a black and white picture. So there's a temporal aspect also to how we perceive colors. And the ambient light has a very large influence. As you decrease the ambient light, you see colors as more muted, less saturated, and also the contrast is reduced — but it's the same scene; the objects do not change.

Okay, another crazy example. This is the same. This is a case of assimilation: this color is the same in both wings. So the way we perceive colors is affected by the surround. This is another example, in the chromatic case. So that was assimilation, and this is a case of contrast. Now, these gray squares — well, I hope you see the left one as being darker than the right one, okay? But they are the same; they have the same value. In this case there is a perceptual phenomenon called contrast — lightness contrast or brightness contrast — because the perceived lightness of this one moves away from its surround. And here it goes in the direction of the surround: these bars look darker than these ones, the gray bars here look darker than the gray ones on the right, and they have the same value. So this is assimilation. These are the two sides of visual induction. And this has been studied for many years. So perceptually the judgment goes from assimilation to contrast depending on the width of the bars. This has been studied for quite some time.

Okay, so this is the end of the introduction. So what do we do? I'll take a break. What we do is the following. The superiority of human vision over cameras is in many regards due to better processing, not better hardware — in many regards, like, I don't know, resolution, speed, or even dynamic range, the hardware we build can be better. Nonetheless, okay, even though our hardware is not as good as the physical hardware that we create, we still see better. So what we want to do, the goal of the work that we do in the team, is not to improve hardware. We want to work out software algorithms that mimic neural or perceptual processes and apply them to movies shot with regular cameras. That's what we want to do, okay?
Just emulate as much as possible the processes going on in our visual system. So now, current work; I'll speed through it. Neural models: this is very recent work with Jihyun Kim in which we take standard models of how the retina works and we update them with really recent neurophysiological data. And by doing that — so this is the update — we are able to reproduce these assimilation-to-contrast psychophysical experiments that I showed the plot for before, okay? So with these updated retinal models, we apply the models to images and the result is images that show assimilation and that show contrast, okay? So this is the first work, to the best of our knowledge, where we can claim that assimilation might be happening already at the retinal level, okay? It was always thought that assimilation happened later on, in the brain.

Okay, so this is even more recent work, from the future, in which we take those models, we take the essential elements of them, simplify them, and still get the same assimilation results. But with a plus: now we can express the models as a couple of partial differential equations, and we can compute the steady state solution of that system very easily, in one shot, with a convolution with a given kernel, okay? Why is that important? Well, it's important for theoretical and for practical reasons. So we show that — I'm sorry, I keep apologizing, sorry, sorry, sorry — we show that with this convolution with this kernel, which is related to physiological data on the retinal distribution of receptive field sizes, we can reproduce these assimilation and contrast results. These equations are minimizing an energy, and we can apply that, for instance, in order to precompensate for differences in visual induction when you look at different screen sizes. When we look at a large screen, we usually are closer, so this visual angle is larger than when we look at a small screen, where the angle is smaller. These differences imply that the induction will be different: maybe we are getting contrast here and assimilation there for the same image, okay? So with this formulation we can precompensate for that. Okay, another work from the future. So if you know that you're producing an image that will be shown on a mobile phone, you can precompensate, so that you have your version for your mobile phone, automatically compensating for this induction. Okay, this is also some work on neural models that we are linking with basic ideas of efficient coding in the visual system, and also producing these assimilation and contrast results in the chromatic case, okay?

Well, now noise removal. Very quickly: noise is always present, as I said, and we have proposed a couple of methods where we are getting fairly good results like this. We go from there to there, and from there to there. And this is actually not just submitted; it's accepted, published already. So you can check out this new work where we do the following. It follows ideas that we published last year in joint work with Stacey Levine from Duquesne, in the US. The idea there is that if you have an image and you add noise to it, this image looks much noisier than the curvature image. What is the curvature image?
Well, the curvature image — I will not bore you with it, but the curvature image is another version of this image which just represents how curvy the level lines of the image are. So if your image has straight lines, the curvature will be zero. If your image has circles, the curvature will be the inverse of the radius of the circle. That's the whole idea. So this is the image and this is its curvature. This is the noisy image and this is its curvature, and you don't see anything, so this is enhanced so you can see a little bit. So this is the curvature from the original and this is the curvature from the noisy image, and they look fairly similar. So the whole point of this work was to say: okay, if you want to denoise an image, it's better to try to denoise the curvature, because the curvature is cleaner. You see what I mean? That was the whole point. So that's what we did. This is the noisy image, and we are able to reconstruct this, which looks very close to the original. Okay, so that's what I said.

Okay, another thing that we did — this is work with Thomas from a couple of years ago — is to apply this crazy mathematical concept of the covariant derivative, which we used to improve traditional denoising algorithms based on geometry. So we replace the regular derivative, or the regular gradient, with the covariant derivative, and by doing that we are able to get better results. This we published in SIAM last year. So this is what we get: clean, noisy, normal gradient, and this is the covariant derivative. So we get more details.

Okay, so in this latest work with Gabriela, what we propose is the following. Given a denoising method, it's better to project the noisy image onto a moving frame and to denoise those components. So what does this mean? Okay, forget about that. We take the image and we convert it into three components by projecting it, pixel by pixel, onto something which is called a moving frame, which is just an orthonormal basis computed in some manner; I will not go into details. So from this image you get one, two, and a third image, which I'm not showing because theoretically it should be just flat zero, okay? So from this, we get these two. And what we claim is that if this is noisy, it's better to denoise these two and then reconstruct the denoised version of this from those denoised ones, than to directly denoise that. I hope you're following me. So this is noisy, you denoise it with your favorite method, okay? And you get a result. What we're saying is: well, you would get a better result if you applied your favorite method to denoise this and that, and then reconstructed from those, going back there. And so this is what happens, okay? There's some technical detail about the PSNR. So this is the noisy original. This is denoised with — I think it was total variation denoising, a local method. And this is with that local method applied to the components. So it's the same denoising method, applied in the middle to the original image and on the right to the components. So in this paper we are proposing not a denoising method but rather a denoising approach. So we can improve on any given denoising method that you want, okay? This is what it is. So this is another denoising method, non-local means, and this is our approach applied to non-local means; and this is the same for another, probably the closest to state-of-the-art, method called BM3D. Okay, colour stabilization. This is the problem of — this is joint work with Javier, published last year.
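For reference, the curvature image mentioned here is the curvature of the level lines, kappa = div(grad I / |grad I|); a minimal numpy sketch with a finite-difference discretization of my own choosing, not the one used in the paper:

```python
import numpy as np

def curvature(img, eps=1e-8):
    """Curvature of the level lines of an image: kappa = div( grad I / |grad I| ).
    Straight level lines give kappa = 0; a circular level line of radius r gives
    |kappa| = 1/r."""
    Iy, Ix = np.gradient(img.astype(float))        # derivatives along rows, cols
    mag = np.sqrt(Ix**2 + Iy**2) + eps
    nx, ny = Ix / mag, Iy / mag                    # unit normal to the level lines
    dnx_dx = np.gradient(nx, axis=1)
    dny_dy = np.gradient(ny, axis=0)
    return dnx_dx + dny_dy
```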
So the point of this work was to match the colours among different views, which, as I showed you before, is a real problem when you have different cameras, okay? Our method here is based on the following. This is the pipeline of any digital camera that you can think of — in general; I'm being very, very general here, I hope I'm not sued, especially since this is going to the video repository. Anyway, this represents the colour processing pipeline of any digital camera, regardless of price and size. So there's the RGB triplet at the sensor level, and this vector is multiplied by a diagonal matrix that does white balance — it corrects the colour, also to match visual perception. Then it's multiplied by another three-by-three matrix C that does a colour conversion, and then by another matrix E, which does another colour correction; and there's some alpha gain value, and that is raised to a power gamma, which performs this nonlinear correction, compensating for the lightness perception that we have. And that's the output. So what we do in this paper is to say: okay, if this is the pipeline — there are details I have not mentioned, like there's clipping, some nonlinearities there, but this is a simplification that, as you will see, works, it does a pretty good job. So if this is how an image value can be modeled, all of this can be collapsed into a single three-by-three matrix. So we have two of those for two camera views. If we want to make one camera match the other, we do not need to know the matrices A1 and A2. It is enough to just know the product matrix, the inverse of one times the other. Why? Because with that we can just create a linear system of equations for this three-by-three matrix, which is called H, and we can solve for it. It's just a linear system of equations. It's mathematically insulting, okay? Ten minutes, thank you.

So with that you can go from there — we're hoping to get sponsoring from Kali's, someone is watching. So we match using some method — I think it was ASIFT in this case — so we find correspondences, and each of these correspondences gives us an equation of the system, okay? And with that we go from this to that, okay? Which is okay, but it can be improved if we also estimate the gamma value, which I told you was present there. So we also estimate the gamma value, minimizing an error quantity there. This was the previous result, and this is the current result, estimating both the gamma and the transform matrix. Okay, and we go from there: this, this; there: this, this, okay? And videos can also be stabilized, and this is a case for — anyway, too much information.

Right, so we can also do color matching for stereoscopic cinema. Here we're not using this pipeline of the color processing of the camera; we're using a simplification of a variational method. This is joint work with Stacey again, from the MIRAGE conference two years ago. So this is one view of the stereo pair, this is the other view, and you can see that there's a switch in color, okay? So what we do is — we take this — I presented that at a conference in Berlin, I didn't realize, but, chan chan chan, anyway, there were nervous laughs in the audience. So what we do is: we have this pair, we take one view and we warp it so that it's registered with the other. So it's deformed so that it matches the other.
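Under the simplification just described — each camera collapsing to a single 3×3 matrix acting on linear, gamma-removed RGB — estimating the matching matrix H from pixel correspondences is an ordinary least-squares problem. A minimal sketch with my own function names, not the actual implementation from the paper:

```python
import numpy as np

def estimate_color_transform(src_rgb, dst_rgb):
    """Estimate the 3x3 matrix H that maps the colors of one camera view onto
    another, from N corresponding pixels (N x 3 arrays of linearized RGB).
    Each correspondence contributes three linear equations in the entries of H."""
    src = np.asarray(src_rgb, float)
    dst = np.asarray(dst_rgb, float)
    Ht, *_ = np.linalg.lstsq(src, dst, rcond=None)   # solves src @ H.T ~= dst
    return Ht.T

def apply_color_transform(img, H):
    """Apply H to every pixel of a linearized H x W x 3 image."""
    return np.einsum('ij,...j->...i', H, img)
```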
Then we find correspondences, which is easy because they're aligned, and we take a neighborhood, compute the histogram there, and we move the histogram so that it matches the other one. And if you do that, now the colors match, okay? Right, so we can also do gamut mapping. And in gamut mapping we use a contrast modification method, which is also based on some perceptual phenomenon. We change the contrast — we reduce the contrast if we want to do gamut reduction — and we achieve this kind of result. This is the original, some other method, some other method, some other method, and ours. And ours should be better, which is the last column. If we want to do gamut extension — which is turning out to be a major application now: there's a switch from standard digital projectors to laser projectors, it's already started. Laser projectors have a much wider gamut, because the laser primaries have very saturated colors. So with this very wide gamut, you will need to extend the gamut of your movie if you want to take advantage of the very wide gamut of your projector. So anyway, you can do that also with this. And here we're showing: this is the original wide gamut — okay, this is an approximation, because this cannot be wide gamut, because of the projector — but the idea being that the original wide gamut content was reduced and then was extended again; this is the result of another method, and this is our approach, which is better for some colors that we always pay a lot of attention to, which are the skin colors and natural colors like the sky, the food and so on. Okay, you can use an extension or modification of this approach to map between gamuts which are non-inclusive. This is very recent work with Javier. So with this, and this as reference, we transform this and it looks like that. And we can also use a modification of this approach in order to perform dehazing. So you have this beautiful picture and you remove a bit of the haze, okay? You can use it also to remove pollution, okay?

So, HDR creation. This is very recent work with Raquel in which we take this color stabilization method that I showed before, where we match the colors, in order to improve how high dynamic range images are created — which, as I mentioned before, is usually done by combining a set of differently exposed pictures. If you do this combination without taking into account the differences in color, you get some errors. So what we do is we take a reference and we correct the other pictures so that the colors match the reference, okay? And by doing that, we get a very good improvement with respect to just using the original method that I showed before, okay? You have to believe me, I guess, but there is more detail, and our results are in the bottom row again. Okay, also — this is very recent work from the future; we're going to present it at Electronic Imaging in February next year. So there are cameras that allow you to take two exposures — let's say, it's an approximation — two exposures at the same time, in an interlaced manner. So different lines have different exposures. It's not actually the exposure they're playing with, it's the gain. So this is what you get. And from that, you can reconstruct a properly exposed picture, okay, using an interpolation method that we propose here. So, regarding perceptual image quality: image quality is usually evaluated in an objective manner just because it's easier, not because it's better.
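Going back to the histogram-moving step in the stereo color matching at the top of this part: the core operation is classic histogram matching (specification). Here is a minimal global, single-channel sketch; the actual method in the talk works on neighborhoods of the registered pair and on color images, so this is only the basic ingredient, with my own naming:

```python
import numpy as np

def match_histogram(source, reference):
    """Remap the values of `source` so that its cumulative histogram matches
    that of `reference` (both single-channel arrays)."""
    s_vals, s_idx, s_counts = np.unique(source.ravel(),
                                        return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # For each source quantile, take the reference value at the same quantile.
    matched_vals = np.interp(s_cdf, r_cdf, r_vals)
    return matched_vals[s_idx].reshape(source.shape)
```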
So the common methods to evaluate how good an algorithm is — for denoising, for instance — are to compute the signal-to-noise ratio, the mean square error. These things are very fast and easy to compute, and they usually do not match how we perceive quality. But doing a good, proper evaluation of how people prefer images is very costly and complicated and cumbersome. Okay, so we've started working on developing image quality metrics which are based on perception, and we have some initial works with Dave. So the experiments are the following. This is the recorded output, the output of a sensor, say. You encode that, you do quantization, and then you send it over some channel, you decode, and then the display introduces another nonlinearity, okay? So the system gamma is the product of the gamma of the encoding and the gamma of the decoding; that's called the system gamma. So we have found the following. We have found that people prefer images — they like them better — when they have a flat lightness histogram, where lightness is how we perceive the intensity, okay? So if you look at an image that produces a flat lightness histogram, you will give it a good grade. That's what we found. And we can use that, for instance, to select what the proper system gamma is, depending on the background of your image, the dynamic range of your display; there are several variables there. And we have an initial proposal of an image quality metric that depends on contrast.

Okay, one point here is that in the eye there are a lot of optical defects which reduce the dynamic range of an image. So the retinal image, the image formed at the retinal level, has a dynamic range which is usually reduced with respect to the image that could be measured with a device before entering the eye. And we tested this — this is a very simple work where we take a database of high dynamic range images and pass them through the nonlinearity, or convolve them with the point spread function, representing these aberrations and the scattering in the eye. And if you do that, the dynamic range of the scenes is very much reduced. Okay.

So, having that in mind, we go now to the final part of the talk, very quickly. With all these elements in mind, we proposed a tone mapping method. What is tone mapping? I remind you: tone mapping is the process by which you take a high dynamic range image and you reduce the dynamic range so that you can see it on a regular display. So we use concepts from vision science, the concept of efficient coding. A lot of the elements in the visual system — or I would say in the body in general, but in the visual system it's clearly the case — are there because they optimize some aspect of how we function. So efficient coding is a way of optimizing the available range of photoreceptors, the available number of photoreceptors that we have in the retina. It's a way of maximizing our ability to distinguish between colors and to distinguish between different levels of light. So this is an average, in log-log coordinates, of the histogram of natural scenes. Many thousands of pictures were taken, and they computed the histograms and averaged, and that's the plot. So on average, the histogram of a natural scene looks like this.
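A minimal sketch of histogram flattening (equalization) of the log-luminance, which is the operation behind both the flat-lightness-histogram preference mentioned above and the first stage of the tone mapper described next; the rank-based implementation and parameter names are mine, this is not the authors' algorithm:

```python
import numpy as np

def flatten_histogram(luminance, eps=1e-6):
    """Histogram equalization of log-luminance: returns values in [0, 1] whose
    histogram is (approximately) flat, i.e. each pixel is mapped to its rank."""
    logl = np.log(luminance + eps).ravel()
    order = np.argsort(logl)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(logl.size)
    return (ranks / (logl.size - 1)).reshape(luminance.shape)
```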
So with the concept of efficient coding, we designed a tone mapping algorithm in which the first stage performs histogram equalization, which flattens the histogram — which, because of the previous comment that I made, we know is what we prefer when we look at an image. So we are flattening the histogram, assuming that the image has a histogram that looks like that. And in the second stage we perform something called divisive normalization, which is also a canonical neural computation according to this very popular work by Carandini and Heeger. Divisive normalization amounts to dividing your signal by a measure of the standard deviation of that signal. So you're normalizing by the deviations that it has. That happens at the photoreceptor level and also at the cortical level. Anyway, we propose a tone mapping algorithm that does that. It has two stages: in the first one we perform histogram flattening, and in the second one we perform this divisive normalization. So, for instance, this is the histogram and we adapt our tone mapping curve so that it flattens it. And here we're showing — this is very recent work with Praveen — our algorithm has some parameters, so we compute these parameters automatically, and we find, doing some psychophysical experiments, that these parameters correlate very well with what subjects choose as optimal parameters. So we let subjects play with these parameters and choose the best tone mapping that they want, and these choices match rather well with our automatic findings. Anyway — I'll just go fast. So this is the original image that you could see on the monitor on set. And if we take the raw image and apply our tone mapping method, this is what you get. So it's darker, but you can see outside now. And the same goes there. And there. You recognize this is La Boqueria. Okay, and these are some video examples shot with this twin-camera rig that I showed ages ago — two cameras in an orthogonal setup with a semi-transparent mirror in the middle; that's how this was shot. So you can see that there are no temporal artifacts there, or halos, or anything. This is a very tricky scene because the dynamic range here is huge and it's usually very dark. So if you wanted to shoot that with a regular camera, you would need to add a lot of light, okay? And this looks natural. And I will end my talk with a video of Oktoberfest in Munich, which is this. And, right, you can find these references on our webpage. And this would not have been possible without my wonderful team: Raquel, Gabriela, Waqas and Praveen are PhD students, and my postdocs are Gigi, Thomas, Dave and Javier. And also without the financial support of these very fine public institutions, the ERC and the Ministry of Economy. And I thank you.

Yes, I will take questions. Hold on, there's a mic coming for you. Thank you. It was a great talk. Thank you. You said that we prefer some kind of images, like flat histogram, or what was this term? Yes. I couldn't catch it. Yeah, so we did an experiment in which we asked people to rate images in which we were playing with the mean value and the contrast of the image. And we also did an experiment in which we let people play with the system gamma and rate the resulting images. And what we found was that you can correlate the larger image quality scores given by people with the images that have a flatter lightness histogram.
So the lightness is the perceptual intensity, and the histogram is the histogram, yeah. Okay, so I guess the philosophical conclusion is we don't get to see what's out there. It sounds very postmodern to me. Now, going back to the issue about neural processing: the input to this neural processing is images. So you are saying that you can use the same techniques, or emulate the techniques, that start in our retina, even for the process that goes from, let's say, the illumination and the objects to our retina. Is that correct? So basically, techniques similar to the ones in that second phase, internal to us, are the ones that you want to reproduce outside — I mean, before the light comes into the retina.

Yes, absolutely. And for tone mapping — I'm glad you asked this question, we hadn't rehearsed this. For me it's personally very satisfying, because what we have found is that by developing models that we understand, and that we see predict rather well what's going on in our heads, we get better results. So it has an application edge. And why is it rewarding for me? Because I think that, well, I would not be able to improve on things if I didn't have models. So I need to understand them. I would not be able to develop methods that consisted of a black box that correlates inputs and outputs very well. We could go that road; personally, I would not find that satisfying at all. So I want to have good models, so that they can be improved, because you understand them. And for this particular last problem, tone mapping, understanding what's really happening at the retinal level is really crucial, because the whole goal of the tone mapping problem is that you want to create an image that has a low dynamic range and nonetheless appears to you as if you were looking at the high dynamic range original. So embedded in there is how we perceive things, how we perceive that. So it's not a cosmetic part; you really need to understand it if you want to do a really good job. You can also do a good job without going this route, I do not argue with that. But with this, we're getting results which are really, currently, the best you can get.

And this sort of work, is it published in image processing outlets, more than in neuroscience or...? Well, we have a few works in vision science or neuroscience outlets, but the majority are in image processing. We have some works in the large vision science conference, some neuroscience journals, and some works under review in the Journal of Vision, which is more to the vision side, but most of our publications are in image processing.

Any more questions? Hi, so another aspect that you talked about was the difference not only between different cameras, but with the same camera: you move the camera a little and you get considerably different colors, from what you showed there. So you depend not only on the camera parameters, but on the lighting that you get in the scene, I guess. Does that mean that any algorithm for image processing trying to use color is going to fail? Well, I wouldn't be that harsh. Maybe it doesn't fail, but you don't get as accurate results as you could if you took that into account. So, I will not lie to you: I've been working on image processing for 20 years now, man, and for most of them I didn't know what was going on inside the camera, which is really embarrassing.
Despite that, well, I was able to work and produce algorithms that did very good things. And for some problems, maybe you don't care about these automatic hidden corrections that the camera is performing. In some other cases, it's crucial. So if you're doing, for instance, color correction, you really need to understand what's going on. If you're doing segmentation, maybe you don't need to, but you would probably get better results. I don't know if that answers your question. Yeah, yeah, yeah. And the thing is, you showed that you can correct from one camera to the other, given correspondences. But, at least from what you showed, it didn't look like you could go into some kind of normalized space in which, you know, okay, maybe the colors are not the real colors, but I can map different cameras to the same color space and then get some consistency in the algorithms trying to recognize a chair that is red in two different pictures. Yes — no, with what we have, we cannot do that, because the way to do what you suggest would require calibrating the images. It would require going inside the camera, getting the camera and performing some tests so that you can actually estimate those matrices that I showed. So the whole point of our method, why it's so simple, is that you don't need to actually do what you said, normalizing. You just need to change one image so that it looks like the other. But to normalize to some standard reference, no, I'm afraid you cannot.

Oh, I'm sorry, I don't know how to — thank you. Yes, you need to — sorry, you need to repeat the question. Yeah, why do you think that people are still shooting with analog cameras, and what is the dynamic range, for instance, of an analog camera? And there, there is no processing of the image, right? So that's a very good question. Some digital cameras have finally reached the dynamic range of film, very recently. So there's no edge on that side anymore, okay. Shooting on digital allows you to do many, many things in a new sort of manner. For instance, you can shoot a virtually unlimited amount of time and you don't have to worry about it. Whereas if you're shooting on film, your scene or your take cannot last more than 11 minutes, okay, because that's the physical limit of one roll. So there are many complications. Nonetheless, some people prefer that. Why is that? Well, I invite you to look at a fantastic documentary that Waqas recommended to me, called Side by Side. It's a documentary about the transition in film from analog to digital, and it covers all the aspects of filmmaking — I don't know, acting, but also editing and of course camera operating and so on. And you can see that there are a few people that still prefer to shoot on film, and it's no longer just an aesthetic choice; it's something else, it's the way you work with it. I think that they value the challenge of having to shoot in a limited amount of time, knowing that there are these limits. You also have photographers who have gone back from digital to shooting on film. Why? Well, because they say: okay, I have only one roll, I have to choose my 24 pictures wisely, I cannot just shoot away. So that forces me to frame better, pay more attention, do my job better. So that might be one reason, I don't know.
Do I understand correctly that part of what you're saying is: well, now a lot of processing goes into the acquisition, in the cameras, and there's a lot of things in there, so that maybe we would do better to go to raw-data cameras and do the processing at the moment we display it, depending on the display — your phone would do it, your projector would do it — rather than have the camera pre-process it already? Yes, yeah, you're right. Actually, cameras, all of them, do these operations that I mentioned. Professionally, you prefer not to let the camera do anything to your image: just take the raw data and process it yourself offline. So you do your demosaicking, you do your denoising, and so on. Sometimes you cannot choose — like, if you shoot with one of those small cameras that I showed before, you cannot choose; it comes out already demosaicked, and so on. But, you know, state-of-the-art digital cinema cameras allow you to record in raw and you do all your processing afterwards. Yeah, that's right, but — okay, so there are two sides to that. What you do afterwards is done manually, as I said, because if you want your results to be very good, you need a human to do it. So we are not there yet; we're working on it, but for now that is still done manually. And also, an important part of that is that we are not there even on the vision science front. People working on vision science — I'm not claiming I'm working on vision science — people working on vision science do not really, at this point, understand well all the elements of perception that you would need to understand in order to solve these problems. So we don't understand well how we perceive colors, how the surround affects the way we perceive the screen. Also, the issue of dynamic range is not clearly understood either. I don't know if that answers your question.

Sergio had a question, but it will be later, during the lunch, yes. Oh, thank you all for coming. Thanks, Marcelo, for the wonderful talk with the many pictures and movies. Some other day we'll talk about the movies. And okay, so thank you, and now it is lunch time. Gracias.