So, hi everyone. I'm Peter. We're going to talk about 3D reconstruction from 2D input, from images. Just a bit about me first: I have a BSc in statistics and operations research, and I also have a master's in economics. I actually also studied for a master's in computer science and machine learning, which I stopped because I didn't want the diploma; it wasn't interesting for me. I've been working with machine learning, deep learning and neural networks for 14 years, a bit before it became a buzzword. My previous startup built technology for automatic lip reading, which is understanding what a person says from visual input and movement only, without audio. I was also part of the founding team and the chief architect of a cyber security company that does cyber security for hospitals and medical devices. Now I run a machine learning consultancy and outsourcing agency called Abellions, and I'm also a blogger: I have a blog in the field of visualization and AI called 2D3D.ai. We actually have a Reddit group for the blog now; it's getting a bit big. So that's a bit about me. Now, the agenda for today: we're going to follow one specific piece of research in the deep learning area, about 3D reconstruction from a single image. While we follow this research, I'm going to explain how 3D reconstruction is possible at all, and also a bit about what machine learning is, what neural networks are, and a few different types of neural networks that are important to understand. This lecture doesn't require prerequisites in the field, but because I saw the type of audience that comes to LGM, I saw there are quite a few technical people, so it will be technical; hopefully it will challenge you a bit and be interesting on the technical side. At the same time, you don't need to know deep learning to understand what I'm going to explain; I'll explain the concepts as we go. So we'll have both an intro to deep learning and an intro to 3D reconstruction using deep learning. Now, when we cover the topic, you see "from 3D to 3D using neural networks"... you don't see my slide now; this should be the agenda slide. You see the agenda slide? Okay. So when we cover the topic, we're also going to look at two types of neural networks. One is called a convolutional neural net, and from it derives a network that is important in research and industry right now, called ResNet. I'm also going to touch a bit on digital art using Google Deep Dream. Moving forward, we'll also talk about how fake images and fake 3D models are generated in the industry today, and we'll explain the concept of GANs, generative adversarial networks, which allow for creating fakes. If we have more time, I'll try to talk about more examples in the 3D and 2D spaces, and also have a bit of a philosophical talk about the future of machine learning. Okay. So, the first thing to understand: we're going to be following a specific piece of research called the implicit decoder. This is the arXiv link to it here; if you search for "implicit decoder 3D" on Google, it's the first result there. It has a git project, a fully open source project, where you can use the trained network, you can use everything the research used for everything you will see here; it's all available to you. It has the project page, the examples, everything.
On my blog at 2D3D.ai there are also two posts about this specific research, which you can look into a bit more later, with references and explanations, but you'll get most of the explanations during our talk as well. The main idea in the research is this: if we have this one image of a chair, 128 by 128 pixels, the neural net is able to reconstruct the 3D model on the right side. So we take just one small image, and from it we reconstruct the mesh you see on the right-hand side. Now, in order to understand how to do 2D-to-3D, we first have to tackle the problem of 3D-to-3D: if we have a 3D model, we want to reconstruct the same 3D model from it. Sounds very simple. (I saw somebody ask about the arXiv link; I'll look at the link later.) Sounds very simple, 3D to 3D; it should be, but it's not that simple, and we'll explain shortly why. But first I have to expand a bit on different digital representations of 3D models. One representation is voxels: a voxel is the 3D equivalent of a pixel in an image. Voxels are a set of points in space, and for each point we say: does this point have mass or not? If it has mass, it's marked as one and we see it in green; if not, it's black. The specific voxel shape we're looking at here is at a 64 by 64 by 64 resolution. Yes, somebody wrote "volumetric pixel", thank you. From this there is another, similar representation, which is a point cloud. The idea is similar, but instead of looking at all the points of the 3D space, we look at randomly sampled points, or specific points that are of interest to us. Point clouds are important because when we do 3D reconstruction and work with neural networks in this area, covering all 64 by 64 by 64 points in space takes a lot of computation. So we bring it down by using randomly sampled points in space while achieving a very similar result: we still see the 3D structure underlying the model, but without using all the points in space. Now, the most common way to represent a 3D model is usually a mesh. For 3D printing, for the gaming industry, for everything to do with 3D modeling, usually a mesh is used. A mesh is a collection of polygons in space, usually triangles, connected through edges and vertices. Now let me just check, somebody asked about the link; let me quickly check this for us. Yes, this is the link. If you search for "implicit decoder" you'll also find it, including the git and everything; I might also share it here on the IRC at the end. Okay, continuing on with the 3D model representations. An important thing to notice is that we have to have a way to transform from one type of representation to another. Transforming a mesh to voxels is quite straightforward: you take the mesh and you sample 3D points in space, and for each point you mark whether it is within the mass or outside it. There is a tool online called binvox which you can download and use yourself. (Somebody wrote about Minecraft; Minecraft is indeed a good example of voxels.) So you can use binvox to go from mesh to voxel, while voxel to mesh is a bit trickier. There is an algorithm for this, and the YouTube link here points to a very good one-hour lecture about it that explains it very thoroughly.
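To make these two conversions a bit more concrete before we get into how the algorithm works, here is a rough sketch. It assumes the trimesh and scikit-image Python libraries (not the tools used in the research), and "chair.obj" is just a placeholder file name:

```python
import trimesh
from skimage import measure

# Mesh -> voxel: overlay a grid on the mesh and mark which cells fall inside it
# (this is essentially what binvox does). "chair.obj" is a placeholder path.
mesh = trimesh.load("chair.obj")
grid = mesh.voxelized(pitch=mesh.extents.max() / 64)   # roughly a 64^3 grid
occupancy = grid.matrix                                 # boolean 3D array: mass / no mass

# Voxel -> mesh: marching cubes turns the occupancy grid back into triangles.
verts, faces, normals, values = measure.marching_cubes(occupancy.astype(float), level=0.5)
```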
For me it was a bit hard to find easy-to-digest explanations; this one covers everything and also goes in depth, all in one lecture. The algorithm, by the way, is called marching cubes, and the interesting concept to understand about going from voxels to a mesh is this: marching cubes looks each time at a small, closed set of nearby voxel points in 3D space. When we do this, we're actually looking at cubes that have eight points at their corners. The yellow-marked points here on the right are the ones that have mass, and the unmarked ones are the ones that don't. So each cube has a specific configuration of which corners have mass and which don't, and for each of these possible configurations there is a specific set of polygons that corresponds to it. If we look at this configuration here, these are two triangles sitting in the middle of the cube; if we look at this one here, there's just one triangle on the side of the cube. The cool thing is that we can take all the different voxels in the space, with their marking of in-the-mass or not-in-the-mass, and for each combination we put in the corresponding polygons. The algorithm is a bit more complicated than that, but the general idea is that you take all these polygons, put them together, and you get the resulting mesh. Okay. Now, feel free, really, to post questions on the IRC at any time. I don't know the technical background of the audience, so feel free to ask me to clarify if something is hard to understand; otherwise I'll just keep running through the lecture. Okay. So after a bit of 3D modeling understanding, let's talk about machine learning. What is it? There were several original definitions, but one of the better ones that I know is: a field of study that gives computers the ability to learn without being explicitly programmed. Today it's a huge buzzword, along with neural networks and deep learning; people and companies use the words "machine learning" for everything, so it's a bit harder to distinguish. Still, I like to keep it in the realm of: a set of mathematical tools and algorithms that need data and that build a view of the data programmatically. Within this set of tools I would put statistics, regressions, logistic regressions, clustering algorithms, and many other types. The one that is important for our talk today is classification algorithms, and we'll explain what they mean. A classification task is one where we have a data point; in this case, for example, a data point that belongs either to a giraffe or to an elephant, with characteristics, or features: each of our data points has a weight and a height attached to it. In a classification task we want to separate points into different classes based on the data. Of course, we'd expect giraffes to have a greater height and a lower weight than elephants, right? When we do machine learning with a classification algorithm, the first thing that happens is that we train the algorithm, we train the software, to be able to distinguish between giraffes and elephants based on their height and weight. For example, in this case the software might learn that this curve is a separating curve between the two groups.
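As a toy sketch of this train-then-classify flow (the numbers are made up, and scikit-learn here is just one convenient choice, not something from the research):

```python
from sklearn.linear_model import LogisticRegression

# Each data point is [height_cm, weight_kg]; labels: 0 = giraffe, 1 = elephant.
# The measurements are invented purely for illustration.
X_train = [[500, 900], [480, 1100], [510, 950], [300, 5000], [320, 6000], [280, 4500]]
y_train = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)            # "training": learn the separating curve

print(clf.predict([[310, 5200]]))    # a new, unlabeled animal -> [1], i.e. elephant
```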
The next stage is this: now we have a new point, the height and weight of one item, but we don't know what it belongs to, we don't know if it's a giraffe or an elephant, and we want the computer to tell us. The nice thing is that the computer already learned on a training set before, and it has this separating curve, so it can tell us: okay, in this case, what you have here is an elephant. Okay. Now, the next thing, the other subset of the bigger picture here, is: what is a neural network? First of all, in the computer sense, a neural network is a subgroup of machine learning; it's a technique that allows machine learning algorithms to exist, to learn, and to make predictions. But usually when I explain neural networks, I first like to explain a biological brain system, how it works, and from there take it to the computer. Our brain, and biological brains in general, are very versatile and very generic. What do I mean by that? Here we have four examples of different tasks that entities with a biological brain were able to learn. The top left is a person with a camera; this camera is attached to a set of electrodes that put electric signals on his tongue, and he was able to learn to look at the world around him using those signals on his tongue. So the sensing area in his brain that is responsible for the tongue was also able to sense visual input around him. It's a completely different area from the visual area of the brain, but it was able to do the same function as the visual area. So we see this kind of transfer of learning ability between different areas of the brain, even though they are meant for completely different purposes. On the top right we see a person using a kind of sonar ability to understand his location in space; blind people actually often use this kind of ability to understand their orientation better, whether they're facing an obstacle. Again, here the part of the brain that is supposed to be responsible for hearing is able to learn to also be responsible for location in space. Bottom left is also a cool thing: this belt here buzzes towards the north all the time, so as the person moves he always feels where north is, and he's actually able to develop a kind of bird-like intuition for orientation in space. Bottom right is a frog that had an extra eye attached to it; it was also able to learn how to use that eye. I'll give you a personal example from my own experience. When we were working on the lip reading software, it was important for us to talk to people who do lip reading in real life. Many times these are deaf people, who look at people talking in order to understand what they're saying, because they can't hear. The interesting thing there is that visual input is transferring to the understanding of speech. Another interesting thing is that a lip reader doesn't understand every word at the moment it is said, like we do when we hear speech. A lip reader has to see the context of the conversation: he watches several words being said, saves them in a sort of buffer in his memory, and then reconstructs the entire sentence based on the context of the different lip movements. I see a question here: does this generic ability of the brain apply to other animals as well? Yes. Our brain is not so unique.
It's very similar to the way mammal brains operate, and also to less sophisticated groups of animals. We have a few more layers on top, but the way the layers work is actually very generic and very similar across our brain and others. Okay. So this is the way a biological brain is constructed: there are many cells connected to each other. These are neurons, or nerve cells, and the cell structure is the following. Here in the center we have the cell body; that's the important part of the cell. The cell is connected to previous cells using dendrites, which transfer information along these lines; the information is an electric pulse. The cell body gets the electric pulses from all the adjacent cells attached to it and decides whether it should emit a forward pulse onto the axon or not. If it does emit a forward pulse, the pulse travels along the axon to its branches here on the output side of the neuron, and it is connected through synapses to the next neurons in line. Synapses are chemical connections: an electric signal arrives here at the edge of the cell, and a chemical signal is transferred to the next cell. In the human brain there are billions of these neurons connected, and this is basically a network, a network of neurons connected to each other, transferring information along the network. (There's a bit of background noise; I don't know where it's coming from, maybe on my side, let me know. If it's possible, please mute it. Oh, thank you.) Okay, so this is the biological brain construct. Now, the computer "brain" construct is very, very similar. We have a set of inputs, very similar to the dendrites we saw before. Each input is connected to the neuron body through a weight, which is like that electrical line we saw. Then there's the neuron, and the neuron gets the information from the previous inputs and decides whether it's going to emit an output or not. It's important to notice that each connection here has a weight w. The stronger the connection, the higher the w, which means the higher the importance of the input from x1, for example, to the neuron. So if x1 has a very high input value and the weight w1 is very high, the neuron will probably emit an output signal, because it puts a lot of emphasis on that x1 input. Now, if we take this construct and attach it to the next neuron and the next one and the next one, we get a very similar idea of a neural network, but now in a computer sense and not a biological sense. Still, the way the information flows is very similar, and the construction is very similar. Okay, I want to cover a bit of terminology before we continue; it's important both for the talk today and for your general knowledge of this field. People often talk about features. What is a feature? A feature is a single piece of data. In the image space, in RGB, one color value of one pixel is a feature: the red value of the first pixel is a feature, the green value of the first pixel is a feature, and so on. Any single numeric measure is a feature. So, as we saw in the example of the elephants and giraffes, a height in centimeters is one feature; a weight is also a feature (by the way, the units here on the slide shouldn't be meters squared); a price is also a feature, and so on.
Numeric measures are easy to think of as features, but categorical measures are features too. Whether somebody is male or female, that's a feature. A classification of an object as a car, a bus, a truck, a bicycle, or anything else is also a feature. In signal processing, for example in the acoustic space, a measure of the amplitude at one time point is a feature; in sound, one decibel value could be a feature. Specifically in machine learning there is another important kind of feature to recognize, which is an encoding of data. Sometimes we have a calculation running through a big neural network, and if we look at the middle of that calculation, the middle of the network, it has outputs there too, not just at the end. These intermediate outputs are sometimes referred to as encodings, and they're also features. Another thing I already mentioned is activation functions. This is the body of the neuron that we saw, the part that decides whether it's going to emit an output or not. There are many mathematical functions that are used; one of the most common in the neural net space is called ReLU, the rectified linear unit, which is a very simple function: f(z) = max(0, z). If z is negative, the value of the function is zero, no output; if z is positive, the value of the function is z itself, the output is as high as the input. We see it here in graph form as well. Okay. So how does machine learning software, and specifically a neural network, learn? In order to learn, as we already saw, it has to have training samples, and it learns in epochs, in stages: in each epoch the neural network learns over the whole data set, all over again. During this learning process we take the data set and put it into the network. For example, let's look at our example of elephants and giraffes. The network gets many measurements of different elephants and giraffes, their heights and weights, and it gives an output: how much the network thinks this is an elephant or a giraffe. But because we're doing this during training, we can tell the network, for each data point, whether it's an elephant or a giraffe, so we can measure how wrong the network was: what was the distance of the network from the truth? Our goal in each epoch is to try to diminish this distance, this discrepancy. The way we diminish it is by optimizing our weights. We have those w's that we saw before; we can store all those weights in a big construct, a tensor, or a matrix if it's a simple construct, and we alter their values so that the distance becomes smaller. We actually have a mathematical optimization problem here, where the changing variables are the weights. At the end of each epoch we get the loss, the distance between the network's output and the truth, and we measure how much it changed in comparison to the previous epoch. When the change in the loss stops across epochs, meaning the network is not able to learn any more, we say: okay, the network has learned, and we've finished the learning process. Please ask questions here if something is unclear, because these are really the fundamentals of everything we're going to talk about and everything you will see in this space when you look into it yourself.
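A minimal sketch of this kind of training loop, here in PyTorch, with invented data, layer sizes and learning rate:

```python
import torch
import torch.nn as nn

# Made-up, normalized [height, weight] measurements; 0 = giraffe, 1 = elephant.
X = torch.tensor([[1.0, 0.15], [0.95, 0.2], [0.6, 0.9], [0.65, 1.0]])
y = torch.tensor([[0.], [0.], [1.], [1.]])

# One hidden layer of neurons with ReLU activations, one output neuron.
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                          # the "distance from the truth"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):                        # one epoch = one pass over the data
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)                 # how wrong is the network right now?
    loss.backward()                             # gradients of the loss w.r.t. the weights
    optimizer.step()                            # nudge the weights to shrink the loss
```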
Okay, so I mentioned that we're going to talk about two types of very important networks. One is the convolutional neural network. A convolutional neural network is a very specific net; the motivation for it came from the limitations we had in image processing with regular neural networks. The original problem was that if we took a regular neural network, which is usually a fully connected neural network, we'd have a memory and processing time issue. Why is that? Let's look at an example: a simple image of 200 by 200 pixels, RGB, and just one layer with one output from the network. We don't have multiple layers, we don't have multiple neurons, just one neuron, one output, and that's it. Because the image is so large, this amounts to 120,000 weights, just for this one image, one layer and one neuron of output. That's not scalable at all. If you want ten neurons, you grow by another order of magnitude, and imagine having an optimization problem over so many variables, in this case 120,000 variables for a single neuron. So we had an issue here, and convolutional neural nets came to solve it. How? The idea came from nature, from the way we, and animals too, look at our surroundings. If you try to notice how you look at everything around you, you'll realize that most of the things you "see" you don't really see: your focus area is very specific, very limited, very small. What you do is use this focus area, look around you, and then in your mind you construct the entire view of everything around you. Not only is this focus area small, the way you use it as you move it around and figure everything out repeats itself: it's the same neurons in the brain, the same connections, the same process that takes this whole area around you and combines it into one understanding of what's going on. And this is what convolutional neural networks in the computer are trying to achieve. How does it work? It's a bit technical; I'll try to keep the explanation fairly high level. The idea is this: if we take, for example, this image of a vehicle as an RGB input, we have three channels of input. When we do a convolution operation on it, each time we look at a small patch of the image and we move along the image in a repeating manner, and we calculate the output neuron based only on this small area. The calculation always uses the same weights across the entire image: we take this small rectangle, move it to the right, use the same weights to do a new calculation, and so on, until we have a better understanding, which is this box here, of the pixels we saw on the left. Now, this box is lower in dimension: if the input was 200 by 200 pixels, this box could be 100 by 100. But it's not really pixels anymore; it's a different construct. It's still a 2D kind of construct, but the channels are no longer three, they're a bit more, they could be, for example, ten. And in these ten channels we have an encoding and a compression of what was in the original, larger image. The network is now outputting a more semantic understanding of what it saw in the image. And we do the same process again: another convolution, with different weights now.
But again, those weights repeat over the different patches we process on this next layer, and so on and so on, until we're able to reduce the whole output down to 10 units, each one for a different class of vehicle, and the network is able to tell whether what's in the image is a car, a truck, a van, a bicycle, a bus or something else. Now I want to give you an intuition for how this works; we're going to use this example here, I'm going to go to the link, hopefully you can see it. Somebody sent something on the IRC, let me see: "alternating convolution and pooling". Yes, I see we have people who know a bit about this. It does also use pooling; we're not going to cover pooling in depth in this lecture, but I'll show you where the pooling is done soon. Let's look at this network here now. This is a network with one, two, three, four, five, six, seven layers, and it was trained to recognize handwritten digits from zero to nine. I'm going to write the digit four here, and here we see it: the lowest level here is the digit four that I wrote. This is not RGB, it's just one color channel, of course, just black or white. And we already see here the first convolution layer, which is these fours you see at the next step: one, two, three, four, five, six channels of the number four, each one a different channel by itself. Each one, as you see here, is computing a convolutional kind of transformation of the original image. You see, when I move the mouse, those lines, those kind of green, white and black lines, they're the same: these are the weights, and the weights stay the same. But the output changes, because the weights are multiplied by the input, and the input of course changes, so the output changes. And the cool thing we see here is: okay, we said we have six different channels of output, but the layer is still semantically seeing the number four, it still sees the construct of the number four, and we can clearly see it with our human eye. As we move up through the layers, we see the number four still appearing sometimes, sometimes not. An interesting thing I forgot to mention: for each channel, the network is looking at different parts of the number four. For this second channel here on the right, the right-hand parts of what I drew are marked in white, which means they're more interesting than the left-hand parts. For this one here, the bottom-left and top parts are more interesting than the middle; for this one, more of the middle parts are marked than the others. So for each channel, the network is already focusing on different areas of the image. And the higher we go up the layers, the less clear it is to our human eye what's going on, what kind of encoding the network is doing. When we move up here, we no longer have a 2D encoding, just a 1D vector kind of encoding, which in the end turns out to be (how can I move this here? is it possible to easily move it?), in the end, just an encoding of size 10, which actually tells us which digit I drew, and the first guess here is four. This was to give you a bit of intuition about what convolution is. Okay, about pooling, which somebody mentioned, I'll just touch on it briefly: if you look here, we have convolution, a ReLU activation, and then pooling, and this repeats to allow for even faster dimensionality reduction and even faster computation.
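As a small sketch, this is roughly what such a stack of convolution, ReLU and pooling layers could look like in PyTorch; the layer sizes here are invented, not those of the demo network:

```python
import torch.nn as nn

digit_net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 1 grayscale channel in -> 6 channels out, shared weights
    nn.ReLU(),
    nn.MaxPool2d(2),                  # pooling: halve the spatial resolution
    nn.Conv2d(6, 16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                     # from a 2D grid of channels to a 1D vector
    nn.Linear(16 * 4 * 4, 10),        # 10 outputs, one score per digit 0-9 (for 28x28 inputs)
)
```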
Okay, so how does this connect to our 3D reconstruction topic? (I have a quick question for the organizers: there's a chance we'll run over an hour, is that okay, or should I stick to the hour? Okay, perfect, so we'll have a chance to really go through the material.) So, 3D to 3D. The way it's done is: we have a voxel as input, 64 by 64 by 64, we want to run it through a neural network, and we want this neural network to later be able to classify 3D coordinates in space. Let me explain. The neural net gets the voxel as input, but not only the voxel; it also gets one point in 3D space as input, just one point. And for this point, the network learns to classify whether the point is within the mass of the object or not. The cool thing that can then happen is that if we sample enough 3D points in space and run them through the network, and for each point the network tells us "this point is inside the object, this one is outside", then we have a 3D reconstruction in point cloud form. This is the first stage of 3D reconstruction. Okay, but what is the structure of this network? How does it really work? The structure is this (I'll run through it to the end): this network is actually built from two networks, these purple shapes here (I forget the English name of this shape), the encoder and the decoder. What happens is this: the 3D encoder is a convolutional neural net, and we'll look at it quite soon. It takes a 3D model in voxel form and encodes it into a z vector, a vector with one dimension and 128 numbers. The decoder can then take this z vector, plus just one point in 3D space, which of course is made of three numbers because we have three axes, and it can classify this one point: is it within the mass or outside the mass, true or false? So the decoder gets as input a slightly larger vector, of size 131: the first 128 entries are the z vector, and the last three are the 3D coordinate, and it classifies that coordinate. Now I want to talk briefly about this z vector, because we'll use it throughout the lecture, and this kind of encoding method, vector encoding, is actually very prominent in the industry today. The z vector serves several purposes. First, it's a very efficient compression of the original 3D model: the 3D model was originally 64 by 64 by 64 voxel points, while the z vector is only 128 numbers, a couple of orders of magnitude smaller. It also holds within it a representation of the 3D shape; it has the information of the 3D shape, but in a one-dimensional space. It is also used as the input to the 3D decoder for the reconstruction. The 128 numbers within it range from zero to one, a continuous range: one number could be 0.22, another 0.9, another 0.51, and so on. And this is important: it's a result of the 3D reconstruction process. Once we have a neural network that can efficiently and accurately reconstruct a 3D point cloud from a 3D voxel, the z vector is what we actually want to get out of this network; the z vector is the interesting thing we get from it.
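Here is a simplified sketch of this encoder/decoder pair in PyTorch. It follows the halving-resolution, doubling-channels pattern described next, but it is not the research's exact architecture (which, among other things, uses skip connections in the decoder):

```python
import torch
import torch.nn as nn

# 3D encoder: voxel (1 x 64 x 64 x 64) -> z vector of 128 numbers in [0, 1].
encoder3d = nn.Sequential(
    nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.ReLU(),     # 64^3 -> 32^3
    nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),    # 32^3 -> 16^3
    nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.ReLU(),   # 16^3 -> 8^3
    nn.Conv3d(128, 256, 4, stride=2, padding=1), nn.ReLU(),  # 8^3 -> 4^3
    nn.Conv3d(256, 128, 4), nn.Sigmoid(), nn.Flatten(),      # 4^3 -> the z vector
)

# Decoder: z vector (128) + one 3D coordinate (3) = 131 numbers -> "inside the mass?"
decoder = nn.Sequential(
    nn.Linear(131, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

voxel = torch.zeros(1, 1, 64, 64, 64)                # a dummy voxel
z = encoder3d(voxel)                                 # shape (1, 128)
point = torch.tensor([[0.1, -0.3, 0.5]])             # one 3D coordinate
inside = decoder(torch.cat([z, point], dim=1))       # close to 1 -> inside the shape
```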
So now it's important to mention: after the 3D-to-3D reconstruction network has been trained and is working, we use it again. We pass all the different 3D models that we have through it, and for each one we create the corresponding z vector, and we save that z vector per 3D model for later use. Now, this is a somewhat technical part; the encoder part I'm just going to show briefly. The encoder is a convolutional neural net with one, two, three, four, five convolution steps. By the way, there is no pooling done here. The interesting thing to see at each step is that the size of the voxel is halved each time, while the number of channels doubles each time, until this last convolution, which transforms from 256 channels on those small 4 by 4 by 4 voxels into a z vector of size 128. So the encoder just takes a voxel as input and produces a z vector as output. Then we take the z vector to the decoder. Along with it, we have a 3D point in space, just one, and we run it through a new kind of network. These are fully connected layers; it's not important to understand this if you're not really in the field. For those of you who are: these are just fully connected layers with skip connections, which we're not going to talk about today. The idea of the decoder is a classification task, very simple. It has just one output, a number ranging from zero to one: if it's close to zero, it means the 3D coordinate is not in the 3D model, and if it's close to one, the 3D coordinate is in the 3D model. Okay, I want to talk briefly about this: we have this architecture, but how do we really do the training? What's going on during training here? (One second, I'll drink some water.) What happens during training is this: we can't have just one sampled 3D coordinate per model, it's impossible to train well with that; we have to have many 3D coordinates per model, and actually we use about 16³ coordinates per 3D model. So for each model, before we start the neural network training, we sample 16³ coordinates around it, and we classify each one: is it in the mass or not? And we have this as our training data set. And, we have a question? I see a couple here, maybe. "So the encoding is not completely arbitrary, because of the convolutional network?" The encoding is actually never arbitrary; it's a very important process in all the neural nets you'll see, it's very specific, and most of the research is about doing the right encoding, especially when we do these kinds of diminishing-and-then-expanding calculations. Does that answer your question? I think your name is Johan, I can see. Okay, good. So when we do the training, we take each model, we sample 16³ coordinates, and during training we run the network 16³ times per model. We'll talk about it later, but as a spoiler: we have 5,000 training models here, so these are a lot of coordinates in 3D space to train on. Actually, I lied a bit, because we don't want the number of coordinates to grow proportionally to the dimensions of the 3D models. So if we look at a voxel of size 64³, we don't want the number of sampled points in 3D space to also be in the range of 64³.
Because then, computationally, we wouldn't be able to train efficiently. Our goal is to keep the computational complexity, or more concretely the number of sampled points per model, in the range of O(n²) rather than O(n³). So the larger the model, the larger the voxel, the more 3D sample points there are, but one order of magnitude fewer than the full grid. How do we do this? We actually look at increasing voxel sizes and train first on the smaller size, 16³, up to the larger ones, 128³, but for those we do point sampling, which is a bit tricky: it keeps us around O(n²) 3D coordinates in space. We're going to look at the algorithm that does exactly this. So we want a sophisticated way of sampling 3D points in space. To explain it, I'll maybe turn on the video so you can see me explain it first, and then we'll go back to the slides. (Do you see me now? I don't think you see me. Yeah, okay.) So, look at this pen here. We have the mass and we have the empty space around it. If I sample many random points in this empty area, it's not interesting; there's nothing to learn there. (I think my camera is not working well, let me try to fix it for a second.) So if I sample 3D coordinates in this blank area, there's nothing interesting to learn there; the network won't be able to learn anything about the structure of this pen. The same thing happens if we sample points within the core of the pen: they'll all have mass, of course, but the network won't be able to recognize shape or form. What we want to sample are the points on the edge, the points around the pen just close to the edge, both inside and outside of the pen, because this is where change occurs, and this is where the network can learn something. And this is what the code I want to show you does. (One second, I'm going to show the slides again. Okay, you should be seeing the slides; just let me know if you don't.) So we'll define an edge point in 3D space. Here you see a 2D representation, but just imagine it also has volume to it; I just didn't draw it because it's harder to see. An edge point is a point such that, if you look at the cube of three points to the left, three to the right, three up and three down around it, then at least one point is within the mass (these are the black ones here) and at least one point is not within the mass. So it's close to the edge; it's an edge point. A non-edge point is one where we look at the same set of points around it, but now all the points are marked the same: either all outside the mass or all inside the mass. Therefore they're a bit less interesting. What we do is an iteration process over the voxel. In the first pass, we check each point in the 3D space and we keep only those points that are edge points, hopefully ending up with 16³ points for a 16³ voxel, 2·16³ for a 32³ voxel, 4·16³ for larger ones, and so on. So hopefully we stay within the budget we set, on the order of O(n²), using this measure. If the number of edge points goes over the amount we allowed for sampled 3D points, this is a failed pass: we have too many edge points and we can't use them all. And of course, if we succeed, if we kept all the edge points and we're still within the budget, then we randomly sample more points until we reach that budget. The interesting part is what happens when we overshoot the budget. In that case, we iterate over the voxel again, but now not over all the points, only the even-indexed points along the voxel, so we've cut the number of points we check by a factor of two in each dimension. For each point we check whether it's an edge point or not, and if it is, we keep it. If we haven't reached the budget, great: we finish off by randomly sampling points until we do reach it. And if we've passed the budget again, we still have too many edge points, so we fall back to the original idea: just sample random points in space until we reach the budget we set for ourselves. This was a neat way the researchers handled computation time; I haven't often seen it done, maybe it exists elsewhere, but I just haven't seen it, and it's something they came up with that I liked. And by the way, this point-sampling code is in the repository itself; the research is open source, so you can use it for whatever you want, it doesn't have to be with this specific implementation.
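A rough sketch of this edge-point test and sampling pass (my own simplified version, not the code from the git):

```python
import numpy as np

def is_edge_point(voxel, x, y, z, r=3):
    """Edge point: within +/- r voxels around (x, y, z), at least one neighbor
    is inside the mass and at least one is outside."""
    cube = voxel[max(x - r, 0):x + r + 1,
                 max(y - r, 0):y + r + 1,
                 max(z - r, 0):z + r + 1]
    return cube.any() and not cube.all()

def sample_coordinates(voxel, budget, stride=1):
    """Keep the edge points (visiting only every `stride`-th index). Under budget:
    top up with random points. Over budget: the caller retries with a larger
    stride, or finally falls back to purely random sampling."""
    n = voxel.shape[0]
    idx = range(0, n, stride)
    points = [(x, y, z) for x in idx for y in idx for z in idx
              if is_edge_point(voxel, x, y, z)]
    if len(points) > budget:
        return None                                   # failed pass: too many edge points
    while len(points) < budget:
        points.append(tuple(np.random.randint(0, n, size=3)))
    return points
```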
Okay, now we can talk about the models. During the training phase, and also the testing and validation phases, we use 5,000 training samples, 5,000 models (their git URL I'll send soon). We use 5,000 training samples per category: for chairs we have 5,000 chairs, for tables 5,000 tables, for airplanes 5,000 airplanes, just for training. And remember, for each of these we have sampled 3D points in space, growing from 16³ up to, I think, four or eight times 16³, I don't remember exactly now. Each model comes in a few forms. Each model is originally a mesh; by the way, this data is taken from ShapeNet, for those of you a bit familiar with the field. ShapeNet is the big academic data set of 3D models; it's annotated, so it has meshes and everything. The center of the mesh is at the origin, Y is the height (I'll show you here), X is the width and Z is the depth. Each model also comes with a 64 by 64 by 64 voxel and point cloud samples at 16³, 32³ and 64³ resolutions. I'm going to quickly look for the git URL, it's easy for me to find, one second, so you have it before I forget. So this is the git, if anybody is interested. While we're at it, I will also send you a link to what I wrote in my blog about it; maybe it has some more information that could help you. This is what I wrote. Also, through the git you will get to the project page; it has more examples, more details, and also the paper itself, if anybody is interested. Okay, so I'm continuing. During the training phase, we actually have three training steps for the 3D-to-3D reconstruction.
The first step uses 16³ sampled coordinates, according to the point-sampling code we saw before. The next step takes the trained network from the first step and refines it for a higher resolution, using 2·16³ coordinates. And the last step is for the highest resolution, which in this case was a 64³ voxel, so it uses 32³ 3D coordinates. Just so you understand the training times here: just for chairs, 5,000 chairs with these points, it takes a full day of training on the cloud with the strongest GPU you can get right now, which is a V100. (I think it's still the strongest one you can get; at least it was the last time I checked, a couple of months ago.) This machine costs $5 an hour to run, and one training run is one day. Of course, when you do research, or even when you implement this as a product, you sometimes need many training runs over many more data points. This is expensive runtime. So this is why that coordinate sampling was important, and why this diminishing and then expanding again is important for keeping runtime low. Okay, now we've covered 3D to 3D; now I want to talk about 2D to 3D. To talk about this, we'll talk about ResNet, which is a very important network, and we'll also touch on what will be a slightly off-topic discussion about digital art and Google Deep Dream. But first, let's see what 2D to 3D means in the general context. The idea is: if we now get an image as input instead of a 3D model, we want to pass it through an encoder, a very similar kind of encoder, but now a 2D encoder instead of a 3D one. This encoder will create the same kind of z vector. The idea is that the z vector, the information stored inside it, will be the same as we saw before for the 3D case, but now from 2D input. Once we have the z vector, everything else is exactly the same: we still have 3D coordinates sampled in space, we take each coordinate combined with the z vector, we run it through the decoder, and the decoder classifies whether the coordinate is within the mass or not. What happens now is that if we sample all the 3D coordinates in space, per image and per z vector, we can run the z vector with all the 3D coordinates and reconstruct a full voxel, not a point cloud now, a full voxel. And from this voxel we have the marching cubes algorithm, which can create a mesh, and we get this mesh here on the right. So basically the original concepts of the 3D-to-3D case remain exactly the same; the only thing that changes at this stage is that we have a new encoder that we need to build and train. Everything else is fixed; it doesn't change now. In order to train this encoder, we will have samples of images and their corresponding z vectors, which we already have from the previous run we discussed. Now, in order to understand the encoder, I need to explain a bit about ResNet. ResNet is an image classification neural network, created in 2015, not so long ago. By the way, in the machine learning and deep learning space, advancements are very fast: ResNet is already considered an old network, just five years since the research came out (not even as a product, just the research), and people are trying, and maybe finding, new ways to do even better than ResNet. But still, today it's a very common architecture used in a lot of research.
We won't talk about the motivation, but the interesting thing about ResNet is that it makes it possible to train networks with hundreds of layers. When we talked about convolutions, I mentioned that just one fully connected layer already had 120,000 weights. A ResNet can have many, many more layers and still be computationally efficient, and still be able to learn and understand semantics from an image. The cool thing here is that even five years ago, this network, ResNet, was trained on an academic data set called ImageNet. This data set has many different images, each labeled with what's in it. The human error rate is about 5%, which means roughly 5% of the time people classify an image wrongly. ResNet reached about 3.5% error, better than a human, already five years ago. This is a description of the ResNet architecture: each jump here, those small arrows, is one layer, and you can see it's many layers. This is the meaning of deep learning. Today you sometimes hear "deep learning", sometimes "neural networks"; they're basically synonyms. Deep learning is just using a deep neural network, that's it; there's nothing really different between a neural network and deep learning. Okay. Before we continue, I want to talk about something cool connected to this: Google Deep Dream. In 2015, along with ResNet, there was another network called Inception, by the Google research team. It's kind of similar in construction to ResNet, a bit different. But along with this network, the team tried to answer another question, not just image classification but something more. They wanted to understand (I don't have the IRC in front of me) whether it's possible to, quote unquote, debug a neural network, meaning to try to understand what's going on internally inside the network. So one question was: okay, a network recognizes images of bananas, but when the network looks at a banana, what does it see? The way to achieve that is to take a trained neural network that can classify images, and now do a very similar training process, but over something different. You take an input which is a white-noise image, a completely random image, you feed it to the network, and you look at what the network outputs. You then run a training iteration, but the iteration is not over the weights of the network; it's over the pixels of the input image: you try to optimize the input image so that it looks more like a banana to the network. You run this for several iterations, and you reach this image we see here on the right, which seems to have some bananas inside it. This is what a banana "looks like" to the network. This grew into a whole project called Google Deep Dream. Maybe some of you are familiar with it; I would guess some are, because this is an audience of coders and people with a digital imagery background. But I'll explain it anyway, because I think it's cool. You can do this not only with bananas, of course; you can do it with ants, and then you can see what the network thinks looks like ants, or starfish, or screws, or parachutes; different types of images can be created using a neural net. Now, so far with Deep Dream we have only looked at the very end, at the output layer of the neural network, at what the network is looking for in its final output.
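A very stripped-down sketch of that pixel-optimization loop in PyTorch; real Deep Dream adds normalization, jitter and multi-scale steps, the `weights` argument depends on the torchvision version, and 954 is just ImageNet's "banana" class index:

```python
import torch
from torchvision import models

net = models.inception_v3(weights="IMAGENET1K_V1").eval()   # a trained image classifier
image = torch.rand(1, 3, 299, 299, requires_grad=True)      # start from random noise

optimizer = torch.optim.Adam([image], lr=0.05)               # optimize the pixels, not the weights
for step in range(200):
    optimizer.zero_grad()
    banana_score = net(image)[0, 954]        # how "banana-like" does the net find the image?
    (-banana_score).backward()               # gradient ascent on the input pixels
    optimizer.step()
```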
But you can say: no, I don't want to look only at the output, I want to look at the first layers. We saw this is a deep network, it has many layers, and I want to "debug", quote unquote, the first layers of the network, to figure out what it's looking for there. (By the way, something I forgot to mention about ResNet: the different layers there are all convolutions; everything there is a convolutional neural network. That's the standard today in image processing.) When we're interested in the first layers of that network, we take an input image, a clean one this time, and we alter it to figure out what the network is looking for in its first layers, running a process similar to the one I described before. What we then notice is that the network is looking for the outlines of the objects and animals here, and for the changes of colors. That's where the emphasis of the network is; that's what it's searching for. If we do a similar process, but for later parts of the network, for later layers, then the network is trying to find different constructs of objects in this original input image. Now it's not looking for the outlines of the original objects; it's trying to figure out where there are objects in the image. So it produces circles here, different kinds of shapes, things that look sort of like a graph, all sorts of shapes. Now, if we take this process to even higher layers of the network, the more semantic layers, which no longer see any kind of human-recognizable image but hold an encoding of semantic representations and semantic information about the image, we can reach this example: taking an image of a cloud and running it through the network. The network then tries to figure out whether it sees objects in the clouds, very similar to what we do when we look up at the clouds and try to recognize a face or whatever; it's very much the same. Maybe it's hard for you to recognize here all those pagodas and sheep and snails, but if you look closer, you see the network finding things that look like fish and snails, very much like a child imagining something it sees in a cloud, and it's recognizing it semantically. Another thing this gives us is the ability to understand how the network was originally trained, what was in the training data set. What do I mean? If we do the same thing but with different types of images, for example images of the horizon, the network will often find towers and pagodas and so on in them, which is very much expected, because in an image of a horizon you would often expect to see a building or something like it. For trees, the network looks for buildings in the image, which hints that in the original data set, images with trees often also had buildings in them. For leaves, it's birds and insects, which is intuitive: if leaves are in an image, there could often be a bird as well. Now, these images already look kind of cool and psychedelic, but you can go even more psychedelic: you can take real paintings and real imagery by artists and run them through the network, and see what the network sees in the deeper layers inside it.
And then you get this transformation of, for example, an Edvard Munch painting; it's kind of psychedelic, with these eyes along this area, kind of freaky stuff. Also, if we look at this beautiful image of a pasture, we can see the network expecting to see tractors and different kinds of sheep and psychedelic things there. And in this last example, all of these images are synthetically generated. They actually start from nothing real: they start from a white-noise image that was altered many, many times until it reached this image on the top left. Then you take the image on the top left, crop one part of it, zoom into that part, run the same image-changing process again, and you get the next image on the right. Again you take a crop from there, zoom in, run the network, and you get this image here, and so on; you can get an infinite number of new images created from noise. Now, as I said, this also lets us debug neural nets and training samples. For example, if we look at how the Inception network sees dumbbells, they always come with a hand attached, which tells us that in the original data set, dumbbells usually came with hands, not separated. Now, going back to 2D to 3D, we have exactly the amount of time we need. As we said, in the 2D-to-3D case the decoder remains the same and the z vectors remain the same; the only thing that changes is the encoder. The encoder is actually a ResNet: the original ResNet as it was in 2015, with a really small change, the last two layers are changed, that's it. The original ResNet was trained to classify images into a thousand different classes; this very similar ResNet construct is trained to produce a z vector of 128 numbers. And how is it trained? (We don't need to look at this slide, it's not so interesting.) For each z vector, which we now have because of the 3D-to-3D stage, and which basically corresponds to a 3D model, we have 20 different images for training, and we train the 2D encoder to take an image and predict what the resulting z vector should be. After this 2D encoder is trained, we can take a new image the network has never seen before, run it through the trained 2D encoder, and it creates the z vector for us. We can then sample all the 3D coordinates in space, say 64 by 64 by 64 (actually in the research, and also when I implemented this network, I think we used a 256³ resolution), and we create a voxel; and once we have a voxel, we run marching cubes, and marching cubes gives us this mesh result. So, one last thing I forgot to mention: the only thing that is trained for 2D to 3D is the 2D encoder, and training this part takes 2 to 3 hours on the same V100 GPU. As we finish the part about 2D-to-3D reconstruction, if there are specific questions, this is the time to ask; otherwise I'll continue and talk about GANs and fakes. Okay, somebody asked: do you reuse the learning done with ResNet, or just the structure? We just use the structure, we don't reuse the learning. But an interesting thing to do, which I didn't do and which I didn't see them do in the research, is to take the trained ResNet, those pre-trained weights, and use them to better train the 2D encoder. I haven't seen it done; it would be interesting to see, maybe something would come out of it.
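As a rough sketch of what "ResNet with the last layers swapped" could look like with torchvision (the research defines its own encoder; resnet18 here is just for illustration, and loading ImageNet weights into it would be the "reuse the learning" variant we just discussed):

```python
import torch.nn as nn
from torchvision import models

encoder2d = models.resnet18()                                # the standard ResNet structure
encoder2d.fc = nn.Linear(encoder2d.fc.in_features, 128)      # last layer: 1000 classes -> 128-number z vector

# Training then minimizes the distance between encoder2d(image) and the z vector
# that the 3D-to-3D stage already produced for the corresponding model.
# At inference: image -> z vector -> decoder over a dense grid of 3D coordinates
# -> voxel -> marching cubes -> mesh.
```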
"It's black magic to me," somebody writes. Hopefully, though, you get the intuition; that's what I'm trying to give you with this lecture. Okay, let's talk about fakes. We won't talk about accuracy measures, because they're less important and we don't have that much time. So here are a couple of examples, not only of chairs but of reconstructing tables and cars. This is an interesting one, because you can see here (I hope you see it well on the screen) this small truck: in the back it doesn't have anything, and the network actually reconstructed it. The second-to-right column is our network, which actually reconstructs this truck very well, and the ground truth, the original 3D model, is in the rightmost column. This can also be done with airplanes: the stealth airplane you can kind of see here, and this fighter jet is here. Somebody's asking: is there any interpretation that could be given to any of the 128 dimensions? And there's a second part of the question I didn't quite catch; could you explain it? In the meantime, I'll try to answer the first part. The interpretation of the 128 dimensions of the z vector is what we talked about before: it's a compression, imagine something like a zip file, just a compressed form of the 3D model. It's also a representation of the information in a 1D space. A question that sometimes pops up is: why 128 and not 10 numbers? Why specifically this number? (Yes, I'm going to come back to your question in a second.) Ideally we would want it even lower, 10 or 5, but if you use 10 or 5 numbers you don't have enough capacity to encode all the required information, so you have to use more. Actually, I used 256, and in the paper and in the git they now also use an encoding of size 256, because it gives even more accuracy and holds even more information. Now, Mark, going back to your question about one shape versus several shapes: the z vectors map to one shape each time, so each shape has its own z vector. I'm not sure I understand the question beyond that; feel free to elaborate. I'm going to go on with the lecture; we'll still be talking about the z vector, so you'll get a chance to maybe understand it better. Okay: generative adversarial networks, or "fake" networks. These are GANs, the latest rage of the past two or three years; they've also been in the media often, in connection with fake videos of Obama saying things and similar. I want to start with an example. All the faces you see in these four images are fake; these people don't exist. There is a website I'm going to show you, called thispersondoesnotexist.com. You go to the website, you press F5 to refresh, and every time it's a new fake person; this person does not exist, just like the website says. Okay, somebody asked about the z vector here: yes, some part of the output can be mapped to high-level features of the input. That's exactly the idea of the z vector: to map the pixels of the input into high-level features in the z vector, which is the output. I hope this explains it. Okay, how do generative adversarial networks work? A GAN is actually built from two networks, the green ones you see here: the generator network and the discriminator network. The generator network is trained to generate fakes, in our case fake images or fake models, and the discriminator network is trained to recognize fakes.
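A minimal sketch of this two-network game, ahead of the fuller walkthrough below; the idea of generating 128-number z vectors follows the talk, while the layer widths and learning rates are invented:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 128))               # noise -> fake z vector
D = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())  # z vector -> "real?"
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(real_z):                       # real_z: a batch of z vectors from the 3D encoder
    batch = real_z.shape[0]
    fake_z = G(torch.randn(batch, 64))

    # Discriminator: push real z vectors toward "1" and fakes toward "0".
    opt_d.zero_grad()
    d_loss = bce(D(real_z), torch.ones(batch, 1)) + bce(D(fake_z.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator call its fakes "real".
    opt_g.zero_grad()
    g_loss = bce(D(fake_z), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```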
And why is this interesting? In the beginning, when we've just built these networks and they're not trained, the generator is not really able to create any good imagery; everything it creates is random noise. And the discriminator is not really able to discriminate a real image from a fake one; all its classifications are random. What happens during training is that we give the generator a random signal and it creates a fake image. We give it many random signals, it creates many fake images. Again, at the beginning of training, the images are not good. Then we give the discriminator those fake images that the generator created, along with real images. And we know which images are fake and which are real, so we give this information to the training of the discriminator as well. Now, each iteration, each epoch, the discriminator becomes a bit better at telling a real image from a fake image, while the generator becomes even better at creating a convincing fake image. They both play a sort of tug-of-war game. For those of you who are familiar with game theory, the idea for GANs was actually taken from game theory, from this tug-of-war setup where each side tries to beat the other with its own mechanisms and methods. So both are training at the same time, both are becoming better, one at creating fakes and the other at recognizing fakes. And because the discriminator is becoming very good at recognizing fakes, the generator needs to catch up and becomes even better at creating fakes, and vice versa. It's a repetitive cycle.

Now, how is this connected to 3D? I'm going to play you a video now; you can also see it on the research page. You will see different models of airplanes in this video. All these models are fake; none of them is based on an existing 3D model. This is a two-minute video; I'm going to skip forward so you see the different types of airplanes here. How does this happen with 3D? Actually, when we do 3D generation, we don't really generate the mesh; that's not what we're generating. What the generator is learning to generate is Z vectors. That's it. That's what it's learning. The discriminator is learning to tell fake Z vectors from real Z vectors. And now, because people here have a bit of mathematical background: if we use the generator to generate two separate Z vectors, where one is connected with this plane on the top and the other is connected with this plane on the bottom, what we can do is take a linear interpolation of those two original Z vectors using an alpha ranging from zero to one. When alpha is zero, we're using only Z2; when it's one, we're using only Z1. This allows us to create new Z vectors, based on the two that we already saw. So this allows us to create this range of airplane models, taking us from this original airplane to this one, by extending the fuselage here and making the wings a bit thinner, again and again, until we get this model.
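Here is a minimal sketch of that tug-of-war training and the alpha interpolation, written as a "latent GAN" over Z vectors rather than over images, as described above. The layer sizes, noise dimension, and learning rates are my own assumptions, not the values from the paper.

```python
import torch
import torch.nn as nn

Z_DIM = 128     # size of the shape code (128 in the talk, 256 in the paper's setting)
NOISE_DIM = 64  # assumed size of the random input fed to the generator

# Generator: random noise -> fake shape code z
generator = nn.Sequential(nn.Linear(NOISE_DIM, 256), nn.ReLU(), nn.Linear(256, Z_DIM))
# Discriminator: shape code z -> probability that it is a real code
discriminator = nn.Sequential(nn.Linear(Z_DIM, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(real_z):
    """One iteration of the tug-of-war: real_z are Z vectors of real training shapes."""
    batch = real_z.size(0)
    fake_z = generator(torch.randn(batch, NOISE_DIM))

    # Discriminator: push real codes toward 1, fake codes toward 0.
    d_opt.zero_grad()
    d_loss = (bce(discriminator(real_z), torch.ones(batch, 1)) +
              bce(discriminator(fake_z.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator: try to make the discriminator call its fakes real.
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake_z), torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()

# Interpolating two generated codes gives the in-between airplanes from the video:
z1 = generator(torch.randn(1, NOISE_DIM))
z2 = generator(torch.randn(1, NOISE_DIM))
for alpha in torch.linspace(0, 1, 10):
    z_mix = alpha * z1 + (1 - alpha) * z2  # decode z_mix with the implicit decoder to get a shape
```

Each interpolated `z_mix` is then fed to the same implicit decoder plus marching cubes pipeline shown earlier to produce an actual mesh.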
We have 15 minutes; I'm thinking of what I'll show you. We might end a bit early, because I don't want to keep focusing on these things. A couple more pieces of research in this field. Facebook released a framework called PyTorch3D a couple of months ago. PyTorch3D is actually based on a paper by Facebook called Mesh R-CNN from last year. I went over that paper; I read it. I've got to say, the 3D reconstruction quality of that paper is not as good as the one we presented here. The interesting thing they do in that paper is that they have real-life images of scenes, and they do 3D reconstruction from those images. In implicit decoder, the paper that we covered, the images are synthetic; they're Blender-generated images based on the mesh. So of course the reconstruction quality will be better in comparison to real-life images.

But there is more research being done. One interesting piece is by Google. What they did there is a couple of things, which we'll cover here. One of the challenges was: let's say we have an input model, a 3D shape, a mesh for example, and we have one image of an object, and we want to copy the texture from that object and put it onto the shape using neural nets. So there are actually a couple of neural networks involved here: one that is able to encode the texture from this image, and another one that is able to take the encoding of the texture, combine it with the 3D shape, and create a new image of this sofa here based on the shape and the texture. Now, it's not only the shape and texture that are important; there's also the camera angle, which is another encoding produced by another network. And then we can also see images taken from different angles of the 3D model, using the textures here and the shapes here. Now, this works nicely for chairs, but the cool thing is that it also works amazingly for cars, which have very different textures, right? We have this model, we take this image, we take all the texture from this image and paste it onto the model. For whoever is interested, here is the link for this research. I don't think I saw open source for this research.

Continuing on, another cool thing this allows us to do, very similar to how I showed you we generate airplanes, is to generate different images of different models with the same texture. So we have different models. Rafael asked: do vector-based GANs exist? Yes, of course; this is usually how you do it. And just because you asked, I'm going to show you a piece of research about this, which is cool to show anyway. So what happens here, quickly, is that the shape changes in kind of the same linear way that I showed you the airplanes change, or you can change the texture or the viewpoint. The same goes for the cars: you can move from a van to a sedan to a sports car, you can change the texture, or you can do the same linear change for both texture and model.
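As a rough structural sketch of the idea of separate encodings, here is a minimal, hypothetical composition in PyTorch. All module names, sizes, and inputs are my own placeholders; the real networks in the Google paper are considerably more involved. It only illustrates the "separate codes, one renderer" design, and why you can interpolate one code while holding the others fixed.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder encoders: 3D shape -> shape code, photo -> texture code, pose -> view code.
shape_enc   = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 64, 128))
texture_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 128))
view_enc    = nn.Linear(6, 32)
# Placeholder renderer: concatenated codes -> flattened output image.
renderer    = nn.Sequential(nn.Linear(128 + 128 + 32, 3 * 128 * 128))

def render(shape_voxels, texture_image, camera_pose):
    s = shape_enc(shape_voxels)      # what to draw
    t = texture_enc(texture_image)   # what it should be "painted" with
    v = view_enc(camera_pose)        # where the camera looks from
    img = renderer(torch.cat([s, t, v], dim=-1))
    return img.view(-1, 3, 128, 128)

# Because the codes are separate, you can hold texture and view fixed and linearly
# interpolate only the shape code, like the van-to-sports-car example:
# s_mix = alpha * s_van + (1 - alpha) * s_sports_car
```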
Now, Timothy, I might need your help with something here. I want to show a video with audio, and I don't know how to share audio. This is the video with audio; I want to share the audio from it. I will try; please let me know if you can hear or not. I'm taking off my earphones, so you should be hearing me, can you hear? It's okay if you don't hear, or hear badly; you have the link where you can watch it yourselves. So I will try to explain a bit of what we're seeing here. What we have here is Justin Trudeau speaking on the right, but with Xi Jinping pasted onto his face, so it looks like Xi Jinping is speaking Justin Trudeau's words. The interesting part is that even though there are occlusions of the face, it's still possible to paste the face of John Bailey here onto Ludwig Gornus. The last example is the interesting one. Oh, somebody pasted the link, perfect, thank you. It's a kind of freaky example. You can see that it even pasted the mustache and everything, a white male's face onto a black woman. So how does it work? They have this video here. The original idea is first to get a sort of encoding of the facial structure, and then on top of it you paste the new face. There's a whole big body of research done on this; actually, I know one of the guys is from Israel, which is where I'm from. Somebody wrote "so the transgender network" - yeah, sort of. I'm going to put my earphones back on and finish the lecture. We have five minutes, which is perfect, we're right on time. Okay, you guys can hear me, right? I'm connected now with the earphones.

A couple of cool things you can find on Google. There's a website called Google Experiments. You go in and there are thousands of different cool, mostly AI-based works made with artists and everything. Really, there are so many here, you can just browse and search. I'm going to show you one example that I like; it's called AutoDraw. Here you doodle something, let's say a star, and it gives you recommendations of what you might have meant. This is supposed to be based on neural nets; I haven't checked their code internally. Again, there are thousands of these experiments online; there's one where you can do dance choreography using stick figures, all sorts of cool things. So this is Google Experiments.

Okay, continuing; we'll finish with this one and the next one. Another thing: it's possible to synthesize not only 3D models but also texture. So this is a type of biological texture, a real image. And in this research, by an Israeli professor and a Chinese team, you can enlarge the texture and it looks like the real thing, not fake. And the same with this texture of wood, you can enlarge it as well. Topological optimization we won't cover, no need. And the last example I want to show you, where we also have encoding-decoding type of work, is style transfer; maybe some of you are familiar with it. We have three networks: one that encodes the style of the image here on the left, this is Pablo Picasso first and then it continues to others; another network that recognizes objects, on the top; and a third network that pastes the style onto the objects. And it looks like this. This is Alice in Wonderland, of course. Yeah, I'll move forward. And this thing you can actually do yourself on your phones; there are a couple of apps that allow it. You can see them here as an example; these are images of me playing with those apps.
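For those who want to try the three-network idea themselves, here is a minimal sketch of the classic optimization-based style transfer (the Gatys et al. formulation), using torchvision's pretrained VGG19 as the feature network. The layer indices, loss weights, step count, and the file names content.jpg and style.jpg are my own choices; the phone apps mentioned above typically use faster feed-forward variants rather than this optimization loop.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def load_image(path, size=256):
    # Resize, convert to tensor, and apply ImageNet normalization expected by VGG.
    tfm = transforms.Compose([
        transforms.Resize((size, size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return tfm(Image.open(path).convert("RGB")).unsqueeze(0)

vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv x_1 layers, used for style statistics
CONTENT_LAYER = 21                 # conv4_2, used for content

def features(x):
    style_feats, content_feat = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat

def gram(feat):
    # Gram matrix summarizes the "style" of a feature map, independent of layout.
    b, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

content, style = load_image("content.jpg"), load_image("style.jpg")
target = content.clone().requires_grad_(True)   # the image we optimize
style_grams = [gram(f) for f in features(style)[0]]
content_feat = features(content)[1]

opt = torch.optim.Adam([target], lr=0.02)
for step in range(300):
    opt.zero_grad()
    t_style, t_content = features(target)
    loss = F.mse_loss(t_content, content_feat)               # keep the objects
    for tf, sg in zip(t_style, style_grams):
        loss = loss + 1e4 * F.mse_loss(gram(tf), sg)          # match the painting's style
    loss.backward()
    opt.step()
```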
And I think we're all done. So yeah, we're finished. This is all my contact information here, and my blog, and we actually have a Reddit group for everything to do with visualization and 3D and AI. Feel free to join; there's a discussion there and I'm available. We have two minutes, so we can take some small questions now. Somebody asked: is there a machine learning sandbox environment that's easy to hack on within a week with beginners? I don't fully understand the question; what do you mean by hack? Can you explain? While I'm not sure I understand the question: you do have Google Colab, which allows you to run Python code with neural networks for free in the cloud. I use it for teaching; it's free. Google Colab, I'll write it here in the chat. It's free, you can use it, and it gives you cloud resources to run on. You've got to know the code, the deep learning, and the architectures to do it, but yeah, it exists; when I do boot camps, I use it.

Any more questions? Maybe I'll also stop sharing so you can see me. So I see nobody's typing. Okay, "no, I think hack was bad usage" - yeah. So that's Google Colab. A lot of the examples online for neural networks are also created for Google Colab, so many times you can use them very straightforwardly. Somebody asked: for style transfer, I notice they usually use an object detection net that was trained with real images; do you imagine that using an object detection net trained on artwork instead would give better results? I don't know. I haven't trained those cartoon types of things myself, so I can't tell you. It's very hard to know unless you try it with your own hands; that's one of the issues here. Also, maybe a small thing: many times in research, people say "we have achieved the latest state of the art in something" or "we are the best at this", and you sometimes have to take their word for it, because checking takes a lot of effort. Which also sometimes leads to the issue that not all research is exactly what it claims to be, which is kind of a shame, because you can't trust what they say unless you try it with your own hands.

Somebody asked: what does it look like with a completely random Z vector? No idea, I didn't try. I would guess it will look similar to something inside the trained category. So if we trained on chairs, it will look something like a chair with holes in it and unreal connections and such, but it will have more of a chair vibe to it than an airplane. And vice versa: if we trained the vectors on airplanes, it will look a bit more like an airplane. But I didn't try; it would be interesting to. Actually, I would love it if you tried and told me. The open source is available; you can just run it and check. By the way, if you have any more questions, my contacts - I'll share the screen again. How do I share the screen? I missed the button. So my contacts are now on the screen. Feel free to message me via Reddit or email or any other way. I actually also give this lecture to the Reddit people for free now, so they get this benefit too, and we have a whole cool discussion there. And this is it, guys. Thank you. I think there are no more questions; let me check. Yeah, thank you all. Enjoy, good luck, and stay safe.