Can you hear me? Oh, great. Hi, everyone. Good morning. My name is Angelo. I've been part of the Blender community for about 13 years, and I'm now working for a company called Lexset, which builds data sets for machine learning. So I'm going to give you a talk about what we've built on top of Blender and where we're going.

Last year, Andrew Price gave a talk about funny cat drawings. There was other stuff in that talk, I guess, but cats seemed like the main topic, so I figured I would follow along with that. Andrew told us about AI technologies that were changing the 3D industry. He talked about the rise of procedural tools for modeling, texturing, and scene composition. He talked about powerful utilities like AI denoisers, upscaling, and markerless motion capture. Finally, he talked about the possibility of augmenting creativity with generative AIs. I think those are the most extraordinary and have the most potential impact, and it's hard to imagine what the 3D industry will be like once these tools get fully streamlined, easy to use, and available to everyone.

The reason I'm telling you this is that I fully agree with Andrew's predictions, and I'd like to share the other side of that story. While AI tools are coming to the 3D industry and changing, or at least impacting, how all of you do your work, the AI industry is starting to adopt 3D technologies as a way to accelerate and democratize machine learning tools.

So what are we going to talk about? I'm going to try to give you a quick survey intro to convolutional neural nets, and I'll try my darndest on that one. Then I'm going to talk a little bit about the challenges of current AI data methodology and how we're trying to fix that. And finally, we'll talk a little bit about what the future will look like, or might look like.

A convolutional deep neural network is a computer program that takes an image as input and gives a prediction as output. Simple as that. It's trained by showing it thousands of labeled images, and it attempts to generalize patterns from those images. The prediction could be a category, or it could be a pixel-by-pixel prediction about that image.

So what does it look like on the inside? You have a number of nodes, all connected to each other in layers. At the front we have pixel values coming in, we have hidden layers that do magic, and on the back end we get a prediction, usually a zero-to-one value. Inside each of those nodes it looks a little bit like this: a bunch of multiplications, sums, activation functions, and math. So you might be thinking, why am I showing you all this math instead of more pictures of cats? And that is fair. The goal of a neural network is to learn a pattern from some initial labeled examples, so that later, after training, it can look at an image it has never seen before, from the real world, and predict that same set of things it was trained on.

So you might be thinking, what is a convolution? No? Oh well. A convolution is really just a pixel-based filter. It's very hard to see at this size, but this is basically an edge-detect filter in Photoshop. You take a 3 by 3 grid of pixels, you multiply each of the pixel values by a given number (that specific set of numbers is called a kernel), and you sum the results to get the output value; in this case, 6. And when you use these particular numbers, negative 1 everywhere with an 8 in the middle, you get an edge detect.
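To make that concrete, here is a minimal sketch of that multiply-and-sum operation in NumPy, using the edge-detect kernel from the slide. This is only an illustration of the idea, not code from the talk:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over a grayscale image (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]   # the 3 by 3 grid of pixels
            out[y, x] = np.sum(patch * kernel)  # multiply each value, then sum
    return out

# The kernel from the slide: negative 1 everywhere, 8 in the middle.
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

image = np.random.rand(64, 64)          # stand-in for a grayscale photo
edges = convolve2d(image, edge_detect)  # large values where pixels change sharply
```

Flat regions multiply out to roughly zero, while sharp transitions produce large values, which is exactly why this kernel acts as an edge detector.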
This is basically what's happening in the first layers of a neural network. The early layers pick up simple edges, and as we go deeper into the network, the filters end up looking more and more like real items that you might see. But all of the later patterns are built up from earlier, simpler patterns.

So how does it work? Let's say we want to create a cat classifier: something you can point at an image, and it'll tell you whether it's a cat or not. You need a bunch of images of cats and a bunch of images of not-cats. If you've watched Silicon Valley, you know this is very similar to the hot dog, not hot dog neural network. You feed the images into the network one at a time, and the pixel values go in. At the beginning, all of the internal parameters of the system are essentially random, so it will make a prediction, but that prediction will be terrible. It's random. But the thing is, we know what the right answer is when we feed these images in; we know which ones are cats and which ones are not cats. So when we get to the end, the neural network calculates how far off it is from the correct answer, and then it steps backwards through the layers, updating the parameters in each one of those nodes to try to make its guess better. And it does that across an entire large data set of tens of thousands to millions of images. At the end of that whole process, of passing an image through, finding out how bad the answer was, updating, and going all the way through again and again and again, we have information in those hidden layers that really represents what a cat is. It has a visual intuition of what a cat is.

Okay, so you're all experts now on machine learning, so we can move on. The thing that's so challenging about the way this has historically been done is that you have to produce lots of images, and historically that means photo data sets. So let's look at what happens when you use a photo data set. You hire photographers and image annotators. You get access to many, many locations to take photos at. You bring lots of objects and randomly place them. You have to take 10,000-plus photos. You constantly change camera settings. You take pictures of rare things, because the network has to see those things. You draw Bezier paths around every single object in all of those thousands or millions of images. You train your neural net, you find out you're missing something, and then you cry big wet tears, because it's very hard to iterate on your data set when it's that costly. Really, you have to be a big company to be able to run this kind of data process. And so what you find in the AI industry is that people are mostly iterating on the AI architectures themselves, and not really iterating as much on the data sets, because it's just been prohibitive to do that.
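Here is what that pass-through, measure, and update loop might look like in code: a minimal sketch of a tiny cat/not-cat classifier, assuming PyTorch. The network layout, and the `loader` that yields batches of labeled images, are illustrative stand-ins, not the speaker's actual system:

```python
import torch
import torch.nn as nn

# A tiny convolutional network: early layers pick up simple edges,
# deeper layers build richer patterns out of them.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                 # one score; a sigmoid maps it to 0..1 "cat"
)
loss_fn = nn.BCEWithLogitsLoss()      # how far off the right answer are we?
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    for images, labels in loader:     # assumed: yields (image batch, cat/not-cat labels)
        scores = model(images).squeeze(1)        # forward pass: random at first
        loss = loss_fn(scores, labels.float())   # compare guess to known answer
        optimizer.zero_grad()
        loss.backward()               # step backwards through the layers
        optimizer.step()              # update parameters to guess better next time
```

Every part of this loop is cheap except one thing: building the tens of thousands of correctly labeled images that `loader` has to serve, which is exactly the bottleneck described above.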
So how does Blender help fix that? Well, we set up a system that's fully parametric from end to end. That includes models, layouts, textures, and shaders. You pick all those things, you press a button, you wait for a bit, and you get data back. You train your neural net, you find out you were missing something, you go back into our GUI, you change one setting, press the button, wait a little bit, download those images, and buy a snack with all the money you saved by not hiring annotators.

So I'm going to show you an example of a scene that we generate. We generate the architecture, and Archipack is very helpful for that. We lay out furniture, and Chocofur is very helpful for that. We litter stuff across the floor using particle settings, and eventually we're working on systems that will learn from photos about how humans lay out rooms, so we'll be able to apply machine learning to the actual layout step as well. And this is a sample of some photos from a data set. You can see they have very different lighting settings and different shaders and colorings, and this is really important: when you're training an AI system, you need a high amount of variability and variety so it can learn edge cases.

The other thing we get for free when we use Blender is render passes. Can you imagine trying to take a photo and manually add a depth pass to it? Basically impossible, right? Nobody's going to do that, but you get it for free. There are AI models that can look at photos and predict their depth, and this is a way you can train them very effectively. We also get masking for free. Again, if you do this the old way, you typically go onto Mechanical Turk and hire somebody across the planet to draw Bezier curves 24 hours a day for weeks and weeks and weeks. That's how you solve that problem, and that doesn't seem like a very good solve to me.

And so that's where we are now. These images don't look great blown up this big, but this is where we're going in terms of rendering quality. The trick here is that we have to balance render quality against render time, because we're not just making one image, we're making 100,000 images. We do all of our rendering in the cloud, so it's massively scaled. And I just want to give a big shout-out and thank you to Chocofur and Poliigon for existing, because we could not exist if it weren't for community projects in the Blender community. There are so many amazing add-ons and resources; I can't even name all of the ones that we use. We're a small team, three engineers, so there's no way we could build all of the tooling we would need to get this done. We rely hugely on open source projects, with Blender being the thing we're built most on top of.

Earlier I showed you that image of the cat with the pixel-by-pixel prediction, and I wanted to show you a video of a system here, but it didn't play. If it were playing, you'd see it doing a real-time mask of the floor, trained from one of our data sets. And so we've proved that we can get close enough to a real photo, for the machine's sake, to train these kinds of networks.
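As a rough illustration of the "change one setting, press a button" idea, here's what a tiny version of that kind of pipeline could look like in Blender's Python API. The object and light names are assumptions about the scene, and this is a sketch of the general technique, not Lexset's actual pipeline code:

```python
import bpy
import random

scene = bpy.context.scene
view_layer = scene.view_layers[0]
view_layer.use_pass_z = True              # depth pass: free with every render
view_layer.use_pass_object_index = True   # per-object index masks, no Bezier paths

# Tag an object so it shows up in the index mask (the name is an assumption).
bpy.data.objects["Sofa"].pass_index = 1

light = bpy.data.lights["Sun"]            # assumed light in the scene
camera = scene.camera

for i in range(100):                      # a production data set would be far larger
    light.energy = random.uniform(1.0, 10.0)       # randomize lighting
    camera.location = (random.uniform(-3.0, 3.0),  # randomize viewpoint
                       random.uniform(-3.0, 3.0),
                       random.uniform(1.0, 2.0))
    scene.render.filepath = f"//renders/sample_{i:05d}.png"
    bpy.ops.render.render(write_still=True)

# Note: this only enables the extra passes; actually writing the depth and
# index masks to disk typically goes through the compositor or a multilayer
# EXR output, omitted here to keep the sketch short.
```

Regenerating the whole data set after a change is then just a matter of editing a parameter and re-running the script, which is the iteration loop the photo workflow can't offer.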
Okay, so what will the future look like? This is some work done by DeepMind, which is part of Google. What's going on here is that the neural network has been given that single observation image, and it's extrapolating: it's guessing the shape of the entire scene from that one viewpoint, and letting you move around in that scene. It has no scene graph; it has only seen photos, and yet it knows what the entire scene looks like. This looks rough today, and it is, but it will be working on ray-traced-quality images in the not-too-distant future. And I think that's a potentially really big change to how a lot of us work, because this essentially runs in real time. So this is where the AI industry is today: there's lots of janky stuff. This is some more work by DeepMind. But this is where it's going. I don't know if you've seen this; it's wild.

So that's my talk, and thank you very much. You can reach me at my email here, and if you're interested in talking more about Blender or AI or the Robot Apocalypse, just find me after the talk. Thanks.