Let's just start, so welcome to the presentation. I'm Sergey. I'm going to talk about something totally different compared to all the previous years. I wanted to make it short and easily understandable for everyone, not going into all the crazy details of the task scheduler and everything; nobody cares about them anyway, it's all hidden underneath Blender.

So, today's presentation: as usual, the same topics, my area of interest didn't change too much this year. We're going to talk about OpenSubdiv, the dependency graph, and Cycles.

Let's get straight into it: OpenSubdiv. Well, actually, before we start: who is here for their first Blender Conference? Oh, okay, then maybe there is something interesting for you. That's good.

So OpenSubdiv, for those who don't know what it is: it's a more optimized subdivision surface engine which runs on the GPU, and it's been in Blender for a year now. For the past year we've been mainly working on fixing quite a lot of bugs, which are hardware dependent and everything. We did some improvements for performance, trying to minimize the latency between modifying the mesh and getting the result back in your viewport. Finally, we have UV map support, which took us a while to work out, because there was no support for it on the OpenSubdiv side, so we needed to work around it in our own way. Luckily, OpenSubdiv 3.1 is released now, so we can get rid of our hacks. There were also small fixes here and there; I can't remember everything.

So how does OpenSubdiv work under the hood?
First of all, there is the CPU side, which basically analyzes your topology: which types of faces you have, what the connectivity between vertices is, and so on. By doing this it also builds so-called patches, subdivision patches: basically regular rectangular pieces on which we can compute subdivision points. This only happens once, on the CPU, when you change topology.

Once you have the analyzed mesh, all this data gets pushed to the GPU side, and it's the GPU that is responsible for tessellating the actual patch and doing the shading on top of it. That's basically how it works.

What does this actually mean? It means that OpenSubdiv will always require some CPU pre-processing, so you cannot avoid some latency when you change the topology. It also means that OpenSubdiv will always require decent support from your GPU. If your GPU does not support tessellation shaders, or does not support some of the stuff we use for shading, you won't see any option to use OpenSubdiv in the user preferences, not even the CPU option. That's why sometimes you ask: "I have a decent CPU, why can't I use CPU tessellation for OpenSubdiv?" Well, because it's not faster than the legacy subdivision surface code, and it cannot be used anyway, because you still need GPU support for it. So just be aware. To check your OpenGL version, and I think the requirement is around 3.0, you can go to Help → Save System Info and it will be written there.
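Going back to the patches for a moment: for regular regions of the mesh, OpenSubdiv's patches are uniform bicubic B-splines, so evaluating a surface point from a 4×4 grid of control points looks roughly like this. A minimal sketch in Python, not OpenSubdiv's actual code (which is C++ and runs the evaluation on the GPU):

```python
def bspline_basis(t):
    """Uniform cubic B-spline basis weights at parameter t in [0, 1]."""
    s = 1.0 - t
    return (s * s * s / 6.0,
            (3 * t**3 - 6 * t**2 + 4) / 6.0,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
            t**3 / 6.0)


def eval_patch(control, u, v):
    """Evaluate a 4x4 grid of (x, y, z) control points at (u, v)."""
    bu, bv = bspline_basis(u), bspline_basis(v)
    point = [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            w = bu[i] * bv[j]
            for k in range(3):
                point[k] += w * control[i][j][k]
    return tuple(point)


# A flat 4x4 control grid: the limit surface of a plane is the plane,
# so the evaluated point stays at z == 0.
grid = [[(float(i), float(j), 0.0) for j in range(4)] for i in range(4)]
print(eval_patch(grid, 0.0, 0.0))
```

The point of the split is visible here: building the patch table is topology work (CPU, once), while `eval_patch` is pure arithmetic per point, which is exactly the kind of work a GPU tessellator is good at.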
Okay, a related note to OpenSubdiv: let's get a bit into how selection works in Blender.

On the CPU side you create a so-called off-screen buffer: "okay, I want to draw OpenGL not to the screen but to some memory buffer." Then you actually draw the viewport into this frame buffer. You don't really know on which side this happens, because it depends on various things; we'll get back to this in the next slide. Then Blender goes back to the CPU side to check which exact pixel was clicked and what object is under that pixel, and it selects that object.

The side on which this drawing happens depends on a setting, and there are three options. The first one is Automatic, which does not work for me; it doesn't pick a proper selection mode for some reason. Someone should probably debug it; Antony is somewhere here, maybe he can explain it to me after the talk.

OpenGL Select is the legacy code which works on every hardware; it doesn't require anything fancy from your GPU. The downside is that it is purely CPU-side, so when you have some decent shader usage, it will all be evaluated on the CPU, which is not necessarily fast. What this means for OpenSubdiv is that all the heavy mesh processing, which usually happens on the fast GPU, is now happening on the CPU, which gives you huge latency when you try to select an OpenSubdiv mesh.

Then there is OpenGL Occlusion Queries, which now works quite stably; I don't remember any issues in this area in the past months, I didn't see any reports on it. This is actually the one you want to use, because it doesn't go back to the CPU side to draw the off-screen OpenSubdiv mesh, so there is no latency anymore.

That's one of the tips; I actually didn't count them, so let's just go through them, don't worry.
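To illustrate the off-screen buffer idea above: the simplest form of "draw into a buffer, read the clicked pixel back" selection is ID-buffer picking, where every object is drawn with a unique ID instead of its color. A toy sketch of the concept, not Blender's actual selection code:

```python
# Hypothetical toy example of ID-buffer picking: "render" objects into an
# off-screen buffer of object IDs, then read back the pixel under the cursor.

WIDTH, HEIGHT = 8, 8


def draw_id_buffer(objects):
    """Draw axis-aligned rectangles into a per-pixel object-ID buffer.
    0 means background; later objects draw over earlier ones."""
    buf = [[0] * WIDTH for _ in range(HEIGHT)]
    for obj_id, (x0, y0, x1, y1) in objects.items():
        for y in range(y0, y1):
            for x in range(x0, x1):
                buf[y][x] = obj_id
    return buf


def pick(buf, x, y):
    """Return the object ID under pixel (x, y), or None for background."""
    obj_id = buf[y][x]
    return obj_id if obj_id != 0 else None


objects = {1: (0, 0, 4, 4), 2: (3, 3, 8, 8)}  # id -> rectangle
buf = draw_id_buffer(objects)
print(pick(buf, 1, 1))  # → 1
print(pick(buf, 3, 3))  # → 2 (object 2 was drawn on top)
print(pick(buf, 7, 0))  # → None
```

The expensive part in real life is the "draw" step: if that drawing falls back to the CPU (as with legacy OpenGL Select on a heavy OpenSubdiv mesh), every click pays the full mesh-evaluation cost, which is exactly the latency discussed above.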
So yes, tip number two: if you use OpenSubdiv, use Occlusion Queries for selection. For some heavy scenes it's also a good choice, because it reduces the latency of selection in heavy scenes.

Okay, so what's going to happen to OpenSubdiv in the next year? Hopefully there's the 2.8 branch, where we will be bumping the OpenGL requirement to a much newer version, where we can have tessellation shaders and geometry shaders working together without any hacks and workarounds. Then we can finally support proper smooth shading by evaluating smooth normals on the GPU side; currently it's some approximation which does not necessarily work. There are also plans to reconsider how OpenSubdiv is integrated into the modifier stack. Currently the problem is when you try to combine some parenting to a mesh with subdividing that mesh, which forces you to the CPU side of evaluation, which is not really great. We're going to change that, make it probably some viewport option or render engine option, or whatever it's going to be in the 2.8 branch.

Okay, that's it for OpenSubdiv. Let's talk about the dependency graph, and here I won't go deep; nobody understands it anyway, so these are just some little things.

What happened to the dependency graph? Well, not much that's visible. We've been just fixing lots of bugs in there. And what do we get for fixing bugs? We get a rig of a thousand bones, and then we try to figure out why sometimes some bone pops up. Yeah, very fun to debug: two weeks in debug mode for one single rig, and then we get ten more reports like this. A very productive way to spend time. But we are getting closer and closer to the point where we can just declare: okay, this is the dependency graph, make it the official one, and just get rid of the old one.
It doesn't work anyway. So the good milestone here is that we switched the Blender Institute to the new dependency graph by default. Now the full studio, all the artists, work with the new dependency graph, which is kind of stressful for me, because it's only me for a team of four artists who do rigs and animation: sometimes it works for them, and sometimes it doesn't work for all of them. So there's a line of four people waiting for me to do a bug fix. But in the last couple of weeks it became okay, no big issues there. So that's good; maybe we can enable it by default in master soon, or maybe make it the default in the 2.8 branch and get rid of the legacy one.

So, relations debugging: how do you debug your relations? That's a good question. Unfortunately, there is no good way; you can try, but it usually doesn't work. In Blender it's the same story: you cannot easily debug dependencies, and the only thing which tells you that you have some issue in your rig is the system console. There is no interface for reporting dependency cycles, which are the biggest issue for your rig and your scene.
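For some intuition: a dependency cycle is just a loop in the graph of "A depends on B" relations, and detecting and reporting one is a depth-first search. A minimal sketch with made-up node names, not Blender's actual depsgraph code:

```python
def find_cycle(deps):
    """deps: {node: [nodes it depends on]}. Return one cycle as a list
    (first element repeated at the end), or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {n: WHITE for n in deps}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:   # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical rig: an object parented back into the IK chain it drives.
rig = {
    "Bone.IK": ["Bone.Target"],
    "Bone.Target": ["Object.Parent"],
    "Object.Parent": ["Bone.IK"],
}
print(" -> ".join(find_cycle(rig)))
# → Bone.IK -> Bone.Target -> Object.Parent -> Bone.IK
```

The chain printed at the end is essentially what the system console report gives you: the exact objects, constraints, or modifiers that form the loop, so you know which link to break.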
So the general rule of thumb: when you have something weird happening, check your system console for a dependency cycle report. Read through the cycle; it will tell you exactly which modifiers, constraints, or objects are in that dependency cycle. If you find that it's a fake dependency cycle, i.e. it was detected but shouldn't really be there, it might be created by a dependency graph bug; bugs happen. Then you just go to the bug tracker, report it, and poke the guy nicknamed Sergey there. But if it's a real dependency cycle, then there is not much we can do from our side, unfortunately; you should probably reconsider your rig, your constraint system, or something like that. And if you report it and it gets fixed, tokens of gratitude are welcome for those commits.

Okay, and for plans; this is still the dependency graph. Step number one: I finally need to get back to working on the proposal for the overrides, local copies, copy-on-write, "stride and swipe" and everything; don't listen to me, "stride and swipe" is terminology that someone else wanted. And step number two is to implement that proposal properly. We'll see how it goes, but I should get there, because we are done with the 2.78 release now, so I should have more time.

Okay, let's talk about Cycles. I will just do some disclaimers later, because there are some controversial topics in there. Basically, in Cycles the biggest change, I think, was micro displacement from Mai, who is somewhere here. When I was working on the presentation I didn't see the program, so I thought there would be some feature-list presentation about Cycles.
I'm not sure if that's happening or not, so I don't want to go into all the improvements there, to be honest; you can just talk to me or someone else about it. From what I was working on: there is more realistic PBR, which is a hip word nowadays; more PBR subsurface scattering, which is actually based on an approximation of physically measured curves, which is kind of fun; and lots of optimizations, mainly in the BVH, which we'll get to in a second. I can't cover everything, so let's just look at the BVH.

What is a BVH, actually? Does anyone know what a BVH is? Okay, a couple of guys. A BVH, a bounding volume hierarchy, is basically an optimization structure which allows you to shoot rays into the scene and find the intersection between the ray and the scene more efficiently. It's a hierarchy of bounding boxes, starting from small ones which enclose a few triangles, then bigger and bigger ones which cover more of your scene. This slide is an attempt to visualize the BVH of the monkey in Cycles; you can kind of see the hierarchy going up and up and up.

So let's talk about the Spatial Split option in the Performance panel: when do you have to use it, and what is a split BVH? Here we have an example of two triangles which are fit into two different types of bounding boxes. The first type is a bounding box which surrounds one triangle at a time, and as you can see, there is a big overlap between these two bounding boxes. If you shoot a ray somewhere into that overlap, you are doomed to find intersections with both bounding boxes and to check both triangles, because from the box test alone you don't know whether the ray actually hits the red triangle or not. These extra intersection checks slow down your render time.
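The check being counted here, "does this ray hit this bounding box", is typically done with the slab method. A minimal Python sketch of the idea (Cycles' real traversal code is heavily optimized C, this just shows the test; box coordinates and the ray are made up for illustration):

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Ray / axis-aligned-bounding-box intersection via the slab method.
    Direction components must be non-zero for this simple version."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        t0, t1 = (lo - o) / d, (hi - o) / d   # entry/exit of this slab
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far and t_far >= 0.0


# Two overlapping boxes around two triangles: a ray through the overlap
# region passes both box tests, so both triangles must then be tested.
box_a = ((0.0, 0.0, 0.0), (2.0, 2.0, 1.0))
box_b = ((1.0, 1.0, 0.0), (3.0, 3.0, 1.0))
ray_origin, ray_dir = (1.5, 1.5, -1.0), (0.0001, 0.0001, 1.0)
print(ray_hits_aabb(ray_origin, ray_dir, *box_a))  # → True
print(ray_hits_aabb(ray_origin, ray_dir, *box_b))  # → True
```

That double hit is the cost of overlapping boxes: the spatial split discussed next trades some memory for tighter, non-overlapping boxes so that fewer of these tests and triangle checks are needed.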
There is another way to fit these triangles into BVH boxes: with a spatial split, one triangle can belong to two different bounding boxes. This is less efficient from a memory point of view, but then wherever your ray hits in space, it only does one bounding box intersection check, which makes the checks much faster. Well, it makes the intersection check faster; the overall render time might not be so significantly faster, but it usually gets up to 10% faster in certain scenes.

So basically you almost always want to use the split BVH. The downside of the split BVH is that it requires more time to preprocess everything. If you're rendering simple scenes, then you probably don't care which structure you use, but if it's a heavy scene, then ten seconds more of preprocessing doesn't kill you, because it gives you minutes of improvement, and when you need to render a thousand frames, that adds up to quite a decent amount of time. So: just use the split BVH for big scenes, as simple as that.

Okay, then there is also a thing called hair BVH. What's the idea here? I tried to visualize hair strands with an armature in Blender, just to make it more visible. The regular nodes in the BVH, before all the changes in this area, were always axis-aligned: you always put an axis-aligned bounding box around a hair strand, even though the strand might be really long and stretched, and that's how it was originally implemented. The motivation to improve the situation here is that a ray might hit somewhere inside the box without hitting the hair itself. So what can we do? Let's say that we can rotate the bounding box and scale it a little bit.
So we do this: we use non-axis-aligned bounding boxes for the hair strands, and now the area where the ray hits the bounding box but does not hit the hair itself is much smaller. That means you do many fewer ray-to-hair-strand intersection checks, which goes much faster: if you compare just the hair intersection part of the ray-to-scene intersection, it goes probably three times as fast, maybe four times in some corner cases. So for really hairy characters it helps a lot.

Unfortunately, you pay a penalty for this: to achieve it, you have to store the orientation of the rotated bounding box. We tried to keep the memory footprint the same between the regular BVH and the rotated one, but sometimes you want to optimize your scene for memory, to make it fit into VRAM or into your physical memory. For that you can disable hair BVH in the Performance panel, and you will gain up to 20% memory improvement for a hairy scene. So when you have a hairy scene which doesn't fit into your graphics card, you can disable hair BVH and gain more headroom to fit your scene in there; being able to render at all is much better than a fancy data structure with which you cannot render anyway.

Okay, so let's jump to something totally different: all the hot topics. CPU versus GPU: do I need to use the CPU or the GPU for a particular scene? For the last months we've been collecting statistics on scenes and on various hardware in the studio. For simple scenes, the CPU seems to be much slower than the GPU, but when you add complexity to the scene, like hair stuff or subsurface stuff, that's where the CPU becomes much faster than the GPU.
I'm not sure if that's going to change any time soon; probably not really soon, because it's just the way path tracing works in Cycles. We can probably optimize something here, but in general it will probably stay like this for quite some time. So what could be the theory here: if you render some simple things, just go GPU; if you work on production-movie type of scenes, then the CPU is the way to go, actually, unfortunately.

This slide shows an 88-thread machine from Intel: it is a dual Xeon, 88 threads in total; this one is 56 threads in total; this one is 10 threads; and this is the comparison. I didn't have up-to-date statistics from the 1080s, because Brecht was fixing some performance regressions there, so we need to rerun this; it's in the same ballpark. And the Victor scenes did not fit into these cards at all.

So, just to state the obvious: the GPU is not a magic bullet. You need to be careful when you buy your GPU thinking "okay, I can render my movie scenes with millions of hair strands and subsurface real fast." Well, not really. But for some architecture visualization stuff it's quite a big speed improvement.

Another thing to mention here is branched path tracing, which is claimed to be much slower according to Andy. We should look into this, because there is some stuff related to it which you'll see later today during the presentations. And a good thing about CPUs: it's easier to predict the render times once you add more complexity to your scene. You think: okay, we added two times more polygons here, so the render time is probably going to be two times slower; fair enough. On the GPU the scaling is much less linear, unfortunately.

So that's it for CPU versus GPU, I think. Okay, next: tile order has a direct impact on performance.
To get the best performance, use Bottom to Top or the Hilbert Spiral order; Hilbert Spiral is now the default one. This is for the folks who want ultimate performance: rendering from the command line can gain up to 10%, maybe 5 to 10-ish percent, depending on variables. And if you're still rendering from the interface, a good tip is to enable the GLSL image draw method in the user preferences. It's a bit more stressful when you're panning the image around, but when you're rendering, it puts much less stress on the CPU while it is rendering.

Okay, tile size. That's quite a topic. There are hot discussions about our benchmarking, saying "hey, you didn't use the right tile size." Well, that's because there is no right tile size for all the GPUs around; it's going to vary. The general rule is: for CPU you want to keep tiles small, for GPU you want to keep them big, as simple as that. Sometimes on the GPU, when you run into weird issues like a timeout in the kernel, you might want to make the tiles smaller to reduce the per-tile render time, so you don't run into the time limit inside the tile. That time limit is something we cannot solve, because it's a driver-side thing, controlled by NVIDIA or AMD or Microsoft, depending on the driver and operating system. So that part is just stating the obvious.

Okay, so let's go a bit deeper. How does it work internally? The GPU works on one single tile at a time, and each tile gets split into blocks of 16 by 16 pixels. Your graphics card has several multiprocessors, and each multiprocessor gets a fixed amount of these 16-by-16 blocks and works on them simultaneously. According to the specifications, for the recent cards, the 900 and 1000 series, it's five blocks per multiprocessor for regular path tracing, and only four blocks per multiprocessor for branched path tracing.
So that's why branched path tracing might be slower. Each of the pixels in a block is handled by a different thread on the multiprocessor, and the scheduling order of the blocks across multiprocessors is not known.

So, simple rules. Keep your tile size a multiple of 16, so you don't run into the situation where you have a residue of, say, two pixels in one dimension and everything else in that block is masked out because there are no pixels there, with those threads just idling on the GPU.

And then the tricky one: you want your tile to contain an integer multiple of these thread blocks, depending on whether you use regular or branched path tracing. For regular path tracing you want a multiple of five blocks per multiprocessor, and for branched path tracing a multiple of four. So there is a simple formula for modern cards: I believe both the 900 and 1000 series have 20 multiprocessors, so you multiply that by five or four, and that's the number of 16-by-16 blocks you want to have in the tile.

Mandatory disclaimer: this is quite tricky to predict, and things which should work in theory do not always work in practice. Sometimes you need to tweak the values according to your card and be happy.

So, I was actually thinking that I wanted to have maybe some answers to your questions, but apparently, as I learned last night, there is no Q&A session. I do have answers from my side, but I cannot give them, because there is no Q&A. I will be happy to answer all the questions I can in the breaks and everything, but for now, I think that's it.
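As an appendix, the block math above can be turned into a tiny calculator. This is just one interpretation of the speaker's rule (tile dimensions in multiples of 16 pixels, with the tile's block count equal to multiprocessors × blocks-per-multiprocessor, as square as possible); the 20-multiprocessor figure is the speaker's number for 900/1000-series cards, so treat everything here as an assumption to verify on your own hardware:

```python
BLOCK = 16  # pixels per side of one 16x16 scheduling block


def suggested_tile(multiprocessors, blocks_per_mp):
    """Pick tile dimensions (in pixels, multiples of 16) whose block count
    equals multiprocessors * blocks_per_mp, as close to square as possible."""
    target = multiprocessors * blocks_per_mp
    best = None
    for w in range(1, target + 1):      # w, h counted in 16x16 blocks
        if target % w == 0:
            h = target // w
            if best is None or abs(w - h) < abs(best[0] - best[1]):
                best = (w, h)
    return best[0] * BLOCK, best[1] * BLOCK


print(suggested_tile(20, 5))  # regular path tracing, 20 SMs → (160, 160)
print(suggested_tile(20, 4))  # branched path tracing, 20 SMs → (128, 160)
```

Both results are multiples of 16 in each dimension, and the tile holds exactly one full round of blocks for all multiprocessors, so no threads idle on a partial block; per the disclaimer above, the theoretical optimum still needs checking against your actual card.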