PixelJunk Shooter 2 and SideScroller: Fluid And Lighting Part 1
Uploader Comments (LoadHitStore)
All Comments (14)
-
@LoadHitStore Sweet Jeebus. Thanks very much for taking the time to type all that out! I can't believe I'd never heard about the memory access required to use scalar floats and vectors together on the PPU...that stuff is mad. I don't suppose you fancy doing a video on using the PS3 effectively, do you? I can guarantee at least 9 views - all of my classmates will watch it ^^. Thanks again!
-
@CallumBGood "Normally" in a vector you would store X,Y,Z,W or R,G,B,A. AoS is called that because you have an array of these structures: { {R1, G1, B1, A1}, {R2, G2, B2, A2} }. That works great when you want to operate on vectors with things that line up, meaning R+R, G+G, B+B, A+A. However, it kinda sucks at mixing and matching elements, and at insert/extract. Things like vec.y = 5.0f and vec1.z += vec2.x are usually pretty painful. SoA is the opposite, instead of an array of vector structs
-
@CallumBGood See the altdevblogaday article put-this-in-your-pipe-and-execute-it/ for a better explanation.
-
@CallumBGood 500 char limit? This will be a multipart comment. I can tell you what worked best for us, but it may not work for you. The most important thing is to carefully consider what you want to do with the particles, how your data will be transformed, and then the best representation to use. For us, that happened to be SoA rather than AoS, but that's not right for every problem.
Fantastic stuff! I plan on coming back to watch part 2 tomorrow ^^
I'm going to show this to my lecturer at Gamer Camp Pro because we are doing a PSN project and this stuff is totally relevant. Did you do much SIMD stuff? I read about tracing 4 rays at a time using the 128 bit registers, how about integrating 4 particles at a time? ^^
CallumBGood 3 months ago
@CallumBGood instead of having an array of {x,y,z,w} structs you have a struct that contains 4 vector arrays. One array contains all the x values, one all the y values, etc. This lets you write scalar looking code (x = y * z) with no performance penalty but still lets you operate on 4 values under the hood. Notice that with SoA you are operating on 4 things at a time ( {x1,x2,x3,x4} = {y1,y2,y3,y4} * {z1,z2,z3,z4} to use the above example ) so it's best not to use it to do 1 object at a time
LoadHitStore 3 months ago
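To make the layout difference concrete, here is a minimal portable C++ sketch of the two representations and of integrating four particles at a time. The names `ParticleAoS`, `ParticlesSoA` and `integrate` are hypothetical and chosen for illustration; this is not the game's actual data structure, and it uses plain arrays rather than SPU or VMX intrinsics. The inner four-lane loop stands in for what would be a single 128-bit vector operation on real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per particle; the components of a single particle sit together.
struct ParticleAoS {
    float x, y, vx, vy;
};

// SoA: one array per component; all x values sit together, all y values, etc.
// (hypothetical layout for illustration only)
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy;
};

// Integrate positions in groups of four. The body reads like scalar code
// (x += vx * dt), but each group of four lanes corresponds to one SIMD operation.
void integrate(ParticlesSoA& p, float dt) {
    const std::size_t n = p.x.size();
    assert(n % 4 == 0 && "sketch assumes the arrays are padded to a multiple of four");
    for (std::size_t i = 0; i < n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            p.x[i + lane] += p.vx[i + lane] * dt;
            p.y[i + lane] += p.vy[i + lane] * dt;
        }
    }
}
```

Padding the arrays to a multiple of four is one simple way to avoid a scalar tail loop; whether that fits depends on how the particles are created and destroyed.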
@CallumBGood the other drawback is that a traditional vector represents just one thing while an SoA vector is 4 things. It makes it harder to handle conditional stuff. With an AoS vector, you can say if(vec.x > 0) vec *= 3.14f; but it gets more involved if you have to operate on 4 things at once. You can fix this by calculating vec * 3.14f into a temporary, making a selection mask, and then selecting which to use: mask = cmp_gt(soa_vec, 0); tmp = soa_vec * 3.14f; soa_vec = select(soa_vec, tmp, mask);
LoadHitStore 3 months ago
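The compare, compute, select pattern described above can be sketched in portable C++ as follows. `Vec4`, `Mask4`, `soa_cmp_gt`, `soa_select` and `scale_if_positive` are hypothetical helper names that emulate one four-lane register with a plain array; real SPU or VMX code would use the platform's compare and select intrinsics instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4  = std::array<float, 4>;          // stand-in for one 128-bit register
using Mask4 = std::array<std::uint32_t, 4>;  // per-lane mask: all ones or all zeros

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Per-lane compare: all bits set where v > threshold, zero elsewhere.
Mask4 soa_cmp_gt(const Vec4& v, float threshold) {
    Mask4 mask{};
    for (std::size_t i = 0; i < 4; ++i)
        mask[i] = (v[i] > threshold) ? 0xFFFFFFFFu : 0u;
    return mask;
}

// Per-lane bitwise select: take if_set where the mask is set, if_clear otherwise.
Vec4 soa_select(const Vec4& if_clear, const Vec4& if_set, const Mask4& mask) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        assert(mask[i] == 0u || mask[i] == 0xFFFFFFFFu);
        std::uint32_t a, b;
        std::memcpy(&a, &if_clear[i], sizeof a);
        std::memcpy(&b, &if_set[i], sizeof b);
        const std::uint32_t bits = (a & ~mask[i]) | (b & mask[i]);
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}

// Branch-free version of: if (v > 0) v *= 3.14f, applied to four lanes at once.
Vec4 scale_if_positive(const Vec4& v) {
    const Mask4 mask = soa_cmp_gt(v, 0.0f);
    Vec4 tmp{};
    for (std::size_t i = 0; i < 4; ++i)
        tmp[i] = v[i] * 3.14f;  // computed for every lane, even those discarded below
    return soa_select(v, tmp, mask);
}
```

Note that both sides of the condition are always evaluated, which is why this approach suits small blocks better than large ones.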
@CallumBGood that only works for smaller if blocks. You don't want to necessarily try that for larger more complicated blocks. It usually pays (on the SPUs) to try and eliminate branches from your code and try branch free algorithms. One last point on SoA. Float instructions on the SPU have a 6 cycle latency giving you a max of 5 more instructions you can put in there before the result of the first instruction gets written back to registers. Unrolling is our friend, especially with 128 registers
LoadHitStore 3 months ago
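As a rough illustration of why unrolling helps with instruction latency, the sketch below sums an array with four independent accumulators, so each add depends only on its own running total and consecutive adds do not have to wait on one another. `sum_unrolled` is a hypothetical example, not code from the game; four accumulators are used for brevity, and covering the 6-cycle latency mentioned above would take more independent work per iteration. A compiler may also unroll loops on its own, and reordering float additions can change rounding.

```cpp
#include <cassert>
#include <cstddef>

// Sum n floats using four independent dependency chains.
float sum_unrolled(const float* data, std::size_t n) {
    assert(n % 4 == 0 && "sketch assumes a multiple of four elements");
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (std::size_t i = 0; i < n; i += 4) {
        acc0 += data[i + 0];  // none of these four adds depends on another,
        acc1 += data[i + 1];  // so they can overlap in the pipeline
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    // Combine the partial sums once, after the loop.
    return (acc0 + acc1) + (acc2 + acc3);
}
```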
@CallumBGood ok, last bit of advice: on the SPUs it doesn't matter because there are only vector regs, but on the PPU it pays to use float-in-vec instead of float (where vectors and scalar floats have to interact). This is because to go from a vec to scalar float register, you first have to go through memory. This is very bad in cases like float val = vec.y and vec.y = val. If instead of float, you use a vector that acts like a float, you avoid expensive transfers between register files and it
LoadHitStore 3 months ago
@CallumBGood makes vector/scalar operations easier. In vec * scalar, if scalar is not really a float but rather a vector with the value splatted (val,val,val,val), it becomes a single vector multiply rather than a mess of loads and stores and duplicates and all that crap. So, let me know if you have any questions. I just woke up and I am sure I screwed something up somewhere, or forgot to mention something, or just said stuff in an unclear way. Sorry!
LoadHitStore 3 months ago
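The float-in-vec idea can be sketched in portable C++ like this. `FloatInVec`, `splat_lane` and `mul` are hypothetical names for illustration, and `Vec4` is a plain array standing in for a vector register; this only models the interface. On the PPU the splat and the multiply would each be a vector instruction, which is what keeps the value out of the scalar float registers and away from the round trip through memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;  // stand-in for one 128-bit register

// A "scalar" that lives in all four lanes of a vector (val, val, val, val).
struct FloatInVec {
    Vec4 lanes;
    explicit FloatInVec(float v) : lanes{{v, v, v, v}} {}
};

// Equivalent of `float val = vec.y`, except the result stays in vector form.
FloatInVec splat_lane(const Vec4& v, std::size_t lane) {
    assert(lane < 4);
    return FloatInVec(v[lane]);
}

// vec * scalar as one lane-wise multiply, with no scalar/vector conversion.
Vec4 mul(const Vec4& a, const FloatInVec& s) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = a[i] * s.lanes[i];
    return out;
}
```

For example, `mul(v, splat_lane(v, 1))` scales every component of `v` by its own y value without ever producing a standalone `float`.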