Thanks for your reply. I have never tried GPU programming, but there are some algorithms you could use (such as a linked list) to speed up the search. By the way, which type of memory does your code use for the neighbouring-particle search: global memory or shared memory? Is there any reference material you would recommend for particle programming on the GPU?
trimtrim1980 1 year ago
OK, in my CPU code I'm using a tree search algorithm, which is similar to a linked list. In the GPU code I'm using shared memory because it gives very fast access to all the data at the same time; the problem is that shared memory is only 16 KB (GTS 250).
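To put the 16 KB limit in numbers, here is a rough sketch in Python (chosen for brevity; it is not the code from the video). It assumes each particle is staged in shared memory as one 16-byte float4 and that nothing else occupies shared memory, so the result is an upper bound. The function names are hypothetical.

```python
def particles_per_tile(shared_bytes=16 * 1024, bytes_per_particle=16):
    """Upper bound on particles that fit in shared memory at once.

    Assumption: one float4 (4 x 4-byte floats = 16 bytes) per particle.
    """
    return shared_bytes // bytes_per_particle


def tile_ranges(n, tile):
    """Split particle indices 0..n-1 into consecutive (start, end) tiles."""
    return [(start, min(start + tile, n)) for start in range(0, n, tile)]


tile = particles_per_tile()
tiles = tile_ranges(25_000, tile)
print(tile, len(tiles), tiles[-1])
```

Under these assumptions at most 1024 particles fit in one tile, so a run with 25k particles has to stream its data through shared memory in 25 tiles.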
As for reference material, I began with the OpenCL and CUDA online references and then tried to understand and modify the example codes in the CUDA SDK.
jahkr 1 year ago
Excellent simulation. What about the CUDA speedup, and which data type did you use: single-precision float or double precision?
trimtrim1980 1 year ago
Thanks for the comment.
I have used float4 data types to improve performance. The application runs at 240 Gflops with 25k particles, but the main problem is that the number of operations increases with n^2 (where n is the number of particles). In the CPU version I'm using a neighbour search optimization, so the number of operations increases with n*log10(n); this means that for large particle numbers the CPU is faster than the GPU. I'm trying to adapt the neighbour search to the GPU, but it isn't easy.
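For readers who want to see the difference between the two searches, here is a minimal CPU sketch in Python. It is not the tree search described above; it swaps in the simpler cell-list (linked-list) idea, in 2D, and assumes a fixed interaction radius `h`. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def brute_force_pairs(points, h):
    """All-pairs search: tests n*(n-1)/2 pairs, so the cost grows with n^2."""
    h2 = h * h
    pairs = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= h2:
                pairs.add((i, j))
    return pairs


def cell_list_pairs(points, h):
    """Cell-list search: bin particles into cells of side h, then test
    only particles in the same cell or one of the 8 adjacent cells."""
    h2 = h * h
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(idx)

    pairs = set()
    for (cx, cy), members in cells.items():
        for ox, oy in product((-1, 0, 1), repeat=2):
            # .get() avoids creating empty cells while iterating
            for j in cells.get((cx + ox, cy + oy), ()):
                for i in members:
                    if i < j:  # record each pair once
                        dx = points[i][0] - points[j][0]
                        dy = points[i][1] - points[j][1]
                        if dx * dx + dy * dy <= h2:
                            pairs.add((i, j))
    return pairs
```

Both functions return the same set of neighbour pairs. When particles are spread roughly evenly, each particle is compared only against the few particles in nearby cells, which is what removes the n^2 growth.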
jahkr 1 year ago