Great stuff!
Are Neural Networks hard to implement with OpenCL?
Would you perhaps have some resources on how to combine the two?
Thanks
michielvda 1 year ago
@michielvda (... 500 char max continued ...)
Hence, to avoid this issue, I simply created an independent neural network in each work-group. This is easy to implement, but not practical or efficient for all problems.
No resource is used to combine the networks because I only needed to wait for the kernel to return (all the threads are synchronized at that point). The networks are then combined at the "ward clustering" stage once training is completed.
Thanks for the interesting comment!
MrJeanmik 1 year ago
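To make the "one independent network per work-group" idea concrete, here is a minimal sketch in plain C that models the scheme on the CPU. It is not the code from the video: the single linear neuron, the delta-rule update, and the names `Net`, `train_group`, `run_kernel`, `NUM_GROUPS` and `NUM_INPUTS` are all assumptions made for illustration. In a real OpenCL kernel, `group_id` would come from `get_group_id(0)`, and the groups would run on the device rather than in a host loop.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_GROUPS 4  /* hypothetical number of work-groups */
#define NUM_INPUTS 2  /* hypothetical input dimension */

/* One independent model per work-group. A single linear neuron is used
   here as a stand-in for whatever network the real kernel trains. */
typedef struct {
    float w[NUM_INPUTS];
    float bias;
} Net;

/* Work done by one "work-group". It reads and writes only nets[group_id],
   so no synchronization between work-groups is needed at the end of an epoch. */
void train_group(Net *nets, size_t group_id,
                 const float *x, const float *y, size_t n_samples,
                 size_t epochs, float lr)
{
    assert(group_id < NUM_GROUPS);
    Net *net = &nets[group_id];

    for (size_t e = 0; e < epochs; ++e) {
        for (size_t s = 0; s < n_samples; ++s) {
            const float *xs = &x[s * NUM_INPUTS];

            /* Forward pass of the linear neuron. */
            float out = net->bias;
            for (size_t i = 0; i < NUM_INPUTS; ++i)
                out += net->w[i] * xs[i];

            /* Delta-rule update (assumed training rule). */
            float err = y[s] - out;
            for (size_t i = 0; i < NUM_INPUTS; ++i)
                net->w[i] += lr * err * xs[i];
            net->bias += lr * err;
        }
        /* In an OpenCL kernel, a barrier(CLK_LOCAL_MEM_FENCE) here would
           synchronize the work-items of this group only, which is allowed. */
    }
}

/* Host-side stand-in for enqueueing the kernel and waiting for it to return.
   Once this function returns, every group has finished training. */
void run_kernel(Net *nets, const float *x, const float *y,
                size_t n_samples, size_t epochs, float lr)
{
    for (size_t g = 0; g < NUM_GROUPS; ++g) {
        /* Different starting weights per group (an assumption), so the
           networks are genuinely independent. */
        for (size_t i = 0; i < NUM_INPUTS; ++i)
            nets[g].w[i] = 0.1f * (float)g;
        nets[g].bias = 0.0f;
        train_group(nets, g, x, y, n_samples, epochs, lr);
    }
}
```

After `run_kernel` returns, the host holds `NUM_GROUPS` trained networks and can combine them in a separate step (the Ward clustering stage mentioned above, which is not shown here).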
@michielvda It depends on how you implement the neural networks, I guess... The most restrictive constraint is that work-groups cannot be synchronized with each other, so you can't synchronize all threads at the end of an epoch. If you try to do so using global semaphores, the kernel will spin forever (effectively a deadlock), since the work-group dispatcher is essentially sequential in OpenCL: a work-group that is spinning can keep the work-groups it is waiting for from ever being scheduled.
(... 500 char max ...)
MrJeanmik 1 year ago