Published on Dec 18, 2013
In this talk, I discuss the extra complications in porting a parallel, message-passing code to OpenACC, using the parallel Himeno code as an example. In particular, I show how asynchronicity and dependency trees can be used to achieve the best overlap of computation and communication. I also show how best to combine MPI one-sided communication with OpenACC asynchronicity, including when "G2G" MPI is called with GPU-resident buffers. The lecture concludes with a brief discussion of OpenACC features planned for future versions of the standard, and a comparison of OpenACC with the new OpenMP accelerator directives.
Programming for GPUs Course: Introduction to OpenACC 2.0 & CUDA 5.5 - December 4-6, 2013