Basically, any routine optimized to perform well on ATI cards performs poorly on Nvidia cards. Someone posted a paper a while back that studied the performance differences between OpenCL and CUDA, and also compared OpenCL routines optimized for ATI cards against ones optimized for Nvidia cards.
It is also entirely possible that even then the code is poorly optimized for the ATI cards, but any optimization one way will make one card better than the other. It could very well be that the current miners are more optimized for the ATI cards.

I was actually looking into GPU bitcoin mining and I noticed that all of the GPU mining programs use OpenCL. It’s great that Nvidia cards are better at :) This is unfortunate, since I am very interested in bitcoin mining AND I’d like to keep being an Nvidia guy, since my rig is intended for gaming first.

It already looks like AMD is moving in the direction of NVIDIA for their next architecture. Hopefully in another GPU generation or two, we’ll see some architectural convergence that will make OpenCL work better across platforms.

You should be able to find… Dr. Dongarra is a pioneer in the BLAS and LAPACK world… He still is.
You may want to check Dr. Dongarra’s paper on writing high-performance BLAS kernels in OpenCL.

It ain’t too bad… AMD cards can give as much bang for the buck as NVIDIA cards do… And OpenCL is a standard anyway… But OpenCL does not really mitigate the portability issues. Separate kernels are sometimes needed to address AMD and NVIDIA platforms separately…

As far as memory bandwidth goes, AMD can manage well even if there is a lot of non-coalesced access in your program. Most HPC-related problems are data-parallel and hence vectorizable. However, GPGPU is moving non-traditional HPC problems to the GPU. So, in those cases, achieving performance with AMD is a bit challenging. It totally depends on the problem in question…
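
To make the "separate kernels" point a bit more concrete, here is a rough sketch of what the host-side selection could look like. It is only an illustration under a few assumptions, not something taken from the paper or from any miner: the helper names `pick_kernel_variant` and `global_size` are made up, the vendor strings are the kind of thing drivers typically report for `CL_DEVICE_VENDOR`, and treating an explicitly vectorized `float4` kernel as the "AMD" variant is an assumption about VLIW-style AMD hardware. It is plain Python with the OpenCL C sources held as strings, so it runs without an OpenCL installation.

```python
# Hypothetical sketch: pick an OpenCL kernel variant from the device vendor string.
# The kernel sources, helper names, and vendor-matching rule are all assumptions.

# Scalar variant: one float per work-item.
SCALAR_SAXPY = """
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"""

# Explicitly vectorized variant: one float4 (four floats) per work-item.
VECTOR_SAXPY = """
__kernel void saxpy(const float a,
                    __global const float4 *x,
                    __global float4 *y)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"""


def pick_kernel_variant(vendor):
    """Return (kernel_source, floats_per_work_item) for a vendor string."""
    v = vendor.strip().lower()
    # Do not test for the bare substring "ati": it also occurs in "corporation".
    is_amd = "advanced micro devices" in v or v.startswith(("amd", "ati "))
    if is_amd:
        return VECTOR_SAXPY, 4
    return SCALAR_SAXPY, 1


def global_size(n_floats, floats_per_work_item):
    """Number of work-items needed to cover n_floats elements."""
    if n_floats % floats_per_work_item != 0:
        raise ValueError("array length must be a multiple of the vector width")
    return n_floats // floats_per_work_item


if __name__ == "__main__":
    for vendor in ("Advanced Micro Devices, Inc.", "NVIDIA Corporation"):
        source, width = pick_kernel_variant(vendor)
        print(vendor, "-> floats per work-item:", width,
              "| work-items for 1024 floats:", global_size(1024, width))
```

In a real program the vendor string would come from querying `CL_DEVICE_VENDOR` with `clGetDeviceInfo`, and you would want to benchmark both variants on the actual card rather than trust the vendor name alone.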