Hi marcelr,
I've performed two simple benchmarks and the results are not surprising :-)
Size     Core i7 920 (4c/8T)   GTX 580          2x Intel E5-2670 (16T)
128^3    0.09918 s/step        0.0967 s/step    0.025 s/step
256^3    1.10 s/step           0.7347 s/step    0.42 s/step
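To make the comparison easier to read, here is a small sketch that turns the timings above into speed-ups relative to the Core i7 920. The numbers are copied from the benchmark; the dictionary layout and the `speedup` helper are just illustrative names. For the GTX 580 it works out to roughly 1.03x at 128^3 and 1.5x at 256^3.

```python
# Timings in seconds per time step, copied from the benchmark above.
timings = {
    "128^3": {"Core i7 920": 0.09918, "GTX 580": 0.0967, "2x E5-2670": 0.025},
    "256^3": {"Core i7 920": 1.10, "GTX 580": 0.7347, "2x E5-2670": 0.42},
}


def speedup(baseline, other):
    """Return how many times faster `other` is than `baseline`."""
    return baseline / other


for size, row in timings.items():
    base = row["Core i7 920"]  # the quad-core CPU is the reference
    for name, seconds in row.items():
        print(f"{size:>6}  {name:<12} {speedup(base, seconds):.2f}x")
```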
Offloading only the FFT to the GPU is not going to speed up k-Wave substantially, because the data has to be moved back and forth between host and GPU memory 14 times per time step. This wipes out nearly all the benefit gained from the GPU. The only way to get a reasonable speed-up is to run k-Wave entirely on the GPU (this version is being tested).
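A rough back-of-the-envelope sketch of why the transfers hurt is given here. Only the count of 14 transfers per step comes from the benchmark; the rest are assumptions made for illustration: each transfer moves one full grid of single-precision real values, and the effective PCIe bandwidth is 6 GB/s (a hypothetical figure, your hardware may differ considerably).

```python
def transfer_time_per_step(n, transfers=14, bytes_per_value=4, bandwidth_gb_s=6.0):
    """Estimate seconds per time step spent copying an n^3 grid over PCIe.

    Assumptions (not measured): every transfer moves one full n^3 grid,
    values are 4-byte floats, and the effective bandwidth is 6 GB/s.
    """
    bytes_per_transfer = n**3 * bytes_per_value
    return transfers * bytes_per_transfer / (bandwidth_gb_s * 1e9)


for n in (128, 256):
    print(f"{n}^3: ~{transfer_time_per_step(n):.3f} s/step spent in transfers")
```

Under these assumptions the copy cost grows by a factor of 8 each time the grid edge doubles, so it never becomes negligible compared with the FFT work itself.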
The largest simulation domain I was able to run on the GTX 580 (1.5 GB VRAM) was 256^3, because cuFFT is extremely memory-hungry.
However, if you have a slow CPU and a powerful GPU, the speed-up is measurable. If you're interested, I can send you a patch file to apply before compiling.
Jiri