Here are some preliminary results from running the "ultrasound in a homogeneous medium" example with and without an Nvidia Tesla C1060 GPU card (240 cores, 4 GB RAM). The actual simulation times are listed below, in seconds.
System 1: XP, dual processor, 2 GB RAM
Matrix size    w/o GPU (s)    with GPU (s)
100x100        5              7
200x200        25             15
300x300        101            24
400x400        243            35
500x500        515            69
600x600        908            67
System 2: Win 7 64-bit, 8 processors, 8 GB RAM
Matrix size    w/o GPU (s)    with GPU (s)
100x100        6              8
200x200        28             19
300x300        92             28
400x400        273            40
500x500        573            81
600x600        984            74
700x700        -              103
800x800        -              160
900x900        -              161
1000x1000      -              208
1200x1200      -              300
The speed improvements are dramatic at larger matrix sizes: at 600x600 the GPU run is roughly 13 times faster on both systems. The GPU is already ahead at 200x200 and the gap widens quickly from there, which is somewhat better than your reported breakeven point of about 512x512 elements. I find it interesting that Win 7 is generally slower than XP, despite the extra RAM and processors. Also, with the GPU, both systems ran the 600x600 matrix slightly faster than the 500x500 one, which I did not expect. I apparently hit some optimization sweet spot. Ultimately we wish to use large matrices at fine resolution to generate photoacoustic images, and the Tesla gives me some hope of getting this done in my lifetime.
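For anyone who wants the speedup factors implied by the timings above, here is a minimal Python sketch. It only divides the CPU-only time by the GPU time for each matrix size tested both ways, and reports the smallest tested size where the GPU run was faster. The variable and function names are my own placeholders, not part of any package.

```python
# Timings in seconds, copied from the tables above as (w/o GPU, with GPU).
# Only sizes that were run both ways are included.
# All names here are illustrative placeholders.
system1 = {100: (5, 7), 200: (25, 15), 300: (101, 24),
           400: (243, 35), 500: (515, 69), 600: (908, 67)}
system2 = {100: (6, 8), 200: (28, 19), 300: (92, 28),
           400: (273, 40), 500: (573, 81), 600: (984, 74)}


def speedups(timings):
    """Return {matrix size: CPU time / GPU time}, rounded to one decimal."""
    return {n: round(cpu / gpu, 1) for n, (cpu, gpu) in timings.items()}


def breakeven(timings):
    """Return the smallest tested size where the GPU run was faster, or None."""
    faster = [n for n, (cpu, gpu) in sorted(timings.items()) if gpu < cpu]
    return faster[0] if faster else None


for name, timings in (("System 1", system1), ("System 2", system2)):
    print(f"{name}: GPU first faster at {breakeven(timings)}x{breakeven(timings)}")
    for n, factor in speedups(timings).items():
        print(f"  {n}x{n}: {factor}x")
```

Since the sizes were tested in steps of 100, the true crossover lies somewhere between 100x100 and 200x200. This sketch only reports the first tested size on the faster side.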
Thanks for a great package that is a lot of fun to work with.
-Dan
Optosonics