We have been successfully running sound-field simulations on an HPC cluster using kspaceFirstOrder3D-OMP for some time now. I am currently trying to implement an ultrasound imaging simulation and am looking for advice on how best to parallelize it. In my experience with the code so far, parallelizing across nodes gives diminishing returns, so I have maximized the thread count while restricting each run to a single node. For imaging, I would now have to re-run the code once per scan-line, i.e.:
for (( i=1; i<=$NLines; i++ )); do
    mpirun -np $SLURM_NTASKS kspaceFirstOrder3D-OMP ….
done
This, however, would process one scan-line after the other and could not take advantage of multiple nodes. Alternatively, I could submit multiple jobs, but this seems like a poor approach.
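To make the multiple-jobs alternative concrete, here is a minimal sketch of what I have in mind, written as a SLURM job array with one array task per scan-line, each confined to a single node. The array size (96), the core count (16), the file names input_<i>.h5 / output_<i>.h5 and the DRY_RUN switch are all placeholders of mine, and the -i, -o and -t options should be checked against the help output of the installed binary. By default the script only prints the command it would run.

```shell
#!/bin/bash
#SBATCH --nodes=1            # keep each scan-line simulation on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # assumption: cores per node, adjust to the cluster
#SBATCH --array=1-96         # assumption: NLines=96, one array task per scan-line

# Build the command line for one scan-line.
# input_<i>.h5 and output_<i>.h5 are hypothetical file names.
build_cmd() {
    local line="$1" threads="$2"
    printf 'kspaceFirstOrder3D-OMP -i input_%s.h5 -o output_%s.h5 -t %s' \
        "$line" "$line" "$threads"
}

# SLURM sets these inside an array job; the defaults allow a dry run elsewhere.
line="${SLURM_ARRAY_TASK_ID:-1}"
threads="${SLURM_CPUS_PER_TASK:-1}"
cmd="$(build_cmd "$line" "$threads")"

if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"              # print only, do not launch the simulation
else
    $cmd                     # launch the simulation for this scan-line
fi
```

Submitted once with sbatch and DRY_RUN=0 set in the environment, the scheduler would spread the array tasks over whatever nodes are free, so the scan-lines run concurrently without any change to the source code.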
Question: How would I best go about parallelizing scan-lines across nodes while parallelizing each simulation across threads within a single node?
Is adapting the source code to include and manage the MPI allocations my only option?
Any guidance is appreciated!
Cheers, Cyrill