Hi everyone,
I ran the 2D k-Wave simulation of the line sensor using the GPU version (kspaceFirstOrder-CUDA). Here is the output:
Resizing matrix...
input grid size: 384 by 384 elements
output grid size: 3840 by 3840 elements
completed in 0.080413s
Running k-Wave simulation...
start time: 26-Jan-2021 09:55:04
reference sound speed: 1500m/s
dt: 2ns, t_end: 36.202us, time steps: 18102
input grid size: 3840 by 3840 grid points (38.4 by 38.4mm)
maximum supported frequency: 75MHz
expanding computational grid...
computational grid size: 3880 by 3880 grid points
WARNING: Highest prime factors in each dimension are 97 97
Use dimension sizes with lower prime factors to improve speed
precomputation completed in 2.6354s
saving input files to disk...
completed in 1.6728s
+---------------------------------------------------------------+
| kspaceFirstOrder-CUDA v1.3 |
+---------------------------------------------------------------+
| Reading simulation configuration: Done |
| Selected GPU device id: 0 |
| GPU device name: GeForce RTX 2070 with Max-Q Design |
| Number of CPU threads: 1 |
| Processor name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz |
+---------------------------------------------------------------+
| Simulation details |
+---------------------------------------------------------------+
| Domain dimensions: 3880 x 3880 |
| Medium type: 2D |
| Simulation time steps: 18102 |
+---------------------------------------------------------------+
| Initialization |
+---------------------------------------------------------------+
| Memory allocation: Done |
| Data loading: Done |
| Elapsed time: 0.24s |
+---------------------------------------------------------------+
| FFT plans creation: Done |
| Pre-processing phase: Done |
| Elapsed time: 1.68s |
+---------------------------------------------------------------+
| Computational resources |
+---------------------------------------------------------------+
| Current host memory in use: 1521MB |
| Current device memory in use: 2917MB |
| Expected output file size: 8MB |
+---------------------------------------------------------------+
| Simulation |
+----------+----------------+--------------+--------------------+
| Progress | Elapsed time | Time to go | Est. finish time |
+----------+----------------+--------------+--------------------+
| 0% | 0.086s | 778.300s | 26/01/21 10:08:10 |
| 5% | 58.048s | 1100.480s | 26/01/21 10:14:30 |
| 10% | 127.455s | 1145.829s | 26/01/21 10:16:24 |
| 15% | 200.249s | 1133.909s | 26/01/21 10:17:25 |
| 20% | 273.220s | 1092.277s | 26/01/21 10:17:57 |
| 25% | 346.869s | 1040.147s | 26/01/21 10:18:19 |
| 30% | 419.282s | 977.964s | 26/01/21 10:18:28 |
| 35% | 492.980s | 915.245s | 26/01/21 10:18:40 |
| 40% | 567.184s | 850.541s | 26/01/21 10:18:49 |
| 45% | 640.790s | 782.996s | 26/01/21 10:18:55 |
| 50% | 714.071s | 713.755s | 26/01/21 10:18:59 |
| 55% | 787.334s | 643.909s | 26/01/21 10:19:02 |
| 60% | 862.208s | 574.567s | 26/01/21 10:19:08 |
| 65% | 935.525s | 503.536s | 26/01/21 10:19:10 |
| 70% | 1009.156s | 432.313s | 26/01/21 10:19:13 |
| 75% | 1082.477s | 360.666s | 26/01/21 10:19:14 |
| 80% | 1155.983s | 288.856s | 26/01/21 10:19:16 |
| 85% | 1229.012s | 216.762s | 26/01/21 10:19:17 |
| 90% | 1301.022s | 144.452s | 26/01/21 10:19:17 |
| 95% | 1373.348s | 72.189s | 26/01/21 10:19:17 |
+----------+----------------+--------------+--------------------+
| Elapsed time: 1446.09s |
+---------------------------------------------------------------+
| Sampled data post-processing: Done |
| Elapsed time: 0.01s |
+---------------------------------------------------------------+
| Summary |
+---------------------------------------------------------------+
| Peak host memory in use: 1521MB |
| Peak device memory in use: 2917MB |
+---------------------------------------------------------------+
| Total execution time: 1448.90s |
+---------------------------------------------------------------+
| End of computation |
+---------------------------------------------------------------+
The GPU sped up the simulation, but the improvement does not seem significant. Is the total time of 1448.90 s the best I can expect given my small spatial grid step (3840 by 3840 grid points over 38.4 by 38.4 mm, i.e. 10 µm per grid point)? If not, could you please tell me how I can improve it? By the way, this is my first time using a GPU, so any suggestions and learning materials are very welcome. Thank you.
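
For reference, the warning in the log says the computational grid (3880 by 3880) has a highest prime factor of 97 and that sizes with lower prime factors should be faster. Below is a minimal Python sketch of how that can be checked and how a nearby size with small prime factors could be found. The function names and the choice of 7 as the largest allowed prime factor are my own assumptions for illustration and are not part of k-Wave.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (returns 1 for n == 1)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after dividing out the small factors is prime.
        largest = n
    return largest


def next_smooth_size(n: int, max_prime: int = 7) -> int:
    """Return the smallest size >= n whose prime factors are all <= max_prime.

    max_prime = 7 is an assumed threshold, not a value taken from k-Wave.
    """
    while largest_prime_factor(n) > max_prime:
        n += 1
    return n


grid_size = 3880  # computational grid size per dimension, from the log
print(largest_prime_factor(grid_size))  # 97, matching the warning
print(next_smooth_size(grid_size))      # 3888 = 2^4 * 3^5
```

Since the input grid is 3840 points and the expanded grid is 3880, the extra 40 points presumably come from a 20-point absorbing layer on each side. If that is right, a 24-point layer would give 3888 points per dimension, but I have not tested whether this changes the run time.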