I posted a similar topic in General and still haven't found a solution. I am running into a very strange situation where kspaceFirstOrder-CUDA fails to open the input .h5 file. It seems unable to open or locate it:
me@cluster_node:/clusterpath/KWaveSimulationsCorrect/rng806480170$ kspaceFirstOrder-CUDA -i ./rholo3d_kwave_input.h5 -o test.h5
┌───────────────────────────────────────────────────────────────┐
│ kspaceFirstOrder-CUDA v1.3 │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration: Failed │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ !!! K-Wave experienced a fatal error !!! │
├───────────────────────────────────────────────────────────────┤
│ Error: File ./rholo3d_kwave_input.h5 was not found or cannot │
│ be opened.: iostream error │
├───────────────────────────────────────────────────────────────┤
│ Execution terminated │
└───────────────────────────────────────────────────────────────┘
The weird thing is that the same binary works with the same input files when I start it on my local PC instead of the cluster; there, everything runs fine:
me@mypc:/cluster_remote_smtp_access/KWaveSimulationsCorrect/rng806480170$ kspaceFirstOrder-CUDA -i rholo3d_kwave_input.h5 -o test.h5
┌───────────────────────────────────────────────────────────────┐
│ kspaceFirstOrder-CUDA v1.3 │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration: Done │
│ Selected GPU device id: 0 │
│ GPU device name: NVIDIA GeForce GTX 1650 Ti │
│ Number of CPU threads: 16 │
│ Processor name: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz │
├───────────────────────────────────────────────────────────────┤
│ Simulation details │
├───────────────────────────────────────────────────────────────┤
│ Domain dimensions: 256 x 256 x 128 │
│ Medium type: 3D │
│ Simulation time steps: 828 │
├───────────────────────────────────────────────────────────────┤
│ Initialization │
├───────────────────────────────────────────────────────────────┤
│ Memory allocation: Done │
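One sanity check that applies here: since the cluster path and the SMB mount should expose the same file, the checksums have to match on both machines. A minimal sketch (the `hash_of` helper is my own, not part of k-Wave):

```shell
# Hypothetical helper: print just the MD5 of a file, to compare the input
# file as seen from the cluster node and from the SMB mount on my PC.
hash_of() { md5sum "$1" | awk '{print $1}'; }

# On the cluster node:
#   hash_of /clusterpath/KWaveSimulationsCorrect/rng806480170/rholo3d_kwave_input.h5
# On the local PC:
#   hash_of /cluster_remote_smtp_access/KWaveSimulationsCorrect/rng806480170/rholo3d_kwave_input.h5
# Identical hashes would rule out a corrupted or partial transfer.
```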
The input files are created with MATLAB R2019b, since newer versions crash in various ways; R2022a, for example, fails with:
Error using hdf5lib2
Unable to open the file because of HDF5 Library error. Reason:Unknown
Error in H5F.open (line 130)
file_id = H5ML.hdf5lib2('H5Fopen', filename, flags, fapl, is_remote);
Error in h5write (line 108)
file_id = H5F.open(Filename,flags,fapl);
Error in writeMatrix (line 204)
h5write(filename, ['/' matrix_name], matrix);
Error in kspaceFirstOrder_saveToDisk (line 462)
writeMatrix(flags.save_to_disk, eval(variable_list{cast_index}), variable_list{cast_index}, hdf_compression_level);
Error in kspaceFirstOrder3D (line 673)
kspaceFirstOrder_saveToDisk;
Error in init_kwave_sim (line 169)
sensor_data = kspaceFirstOrder3D(kgrid, medium, source, sensor, input_args{:});
Now I do not even know where to start debugging this.
The first possibility is that the file simply isn't there, but I have checked that many times. It could be some odd permission issue, but I have never run into access problems on this cluster before, though I obviously do not have sudo rights there.
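For what it's worth, the kind of check I ran goes beyond `ls`; it also verifies the 8-byte HDF5 signature, in case the file was truncated on the way over. A sketch (the `check_h5` helper is hypothetical, not a k-Wave tool):

```shell
# Hypothetical helper: verify a file exists, is readable by this user, and
# starts with the HDF5 magic number (\211 H D F \r \n \032 \n).
check_h5() {
    f="$1"
    [ -e "$f" ] || { echo "missing: $f"; return 1; }
    [ -r "$f" ] || { echo "not readable: $f"; return 1; }
    # First 8 bytes as hex, compared against the HDF5 signature.
    if [ "$(head -c 8 "$f" | od -An -tx1 | tr -d ' \n')" = "894844460d0a1a0a" ]; then
        echo "ok: $f looks like HDF5"
    else
        echo "bad signature: $f (truncated or corrupted file?)"
        return 1
    fi
}

# e.g., on the cluster node:
#   check_h5 ./rholo3d_kwave_input.h5
```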
I assume that, since the binaries are statically linked, everything is contained within them and I do not need to worry about CUDA or HDF5 versions. Is that correct? Also, does the card I run it on matter? I assume it just works on all recent NVIDIA cards (A100, Tesla V100, etc.), correct?
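On the static-linking question, one way I know of to at least check the binary itself is to inspect it with `file` (the small `linkage_of` wrapper here is hypothetical); a dynamically linked binary could resolve a different HDF5 or C++ runtime on the cluster than on my PC:

```shell
# Hypothetical helper: report whether a binary is statically or dynamically
# linked. A "dynamic" result would mean library versions on the cluster matter.
linkage_of() {
    if file "$1" | grep -q 'statically linked'; then
        echo static
    else
        echo dynamic
    fi
}

# e.g.: linkage_of "$(command -v kspaceFirstOrder-CUDA)"
```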
Does anyone have any ideas that go beyond "have you checked if the file is really there?"