Hi Jiri,
Thank you for your response. I'm attaching (1) my bash script, which shows the modules I'm loading, and (2) the output log.
1) My script file
'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --mem=60000
#SBATCH --time=6:00:00
#SBATCH --job-name=position1
#SBATCH --account=free
#SBATCH --partition=batch-sky
#SBATCH --mail-user=dz322@bath.ac.uk
#SBATCH --output=StdOut.o.%j
#SBATCH --error=StdErr.e.%j
module purge
module load slurm
module load matlab
module load hdf5/intel
module load intel/compiler/64/18.5.274
module load intel/mkl/64/18.5.274
module load fftw3/intel/avx/3.3.4
module load gcc/9.2.0
export OMP_NUM_THREADS=24
export OMP_PLACES=cores
export OMP_PROC_BIND=true
../kwave/kspaceFirstOrder-OMP/skylake/kspaceFirstOrder-OMP -i position1.h5 -o pos1_out_sky.h5 --checkpoint_file check_pos1_sky --checkpoint_interval 20000
'
2) Log
This first one is the error report.
'
┌───────────────────────────────────────────────────────────────┐
│ !!! K-Wave experienced a fatal error !!! │
├───────────────────────────────────────────────────────────────┤
│ basic_string::erase: __pos (which is 18446744073709551615) > │
│ this->size() (which is 14) │
├───────────────────────────────────────────────────────────────┤
│ Execution terminated │
└───────────────────────────────────────────────────────────────┘
'
And this is the output report. I ran this on the university's HPC system, "Balena". Note that this is the second leg of the computation, resumed from a checkpoint, and the checkpoint file has since been deleted because the task completed.
'
[dz322@balena-01 execute2.1]$ cat StdOut.o.4015071
┌───────────────────────────────────────────────────────────────┐
│ kspaceFirstOrder-OMP v1.3 │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration: Done │
│ Number of CPU threads: 24 │
│ Processor name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz │
├───────────────────────────────────────────────────────────────┤
│ Simulation details │
├───────────────────────────────────────────────────────────────┤
│ Domain dimensions: 7500 x 7500 │
│ Medium type: 2D │
│ Simulation time steps: 136000 │
├───────────────────────────────────────────────────────────────┤
│ Initialization │
├───────────────────────────────────────────────────────────────┤
│ Memory allocation: Done │
│ Data loading: Done │
│ Elapsed time: 62.30s │
├───────────────────────────────────────────────────────────────┤
│ Recovered from time step: 98504 │
├───────────────────────────────────────────────────────────────┤
│ FFT plans creation: Done │
│ Pre-processing phase: Done │
│ Elapsed time: 0.11s │
├───────────────────────────────────────────────────────────────┤
│ Computational resources │
├───────────────────────────────────────────────────────────────┤
│ Current host memory in use: 4181MB │
│ Expected output file size: 389MB │
├───────────────────────────────────────────────────────────────┤
│ Simulation │
├──────────┬────────────────┬──────────────┬────────────────────┤
│ Progress │ Elapsed time │ Time to go │ Est. finish time │
├──────────┼────────────────┼──────────────┼────────────────────┤
│ 72% │ 0.331s │ 7609.713s │ 07/03/21 18:16:37 │
│ 77% │ 1270.113s │ 6350.353s │ 07/03/21 18:16:48 │
│ 82% │ 2656.418s │ 4971.005s │ 07/03/21 18:16:55 │
│ 87% │ 4038.188s │ 3590.181s │ 07/03/21 18:16:56 │
│ 92% │ 5421.730s │ 2209.407s │ 07/03/21 18:16:59 │
│ 97% │ 6805.662s │ 828.360s │ 07/03/21 18:17:02 │
├──────────┴────────────────┴──────────────┴────────────────────┤
│ Elapsed time: 7642.89s │
├───────────────────────────────────────────────────────────────┤
│ Sampled data post-processing: Failed │
└───────────────────────────────────────────────────────────────┘
########################################################################
----------------------! Balena Job Details !----------------------------
JobID : 4015071
No. of nodes : 1
No. of CPUs : 24
Timelimit : 06:00:00
Elapsed time : 02:08:30
Nodelist : node-sky-001
Energy Used : 2665375.56 Joules
########################################################################
'
The run still produces a final output file, and the results look fine. I'm just not sure what the "Sampled data post-processing" step does to the results at the end, or whether its failure affects them.
Many thanks,
Dogu