Optimised Codes - Using Checkpoints


Started 4 years ago by doguzaif

doguzaif

Member
Joined: Jul '20
Posts: 6

Hi all,

I have a simulation at hand that looks like it will take ~8 hours on the cluster machine I have access to, yet I can run simulations for only 6 hours in one job submission. Using a bash script, I have managed to create a checkpoint file for the first leg of the simulation. However, at the moment I am not sure how to make the second leg start from where the checkpoint file was created. Does this just happen automatically when I run the same simulation script? Here is my code:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24

#SBATCH --mem=60000

#SBATCH --time=6:00:00

#SBATCH --job-name=position1
#SBATCH --account=free
#SBATCH --partition=batch-sky

#SBATCH --mail-user=dz322@bath.ac.uk

#SBATCH --output=StdOut.o.%j
#SBATCH --error=StdErr.e.%j

module purge
module load slurm
module load matlab
module load hdf5/intel
module load intel/compiler/64/18.5.274
module load intel/mkl/64/18.5.274
module load fftw3/intel/avx/3.3.4
module load gcc/9.2.0

export OMP_NUM_THREADS=24
export OMP_PLACES=cores
export OMP_PROC_BIND=true

../kwave/kspaceFirstOrder-OMP/skylake/kspaceFirstOrder-OMP -i position1.h5 -o pos1_out_sky_2.h5 --checkpoint_file check_pos5_sky --checkpoint_interval 20000

The manual mentions putting a loop in the bash script, but I am unsure how to implement this. Any help with this would be greatly appreciated. Many thanks!

Best wishes,
Dogu Zaifoglu
PhD candidate in Mechanical Engineering
MEng, University of Bath

Posted 4 years ago #
Jiri Jaros

Developer
Joined: Feb '12
Posts: 118

Hi Dogu,
it's really simple, just start it again with the same parameters. If the binary finds the checkpoint file, it will continue from the point where it was interrupted. If not, it will start from the beginning!

In order to automate the whole process, I check the output file and compare the datasets t_index and Nt. If they are equal, the simulation has finished. If not, I resubmit the same job (you can do this as the last command in your bash script). You can use, e.g., h5dump to read the data, or h5py (https://docs.h5py.org/en/stable/high/dataset.html).
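
A minimal sketch of this check, assuming the output file stores t_index and Nt as single-element datasets at the root (/t_index and /Nt) and that h5dump prints each value on a line such as "(0,0,0): 1500". The helper names and the job script name run_position1.sh are hypothetical and not part of k-Wave; adjust them to your setup.

```shell
# Sketch only: helper names and run_position1.sh are illustrative assumptions.

# Pull the first integer out of h5dump's DATA section read from stdin,
# i.e. from a line that looks like "   (0,0,0): 1500".
parse_scalar() {
    sed -n 's/^ *([0-9,]*): *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the value of a single-element dataset such as /t_index or /Nt.
# Arguments: $1 = HDF5 file, $2 = dataset path.
read_scalar() {
    h5dump -d "$2" "$1" | parse_scalar
}

# Exit status 0 while time steps remain, non-zero once t_index reaches Nt.
# Arguments: $1 = t_index, $2 = Nt.
is_unfinished() {
    [ "$1" -lt "$2" ]
}

# Resubmit the given job script if the output file shows an unfinished run.
# Arguments: $1 = output file, $2 = job script to pass to sbatch.
resubmit_if_unfinished() {
    t_index=$(read_scalar "$1" /t_index)
    nt=$(read_scalar "$1" /Nt)
    if is_unfinished "$t_index" "$nt"; then
        sbatch "$2"
    fi
}

# As the last line of the job script, after the kspaceFirstOrder-OMP call:
# resubmit_if_unfinished pos1_out_sky_2.h5 run_position1.sh
```

If h5dump cannot read either dataset, the comparison fails and nothing is resubmitted, so a broken output file should not cause an endless chain of jobs.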

Best wishes
Jiri

Posted 4 years ago #
doguzaif

Member
Joined: Jul '20
Posts: 6

Hi Jiri,

Thank you for your response - it was indeed as simple as running the code again!

And thank you for your tips regarding the script; I'll look into automating the process.

Kind regards,
Dogu

Posted 4 years ago #

k-Wave

A MATLAB toolbox for the time-domain simulation of acoustic wave fields
