kspaceFirstOrder3D-OMP  1.1
The C++ implementation of the k-wave toolbox for the time-domain simulation of acoustic wave fields in 3D
 All Classes Files Functions Variables Typedefs Enumerations Friends Pages
main.cpp
Go to the documentation of this file.
1 /**
2  * @file main.cpp
3  * @author Jiri Jaros \n
4  * Faculty of Information Technology\n
5  * Brno University of Technology \n
6  * jarosjir@fit.vutbr.cz
7  *
8  * @brief The main file of kspaceFirstOrder3D-OMP.
9  *
10  * @version kspaceFirstOrder3D 2.15
11  *
12  * @date 11 July 2012, 10:57 (created) \n
13  * 02 October 2014, 10:06 (revised)
14  *
15  *
16  *
17  * @section License
18  * This file is part of the C++ extension of the k-Wave Toolbox (http://www.k-wave.org).\n
19  * Copyright (C) 2014 Jiri Jaros and Bradley Treeby
20  *
21  * This file is part of k-Wave. k-Wave is free software: you can redistribute it
22  * and/or modify it under the terms of the GNU Lesser General Public License as
23  * published by the Free Software Foundation, either version 3 of the License,
24  * or (at your option) any later version.
25  *
26  * k-Wave is distributed in the hope that it will be useful, but
27  * WITHOUT ANY WARRANTY; without even the implied warranty of
28  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
29  * See the GNU Lesser General Public License for more details.
30  *
31  * You should have received a copy of the GNU Lesser General Public License
32  * along with k-Wave. If not, see http://www.gnu.org/licenses/.
33  *
34  *
35  *
36  * @mainpage kspaceFirstOrder3D-OMP
37  *
38  * @section Overview 1 Overview
39  *
40  * k-Wave is an open source MATLAB toolbox designed for the time-domain simulation
41  * of propagating acoustic waves in 1D, 2D, or 3D. The toolbox has a wide range of
42  * functionality, but at its heart is an advanced numerical model that can account
43  * for both linear or nonlinear wave propagation, an arbitrary distribution of weakly
44  * heterogeneous material parameters, and power law acoustic absorption
45  * (http://www.k-wave.org).
46  *
47  * This project is a part of the k-Wave toolbox accelerating 3D simulations using
48  * an optimized C++ implementation to run moderate to big grid sizes.
49  * Compiled binaries of the C++ code for x86 architectures are available from
50  * (http://www.k-wave.org/download). Both 64-bit Linux (Ubuntu / Debian) and 64-bit Windows
51  * versions are provided.
52  *
53  *
54  *
55  * @section Compilation 2 Compilation
56  *
57  *
58  * The source codes of <tt>kpsaceFirstOrder3D-OMP</tt> are written using the C++
59  * 2003 standard and the OpenMP 3.0 library. There are variety of different C++
60  * compilers that can be used to compile the source codes. We recommend using either
61  * the GNU C++ compiler (gcc/g++) version 4.4 and higher, or the Intel C++ compiler
62  * version 11.0 and higher. The codes can be compiled on 64-bit Linux and Windows.
63  * 32-bit systems are not supported!
64  * This section describes the compilation procedure using GNU and Intel compilers on Linux.
65  * (The Windows users are encouraged to download the Visual Studio 2012 project
66  * and compile it using Intel Compiler from within Visual Studio.)
67  *
68  * Before compiling the code, it is necessary to install a C++ compiler and several
69  * libraries. The GNU compiler is usually part of Linux distributions and distributed
70  * as open source. It can be downloaded from http://gcc.gnu.org/ if necessary.
71  * The Intel compiler can be downloaded from http://software.intel.com/en-us/intel-composer-xe/.
72  * This package also includes the Intel MKL (Math Kernel Library) library that contains FFT.
73  * The Intel compiler is only free for non-commercial use.
74  *
75  * The code also relies on several libraries that are to be installed before compiling:
76  *
77  * \li HDF5 library - Mandatory I/O library, version 1.8 or higher (http://www.hdfgroup.org/HDF5/).
78  * \li FFTW library - Optional library for FFT, version 3.0 or higher (http://www.fftw.org/).
79  * \li MKL library - Optional library for FFT, version 11.0 or higher (http://software.intel.com/en-us/intel-composer-xe/).
80 
81  *
82  * Although it is possible to use any combination of the FFT library and the compiler,
83  * the best performance is observed when using GNU compiler and FFTW, or Intel Compiler and Intel MKL.
84 
85  * <b> 2.1 The HDF5 library installation procedure </b>
86 
87  * 1. Download a 64-bit HDF5 library package for your platform (http://www.hdfgroup.org/HDF5/release/obtain5.html).
88  *
89  * 2. Configure the HDF5 distribution. Enable the high-level library and specify
90  * an installation folder by typing:
91  \verbatim
92  ./configure --enable-hl --prefix=folder_to_install
93 \endverbatim
94  * 3. Make the HDF5 library by typing:
95 \verbatim
96  make
97 \endverbatim
98  * 4. Install the HDF5 library by typing:
99 \verbatim
100  make install
101 \endverbatim
102  *
103  *
104  *
105  * <b> 2.2 The FFTW library installation procedure </b>
106  *
107  * 1. Download the FFTW library package for your platform (http://www.fftw.org/download.html).
108  *
109  * 2. Configure the FFTW distribution. Enable OpenMP support, SSE instruction set,
110  * single precision floating point arithmetic, and specify an installation folder:
111 \verbatim
112  ./configure --enable-single --enable-sse --enable-openmp --enable-shared --prefix=folder_to_install
113 \endverbatim
114  * if you intend to use the FFTW library (and the C++ code) only on a selected
115  * machine and want to get the best possible performance, you may also add processor
116  * specific optimisations and AVX instructions set. Note, the compiled binary
117  * code is not likely to be portable on different CPUs (e.g. even from Intel
118  * Sandy Bridge to Intel Nehalem).
119 \verbatim
120  ./configure --enable-single --enable-avx --enable-openmp --enable-shared --with-gcc-arch=native --prefix=folder_to_install
121 \endverbatim
122  * More information about the installation and customization can be found at http://www.fftw.org/fftw3_doc/Installation-and-Customization.htm.
123  * For recent CPUs based on Sandy Bridge, Ivy Bridge, Haswell and Broadwell with
124  * strongly recommend to use the AVX support.
125  * 3. Make the FFTW library by typing:
126 \verbatim
127  make
128 \endverbatim
129  * 4. Install the FFTW library by typing:
130 \verbatim
131  make install
132 \endverbatim
133 
134  * <b> 2.3 The Intel Compiler and MKL installation procedure </b>
135  *
136  *
137  * 1. Download the Intel Composer XE package for your platform (http://software.intel.com/en-us/intel-compilers).
138  *
139  * 2. Run the installation script and follow the procedure by typing:
140 \verbatim
141  ./install.sh
142 \endverbatim
143  *
144  *
145  *
146  * <b> 2.4 Compiling the C++ code on Linux</b>
147  *
148  * After the libraries and the compiler have been installed, you are ready to
149  * compile the <tt>kspaceFirstOrder3D-OMP</tt> code.
150  *
151  *
152  * 1. Download the <tt>kspaceFirstOrder3D-OMP</tt> source codes.
153  *
154  * 2. Open the \c Makefile file. The Makefile supports code compilation under
155  * GNU compiler and FFTW, or Intel compiler with MKL. Uncomment the desired compiler
156  * by removing character `<tt>#</tt>'.
157 \verbatim
158  #COMPILER = GNU
159  #COMPILER = Intel
160 \endverbatim
161  *
162  * 3. Select how to link the libraries. Static linking is preferred as it may be
163  * a bit faster, however, on some systems (HPC clusters) it may be better to use
164  * dynamic linking and use the system specific libraries at runtime.
165 \verbatim
166  #LINKING = STATIC
167  #LINKING = DYNAMIC
168 \endverbatim
169  *
170  * 4. Select the instruction set and the CPU architecture.
171  * For users who will only use the binary on the same machine as compiled, the
172  * best choice is <tt>CPU_ARCH=native</tt>.
173  * If you are about to run the same binary on different machines or you want to
174  * cross-compile the code, you are free to use any of the possible choices,
175  * where SSE 3 is the most general but slowest and AVX2 is the most recent
176  * instruction set while believed to be the fastest one.
177 \verbatim
178  CPU_ARCH = native
179  #CPU_ARCH = SSE3
180  #CPU_ARCH = SSE4
181  #CPU_ARCH = AVX
182  #CPU_ARCH = AVX2
183  \endverbatim
184  *
185  * 5. Set installation paths of the libraries (an example is shown bellow).
186 \verbatim
187  FFT_DIR=/usr/local
188  MKL_DIR=/opt/intel/composer_xe_2013/mkl
189  HDF5_DIR=/usr/local/hdf5-serial
190 \endverbatim
191  *
192  *
193  * 6. Compile the source code by typing:
194 \verbatim
195  make
196 \endverbatim
197  If you want to clean the distribution, type:
198 \verbatim
199  make clean
200 \endverbatim
201  *
202  *
203  *
204  *
205  * @section Parameters 3 Command Line Parameters
206  * The C++ code requires two mandatory parameters and accepts a few optional parameters and flags.
207  * The mandatory parameters \c -i and \c -o specify the input and output file. The
208  * file names respect the path conventions for particular operating system.
209  * If any of the files is not specified, cannot be found or created, an error
210  * message is shown.
211  *
212  * The \c -t parameter sets the number of threads used, which defaults the system maximum.
213  * On CPUs with Intel Hyper-Threading (HT), the performance is sometimes better if HT
214  * is disabled in the BIOS settings. If HT is switched on, the default will be to
215  * spawn twice as many threads as there are physical processor cores, which may but again
216  * may not slow down the code execution. Therefore, if the HT is on, try specifying the
217  * number of threads manually for best performance (e.g. 4 for Intel i7). We
218  * recommend experimenting with this parameter to find the best configuration.
219  * Note, if there are other tasks being executed on the system, it might be useful
220  * to further limit the number of threads to prevent system overload.
221  *
222  * The \c -r parameter specifies how often information about the simulation progress
223  * is printed out to the command line. By default, the C++ code prints out the
224  * progress of the simulation, the elapsed time, and the estimated time of
225  * completion in intervals corresponding to 5% of the total number of times steps.
226  *
227  * The \c -c parameter specifies the compression level used by the ZIP library to
228  * reduce the size of the output file. The actual compression rate is highly dependent
229  * on the shape of the sensor mask and the range of stored quantities. In general,
230  * the output data is very hard to compress, and using higher compression levels
231  * can greatly increase the time to save data while not having a large impact on
232  * the final file size. That's why we decided to disable compression in default settings.
233  *
234  * The \c <tt>--benchmark</tt> parameter enables the total length of simulation (i.e.,
235  * the number of time steps) to be overridden by setting a new number of time
236  * steps to simulate. This is particularly useful for performance evaluation and
237  * benchmarking. As the code performance is relatively stable, 50-100 time steps is
238  * usually enough to predict the simulation duration. This parameter can also be
239  * used to quickly find the ideal number of CPU threads to use.
240  *
241  *
242  *
243  * For jobs that are expected to run for a very long time, it may be useful to
244  * checkpoint and restart the execution. One motivation is the wall clock limit
245  * per task on clusters where jobs must fit within a given time span
246  * (e.g. 24 hours). The second motivation is a level of fault-tolerance, where
247  * you can back up the state of the simulation after a predefined period.
248  * To enable checkpoint-restart, the user is asked to specify a file to store the
249  * actual state of the simulation by <tt>--checkpoint_file</tt> and the
250  * period in seconds after which the simulation will be interrupted by <tt>--checkpoint_interval</tt>.
251  * When running on a cluster, please allocate enough time for the checkpoint procedure
252  * that can take a non-negligible amount of time (7 matrices have to be stored in
253  * the checkpoint file and all aggregated quantities are flushed into the output file).
254  *
255  * When controlling a multi-leg simulation by a script loop, the parameters of the code
256  * remains the same in all legs. The first leg of the simulation creates a checkpoint
257  * file while the last one deletes it. If the checkpoint file is not found the
258  * simulation starts from the beginning. In order to find out how many steps have been
259  * finished, please open the output file and read the variable <tt>t_index</tt>.
260  *
261  *
262  * The \c -h and \c --help parameters print all the parameters of the C++ code.
263  * The <tt> --version </tt>parameter reports detail information about the code useful for
264  * debugging and bug reports. It prints out the internal version, the build date and time, the
265  * git hash allowing us to track the version of the source code, the operating system,
266  * the compiler name and version and the instruction set used.
267  *
268  *
269  * The remaining flags specify the output quantities to be recorded during the
270  * simulation and stored on disk analogous to the sensor.record input. If
271  * the \c -p or \c --p\_raw flags are set (these are equivalent), a time series of
272  * the acoustic pressure at the grid points specified by the sensor mask is
273  * recorded. If the \c --p_rms, \c --p_max, \c --p_min flags are set,
274  * the root mean square and/or maximum and/or minimum values of the pressure at
275  * the grid points specified by the sensor mask are recorded. If the
276  * \c --p_final flag is set, the values for the entire acoustic pressure field
277  * in the final time step of the simulation is stored (this will always include
278  * the PML, regardless of the setting for <tt> `PMLInside'</tt>).
279  * The flags \c --p_max_all and \c --p_min_all allow to calculate the maximum and
280  * minimum values over the entire acoustic pressure field, regardless on the shape
281  * of the sensor mask.
282  * Flags to record the acoustic particle velocity are defined in an analogous fashion.
283  * For proper calculation of acoustic intensity, the particle velocity has to be
284  * shifted onto the same grid as the acoustic pressure. This can be done by setting
285  * \c --u_non_staggered_raw flag, that first shifts the particle velocity and then
286  * samples the grid points specified by the sensor mask. Since the shift operation
287  * requires additional FFTs, the impact on the simulation time may be significant.
288  *
289  * Any combination of <tt>p</tt> and <tt>u</tt> flags is admissible. If no output flag is set,
290  * a time-series for the acoustic pressure is recorded. If it is not necessary
291  * to collect the output quantities over the entire simulation, the starting time
292  * step when the collection begins can be specified using the -s parameter.
293  * Note, the index for the first time step is 1 (this follows the MATLAB indexing convention).
294  *
295  * The \c --copy_sensor_mask will copy the sensor from the input file to the output
296  * one at the end of the simulation. This helps in post-processing and visualisation of
297  * the outputs.
298  *
299 \verbatim
300 ---------------------------------- Usage ---------------------------------
301 Mandatory parameters:
302  -i <input_file_name> : HDF5 input file
303  -o <output_file_name> : HDF5 output file
304 
305 Optional parameters:
306  -t <num_threads> : Number of CPU threads
307  (default = 4)
308  -r <interval_in_%> : Progress print interval
309  (default = 5%)
310  -c <comp_level> : Output file compression level <0,9>
311  (default = 0)
312  --benchmark <steps> : Run a specified number of time steps
313 
314  --checkpoint_file <file_name> : HDF5 checkpoint file
315  --checkpoint_interval <seconds> : Stop after a given number of seconds and
316  store the actual state
317 
318  -h : Print help
319  --help : Print help
320  --version : Print version
321 
322 Output flags:
323  -p : Store acoustic pressure
324  (default if nothing else is on)
325  (the same as --p_raw)
326  --p_raw : Store raw time series of p (default)
327  --p_rms : Store rms of p
328  --p_max : Store max of p
329  --p_min : Store min of p
330  --p_max_all : Store max of p (whole domain)
331  --p_min_all : Store min of p (whole domain)
332  --p_final : Store final pressure field
333 
334  -u : Store ux, uy, uz
335  (the same as --u_raw)
336  --u_raw : Store raw time series of ux, uy, uz
337  --u_non_staggered_raw : Store non-staggered raw time series of
338  ux, uy, uz
339  --u_rms : Store rms of ux, uy, uz
340  --u_max : Store max of ux, uy, uz
341  --u_min : Store min of ux, uy, uz
342  --u_max_all : Store max of ux, uy, uz (whole domain)
343  --u_min_all : Store min of ux, uy, uz (whole domain)
344  --u_final : Store final acoustic velocity
345 
346  --copy_sensor_mask : Copy sensor mask to the output file
347 
348  -s <timestep> : Time step when data collection begins
349  (default = 1)
350 --------------------------------------------------------------------------
351 \endverbatim
352  *
353  *
354  *
355  *
356  *
357  *
358  * @section HDF5Files 4 HDF5 File Structure
359  *
360  * The C++ code has been designed as a standalone application which is not dependent
361  * on MATLAB libraries or a MEX interface. This is of particular importance when
362  * using servers and supercomputers without MATLAB support. For this reason, simulation
363  * data must be transferred between the C++ code and MATLAB using external input
364  * and output files. These files are stored using the Hierarchical Data Format
365  * HDF5 (http://www.hdfgroup.org/HDF5/). This is a data model, library, and file
366  * format for storing and managing data. It supports a variety of datatypes, and
367  * is designed for flexible and efficient I/O and for high volume and complex data.
368  * The HDF5 technology suite includes tools and applications for managing,
369  * manipulating, viewing, and analysing data in the HDF5 format.
370  *
371  *
372  *
373  * Each HDF5 file is a container for storing a variety of scientific data and is composed
374  * of two primary types of objects: groups and datasets. A HDF5 group is a structure
375  * containing zero or more HDF5 objects, together with supporting metadata. A HDF5
376  * group can be seen as a disk folder. A HDF5 dataset is a multidimensional array of
377  * data elements, together with supporting metadata. A HDF5 dataset can be seen as a
378  * disk file. Any HDF5 group or dataset may also have an associated attribute list. A
379  * HDF5 attribute is a user-defined HDF5 structure that provides extra information about a
380  * HDF5 object. More information can be obtained from the HDF5 documentation (http://www.hdfgroup.org/HDF5/doc/index.html).
381  *
382  *
383  * kspaceFirstOrder3D-OMP v1.1 introduces a new version of the HDF5 input and
384  * output file format. The code is happy to work with both versions (1.0 and 1.1), however when
385  * working with an input file of version 1.0, some features are not supported,
386  * namely the cuboid sensor mask, and u_non_staggered_raw.
387  * When running from within the actual K-Wave Toolbox, the files will always be generated
388  * in version 1.1
389  *
390  * The HDF5 input file for the C++ simulation code contains a file header with
391  * brief description of the simulation stored in string attributes,
392  * and the root group `/' which stores all the simulation properties in the form
393  * of 3D datasets (a complete list of input datasets is given bellow).
394  * The HDF5 checkpoint file contains the same file header as the input file and
395  * the root group `/' with a few datasets keeping the actual simulation state
396  * The HDF5 output file contains a file header with the simulation description
397  * as well as performance statistics, such as the simulation time and memory
398  * consumption, stored in string attributes.
399  * The results of the simulation are stored in the root group `/' in the form of 3D datasets.
400  *
401  *
402  * \verbatim
403 ==============================================================================================================
404  Input File/Checkpoint File Header
405 =============================================================================================================
406 created_by Short description of the tool that created this file
407 creation_date Date when the file was created
408 file_description Short description of the content of the file (e.g. simulation name)
409 file_type Type of the file (input)
410 major_version Major version of the file definition (1)
411 minor_version Minor version of the file definition (1)
412 ==============================================================================================================
413  \endverbatim
414  *
415  * \verbatim
416 ==============================================================================================================
417  Output File Header
418 ==============================================================================================================
419 created_by Short description of the tool that created this file
420 creation_date Date when the file was created
421 file_description Short description of the content of the file (e.g. simulation name)
422 file_type Type of the file (output)
423 major_version Major version of the file definition (1)
424 minor_version Minor version of the file definition (1)
425 -------------------------------------------------------------------------------------------------------------
426 host_names List of hosts (computer names) the simulation was executed on
427 number_of_cpu_cores Number of CPU cores used for the simulation
428 data_loading_phase_execution_time Time taken to load data from the file
429 pre-processing_phase_execution_time Time taken to pre-process data
430 simulation_phase_execution_time Time taken to run the simulation
431 post-processing_phase_execution_time Time taken to complete the post-processing phase
432 total_execution_time Total execution time
433 peak_core_memory_in_use Peak memory required per core during the simulation
434 total_memory_in_use Total Peak memory in use
435 ==============================================================================================================
436  \endverbatim
437  *
438  *
439  *
440  * The input and checkpoint file stores all quantities as three dimensional datasets with
441  * dimension sizes designed by <tt>(Nx, Ny, Nz)</tt>. In order to support scalars
442  * and 1D and 2D arrays, the unused dimensions are set to 1.
443  * For example, scalar variables are stored with a dimension size of <tt>(1,1,1)</tt>,
444  * 1D vectors oriented in y-direction are stored with a dimension size of <tt>(1, Ny, 1)</tt>,
445  * and so on. If the dataset stores a complex variable, the real and imaginary parts
446  * are stored in an interleaved layout and the lowest used dimension size is
447  * doubled (i.e., Nx for a 3D matrix, Ny for a 1D vector oriented in the y-direction). The
448  * datasets are physically stored in row-major order (in contrast to column-major order used
449  * by MATLAB) using either the <tt>`H5T_IEEE_F32LE'</tt> data type for floating point datasets or
450  * <tt>`H5T_STD_U64LE'</tt> for integer based datasets.
451  * All the datasets are store under the root group.
452  *
453  *
454  * The output file of version 1.0 could only store recorded quantities as 3D datasets
455  * under the root group. However, with version 1.1 and the new cuboid corner sensor
456  * mask, the sampled quantities may be laid out as 4D quantities stored under specific
457  * groups. The dimensions are always <tt>(Nx, Ny, Nz, Nt)</tt>, every sampled cuboid
458  * is stored as a distinct dataset and the datasets are grouped under a group named
459  * by the quantity stored. This makes the file clearly readable and easy to parse.
460  *
461  *
462  * In order to enable compression and more efficient data processing, big datasets
463  * are not stored as monolithic blocks
464  * but broken into chunks that may be compressed by the ZIP library and stored
465  * separately. The chunk size is defined as follows:
466  *
467  * \li <tt> (1M elements, 1, 1) </tt> in the case of 1D variables - index sensor mask (8MB blocks).
468  * \li <tt> (Nx, Ny, 1) </tt> in the case of 3D variables (one 2D slab).
469  * \li <tt> (Nx, Ny, Nz, 1) </tt> in the case of 4D variables (one time step).
470  * \li <tt> (N_sensor_points, 1, 1) </tt> in the case of the output time series (one time step of the simulation).
471  *
472  *
473  * All datasets have two attributes that specify the content of the dataset. The \c `data_type'
474  * attribute specifies the data type of the dataset. The admissible values are either \c `float' or
475  * \c `long'. The \c `domain_type' attribute specifies the domain of the dataset. The admissible
476  * values are either \c `real' for the real domain or \c `complex' for the complex domain. The
477  * C++ code reads these attributes and checks their values.
478  *
479  *
480  *
481  * \verbatim
482 ==============================================================================================================
483  Input File Datasets
484 ==============================================================================================================
485 Name Size Data type Domain Type Condition of Presence
486 ==============================================================================================================
487  1. Simulation Flags
488 --------------------------------------------------------------------------------------------------------------
489  ux_source_flag (1, 1, 1) long real
490  uy_source_flag (1, 1, 1) long real
491  uz_source_flag (1, 1, 1) long real
492  p_source_flag (1, 1, 1) long real
493  p0_source_flag (1, 1, 1) long real
494  transducer_source_flag (1, 1, 1) long real
495  nonuniform_grid_flag (1, 1, 1) long real must be set to 0
496  nonlinear_flag (1, 1, 1) long real
497  absorbing_flag (1, 1, 1) long real
498 --------------------------------------------------------------------------------------------------------------
499  2. Grid Properties
500 --------------------------------------------------------------------------------------------------------------
501  Nx (1, 1, 1) long real
502  Ny (1, 1, 1) long real
503  Nz (1, 1, 1) long real
504  Nt (1, 1, 1) long real
505  dt (1, 1, 1) float real
506  dx (1, 1, 1) float real
507  dy (1, 1, 1) float real
508  dz (1, 1, 1) float real
509  x_shift_neg_r (Nx/2+1, 1, 1) float complex File version 1.1
510  y_shift_neg_r (1, Ny/2+1, 1) float complex File version 1.1
511  z_shift_neg_r (1, 1, Nz/2+1) float complex File version 1.1
512 --------------------------------------------------------------------------------------------------------------
513  3 Medium Properties
514 --------------------------------------------------------------------------------------------------------------
515  3.1 Regular Medium Properties
516  rho0 (Nx, Ny, Nz) float real heterogenous
517  (1, 1, 1) float real homogenous
518  rho0_sgx (Nx, Ny, Nz) float real heterogenous
519  (1, 1, 1) float real homogenous
520  rho0_sgy (Nx, Ny, Nz) float real heterogenous
521  (1, 1, 1) float real homogenous
522  rho0_sgz (Nx, Ny, Nz) float real heterogenous
523  (1, 1, 1) float real homogenous
524  c0 (Nx, Ny, Nz) float real heterogenous
525  (1, 1, 1) float real homogenous
526  c_ref (1, 1, 1) float real
527 
528  3.2 Nonlinear Medium Properties (defined if (nonlinear_flag == 1))
529  BonA (Nx, Ny, Nz) float real heterogenous
530  (1, 1, 1) float real homogenous
531 
532  3.3 Absorbing Medium Properties (defined if (absorbing_flag == 1))
533  alpha_coef (Nx, Ny, Nz) float real heterogenous
534  (1, 1, 1) float real homogenous
535  alpha_power (1, 1, 1) float real
536 --------------------------------------------------------------------------------------------------------------
537  4. Sensor Variables
538 --------------------------------------------------------------------------------------------------------------
539  sensor_mask_type (1, 1, 1) long real File version 1.1 (0 = index, 1 = corners)
540  sensor_mask_index (Nsens, 1, 1) long real File version 1.0 always, File version 1.1 if sensor_mask_type == 0
541  sensor_mask_corners (Ncubes, 6, 1) long real File version 1.1, if sensor_mask_type == 1
542 --------------------------------------------------------------------------------------------------------------
543  5 Source Properties
544 --------------------------------------------------------------------------------------------------------------
545  5.1 Velocity Source Terms (defined if (ux_source_flag == 1 || uy_source_flag == 1 || uz_source_flag == 1))
546  u_source_mode (1, 1, 1) long real
547  u_source_many (1, 1, 1) long real
548  u_source_index (Nsrc, 1, 1) long real
549  ux_source_input (1, Nt_src, 1) float real u_source_many == 0
550  (Nsrc, Nt_src, 1) float real u_source_many == 1
551  uy_source_input (1, Nt_src, 1) float real u_source_many == 0
552  (Nsrc, Nt_src, 1) float real u_source_many == 1
553  uz_source_input (1, Nt_src, 1) float real u_source_many == 0
554  (Nt_src, Nsrc, 1) float real u_source_many == 1
555 
556  5.2 Pressure Source Terms (defined if p_source_flag == 1))
557  p_source_mode (1, 1, 1) long real
558  p_source_many (1, 1, 1) long real
559  p_source_index (Nsrc, 1, 1) long real
560  p_source_input (Nsrc, Nt_src, 1) float real p_source_many == 1
561  (1, Nt_src, 1) float real p_source_many == 0
562 
563  5.3 Transducer Source Terms (defined if (transducer_source_flag == 1))
564  u_source_index (Nsrc, 1, 1) long real
565  transducer_source_input (Nt_src, 1, 1) float real
566  delay_mask (Nsrc, 1, 1) float real
567 
568  5.4 IVP Source Terms (defined if ( p0_source_flag ==1))
569  p0_source_input (Nx, Ny, Nz) float real
570 --------------------------------------------------------------------------------------------------------------
571  6. K-space and Shift Variables
572 --------------------------------------------------------------------------------------------------------------
573  ddx_k_shift_pos_r (Nx/2 + 1, 1, 1) float complex
574  ddx_k_shift_neg_r (Nx/2 + 1, 1, 1) float complex
575  ddy_k_shift_pos (1, Ny, 1) float complex
576  ddy_k_shift_neg (1, Ny, 1) float complex
577  ddz_k_shift_pos (1, 1, Nz) float complex
578  ddz_k_shift_neg (1, 1, Nz) float complex
579 --------------------------------------------------------------------------------------------------------------
580  7. PML Variables
581 --------------------------------------------------------------------------------------------------------------
582  pml_x_size (1, 1, 1) long real
583  pml_y_size (1, 1, 1) long real
584  pml_z_size (1, 1, 1) long real
585  pml_x_alpha (1, 1, 1) float real
586  pml_y_alpha (1, 1, 1) float real
587  pml_z_alpha (1, 1, 1) float real
588 
589  pml_x (Nx, 1, 1) float real
590  pml_x_sgx (Nx, 1, 1) float real
591  pml_y (1, Ny, 1) float real
592  pml_y_sgy (1, Ny, 1) float real
593  pml_z (1, 1, Nz) float real
594  pml_z_sgz (1, 1, Nz) float real
595 ==============================================================================================================
596  \endverbatim
597 
598 \verbatim
599 ==============================================================================================================
600  Checkpoint File Datasets
601 ==============================================================================================================
602 Name Size Data type Domain Type Condition of Presence
603 ==============================================================================================================
604  1. Grid Properties
605 --------------------------------------------------------------------------------------------------------------
606  Nx (1, 1, 1) long real
607  Ny (1, 1, 1) long real
608  Nz (1, 1, 1) long real
609  Nt (1, 1, 1) long real
610  t_index (1, 1, 1) long real
611 --------------------------------------------------------------------------------------------------------------
612  2. Simulation state
613 --------------------------------------------------------------------------------------------------------------
614  p (Nx, Ny, Nz) float real
615  ux_sgx (Nx, Ny, Nz) float real
616  uy_sgy (Nx, Ny, Nz) float real
617  uz_sgz (Nx, Ny, Nz) float real
618  rhox (Nx, Ny, Nz) float real
619  rhoy (Nx, Ny, Nz) float real
620  rhoz (Nx, Ny, Nz) float real
621 --------------------------------------------------------------------------------------------------------------
622 \endverbatim
623 
624 
625  \verbatim
626 ==============================================================================================================
627  Output File Datasets
628 ==============================================================================================================
629 Name Size Data type Domain Type Condition of Presence
630 ==============================================================================================================
631  1. Simulation Flags
632 --------------------------------------------------------------------------------------------------------------
633  ux_source_flag (1, 1, 1) long real
634  uy_source_flag (1, 1, 1) long real
635  uz_source_flag (1, 1, 1) long real
636  p_source_flag (1, 1, 1) long real
637  p0_source_flag (1, 1, 1) long real
638  transducer_source_flag (1, 1, 1) long real
639  nonuniform_grid_flag (1, 1, 1) long real
640  nonlinear_flag (1, 1, 1) long real
641  absorbing_flag (1, 1, 1) long real
642  u_source_mode (1, 1, 1) long real if u_source
643  u_source_many (1, 1, 1) long real if u_source
644  p_source_mode (1, 1, 1) long real if p_source
645  p_source_many (1, 1, 1) long real if p_source
646 --------------------------------------------------------------------------------------------------------------
647  2. Grid Properties
648 --------------------------------------------------------------------------------------------------------------
649  Nx (1, 1, 1) long real
650  Ny (1, 1, 1) long real
651  Nz (1, 1, 1) long real
652  Nt (1, 1, 1) long real
653  dt (1, 1, 1) float real
654  dx (1, 1, 1) float real
655  dy (1, 1, 1) float real
656  dz (1, 1, 1) float real
657 -------------------------------------------------------------------------------------------------------------
658  3. PML Variables
659 --------------------------------------------------------------------------------------------------------------
660  pml_x_size (1, 1, 1) long real
661  pml_y_size (1, 1, 1) long real
662  pml_z_size (1, 1, 1) long real
663  pml_x_alpha (1, 1, 1) float real
664  pml_y_alpha (1, 1, 1) float real
665  pml_z_alpha (1, 1, 1) float real
666 
667  pml_x (Nx, 1, 1) float real
668  pml_x_sgx (Nx, 1, 1) float real
669  pml_y (1, Ny, 1) float real
670  pml_y_sgy (1, Ny, 1) float real
671  pml_z (1, 1, Nz) float real
672  pml_z_sgz (1, 1, Nz) float real
673 --------------------------------------------------------------------------------------------------------------
674  4. Sensor Variables (present if --copy_sensor_mask)
675 --------------------------------------------------------------------------------------------------------------
676  sensor_mask_type (1, 1, 1) long real File version 1.1 and --copy_sensor_mask
677  sensor_mask_index (Nsens, 1, 1) long real File version 1.1 and if sensor_mask_type == 0
678  sensor_mask_corners (Ncubes, 6, 1) long real File version 1.1 and if sensor_mask_type == 1
679 --------------------------------------------------------------------------------------------------------------
680  5a. Simulation Results: if sensor_mask_type == 0 (index), or File version == 1.0
681 --------------------------------------------------------------------------------------------------------------
682  p (Nsens, Nt - s, 1) float real -p or --p_raw
683  p_rms (Nsens, 1, 1) float real --p_rms
684  p_max (Nsens, 1, 1) float real --p_max
685  p_min (Nsens, 1, 1) float real --p_min
686  p_max_all (Nx, Ny, Nz) float real --p_max_all
687  p_min_all (Nx, Ny, Nz) float real --p_min_all
688  p_final (Nx, Ny, Nz) float real --p_final
689 
690 
691  ux (Nsens, Nt - s, 1) float real -u or --u_raw
692  uy (Nsens, Nt - s, 1) float real -u or --u_raw
693  uz (Nsens, Nt - s, 1) float real -u or --u_raw
694 
695  ux_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
696  uy_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
697  uz_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
698 
699  ux_rms (Nsens, 1, 1) float real --u_rms
700  uy_rms (Nsens, 1, 1) float real --u_rms
701  uz_rms (Nsens, 1, 1) float real --u_rms
702 
703  ux_max (Nsens, 1, 1) float real --u_max
704  uy_max (Nsens, 1, 1) float real --u_max
705  uz_max (Nsens, 1, 1) float real --u_max
706 
707  ux_min (Nsens, 1, 1) float real --u_min
708  uy_min (Nsens, 1, 1) float real --u_min
709  uz_min (Nsens, 1, 1) float real --u_min
710 
711  ux_max_all (Nx, Ny, Nz) float real --u_max_all
712  uy_max_all (Nx, Ny, Nz) float real --u_max_all
713  uz_max_all (Nx, Ny, Nz) float real --u_max_all
714 
715  ux_min_all (Nx, Ny, Nz) float real --u_min_all
716  uy_min_all (Nx, Ny, Nz) float real --u_min_all
717  uz_min_all (Nx, Ny, Nz) float real --u_min_all
718 
719  ux_final (Nx, Ny, Nz) float real --u_final
720  uy_final (Nx, Ny, Nz) float real --u_final
721  uz_final (Nx, Ny, Nz) float real --u_final
722 --------------------------------------------------------------------------------------------------------------
723  5b. Simulation Results: if sensor_mask_type == 1 (corners) and File version == 1.1
724 --------------------------------------------------------------------------------------------------------------
725  /p group of datasets, one per cuboid -p or --p_raw
726  /p/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
727  /p/2 (Cx, Cy, Cz, Nt-s) float real 2nd sampled cuboid, etc.
728 
729  /p_rms group of datasets, one per cuboid --p_rms
730  /p_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
731 
732  /p_max group of datasets, one per cuboid --p_max
733  /p_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
734 
735  /p_min group of datasets, one per cuboid --p_min
736  /p_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
737 
738  p_max_all (Nx, Ny, Nz) float real --p_max_all
739  p_min_all (Nx, Ny, Nz) float real --p_min_all
740  p_final (Nx, Ny, Nz) float real --p_final
741 
742 
743  /ux group of datasets, one per cuboid -u or --u_raw
744  /ux/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
745  /uy group of datasets, one per cuboid -u or --u_raw
746  /uy/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
747  /uz group of datasets, one per cuboid -u or --u_raw
748  /uz/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
749 
750  /ux_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
751  /ux_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
752  /uy_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
753  /uy_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
754  /uz_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
755  /uz_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
756 
757  /ux_rms group of datasets, one per cuboid --u_rms
758  /ux_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
759  /uy_rms group of datasets, one per cuboid --u_rms
760  /uy_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
761  /uz_rms group of datasets, one per cuboid --u_rms
762  /uy_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
763 
764  /ux_max group of datasets, one per cuboid --u_max
765  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
766  /uy_max group of datasets, one per cuboid --u_max
767  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
768  /uz_max group of datasets, one per cuboid --u_max
769  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
770 
771  /ux_min group of datasets, one per cuboid --u_min
772  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
773  /uy_min group of datasets, one per cuboid --u_min
774  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
775  /uz_min group of datasets, one per cuboid --u_min
776  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
777 
778  ux_max_all (Nx, Ny, Nz) float real --u_max_all
779  uy_max_all (Nx, Ny, Nz) float real --u_max_all
780  uz_max_all (Nx, Ny, Nz) float real --u_max_all
781 
782  ux_min_all (Nx, Ny, Nz) float real --u_min_all
783  uy_min_all (Nx, Ny, Nz) float real --u_min_all
784  uz_min_all (Nx, Ny, Nz) float real --u_min_all
785 
786  ux_final (Nx, Ny, Nz) float real --u_final
787  uy_final (Nx, Ny, Nz) float real --u_final
788  uz_final (Nx, Ny, Nz) float real --u_final
789 ==============================================================================================================
790 \endverbatim
791  *
792  *
793  */
794 
795 
796 
797 
798 #include <cstdlib>
799 #include <iostream>
800 #include <exception>
801 
802 #ifdef _OPENMP
803  #include <omp.h>
804 #endif
805 
807 
808 
809 using namespace std;
810 
811 
812 /// separator
813 static const char * FMT_SmallSeparator = "--------------------------------\n";
814 
815 /**
816  * The main function of the ksppaceFirstOrder3D-OMP
817  * @param [in] argc
818  * @param [in] argv
819  * @return
820  */
821 int main(int argc, char** argv)
822 {
823  // Create K-Space solver
824  TKSpaceFirstOrder3DSolver KSpaceSolver;
825 
826  // print header
827  fprintf(stdout,"%s",FMT_SmallSeparator);
828  fprintf(stdout," %s\n",KSpaceSolver.GetCodeName().c_str());
829  fprintf(stdout,"%s",FMT_SmallSeparator);
830 
831  // Create parameters and parse command line
832  TParameters* Parameters = TParameters::GetInstance();
833 
834  Parameters->ParseCommandLine(argc,argv);
835  if (Parameters->IsVersion())
836  {
837  KSpaceSolver.PrintFullNameCodeAndLicense(stdout);
838  return 0;
839  }
840 
841 
842  // set number of threads and bind them to cores
843  #ifdef _OPENMP
844  KSpaceSolver.SetProcessorAffinity();
845  omp_set_num_threads(Parameters->GetNumberOfThreads());
846  #endif
847 
848 
849  fprintf(stdout, "Number of CPU threads: %6ld\n", Parameters->GetNumberOfThreads());
850  KSpaceSolver.PrintParametersOfSimulation(stdout);
851 
852 
853  fprintf(stdout,"%s",FMT_SmallSeparator);
854  fprintf(stdout,"........ Initialization ........\n");
855  fprintf(stdout,"Memory allocation ..........");
856  fflush(stdout);
857 
858 
859  // allocate memory
860  try
861  {
862  KSpaceSolver.AllocateMemory();
863  }
864  catch (exception e)
865  {
866  fprintf(stdout, "Failed!\nK-Wave panic: Not enough memory to run this simulation!\n%s\n",e.what());
867  fprintf(stderr, "K-Wave panic: Not enough memory to run this simulation! \n%s\n",e.what());
868  return EXIT_FAILURE;
869  }
870  fprintf(stdout, "Done\n");
871 
872 
873  // Load data from disk
874  fprintf(stdout, "Data loading................");
875  fflush(stdout);
876  try
877  {
878  KSpaceSolver.LoadInputData();
879  }
880  catch (ios::failure e)
881  {
882  fprintf(stdout, "Failed!\nK-Wave panic: Data loading was not successful!\n%s\n",e.what());
883  fprintf(stderr, "K-Wave panic: Data loading was not successful! \n%s\n",e.what());
884  return EXIT_FAILURE;
885  }
886  fprintf(stdout, "Done\n");
887 
888  fprintf(stdout,"Elapsed time: %8.2fs\n",KSpaceSolver.GetDataLoadTime());
889 
890  if (Parameters->Get_t_index() > 0)
891  {
892  fprintf(stdout, "Recovered from t_index: %8ld\n", Parameters->Get_t_index());
893  }
894 
895  // start computation
896  fprintf(stdout,"%s",FMT_SmallSeparator);
897  fprintf(stdout, ".......... Computation .........\n");
898 
899  KSpaceSolver.Compute();
900 
901  fprintf(stdout,"%s",FMT_SmallSeparator);
902  fprintf(stdout, "............ Summary ...........\n");
903  fprintf(stdout, "Peak memory in use: %8ldMB\n",KSpaceSolver.ShowMemoryUsageInMB());
904  if (KSpaceSolver.GetCumulatedTotalTime() != KSpaceSolver.GetTotalTime())
905  {
906  fprintf(stdout,"This leg execution time:%7.2fs\n",KSpaceSolver.GetTotalTime());
907  }
908  fprintf(stdout, "Total execution time: %8.2fs\n",KSpaceSolver.GetCumulatedTotalTime());
909 
910 
911  fprintf(stdout,"%s",FMT_SmallSeparator);
912  fprintf(stdout," End of computation \n");
913  fprintf(stdout,"%s",FMT_SmallSeparator);
914 
915  return EXIT_SUCCESS;
916 }// end of main
917 //------------------------------------------------------------------------------
void ParseCommandLine(int argc, char **argv)
Parse command line.
Definition: Parameters.cpp:94
void PrintFullNameCodeAndLicense(FILE *file)
Print the code name and license.
double GetTotalTime() const
Get total simulation time.
size_t GetNumberOfThreads() const
Get number of threads.
Definition: Parameters.h:210
void SetProcessorAffinity()
Set processor affinity.
int main(int argc, char **argv)
Definition: main.cpp:821
bool IsVersion() const
Is –version specified at the command line?
Definition: Parameters.h:223
virtual size_t ShowMemoryUsageInMB()
Get memory usage in MB.
The header file containing the main class of the project responsible for the entire simulation...
static const char * FMT_SmallSeparator
separator
Definition: main.cpp:813
size_t Get_t_index() const
Get simulation time step.
Definition: Parameters.h:91
Class storing all parameters of the simulation.
Definition: Parameters.h:51
double GetCumulatedTotalTime() const
Get total simulation time cumulated over all legs.
virtual void LoadInputData()
Load simulation data from the input file.
virtual void PrintParametersOfSimulation(FILE *file)
Print parameters of the simulation.
string GetCodeName()
Get code name.
virtual void Compute()
Compute the 3D kspace first order simulation.
virtual void AllocateMemory()
Memory allocation.
double GetDataLoadTime() const
Get data load time.
Class responsible for running the k-space first order 3D method.
static TParameters * GetInstance()
Get instance of the singleton class.
Definition: Parameters.cpp:74