Hi Chao,
Matrices FFTW_X_temp, FFTW_Y_temp, and FFTW_Z_temp
serve as temporary storage for data in the complex domain at various places of the code.
Any time I want to take FFT of something, I need to store the result somewhere. I usually need to take FFTs of three different spatial components, e.g. ux, uy, uz calculate duxdx, dyuduy, duzdz, so that’s why I need three temporary matrices. In the special case of pressure, I don’t need to take FFT of pressure 3 times, as the result will always be the same. That’s why I do so only once and store the result in FFT_X_temp
. That’s the first line of Compute_ddx_kappa_fft_p()
.
However, when I multiply the FFT(p) with ddx/ddy/ddz and kappa I get 3 different results. Thus I need two more temporary matrices and store the output in FFT_X_temp, FFT_Y_temp and FFT_Z_temp in order to take the inverse FFT to calculate ux, uy, uz.
Let’s dismantle the Matlab code for this part (the C++ code follows the MATLAB):
% calculate ux, uy and uz at the next time step using dp/dx, dp/dy and
% dp/dz at the current time step
ux_sgx = bsxfun(@times, pml_x_sgx, ...
bsxfun(@times, pml_x_sgx, ux_sgx) ...
- dt./rho0_sgx .* real(ifftn( ...
bsxfun(@times, ddx_k_shift_pos, kappa .* p_k) )));
uy_sgy = bsxfun(@times, pml_y_sgy, ...
bsxfun(@times, pml_y_sgy, uy_sgy) ...
- dt./rho0_sgy .* real(ifftn( ...
bsxfun(@times, ddy_k_shift_pos, kappa .* p_k) )));
uz_sgz = bsxfun(@times, pml_z_sgz, ...
bsxfun(@times, pml_z_sgz, uz_sgz) ...
- dt./rho0_sgz .* real(ifftn( ...
bsxfun(@times, ddz_k_shift_pos, kappa .* p_k) )));
In order to save computational resources I calculate all of these statements simultaneously (kappa and p_k are the same think in all 3 statements).
1) We take fft(p) and store the result in p_k
. (in the C code this isFFT_X_temp
)
2) We calculate
a. bsxfun(@times, ddx_k_shift_pos, kappa .* p_k)
b. bsxfun(@times, ddy_k_shift_pos, kappa .* p_k)
c. bsxfun(@times, ddz_k_shift_pos, kappa .* p_k)
We’ll do this at once meaning taking an element from p_k, multiplying it with kappa (which is same in all statements) and store this in a single float variable say p_k_element
. Then we carry out bsxfun on this element and ddx/ddy/ddz and store three results into target elements in FFT_X_temp, FFT_Y_temp, FFT_Z_temp
3) Now, we will take inverse FFTs and we need some place where to store it. We can’t override ux_sgx
or uy_sgy
or uz_sgz
because we will need them. So we have to do
a. Temp_1_RS3D = real(ifft(FFT_X_temp));
b. Temp_2_RS3D = real(ifft(FFT_Y_temp));
c. Temp_3_RS3D = real(ifft(FFT_Z_temp));
4) Now we finish the rest of the statements:
a. ux_sgx = bsxfun(@times, pml_x_sgx, bsxfun(@times, pml_x_sgx, ux_sgx) .- dt./rho0_sgx .* Temp_1_RS3D );
b. uy_sgy = bsxfun(@times, pml_y_sgy, bsxfun(@times, pml_y_sgy, uy_sgy) .- dt./rho0_sgy .* Temp_2_RS3D );
c. uz_sgz = bsxfun(@times, pml_z_sgz, bsxfun(@times, pml_z_sgz, uz_sgz) .- dt./rho0_sgz .* Temp_3_RS3D );
Please note that, the Temp_1_RS3D, Temp_2_RS3D and Temp_3_RS3D
matrices are used at different places in the code in different meanings. Simply said, always I need a memory buffer I use one of these matrices since dynamic allocation could be very very very expensive!
The code makes a lot of compromises to the readability in order to squeeze every possible FLOP and B/s from the HW :-)
Jiri