IMB-IO Non-blocking Benchmarks

Intel® MPI Benchmarks implements blocking and non-blocking modes of the IMB-IO benchmarks as different benchmark flavors. The Read and Write components of the blocking benchmark name are replaced for non-blocking flavors by IRead and IWrite, respectively.

The definitions of blocking and non-blocking flavors are identical, except for their behavior in regard to:

Basically, an elementary transfer looks as follows:

time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
{
    Initiate transfer
    Exploit CPU
    Wait for the end of transfer
}
time = (MPI_Wtime()-time)/n_sample

The Exploit CPU section in the above example is arbitrary. Intel® MPI Benchmarks exploits CPU as described below.
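The loop above can be sketched in plain C as follows. This is only an illustration, not the Intel® MPI Benchmarks source: the callback type phase_fn, the helpers wall_time and count_call, and the function time_elementary_transfer are hypothetical names, and timespec_get stands in for MPI_Wtime so that the sketch compiles without an MPI library. In a real non-blocking MPI-IO code, the initiate phase would typically be a call such as MPI_File_iwrite and the wait phase a call such as MPI_Wait.

```c
#include <time.h>

/* Hypothetical callback type standing in for one phase of the loop. */
typedef void (*phase_fn)(void *ctx);

/* Wall-clock seconds; a stand-in for MPI_Wtime() in this sketch. */
static double wall_time(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Trivial placeholder phase for illustration: counts how often it runs. */
static void count_call(void *ctx)
{
    ++*(int *)ctx;
}

/* Average time per elementary transfer over n_sample repetitions. */
static double time_elementary_transfer(int n_sample,
                                       phase_fn initiate,
                                       phase_fn exploit_cpu,
                                       phase_fn wait_done,
                                       void *ctx)
{
    double t = wall_time();
    for (int i = 0; i < n_sample; i++) {
        initiate(ctx);      /* Initiate transfer */
        exploit_cpu(ctx);   /* Exploit CPU */
        wait_done(ctx);     /* Wait for the end of transfer */
    }
    return (wall_time() - t) / n_sample;
}
```

Passing count_call for all three phases with n_sample set to 4 invokes the placeholder 12 times and returns the average time of one pass through the loop body.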

Exploiting CPU

Intel® MPI Benchmarks uses the following method to exploit the CPU. A kernel loop is executed repeatedly. The kernel is a fully vectorizable multiplication of a 100x100 matrix with a vector. The function is scalable in the following way:

CPU_Exploit(float desired_time, int initialize);

The input value of desired_time sets, with a slight variance, how long the function executes the kernel loop. At the very beginning, the function is called with initialize=1 and an input value for desired_time. This call runs without any obstruction and determines an Mflop/s rate and a timing t_CPU that is as close as possible to desired_time. During the actual benchmarking, CPU_Exploit is called with initialize=0, concurrently with the particular I/O action, and always performs the same type and number of operations as in the initialization step.
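The calibrate-then-replay behavior described above could look roughly like the following sketch. It is not the actual CPU_Exploit implementation: the name cpu_exploit_sketch, its double return value, the 1 ms trial threshold, the matrix contents, and the count of 2 x 100 x 100 floating-point operations per kernel are all assumptions made for illustration.

```c
#include <time.h>

#define KERNEL_N 100                    /* 100x100 matrix, as in the text */

static float mat[KERNEL_N][KERNEL_N], vec[KERNEL_N], res[KERNEL_N];
static volatile float sink;             /* keeps the kernel results observable */
static long   n_reps = 0;               /* repetitions fixed by calibration */
static double t_cpu  = 0.0;             /* unobstructed time for n_reps kernels, in seconds */
static double mflops = 0.0;             /* rate measured during calibration */

static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Run the matrix-vector kernel `reps` times. */
static void run_kernels(long reps)
{
    for (long r = 0; r < reps; r++) {
        for (int i = 0; i < KERNEL_N; i++) {
            float s = 0.0f;
            for (int j = 0; j < KERNEL_N; j++)
                s += mat[i][j] * vec[j];
            res[i] = s;
        }
        /* Feed one result back and publish it, to discourage the
           compiler from discarding repetitions. */
        vec[0] = res[KERNEL_N - 1];
        sink = vec[0];
    }
}

/* Hypothetical stand-in for CPU_Exploit; returns the elapsed seconds.
   Calling it with initialize=0 before any initialize=1 call does no work. */
double cpu_exploit_sketch(float desired_time, int initialize)
{
    if (initialize) {
        for (int i = 0; i < KERNEL_N; i++) {
            vec[i] = 1.0f;
            for (int j = 0; j < KERNEL_N; j++)
                mat[i][j] = 1.0f / KERNEL_N;   /* averaging keeps values bounded */
        }
        /* Grow a trial batch until it runs long enough to be timed (assumed 1 ms). */
        long trial = 1;
        double t;
        for (;;) {
            double t0 = now_seconds();
            run_kernels(trial);
            t = now_seconds() - t0;
            if (t >= 1e-3)
                break;
            trial *= 2;
        }
        /* Scale the repetition count so that one call takes about desired_time. */
        n_reps = (long)((double)desired_time * (double)trial / t);
        if (n_reps < 1)
            n_reps = 1;
    }

    double t0 = now_seconds();
    run_kernels(n_reps);                /* same operations in both modes */
    double elapsed = now_seconds() - t0;

    if (initialize && elapsed > 0.0) {
        t_cpu  = elapsed;
        mflops = 2.0 * KERNEL_N * KERNEL_N * (double)n_reps / elapsed / 1e6;
    }
    return elapsed;
}
```

A call such as cpu_exploit_sketch(0.01f, 1) fixes n_reps and records t_cpu; later calls with initialize=0 repeat exactly n_reps kernels regardless of how long they take when competing with I/O.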

Displaying Results

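Three timings are crucial to interpret the behavior of non-blocking I/O overlapped with CPU exploitation: t_pure (the I/O action without concurrent CPU activity), t_CPU (the CPU exploitation running unobstructed), and t_ovrl (the I/O action overlapped with CPU exploitation).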

A perfect overlap means: t_ovrl = max(t_pure,t_CPU)

No overlap means: t_ovrl = t_pure+t_CPU.

The actual amount of overlap is:

overlap = (t_pure + t_CPU - t_ovrl) / min(t_pure, t_CPU)    (*)

The Intel® MPI Benchmarks result tables report the timings t_ovrl, t_pure, and t_CPU, and the estimated overlap obtained by the (*) formula above. At the beginning of a run, the Mflop/s rate corresponding to t_CPU is displayed.
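As a quick check of the (*) formula, the following minimal sketch evaluates it for given timings. The function name overlap_estimate is hypothetical, and the sketch assumes that both t_pure and t_CPU are positive.

```c
/* Estimated overlap per formula (*): 1.0 means perfect overlap,
   0.0 means no overlap. Assumes t_pure > 0 and t_cpu > 0. */
static double overlap_estimate(double t_pure, double t_cpu, double t_ovrl)
{
    double t_min = (t_pure < t_cpu) ? t_pure : t_cpu;
    return (t_pure + t_cpu - t_ovrl) / t_min;
}
```

With t_pure = 2 and t_CPU = 4, a measured t_ovrl of 4 gives an overlap of 1 (perfect overlap), a t_ovrl of 6 gives 0 (no overlap), and a t_ovrl of 5 gives 0.5.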
