5.1.4 Definition of the IMB-EXT Benchmarks

This section describes the benchmarks in detail. The benchmarks run with varying transfer sizes X (in bytes), and the timings are averaged over multiple samples. The descriptions below refer to a single sample with a fixed transfer size X.

The Unidir (Bidir) benchmarks are the one-sided equivalents of the message-passing PingPong (PingPing, respectively). Their interpretation and output are analogous to those of their message-passing counterparts.

5.1.4.1 Unidir_Put

This is the benchmark for the MPI_Put function. The basic definitions and a schematic view of the pattern are given below.

Table 5-3: Unidir_Put definition

measured pattern: as symbolized in the figure below; 2 active processes only
based on: MPI_Put
MPI_Datatype: MPI_BYTE (origin and target)
reported timings: t=t(M) (in msec), as indicated in the figure below; non-aggregate (M=1) and aggregate (M=n_sample; see the discussion of non-aggregate and aggregate modes)
reported throughput: X/t, aggregate and non-aggregate

Figure 5-2: Unidir_Put pattern
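For illustration, the following is a minimal C sketch of what one non-aggregate Unidir_Put sample (M=1) could look like. The helper name unidir_put_sample, the fence-based synchronization, the buffer handling, and the assumption that the communicator contains the 2 active processes are illustrative choices, not the actual Intel® MPI Benchmarks source.

    /* Sketch of one Unidir_Put sample, non-aggregate mode (M = 1):
     * rank 0 transfers X bytes into rank 1's window, and every transfer
     * is completed before the next one starts. Fence synchronization and
     * buffer handling are illustrative assumptions.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double unidir_put_sample(size_t X, int n_sample, MPI_Comm comm)
    {
        int rank;
        char *origin = malloc(X ? X : 1);   /* local source buffer      */
        char *target = malloc(X ? X : 1);   /* memory exposed in window */
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        MPI_Win_create(target, (MPI_Aint)X, 1, MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);                    /* open the first epoch */
        double t0 = MPI_Wtime();
        for (int i = 0; i < n_sample; i++) {
            if (rank == 0)                        /* unidirectional: only rank 0 puts */
                MPI_Put(origin, (int)X, MPI_BYTE, 1, 0, (int)X, MPI_BYTE, win);
            MPI_Win_fence(0, win);                /* non-aggregate: complete each transfer */
        }
        double t = (MPI_Wtime() - t0) / n_sample; /* average time per transfer (seconds) */

        MPI_Win_free(&win);
        free(origin);
        free(target);
        return t;                                 /* throughput would be X / t */
    }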

5.1.4.2 Unidir_Get

This is the benchmark for the MPI_Get function. The basic definitions and a schematic view of the pattern are given below.

Table 5-4: Unidir_Get definition

measured pattern: as symbolized in the figure below; 2 active processes only
based on: MPI_Get
MPI_Datatype: MPI_BYTE (origin and target)
reported timings: t=t(M) (in msec), as indicated in the figure below; non-aggregate (M=1) and aggregate (M=n_sample; see the discussion of non-aggregate and aggregate modes)
reported throughput: X/t, aggregate and non-aggregate

Figure 5-3: Unidir_Get pattern 
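Unidir_Get differs from Unidir_Put only in the transfer direction, so the sketch below instead illustrates the aggregate mode (M=n_sample), where all M transfers are issued before a single completing synchronization. The helper name unidir_get_aggregate, the fence synchronization, and the per-transfer origin slices are illustrative assumptions, not the actual implementation.

    /* Sketch of one Unidir_Get sample in aggregate mode (M = n_sample):
     * rank 0 issues M MPI_Get calls and completes them with a single fence.
     * Distinct origin slices are used so that no two pending gets update
     * the same local memory.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double unidir_get_aggregate(size_t X, int n_sample, MPI_Comm comm)
    {
        int rank;
        char *origin  = malloc((size_t)n_sample * (X ? X : 1)); /* one slice per pending get */
        char *exposed = malloc(X ? X : 1);                      /* memory read by rank 0     */
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        MPI_Win_create(exposed, (MPI_Aint)X, 1, MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);                          /* open the epoch */
        double t0 = MPI_Wtime();
        if (rank == 0)
            for (int i = 0; i < n_sample; i++)          /* issue M transfers back to back */
                MPI_Get(origin + (size_t)i * X, (int)X, MPI_BYTE,
                        1, 0, (int)X, MPI_BYTE, win);
        MPI_Win_fence(0, win);                          /* aggregate: one completion for all M */
        double t = (MPI_Wtime() - t0) / n_sample;       /* t(M): average per transfer (seconds) */

        MPI_Win_free(&win);
        free(origin);
        free(exposed);
        return t;
    }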

5.1.4.3 Bidir_Put

This is the benchmark for the MPI_Put function with bi-directional transfers. The basic definitions and a schematic view of the pattern are given below.

Table 5-5: Bidir_Put definition

measured pattern: as symbolized in the figure below; 2 active processes only
based on: MPI_Put
MPI_Datatype: MPI_BYTE (origin and target)
reported timings: t=t(M) (in msec), as indicated in the figure below; non-aggregate (M=1) and aggregate (M=n_sample; see the discussion of non-aggregate and aggregate modes)
reported throughput: X/t, aggregate and non-aggregate

Figure 5-4: Bidir_Put pattern
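In the bi-directional case both active processes issue an MPI_Put in every iteration, so X bytes travel in each direction per completed transfer. A minimal sketch of one non-aggregate sample follows; the helper name bidir_put_sample, the fence synchronization, and the assumption of a communicator holding exactly the 2 active processes are illustrative.

    /* Sketch of one Bidir_Put sample, non-aggregate mode: ranks 0 and 1
     * simultaneously put X bytes into each other's window.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double bidir_put_sample(size_t X, int n_sample, MPI_Comm comm)
    {
        int rank, peer;
        char *origin  = malloc(X ? X : 1);
        char *exposed = malloc(X ? X : 1);
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        peer = 1 - rank;                              /* 0 <-> 1 */

        MPI_Win_create(exposed, (MPI_Aint)X, 1, MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        double t0 = MPI_Wtime();
        for (int i = 0; i < n_sample; i++) {
            MPI_Put(origin, (int)X, MPI_BYTE, peer, 0, (int)X, MPI_BYTE, win);
            MPI_Win_fence(0, win);                    /* complete both directions */
        }
        double t = (MPI_Wtime() - t0) / n_sample;

        MPI_Win_free(&win);
        free(origin);
        free(exposed);
        return t;                                     /* throughput: X / t per direction */
    }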

5.1.4.4 Bidir_Get

This is the benchmark for the MPI_Get function with bi-directional transfers. The basic definitions and a schematic view of the pattern are given below.

Table 5-6: Bidir_Get definition

measured pattern: as symbolized in the figure below; 2 active processes only
based on: MPI_Get
MPI_Datatype: MPI_BYTE (origin and target)
reported timings: t=t(M) (in msec), as indicated in the figure below; non-aggregate (M=1) and aggregate (M=n_sample; see the discussion of non-aggregate and aggregate modes)
reported throughput: X/t, aggregate and non-aggregate

Figure 5-5: Bidir_Get pattern
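The corresponding Bidir_Get sketch only swaps the RMA call; the structure and the assumptions (hypothetical helper name, fence synchronization, 2-process communicator) match the Bidir_Put sketch above.

    /* Sketch of one Bidir_Get sample, non-aggregate mode: ranks 0 and 1
     * simultaneously read X bytes from each other's window.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double bidir_get_sample(size_t X, int n_sample, MPI_Comm comm)
    {
        int rank, peer;
        char *origin  = malloc(X ? X : 1);
        char *exposed = malloc(X ? X : 1);
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        peer = 1 - rank;

        MPI_Win_create(exposed, (MPI_Aint)X, 1, MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        double t0 = MPI_Wtime();
        for (int i = 0; i < n_sample; i++) {
            MPI_Get(origin, (int)X, MPI_BYTE, peer, 0, (int)X, MPI_BYTE, win);
            MPI_Win_fence(0, win);
        }
        double t = (MPI_Wtime() - t0) / n_sample;

        MPI_Win_free(&win);
        free(origin);
        free(exposed);
        return t;
    }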

5.1.4.5 MPI_Accumulate

This is the benchmark for the MPI_Accumulate function. It reduces a vector of length L = X/sizeof(float) of float items; the MPI data type is MPI_FLOAT and the MPI operation is MPI_SUM. The basic definitions and a schematic view of the pattern are given below.

Table 5-7: Accumulate definition

measured pattern: as symbolized in the figure below; 2 active processes only
based on: MPI_Accumulate
MPI_Datatype: MPI_FLOAT
MPI_Op: MPI_SUM
Root: 0
reported timings: t=t(M) (in msec), as indicated in the figure below; non-aggregate (M=1) and aggregate (M=n_sample; see the discussion of non-aggregate and aggregate modes)
reported throughput: none

Figure 5-6: Accumulate pattern
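A minimal sketch of one non-aggregate Accumulate sample follows: each active process adds its local vector of L = X/sizeof(float) floats into the window of root 0 with MPI_SUM. The helper name accumulate_sample, the fence synchronization, and the zero-initialized buffers are illustrative assumptions.

    /* Sketch of one Accumulate sample, non-aggregate mode: the local float
     * vector is reduced (MPI_SUM) into root 0's window.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double accumulate_sample(size_t X, int n_sample, MPI_Comm comm)
    {
        int rank;
        int L = (int)(X / sizeof(float));             /* vector length */
        float *origin = calloc(L ? L : 1, sizeof(float));
        float *window = calloc(L ? L : 1, sizeof(float));
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        MPI_Win_create(window, (MPI_Aint)(L * sizeof(float)),
                       (int)sizeof(float), MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        double t0 = MPI_Wtime();
        for (int i = 0; i < n_sample; i++) {
            /* reduce the local vector into root 0's window */
            MPI_Accumulate(origin, L, MPI_FLOAT, 0, 0, L, MPI_FLOAT, MPI_SUM, win);
            MPI_Win_fence(0, win);                    /* non-aggregate: complete each sample */
        }
        double t = (MPI_Wtime() - t0) / n_sample;

        MPI_Win_free(&win);
        free(origin);
        free(window);
        return t;                                     /* no throughput is reported */
    }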

5.1.4.6 Window

This is the benchmark for measuring the overhead of an MPI_Win_create / MPI_Win_fence / MPI_Win_free combination. To prevent the implementation from optimizing away an otherwise unused window, a negligible but non-trivial action is performed inside the window. The MPI_Win_fence function is called to properly initialize an access epoch (this is a correction as compared to earlier releases of the Intel® MPI Benchmarks).

The basic definitions and a schematic view of the pattern are given below.

Table 5-8: Window definition

measured pattern: MPI_Win_create / MPI_Win_fence / MPI_Win_free
reported timings: t=Δt (in msec), as indicated in the figure below
reported throughput: none

Figure 5-7: Window pattern
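A minimal sketch of one Window sample, timing the complete create/fence/free sequence, is shown below. The helper name window_sample and the concrete choice of "negligible non-trivial action" are illustrative assumptions.

    /* Sketch of one Window sample: time the full
     * MPI_Win_create / MPI_Win_fence / MPI_Win_free sequence for an
     * X-byte window.
     */
    #include <mpi.h>
    #include <stdlib.h>

    double window_sample(size_t X, MPI_Comm comm)
    {
        char *buf = malloc(X ? X : 1);
        MPI_Win win;

        double t0 = MPI_Wtime();
        MPI_Win_create(buf, (MPI_Aint)X, 1, MPI_INFO_NULL, comm, &win);
        MPI_Win_fence(0, win);            /* properly initialize an access epoch */
        buf[0] = (char)(X & 0xFF);        /* illustrative "negligible non-trivial action" */
        MPI_Win_free(&win);
        double dt = MPI_Wtime() - t0;     /* Δt reported by the benchmark (seconds) */

        free(buf);
        return dt;
    }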
