This section describes the benchmarks in detail. Each benchmark runs with varying transfer sizes X (in bytes), and the timings are averaged over multiple samples. The descriptions below cover a single sample with a fixed transfer size X.
The Unidir (Bidir) benchmarks are exact equivalents of the message passing PingPong. Their interpretation and output are analogous to those of their message passing equivalents.
This is the benchmark for the MPI_Put function. The basic definitions and a schematic view of the pattern are given below; a code sketch follows the figure.
Table 5-3: Unidir_Put definition
| measured pattern | as symbolized in the figure below |
| based on | MPI_Put |
| MPI_Datatype | MPI_BYTE (origin and target) |
| reported timings | t = t(M) (in msec), as indicated in the figure below; non-aggregate (M = 1) and aggregate (M = n_sample) |
| reported throughput | X/t, aggregate and non-aggregate |
Figure 5-2: Unidir_Put pattern
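For illustration only, the following minimal sketch shows a Unidir_Put-style measurement for two processes. It is not the Intel® MPI Benchmarks source; the values of X and M and the use of MPI_Win_fence for synchronization are assumptions made for the example, and the non-aggregate form (one fence per transfer) is shown, with the aggregate form noted in a comment.

```c
/* Minimal sketch of a Unidir_Put-style measurement (illustrative only, not
 * the IMB source). Rank 0 transfers X bytes into a window exposed by rank 1
 * using MPI_Put with MPI_Win_fence synchronization; run with 2 processes.
 * Aggregate mode would issue all M puts and close the epoch with a single
 * fence instead of fencing each transfer. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, X = 1 << 20, M = 100;              /* assumed transfer size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *win_buf = malloc(X), *src = malloc(X);
    MPI_Win win;
    MPI_Win_create(win_buf, X, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                       /* open the first access epoch */
    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        if (rank == 0)
            MPI_Put(src, X, MPI_BYTE, 1, 0, X, MPI_BYTE, win);
        MPI_Win_fence(0, win);                   /* completes the put on both sides */
    }
    double t = (MPI_Wtime() - t0) / M;           /* time per fenced transfer */

    if (rank == 0)
        printf("X = %d bytes: t = %g sec, throughput = %g MB/s\n",
               X, t, X / t / 1e6);

    MPI_Win_free(&win);
    free(win_buf);
    free(src);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with two processes, for example `mpirun -np 2 ./unidir_put` (the binary name is arbitrary).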
This is the benchmark for the MPI_Get function. The basic definitions and a schematic view of the pattern are given below.
Table 5-4: Unidir_Get definition
| measured pattern | as symbolized in the figure below |
| based on | MPI_Get |
| MPI_Datatype | MPI_BYTE (origin and target) |
| reported timings | t = t(M) (in msec), as indicated in the figure below; non-aggregate (M = 1) and aggregate (M = n_sample) |
| reported throughput | X/t, aggregate and non-aggregate |
Figure 5-3: Unidir_Get pattern
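A corresponding hedged sketch for Unidir_Get differs only in the direction of data flow: rank 0 reads from the window exposed by rank 1 into a local buffer (same assumptions as the Put sketch above).

```c
/* Minimal Unidir_Get-style sketch (illustrative, not the IMB source): rank 0
 * reads X bytes from the window exposed by rank 1; run with 2 processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, X = 1 << 20, M = 100;              /* assumed transfer size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *win_buf = malloc(X), *dst = malloc(X);
    MPI_Win win;
    MPI_Win_create(win_buf, X, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        if (rank == 0)                           /* data moves from rank 1's window to rank 0 */
            MPI_Get(dst, X, MPI_BYTE, 1, 0, X, MPI_BYTE, win);
        MPI_Win_fence(0, win);                   /* completes the get */
    }
    if (rank == 0)
        printf("t = %g sec per transfer\n", (MPI_Wtime() - t0) / M);

    MPI_Win_free(&win);
    free(win_buf);
    free(dst);
    MPI_Finalize();
    return 0;
}
```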
This is the benchmark for the MPI_Put function with bi-directional transfers. The basic definitions and a schematic view of the pattern are given below.
Table 5-5: Bidir_Put definition
| measured pattern | as symbolized in the figure below |
| based on | MPI_Put |
| MPI_Datatype | MPI_BYTE (origin and target) |
| reported timings | t = t(M) (in msec), as indicated in the figure below; non-aggregate (M = 1) and aggregate (M = n_sample) |
| reported throughput | X/t, aggregate and non-aggregate |
Figure 5-4: Bidir_Put pattern
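As a hedged illustration of the bidirectional case (same assumptions as above, not the benchmark source), both ranks issue MPI_Put into each other's window within the same fence epoch:

```c
/* Minimal Bidir_Put-style sketch (illustrative): ranks 0 and 1 put X bytes
 * into each other's window inside the same fence epoch; run with 2 processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, X = 1 << 20, M = 100;              /* assumed transfer size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int partner = 1 - rank;                      /* rank 0 <-> rank 1 */

    char *win_buf = malloc(X), *src = malloc(X);
    MPI_Win win;
    MPI_Win_create(win_buf, X, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        MPI_Put(src, X, MPI_BYTE, partner, 0, X, MPI_BYTE, win);  /* both directions at once */
        MPI_Win_fence(0, win);
    }
    if (rank == 0)
        printf("t = %g sec per bidirectional transfer\n", (MPI_Wtime() - t0) / M);

    MPI_Win_free(&win);
    free(win_buf);
    free(src);
    MPI_Finalize();
    return 0;
}
```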
This is the benchmark for the MPI_Get function with bi-directional transfers. The basic definitions and a schematic view of the pattern are given below.
Table 5-6: Bidir_Get definition
| measured pattern | as symbolized in the figure below |
| based on | MPI_Get |
| MPI_Datatype | MPI_BYTE (origin and target) |
| reported timings | t = t(M) (in msec), as indicated in the figure below; non-aggregate (M = 1) and aggregate (M = n_sample) |
| reported throughput | X/t, aggregate and non-aggregate |
Figure 5-5: Bidir_Get pattern
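The Bidir_Get case can be sketched analogously (again an illustration only, under the same assumptions): both ranks read from each other's window concurrently.

```c
/* Minimal Bidir_Get-style sketch (illustrative): ranks 0 and 1 concurrently
 * read X bytes from each other's window; run with 2 processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, X = 1 << 20, M = 100;              /* assumed transfer size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int partner = 1 - rank;                      /* rank 0 <-> rank 1 */

    char *win_buf = malloc(X), *dst = malloc(X);
    MPI_Win win;
    MPI_Win_create(win_buf, X, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        MPI_Get(dst, X, MPI_BYTE, partner, 0, X, MPI_BYTE, win);  /* both ranks read at once */
        MPI_Win_fence(0, win);
    }
    if (rank == 0)
        printf("t = %g sec per bidirectional transfer\n", (MPI_Wtime() - t0) / M);

    MPI_Win_free(&win);
    free(win_buf);
    free(dst);
    MPI_Finalize();
    return 0;
}
```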
This is the benchmark for the MPI_Accumulate function. It reduces a vector of L = X/sizeof(float) float items. The MPI data type is MPI_FLOAT, and the MPI operation is MPI_SUM. The basic definitions and a schematic view of the pattern are given below.
Table 5-7: Accumulate definition
| measured pattern | as symbolized in the figure below |
| based on | MPI_Accumulate |
| MPI_Datatype | MPI_FLOAT |
| MPI_Op | MPI_SUM |
| Root | 0 |
| reported timings | t = t(M) (in msec), as indicated in the figure below; non-aggregate (M = 1) and aggregate (M = n_sample) |
| reported throughput | none |
Figure 5-6: Accumulate pattern
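The following minimal sketch illustrates the Accumulate pattern (not the benchmark source; X, M, and the fence synchronization are assumptions): every process adds a vector of L = X/sizeof(float) floats into the window of root 0 with MPI_SUM.

```c
/* Minimal Accumulate-style sketch (illustrative): each rank adds L floats
 * into the window exposed by root 0 using MPI_SUM. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, X = 1 << 20, M = 100;              /* assumed transfer size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int L = X / sizeof(float);                   /* vector length as defined above */

    float *win_buf = calloc(L, sizeof(float));   /* target buffer on every rank; root 0 is used */
    float *src     = malloc(L * sizeof(float));
    for (int i = 0; i < L; i++) src[i] = 1.0f;

    MPI_Win win;
    MPI_Win_create(win_buf, X, sizeof(float), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        MPI_Accumulate(src, L, MPI_FLOAT, 0, 0, L, MPI_FLOAT, MPI_SUM, win);
        MPI_Win_fence(0, win);                   /* completes the reduction at root 0 */
    }
    if (rank == 0)
        printf("t = %g sec per accumulate\n", (MPI_Wtime() - t0) / M);

    MPI_Win_free(&win);
    free(win_buf);
    free(src);
    MPI_Finalize();
    return 0;
}
```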
This is the benchmark for measuring the overhead of an MPI_Win_create/MPI_Win_fence/MPI_Win_free combination. To prevent the implementation from optimizing away an otherwise unused window, a negligible but non-trivial action is performed inside the window. The MPI_Win_fence function is called to properly initialize an access epoch (a correction compared to earlier releases of the Intel® MPI Benchmarks).
The basic definitions and a schematic view of the pattern are given below.
Table 5-8: Window definition
| measured pattern | MPI_Win_create/MPI_Win_fence/MPI_Win_free |
| reported timings | t = Δt (in msec), as indicated in the figure below |
| reported throughput | none |
Figure 5-7: Window pattern
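A minimal sketch of the Window pattern is shown below (illustrative only; the window size, the sample count, and the 1-byte MPI_Put used as the negligible non-trivial action are assumptions made for the example).

```c
/* Minimal Window-style sketch (illustrative): times the full
 * MPI_Win_create / MPI_Win_fence / MPI_Win_free sequence, with a tiny put
 * inside the window so the window is not left entirely unused. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, X = 1 << 20, M = 100;        /* assumed window size and sample count */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *buf = malloc(X);
    char one = 1;

    double t0 = MPI_Wtime();
    for (int i = 0; i < M; i++) {
        MPI_Win win;
        MPI_Win_create(buf, X, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);                   /* opens an access epoch */
        if (rank == 0)                           /* negligible non-trivial action */
            MPI_Put(&one, 1, MPI_BYTE, size - 1, 0, 1, MPI_BYTE, win);
        MPI_Win_fence(0, win);
        MPI_Win_free(&win);
    }
    if (rank == 0)
        printf("t = %g sec per create/fence/free cycle\n", (MPI_Wtime() - t0) / M);

    free(buf);
    MPI_Finalize();
    return 0;
}
```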