Intel® MPI Benchmarks 3.2.4
The command line is repeated in the output. The general command-line syntax is the following:
IMB-MPI1 [-h{elp}] [-npmin <NPmin>] [-multi <MultiMode>] [-off_cache <cache_size[,cache_line_size]>] [-iter <msgspersample[,overall_vol[,msgs_nonaggr]]>] [-time <max_runtime per sample>] [-mem <max. mem usage per process>] [-msglen <Lengths_file>] [-map <PxQ>] [-input <filename>] [-include] [benchmark1 [,benchmark2 [,...]]] [-exclude] [benchmark1 [,benchmark2 [,...]]] [-msglog [<minlog>:]<maxlog>] [benchmark1 [,benchmark2 [,...]]]
The options may appear in any order.
Examples:
Get out-of-cache data for PingPong:
mpirun -np 2 IMB-MPI1 pingpong -off_cache -1
Run a very large configuration: restrict iterations to 20, max. 1.5 seconds run time per message size, max. 2 GBytes for message buffers:
mpirun -np 512 IMB-MPI1 -npmin 512 alltoallv -iter 20 -time 1.5 -mem 2
Other examples:
mpirun -np 8 IMB-IO
mpirun -np 10 IMB-MPI1 PingPing Reduce
mpirun -np 11 IMB-EXT -npmin 5
mpirun -np 14 IMB-IO P_Read_shared -npmin 7
mpirun -np 3 IMB-EXT -input IMB_SELECT_EXT
mpirun -np 14 IMB-MPI1 -multi 0 PingPong Barrier -map 2x7
mpirun -np 16 IMB-MPI1 -msglog 2:7 -include PingPongSpecificsource PingPingSpecificsource -exclude Alltoall Alltoallv
mpirun -np 4 IMB-MPI1 -msglog 16 PingPong PingPing PingPongSpecificsource PingPingSpecificsource
Benchmark selection arguments are a sequence of blank-separated strings. Each argument is the name of a benchmark in exact spelling, case insensitive.
For example, the string IMB-MPI1 PingPong Allreduce specifies that you want to run PingPong and Allreduce benchmarks only.
Default: no benchmark selection. All benchmarks of the selected component are run.
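The selection rule above (exact spelling, case insensitive, everything runs when nothing is selected) can be illustrated with a minimal Python sketch. The function name select_benchmarks and the choice to raise an error on an unknown name are assumptions made for illustration; they do not describe the actual Intel MPI Benchmarks implementation.

```python
def select_benchmarks(args, available):
    """Resolve blank-separated benchmark arguments against the known names.

    Hypothetical helper: matching is case insensitive, and an empty
    selection means "run every benchmark of the component".
    """
    if not args:
        return list(available)
    by_lower = {name.lower(): name for name in available}
    selected = []
    for arg in args:
        key = arg.lower()
        if key not in by_lower:
            # Assumption: reject unknown names instead of ignoring them.
            raise ValueError(f"unknown benchmark: {arg}")
        selected.append(by_lower[key])
    return selected
```

For instance, the arguments pingpong and ALLREDUCE resolve to PingPong and Allreduce when both names are in the list of available benchmarks.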
The -npmin option specifies the minimum number of processes P_min on which to run all selected benchmarks. The P_min value after -npmin must be an integer.
Given P_min and the total number of processes P, the benchmarks run on the following numbers of processes:
P_min, 2 P_min, 4 P_min, ..., largest 2^x P_min < P, P
NOTE:
You may set P_min to 1. If you set P_min > P, Intel MPI Benchmarks interprets this value as P_min = P.
Default: no -npmin selection. Active processes are selected as described in the Running Intel® MPI Benchmarks section.
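The -npmin sequence of process counts described above can be reproduced with a short Python sketch. The function name active_process_counts is hypothetical, and the code only mirrors the rule as stated in this section (doubling from P_min while staying below P, then P itself, with P_min clamped to P).

```python
def active_process_counts(p_min: int, p: int) -> list[int]:
    """Return P_min, 2*P_min, 4*P_min, ..., largest 2^x*P_min < P, P."""
    if p_min < 1 or p < 1:
        raise ValueError("P_min and P must be positive integers")
    p_min = min(p_min, p)  # P_min > P is interpreted as P_min = P
    counts = []
    n = p_min
    while n < p:
        counts.append(n)
        n *= 2
    counts.append(p)  # the full process count is always the last entry
    return counts
```

Under this reading, mpirun -np 11 IMB-EXT -npmin 5 would run each benchmark on 5, 10, and 11 processes.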
The -multi option defines whether the benchmark runs in multiple mode. The argument after -multi is a meta-symbol <outflag> that takes an integer value of 0 or 1. This flag controls how results are displayed:
Outflag = 0: display only the maximum timings (minimum throughputs) over all active groups.
Outflag = 1: report on all groups separately. The report may be long in this case.
When the number of processes running the benchmark is more than half of the overall number of processes in MPI_COMM_WORLD, the multiple benchmark coincides with the non-multiple one, as not more than one process group can be created.
Default: no -multi selection. Intel® MPI Benchmarks run non-multiple benchmark flavors.
Use the -off_cache flag to avoid cache reuse. If you do not use this flag (default), the communications buffer is the same within all repetitions of one message size sample. In this case, Intel® MPI Benchmarks reuses the cache, so throughput results might be unrealistic.
The argument after -off_cache can be a single number (cache_size), two comma-separated numbers (cache_size,cache_line_size), or -1:
cache_size is a float for an upper bound of the size of the last level cache, in MB.
cache_line_size is assumed to be the size of a last level cache line (can be an upper estimate).
-1 indicates that the default values from IMB_mem_info.h should be used. The cache_size and cache_line_size values are assumed to be statically defined in IMB_mem_info.h.
The sent/received data is stored in buffers of size ~2x MAX(cache_size, message_size). When repetitively using messages of a particular size, their addresses are advanced within those buffers so that a single message is at least 2 cache lines after the end of the previous message. When these buffers are filled up, they are reused from the beginning.
-off_cache is effective for IMB-MPI1 and IMB-EXT. Using this option with IMB-IO is not recommended.
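The buffer handling described for -off_cache can be sketched in Python as follows. This is only an illustration of the stated rules (a buffer of roughly 2 x MAX(cache_size, message_size), each message starting at least 2 cache lines after the end of the previous one, wrap-around when the buffer is full). The function name off_cache_offsets is hypothetical, all sizes are taken in bytes for simplicity, and the exact layout used by Intel MPI Benchmarks may differ.

```python
def off_cache_offsets(msg_size, cache_size, cache_line_size, repetitions):
    """Return the buffer offset used for each repetition of one message size."""
    buf_size = 2 * max(cache_size, msg_size)
    # The next message starts 2 cache lines after the end of the previous one.
    stride = msg_size + 2 * cache_line_size
    offsets = []
    offset = 0
    for _ in range(repetitions):
        if offset + msg_size > buf_size:
            offset = 0  # buffer filled up: reuse it from the beginning
        offsets.append(offset)
        offset += stride
    return offsets
```

With a 100-byte message, a 256-byte cache, and 16-byte cache lines, the first four repetitions use offsets 0, 132, 264, and 396, and the fifth wraps back to 0.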
Examples
Use the default values defined in IMB_mem_info.h:
-off_cache -1
2.5 MB last level cache, default line size:
-off_cache 2.5
16 MB last level cache, line size 128:
-off_cache 16,128
The -off_cache mode might also be influenced by possible internal caching in the Intel® MPI Library. This could complicate the interpretation of results.
Default: no cache control. Data may come out of cache.
Use the -iter option to control iterations. The argument after -iter can be one, two, or three comma-separated integers that override the default values of MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR defined in IMB_settings.h.
Examples
-iter 2000 (override MSGSPERSAMPLE by value 2000)
-iter 1000,100 (override OVERALL_VOL by 100)
-iter 1000,40,150 (override MSGS_NONAGGR by 150)
The -iter option is overridden by a dynamic selection that is a new default in Intel® MPI Benchmarks 3.2: when the maximum run time (per sample) is expected to be exceeded, the number of iterations is cut down. See the -time option.
Default: iteration control through parameters MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR defined in IMB_settings.h.
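How the -iter argument maps onto the three settings can be shown with a minimal Python sketch. The function name parse_iter is hypothetical, and the default values are passed in by the caller because this section does not state them; the sketch only shows the positional override of MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR.

```python
def parse_iter(arg, defaults):
    """Apply a '-iter' argument such as '1000,40,150' to a dict of defaults."""
    names = ("MSGSPERSAMPLE", "OVERALL_VOL", "MSGS_NONAGGR")
    values = [int(v) for v in arg.split(",")]
    if not 1 <= len(values) <= len(names):
        raise ValueError("-iter takes one, two, or three integers")
    settings = dict(defaults)
    # Values are positional: the first overrides MSGSPERSAMPLE, and so on.
    settings.update(zip(names, values))
    return settings
```

Settings not named in the argument keep the value supplied in defaults, so -iter 2000 changes MSGSPERSAMPLE only.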
The -time option specifies the number of seconds for the benchmark to run per message size. The argument after -time is a floating-point number.
The combination of this flag with the -iter flag or its default alternative ensures that the Intel MPI Benchmarks always chooses the maximum number of repetitions that conform to all restrictions.
A rough number of repetitions per sample to fulfill the -time request is estimated in preparatory runs that use ~1 second overhead.
Default: -time is activated. The floating-point value specifying the run-time seconds per sample is set in the SECS_PER_SAMPLE variable defined in IMB_settings.h/IMB_settings_io.h. The current value is 10.
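The interplay between -iter and -time can be pictured as taking the largest repetition count that satisfies every limit. The Python sketch below is an assumption-laden illustration, not the Intel MPI Benchmarks algorithm: the function name repetitions is hypothetical, the volume limit is assumed to be OVERALL_VOL divided by the message size, and est_secs_per_msg stands in for the estimate obtained from the preparatory runs.

```python
def repetitions(msg_size, msgs_per_sample, overall_vol,
                secs_per_sample, est_secs_per_msg):
    """Largest repetition count that respects all three limits (at least 1)."""
    limit = msgs_per_sample
    if msg_size > 0:
        # Assumed volume restriction: do not move more than overall_vol bytes.
        limit = min(limit, max(1, overall_vol // msg_size))
    if est_secs_per_msg > 0:
        # Run-time restriction from -time, based on the estimated cost per message.
        limit = min(limit, max(1, int(secs_per_sample / est_secs_per_msg)))
    return limit
```

In this sketch, a slow 1 KB message estimated at 0.5 seconds per repetition is cut down to 20 repetitions by a 10-second limit, even though the iteration and volume limits would allow more.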
The -mem option specifies the number of GB that can be allocated per process for the message buffers. If the size is exceeded, a warning is returned, stating how much memory is required for the overall run not to be interrupted.
The argument after -mem is a floating-point number.
Default: the memory is restricted by MAX_MEM_USAGE defined in IMB_mem_info.h.
Use the -input option to select the benchmarks through an ASCII input file. For example, the IMB_SELECT_EXT file looks as follows:
#
# IMB benchmark selection file
#
# Every line must be a comment (beginning with #), or it
# must contain exactly one IMB benchmark name
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate
With the help of this file, the following command runs only Unidir_Get and Accumulate benchmarks of the IMB-EXT component:
mpirun .... IMB-EXT -input IMB_SELECT_EXT
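The format of the selection file is simple enough to sketch a reader for it in Python. The function name selected_benchmarks is hypothetical, and skipping blank lines is an assumption (the file header only mentions comment lines and lines with exactly one benchmark name).

```python
def selected_benchmarks(text):
    """Return the benchmark names listed in a selection file's contents."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment line (or blank line, by assumption)
        names.append(line)
    return names
```

Applied to the IMB_SELECT_EXT contents shown above, this yields Unidir_Get and Accumulate, matching the two benchmarks that the command runs.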
Use the -msglen option to supply your own message lengths. Enter any set of non-negative message lengths into an ASCII file, one per line, and call the Intel® MPI Benchmarks with the arguments:
-msglen Lengths
The Lengths value overrides the default message lengths. For IMB-IO, the file defines the I/O portion lengths.
The -map option numbers the processes along the rows of a matrix with P columns and Q rows:
0      | 1   | ... | P-1
P      | ... | ... | 2P-1
...    | ... | ... | ...
(Q-2)P | ... | ... | (Q-1)P-1
(Q-1)P | ... | ... | QP-1
For example, to run Multi-PingPong between two nodes of size P, with each process on one node communicating with its counterpart on the other, call:
mpirun -np <2P> IMB-MPI1 -map <P>x2 PingPong
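The row-wise numbering used by -map can be made concrete with a small Python sketch. The function names map_matrix and column_groups are hypothetical. Grouping the processes by column is an inference from the Multi-PingPong example above (each process paired with its counterpart on the other node) and assumes that ranks 0 to P-1 are placed on the first node.

```python
def map_matrix(p, q):
    """Number ranks 0 .. p*q-1 along the rows of a matrix with p columns and q rows."""
    return [[row * p + col for col in range(p)] for row in range(q)]


def column_groups(p, q):
    """Collect the ranks of each column (assumed to form one process group)."""
    matrix = map_matrix(p, q)
    return [[matrix[row][col] for row in range(q)] for col in range(p)]
```

For -map 3x2 this gives the rows [0, 1, 2] and [3, 4, 5], and the column pairs (0, 3), (1, 4), and (2, 5).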
The -include option specifies the list of additional benchmarks to run. For example, to add PingPongSpecificSource and PingPingSpecificSource benchmarks, call:
mpirun -np 2 IMB-MPI1 -include PingPongSpecificSource PingPingSpecificSource
The -exclude option specifies the list of benchmarks to be excluded from the run. For example, to exclude Alltoall and Allgather, call:
mpirun -np 2 IMB-MPI1 -exclude Alltoall Allgather
The -msglog option allows you to control the lengths of the transfer messages. This setting overrides the MINMSGLOG and MAXMSGLOG values. The new message sizes are 0, 2^minlog, ..., 2^maxlog.
For example, try running the following command line:
mpirun -np 2 IMB-MPI1 -msglog 3:7 PingPong
Intel® MPI Benchmarks selects the lengths 0,8,16,32,64,128, as shown below:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[μsec] Mbytes/sec
0 1000 0.70 0.00
8 1000 0.73 10.46
16 1000 0.74 20.65
32 1000 0.94 32.61
64 1000 0.94 65.14
128 1000 1.06 115.16
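Alternatively, you can specify only the maxlog value. The following output corresponds to a maxlog value of 3, for which the selected lengths are 0, 1, 2, 4, 8: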
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[μsec] Mbytes/sec
0 1000 0.69 0.00
1 1000 0.72 1.33
2 1000 0.71 2.69
4 1000 0.72 5.28
8 1000 0.73 10.47
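The message lengths selected by -msglog follow directly from the rule 0, 2^minlog, ..., 2^maxlog, which a short Python sketch can reproduce. The function name msglog_lengths is hypothetical, and treating a missing minlog as 0 is an inference from the second output table above, not a documented default.

```python
def msglog_lengths(arg):
    """Return the message lengths for a '-msglog' argument ('minlog:maxlog' or 'maxlog')."""
    if ":" in arg:
        minlog_text, maxlog_text = arg.split(":", 1)
        minlog, maxlog = int(minlog_text), int(maxlog_text)
    else:
        minlog, maxlog = 0, int(arg)  # assumption: minlog falls back to 0
    if minlog < 0 or maxlog < minlog:
        raise ValueError("expected 0 <= minlog <= maxlog")
    # Length 0 is always included, followed by the powers of two.
    return [0] + [2 ** k for k in range(minlog, maxlog + 1)]
```

The argument 3:7 gives 0, 8, 16, 32, 64, 128, and the argument 3 gives 0, 1, 2, 4, 8, in line with the two output tables.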
The -thread_level option specifies the desired thread level for MPI_Init_thread(). See the description of MPI_Init_thread() for details. The option is available only if Intel® MPI Benchmarks is built with the USE_MPI_INIT_THREAD macro defined. Possible values for <level> are single, funneled, serialized, and multiple.