Command-line Control

The command line is repeated in the output. The general command-line syntax is the following:

IMB-MPI1    [-h{elp}]
            [-npmin     <NPmin>]
            [-multi     <MultiMode>]
            [-off_cache <cache_size[,cache_line_size]>]
            [-iter      <msgspersample[,overall_vol[,msgs_nonaggr]]>]
            [-time     <max_runtime per sample>]
            [-mem       <max. mem usage per process>]
            [-msglen    <Lengths_file>]
            [-map       <PxQ>]
            [-input     <filename>]
            [-include]  [benchmark1 [,benchmark2 [,...]]]
            [-exclude]  [benchmark1 [,benchmark2 [,...]]]
            [-msglog [<minlog>:]<maxlog>]
            [benchmark1 [,benchmark2 [,...]]]

The options may appear in any order.

Examples:

Get out-of-cache data for PingPong:
mpirun -np 2  IMB-MPI1 pingpong -off_cache -1

Run a very large configuration: restrict iterations to 20, max. 1.5 seconds run time per message size, max. 2 GBytes for message buffers:

mpirun -np 512 IMB-MPI1 -npmin 512
       alltoallv -iter 20 -time 1.5 -mem 2
 

Other examples:

mpirun -np 8  IMB-IO
mpirun -np 10 IMB-MPI1 PingPing Reduce
mpirun -np 11 IMB-EXT  -npmin 5
mpirun -np 14 IMB-IO P_Read_shared -npmin 7
 
mpirun -np 3  IMB-EXT  -input IMB_SELECT_EXT
mpirun -np 14 IMB-MPI1 -multi 0 PingPong Barrier
                       -map 2x7
mpirun -np 16 IMB-MPI1 -msglog 2:7 -include PingPongSpecificSource
                       PingPingSpecificSource -exclude Alltoall Alltoallv
mpirun -np 4  IMB-MPI1 -msglog 16 PingPong PingPing PingPongSpecificSource PingPingSpecificSource

Benchmark Selection Arguments

Benchmark selection arguments are a sequence of blank-separated strings. Each argument is the name of a benchmark in exact spelling, case insensitive.

For example, the string IMB-MPI1 PingPong Allreduce specifies that you want to run PingPong and Allreduce benchmarks only.

Default: no benchmark selection. All benchmarks of the selected component are run.

-npmin Option

Specifies the minimum number of processes P_min to run all selected benchmarks on. The P_min value after -npmin must be an integer.

Given P_min, the benchmarks run on the processes with the numbers selected as follows:

P_min, 2P_min, 4P_min, ..., largest 2^x P_min < P, P

NOTE:

You may set P_min to 1. If you set P_min > P, Intel MPI Benchmarks interprets this value as P_min = P.

Default: no -npmin selection. Active processes are selected as described in the Running Intel® MPI Benchmarks section.
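
The -npmin selection rule can be illustrated with a short Python sketch. The function name active_process_counts is hypothetical and not part of Intel® MPI Benchmarks; the code only reproduces the sequence described in this section.

```python
def active_process_counts(p_min, p_total):
    """Return the process counts used for a given -npmin value.

    p_min   -- the value passed after -npmin (P_min)
    p_total -- the overall number of processes P started by mpirun
    """
    if p_min < 1 or p_total < 1:
        raise ValueError("P_min and P must be positive integers")
    # A P_min larger than P is interpreted as P_min = P.
    p_min = min(p_min, p_total)
    counts = []
    n = p_min
    # P_min, 2*P_min, 4*P_min, ... as long as the value stays below P.
    while n < p_total:
        counts.append(n)
        n *= 2
    # The full process count P always closes the sequence.
    counts.append(p_total)
    return counts


print(active_process_counts(5, 11))   # [5, 10, 11]
print(active_process_counts(7, 14))   # [7, 14]
```

The two calls correspond to the examples "mpirun -np 11 IMB-EXT -npmin 5" and "mpirun -np 14 IMB-IO P_Read_shared -npmin 7" shown earlier.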

-multi outflag Option

Defines whether the benchmark runs in the multiple mode. The argument after -multi is an integer <outflag>, either 0 or 1, that controls how the results are displayed.

When the number of processes running the benchmark is more than half of the overall number of processes in MPI_COMM_WORLD, the multiple benchmark coincides with the non-multiple one, as not more than one process group can be created.

Default: no -multi selection. Intel® MPI Benchmarks run non-multiple benchmark flavors.
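
The remark about process groups can be checked with a small sketch. It assumes that the multiple mode splits MPI_COMM_WORLD into as many disjoint groups of consecutive ranks as fit; this grouping scheme and the function name process_groups are illustrative assumptions, not taken from the benchmark sources.

```python
def process_groups(world_size, group_size):
    """Split ranks 0..world_size-1 into disjoint groups of group_size ranks.

    Assumption: groups consist of consecutive ranks, and ranks that do not
    fill a complete group stay idle.
    """
    if not 1 <= group_size <= world_size:
        raise ValueError("group_size must be between 1 and world_size")
    n_groups = world_size // group_size
    return [list(range(g * group_size, (g + 1) * group_size))
            for g in range(n_groups)]


# 14 processes, 2 per group: seven groups can run at the same time.
print(len(process_groups(14, 2)))   # 7
# 8 of 14 processes is more than half, so only one group fits.
print(len(process_groups(14, 8)))   # 1
```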

-off_cache cache_size[,cache_line_size] Option

Use the -off_cache flag to avoid cache re-usage. If you do not use this flag (default), the communications buffer is the same within all repetitions of one message size sample. In this case, Intel® MPI Benchmarks reuses the cache, so throughput results might be non-realistic.

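The argument after -off_cache can be a single number (cache_size), two comma-separated numbers (cache_size,cache_line_size), or -1. Here cache_size is the size of the last-level cache in MB, cache_line_size is the cache line size, and -1 selects the default values defined in IMB_mem_info.h.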

The sent/received data is stored in buffers of size ~2x MAX(cache_size, message_size). When repetitively using messages of a particular size, their addresses are advanced within those buffers so that a single message is at least 2 cache lines after the end of the previous message. When these buffers are filled up, they are reused from the beginning.
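
The buffer handling described above can be sketched as follows. This is a simplified model, not the actual implementation: it assumes all sizes are given in bytes, takes the buffer size as exactly 2 x MAX(cache_size, message_size), and places each message exactly 2 cache lines after the end of the previous one. The function name off_cache_offsets is hypothetical.

```python
def off_cache_offsets(message_size, cache_size, cache_line_size, repetitions):
    """Return the buffer size and the start offset used by each repetition.

    All sizes are in bytes (the -off_cache argument itself takes the
    cache size in MB).
    """
    buffer_size = 2 * max(cache_size, message_size)
    # The next message starts 2 cache lines after the end of this one.
    stride = message_size + 2 * cache_line_size
    offsets = []
    offset = 0
    for _ in range(repetitions):
        # When the buffer is filled up, reuse it from the beginning.
        if offset + message_size > buffer_size:
            offset = 0
        offsets.append(offset)
        offset += stride
    return buffer_size, offsets


size, offsets = off_cache_offsets(64, 256, 64, 5)
print(size)      # 512
print(offsets)   # [0, 192, 384, 0, 192]
```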

-off_cache is effective for IMB-MPI1 and IMB-EXT. Using this option with IMB-IO is not recommended.

Examples

Use the default values defined in IMB_mem_info.h:

-off_cache -1

2.5 MB last level cache, default line size:

-off_cache 2.5

16 MB last level cache, line size 128:

-off_cache 16,128

The -off_cache mode might also be influenced by internal caching in the Intel® MPI Library, which can complicate the interpretation of results.

Default: no cache control. Data may come out of cache.

-iter Option

Use this option to control iterations. The argument after -iter can be one, two, or three comma-separated integers that override the default values of MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR defined in IMB_settings.h.

Examples

-iter 2000        (override MSGSPERSAMPLE with 2000)
-iter 1000,100    (additionally override OVERALL_VOL with 100)
-iter 1000,40,150 (additionally override MSGS_NONAGGR with 150)
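
How the comma-separated values map to the three settings can be sketched in Python. The parser below is an illustration only; parse_iter is a hypothetical helper, and settings that are not listed in the argument are simply left out of the result (they keep their defaults from IMB_settings.h).

```python
def parse_iter(arg):
    """Map an -iter argument to the settings it overrides, in order."""
    names = ("MSGSPERSAMPLE", "OVERALL_VOL", "MSGS_NONAGGR")
    values = [int(v) for v in arg.split(",")]
    if not 1 <= len(values) <= 3:
        raise ValueError("-iter takes one, two, or three integers")
    # zip() stops at the shorter sequence, so trailing settings
    # that were not given do not appear in the result.
    return dict(zip(names, values))


print(parse_iter("2000"))
# {'MSGSPERSAMPLE': 2000}
print(parse_iter("1000,40,150"))
# {'MSGSPERSAMPLE': 1000, 'OVERALL_VOL': 40, 'MSGS_NONAGGR': 150}
```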

The -iter option is overridden by a dynamic selection that is the new default in Intel® MPI Benchmarks 3.2: when the maximum run time (per sample) is expected to be exceeded, the number of iterations is cut down. See the -time option.

Default: iteration control through parameters MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR defined in IMB_settings.h.

-time Option

Specifies the number of seconds for the benchmark to run per message size. The argument after -time is a floating-point number.

The combination of this flag with the -iter flag or its default alternative ensures that the Intel MPI Benchmarks always chooses the maximum number of repetitions that conform to all restrictions.

A rough number of repetitions per sample to fulfill the -time request is estimated in preparatory runs that use ~1 second overhead.
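
The interplay of the -iter and -time restrictions can be pictured with the following sketch. It is a simplified model under stated assumptions: only the iteration limit and the time limit are considered, the per-message time is taken from a hypothetical preparatory estimate, and at least one repetition is always run. The function name repetitions_for_sample is not part of the benchmark.

```python
import math


def repetitions_for_sample(iter_limit, secs_per_sample, est_secs_per_message):
    """Largest repetition count that respects both the -iter and -time limits.

    iter_limit           -- repetitions allowed by -iter (or its default)
    secs_per_sample      -- value passed after -time
    est_secs_per_message -- estimated time of one repetition, in seconds
    """
    if est_secs_per_message <= 0:
        return iter_limit
    # Repetitions that fit into the time budget for this message size.
    time_limit = math.floor(secs_per_sample / est_secs_per_message)
    # Assumption: never drop below a single repetition.
    return max(1, min(iter_limit, time_limit))


print(repetitions_for_sample(20, 1.5, 0.001))    # 20 (the -iter limit applies)
print(repetitions_for_sample(1000, 1.5, 0.25))   # 6  (the -time limit applies)
```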

Default: -time is activated. The floating-point value specifying the run-time seconds per sample is set in the SECS_PER_SAMPLE variable defined in IMB_settings.h/IMB_settings_io.h. The current value is 10.

-mem Option

Specifies the number of GB that can be allocated per process for the message buffers. If this size is exceeded, a warning is returned, stating how much memory is required for the overall run not to be interrupted.

The argument after -mem is a floating-point number.

Default: the memory is restricted by MAX_MEM_USAGE defined in IMB_mem_info.h.

-input <File> Option

Use an ASCII input file to select the benchmarks. For example, the IMB_SELECT_EXT file looks as follows:

#
# IMB benchmark selection file
#
# Every line must be a comment (beginning with #), or it
# must contain exactly one IMB benchmark name
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate

With the help of this file, the following command runs only Unidir_Get and Accumulate benchmarks of the IMB-EXT component:

mpirun .... IMB-EXT -input IMB_SELECT_EXT
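
The file format can be illustrated with a small reader written in Python. This is a sketch of the rule stated in the file header (each line is a comment or exactly one benchmark name), not the parser used by Intel® MPI Benchmarks; the function name read_benchmark_selection is hypothetical, and skipping blank lines is an assumption.

```python
def read_benchmark_selection(text):
    """Return the benchmark names selected by the contents of an -input file."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        # Comment lines start with '#'; blank lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        names.append(line)
    return names


sample = """#
# IMB benchmark selection file
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate
"""
print(read_benchmark_selection(sample))   # ['Unidir_Get', 'Accumulate']
```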

-msglen <File> Option

Enter any set of non-negative message lengths into an ASCII file, one per line, and call the Intel® MPI Benchmarks with the arguments:

-msglen Lengths

The Lengths value overrides the default message lengths. For IMB-IO, the file defines the I/O portion lengths.
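
A lengths file can be produced by any tool that writes one number per line. The Python sketch below builds the file contents as a string; format_lengths_file is a hypothetical helper, and the chosen lengths are arbitrary example values.

```python
def format_lengths_file(lengths):
    """Return the contents of a -msglen file: one message length per line."""
    if any(n < 0 for n in lengths):
        raise ValueError("message lengths must be non-negative")
    return "".join(f"{n}\n" for n in lengths)


contents = format_lengths_file([0, 100, 1000, 10000])
print(contents, end="")
# 0
# 100
# 1000
# 10000
```

Writing the returned string to a file named Lengths gives a file that can be passed as "-msglen Lengths".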

-map PxQ Option

Numbers processes along rows of the matrix:

0      P      ...    (Q-2)P      (Q-1)P
1
...
P-1    2P-1   ...    (Q-1)P-1    QP-1

For example, to run Multi-PingPong between two nodes of size P, with each process on one node communicating with its counterpart on the other, call:

mpirun -np <2P> IMB-MPI1 -map <P>x2 PingPong
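
The numbering can be made concrete with a short Python sketch. It builds the P x Q matrix shown above and then reads it row by row, which is how this section describes the renumbering; the function name map_numbering is hypothetical and the code is an interpretation of the description, not the benchmark's own logic.

```python
def map_numbering(p, q):
    """Return the P x Q matrix of original ranks and its row-by-row order."""
    # Column c holds the ranks c*P .. c*P + P-1, as in the matrix above.
    matrix = [[col * p + row for col in range(q)] for row in range(p)]
    # Processes are numbered along the rows of the matrix.
    order = [rank for row in matrix for rank in row]
    return matrix, order


matrix, order = map_numbering(3, 2)
print(matrix)   # [[0, 3], [1, 4], [2, 5]]
print(order)    # [0, 3, 1, 4, 2, 5]
```

With -map 3x2, consecutive pairs in the new order are (0, 3), (1, 4), and (2, 5), so each process on the first node is paired with its counterpart on the second node.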

-include [[benchmark1] benchmark2 ...]

Specifies the list of additional benchmarks to run. For example, to add PingPongSpecificSource and PingPingSpecificSource benchmarks, call:

mpirun -np 2 IMB-MPI1 -include PingPongSpecificSource PingPingSpecificSource

-exclude [[benchmark1] benchmark2 ...]

Specifies the list of benchmarks to be excluded from the run. For example, to exclude Alltoall and Allgather, call:

mpirun -np 2 IMB-MPI1 -exclude Alltoall Allgather

-msglog [<minlog>:]<maxlog>

This option allows you to control the lengths of the transfer messages. This setting overrides the MINMSGLOG and MAXMSGLOG values. The new message sizes are 0, 2^minlog, ..., 2^maxlog.

For example, try running the following command line:

mpirun -np 2 IMB-MPI1 -msglog 3:7 PingPong

Intel® MPI Benchmarks selects the lengths 0,8,16,32,64,128, as shown below:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.70         0.00
            8         1000         0.73        10.46
           16         1000         0.74        20.65
           32         1000         0.94        32.61
           64         1000         0.94        65.14
          128         1000         1.06       115.16

Alternatively, you can specify only the maxlog value. With -msglog 3, for example, the selected lengths are 0,1,2,4,8:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.69         0.00
            1         1000         0.72         1.33
            2         1000         0.71         2.69
            4         1000         0.72         5.28
            8         1000         0.73        10.47
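
The message sizes selected by -msglog follow directly from the rule 0, 2^minlog, ..., 2^maxlog and can be reproduced with a few lines of Python. The function name msglog_lengths is hypothetical; the default minlog of 0 is an assumption inferred from the second sample output, which starts at 1 byte.

```python
def msglog_lengths(maxlog, minlog=0):
    """Return the message lengths selected by -msglog [<minlog>:]<maxlog>."""
    if not 0 <= minlog <= maxlog:
        raise ValueError("expected 0 <= minlog <= maxlog")
    # Length 0 is always included, followed by the powers of two.
    return [0] + [2 ** k for k in range(minlog, maxlog + 1)]


print(msglog_lengths(7, minlog=3))   # [0, 8, 16, 32, 64, 128]
print(msglog_lengths(3))             # [0, 1, 2, 4, 8]
```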

-thread_level Option

This option specifies the desired thread level for MPI_Init_thread(). See the description of MPI_Init_thread() for details. The option is available only if Intel® MPI Benchmarks is built with the USE_MPI_INIT_THREAD macro defined. Possible values for <level> are single, funneled, serialized, and multiple.
