Update docs
@@ -38,14 +38,14 @@ produce an executable compatible with a specific hardware.
    :class: note

    Kokkos with CUDA currently implicitly assumes that the MPI library is
-   CUDA-aware. This is not always the case, especially when using
+   GPU-aware. This is not always the case, especially when using
    pre-compiled MPI libraries provided by a Linux distribution. This is
    not a problem when using only a single GPU with a single MPI
    rank. When running with multiple MPI ranks, you may see segmentation
-   faults without CUDA-aware MPI support. These can be avoided by adding
-   the flags :doc:`-pk kokkos cuda/aware off <Run_options>` to the
+   faults without GPU-aware MPI support. These can be avoided by adding
+   the flags :doc:`-pk kokkos gpu/aware off <Run_options>` to the
    LAMMPS command line or by using the command :doc:`package kokkos
-   cuda/aware off <package>` in the input file.
+   gpu/aware off <package>` in the input file.

 .. admonition:: AMD GPU support
    :class: note
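Note: as a minimal sketch of the run-option route in this hunk, the command
line might look as follows (the executable name ``lmp``, the GPU count, and
the input file ``in.lj`` are illustrative assumptions, not part of the docs)::

   # 2 MPI ranks, one GPU each, with GPU-aware MPI disabled
   mpirun -np 2 lmp -k on g 2 -sf kk -pk kokkos gpu/aware off -in in.lj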
@@ -242,8 +242,8 @@ case, also packing/unpacking communication buffers on the host may give
 speedup (see the KOKKOS :doc:`package <package>` command). Using CUDA MPS
 is recommended in this scenario.

-Using a CUDA-aware MPI library is highly recommended. CUDA-aware MPI use can be
-avoided by using :doc:`-pk kokkos cuda/aware no <package>`. As above for
+Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
+avoided by using :doc:`-pk kokkos gpu/aware no <package>`. As above for
 multi-core CPUs (and no GPU), if N is the number of physical cores/node,
 then the number of MPI tasks/node should not exceed N.

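Note: the same effect can be sketched from the input file instead of the
command line, assuming the *off*/*on* values listed in the Syntax section
below::

   package kokkos gpu/aware off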
@@ -65,7 +65,7 @@ Syntax
   *no_affinity* values = none
 *kokkos* args = keyword value ...
   zero or more keyword/value pairs may be appended
-  keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *cuda/aware*
+  keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *gpu/aware*
   *neigh* value = *full* or *half*
     full = full neighbor list
     half = half neighbor list built in thread-safe manner
@@ -84,15 +84,15 @@ Syntax
       use value for comm/exchange and comm/forward and pair/comm/forward and fix/comm/forward and comm/reverse
   *comm/exchange* value = *no* or *host* or *device*
   *comm/forward* value = *no* or *host* or *device*
-  *pair/comm/forward* value = *no* or *host* or *device*
-  *fix/comm/forward* value = *no* or *host* or *device*
+  *pair/comm/forward* value = *no* or *device*
+  *fix/comm/forward* value = *no* or *device*
   *comm/reverse* value = *no* or *host* or *device*
     no = perform communication pack/unpack in non-KOKKOS mode
     host = perform pack/unpack on host (e.g. with OpenMP threading)
     device = perform pack/unpack on device (e.g. on GPU)
-  *cuda/aware* = *off* or *on*
-    off = do not use CUDA-aware MPI
-    on = use CUDA-aware MPI (default)
+  *gpu/aware* = *off* or *on*
+    off = do not use GPU-aware MPI
+    on = use GPU-aware MPI (default)
 *omp* args = Nthreads keyword value ...
   Nthreads = # of OpenMP threads to associate with each MPI process
   zero or more keyword/value pairs may be appended
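Note: to make the keyword/value pairing concrete, a hedged sketch of one
complete command using the renamed keyword (the particular values are chosen
for illustration only)::

   package kokkos neigh full newton off comm device gpu/aware on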
@@ -502,13 +502,15 @@ additional communication in fixes, such as fix SHAKE.
 The *comm* keyword is simply a short-cut to set the same value for all
 the comm keywords.

-The value options for all 3 keywords are *no* or *host* or *device*\ . A
+The value options for the keywords are *no* or *host* or *device*\ . A
 value of *no* means to use the standard non-KOKKOS method of
 packing/unpacking data for the communication. A value of *host* means to
 use the host, typically a multi-core CPU, and perform the
 packing/unpacking in parallel with threads. A value of *device* means to
 use the device, typically a GPU, to perform the packing/unpacking
-operation.
+operation. If a value of *host* is used for the *pair/comm/forward* or
+*fix/comm/forward* keyword, it will automatically be changed to *no*
+since these keywords don't support *host* mode.

 The optimal choice for these keywords depends on the input script and
 the hardware used. The *no* value is useful for verifying that the
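Note: an illustrative sketch of the fallback added in this hunk; with the
short-cut set to *host*, the *pair/comm/forward* and *fix/comm/forward*
keywords would automatically drop back to *no*::

   package kokkos comm host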
@@ -529,18 +531,18 @@ pack/unpack communicated data. When running small systems on a GPU,
 performing the exchange pack/unpack on the host CPU can give speedup
 since it reduces the number of CUDA kernel launches.

-The *cuda/aware* keyword chooses whether CUDA-aware MPI will be used. When
+The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
 this keyword is set to *on*\ , buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces the overhead of first copying
-the data to the host CPU. However, CUDA-aware MPI is not supported on all
+the data to the host CPU. However, GPU-aware MPI is not supported on all
 systems, which can lead to segmentation faults and would require using a
-value of *off*\ . If LAMMPS can safely detect that CUDA-aware MPI is not
+value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
 available (currently only possible with OpenMPI v2.0.0 or later), then
-the *cuda/aware* keyword is automatically set to *off* by default. When
-the *cuda/aware* keyword is set to *off* while any of the *comm*
+the *gpu/aware* keyword is automatically set to *off* by default. When
+the *gpu/aware* keyword is set to *off* while any of the *comm*
 keywords are set to *device*\ , the value for these *comm* keywords will
-be automatically changed to *host*\ . This setting has no effect if not
-running on GPUs or if using only one MPI rank. CUDA-aware MPI is available
+be automatically changed to *no*\ . This setting has no effect if not
+running on GPUs or if using only one MPI rank. GPU-aware MPI is available
 for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
 "MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
 Spectrum MPI when the "-gpu" flag is used.
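Note: for the Mvapich2 case cited above, a run might be sketched as follows
(launcher, rank count, and input file name are assumptions, not part of this
commit)::

   export MV2_USE_CUDA=1
   mpirun -np 4 lmp -k on g 4 -sf kk -in in.lj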
@@ -654,8 +656,8 @@ script or via the "-pk intel" :doc:`command-line switch <Run_options>`.

 For the KOKKOS package, the option defaults for GPUs are neigh = full,
 neigh/qeq = full, newton = off, binsize for GPUs = 2x LAMMPS default
-value, comm = device, cuda/aware = on. When LAMMPS can safely detect
-that CUDA-aware MPI is not available, the default value of cuda/aware
+value, comm = device, gpu/aware = on. When LAMMPS can safely detect
+that GPU-aware MPI is not available, the default value of gpu/aware
 becomes "off". For CPUs or Xeon Phis, the option defaults are neigh =
 half, neigh/qeq = half, newton = on, binsize = 0.0, and comm = no. The
 option neigh/thread = on when there are 16K atoms or less on an MPI
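Note: since these defaults apply automatically, overriding them can be
sketched as below, e.g. forcing half neighbor lists with Newton on (values
are illustrative)::

   package kokkos neigh half newton on comm no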