Update docs

Stan Moore
2020-12-22 14:24:24 -07:00
parent 4520ef16e3
commit 3162618512
3 changed files with 26 additions and 24 deletions


@@ -38,14 +38,14 @@ produce an executable compatible with a specific hardware.
:class: note
Kokkos with CUDA currently implicitly assumes that the MPI library is
-CUDA-aware. This is not always the case, especially when using
+GPU-aware. This is not always the case, especially when using
pre-compiled MPI libraries provided by a Linux distribution. This is
not a problem when using only a single GPU with a single MPI
rank. When running with multiple MPI ranks, you may see segmentation
-faults without CUDA-aware MPI support. These can be avoided by adding
-the flags :doc:`-pk kokkos cuda/aware off <Run_options>` to the
+faults without GPU-aware MPI support. These can be avoided by adding
+the flags :doc:`-pk kokkos gpu/aware off <Run_options>` to the
LAMMPS command line or by using the command :doc:`package kokkos
-cuda/aware off <package>` in the input file.
+gpu/aware off <package>` in the input file.
.. admonition:: AMD GPU support
:class: note
@@ -242,8 +242,8 @@ case, also packing/unpacking communication buffers on the host may give
speedup (see the KOKKOS :doc:`package <package>` command). Using CUDA MPS
is recommended in this scenario.
-Using a CUDA-aware MPI library is highly recommended. CUDA-aware MPI use can be
-avoided by using :doc:`-pk kokkos cuda/aware no <package>`. As above for
+Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
+avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
multi-core CPUs (and no GPU), if N is the number of physical cores/node,
then the number of MPI tasks/node should not exceed N.
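The Python sketch below illustrates the launch advice above under stated assumptions. It assembles an `mpirun` command for a KOKKOS GPU run and checks that MPI tasks per node do not exceed physical cores per node. It appends `-pk kokkos gpu/aware off` only when the MPI library is known not to be GPU-aware. The helper `kokkos_gpu_command`, the executable name `lmp`, and the input file `in.lj` are hypothetical placeholders. `-k on g N` and `-sf kk` are the usual LAMMPS switches for enabling KOKKOS on N GPUs per node.

```python
import shlex


def kokkos_gpu_command(nodes, ranks_per_node, gpus_per_node, cores_per_node,
                       gpu_aware_mpi, lmp="lmp", infile="in.lj"):
    """Return an mpirun argument list for a KOKKOS GPU run (illustrative sketch).

    `lmp` and `infile` are hypothetical placeholders, not fixed LAMMPS names.
    """
    # Rule from the docs: MPI tasks per node should not exceed physical cores per node.
    if ranks_per_node > cores_per_node:
        raise ValueError("MPI tasks/node must not exceed physical cores/node")
    args = ["mpirun", "-np", str(nodes * ranks_per_node),
            lmp, "-k", "on", "g", str(gpus_per_node), "-sf", "kk"]
    if not gpu_aware_mpi:
        # Keep GPU buffers away from an MPI library that cannot handle them.
        args += ["-pk", "kokkos", "gpu/aware", "off"]
    args += ["-in", infile]
    return args


cmd = kokkos_gpu_command(nodes=2, ranks_per_node=4, gpus_per_node=4,
                         cores_per_node=32, gpu_aware_mpi=False)
print(shlex.join(cmd))
# prints: mpirun -np 8 lmp -k on g 4 -sf kk -pk kokkos gpu/aware off -in in.lj
```

The same effect can be obtained inside the input script with `package kokkos gpu/aware off` instead of the command-line switch.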


@@ -65,7 +65,7 @@ Syntax
*no_affinity* values = none
*kokkos* args = keyword value ...
zero or more keyword/value pairs may be appended
-keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* *pair/comm/forward* *fix/comm/forward* or *comm/reverse* or *cuda/aware*
+keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *gpu/aware*
*neigh* value = *full* or *half*
full = full neighbor list
half = half neighbor list built in thread-safe manner
@@ -84,15 +84,15 @@ Syntax
use value for comm/exchange and comm/forward and pair/comm/forward and fix/comm/forward and comm/reverse
*comm/exchange* value = *no* or *host* or *device*
*comm/forward* value = *no* or *host* or *device*
-*pair/comm/forward* value = *no* or *host* or *device*
-*fix/comm/forward* value = *no* or *host* or *device*
+*pair/comm/forward* value = *no* or *device*
+*fix/comm/forward* value = *no* or *device*
*comm/reverse* value = *no* or *host* or *device*
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
-*cuda/aware* = *off* or *on*
-off = do not use CUDA-aware MPI
-on = use CUDA-aware MPI (default)
+*gpu/aware* value = *off* or *on*
+off = do not use GPU-aware MPI
+on = use GPU-aware MPI (default)
*omp* args = Nthreads keyword value ...
Nthread = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended
@@ -502,13 +502,15 @@ additional communication in fixes, such as fix SHAKE.
The *comm* keyword is simply a short-cut to set the same value for all
the comm keywords.
-The value options for all 3 keywords are *no* or *host* or *device*\ . A
+The value options for the keywords are *no* or *host* or *device*\ . A
value of *no* means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of *host* means to
use the host, typically a multi-core CPU, and perform the
packing/unpacking in parallel with threads. A value of *device* means to
use the device, typically a GPU, to perform the packing/unpacking
-operation.
+operation. If a value of *host* is used for the *pair/comm/forward* or
+*fix/comm/forward* keyword, it will automatically be changed to *no*
+since these keywords do not support *host* mode.
The optimal choice for these keywords depends on the input script and
the hardware used. The *no* value is useful for verifying that the
@@ -529,18 +531,18 @@ pack/unpack communicated data. When running small systems on a GPU,
performing the exchange pack/unpack on the host CPU can give speedup
since it reduces the number of CUDA kernel launches.
-The *cuda/aware* keyword chooses whether CUDA-aware MPI will be used. When
+The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
this keyword is set to *on*\ , buffers in GPU memory are passed directly
through MPI send/receive calls. This reduces overhead of first copying
-the data to the host CPU. However CUDA-aware MPI is not supported on all
+the data to the host CPU. However, GPU-aware MPI is not supported on all
systems, which can lead to segmentation faults and would require using a
-value of *off*\ . If LAMMPS can safely detect that CUDA-aware MPI is not
+value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
available (currently only possible with OpenMPI v2.0.0 or later), then
-the *cuda/aware* keyword is automatically set to *off* by default. When
-the *cuda/aware* keyword is set to *off* while any of the *comm*
+the *gpu/aware* keyword is automatically set to *off* by default. When
+the *gpu/aware* keyword is set to *off* while any of the *comm*
keywords are set to *device*\ , the value for these *comm* keywords will
-be automatically changed to *host*\ . This setting has no effect if not
-running on GPUs or if using only one MPI rank. CUDA-aware MPI is available
+be automatically changed to *no*\ . This setting has no effect if not
+running on GPUs or if using only one MPI rank. GPU-aware MPI is available
for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
"MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
Spectrum MPI when the "-gpu" flag is used.
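The fallback rules described above can be modeled with a short Python sketch. It is a hypothetical illustration of the documented behavior, not LAMMPS source code. The helper names `expand_comm` and `resolve_kokkos_comm` are invented for this example. The sketch ignores the cases (no GPUs, a single MPI rank) in which the *gpu/aware* setting has no effect. It encodes two rules:

- *host* is demoted to *no* for *pair/comm/forward* and *fix/comm/forward*.
- With *gpu/aware* set to *off*, any *device* value becomes *no*.

```python
# Illustrative model of the documented KOKKOS comm fallback rules
# (hypothetical helper names; not taken from the LAMMPS source).
COMM_KEYWORDS = ("comm/exchange", "comm/forward", "pair/comm/forward",
                 "fix/comm/forward", "comm/reverse")

# Keywords documented as not supporting "host" mode.
NO_HOST_MODE = ("pair/comm/forward", "fix/comm/forward")


def expand_comm(value):
    """Mimic the *comm* shortcut: give every comm keyword the same value."""
    return {key: value for key in COMM_KEYWORDS}


def resolve_kokkos_comm(comm, gpu_aware="on"):
    """Return a copy of `comm` with the documented fallbacks applied.

    comm maps each comm keyword to "no", "host", or "device";
    gpu_aware is "on" or "off".
    """
    resolved = dict(comm)
    for key in NO_HOST_MODE:
        if resolved.get(key) == "host":
            resolved[key] = "no"  # host mode is unsupported for these keywords
    if gpu_aware == "off":
        for key in COMM_KEYWORDS:
            if resolved.get(key) == "device":
                resolved[key] = "no"  # avoid handing GPU buffers to MPI
    return resolved


# pair/comm/forward and fix/comm/forward become "no"; the others stay "host".
print(resolve_kokkos_comm(expand_comm("host")))
# Every "device" value becomes "no" when gpu/aware is off.
print(resolve_kokkos_comm(expand_comm("device"), gpu_aware="off"))
```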
@@ -654,8 +656,8 @@ script or via the "-pk intel" :doc:`command-line switch <Run_options>`.
For the KOKKOS package, the option defaults for GPUs are neigh = full,
neigh/qeq = full, newton = off, binsize for GPUs = 2x LAMMPS default
-value, comm = device, cuda/aware = on. When LAMMPS can safely detect
-that CUDA-aware MPI is not available, the default value of cuda/aware
+value, comm = device, gpu/aware = on. When LAMMPS can safely detect
+that GPU-aware MPI is not available, the default value of gpu/aware
becomes "off". For CPUs or Xeon Phis, the option defaults are neigh =
half, neigh/qeq = half, newton = on, binsize = 0.0, and comm = no. The
option neigh/thread = on when there are 16K atoms or less on an MPI