Update docs
@@ -38,14 +38,14 @@ produce an executable compatible with a specific hardware.
    :class: note

    Kokkos with CUDA currently implicitly assumes that the MPI library is
-   CUDA-aware. This is not always the case, especially when using
+   GPU-aware. This is not always the case, especially when using
    pre-compiled MPI libraries provided by a Linux distribution. This is
    not a problem when using only a single GPU with a single MPI
    rank. When running with multiple MPI ranks, you may see segmentation
-   faults without CUDA-aware MPI support. These can be avoided by adding
-   the flags :doc:`-pk kokkos cuda/aware off <Run_options>` to the
+   faults without GPU-aware MPI support. These can be avoided by adding
+   the flags :doc:`-pk kokkos gpu/aware off <Run_options>` to the
    LAMMPS command line or by using the command :doc:`package kokkos
-   cuda/aware off <package>` in the input file.
+   gpu/aware off <package>` in the input file.

 .. admonition:: AMD GPU support
    :class: note
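Note: as a minimal sketch of the run-option route in this hunk, the command
line might look as follows (the executable name ``lmp``, the GPU count, and
the input file ``in.lj`` are illustrative assumptions, not part of the docs)::

   # 2 MPI ranks, one GPU each, with GPU-aware MPI disabled
   mpirun -np 2 lmp -k on g 2 -sf kk -pk kokkos gpu/aware off -in in.lj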
@@ -242,8 +242,8 @@ case, also packing/unpacking communication buffers on the host may give
 speedup (see the KOKKOS :doc:`package <package>` command). Using CUDA MPS
 is recommended in this scenario.

-Using a CUDA-aware MPI library is highly recommended. CUDA-aware MPI use can be
-avoided by using :doc:`-pk kokkos cuda/aware no <package>`. As above for
+Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
+avoided by using :doc:`-pk kokkos gpu/aware no <package>`. As above for
 multi-core CPUs (and no GPU), if N is the number of physical cores/node,
 then the number of MPI tasks/node should not exceed N.

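Note: the same effect can be sketched from the input file instead of the
command line, assuming the *off*/*on* values listed in the Syntax section
below::

   package kokkos gpu/aware off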
@@ -65,7 +65,7 @@ Syntax
   *no_affinity* values = none
 *kokkos* args = keyword value ...
   zero or more keyword/value pairs may be appended
-  keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *cuda/aware*
+  keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *gpu/aware*
   *neigh* value = *full* or *half*
     full = full neighbor list
     half = half neighbor list built in thread-safe manner
@@ -84,15 +84,15 @@ Syntax
       use value for comm/exchange and comm/forward and pair/comm/forward and fix/comm/forward and comm/reverse
   *comm/exchange* value = *no* or *host* or *device*
   *comm/forward* value = *no* or *host* or *device*
-  *pair/comm/forward* value = *no* or *host* or *device*
-  *fix/comm/forward* value = *no* or *host* or *device*
+  *pair/comm/forward* value = *no* or *device*
+  *fix/comm/forward* value = *no* or *device*
   *comm/reverse* value = *no* or *host* or *device*
     no = perform communication pack/unpack in non-KOKKOS mode
     host = perform pack/unpack on host (e.g. with OpenMP threading)
     device = perform pack/unpack on device (e.g. on GPU)
-  *cuda/aware* = *off* or *on*
-    off = do not use CUDA-aware MPI
-    on = use CUDA-aware MPI (default)
+  *gpu/aware* = *off* or *on*
+    off = do not use GPU-aware MPI
+    on = use GPU-aware MPI (default)
 *omp* args = Nthreads keyword value ...
   Nthreads = # of OpenMP threads to associate with each MPI process
   zero or more keyword/value pairs may be appended
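Note: to make the keyword/value pairing concrete, a hedged sketch of one
complete command using the renamed keyword (the particular values are chosen
for illustration only)::

   package kokkos neigh full newton off comm device gpu/aware on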
@@ -502,13 +502,15 @@ additional communication in fixes, such as fix SHAKE.
 The *comm* keyword is simply a short-cut to set the same value for all
 the comm keywords.

-The value options for all 3 keywords are *no* or *host* or *device*\ . A
+The value options for the keywords are *no* or *host* or *device*\ . A
 value of *no* means to use the standard non-KOKKOS method of
 packing/unpacking data for the communication. A value of *host* means to
 use the host, typically a multi-core CPU, and perform the
 packing/unpacking in parallel with threads. A value of *device* means to
 use the device, typically a GPU, to perform the packing/unpacking
-operation.
+operation. If a value of *host* is used for the *pair/comm/forward* or
+*fix/comm/forward* keyword, it will automatically be changed to *no*
+since these keywords don't support *host* mode.

 The optimal choice for these keywords depends on the input script and
 the hardware used. The *no* value is useful for verifying that the
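Note: an illustrative sketch of the fallback added in this hunk; with the
short-cut set to *host*, the *pair/comm/forward* and *fix/comm/forward*
keywords would automatically drop back to *no*::

   package kokkos comm host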
@@ -529,18 +531,18 @@ pack/unpack communicated data. When running small systems on a GPU,
 performing the exchange pack/unpack on the host CPU can give speedup
 since it reduces the number of CUDA kernel launches.

-The *cuda/aware* keyword chooses whether CUDA-aware MPI will be used. When
+The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
 this keyword is set to *on*\ , buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces the overhead of first copying
-the data to the host CPU. However, CUDA-aware MPI is not supported on all
+the data to the host CPU. However, GPU-aware MPI is not supported on all
 systems, which can lead to segmentation faults and would require using a
-value of *off*\ . If LAMMPS can safely detect that CUDA-aware MPI is not
+value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
 available (currently only possible with OpenMPI v2.0.0 or later), then
-the *cuda/aware* keyword is automatically set to *off* by default. When
-the *cuda/aware* keyword is set to *off* while any of the *comm*
+the *gpu/aware* keyword is automatically set to *off* by default. When
+the *gpu/aware* keyword is set to *off* while any of the *comm*
 keywords are set to *device*\ , the value for these *comm* keywords will
-be automatically changed to *host*\ . This setting has no effect if not
-running on GPUs or if using only one MPI rank. CUDA-aware MPI is available
+be automatically changed to *no*\ . This setting has no effect if not
+running on GPUs or if using only one MPI rank. GPU-aware MPI is available
 for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
 "MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
 Spectrum MPI when the "-gpu" flag is used.
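Note: for the Mvapich2 case cited above, a run might be sketched as follows
(launcher, rank count, and input file name are assumptions, not part of this
commit)::

   export MV2_USE_CUDA=1
   mpirun -np 4 lmp -k on g 4 -sf kk -in in.lj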
@@ -654,8 +656,8 @@ script or via the "-pk intel" :doc:`command-line switch <Run_options>`.

 For the KOKKOS package, the option defaults for GPUs are neigh = full,
 neigh/qeq = full, newton = off, binsize for GPUs = 2x LAMMPS default
-value, comm = device, cuda/aware = on. When LAMMPS can safely detect
-that CUDA-aware MPI is not available, the default value of cuda/aware
+value, comm = device, gpu/aware = on. When LAMMPS can safely detect
+that GPU-aware MPI is not available, the default value of gpu/aware
 becomes "off". For CPUs or Xeon Phis, the option defaults are neigh =
 half, neigh/qeq = half, newton = on, binsize = 0.0, and comm = no. The
 option neigh/thread = on when there are 16K atoms or less on an MPI
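Note: since these defaults apply automatically, overriding them can be
sketched as below, e.g. forcing half neighbor lists with Newton on (values
are illustrative)::

   package kokkos neigh half newton on comm no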