documentation corrections, spelling fixes and updates

This commit is contained in:
Axel Kohlmeyer
2021-02-17 18:47:35 -05:00
parent e575c5fa29
commit f367e66aba
4 changed files with 42 additions and 26 deletions

View File

@@ -1,11 +1,14 @@
 GPU package
 ===========

-The GPU package was developed by Mike Brown while at SNL and ORNL
-and his collaborators, particularly Trung Nguyen (now at Northwestern).
-It provides GPU versions of many pair styles and for parts of the
-:doc:`kspace_style pppm <kspace_style>` for long-range Coulombics.
-It has the following general features:
+The GPU package was developed by Mike Brown while at SNL and ORNL (now
+at Intel Corp.) and his collaborators, particularly Trung Nguyen (now at
+Northwestern). Support for AMD GPUs via HIP was added by Vsevolod Nikolskiy
+and coworkers at HSE University.
+
+The GPU package provides GPU versions of many pair styles and for
+parts of the :doc:`kspace_style pppm <kspace_style>` for long-range
+Coulombics. It has the following general features:

 * It is designed to exploit common GPU hardware configurations where one
   or more GPUs are coupled to many cores of one or more multi-core CPUs,
@@ -24,8 +27,9 @@ It has the following general features:
   force vectors.

 * LAMMPS-specific code is in the GPU package. It makes calls to a
   generic GPU library in the lib/gpu directory. This library provides
-  NVIDIA support as well as more general OpenCL support, so that the
-  same functionality is supported on a variety of hardware.
+  either Nvidia support, AMD support, or more general OpenCL support
+  (for Nvidia GPUs, AMD GPUs, Intel GPUs, and multi-core CPUs),
+  so that the same functionality is supported on a variety of hardware.
**Required hardware/software:** **Required hardware/software:**
@@ -89,10 +93,10 @@ shared by 4 MPI tasks.
 The GPU package also has limited support for OpenMP for both
 multi-threading and vectorization of routines that are run on the CPUs.
 This requires that the GPU library and LAMMPS are built with flags to
-enable OpenMP support (e.g. -fopenmp -fopenmp-simd). Some styles for
-time integration are also available in the GPU package. These run
-completely on the CPUs in full double precision, but exploit
-multi-threading and vectorization for faster performance.
+enable OpenMP support (e.g. -fopenmp). Some styles for time integration
+are also available in the GPU package. These run completely on the CPUs
+in full double precision, but exploit multi-threading and vectorization
+for faster performance.

 Use the "-sf gpu" :doc:`command-line switch <Run_options>`, which will
 automatically append "gpu" to styles that support it. Use the "-pk"
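
A typical invocation combining these switches might look as follows (a
minimal sketch; the executable name "lmp" and the input script "in.lj"
are placeholders, while "-sf gpu", "-pk gpu", and "-in" are the
documented switches):

  mpirun -np 4 lmp -sf gpu -pk gpu 2 -in in.lj

Here four MPI tasks per node share two GPUs, and every style with a GPU
variant is automatically given the "gpu" suffix.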
@@ -159,11 +163,11 @@ Likewise, you should experiment with the precision setting for the GPU
 library to see if single or mixed precision will give accurate
 results, since they will typically be faster.

-MPI parallelism typically outperforms OpenMP parallelism, but in same cases
-using fewer MPI tasks and multiple OpenMP threads with the GPU package
-can give better performance. 3-body potentials can often perform better
-with multiple OMP threads because the inter-process communication is
-higher for these styles with the GPU package in order to allow
+MPI parallelism typically outperforms OpenMP parallelism, but in some
+cases using fewer MPI tasks and multiple OpenMP threads with the GPU
+package can give better performance. 3-body potentials can often perform
+better with multiple OMP threads because the inter-process communication
+is higher for these styles with the GPU package in order to allow
 deterministic results.

 **Guidelines for best performance:**
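
For instance, such a hybrid run could use fewer MPI tasks with several
OpenMP threads each (a sketch; the task count, thread count, and input
name "in.sw" are placeholders and assume an OpenMP-enabled build):

  OMP_NUM_THREADS=4 mpirun -np 2 lmp -sf gpu -pk gpu 1 -in in.sw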
@@ -189,6 +193,12 @@ deterministic results.
   :doc:`angle <angle_style>`, :doc:`dihedral <dihedral_style>`,
   :doc:`improper <improper_style>`, and :doc:`long-range <kspace_style>`
   calculations will not be included in the "Pair" time.
+* Since only part of the pppm kspace style is GPU accelerated, it
+  may be faster to use GPU acceleration only for pair styles with
+  long-range electrostatics. See the "pair/only" keyword of the
+  package command for a shortcut to do that. The work between kspace
+  on the CPU and non-bonded interactions on the GPU can then be
+  balanced by adjusting the Coulomb cutoff without loss of accuracy.
 * When the *mode* setting for the package gpu command is force/neigh,
   the time for neighbor list calculations on the GPU will be added into
   the "Pair" time, not the "Neigh" time. An additional breakdown of the

View File

@@ -175,7 +175,7 @@ package.
 The *Ngpu* argument sets the number of GPUs per node. If *Ngpu* is 0
 and no other keywords are specified, GPU or accelerator devices are
-autoselected. In this process, all platforms are searched for
+auto-selected. In this process, all platforms are searched for
 accelerator devices and GPUs are chosen if available. The device with
 the highest number of compute cores is selected. The number of devices
 is increased to be the number of matching accelerators with the same
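
In an input script, this auto-selection is requested by asking for zero
devices, per the description above:

  package gpu 0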
@@ -257,7 +257,8 @@ the other particles.
 The *gpuID* keyword is used to specify the first ID for the GPU or
 other accelerator that LAMMPS will use. For example, if the ID is
 1 and *Ngpu* is 3, GPUs 1-3 will be used. Device IDs should be
-determined from the output of nvc_get_devices or ocl_get_devices
+determined from the output of nvc_get_devices, ocl_get_devices,
+or hip_get_devices
 as provided in the lib/gpu directory. When using OpenCL with
 accelerators that have main memory NUMA, the accelerators can be
 split into smaller virtual accelerators for more efficient use
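
Matching the example in the text, selecting three GPUs starting at
device ID 1 would look like this in an input script (a sketch of the
*gpuID* usage described above):

  package gpu 3 gpuID 1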
@@ -306,13 +307,14 @@ PPPM_MAX_SPLINE.
 CONFIG_ID can be 0. SHUFFLE_AVAIL in {0,1} indicates that inline-PTX
 (NVIDIA) or OpenCL extensions (Intel) should be used for horizontal
-vector operataions. FAST_MATH in {0,1} indicates that OpenCL fast math
-optimizations are used during the build and HW-accelerated
-transcendentals are used when available. THREADS_PER_* give the default
-*tpa* values for ellipsoidal models, styles using charge, and any other
-styles. The BLOCK_* parameters specify the block sizes for various
-kernal calls and the MAX_*SHARED*_ parameters are used to determine the
-amount of local shared memory to use for storing model parameters.
+vector operations. FAST_MATH in {0,1} indicates that OpenCL fast math
+optimizations are used during the build and hardware-accelerated
+transcendental functions are used when available. THREADS_PER_* give the
+default *tpa* values for ellipsoidal models, styles using charge, and
+any other styles. The BLOCK_* parameters specify the block sizes for
+various kernel calls and the MAX_*SHARED*_ parameters are used to
+determine the amount of local shared memory to use for storing model
+parameters.

 For OpenCL, the routines are compiled at runtime for the specified GPU
 or accelerator architecture. The *ocl_args* keyword can be used to
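
For example, extra flags might be handed to the OpenCL compiler roughly
like this (a hypothetical sketch: "-cl-mad-enable" is a standard OpenCL
build option, but the exact value format of *ocl_args* should be checked
against the package command documentation):

  package gpu 1 ocl_args -cl-mad-enable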

View File

@@ -2297,6 +2297,7 @@ omegaz
 Omelyan
 omp
 OMP
+oneAPI
 onelevel
 oneway
 onn
@@ -2528,6 +2529,7 @@ ptm
 PTM
 ptol
 ptr
+PTX
 pu
 purdue
 Purohit

View File

@@ -45,8 +45,10 @@ efficient use with MPI.
 After building the GPU library, for OpenCL:
 ./ocl_get_devices
-and for CUDA
+for CUDA:
 ./nvc_get_devices
+and for ROCm HIP:
+./hip_get_devices

 ------------------------------------------------------------------------------
 QUICK START