documentation corrections, spelling fixes and updates
@@ -1,11 +1,14 @@
 GPU package
 ===========
 
-The GPU package was developed by Mike Brown while at SNL and ORNL
-and his collaborators, particularly Trung Nguyen (now at Northwestern).
-It provides GPU versions of many pair styles and for parts of the
-:doc:`kspace_style pppm <kspace_style>` for long-range Coulombics.
-It has the following general features:
+The GPU package was developed by Mike Brown while at SNL and ORNL (now
+at Intel Corp.) and his collaborators, particularly Trung Nguyen (now at
+Northwestern). Support for AMD GPUs via HIP was added by Vsevolod Nikolskiy
+and coworkers at HSE University.
+
+The GPU package provides GPU versions of many pair styles and for
+parts of the :doc:`kspace_style pppm <kspace_style>` for long-range
+Coulombics. It has the following general features:
 
 * It is designed to exploit common GPU hardware configurations where one
   or more GPUs are coupled to many cores of one or more multi-core CPUs,
@@ -24,8 +27,9 @@ It has the following general features:
   force vectors.
 * LAMMPS-specific code is in the GPU package. It makes calls to a
   generic GPU library in the lib/gpu directory. This library provides
-  NVIDIA support as well as more general OpenCL support, so that the
-  same functionality is supported on a variety of hardware.
+  either Nvidia support, AMD support, or more general OpenCL support
+  (for Nvidia GPUs, AMD GPUs, Intel GPUs, and multi-core CPUs),
+  so that the same functionality is supported on a variety of hardware.
 
 **Required hardware/software:**
 
@@ -89,10 +93,10 @@ shared by 4 MPI tasks.
 The GPU package also has limited support for OpenMP for both
 multi-threading and vectorization of routines that are run on the CPUs.
 This requires that the GPU library and LAMMPS are built with flags to
-enable OpenMP support (e.g. -fopenmp -fopenmp-simd). Some styles for
-time integration are also available in the GPU package. These run
-completely on the CPUs in full double precision, but exploit
-multi-threading and vectorization for faster performance.
+enable OpenMP support (e.g. -fopenmp). Some styles for time integration
+are also available in the GPU package. These run completely on the CPUs
+in full double precision, but exploit multi-threading and vectorization
+for faster performance.
 
 Use the "-sf gpu" :doc:`command-line switch <Run_options>`, which will
 automatically append "gpu" to styles that support it. Use the "-pk
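The "-sf gpu" and "-pk" switches in the passage above are combined on one command line; a minimal sketch, assuming an MPI-parallel LAMMPS binary named lmp_mpi and an input script in.lj (both names are placeholders, not from the commit):

```shell
# Run on 4 MPI tasks, offloading supported styles to 2 GPUs per node.
# "-sf gpu" appends "gpu" to styles that support it; "-pk gpu 2" sets Ngpu.
mpirun -np 4 lmp_mpi -sf gpu -pk gpu 2 -in in.lj
```

The same effect can be achieved inside the input script with the :doc:`package <package>` and :doc:`suffix <suffix>` commands.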
@@ -159,11 +163,11 @@ Likewise, you should experiment with the precision setting for the GPU
 library to see if single or mixed precision will give accurate
 results, since they will typically be faster.
 
-MPI parallelism typically outperforms OpenMP parallelism, but in same cases
-using fewer MPI tasks and multiple OpenMP threads with the GPU package
-can give better performance. 3-body potentials can often perform better
-with multiple OMP threads because the inter-process communication is
-higher for these styles with the GPU package in order to allow
+MPI parallelism typically outperforms OpenMP parallelism, but in some
+cases using fewer MPI tasks and multiple OpenMP threads with the GPU
+package can give better performance. 3-body potentials can often perform
+better with multiple OMP threads because the inter-process communication
+is higher for these styles with the GPU package in order to allow
 deterministic results.
 
 **Guidelines for best performance:**
@@ -189,6 +193,12 @@ deterministic results.
   :doc:`angle <angle_style>`, :doc:`dihedral <dihedral_style>`,
   :doc:`improper <improper_style>`, and :doc:`long-range <kspace_style>`
   calculations will not be included in the "Pair" time.
+* Since only part of the pppm kspace style is GPU accelerated, it
+  may be faster to only use GPU acceleration for Pair styles with
+  long-range electrostatics. See the "pair/only" keyword of the
+  package command for a shortcut to do that. The work between kspace
+  on the CPU and non-bonded interactions on the GPU can be balanced
+  through adjusting the coulomb cutoff without loss of accuracy.
 * When the *mode* setting for the package gpu command is force/neigh,
   the time for neighbor list calculations on the GPU will be added into
   the "Pair" time, not the "Neigh" time. An additional breakdown of the
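The "pair/only" shortcut added in the hunk above might look like this in an input script; a sketch only, assuming 1 GPU per node, with an illustrative cutoff value and assuming the keyword takes an on/off value (check the package command doc for the exact syntax):

```shell
# LAMMPS input script fragment: accelerate only pair styles, keep pppm on the CPU
package      gpu 1 pair/only on
# a larger Coulomb cutoff shifts work from kspace (CPU) to pair (GPU)
pair_style   lj/cut/coul/long 12.0
kspace_style pppm 1.0e-4
```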
@@ -175,7 +175,7 @@ package.
 
 The *Ngpu* argument sets the number of GPUs per node. If *Ngpu* is 0
 and no other keywords are specified, GPU or accelerator devices are
-autoselected. In this process, all platforms are searched for
+auto-selected. In this process, all platforms are searched for
 accelerator devices and GPUs are chosen if available. The device with
 the highest number of compute cores is selected. The number of devices
 is increased to be the number of matching accelerators with the same
@@ -257,7 +257,8 @@ the other particles.
 The *gpuID* keyword is used to specify the first ID for the GPU or
 other accelerator that LAMMPS will use. For example, if the ID is
 1 and *Ngpu* is 3, GPUs 1-3 will be used. Device IDs should be
-determined from the output of nvc_get_devices or ocl_get_devices
+determined from the output of nvc_get_devices, ocl_get_devices,
+or hip_get_devices
 as provided in the lib/gpu directory. When using OpenCL with
 accelerators that have main memory NUMA, the accelerators can be
 split into smaller virtual accelerators for more efficient use
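For the *gpuID* example in the hunk above (first ID 1, *Ngpu* = 3), the corresponding command line would be roughly as follows; a sketch, where the binary name, MPI task count, and input script are placeholders:

```shell
# Use 3 GPUs per node starting at device ID 1 (i.e. GPUs 1-3), with IDs
# taken from the output of nvc_get_devices / ocl_get_devices / hip_get_devices
mpirun -np 6 lmp_mpi -sf gpu -pk gpu 3 gpuID 1 -in in.melt
```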
@@ -306,13 +307,14 @@ PPPM_MAX_SPLINE.
 
 CONFIG_ID can be 0. SHUFFLE_AVAIL in {0,1} indicates that inline-PTX
 (NVIDIA) or OpenCL extensions (Intel) should be used for horizontal
-vector operataions. FAST_MATH in {0,1} indicates that OpenCL fast math
-optimizations are used during the build and HW-accelerated
-transcendentals are used when available. THREADS_PER_* give the default
-*tpa* values for ellipsoidal models, styles using charge, and any other
-styles. The BLOCK_* parameters specify the block sizes for various
-kernal calls and the MAX_*SHARED*_ parameters are used to determine the
-amount of local shared memory to use for storing model parameters.
+vector operations. FAST_MATH in {0,1} indicates that OpenCL fast math
+optimizations are used during the build and hardware-accelerated
+transcendental functions are used when available. THREADS_PER_* give the
+default *tpa* values for ellipsoidal models, styles using charge, and
+any other styles. The BLOCK_* parameters specify the block sizes for
+various kernel calls and the MAX_*SHARED*_ parameters are used to
+determine the amount of local shared memory to use for storing model
+parameters.
 
 For OpenCL, the routines are compiled at runtime for the specified GPU
 or accelerator architecture. The *ocl_args* keyword can be used to
@@ -2297,6 +2297,7 @@ omegaz
 Omelyan
 omp
 OMP
+oneAPI
 onelevel
 oneway
 onn
@@ -2528,6 +2529,7 @@ ptm
 PTM
 ptol
 ptr
+PTX
 pu
 purdue
 Purohit
@@ -45,8 +45,10 @@ efficient use with MPI.
 
 After building the GPU library, for OpenCL:
 ./ocl_get_devices
-and for CUDA
+for CUDA:
 ./nvc_get_devices
+and for ROCm HIP:
+./hip_get_devices
 
 ------------------------------------------------------------------------------
 QUICK START
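The three device-query tools named in the hunk above are built alongside the GPU library; a sketch of running them, assuming the current directory is lib/gpu and that only the tool matching your build backend (CUDA, OpenCL, or HIP) will have been built:

```shell
# List accelerator devices visible to each backend after building lib/gpu
./nvc_get_devices    # CUDA build
./ocl_get_devices    # OpenCL build
./hip_get_devices    # ROCm HIP build
```

The reported device IDs are the values to pass to the *gpuID* keyword of the package command.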