add updates/corrections, improve formatting

Axel Kohlmeyer
2024-12-27 03:50:51 -05:00
parent 53c3fa2afd
commit de0baba124
2 changed files with 47 additions and 36 deletions

View File

@@ -31,7 +31,8 @@ Coulombics. It has the following general features:
 (for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
 so that the same functionality is supported on a variety of hardware.

-**Required hardware/software:**
+Required hardware/software
+""""""""""""""""""""""""""

 To compile and use this package in CUDA mode, you currently need
 to have an NVIDIA GPU and install the corresponding NVIDIA CUDA
@@ -69,12 +70,14 @@ To compile and use this package in HIP mode, you have to have the AMD ROCm
 software installed. Versions of ROCm older than 3.5 are currently deprecated
 by AMD.

-**Building LAMMPS with the GPU package:**
+Building LAMMPS with the GPU package
+""""""""""""""""""""""""""""""""""""

 See the :ref:`Build extras <gpu>` page for
 instructions.

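As a companion to those build instructions, a minimal CMake configuration for a CUDA build might look like the sketch below; the ``sm_80`` architecture value and the ``../cmake`` directory layout are assumptions for illustration, not part of this commit.

.. code-block:: bash

   # configure the GPU package in CUDA mode (GPU_ARCH value is only an example)
   cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_80 ../cmake
   cmake --build . --parallel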
-**Run with the GPU package from the command line:**
+Run with the GPU package from the command-line
+""""""""""""""""""""""""""""""""""""""""""""""

 The ``mpirun`` or ``mpiexec`` command sets the total number of MPI tasks
 used by LAMMPS (one or multiple per compute node) and the number of MPI
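For context, a launch that combines the ``mpirun`` task count with the GPU package switches could look like this sketch; the executable name ``lmp``, the task count, and the input file name are assumptions.

.. code-block:: bash

   # 8 MPI tasks sharing 2 GPUs per node; -sf gpu appends the gpu suffix to supported styles
   mpirun -np 8 lmp -sf gpu -pk gpu 2 -in in.melt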
@@ -133,7 +136,8 @@ affect the setting for bonded interactions (LAMMPS default is "on").
 The "off" setting for pairwise interaction is currently required for
 GPU package pair styles.

-**Or run with the GPU package by editing an input script:**
+Run with the GPU package by editing an input script
+"""""""""""""""""""""""""""""""""""""""""""""""""""

 The discussion above for the ``mpirun`` or ``mpiexec`` command, MPI
 tasks/node, and use of multiple MPI tasks/GPU is the same.
@@ -149,7 +153,8 @@ You must also use the :doc:`package gpu <package>` command to enable the
 GPU package, unless the ``-sf gpu`` or ``-pk gpu`` :doc:`command-line switches <Run_options>` were used. It specifies the number of
 GPUs/node to use, as well as other options.
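A minimal input-script sketch of that setting (the value of 2 GPUs per node is only an example) could be:

.. code-block:: LAMMPS

   package gpu 2    # use 2 GPUs per node
   suffix gpu       # append the gpu suffix to styles that support it

Both commands should appear near the top of the input script, before the styles they affect are defined.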

-**Speed-ups to expect:**
+Speed-up to expect
+""""""""""""""""""

 The performance of a GPU versus a multicore CPU is a function of your
 hardware, which pair style is used, the number of atoms/GPU, and the
@@ -176,10 +181,13 @@ better with multiple OMP threads because the inter-process communication
 is higher for these styles with the GPU package in order to allow
 deterministic results.

-**Guidelines for best performance:**
+Guidelines for best performance
+"""""""""""""""""""""""""""""""

-* Using multiple MPI tasks per GPU will often give the best performance,
-  as allowed my most multicore CPU/GPU configurations.
+* Using multiple MPI tasks (2-10) per GPU will often give the best
+  performance, as allowed by most multicore CPU/GPU configurations.
+  Using too many MPI tasks will result in worse performance due to
+  growing overhead.
 * If the number of particles per MPI task is small (e.g. 100s of
   particles), it can be more efficient to run with fewer MPI tasks per
   GPU, even if you do not use all the cores on the compute node.
@@ -199,12 +207,13 @@ deterministic results.
   :doc:`angle <angle_style>`, :doc:`dihedral <dihedral_style>`,
   :doc:`improper <improper_style>`, and :doc:`long-range <kspace_style>`
   calculations will not be included in the "Pair" time.
-* Since only part of the pppm kspace style is GPU accelerated, it
-  may be faster to only use GPU acceleration for Pair styles with
-  long-range electrostatics. See the "pair/only" keyword of the
-  package command for a shortcut to do that. The work between kspace
-  on the CPU and non-bonded interactions on the GPU can be balanced
-  through adjusting the coulomb cutoff without loss of accuracy.
+* Since only part of the pppm kspace style is GPU accelerated, it may be
+  faster to only use GPU acceleration for Pair styles with long-range
+  electrostatics. See the "pair/only" keyword of the :doc:`package
+  command <package>` for a shortcut to do that. The distribution of
+  work between kspace on the CPU and non-bonded interactions on the GPU
+  can be balanced through adjusting the coulomb cutoff without loss of
+  accuracy.
 * When the *mode* setting for the package gpu command is force/neigh,
   the time for neighbor list calculations on the GPU will be added into
   the "Pair" time, not the "Neigh" time. An additional breakdown of the
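The "pair/only" shortcut mentioned in the bullet above could, as a sketch, be passed through the ``-pk`` command-line switch; the GPU count, task count, and input file name are placeholders.

.. code-block:: bash

   # keep kspace and bonded terms on the CPU, accelerate only pair styles
   mpirun -np 8 lmp -sf gpu -pk gpu 2 pair/only on -in in.script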
@@ -220,4 +229,6 @@ deterministic results.
 Restrictions
 """"""""""""

-None.
+When using :doc:`hybrid pair styles <pair_hybrid>`, the neighbor list
+must be generated on the host instead of the GPU and thus the potential
+GPU acceleration is reduced.
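One way to request host-side neighbor list builds in that situation is the ``neigh`` keyword of the package command; this is only a sketch, and the GPU count is an arbitrary example.

.. code-block:: LAMMPS

   # build neighbor lists on the CPU, as needed for hybrid pair styles
   package gpu 1 neigh no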

View File

@@ -1,5 +1,5 @@
 INTEL package
-==================
+=============

 The INTEL package is maintained by Mike Brown at Intel
 Corporation. It provides two methods for accelerating simulations,
@@ -13,18 +13,18 @@ twice, once on the CPU and once with an offload flag. This allows
 LAMMPS to run on the CPU cores and co-processor cores simultaneously.

 Currently Available INTEL Styles
-"""""""""""""""""""""""""""""""""""""
+""""""""""""""""""""""""""""""""

 * Angle Styles: charmm, harmonic
-* Bond Styles: fene, fourier, harmonic
+* Bond Styles: fene, harmonic
 * Dihedral Styles: charmm, fourier, harmonic, opls
-* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere
+* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere, electrode/conp, electrode/conq, electrode/thermo
 * Improper Styles: cvff, harmonic
 * Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long,
   buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm,
   lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long,
-  rebo, sw, tersoff
-* K-Space Styles: pppm, pppm/disp
+  rebo, snap, sw, tersoff
+* K-Space Styles: pppm, pppm/disp, pppm/electrode

 .. warning::
@@ -33,7 +33,7 @@ Currently Available INTEL Styles
 input requires it, LAMMPS will abort with an error message.

 Speed-up to expect
-"""""""""""""""""""
+""""""""""""""""""

 The speedup will depend on your simulation, the hardware, which
 styles are used, the number of atoms, and the floating-point
@@ -312,21 +312,21 @@ almost all cases.
 recommended, especially when running on a machine with Intel
 Hyper-Threading technology disabled.

-Run with the INTEL package from the command line
-"""""""""""""""""""""""""""""""""""""""""""""""""""""
+Run with the INTEL package from the command-line
+""""""""""""""""""""""""""""""""""""""""""""""""

-To enable INTEL optimizations for all available styles used in
-the input script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can be used without any requirement for
-editing the input script. This switch will automatically append
-"intel" to styles that support it. It also invokes a default command:
-:doc:`package intel 1 <package>`. This package command is used to set
-options for the INTEL package. The default package command will
-specify that INTEL calculations are performed in mixed precision,
-that the number of OpenMP threads is specified by the OMP_NUM_THREADS
-environment variable, and that if co-processors are present and the
-binary was built with offload support, that 1 co-processor per node
-will be used with automatic balancing of work between the CPU and the
-co-processor.
+To enable INTEL optimizations for all available styles used in the input
+script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can
+be used without any requirement for editing the input script. This
+switch will automatically append "intel" to styles that support it. It
+also invokes a default command: :doc:`package intel 1 <package>`. This
+package command is used to set options for the INTEL package. The
+default package command will specify that INTEL calculations are
+performed in mixed precision, that the number of OpenMP threads is
+specified by the OMP_NUM_THREADS environment variable, and that if
+co-processors are present and the binary was built with offload support,
+that 1 co-processor per node will be used with automatic balancing of
+work between the CPU and the co-processor.
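A minimal sketch of such a run, with the executable name, task count, thread count, and input file as assumptions:

.. code-block:: bash

   # 4 OpenMP threads per MPI task; -sf intel uses the default package intel settings
   export OMP_NUM_THREADS=4
   mpirun -np 8 lmp -sf intel -in in.rhodo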

 You can specify different options for the INTEL package by using
 the ``-pk intel Nphi`` :doc:`command-line switch <Run_options>` with