add updates/corrections, improve formatting

Axel Kohlmeyer
2024-12-27 03:50:51 -05:00
parent 53c3fa2afd
commit de0baba124
2 changed files with 47 additions and 36 deletions


@@ -31,7 +31,8 @@ Coulombics. It has the following general features:
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
so that the same functionality is supported on a variety of hardware.
**Required hardware/software:**
Required hardware/software
""""""""""""""""""""""""""
To compile and use this package in CUDA mode, you currently need
to have an NVIDIA GPU and install the corresponding NVIDIA CUDA
@@ -69,12 +70,14 @@ To compile and use this package in HIP mode, you have to have the AMD ROCm
software installed. Versions of ROCm older than 3.5 are currently deprecated
by AMD.
**Building LAMMPS with the GPU package:**
Building LAMMPS with the GPU package
""""""""""""""""""""""""""""""""""""
See the :ref:`Build extras <gpu>` page for
instructions.
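As a quick orientation only (the Build extras page remains the
authoritative reference), a CMake configuration for CUDA mode could look
like the sketch below; the build directory name and the ``sm_80`` target
architecture are placeholder choices, and the paths assume the build
directory sits inside the LAMMPS source tree:

.. code-block:: bash

   # configure and compile LAMMPS with the GPU package in CUDA mode;
   # adjust GPU_ARCH to match the compute capability of the installed GPU
   mkdir build-gpu && cd build-gpu
   cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_80 ../cmake
   cmake --build . -j
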
**Run with the GPU package from the command line:**
Run with the GPU package from the command-line
""""""""""""""""""""""""""""""""""""""""""""""
The ``mpirun`` or ``mpiexec`` command sets the total number of MPI tasks
used by LAMMPS (one or multiple per compute node) and the number of MPI
@@ -133,7 +136,8 @@ affect the setting for bonded interactions (LAMMPS default is "on").
The "off" setting for pairwise interaction is currently required for
GPU package pair styles.
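For concreteness, a run of this kind might be launched as sketched
below; the executable name ``lmp``, the MPI task count, and the input
file name are placeholders, and ``-pk gpu 2`` assumes two GPUs per node:

.. code-block:: bash

   # 8 MPI tasks, "gpu" suffix appended to supported styles,
   # 2 GPUs per node shared among the MPI tasks
   mpirun -np 8 lmp -sf gpu -pk gpu 2 -in in.script
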
**Or run with the GPU package by editing an input script:**
Run with the GPU package by editing an input script
"""""""""""""""""""""""""""""""""""""""""""""""""""
The discussion above for the ``mpirun`` or ``mpiexec`` command, MPI
tasks/node, and use of multiple MPI tasks/GPU is the same.
@@ -149,7 +153,8 @@ You must also use the :doc:`package gpu <package>` command to enable the
GPU package, unless the ``-sf gpu`` or ``-pk gpu`` :doc:`command-line switches <Run_options>` were used. It specifies the number of
GPUs/node to use, as well as other options.
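An equivalent input-script sketch (the GPU count, pair style, and cutoff
are illustrative only) could look like this; with the suffix command
active, plain style names are mapped to their "gpu" variants
automatically:

.. code-block:: LAMMPS

   package     gpu 2            # use 2 GPUs per node
   suffix      gpu              # append /gpu to styles that support it
   pair_style  lj/cut 2.5       # resolved to lj/cut/gpu by the suffix
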
**Speed-ups to expect:**
Speed-up to expect
""""""""""""""""""
The performance of a GPU versus a multicore CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
@@ -176,10 +181,13 @@ better with multiple OMP threads because the inter-process communication
is higher for these styles with the GPU package in order to allow
deterministic results.
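One hedged way to use OpenMP threads for those CPU-side styles alongside
the GPU package (assuming LAMMPS was also built with the OPENMP package;
the thread, task, and GPU counts are placeholders) is the hybrid suffix:

.. code-block:: bash

   # /gpu variants where available, /omp variants (4 threads each) otherwise
   mpirun -np 4 lmp -sf hybrid gpu omp -pk gpu 1 -pk omp 4 -in in.script
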
**Guidelines for best performance:**
Guidelines for best performance
"""""""""""""""""""""""""""""""
* Using multiple MPI tasks per GPU will often give the best performance,
as allowed by most multicore CPU/GPU configurations.
* Using multiple MPI tasks (2-10) per GPU will often give the best
performance, as allowed by most multicore CPU/GPU configurations.
Using too many MPI tasks will result in worse performance due to
growing overhead.
* If the number of particles per MPI task is small (e.g. 100s of
particles), it can be more efficient to run with fewer MPI tasks per
GPU, even if you do not use all the cores on the compute node.
@@ -199,12 +207,13 @@ deterministic results.
:doc:`angle <angle_style>`, :doc:`dihedral <dihedral_style>`,
:doc:`improper <improper_style>`, and :doc:`long-range <kspace_style>`
calculations will not be included in the "Pair" time.
* Since only part of the pppm kspace style is GPU accelerated, it
may be faster to only use GPU acceleration for Pair styles with
long-range electrostatics. See the "pair/only" keyword of the
package command for a shortcut to do that. The work between kspace
on the CPU and non-bonded interactions on the GPU can be balanced
through adjusting the coulomb cutoff without loss of accuracy.
* Since only part of the pppm kspace style is GPU accelerated, it may be
faster to only use GPU acceleration for Pair styles with long-range
electrostatics. See the "pair/only" keyword of the :doc:`package
command <package>` for a shortcut to do that. The distribution of
work between kspace on the CPU and non-bonded interactions on the GPU
can be balanced by adjusting the Coulomb cutoff without loss of
accuracy.
* When the *mode* setting for the package gpu command is force/neigh,
the time for neighbor list calculations on the GPU will be added into
the "Pair" time, not the "Neigh" time. An additional breakdown of the
@@ -220,4 +229,6 @@ deterministic results.
Restrictions
""""""""""""
None.
When using :doc:`hybrid pair styles <pair_hybrid>`, the neighbor list
must be generated on the host instead of the GPU and thus the potential
GPU acceleration is reduced.
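(For context, host-side neighbor builds are what the *neigh* keyword of
the :doc:`package gpu <package>` command selects explicitly; a minimal,
illustrative line follows.)

.. code-block:: LAMMPS

   # build neighbor lists on the CPU (host) instead of the GPU
   package gpu 1 neigh no
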


@@ -1,5 +1,5 @@
INTEL package
==================
=============
The INTEL package is maintained by Mike Brown at Intel
Corporation. It provides two methods for accelerating simulations,
@@ -13,18 +13,18 @@ twice, once on the CPU and once with an offload flag. This allows
LAMMPS to run on the CPU cores and co-processor cores simultaneously.
Currently Available INTEL Styles
"""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""
* Angle Styles: charmm, harmonic
* Bond Styles: fene, fourier, harmonic
* Bond Styles: fene, harmonic
* Dihedral Styles: charmm, fourier, harmonic, opls
* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere
* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere, electrode/conp, electrode/conq, electrode/thermo
* Improper Styles: cvff, harmonic
* Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long,
buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm,
lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long,
rebo, sw, tersoff
* K-Space Styles: pppm, pppm/disp
rebo, snap, sw, tersoff
* K-Space Styles: pppm, pppm/disp, pppm/electrode
.. warning::
@@ -33,7 +33,7 @@ Currently Available INTEL Styles
input requires it, LAMMPS will abort with an error message.
Speed-up to expect
"""""""""""""""""""
""""""""""""""""""
The speedup will depend on your simulation, the hardware, which
styles are used, the number of atoms, and the floating-point
@@ -312,21 +312,21 @@ almost all cases.
recommended, especially when running on a machine with Intel
Hyper-Threading technology disabled.
Run with the INTEL package from the command line
"""""""""""""""""""""""""""""""""""""""""""""""""""""
Run with the INTEL package from the command-line
""""""""""""""""""""""""""""""""""""""""""""""""
To enable INTEL optimizations for all available styles used in
the input script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can be used without any requirement for
editing the input script. This switch will automatically append
"intel" to styles that support it. It also invokes a default command:
:doc:`package intel 1 <package>`. This package command is used to set
options for the INTEL package. The default package command will
specify that INTEL calculations are performed in mixed precision,
that the number of OpenMP threads is specified by the OMP_NUM_THREADS
environment variable, and that if co-processors are present and the
binary was built with offload support, that 1 co-processor per node
will be used with automatic balancing of work between the CPU and the
co-processor.
To enable INTEL optimizations for all available styles used in the input
script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can
be used without any requirement for editing the input script. This
switch will automatically append "intel" to styles that support it. It
also invokes a default command: :doc:`package intel 1 <package>`. This
package command is used to set options for the INTEL package. The
default package command will specify that INTEL calculations are
performed in mixed precision, that the number of OpenMP threads is
specified by the OMP_NUM_THREADS environment variable, and that, if
co-processors are present and the binary was built with offload support,
1 co-processor per node will be used with automatic balancing of
work between the CPU and the co-processor.
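As an illustrative command line (the executable name ``lmp``, MPI task
count, and input file name are placeholders):

.. code-block:: bash

   # "intel" suffix appended to supported styles; the default
   # "package intel 1" settings described above then apply
   mpirun -np 18 lmp -sf intel -in in.script
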
You can specify different options for the INTEL package by using
the ``-pk intel Nphi`` :doc:`command-line switch <Run_options>` with