add updates/corrections, improve formatting
@@ -31,7 +31,8 @@ Coulombics. It has the following general features:
 (for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
 so that the same functionality is supported on a variety of hardware.

-**Required hardware/software:**
+Required hardware/software
+""""""""""""""""""""""""""

 To compile and use this package in CUDA mode, you currently need
 to have an NVIDIA GPU and install the corresponding NVIDIA CUDA
@@ -69,12 +70,14 @@ To compile and use this package in HIP mode, you have to have the AMD ROCm
 software installed. Versions of ROCm older than 3.5 are currently deprecated
 by AMD.

-**Building LAMMPS with the GPU package:**
+Building LAMMPS with the GPU package
+""""""""""""""""""""""""""""""""""""

 See the :ref:`Build extras <gpu>` page for
 instructions.

-**Run with the GPU package from the command line:**
+Run with the GPU package from the command-line
+""""""""""""""""""""""""""""""""""""""""""""""

 The ``mpirun`` or ``mpiexec`` command sets the total number of MPI tasks
 used by LAMMPS (one or multiple per compute node) and the number of MPI
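For orientation while reviewing this hunk, a typical build-and-run sequence for the GPU package might look like the sketch below (the CMake options, executable name ``lmp``, and input file are assumptions for illustration, not content of this diff):

.. code-block:: bash

   # configure and build LAMMPS with the GPU package in CUDA mode
   cmake -D PKG_GPU=on -D GPU_API=cuda ../cmake
   cmake --build .

   # launch 4 MPI tasks; -sf gpu appends the "gpu" suffix to supported
   # styles, -pk gpu 1 selects 1 GPU per node
   mpirun -np 4 ./lmp -sf gpu -pk gpu 1 -in in.melt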
@@ -133,7 +136,8 @@ affect the setting for bonded interactions (LAMMPS default is "on").
 The "off" setting for pairwise interaction is currently required for
 GPU package pair styles.

-**Or run with the GPU package by editing an input script:**
+Run with the GPU package by editing an input script
+"""""""""""""""""""""""""""""""""""""""""""""""""""

 The discussion above for the ``mpirun`` or ``mpiexec`` command, MPI
 tasks/node, and use of multiple MPI tasks/GPU is the same.
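To make the requirement at the top of this hunk concrete, the pairwise Newton setting can be turned off in an input script roughly as follows (a minimal sketch using the standard ``newton`` command; the two-argument form is assumed here):

.. code-block:: LAMMPS

   # Newton's third law off for pairwise interactions (required by GPU
   # package pair styles), but kept on for bonded interactions
   newton off on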
@@ -149,7 +153,8 @@ You must also use the :doc:`package gpu <package>` command to enable the
 GPU package, unless the ``-sf gpu`` or ``-pk gpu`` :doc:`command-line switches <Run_options>` were used. It specifies the number of
 GPUs/node to use, as well as other options.

-**Speed-ups to expect:**
+Speed-up to expect
+""""""""""""""""""

 The performance of a GPU versus a multicore CPU is a function of your
 hardware, which pair style is used, the number of atoms/GPU, and the
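As an illustration of the input-script route discussed in the previous hunks, the equivalent of the ``-sf gpu -pk gpu 1`` command-line switches could be written as script commands like these (a sketch; the ``suffix`` command is an assumption and is not shown in this diff):

.. code-block:: LAMMPS

   # enable the GPU package with 1 GPU per node, then append the "gpu"
   # suffix to all supported styles defined later in the script
   package gpu 1
   suffix gpu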
@@ -176,10 +181,13 @@ better with multiple OMP threads because the inter-process communication
 is higher for these styles with the GPU package in order to allow
 deterministic results.

-**Guidelines for best performance:**
+Guidelines for best performance
+"""""""""""""""""""""""""""""""

-* Using multiple MPI tasks per GPU will often give the best performance,
-  as allowed my most multicore CPU/GPU configurations.
+* Using multiple MPI tasks (2-10) per GPU will often give the best
+  performance, as allowed by most multicore CPU/GPU configurations.
+  Using too many MPI tasks will result in worse performance due to
+  growing overhead.
 * If the number of particles per MPI task is small (e.g. 100s of
   particles), it can be more efficient to run with fewer MPI tasks per
   GPU, even if you do not use all the cores on the compute node.
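A worked example of the first guideline above, with placeholder executable and input names: 8 MPI tasks sharing 2 GPUs on a node gives 4 tasks per GPU, inside the suggested 2-10 range.

.. code-block:: bash

   # 8 MPI tasks on one node with 2 GPUs per node -> 4 MPI tasks per GPU
   mpirun -np 8 ./lmp -sf gpu -pk gpu 2 -in in.script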
@@ -199,12 +207,13 @@ deterministic results.
   :doc:`angle <angle_style>`, :doc:`dihedral <dihedral_style>`,
   :doc:`improper <improper_style>`, and :doc:`long-range <kspace_style>`
   calculations will not be included in the "Pair" time.
-* Since only part of the pppm kspace style is GPU accelerated, it
-  may be faster to only use GPU acceleration for Pair styles with
-  long-range electrostatics. See the "pair/only" keyword of the
-  package command for a shortcut to do that. The work between kspace
-  on the CPU and non-bonded interactions on the GPU can be balanced
-  through adjusting the coulomb cutoff without loss of accuracy.
+* Since only part of the pppm kspace style is GPU accelerated, it may be
+  faster to only use GPU acceleration for Pair styles with long-range
+  electrostatics. See the "pair/only" keyword of the :doc:`package
+  command <package>` for a shortcut to do that. The distribution of
+  work between kspace on the CPU and non-bonded interactions on the GPU
+  can be balanced through adjusting the coulomb cutoff without loss of
+  accuracy.
 * When the *mode* setting for the package gpu command is force/neigh,
   the time for neighbor list calculations on the GPU will be added into
   the "Pair" time, not the "Neigh" time. An additional breakdown of the
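For reference while reviewing the "pair/only" item above, the shortcut could be requested roughly as follows (only the keyword name is taken from this diff; the value syntax is an assumption to be checked against the :doc:`package <package>` documentation):

.. code-block:: LAMMPS

   # accelerate only pair styles on the GPU and leave pppm (kspace),
   # bonded terms, etc. on the CPU ("on" value assumed)
   package gpu 1 pair/only on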
@@ -220,4 +229,6 @@ deterministic results.
 Restrictions
 """"""""""""

-None.
+When using :doc:`hybrid pair styles <pair_hybrid>`, the neighbor list
+must be generated on the host instead of the GPU and thus the potential
+GPU acceleration is reduced.
@@ -1,5 +1,5 @@
 INTEL package
-==================
+=============

 The INTEL package is maintained by Mike Brown at Intel
 Corporation. It provides two methods for accelerating simulations,
@@ -13,18 +13,18 @@ twice, once on the CPU and once with an offload flag. This allows
 LAMMPS to run on the CPU cores and co-processor cores simultaneously.

 Currently Available INTEL Styles
-"""""""""""""""""""""""""""""""""""""
+""""""""""""""""""""""""""""""""

 * Angle Styles: charmm, harmonic
-* Bond Styles: fene, fourier, harmonic
+* Bond Styles: fene, harmonic
 * Dihedral Styles: charmm, fourier, harmonic, opls
-* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere
+* Fixes: nve, npt, nvt, nvt/sllod, nve/asphere, electrode/conp, electrode/conq, electrode/thermo
 * Improper Styles: cvff, harmonic
 * Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long,
   buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm,
   lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long,
-  rebo, sw, tersoff
-* K-Space Styles: pppm, pppm/disp
+  rebo, snap, sw, tersoff
+* K-Space Styles: pppm, pppm/disp, pppm/electrode

 .. warning::

@@ -33,7 +33,7 @@ Currently Available INTEL Styles
 input requires it, LAMMPS will abort with an error message.

 Speed-up to expect
-"""""""""""""""""""
+""""""""""""""""""

 The speedup will depend on your simulation, the hardware, which
 styles are used, the number of atoms, and the floating-point
@@ -312,21 +312,21 @@ almost all cases.
 recommended, especially when running on a machine with Intel
 Hyper-Threading technology disabled.

-Run with the INTEL package from the command line
-"""""""""""""""""""""""""""""""""""""""""""""""""""""
+Run with the INTEL package from the command-line
+""""""""""""""""""""""""""""""""""""""""""""""""

-To enable INTEL optimizations for all available styles used in
-the input script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can be used without any requirement for
-editing the input script. This switch will automatically append
-"intel" to styles that support it. It also invokes a default command:
-:doc:`package intel 1 <package>`. This package command is used to set
-options for the INTEL package. The default package command will
-specify that INTEL calculations are performed in mixed precision,
-that the number of OpenMP threads is specified by the OMP_NUM_THREADS
-environment variable, and that if co-processors are present and the
-binary was built with offload support, that 1 co-processor per node
-will be used with automatic balancing of work between the CPU and the
-co-processor.
+To enable INTEL optimizations for all available styles used in the input
+script, the ``-sf intel`` :doc:`command-line switch <Run_options>` can
+be used without any requirement for editing the input script. This
+switch will automatically append "intel" to styles that support it. It
+also invokes a default command: :doc:`package intel 1 <package>`. This
+package command is used to set options for the INTEL package. The
+default package command will specify that INTEL calculations are
+performed in mixed precision, that the number of OpenMP threads is
+specified by the OMP_NUM_THREADS environment variable, and that if
+co-processors are present and the binary was built with offload support,
+that 1 co-processor per node will be used with automatic balancing of
+work between the CPU and the co-processor.

 You can specify different options for the INTEL package by using
 the ``-pk intel Nphi`` :doc:`command-line switch <Run_options>` with
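As a concrete illustration of the command-line workflow described in this hunk (executable name, task and thread counts, and input file are placeholders):

.. code-block:: bash

   # 16 MPI tasks with 2 OpenMP threads each; -sf intel appends the
   # "intel" suffix and invokes the default "package intel 1" command
   export OMP_NUM_THREADS=2
   mpirun -np 16 ./lmp -sf intel -in in.script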