Merge branch 'master' into tersoff-shift

Axel Kohlmeyer
2021-01-11 04:30:11 -05:00
509 changed files with 7657 additions and 7183 deletions

View File

@@ -807,7 +807,7 @@ compiling LAMMPS with Python version 3.6 or later.
the ``cythonize`` command in case the corresponding .pyx file(s) were
modified. You may need to modify ``lib/python/Makefile.lammps``
if the LAMMPS build fails.
-To manually enforce building MLIAP with Python support enabled,
+To manually enforce building MLIAP with Python support enabled,
you can add
``-DMLIAP_PYTHON`` to the ``LMP_INC`` variable in your machine makefile.
You may have to manually run the ``cythonize`` command on .pyx file(s)
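As a hedged sketch of the makefile route described above (the machine makefile name and the pre-existing LMP_INC flags are illustrative, not from this commit):

.. code-block:: make

   # src/MAKE/MINE/Makefile.mymachine -- illustrative machine makefile
   # append -DMLIAP_PYTHON to whatever LMP_INC already contains
   LMP_INC = -DLAMMPS_MEMALIGN=64 -DMLIAP_PYTHON

If the build then still fails in lib/python, re-run ``cythonize`` by hand on the modified .pyx file(s), as the text notes.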

View File

@@ -1,5 +1,4 @@
Include packages in build
=========================
In LAMMPS, a package is a group of files that enable a specific set of

View File

@@ -38,14 +38,14 @@ produce an executable compatible with a specific hardware.
:class: note
Kokkos with CUDA currently implicitly assumes that the MPI library is
-CUDA-aware. This is not always the case, especially when using
+GPU-aware. This is not always the case, especially when using
pre-compiled MPI libraries provided by a Linux distribution. This is
not a problem when using only a single GPU with a single MPI
rank. When running with multiple MPI ranks, you may see segmentation
-faults without CUDA-aware MPI support. These can be avoided by adding
-the flags :doc:`-pk kokkos cuda/aware off <Run_options>` to the
+faults without GPU-aware MPI support. These can be avoided by adding
+the flags :doc:`-pk kokkos gpu/aware off <Run_options>` to the
LAMMPS command line or by using the command :doc:`package kokkos
-cuda/aware off <package>` in the input file.
+gpu/aware off <package>` in the input file.
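As a concrete sketch of the two equivalent workarounds named above (binary name, GPU count, and input file are illustrative):

.. code-block:: bash

   # disable GPU-aware MPI from the command line ...
   mpirun -np 2 lmp -k on g 2 -sf kk -pk kokkos gpu/aware off -in in.lj

   # ... or put this near the top of the input script instead:
   #   package kokkos gpu/aware off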
.. admonition:: AMD GPU support
:class: note
@@ -242,8 +242,8 @@ case, also packing/unpacking communication buffers on the host may give
speedup (see the KOKKOS :doc:`package <package>` command). Using CUDA MPS
is recommended in this scenario.
-Using a CUDA-aware MPI library is highly recommended. CUDA-aware MPI use can be
-avoided by using :doc:`-pk kokkos cuda/aware no <package>`. As above for
+Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
+avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
multi-core CPUs (and no GPU), if N is the number of physical cores/node,
then the number of MPI tasks/node should not exceed N.
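For example, a run sketch under the rule of thumb above, assuming a node with 16 physical cores and 2 GPUs (all numbers illustrative):

.. code-block:: bash

   # N = 16 physical cores per node, so at most 16 MPI tasks per node;
   # with multiple ranks per GPU, CUDA MPS is recommended (see above)
   mpirun -np 16 lmp -k on g 2 -sf kk -in in.lj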

View File

@@ -115,8 +115,8 @@ The optional keyword *chunksize* is only applicable when using the
the KOKKOS package and is ignored otherwise. This keyword controls
the number of atoms in each pass used to compute the bond-orientational
order parameters and is used to avoid running out of memory. For example
-if there are 4000 atoms in the simulation and the *chunksize*
-is set to 2000, the parameter calculation will be broken up
+if there are 32768 atoms in the simulation and the *chunksize*
+is set to 16384, the parameter calculation will be broken up
into two passes.
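A hedged example of the *chunksize* splitting just described (compute ID, group, and cutoff value are illustrative):

.. code-block:: LAMMPS

   # with 32768 atoms and chunksize 16384, the bond-orientational
   # order parameters are computed in two passes
   compute 1 all orientorder/atom cutoff 3.5 chunksize 16384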
The value of :math:`Q_l` is set to zero for atoms not in the
@@ -193,7 +193,7 @@ Default
The option defaults are *cutoff* = pair style cutoff, *nnn* = 12,
*degrees* = 5 4 6 8 10 12 i.e. :math:`Q_4`, :math:`Q_6`, :math:`Q_8`, :math:`Q_{10}`, and :math:`Q_{12}`,
-*wl* = no, *wl/hat* = no, *components* off, and *chunksize* = 2000
+*wl* = no, *wl/hat* = no, *components* off, and *chunksize* = 16384
----------

View File

@@ -93,7 +93,7 @@ from a compute, fix, or variable, then see the :doc:`fix ave/chunk <fix_ave_chun
:doc:`fix ave/histo <fix_ave_histo>` commands. If you wish to convert a
per-atom quantity into a single global value, see the :doc:`compute reduce <compute_reduce>` command.
-The input values must either be all scalars. What kinds of
+The input values must all be scalars. What kinds of
correlations between input values are calculated is determined by the
*type* keyword as discussed below.
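As a sketch of the all-scalar input requirement (the pressure compute and the Nevery/Nrepeat/Nfreq values are illustrative):

.. code-block:: LAMMPS

   # correlate two scalar inputs, here off-diagonal pressure tensor components
   compute p all pressure thermo_temp
   fix     1 all ave/correlate 1 50 100 c_p[4] c_p[5] type auto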

View File

@@ -68,7 +68,7 @@ Syntax
*no_affinity* values = none
*kokkos* args = keyword value ...
zero or more keyword/value pairs may be appended
-keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *comm/reverse* or *cuda/aware* or *pair/only*
+keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *gpu/aware* or *pair/only*
*neigh* value = *full* or *half*
full = full neighbor list
half = half neighbor list built in thread-safe manner
@@ -84,16 +84,18 @@ Syntax
*binsize* value = size
size = bin size for neighbor list construction (distance units)
*comm* value = *no* or *host* or *device*
-use value for comm/exchange and comm/forward and comm/reverse
+use value for comm/exchange and comm/forward and pair/comm/forward and fix/comm/forward and comm/reverse
*comm/exchange* value = *no* or *host* or *device*
*comm/forward* value = *no* or *host* or *device*
+*pair/comm/forward* value = *no* or *device*
+*fix/comm/forward* value = *no* or *device*
*comm/reverse* value = *no* or *host* or *device*
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
-*cuda/aware* = *off* or *on*
-off = do not use CUDA-aware MPI
-on = use CUDA-aware MPI (default)
+*gpu/aware* = *off* or *on*
+off = do not use GPU-aware MPI
+on = use GPU-aware MPI (default)
*pair/only* = *off* or *on*
off = use device acceleration (e.g. GPU) for all available styles in the KOKKOS package (default)
on = use device acceleration only for pair styles (and host acceleration for others)
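A minimal sketch combining several of the keywords listed above (the particular values are illustrative, not recommendations):

.. code-block:: LAMMPS

   # full neighbor lists, all communication buffers packed on the device,
   # but MPI buffers staged through the host (GPU-aware MPI disabled)
   package kokkos neigh full comm device gpu/aware off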
@@ -498,7 +500,8 @@ because the GPU is faster at performing pairwise interactions, then this
rule of thumb may give too large a binsize and the default should be
overridden with a smaller value.
-The *comm* and *comm/exchange* and *comm/forward* and *comm/reverse*
+The *comm* and *comm/exchange* and *comm/forward* and *pair/comm/forward*
+and *fix/comm/forward* and *comm/reverse*
keywords determine whether the host or device performs the packing and
unpacking of data when communicating per-atom data between processors.
"Exchange" communication happens only on timesteps that neighbor lists
@@ -506,18 +509,22 @@ are rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. "Reverse" communication
happens every timestep if the *newton* option is on. The data is for
atom coordinates and any other atom properties that need to be updated
-for ghost atoms owned by each processor.
+for ghost atoms owned by each processor. "Pair/comm" controls additional
+communication in pair styles, such as pair_style EAM. "Fix/comm" controls
+additional communication in fixes, such as fix SHAKE.
-The *comm* keyword is simply a short-cut to set the same value for both
-the *comm/exchange* and *comm/forward* and *comm/reverse* keywords.
+The *comm* keyword is simply a short-cut to set the same value for all
+the comm keywords.
-The value options for all 3 keywords are *no* or *host* or *device*\ . A
+The value options for the keywords are *no* or *host* or *device*\ . A
value of *no* means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of *host* means to
use the host, typically a multi-core CPU, and perform the
packing/unpacking in parallel with threads. A value of *device* means to
use the device, typically a GPU, to perform the packing/unpacking
-operation.
+operation. If a value of *host* is used for the *pair/comm/forward* or
+*fix/comm/forward* keyword, it will automatically be changed to *no*
+since these keywords don't support *host* mode.
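For instance, a hedged mixed setting per the paragraph above (whether this pays off is hardware- and input-dependent):

.. code-block:: LAMMPS

   # pack/unpack exchange buffers on the (threaded) host CPU,
   # forward and reverse buffers on the GPU
   package kokkos comm/exchange host comm/forward device comm/reverse device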
The optimal choice for these keywords depends on the input script and
the hardware used. The *no* value is useful for verifying that the
@@ -538,18 +545,18 @@ pack/unpack communicated data. When running small systems on a GPU,
performing the exchange pack/unpack on the host CPU can give speedup
since it reduces the number of CUDA kernel launches.
-The *cuda/aware* keyword chooses whether CUDA-aware MPI will be used. When
+The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
this keyword is set to *on*\ , buffers in GPU memory are passed directly
through MPI send/receive calls. This reduces overhead of first copying
-the data to the host CPU. However CUDA-aware MPI is not supported on all
+the data to the host CPU. However GPU-aware MPI is not supported on all
systems, which can lead to segmentation faults and would require using a
-value of *off*\ . If LAMMPS can safely detect that CUDA-aware MPI is not
+value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
available (currently only possible with OpenMPI v2.0.0 or later), then
-the *cuda/aware* keyword is automatically set to *off* by default. When
-the *cuda/aware* keyword is set to *off* while any of the *comm*
+the *gpu/aware* keyword is automatically set to *off* by default. When
+the *gpu/aware* keyword is set to *off* while any of the *comm*
keywords are set to *device*\ , the value for these *comm* keywords will
be automatically changed to *no*\ . This setting has no effect if not
-running on GPUs or if using only one MPI rank. CUDA-aware MPI is available
+running on GPUs or if using only one MPI rank. GPU-aware MPI is available
for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
"MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
Spectrum MPI when the "-gpu" flag is used.
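A run sketch for the Mvapich2 case in the list above (rank and GPU counts and the input file are illustrative):

.. code-block:: bash

   export MV2_USE_CUDA=1   # per the docs, required for Mvapich2 1.9 or later
   mpirun -np 4 lmp -k on g 4 -sf kk -pk kokkos gpu/aware on -in in.melt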
@@ -558,7 +565,7 @@ The *pair/only* keyword can change how the KOKKOS suffix "kk" is applied
when using an accelerator device. By default device acceleration is
always used for all available styles. With *pair/only* set to *on* the
suffix setting will choose device acceleration only for pair styles and
-run all other force computations concurrently on the host CPU.
+run all other force computations on the host CPU.
The *comm* flags will also automatically be changed to *no*\ . This can
result in better performance for certain configurations and system sizes.
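As a sketch, restricting device acceleration to pair styles via the command line (binary and input file names are illustrative):

.. code-block:: bash

   # the "kk" suffix now applies only to pair styles; other force
   # computations run on the host CPU and the comm keywords switch to "no"
   lmp -k on g 1 -sf kk -pk kokkos pair/only on -in in.eam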
@@ -671,8 +678,8 @@ script or via the "-pk intel" :doc:`command-line switch <Run_options>`.
For the KOKKOS package, the option defaults for GPUs are neigh = full,
neigh/qeq = full, newton = off, binsize for GPUs = 2x LAMMPS default
-value, comm = device, cuda/aware = on. When LAMMPS can safely detect
-that CUDA-aware MPI is not available, the default value of cuda/aware
+value, comm = device, gpu/aware = on. When LAMMPS can safely detect
+that GPU-aware MPI is not available, the default value of gpu/aware
becomes "off". For CPUs or Xeon Phis, the option defaults are neigh =
half, neigh/qeq = half, newton = on, binsize = 0.0, and comm = no. The
option neigh/thread = on when there are 16K atoms or less on an MPI

View File

@@ -152,7 +152,7 @@ The default values for these keywords are
* *chemflag* = 0
* *bnormflag* = 0
* *wselfallflag* = 0
-* *chunksize* = 2000
+* *chunksize* = 4096
If *quadraticflag* is set to 1, then the SNAP energy expression includes additional quadratic terms
that have been shown to increase the overall accuracy of the potential without much increase
@@ -189,8 +189,8 @@ pair style *snap* with the KOKKOS package and is ignored otherwise.
This keyword controls
the number of atoms in each pass used to compute the bispectrum
components and is used to avoid running out of memory. For example
-if there are 4000 atoms in the simulation and the *chunksize*
-is set to 2000, the bispectrum calculation will be broken up
+if there are 8192 atoms in the simulation and the *chunksize*
+is set to 4096, the bispectrum calculation will be broken up
into two passes.
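A hedged sketch of where *chunksize* is set; SNAP keywords of this kind go in the SNAP parameter file (element, file names, and values are illustrative):

.. code-block:: LAMMPS

   pair_style snap
   pair_coeff * * W.snapcoeff W.snapparam W

   # in W.snapparam (keyword/value pairs, one per line):
   #   rcutfac   4.7
   #   twojmax   8
   #   chunksize 4096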
Detailed definitions for all the other keywords