Merge branch 'master' into tersoff-shift
@@ -807,7 +807,7 @@ compiling LAMMPS with Python version 3.6 or later.
 the ``cythonize`` command in case the corresponding .pyx file(s) were
 modified. You may need to modify ``lib/python/Makefile.lammps``
 if the LAMMPS build fails.
-To manually enforce building MLIAP with Python support enabled,
+To manually enforce building MLIAP with Python support enabled,
 you can add
 ``-DMLIAP_PYTHON`` to the ``LMP_INC`` variable in your machine makefile.
 You may have to manually run the ``cythonize`` command on .pyx file(s)
@@ -1,5 +1,4 @@
 Include packages in build
-
 =========================
 
 In LAMMPS, a package is a group of files that enable a specific set of
@@ -38,14 +38,14 @@ produce an executable compatible with a specific hardware.
 :class: note
 
 Kokkos with CUDA currently implicitly assumes that the MPI library is
-CUDA-aware. This is not always the case, especially when using
+GPU-aware. This is not always the case, especially when using
 pre-compiled MPI libraries provided by a Linux distribution. This is
 not a problem when using only a single GPU with a single MPI
 rank. When running with multiple MPI ranks, you may see segmentation
-faults without CUDA-aware MPI support. These can be avoided by adding
-the flags :doc:`-pk kokkos cuda/aware off <Run_options>` to the
+faults without GPU-aware MPI support. These can be avoided by adding
+the flags :doc:`-pk kokkos gpu/aware off <Run_options>` to the
 LAMMPS command line or by using the command :doc:`package kokkos
-cuda/aware off <package>` in the input file.
+gpu/aware off <package>` in the input file.
 
 .. admonition:: AMD GPU support
 :class: note
@@ -242,8 +242,8 @@ case, also packing/unpacking communication buffers on the host may give
 speedup (see the KOKKOS :doc:`package <package>` command). Using CUDA MPS
 is recommended in this scenario.
 
-Using a CUDA-aware MPI library is highly recommended. CUDA-aware MPI use can be
-avoided by using :doc:`-pk kokkos cuda/aware no <package>`. As above for
+Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
+avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
 multi-core CPUs (and no GPU), if N is the number of physical cores/node,
 then the number of MPI tasks/node should not exceed N.
 
@@ -115,8 +115,8 @@ The optional keyword *chunksize* is only applicable when using the
 the KOKKOS package and is ignored otherwise. This keyword controls
 the number of atoms in each pass used to compute the bond-orientational
 order parameters and is used to avoid running out of memory. For example
-if there are 4000 atoms in the simulation and the *chunksize*
-is set to 2000, the parameter calculation will be broken up
+if there are 32768 atoms in the simulation and the *chunksize*
+is set to 16384, the parameter calculation will be broken up
 into two passes.
 
 The value of :math:`Q_l` is set to zero for atoms not in the
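As a rough illustration of the *chunksize* example in the hunk above, the sketch below computes how many passes a given atom count would need. It is not LAMMPS code: `num_passes` and `chunk_ranges` are hypothetical helper names, and the sketch assumes the atoms are simply split into consecutive chunks of at most *chunksize* atoms.

```python
import math


def num_passes(natoms, chunksize):
    """Passes needed if each pass handles at most `chunksize` atoms."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    return math.ceil(natoms / chunksize)


def chunk_ranges(natoms, chunksize):
    """Half-open (start, end) atom index ranges, one per pass (assumed layout)."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    return [(start, min(start + chunksize, natoms))
            for start in range(0, natoms, chunksize)]


print(num_passes(32768, 16384))  # 2, matching the documented example
print(num_passes(4000, 2000))    # 2, matching the previous wording
```

Under the same assumption, 40000 atoms with the new default of 16384 would take three passes, the last one covering the remaining 7232 atoms.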
@@ -193,7 +193,7 @@ Default
 
 The option defaults are *cutoff* = pair style cutoff, *nnn* = 12,
 *degrees* = 5 4 6 8 10 12 i.e. :math:`Q_4`, :math:`Q_6`, :math:`Q_8`, :math:`Q_{10}`, and :math:`Q_{12}`,
-*wl* = no, *wl/hat* = no, *components* off, and *chunksize* = 2000
+*wl* = no, *wl/hat* = no, *components* off, and *chunksize* = 16384
 
 ----------
 
@@ -93,7 +93,7 @@ from a compute, fix, or variable, then see the :doc:`fix ave/chunk <fix_ave_chun
 :doc:`fix ave/histo <fix_ave_histo>` commands. If you wish to convert a
 per-atom quantity into a single global value, see the :doc:`compute reduce <compute_reduce>` command.
 
-The input values must either be all scalars. What kinds of
+The input values must be all scalars. What kinds of
 correlations between input values are calculated is determined by the
 *type* keyword as discussed below.
 
@@ -68,7 +68,7 @@ Syntax
 *no_affinity* values = none
 *kokkos* args = keyword value ...
 zero or more keyword/value pairs may be appended
-keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *comm/reverse* or *cuda/aware* or *pair/only*
+keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *pair/comm/forward* or *fix/comm/forward* or *comm/reverse* or *gpu/aware* or *pair/only*
 *neigh* value = *full* or *half*
 full = full neighbor list
 half = half neighbor list built in thread-safe manner
@@ -84,16 +84,18 @@ Syntax
 *binsize* value = size
 size = bin size for neighbor list construction (distance units)
 *comm* value = *no* or *host* or *device*
-use value for comm/exchange and comm/forward and comm/reverse
+use value for comm/exchange and comm/forward and pair/comm/forward and fix/comm/forward and comm/reverse
 *comm/exchange* value = *no* or *host* or *device*
 *comm/forward* value = *no* or *host* or *device*
+*pair/comm/forward* value = *no* or *device*
+*fix/comm/forward* value = *no* or *device*
 *comm/reverse* value = *no* or *host* or *device*
 no = perform communication pack/unpack in non-KOKKOS mode
 host = perform pack/unpack on host (e.g. with OpenMP threading)
 device = perform pack/unpack on device (e.g. on GPU)
-*cuda/aware* = *off* or *on*
-off = do not use CUDA-aware MPI
-on = use CUDA-aware MPI (default)
+*gpu/aware* = *off* or *on*
+off = do not use GPU-aware MPI
+on = use GPU-aware MPI (default)
 *pair/only* = *off* or *on*
 off = use device acceleration (e.g. GPU) for all available styles in the KOKKOS package (default)
 on = use device acceleration only for pair styles (and host acceleration for others)
@@ -498,7 +500,8 @@ because the GPU is faster at performing pairwise interactions, then this
 rule of thumb may give too large a binsize and the default should be
 overridden with a smaller value.
 
-The *comm* and *comm/exchange* and *comm/forward* and *comm/reverse*
+The *comm* and *comm/exchange* and *comm/forward* and *pair/comm/forward*
+and *fix/comm/forward* and *comm/reverse*
 keywords determine whether the host or device performs the packing and
 unpacking of data when communicating per-atom data between processors.
 "Exchange" communication happens only on timesteps that neighbor lists
@@ -506,18 +509,22 @@ are rebuilt. The data is only for atoms that migrate to new processors.
 "Forward" communication happens every timestep. "Reverse" communication
 happens every timestep if the *newton* option is on. The data is for
 atom coordinates and any other atom properties that need to be updated
-for ghost atoms owned by each processor.
+for ghost atoms owned by each processor. "Pair/comm" controls additional
+communication in pair styles, such as pair_style EAM. "Fix/comm" controls
+additional communication in fixes, such as fix SHAKE.
 
-The *comm* keyword is simply a short-cut to set the same value for both
-the *comm/exchange* and *comm/forward* and *comm/reverse* keywords.
+The *comm* keyword is simply a short-cut to set the same value for all
+the comm keywords.
 
-The value options for all 3 keywords are *no* or *host* or *device*\ . A
+The value options for the keywords are *no* or *host* or *device*\ . A
 value of *no* means to use the standard non-KOKKOS method of
 packing/unpacking data for the communication. A value of *host* means to
 use the host, typically a multi-core CPU, and perform the
 packing/unpacking in parallel with threads. A value of *device* means to
 use the device, typically a GPU, to perform the packing/unpacking
-operation.
+operation. If a value of *host* is used for the *pair/comm/forward* or
+*fix/comm/forward* keyword, it will automatically be changed to *no*
+since these keywords don't support *host* mode.
 
 The optimal choice for these keywords depends on the input script and
 the hardware used. The *no* value is useful for verifying that the
@@ -538,18 +545,18 @@ pack/unpack communicated data. When running small systems on a GPU,
 performing the exchange pack/unpack on the host CPU can give speedup
 since it reduces the number of CUDA kernel launches.
 
-The *cuda/aware* keyword chooses whether CUDA-aware MPI will be used. When
+The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
 this keyword is set to *on*\ , buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces overhead of first copying
-the data to the host CPU. However CUDA-aware MPI is not supported on all
+the data to the host CPU. However GPU-aware MPI is not supported on all
 systems, which can lead to segmentation faults and would require using a
-value of *off*\ . If LAMMPS can safely detect that CUDA-aware MPI is not
+value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
 available (currently only possible with OpenMPI v2.0.0 or later), then
-the *cuda/aware* keyword is automatically set to *off* by default. When
-the *cuda/aware* keyword is set to *off* while any of the *comm*
+the *gpu/aware* keyword is automatically set to *off* by default. When
+the *gpu/aware* keyword is set to *off* while any of the *comm*
 keywords are set to *device*\ , the value for these *comm* keywords will
 be automatically changed to *no*\ . This setting has no effect if not
-running on GPUs or if using only one MPI rank. CUDA-aware MPI is available
+running on GPUs or if using only one MPI rank. GPU-aware MPI is available
 for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
 "MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
 Spectrum MPI when the "-gpu" flag is used.
@@ -558,7 +565,7 @@ The *pair/only* keyword can change how the KOKKOS suffix "kk" is applied
 when using an accelerator device. By default device acceleration is
 always used for all available styles. With *pair/only* set to *on* the
 suffix setting will choose device acceleration only for pair styles and
-run all other force computations concurrently on the host CPU.
+run all other force computations on the host CPU.
 The *comm* flags will also automatically be changed to *no*\ . This can
 result in better performance for certain configurations and system sizes.
 
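The automatic adjustments described in the hunks above for the *comm* keywords can be summarized in a small sketch. This is not taken from the LAMMPS sources: `resolve_comm` and its arguments are hypothetical names. It assumes a specific keyword overrides the *comm* shortcut, uses the documented GPU default of *device* for unset keywords, and ignores the note that *gpu/aware* has no effect without GPUs or with a single MPI rank.

```python
COMM_KEYS = (
    "comm/exchange",
    "comm/forward",
    "pair/comm/forward",
    "fix/comm/forward",
    "comm/reverse",
)
# Keywords documented as accepting only *no* or *device*.
NO_HOST_KEYS = ("pair/comm/forward", "fix/comm/forward")


def resolve_comm(settings, gpu_aware=True, pair_only=False):
    """Return the effective value of each comm keyword.

    `settings` maps keyword names to "no", "host" or "device".
    Assumption: a specific keyword overrides the *comm* shortcut, and
    unset keywords fall back to the documented GPU default, "device".
    """
    resolved = {}
    for key in COMM_KEYS:
        value = settings.get(key, settings.get("comm", "device"))
        if key in NO_HOST_KEYS and value == "host":
            value = "no"  # these keywords do not support host mode
        if not gpu_aware and value == "device":
            value = "no"  # gpu/aware off: device comm is switched to no
        if pair_only:
            value = "no"  # pair/only on: comm flags are changed to no
        resolved[key] = value
    return resolved


print(resolve_comm({"comm": "host"}))
print(resolve_comm({"comm": "device"}, gpu_aware=False))
```

The first call shows *host* being kept for the exchange, forward and reverse keywords while the pair and fix variants fall back to *no*. The second shows every *device* setting turning into *no* once GPU-aware MPI is switched off.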
@@ -671,8 +678,8 @@ script or via the "-pk intel" :doc:`command-line switch <Run_options>`.
 
 For the KOKKOS package, the option defaults for GPUs are neigh = full,
 neigh/qeq = full, newton = off, binsize for GPUs = 2x LAMMPS default
-value, comm = device, cuda/aware = on. When LAMMPS can safely detect
-that CUDA-aware MPI is not available, the default value of cuda/aware
+value, comm = device, gpu/aware = on. When LAMMPS can safely detect
+that GPU-aware MPI is not available, the default value of gpu/aware
 becomes "off". For CPUs or Xeon Phis, the option defaults are neigh =
 half, neigh/qeq = half, newton = on, binsize = 0.0, and comm = no. The
 option neigh/thread = on when there are 16K atoms or less on an MPI
@@ -152,7 +152,7 @@ The default values for these keywords are
 * *chemflag* = 0
 * *bnormflag* = 0
 * *wselfallflag* = 0
-* *chunksize* = 2000
+* *chunksize* = 4096
 
 If *quadraticflag* is set to 1, then the SNAP energy expression includes additional quadratic terms
 that have been shown to increase the overall accuracy of the potential without much increase
@@ -189,8 +189,8 @@ pair style *snap* with the KOKKOS package and is ignored otherwise.
 This keyword controls
 the number of atoms in each pass used to compute the bispectrum
 components and is used to avoid running out of memory. For example
-if there are 4000 atoms in the simulation and the *chunksize*
-is set to 2000, the bispectrum calculation will be broken up
+if there are 8192 atoms in the simulation and the *chunksize*
+is set to 4096, the bispectrum calculation will be broken up
 into two passes.
 
 Detailed definitions for all the other keywords