+the GPU or USER-OMP packages, for testing or benchmarking purposes.
Additional optional keyword/value pairs can be specified which
determine how Kokkos will use the underlying hardware on your
platform. These settings apply to each MPI task you launch via the
@@ -1616,31 +1603,31 @@ partition screen files file.N.
Use variants of various styles if they exist. The specified style can
-be cuda, gpu, intel, kk, omp, opt, or hybrid. These refer
-to optional packages that LAMMPS can be built with, as described above in
-Section 2.3. The “cuda” style corresponds to the USER-CUDA
-package, the “gpu” style to the GPU package, the “intel” style to the
-USER-INTEL package, the “kk” style to the KOKKOS package, the “opt”
-style to the OPT package, and the “omp” style to the USER-OMP package. The
-hybrid style is the only style that accepts arguments. It allows for two
-packages to be specified. The first package specified is the default and
-will be used if it is available. If no style is available for the first
-package, the style for the second package will be used if available. For
-example, “-suffix hybrid intel omp” will use styles from the USER-INTEL
-package if they are installed and available, but styles for the USER-OMP
-package otherwise.
+be gpu, intel, kk, omp, opt, or hybrid. These
+refer to optional packages that LAMMPS can be built with, as described
+above in Section 2.3. The “gpu” style corresponds to the
+GPU package, the “intel” style to the USER-INTEL package, the “kk”
+style to the KOKKOS package, the “opt” style to the OPT package, and
+the “omp” style to the USER-OMP package. The hybrid style is the only
+style that accepts arguments. It allows for two packages to be
+specified. The first package specified is the default and will be used
+if it is available. If no style is available for the first package,
+the style for the second package will be used if available. For
+example, “-suffix hybrid intel omp” will use styles from the
+USER-INTEL package if they are installed and available, but styles for
+the USER-OMP package otherwise.
Along with the “-package” command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
-As an example, all of the packages provide a pair_style lj/cut variant, with style names lj/cut/cuda,
-lj/cut/gpu, lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt. A
-variant style can be specified explicitly in your input script,
-e.g. pair_style lj/cut/gpu. If the -suffix switch is used the
-specified suffix (cuda,gpu,intel,kk,omp,opt) is automatically appended
-whenever your input script command creates a new
-atom, pair, fix,
-compute, or run style. If the variant
-version does not exist, the standard version is created.
+As an example, all of the packages provide a pair_style lj/cut variant, with style names lj/cut/gpu,
+lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt. A variant style
+can be specified explicitly in your input script, e.g. pair_style
+lj/cut/gpu. If the -suffix switch is used, the specified suffix
+(gpu,intel,kk,omp,opt) is automatically appended whenever your
+input script command creates a new atom,
+pair, fix, compute, or
+run style. If the variant version does not exist,
+the standard version is created.
For the GPU package, using this command-line switch also invokes the
default GPU settings, as if the command “package gpu 1” were used at
the top of your input script. These settings can be changed by using
diff --git a/doc/html/_sources/Section_accelerate.txt b/doc/html/_sources/Section_accelerate.txt
index eaa5e34e9d..e426b591b8 100644
--- a/doc/html/_sources/Section_accelerate.txt
+++ b/doc/html/_sources/Section_accelerate.txt
@@ -15,12 +15,11 @@ multi-core CPUs, GPUs, and Intel Xeon Phi coprocessors.
* 5.1 :ref:`Measuring performance `
* 5.2 :ref:`Algorithms and code options to boost performace `
* 5.3 :ref:`Accelerator packages with optimized styles `
-* 5.3.1 :doc:`USER-CUDA package `
-* 5.3.2 :doc:`GPU package `
-* 5.3.3 :doc:`USER-INTEL package `
-* 5.3.4 :doc:`KOKKOS package `
-* 5.3.5 :doc:`USER-OMP package `
-* 5.3.6 :doc:`OPT package `
+* 5.3.1 :doc:`GPU package `
+* 5.3.2 :doc:`USER-INTEL package `
+* 5.3.3 :doc:`KOKKOS package `
+* 5.3.4 :doc:`USER-OMP package `
+* 5.3.5 :doc:`OPT package `
* 5.4 :ref:`Comparison of various accelerator packages `
The `Benchmark page `_ of the LAMMPS
web site gives performance results for the various accelerator
@@ -164,8 +163,6 @@ overview of packages is give in :doc:`Section packages `.
These are the accelerator packages
currently in LAMMPS, either as standard or user packages:
-+--------------------------------------+------------------------------------------------+
-| :doc:`USER-CUDA ` | for NVIDIA GPUs |
+--------------------------------------+------------------------------------------------+
| :doc:`GPU ` | for NVIDIA GPUs as well as OpenCL support |
+--------------------------------------+------------------------------------------------+
@@ -184,7 +181,7 @@ three kinds of hardware, via the listed packages:
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Many-core CPUs | :doc:`USER-INTEL `, :doc:`KOKKOS `, :doc:`USER-OMP `, :doc:`OPT ` packages |
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
-| NVIDIA GPUs | :doc:`USER-CUDA `, :doc:`GPU `, :doc:`KOKKOS ` packages |
+| NVIDIA GPUs | :doc:`GPU `, :doc:`KOKKOS ` packages |
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Intel Phi | :doc:`USER-INTEL `, :doc:`KOKKOS ` packages |
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -204,7 +201,6 @@ same, except for precision and round-off effects.
For example, all of these styles are accelerated variants of the
Lennard-Jones :doc:`pair_style lj/cut `:
-* :doc:`pair_style lj/cut/cuda `
* :doc:`pair_style lj/cut/gpu `
* :doc:`pair_style lj/cut/intel `
* :doc:`pair_style lj/cut/kk `
@@ -223,7 +219,7 @@ package and are explained in the individual accelerator doc pages,
listed above:
+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
-| build the accelerator library | only for USER-CUDA and GPU packages |
+| build the accelerator library | only for GPU package |
+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| install the accelerator package | make yes-opt, make yes-user-intel, etc |
+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
@@ -236,7 +232,7 @@ listed above:
mpirun -np 32 lmp_machine -in in.script |
+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| enable the accelerator package | via "-c on" and "-k on" :ref:`command-line switches `,
- only for USER-CUDA and KOKKOS packages |
+ only for KOKKOS package |
+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| set any needed options for the package | via "-pk" :ref:`command-line switch ` or
:doc:`package ` command,
@@ -270,18 +266,17 @@ script.
These are the exceptions. You cannot build a single executable with:
* both the USER-INTEL Phi and KOKKOS Phi options
-* the USER-INTEL Phi or Kokkos Phi option, and either the USER-CUDA or GPU packages
+* the USER-INTEL Phi or Kokkos Phi option, and the GPU package
See the examples/accelerate/README and make.list files for sample
Make.py commands that build LAMMPS with any or all of the accelerator
packages. As an example, here is a command that builds with all the
-GPU related packages installed (USER-CUDA, GPU, KOKKOS with Cuda),
-including settings to build the needed auxiliary USER-CUDA and GPU
-libraries for Kepler GPUs:
+GPU related packages installed (GPU, KOKKOS with Cuda), including
+settings to build the needed auxiliary GPU libraries for Kepler GPUs:
.. parsed-literal::
- Make.py -j 16 -p omp gpu cuda kokkos -cc nvcc wrap=mpi -cuda mode=double arch=35 -gpu mode=double arch=35 \ -kokkos cuda arch=35 lib-all file mpi
+ Make.py -j 16 -p omp gpu kokkos -cc nvcc wrap=mpi -gpu mode=double arch=35 -kokkos cuda arch=35 lib-all file mpi
The examples/accelerate directory also has input scripts that can be
used with all of the accelerator packages. See its README file for
@@ -300,10 +295,9 @@ size and number of compute nodes, on different hardware platforms.
Here is a brief summary of what the various packages provide. Details
are in the individual accelerator sections.
-* Styles with a "cuda" or "gpu" suffix are part of the USER-CUDA or GPU
- packages, and can be run on NVIDIA GPUs. The speed-up on a GPU
- depends on a variety of factors, discussed in the accelerator
- sections.
+* Styles with a "gpu" suffix are part of the GPU package, and can be run
+ on NVIDIA GPUs. The speed-up on a GPU depends on a variety of
+ factors, discussed in the accelerator sections.
* Styles with an "intel" suffix are part of the USER-INTEL
package. These styles support vectorized single and mixed precision
calculations, in addition to full double precision. In extreme cases,
@@ -364,28 +358,25 @@ section below for examples where this has been done.
* The GPU package allows you to assign multiple CPUs (cores) to a single
GPU (a common configuration for "hybrid" nodes that contain multicore
- CPU(s) and GPU(s)) and works effectively in this mode. The USER-CUDA
- package does not allow this; you can only use one CPU per GPU.
+ CPU(s) and GPU(s)) and works effectively in this mode.
* The GPU package moves per-atom data (coordinates, forces)
- back-and-forth between the CPU and GPU every timestep. The USER-CUDA
- package only does this on timesteps when a CPU calculation is required
- (e.g. to invoke a fix or compute that is non-GPU-ized). Hence, if you
- can formulate your input script to only use GPU-ized fixes and
- computes, and avoid doing I/O too often (thermo output, dump file
- snapshots, restart files), then the data transfer cost of the
- USER-CUDA package can be very low, causing it to run faster than the
+ back-and-forth between the CPU and GPU every timestep. The
+ KOKKOS/CUDA package only does this on timesteps when a CPU calculation
+ is required (e.g. to invoke a fix or compute that is non-GPU-ized).
+ Hence, if you can formulate your input script to only use GPU-ized
+ fixes and computes, and avoid doing I/O too often (thermo output, dump
+ file snapshots, restart files), then the data transfer cost of the
+ KOKKOS/CUDA package can be very low, causing it to run faster than the
GPU package.
-* The GPU package is often faster than the USER-CUDA package, if the
+* The GPU package is often faster than the KOKKOS/CUDA package, if the
number of atoms per GPU is smaller. The crossover point, in terms of
- atoms/GPU at which the USER-CUDA package becomes faster depends
+ atoms/GPU at which the KOKKOS/CUDA package becomes faster depends
strongly on the pair style. For example, for a simple Lennard Jones
system the crossover (in single precision) is often about 50K-100K
atoms per GPU. When performing double precision calculations the
crossover point can be significantly smaller.
* Both packages compute bonded interactions (bonds, angles, etc) on the
- CPU. This means a model with bonds will force the USER-CUDA package
- to transfer per-atom data back-and-forth between the CPU and GPU every
- timestep. If the GPU package is running with several MPI processes
+ CPU. If the GPU package is running with several MPI processes
assigned to one GPU, the cost of computing the bonded interactions is
spread across more CPUs and hence the GPU package can run faster.
* When using the GPU package with multiple CPUs assigned to one GPU, its
@@ -398,32 +389,9 @@ section below for examples where this has been done.
**Differences between the two packages:**
* The GPU package accelerates only pair force, neighbor list, and PPPM
- calculations. The USER-CUDA package currently supports a wider range
- of pair styles and can also accelerate many fix styles and some
- compute styles, as well as neighbor list and PPPM calculations.
-* The USER-CUDA package does not support acceleration for minimization.
-* The USER-CUDA package does not support hybrid pair styles.
-* The USER-CUDA package can order atoms in the neighbor list differently
- from run to run resulting in a different order for force accumulation.
-* The USER-CUDA package has a limit on the number of atom types that can be
- used in a simulation.
+ calculations.
* The GPU package requires neighbor lists to be built on the CPU when using
exclusion lists or a triclinic simulation box.
-* The GPU package uses more GPU memory than the USER-CUDA package. This
- is generally not a problem since typical runs are computation-limited
- rather than memory-limited.
-Examples
-""""""""
-
-The LAMMPS distribution has two directories with sample input scripts
-for the GPU and USER-CUDA packages.
-
-* lammps/examples/gpu = GPU package files
-* lammps/examples/USER/cuda = USER-CUDA package files
-
-These contain input scripts for identical systems, so they can be used
-to benchmark the performance of both packages on your system.
-
.. _lws: http://lammps.sandia.gov
.. _ld: Manual.html
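
Note on the suffix fallback described in this patch: the documentation says that a suffix is appended to each newly created style, that the standard style is created when no variant exists, and that "-suffix hybrid intel omp" tries the first package before the second. The sketch below is only an illustration of that lookup order, not LAMMPS source code; the function name `resolve_style` and the `installed` set of style names are hypothetical.

```python
# Hypothetical sketch of the documented -suffix lookup order.
# This is NOT LAMMPS code; resolve_style and `installed` are made-up names.

def resolve_style(base, suffixes, available):
    """Return the style name that would be created for `base`.

    suffixes:  suffix names in priority order, e.g. ["intel", "omp"] for
               "-suffix hybrid intel omp", or ["gpu"] for "-suffix gpu".
    available: set of style names present in this (assumed) build.
    """
    for suffix in suffixes:
        candidate = f"{base}/{suffix}"
        if candidate in available:
            return candidate
    # No accelerated variant exists, so the standard style is used.
    return base


# Assumed example build: only some variants are installed.
installed = {"lj/cut", "lj/cut/omp", "eam", "eam/intel", "eam/omp"}

print(resolve_style("lj/cut", ["intel", "omp"], installed))  # second package wins
print(resolve_style("eam", ["intel", "omp"], installed))     # first package wins
print(resolve_style("lj/cut", ["gpu"], installed))           # falls back to standard
```

With the assumed `installed` set, the three calls pick the USER-OMP variant, the USER-INTEL variant, and the unsuffixed standard style, in that order.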
diff --git a/doc/html/_sources/Section_commands.txt b/doc/html/_sources/Section_commands.txt
index 5b5632f5d1..f19aa4047c 100644
--- a/doc/html/_sources/Section_commands.txt
+++ b/doc/html/_sources/Section_commands.txt
@@ -454,23 +454,23 @@ See the :doc:`fix ` command for one-line descriptions of each style
or click on the style itself for a full description. Some of the
styles have accelerated versions, which can be used if LAMMPS is built
with the :doc:`appropriate accelerated package `.
-This is indicated by additional letters in parenthesis: c = USER-CUDA,
-g = GPU, i = USER-INTEL, k = KOKKOS, o = USER-OMP, t = OPT.
+This is indicated by additional letters in parentheses: g = GPU, i =
+USER-INTEL, k = KOKKOS, o = USER-OMP, t = OPT.
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`adapt ` | :doc:`addforce (c) ` | :doc:`append/atoms ` | :doc:`atom/swap ` | :doc:`aveforce (c) ` | :doc:`ave/atom ` | :doc:`ave/chunk ` | :doc:`ave/correlate ` |
+| :doc:`adapt ` | :doc:`addforce ` | :doc:`append/atoms ` | :doc:`atom/swap ` | :doc:`aveforce ` | :doc:`ave/atom ` | :doc:`ave/chunk ` | :doc:`ave/correlate ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
| :doc:`ave/histo ` | :doc:`ave/histo/weight ` | :doc:`ave/time ` | :doc:`balance ` | :doc:`bond/break ` | :doc:`bond/create ` | :doc:`bond/swap ` | :doc:`box/relax ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`deform (k) ` | :doc:`deposit ` | :doc:`drag ` | :doc:`dt/reset ` | :doc:`efield ` | :doc:`ehex ` | :doc:`enforce2d (c) ` | :doc:`evaporate ` |
+| :doc:`deform (k) ` | :doc:`deposit ` | :doc:`drag ` | :doc:`dt/reset ` | :doc:`efield ` | :doc:`ehex ` | :doc:`enforce2d ` | :doc:`evaporate ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`external ` | :doc:`freeze (c) ` | :doc:`gcmc ` | :doc:`gld ` | :doc:`gravity (co) ` | :doc:`heat ` | :doc:`indent ` | :doc:`langevin (k) ` |
+| :doc:`external ` | :doc:`freeze ` | :doc:`gcmc ` | :doc:`gld ` | :doc:`gravity (o) ` | :doc:`heat ` | :doc:`indent ` | :doc:`langevin (k) ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
| :doc:`lineforce ` | :doc:`momentum ` | :doc:`move ` | :doc:`msst ` | :doc:`neb ` | :doc:`nph (ko) ` | :doc:`nphug (o) ` | :doc:`nph/asphere (o) ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`nph/body ` | :doc:`nph/sphere (o) ` | :doc:`npt (ckio) ` | :doc:`npt/asphere (o) ` | :doc:`npt/body ` | :doc:`npt/sphere (o) ` | :doc:`nve (ckio) ` | :doc:`nve/asphere (i) ` |
+| :doc:`nph/body ` | :doc:`nph/sphere (o) ` | :doc:`npt (kio) ` | :doc:`npt/asphere (o) ` | :doc:`npt/body ` | :doc:`npt/sphere (o) ` | :doc:`nve (kio) ` | :doc:`nve/asphere (i) ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`nve/asphere/noforce ` | :doc:`nve/body ` | :doc:`nve/limit ` | :doc:`nve/line ` | :doc:`nve/noforce ` | :doc:`nve/sphere (o) ` | :doc:`nve/tri ` | :doc:`nvt (ciko) ` |
+| :doc:`nve/asphere/noforce ` | :doc:`nve/body ` | :doc:`nve/limit ` | :doc:`nve/line ` | :doc:`nve/noforce ` | :doc:`nve/sphere (o) ` | :doc:`nve/tri ` | :doc:`nvt (iko) ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
| :doc:`nvt/asphere (o) ` | :doc:`nvt/body ` | :doc:`nvt/sllod (io) ` | :doc:`nvt/sphere (o) ` | :doc:`oneway ` | :doc:`orient/fcc ` | :doc:`planeforce ` | :doc:`poems ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
@@ -480,11 +480,11 @@ g = GPU, i = USER-INTEL, k = KOKKOS, o = USER-OMP, t = OPT.
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
| :doc:`rigid/npt (o) ` | :doc:`rigid/nve (o) ` | :doc:`rigid/nvt (o) ` | :doc:`rigid/small (o) ` | :doc:`rigid/small/nph ` | :doc:`rigid/small/npt ` | :doc:`rigid/small/nve ` | :doc:`rigid/small/nvt ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`setforce (ck) ` | :doc:`shake (c) ` | :doc:`spring ` | :doc:`spring/rg ` | :doc:`spring/self ` | :doc:`srd ` | :doc:`store/force ` | :doc:`store/state ` |
+| :doc:`setforce (k) ` | :doc:`shake ` | :doc:`spring ` | :doc:`spring/rg ` | :doc:`spring/self ` | :doc:`srd ` | :doc:`store/force ` | :doc:`store/state ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`temp/berendsen (c) ` | :doc:`temp/csld ` | :doc:`temp/csvr ` | :doc:`temp/rescale (c) ` | :doc:`tfmc ` | :doc:`thermal/conductivity ` | :doc:`tmd ` | :doc:`ttm ` |
+| :doc:`temp/berendsen ` | :doc:`temp/csld ` | :doc:`temp/csvr ` | :doc:`temp/rescale ` | :doc:`tfmc ` | :doc:`thermal/conductivity ` | :doc:`tmd ` | :doc:`ttm ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
-| :doc:`tune/kspace ` | :doc:`vector ` | :doc:`viscosity ` | :doc:`viscous (c) ` | :doc:`wall/colloid ` | :doc:`wall/gran ` | :doc:`wall/harmonic ` | :doc:`wall/lj1043 ` |
+| :doc:`tune/kspace ` | :doc:`vector ` | :doc:`viscosity ` | :doc:`viscous ` | :doc:`wall/colloid ` | :doc:`wall/gran ` | :doc:`wall/harmonic ` | :doc:`wall/lj1043 ` |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
| :doc:`wall/lj126 ` | :doc:`wall/lj93 ` | :doc:`wall/piston ` | :doc:`wall/reflect (k) ` | :doc:`wall/region ` | :doc:`wall/srd ` | | |
+------------------------------------------------------+----------------------------------------------+----------------------------------------+--------------------------------------------+--------------------------------------+--------------------------------------------------------+--------------------------------------+------------------------------------------+
@@ -523,7 +523,7 @@ See the :doc:`compute ` command for one-line descriptions of
each style or click on the style itself for a full description. Some
of the styles have accelerated versions, which can be used if LAMMPS
is built with the :doc:`appropriate accelerated package `. This is indicated by additional
-letters in parenthesis: c = USER-CUDA, g = GPU, i = USER-INTEL, k =
+letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
+------------------------------------------------+------------------------------------------------+--------------------------------------------------+--------------------------------------------------+----------------------------------------------------+----------------------------------------------------------+
@@ -541,13 +541,13 @@ KOKKOS, o = USER-OMP, t = OPT.
+------------------------------------------------+------------------------------------------------+--------------------------------------------------+--------------------------------------------------+----------------------------------------------------+----------------------------------------------------------+
| :doc:`msd ` | :doc:`msd/chunk ` | :doc:`msd/nongauss ` | :doc:`omega/chunk ` | :doc:`orientorder/atom ` | :doc:`pair ` |
+------------------------------------------------+------------------------------------------------+--------------------------------------------------+--------------------------------------------------+----------------------------------------------------+----------------------------------------------------------+
-| :doc:`pair/local ` | :doc:`pe (c) ` | :doc:`pe/atom ` | :doc:`plasticity/atom ` | :doc:`pressure (c) ` | :doc:`property/atom ` |
+| :doc:`pair/local ` | :doc:`pe ` | :doc:`pe/atom ` | :doc:`plasticity/atom ` | :doc:`pressure ` | :doc:`property/atom ` |
+------------------------------------------------+------------------------------------------------+--------------------------------------------------+--------------------------------------------------+----------------------------------------------------+----------------------------------------------------------+
| :doc:`property/local ` | :doc:`property/chunk ` | :doc:`rdf ` | :doc:`reduce