Merge branch 'master' into collected-small-changes

Axel Kohlmeyer
2021-08-27 14:58:42 -04:00
19 changed files with 822 additions and 533 deletions

View File

@ -22,4 +22,5 @@ page.
   Build_extras
   Build_manual
   Build_windows
+  Build_diskspace
   Build_development

View File

@ -0,0 +1,45 @@
Notes for saving disk space when building LAMMPS from source
------------------------------------------------------------

LAMMPS is a large software project with a large number of source files,
extensive documentation, and a large collection of example files.
Downloading LAMMPS by cloning the
`git repository from GitHub <https://github.com/lammps/lammps>`_ will by
default also download the entire commit history since September 2006.
Compiling LAMMPS will add the storage requirements of the compiled object
files and libraries to the tally.
In a user account on an HPC cluster with file system quotas, or in other
environments with restricted disk space, it may be necessary to reduce
the storage requirements.  Here are some suggestions:

- Create a so-called shallow repository by cloning only the last commit
  instead of the full project history using ``git clone git@github.com:lammps/lammps --depth=1 --branch=master``.
  This reduces the downloaded size to about half.  With ``--depth=1`` it is not possible to check out other
  versions or branches of LAMMPS; using ``--depth=1000`` makes multiple recent versions available at little
  extra storage cost (the entire git history had nearly 30,000 commits in fall 2021).
- Download a tar archive from either the `download section on the LAMMPS homepage <https://www.lammps.org/download.html>`_
  or the `LAMMPS releases page on GitHub <https://github.com/lammps/lammps/releases>`_; these archives do not
  contain the git history at all.
- Build LAMMPS without the debug flag (remove ``-g`` from the machine makefile or use ``-DCMAKE_BUILD_TYPE=Release``),
  or run the ``strip`` command on the LAMMPS executable once debugging is no longer needed.  The strip command
  may also be applied to the LAMMPS shared library.  The static library may be deleted entirely.
- Delete compiled object files and libraries after copying the LAMMPS executable to a permanent location.
  When using the traditional build process, ``make clean-<machine>`` or ``make clean-all``
  deletes the object files in the src folder.  For CMake based builds, use ``make clean`` or simply
  delete the entire build folder.
- The folders containing the documentation tree (doc) and the examples (examples) are not needed to build or
  run LAMMPS and can be safely deleted.  Some files in the potentials folder are large and may be deleted
  if they are not needed.  The largest of those files (occupying about 120 MBytes combined) are only downloaded
  on demand, when the corresponding package is installed.
- When using the CMake build procedure, the compilation can be done on a (local) scratch file system that does
  not count toward the quota.  A local scratch file system may also speed up compiling object files and linking
  with libraries compared to a networked file system.  In addition, with CMake (and unlike with the traditional
  make procedure) it is possible to compile LAMMPS executables with different settings and packages from the
  same source tree, since all configuration information is stored in the build folder, so it is not necessary
  to keep multiple copies of the LAMMPS sources.

View File

@ -29,7 +29,7 @@ The following folks deserve special recognition. Many of the packages
they have written are unique for an MD code and LAMMPS would not be as
general-purpose as it is without their expertise and efforts.
-* Metin Aktulga (MSU), REAXFF package for C version of ReaxFF
+* Metin Aktulga (MSU), REAXFF package for C/C++ version of ReaxFF
* Mike Brown (Intel), GPU and INTEL packages
* Colin Denniston (U Western Ontario), LATBOLTZ package
* Georg Ganzenmuller (EMI), MACHDYN and SPH packages

@ -37,9 +37,10 @@ general-purpose as it is without their expertise and efforts.
* Reese Jones (Sandia) and colleagues, ATC package for atom/continuum coupling
* Christoph Kloss (DCS Computing), LIGGGHTS code for granular materials, built on top of LAMMPS
* Rudra Mukherjee (JPL), POEMS package for articulated rigid body motion
-* Trung Ngyuen (Northwestern U), GPU and RIGID and BODY packages
+* Trung Ngyuen (Northwestern U), GPU, RIGID, BODY, and DIELECTRIC packages
* Mike Parks (Sandia), PERI package for Peridynamics
* Roy Pollock (LLNL), Ewald and PPPM solvers
+* Julien Tranchida (Sandia), SPIN package
* Christian Trott (Sandia), CUDA and KOKKOS packages
* Ilya Valuev (JIHT), AWPMD package for wave packet MD
* Greg Wagner (Northwestern U), MEAM package for MEAM potential

View File

@ -27,19 +27,19 @@ General features
* distributed memory message-passing parallelism (MPI)
* shared memory multi-threading parallelism (OpenMP)
* spatial decomposition of simulation domain for MPI parallelism
-* particle decomposition inside of spatial decomposition for OpenMP parallelism
+* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
* GPLv2 licensed open-source distribution
* highly portable C++-11
* modular code with most functionality in optional packages
-* only depends on MPI library for basic parallel functionality
+* only depends on MPI library for basic parallel functionality, MPI stub for serial compilation
* other libraries are optional and only required for specific packages
-* GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
+* GPU (CUDA, OpenCL, HIP, SYCL), Intel Xeon Phi, and OpenMP support for many code features
* easy to extend with new features and functionality
* runs from an input script
* syntax for defining and using variables and formulas
* syntax for looping over runs and breaking out of loops
* run one or multiple simulations simultaneously (in parallel) from one script
-* build as library, invoke LAMMPS through library interface or provided Python wrapper
+* build as library, invoke LAMMPS through library interface or provided Python wrapper or SWIG based wrappers
* couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both

.. _particle:

@ -57,9 +57,11 @@ Particle and model types
* granular materials
* coarse-grained mesoscale models
* finite-size spherical and ellipsoidal particles
* finite-size line segment (2d) and triangle (3d) particles
+* finite-size rounded polygons (2d) and polyhedra (3d) particles
* point dipole particles
-* rigid collections of particles
+* particles with magnetic spin
+* rigid collections of n particles
* hybrid combinations of these

.. _ff:

@ -74,24 +76,28 @@ commands)
* pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
* charged pairwise potentials: Coulombic, point-dipole
-* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
-* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
+* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, Streitz-Mintmire, 3-body polymorphic, BOP, Vashishta
+* machine learning potentials: SNAP, GAP, ACE, N2P2, RANN, AGNI
+* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald), MSM
* polarization models: :doc:`QEq <fix_qeq>`, :doc:`core/shell model <Howto_coreshell>`, :doc:`Drude dipole model <Howto_drude>`
* charge equilibration (QEq via dynamic, point, shielded, Slater methods)
* coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
-* mesoscopic potentials: granular, Peridynamics, SPH
+* mesoscopic potentials: granular, Peridynamics, SPH, mesoscopic tubular potential (MESONT)
+* semi-empirical potentials: multi-ion generalized pseudopotential theory (MGPT), second moment tight binding + QEq (SMTB-Q), density functional tight-binding (LATTE)
* electron force field (eFF, AWPMD)
-* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable)
+* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable), tabulated
-* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS)
+* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS), tabulated
-* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS
+* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS, tabulated
-* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
+* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS), tabulated
* polymer potentials: all-atom, united-atom, bead-spring, breakable
-* water potentials: TIP3P, TIP4P, SPC
+* water potentials: TIP3P, TIP4P, SPC, SPC/E and variants
+* interlayer potentials for graphene and analogues
+* metal-organic framework potentials (QuickFF, MO-FF)
* implicit solvent potentials: hydrodynamic lubrication, Debye
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
* access to the `OpenKIM Repository <http://openkim.org>`_ of potentials via :doc:`kim command <kim_commands>`
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
-* overlaid potentials: superposition of multiple pair potentials
+* overlaid potentials: superposition of multiple pair potentials (including many-body) with optional scale factor

.. _create:

@ -124,9 +130,10 @@ Ensembles, constraints, and boundary conditions
* harmonic (umbrella) constraint forces
* rigid body constraints
* SHAKE bond and angle constraints
-* Monte Carlo bond breaking, formation, swapping
+* motion constraints to manifold surfaces
+* Monte Carlo bond breaking, formation, swapping, template based reaction modeling
* atom/molecule insertion and deletion
-* walls of various kinds
+* walls of various kinds, static and moving
* non-equilibrium molecular dynamics (NEMD)
* variety of additional boundary conditions and constraints

@ -150,6 +157,7 @@ Diagnostics
^^^^^^^^^^^
* see various flavors of the :doc:`fix <fix>` and :doc:`compute <compute>` commands
+* introspection command for system, simulation, and compile time settings and configurations

.. _output:

@ -164,8 +172,9 @@ Output
* parallel I/O of dump and restart files
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
* user-defined system-wide (log file) or per-atom (dump file) calculations
-* spatial and time averaging of per-atom quantities
-* time averaging of system-wide quantities
+* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
+* spatial, time, and per-chunk averaging of per-atom quantities
+* time averaging and histogramming of system-wide quantities
* atom snapshots in native, XYZ, XTC, DCD, CFG formats

.. _replica1:

@ -178,7 +187,7 @@ Multi-replica models
* :doc:`parallel replica dynamics <prd>`
* :doc:`temperature accelerated dynamics <tad>`
* :doc:`parallel tempering <temper>`
-* :doc:`path-integral MD <fix_pimd>`
+* path-integral MD: `first variant <fix_pimd>`, `second variant <fix_ipi>`
* multi-walker collective variables with :doc:`Colvars <fix_colvars>` and :doc:`Plumed <fix_plumed>`

.. _prepost:

@ -210,11 +219,12 @@ page for details.
These are LAMMPS capabilities which you may not think of as typical
classical MD options:
-* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`
+* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`, optional with recursive bisectioning decomposition
* :doc:`generalized aspherical particles <Howto_body>`
* :doc:`stochastic rotation dynamics (SRD) <fix_srd>`
-* :doc:`real-time visualization and interactive MD <fix_imd>`
+* :doc:`real-time visualization and interactive MD <fix_imd>`, :doc:`built-in renderer for images and movies <dump_image>`
* calculate :doc:`virtual diffraction patterns <compute_xrd>`
+* calculate :doc:`finite temperature phonon dispersion <fix_phonon>` and the :doc:`dynamical matrix of minimized structures <dynamical_matrix>`
* :doc:`atom-to-continuum coupling <fix_atc>` with finite elements
* coupled rigid body integration via the :doc:`POEMS <fix_poems>` library
* :doc:`QM/MM coupling <fix_qmmm>`
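One of the bullets above mentions building LAMMPS as a library and driving it through the library interface. As an illustrative aside (not part of this commit), here is a minimal sketch of that usage based on functions declared in src/library.h; exact signatures can vary slightly between LAMMPS versions, so treat the details as assumptions rather than a reference:

    // Minimal sketch of the LAMMPS C library interface (illustrative only).
    // Compile and link against the LAMMPS library; signatures may differ
    // slightly between LAMMPS versions.
    #include "library.h"
    #include <cstdio>

    int main()
    {
      void *handle = nullptr;
      lammps_open_no_mpi(0, nullptr, &handle);   // create a LAMMPS instance without MPI
      if (!handle) return 1;

      lammps_command(handle, "units lj");        // pass input script commands one at a time
      lammps_command(handle, "region box block 0 5 0 5 0 5");
      lammps_command(handle, "create_box 1 box");

      printf("natoms = %g\n", lammps_get_natoms(handle));
      lammps_close(handle);                      // destroy the instance and free its memory
      return 0;
    }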

View File

@ -1,40 +1,61 @@
LAMMPS open-source license
--------------------------

-LAMMPS is a freely-available open-source code, distributed under the
-terms of the `GNU Public License Version 2 <gpl_>`_, which means you can
-use or modify the code however you wish for your own purposes, but have
-to adhere to certain rules when redistributing it or software derived
+GPL version of LAMMPS
+^^^^^^^^^^^^^^^^^^^^^
+
+LAMMPS is an open-source code, available free-of-charge, and distributed
+under the terms of the `GNU Public License Version 2 <gpl_>`_ (GPLv2),
+which means you can use or modify the code however you wish for your own
+purposes, but have to adhere to certain rules when redistributing it -
+specifically in binary form - or are distributing software derived
from it or that includes parts of it.

-LAMMPS comes with no warranty of any kind. As each source file states
-in its header, it is a copyrighted code that is distributed free-of-
-charge, under the terms of the `GNU Public License Version 2 <gpl_>`_
-(GPLv2). This is often referred to as open-source distribution - see
-`www.gnu.org <gnuorg_>`_ or `www.opensource.org <opensource_>`_. The
-legal text of the GPL is in the LICENSE file included in the LAMMPS
-distribution.
+LAMMPS comes with no warranty of any kind.
+
+As each source file states in its header, it is a copyrighted code, and
+thus not in the public domain. For more information about open-source
+software and open-source distribution, see `www.gnu.org <gnuorg_>`_
+or `www.opensource.org <opensource_>`_. The legal text of the GPL as it
+applies to LAMMPS is in the LICENSE file included in the LAMMPS distribution.

.. _gpl: https://github.com/lammps/lammps/blob/master/LICENSE
+.. _lgpl: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
.. _gnuorg: http://www.gnu.org
.. _opensource: http://www.opensource.org

-Here is a summary of what the GPL means for LAMMPS users:
+Here is a more specific summary of what the GPL means for LAMMPS users:

-(1) Anyone is free to use, modify, or extend LAMMPS in any way they
+(1) Anyone is free to use, copy, modify, or extend LAMMPS in any way they
choose, including for commercial purposes.

(2) If you **distribute** a modified version of LAMMPS, it must remain
-open-source, meaning you distribute **all** of it under the terms of
-the GPL. You should clearly annotate such a code as a derivative version
-of LAMMPS.
+open-source, meaning you are required to distribute **all** of it under
+the terms of the GPL. You should clearly annotate such a modified code
+as a derivative version of LAMMPS.

(3) If you release any code that includes or uses LAMMPS source code,
then it must also be open-sourced, meaning you distribute it under
-the terms of the GPL.
+the terms of the GPL. You may write code that interfaces LAMMPS to
+a differently licensed library. In that case the code that provides
+the interface must be licensed GPL, but not necessarily that library
+unless you are distributing binaries that require the library to run.

(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
source file headers (including the copyright and GPL notices) should
remain part of the code.

+LGPL version of LAMMPS
+^^^^^^^^^^^^^^^^^^^^^^
+
+We occasionally make stable LAMMPS releases available under the `GNU
+Lesser Public License v2.1 <lgpl_>`_. This is on request only and with
+non-LGPL compliant files removed. This allows uses linking non-GPL
+compatible software with the (otherwise unmodified) LAMMPS library
+or loading it dynamically at runtime. Any **modifications** to
+the LAMMPS code however, even with the LGPL licensed version, must still
+be made available under the same open source terms as LAMMPS itself.

View File

@ -10,24 +10,26 @@ conditions. It can model 2d or 3d systems with only a few particles
up to millions or billions.

LAMMPS can be built and run on a laptop or desktop machine, but is
-designed for parallel computers. It will run on any parallel machine
-that supports the `MPI <mpi_>`_ message-passing library. This includes
-shared-memory boxes and distributed-memory clusters and
-supercomputers.
+designed for parallel computers. It will run in serial and on any
+parallel machine that supports the `MPI <mpi_>`_ message-passing
+library. This includes shared-memory boxes and distributed-memory
+clusters and supercomputers. Parts of LAMMPS also support
+`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.

.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
.. _lws: https://www.lammps.org
+.. _omp: https://www.openmp.org

LAMMPS is written in C++ and requires a compiler that is at least
-compatible with the C++-11 standard.
-Earlier versions were written in F77 and F90. See the `History page
+compatible with the C++-11 standard. Earlier versions were written in
+F77, F90, and C++-98. See the `History page
<https://www.lammps.org/history.html>`_ of the website for details. All
-versions can be downloaded from the `LAMMPS website <lws_>`_.
+versions can be downloaded as source code from the `LAMMPS website
+<lws_>`_.

-LAMMPS is designed to be easy to modify or extend with new
-capabilities, such as new force fields, atom types, boundary
-conditions, or diagnostics. See the :doc:`Modify <Modify>` page for
-more details.
+LAMMPS is designed to be easy to modify or extend with new capabilities,
+such as new force fields, atom types, boundary conditions, or
+diagnostics. See the :doc:`Modify <Modify>` page for more details.

In the most general sense, LAMMPS integrates Newton's equations of
motion for a collection of interacting particles. A single particle

@ -47,4 +49,5 @@ MPI parallelization to partition the simulation domain into small
sub-domains of equal computational cost, one of which is assigned to
each processor. Processors communicate and store "ghost" atom
information for atoms that border their sub-domain. Multi-threading
-parallelization with with particle-decomposition can be used in addition.
+parallelization and GPU acceleration with with particle-decomposition
+can be used in addition.
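The paragraphs above describe how MPI parallelization partitions the simulation domain into sub-domains, one per processor. As a toy illustration only (not LAMMPS code, and much simpler than the real decomposition into sub-domains of equal computational cost), the following sketch splits a one-dimensional box of length L into equal slabs, one per MPI rank:

    // Toy sketch: equal 1d sub-domains per MPI rank (illustrative only, not LAMMPS code).
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);
      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      const double L = 100.0;             // box length in arbitrary units (made up)
      const double dx = L / nprocs;       // equal-size slabs
      const double lo = rank * dx;
      const double hi = lo + dx;
      printf("rank %d owns sub-domain [%g, %g)\n", rank, lo, hi);

      MPI_Finalize();
      return 0;
    }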

View File

@ -2,12 +2,21 @@ What does a LAMMPS version mean
-------------------------------

The LAMMPS "version" is the date when it was released, such as 1 May
-2014. LAMMPS is updated continuously. Whenever we fix a bug or add a
-feature, we release it in the next *patch* release, which are
-typically made every couple of weeks. Info on patch releases are on
-`this website page <https://www.lammps.org/bug.html>`_. Every few
-months, the latest patch release is subjected to more thorough testing
-and labeled as a *stable* version.
+2014. LAMMPS is updated continuously and we aim to keep it working
+correctly and reliably at all times. You can follow its development
+in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
+
+Whenever we fix a bug or update or add a feature, it will be merged into
+the `master` branch of the git repository. When a sufficient number of
+changes have accumulated *and* the software passes a set of automated
+tests, we release it in the next *patch* release, which are made every
+few weeks. Info on patch releases are on `this website page
+<https://www.lammps.org/bug.html>`_.
+
+Once or twice a year, only bug fixes and small, non-intrusive changes are
+included for a period of time, and the code is subjected to more detailed
+and thorough testing than the default automated testing. The latest
+patch release after such a period is then labeled as a *stable* version.

Each version of LAMMPS contains all the features and bug-fixes up to
and including its version date.

View File

@ -19,7 +19,7 @@ Syntax
   bondmax = length of longest bond in the system (in length units)
   tlimit = elapsed CPU time (in seconds)
-  diskfree = free disk space (in megabytes)
+  diskfree = free disk space (in MBytes)
   v_name = name of :doc:`equal-style variable <variable>`
* operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "\|\^"

@ -81,7 +81,7 @@ the timer frequently across a large number of processors may be
non-negligible.

The *diskfree* attribute will check for available disk space (in
-megabytes) on supported operating systems. By default it will
+MBytes) on supported operating systems. By default it will
check the file system of the current working directory. This
can be changed with the optional *path* keyword, which will take
the path to a file or folder on the file system to be checked
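The *diskfree* check above reports free space in MBytes for a chosen path. As a hedged illustration of the underlying idea (this is not the LAMMPS implementation), free disk space for a path can be queried on POSIX systems roughly like this:

    // Illustrative only: query free disk space in MBytes for a path on POSIX systems.
    // Not the LAMMPS implementation of the diskfree attribute.
    #include <sys/statvfs.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      const char *path = (argc > 1) ? argv[1] : ".";   // default: current working directory
      struct statvfs fs;
      if (statvfs(path, &fs) != 0) {
        perror("statvfs");
        return 1;
      }
      // blocks available to unprivileged users times fragment size, converted to MBytes
      const double free_mb = (double) fs.f_bavail * (double) fs.f_frsize / (1024.0 * 1024.0);
      printf("free disk space at '%s': %.1f MBytes\n", path, free_mb);
      return 0;
    }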

View File

@ -128,7 +128,7 @@ spectrum while consumes more memory. With fixed *f_max* and
:math:`\gamma`, *N_f* should be big enough to converge the classical
temperature :math:`T^{cl}` as a function of target quantum bath
temperature. Memory usage per processor could be from 10 to 100
-Mbytes.
+MBytes.

.. note::

View File

@ -135,7 +135,7 @@ with #) anywhere. Each non-blank non-comment line must contain one
keyword/value pair. The required keywords are *rcutfac* and
*twojmax*\ . Optional keywords are *rfac0*, *rmin0*,
*switchflag*, *bzeroflag*, *quadraticflag*, *chemflag*,
-*bnormflag*, *wselfallflag*, and *chunksize*\ .
+*bnormflag*, *wselfallflag*, *chunksize*, and *parallelthresh*\ .

The default values for these keywords are

@ -147,7 +147,8 @@ The default values for these keywords are
* *chemflag* = 0
* *bnormflag* = 0
* *wselfallflag* = 0
-* *chunksize* = 4096
+* *chunksize* = 32768
+* *parallelthresh* = 8192

If *quadraticflag* is set to 1, then the SNAP energy expression includes
additional quadratic terms that have been shown to increase the overall

@ -188,14 +189,24 @@ corresponding *K*-vector of linear coefficients for element
which must equal the number of unique elements appearing in the LAMMPS
pair_coeff command, to avoid ambiguity in the number of coefficients.

-The keyword *chunksize* is only applicable when using the
-pair style *snap* with the KOKKOS package and is ignored otherwise.
-This keyword controls
+The keywords *chunksize* and *parallelthresh* are only applicable when
+using the pair style *snap* with the KOKKOS package on GPUs and are
+ignored otherwise.
+The *chunksize* keyword controls
the number of atoms in each pass used to compute the bispectrum
components and is used to avoid running out of memory. For example
if there are 8192 atoms in the simulation and the *chunksize*
is set to 4096, the bispectrum calculation will be broken up
-into two passes.
+into two passes (running on a single GPU).
+The *parallelthresh* keyword controls
+a crossover threshold for performing extra parallelism. For
+small systems, exposing additional parallelism can be beneficial when
+there is not enough work to fully saturate the GPU threads otherwise.
+However, the extra parallelism also leads to more divergence
+and can hurt performance when the system is already large enough to
+saturate the GPU threads. Extra parallelism will be performed if the
+*chunksize* (or total number of atoms per GPU) is smaller than
+*parallelthresh*.

Detailed definitions for all the other keywords
are given on the :doc:`compute sna/atom <compute_sna_atom>` doc page.
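The crossover described above is implemented later in this commit in pair_snap_kokkos_impl.h as a simple comparison between the chunk size and *parallelthresh*. Condensed into a stand-alone sketch (names are illustrative; the real code dispatches different Kokkos kernels in each branch):

    // Condensed sketch of the chunksize/parallelthresh crossover (illustrative only).
    #include <cstdio>

    static const char *select_kernel(int chunk_size, int parallel_thresh)
    {
      // small chunks: expose extra parallelism over j_bend to keep the GPU busy
      // large chunks: enough atom/neighbor work already, so avoid the extra divergence
      return (chunk_size < parallel_thresh) ? "ComputeUiSmall" : "ComputeUiLarge";
    }

    int main()
    {
      printf("%s\n", select_kernel(4096, 8192));    // prints ComputeUiSmall
      printf("%s\n", select_kernel(32768, 8192));   // prints ComputeUiLarge
      return 0;
    }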

View File

@ -1174,6 +1174,7 @@ googletest
Gordan
Goudeau
GPa
+GPL
gpu
gpuID
gpus
@ -1689,6 +1690,7 @@ Lett
Leuven
Leven
Lewy
+LGPL
lgvdw
Liang
libatc
@ -1889,7 +1891,6 @@ maxX
Mayergoyz
Mayoral
mbt
-Mbytes
MBytes
mc
McLachlan

View File

@ -44,7 +44,8 @@ struct TagPairSNAPComputeForce{};
struct TagPairSNAPComputeNeigh{};
struct TagPairSNAPComputeCayleyKlein{};
struct TagPairSNAPPreUi{};
-struct TagPairSNAPComputeUi{};
+struct TagPairSNAPComputeUiSmall{}; // more parallelism, more divergence
+struct TagPairSNAPComputeUiLarge{}; // less parallelism, no divergence
struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
struct TagPairSNAPComputeZi{};
struct TagPairSNAPBeta{};
@ -53,7 +54,9 @@ struct TagPairSNAPTransformBi{}; // re-order blist from AoSoA to AoS
struct TagPairSNAPComputeYi{};
struct TagPairSNAPComputeYiWithZlist{};
template<int dir>
-struct TagPairSNAPComputeFusedDeidrj{};
+struct TagPairSNAPComputeFusedDeidrjSmall{}; // more parallelism, more divergence
+template<int dir>
+struct TagPairSNAPComputeFusedDeidrjLarge{}; // less parallelism, no divergence
// CPU backend only
struct TagPairSNAPComputeNeighCPU{};
@ -143,7 +146,10 @@ public:
void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
KOKKOS_INLINE_FUNCTION
-void operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUi>::member_type& team) const;
+void operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiSmall>::member_type& team) const;
+KOKKOS_INLINE_FUNCTION
+void operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiLarge>::member_type& team) const;
KOKKOS_INLINE_FUNCTION
void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
@ -168,7 +174,11 @@ public:
template<int dir>
KOKKOS_INLINE_FUNCTION
-void operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const;
+void operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const;
+template<int dir>
+KOKKOS_INLINE_FUNCTION
+void operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const;
// CPU backend only
KOKKOS_INLINE_FUNCTION

View File

@ -341,18 +341,32 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
  // ComputeUi w/vector parallelism, shared memory, direct atomicAdd into ulisttot
  {
    // team_size_compute_ui is defined in `pair_snap_kokkos.h`
    // scratch size: 32 atoms * (twojmax+1) cached values, no double buffer
    const int tile_size = vector_length * (twojmax + 1);
    const int scratch_size = scratch_size_helper<complex>(team_size_compute_ui * tile_size);
-   // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
-   const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
-   const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUi> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
-   policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeUi",policy_ui,*this);
+   if (chunk_size < parallel_thresh)
+   {
+     // Version with parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
+     const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
+     const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiSmall> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
+     policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeUiSmall",policy_ui,*this);
+   } else {
+     // Version w/out parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs)
+     const int n_teams = chunk_size_div * max_neighs;
+     const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiLarge> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
+     policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeUiLarge",policy_ui,*this);
+   }
  }
//TransformUi: un-"fold" ulisttot, zero ylist //TransformUi: un-"fold" ulisttot, zero ylist
@ -412,25 +426,51 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
    const int tile_size = vector_length * (twojmax + 1);
    const int scratch_size = scratch_size_helper<complex>(2 * team_size_compute_fused_deidrj * tile_size);
-   // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
-   const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
-   const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
-   // x direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<0>",policy_fused_deidrj_x,*this);
-   // y direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<1>",policy_fused_deidrj_y,*this);
-   // z direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<2>",policy_fused_deidrj_z,*this);
+   if (chunk_size < parallel_thresh)
+   {
+     // Version with parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
+     const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
+     const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
+     // x direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<0>",policy_fused_deidrj_x,*this);
+     // y direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<1>",policy_fused_deidrj_y,*this);
+     // z direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<2>",policy_fused_deidrj_z,*this);
+   } else {
+     // Version w/out parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs)
+     const int n_teams = chunk_size_div * max_neighs;
+     const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
+     // x direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<0>",policy_fused_deidrj_x,*this);
+     // y direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<1>",policy_fused_deidrj_y,*this);
+     // z direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<2>",policy_fused_deidrj_z,*this);
+   }
  }
#endif // LMP_KOKKOS_GPU #endif // LMP_KOKKOS_GPU
@ -603,13 +643,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb, idx_chem, ii);
+     real_type bveci = my_sna.blist(ii, idx_chem, idxb);
      d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bveci;
      k++;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        const auto jdxb = jcoeff % idxb_max;
        const auto jdx_chem = jcoeff / idxb_max;
-       real_type bvecj = my_sna.blist(jdxb, jdx_chem, ii);
+       real_type bvecj = my_sna.blist(ii, jdx_chem, jdxb);
        d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bvecj;
        d_beta_pack(iatom_mod,jcoeff,iatom_div) += d_coeffi[k]*bveci;
        k++;
@ -736,7 +776,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
template<class DeviceType, typename real_type, int vector_length>
KOKKOS_INLINE_FUNCTION
-void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUi>::member_type& team) const {
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiSmall>::member_type& team) const {
  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
  // extract flattened atom_div / neighbor number / bend location
@ -756,11 +796,37 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    const int ninside = d_ninside(ii);
    if (jj >= ninside) return;
-   my_sna.compute_ui(team,iatom_mod, jbend, jj, iatom_div);
+   my_sna.compute_ui_small(team, iatom_mod, jbend, jj, iatom_div);
  });
}

+template<class DeviceType, typename real_type, int vector_length>
+KOKKOS_INLINE_FUNCTION
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiLarge>::member_type& team) const {
+  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
+  // extract flattened atom_div / neighbor number / bend location
+  int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_ui;
+  // extract neighbor index, iatom_div
+  int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
+  int jj = flattened_idx - iatom_div * max_neighs;
+  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
+    [&] (const int iatom_mod) {
+    const int ii = iatom_mod + vector_length * iatom_div;
+    if (ii >= chunk_size) return;
+    const int ninside = d_ninside(ii);
+    if (jj >= ninside) return;
+    my_sna.compute_ui_large(team,iatom_mod, jj, iatom_div);
+  });
+}

template<class DeviceType, typename real_type, int vector_length>
KOKKOS_INLINE_FUNCTION
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPTransformUi,const int iatom_mod, const int idxu, const int iatom_div) const {
@ -861,9 +927,9 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int itriple = 0; itriple < ntriples; itriple++) {
-     const auto blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
-     my_sna.blist(idxb, itriple, iatom) = blocal;
+     const real_type blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
+     my_sna.blist(iatom, itriple, idxb) = blocal;
    }
  }
@ -871,7 +937,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
template<class DeviceType, typename real_type, int vector_length>
template<int dir>
KOKKOS_INLINE_FUNCTION
-void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const {
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const {
  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
  // extract flattened atom_div / neighbor number / bend location
@ -891,12 +957,38 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    const int ninside = d_ninside(ii);
    if (jj >= ninside) return;
-   my_sna.template compute_fused_deidrj<dir>(team, iatom_mod, jbend, jj, iatom_div);
+   my_sna.template compute_fused_deidrj_small<dir>(team, iatom_mod, jbend, jj, iatom_div);
  });
}

+template<class DeviceType, typename real_type, int vector_length>
+template<int dir>
+KOKKOS_INLINE_FUNCTION
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const {
+  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
+  // extract flattened atom_div / neighbor number / bend location
+  int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_fused_deidrj;
+  // extract neighbor index, iatom_div
+  int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
+  int jj = flattened_idx - max_neighs * iatom_div;
+  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
+    [&] (const int iatom_mod) {
+    const int ii = iatom_mod + vector_length * iatom_div;
+    if (ii >= chunk_size) return;
+    const int ninside = d_ninside(ii);
+    if (jj >= ninside) return;
+    my_sna.template compute_fused_deidrj_large<dir>(team, iatom_mod, jj, iatom_div);
+  });
+}

/* ----------------------------------------------------------------------
   Begin routines that are unique to the CPU codepath. These do not take
   advantage of AoSoA data layouts, but that could be a good point of
@ -925,13 +1017,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb,idx_chem,ii);
+     real_type bveci = my_sna.blist(ii,idx_chem,idxb);
      d_beta(icoeff,ii) += d_coeffi[k]*bveci;
      k++;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        const auto jdxb = jcoeff % idxb_max;
        const auto jdx_chem = jcoeff / idxb_max;
-       auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
+       real_type bvecj = my_sna.blist(ii,jdx_chem,jdxb);
        d_beta(icoeff,ii) += d_coeffi[k]*bvecj;
        d_beta(jcoeff,ii) += d_coeffi[k]*bveci;
        k++;
@ -1221,7 +1313,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     evdwl += d_coeffi[icoeff+1]*my_sna.blist(idxb,idx_chem,ii);
+     evdwl += d_coeffi[icoeff+1]*my_sna.blist(ii,idx_chem,idxb);
    }

    // quadratic contributions
@ -1230,12 +1322,12 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb,idx_chem,ii);
+     real_type bveci = my_sna.blist(ii,idx_chem,idxb);
      evdwl += 0.5*d_coeffi[k++]*bveci*bveci;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        auto jdxb = jcoeff % idxb_max;
        auto jdx_chem = jcoeff / idxb_max;
-       auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
+       auto bvecj = my_sna.blist(ii,jdx_chem,jdxb);
        evdwl += d_coeffi[k++]*bveci*bvecj;
      }
    }
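Several hunks above change the blist index order from (idxb, idx_chem, ii) to (ii, idx_chem, idxb), i.e. the atom index becomes the leading dimension, with the matching type change in sna_kokkos.h below. For readers less familiar with multi-dimensional Kokkos::View indexing, here is a small self-contained toy sketch of a 3d view indexed in that order (extents and values are made up; this is not LAMMPS code):

    // Toy sketch: a 3d Kokkos::View indexed as (atom, triple, bispectrum index).
    // Illustrative only; not the LAMMPS data structures.
    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      Kokkos::initialize(argc, argv);
      {
        const int natoms = 8, ntriples = 2, idxb_max = 5;   // made-up extents
        Kokkos::View<double ***> blist("blist", natoms, ntriples, idxb_max);

        // fill the view with the atom index as the leftmost dimension
        Kokkos::parallel_for("fill", natoms, KOKKOS_LAMBDA(const int ii) {
          for (int itriple = 0; itriple < ntriples; itriple++)
            for (int idxb = 0; idxb < idxb_max; idxb++)
              blist(ii, itriple, idxb) = ii + 0.1 * itriple + 0.01 * idxb;
        });

        auto h_blist = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), blist);
        printf("blist(3,1,2) = %g\n", h_blist(3, 1, 2));
      }
      Kokkos::finalize();
      return 0;
    }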

View File

@ -45,12 +45,12 @@ struct WignerWrapper {
  { ; }

  KOKKOS_INLINE_FUNCTION
-  complex get(const int& ma) {
+  complex get(const int& ma) const {
    return complex(buffer[offset + 2 * vector_length * ma], buffer[offset + vector_length + 2 * vector_length * ma]);
  }

  KOKKOS_INLINE_FUNCTION
-  void set(const int& ma, const complex& store) {
+  void set(const int& ma, const complex& store) const {
    buffer[offset + 2 * vector_length * ma] = store.re;
    buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
  }
@ -122,8 +122,14 @@ inline
  void compute_cayley_klein(const int&, const int&, const int&);
  KOKKOS_INLINE_FUNCTION
  void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
+  // version of the code with parallelism over j_bend
  KOKKOS_INLINE_FUNCTION
-  void compute_ui(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
+  void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
+  // version of the code without parallelism over j_bend
+  KOKKOS_INLINE_FUNCTION
+  void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); // ForceSNAP
  KOKKOS_INLINE_FUNCTION
  void compute_zi(const int&, const int&, const int&); // ForceSNAP
  KOKKOS_INLINE_FUNCTION
@ -135,6 +141,35 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_bi(const int&, const int&, const int&); // ForceSNAP

+  // functions for derivatives, GPU only
+  // version of the code with parallelism over j_bend
+  template<int dir>
+  KOKKOS_INLINE_FUNCTION
+  void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
+  // version of the code without parallelism over j_bend
+  template<int dir>
+  KOKKOS_INLINE_FUNCTION
+  void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); //ForceSNAP
+
+  // core "evaluation" functions that get plugged into "compute" functions
+  // plugged into compute_ui_small, compute_ui_large
+  KOKKOS_FORCEINLINE_FUNCTION
+  void evaluate_ui_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&, const int&,
+                         const int&, const int&, const int&);
+  // plugged into compute_zi, compute_yi
+  KOKKOS_FORCEINLINE_FUNCTION
+  complex evaluate_zi(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
+                      const int&, const int&, const int&, const int&, const real_type*);
+  // plugged into compute_yi, compute_yi_with_zlist
+  KOKKOS_FORCEINLINE_FUNCTION
+  real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
+                                 const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &);
+  // plugged into compute_fused_deidrj_small, compute_fused_deidrj_large
+  KOKKOS_FORCEINLINE_FUNCTION
+  real_type evaluate_duidrj_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
+                                  const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
+                                  const int&, const int&, const int&, const int&);

  // functions for bispectrum coefficients, CPU only
  KOKKOS_INLINE_FUNCTION
  void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
@ -148,11 +183,6 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int); // ForceSNAP

-  // functions for derivatives, GPU only
-  template<int dir>
-  KOKKOS_INLINE_FUNCTION
-  void compute_fused_deidrj(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP

  // functions for derivatives, CPU only
  KOKKOS_INLINE_FUNCTION
  void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
@ -168,23 +198,6 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_s_dsfac(const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein

-  static KOKKOS_FORCEINLINE_FUNCTION
-  void sincos_wrapper(double x, double* sin_, double *cos_) {
-  #ifdef __SYCL_DEVICE_ONLY__
-    *sin_ = sycl::sincos(x, cos_);
-  #else
-    sincos(x, sin_, cos_);
-  #endif
-  }
-  static KOKKOS_FORCEINLINE_FUNCTION
-  void sincos_wrapper(float x, float* sin_, float *cos_) {
-  #ifdef __SYCL_DEVICE_ONLY__
-    *sin_ = sycl::sincos(x, cos_);
-  #else
-    sincosf(x, sin_, cos_);
-  #endif
-  }

#ifdef TIMING_INFO
  double* timers;
  timespec starttime, endtime;
@ -207,7 +220,7 @@ inline
  int twojmax, diagonalstyle;
-  t_sna_3d_ll blist;
+  t_sna_3d blist;
  t_sna_3c_ll ulisttot;
  t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
  t_sna_3c_ll zlist;

File diff suppressed because it is too large.

View File

@ -628,7 +628,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
  chemflag = 0;
  bnormflag = 0;
  wselfallflag = 0;
-  chunksize = 4096;
+  chunksize = 32768;
+  parallel_thresh = 8192;

  // open SNAP parameter file on proc 0
@ -696,6 +697,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
      wselfallflag = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
    else if (keywd == "chunksize")
      chunksize = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
+    else if (keywd == "parallelthresh")
+      parallel_thresh = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
    else
      error->all(FLERR,"Unknown parameter '{}' in SNAP "
                 "parameter file", keywd);

View File

@ -59,7 +59,7 @@ class PairSNAP : public Pair {
  double **scale;         // for thermodynamic integration
  int twojmax, switchflag, bzeroflag, bnormflag;
  int chemflag, wselfallflag;
-  int chunksize;
+  int chunksize,parallel_thresh;
  double rfac0, rmin0, wj1, wj2;
  int rcutfacflag, twojmaxflag; // flags for required parameters
  int beta_max;                 // length of beta

View File

@ -20,7 +20,7 @@ charges (dsf and long-range treatment of charges)
out-of-plane angle

See the file doc/drude_tutorial.html for getting started.

-See the doc pages for "pair_style buck6d/coul/gauss", "anlge_style class2",
+See the doc pages for "pair_style buck6d/coul/gauss", "angle_style class2",
"angle_style cosine/buck6d", and "improper_style inversion/harmonic"
commands to get started. Also see the above mentioned website and
literature for further documentation about the force field.

View File

@ -34,6 +34,7 @@ exclude:
  - lib/hdnnp
  - lib/kim
  - lib/kokkos
+  - lib/latte
  - lib/machdyn
  - lib/mdi
  - lib/mscg
@ -41,6 +42,7 @@ exclude:
  - lib/plumed
  - lib/quip
  - lib/scafacos
+  - lib/voronoi
  - src/Make.sh
patterns:
  - "*.c"