Merge branch 'master' into collected-small-changes
This commit is contained in:
@ -22,4 +22,5 @@ page.
|
||||
Build_extras
|
||||
Build_manual
|
||||
Build_windows
|
||||
Build_diskspace
|
||||
Build_development
|
||||
|
||||
45
doc/src/Build_diskspace.rst
Normal file
45
doc/src/Build_diskspace.rst
Normal file
@ -0,0 +1,45 @@
|
||||
Notes for saving disk space when building LAMMPS from source
|
||||
------------------------------------------------------------
|
||||
|
||||
LAMMPS is a large software project with a large number of source files,
|
||||
extensive documentation, and a large collection of example files.
|
||||
When downloading LAMMPS by cloning the
|
||||
`git repository from GitHub <https://github.com/lammps/lammps>`_ this
|
||||
will by default also download the entire commit history since September 2006.
|
||||
Compiling LAMMPS will add the storage requirements of the compiled object
|
||||
files and libraries to the tally.
|
||||
|
||||
In a user account on an HPC cluster with filesystem quotas or in other
|
||||
environments with restricted disk space capacity it may be needed to
|
||||
reduce the storage requirements. Here are some suggestions:
|
||||
|
||||
- Create a so-called shallow repository by cloning only the last commit
|
||||
instead of the full project history by using ``git clone git@github.com:lammps/lammps --depth=1 --branch=master``.
|
||||
This reduces the downloaded size to about half. With ``--depth=1`` it is not possible to check out different
|
||||
versions/branches of LAMMPS, using ``--depth=1000`` will make multiple recent versions available at little
|
||||
extra storage needs (the entire git history had nearly 30,000 commits in fall 2021).
|
||||
|
||||
- Download a tar archive from either the `download section on the LAMMPS homepage <https://www.lammps.org/download.html>`_
|
||||
or from the `LAMMPS releases page on GitHub <https://github.com/lammps/lammps/releases>`_ these will not
|
||||
contain the git history at all.
|
||||
|
||||
- Build LAMMPS without the debug flag (remove ``-g`` from the machine makefile or use ``-DCMAKE_BUILD_TYPE=Release``)
|
||||
or use the ``strip`` command on the LAMMPS executable when no more debugging would be needed. The strip command
|
||||
may also be applied to the LAMMPS shared library. The static library may be deleted entirely.
|
||||
|
||||
- Delete compiled object files and libraries after copying the LAMMPS executable to a permanent location.
|
||||
When using the traditional build process, one may use ``make clean-<machine>`` or ``make clean-all``
|
||||
to delete object files in the src folder. For CMake based builds, one may use ``make clean`` or just
|
||||
delete the entire build folder.
|
||||
|
||||
- The folders containing the documentation tree (doc), the examples (examples) are not needed to build and
|
||||
run LAMMPS and can be safely deleted. Some files in the potentials folder are large and may be deleted,
|
||||
if not needed. The largest of those files (occupying about 120 MBytes combined) will only be downloaded on
|
||||
demand, when the corresponding package is installed.
|
||||
|
||||
- When using the CMake build procedure, the compilation can be done on a (local) scratch storage that will not
|
||||
count toward the quota. A local scratch file system may offer the additional benefit of speeding up creating
|
||||
object files and linking with libraries compared to a networked file system. Also with CMake (and unlike with
|
||||
the traditional make) it is possible to compile LAMMPS executables with different settings and packages included
|
||||
from the same source tree since all the configuration information is stored in the build folder. So it is
|
||||
not necessary to have multiple copies of LAMMPS.
|
||||
@ -29,7 +29,7 @@ The following folks deserve special recognition. Many of the packages
|
||||
they have written are unique for an MD code and LAMMPS would not be as
|
||||
general-purpose as it is without their expertise and efforts.
|
||||
|
||||
* Metin Aktulga (MSU), REAXFF package for C version of ReaxFF
|
||||
* Metin Aktulga (MSU), REAXFF package for C/C++ version of ReaxFF
|
||||
* Mike Brown (Intel), GPU and INTEL packages
|
||||
* Colin Denniston (U Western Ontario), LATBOLTZ package
|
||||
* Georg Ganzenmuller (EMI), MACHDYN and SPH packages
|
||||
@ -37,9 +37,10 @@ general-purpose as it is without their expertise and efforts.
|
||||
* Reese Jones (Sandia) and colleagues, ATC package for atom/continuum coupling
|
||||
* Christoph Kloss (DCS Computing), LIGGGHTS code for granular materials, built on top of LAMMPS
|
||||
* Rudra Mukherjee (JPL), POEMS package for articulated rigid body motion
|
||||
* Trung Ngyuen (Northwestern U), GPU and RIGID and BODY packages
|
||||
* Trung Ngyuen (Northwestern U), GPU, RIGID, BODY, and DIELECTRIC packages
|
||||
* Mike Parks (Sandia), PERI package for Peridynamics
|
||||
* Roy Pollock (LLNL), Ewald and PPPM solvers
|
||||
* Julien Tranchida (Sandia), SPIN package
|
||||
* Christian Trott (Sandia), CUDA and KOKKOS packages
|
||||
* Ilya Valuev (JIHT), AWPMD package for wave packet MD
|
||||
* Greg Wagner (Northwestern U), MEAM package for MEAM potential
|
||||
|
||||
@ -27,19 +27,19 @@ General features
|
||||
* distributed memory message-passing parallelism (MPI)
|
||||
* shared memory multi-threading parallelism (OpenMP)
|
||||
* spatial decomposition of simulation domain for MPI parallelism
|
||||
* particle decomposition inside of spatial decomposition for OpenMP parallelism
|
||||
* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
|
||||
* GPLv2 licensed open-source distribution
|
||||
* highly portable C++-11
|
||||
* modular code with most functionality in optional packages
|
||||
* only depends on MPI library for basic parallel functionality
|
||||
* only depends on MPI library for basic parallel functionality, MPI stub for serial compilation
|
||||
* other libraries are optional and only required for specific packages
|
||||
* GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
|
||||
* GPU (CUDA, OpenCL, HIP, SYCL), Intel Xeon Phi, and OpenMP support for many code features
|
||||
* easy to extend with new features and functionality
|
||||
* runs from an input script
|
||||
* syntax for defining and using variables and formulas
|
||||
* syntax for looping over runs and breaking out of loops
|
||||
* run one or multiple simulations simultaneously (in parallel) from one script
|
||||
* build as library, invoke LAMMPS through library interface or provided Python wrapper
|
||||
* build as library, invoke LAMMPS through library interface or provided Python wrapper or SWIG based wrappers
|
||||
* couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both
|
||||
|
||||
.. _particle:
|
||||
@ -57,9 +57,11 @@ Particle and model types
|
||||
* granular materials
|
||||
* coarse-grained mesoscale models
|
||||
* finite-size spherical and ellipsoidal particles
|
||||
* finite-size line segment (2d) and triangle (3d) particles
|
||||
* finite-size line segment (2d) and triangle (3d) particles
|
||||
* finite-size rounded polygons (2d) and polyhedra (3d) particles
|
||||
* point dipole particles
|
||||
* rigid collections of particles
|
||||
* particles with magnetic spin
|
||||
* rigid collections of n particles
|
||||
* hybrid combinations of these
|
||||
|
||||
.. _ff:
|
||||
@ -74,24 +76,28 @@ commands)
|
||||
|
||||
* pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
|
||||
* charged pairwise potentials: Coulombic, point-dipole
|
||||
* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
|
||||
* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
|
||||
* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, Streitz-Mintmire, 3-body polymorphic, BOP, Vashishta
|
||||
* machine learning potentials: SNAP, GAP, ACE, N2P2, RANN, AGNI
|
||||
* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald), MSM
|
||||
* polarization models: :doc:`QEq <fix_qeq>`, :doc:`core/shell model <Howto_coreshell>`, :doc:`Drude dipole model <Howto_drude>`
|
||||
* charge equilibration (QEq via dynamic, point, shielded, Slater methods)
|
||||
* coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
|
||||
* mesoscopic potentials: granular, Peridynamics, SPH
|
||||
* mesoscopic potentials: granular, Peridynamics, SPH, mesoscopic tubular potential (MESONT)
|
||||
* semi-empirical potentials: multi-ion generalized pseudopotential theory (MGPT), second moment tight binding + QEq (SMTB-Q), density functional tight-binding (LATTE)
|
||||
* electron force field (eFF, AWPMD)
|
||||
* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable)
|
||||
* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS)
|
||||
* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS
|
||||
* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
|
||||
* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable), tabulated
|
||||
* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS), tabulated
|
||||
* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS, tabulated
|
||||
* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS), tabulated
|
||||
* polymer potentials: all-atom, united-atom, bead-spring, breakable
|
||||
* water potentials: TIP3P, TIP4P, SPC
|
||||
* water potentials: TIP3P, TIP4P, SPC, SPC/E and variants
|
||||
* interlayer potentials for graphene and analogues
|
||||
* metal-organic framework potentials (QuickFF, MO-FF)
|
||||
* implicit solvent potentials: hydrodynamic lubrication, Debye
|
||||
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
|
||||
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
|
||||
* access to the `OpenKIM Repository <http://openkim.org>`_ of potentials via :doc:`kim command <kim_commands>`
|
||||
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
|
||||
* overlaid potentials: superposition of multiple pair potentials
|
||||
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
|
||||
* overlaid potentials: superposition of multiple pair potentials (including many-body) with optional scale factor
|
||||
|
||||
.. _create:
|
||||
|
||||
@ -124,9 +130,10 @@ Ensembles, constraints, and boundary conditions
|
||||
* harmonic (umbrella) constraint forces
|
||||
* rigid body constraints
|
||||
* SHAKE bond and angle constraints
|
||||
* Monte Carlo bond breaking, formation, swapping
|
||||
* motion constraints to manifold surfaces
|
||||
* Monte Carlo bond breaking, formation, swapping, template based reaction modeling
|
||||
* atom/molecule insertion and deletion
|
||||
* walls of various kinds
|
||||
* walls of various kinds, static and moving
|
||||
* non-equilibrium molecular dynamics (NEMD)
|
||||
* variety of additional boundary conditions and constraints
|
||||
|
||||
@ -150,6 +157,7 @@ Diagnostics
|
||||
^^^^^^^^^^^
|
||||
|
||||
* see various flavors of the :doc:`fix <fix>` and :doc:`compute <compute>` commands
|
||||
* introspection command for system, simulation, and compile time settings and configurations
|
||||
|
||||
.. _output:
|
||||
|
||||
@ -164,8 +172,9 @@ Output
|
||||
* parallel I/O of dump and restart files
|
||||
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
|
||||
* user-defined system-wide (log file) or per-atom (dump file) calculations
|
||||
* spatial and time averaging of per-atom quantities
|
||||
* time averaging of system-wide quantities
|
||||
* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
|
||||
* spatial, time, and per-chunk averaging of per-atom quantities
|
||||
* time averaging and histogramming of system-wide quantities
|
||||
* atom snapshots in native, XYZ, XTC, DCD, CFG formats
|
||||
|
||||
.. _replica1:
|
||||
@ -178,7 +187,7 @@ Multi-replica models
|
||||
* :doc:`parallel replica dynamics <prd>`
|
||||
* :doc:`temperature accelerated dynamics <tad>`
|
||||
* :doc:`parallel tempering <temper>`
|
||||
* :doc:`path-integral MD <fix_pimd>`
|
||||
* path-integral MD: `first variant <fix_pimd>`, `second variant <fix_ipi>`
|
||||
* multi-walker collective variables with :doc:`Colvars <fix_colvars>` and :doc:`Plumed <fix_plumed>`
|
||||
|
||||
.. _prepost:
|
||||
@ -210,11 +219,12 @@ page for details.
|
||||
These are LAMMPS capabilities which you may not think of as typical
|
||||
classical MD options:
|
||||
|
||||
* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`
|
||||
* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`, optional with recursive bisectioning decomposition
|
||||
* :doc:`generalized aspherical particles <Howto_body>`
|
||||
* :doc:`stochastic rotation dynamics (SRD) <fix_srd>`
|
||||
* :doc:`real-time visualization and interactive MD <fix_imd>`
|
||||
* :doc:`real-time visualization and interactive MD <fix_imd>`, :doc:`built-in renderer for images and movies <dump_image>`
|
||||
* calculate :doc:`virtual diffraction patterns <compute_xrd>`
|
||||
* calculate :doc:`finite temperature phonon dispersion <fix_phonon>` and the :doc:`dynamical matrix of minimized structures <dynamical_matrix>`
|
||||
* :doc:`atom-to-continuum coupling <fix_atc>` with finite elements
|
||||
* coupled rigid body integration via the :doc:`POEMS <fix_poems>` library
|
||||
* :doc:`QM/MM coupling <fix_qmmm>`
|
||||
|
||||
@ -1,40 +1,61 @@
|
||||
LAMMPS open-source license
|
||||
--------------------------
|
||||
|
||||
LAMMPS is a freely-available open-source code, distributed under the
|
||||
terms of the `GNU Public License Version 2 <gpl_>`_, which means you can
|
||||
use or modify the code however you wish for your own purposes, but have
|
||||
to adhere to certain rules when redistributing it or software derived
|
||||
GPL version of LAMMPS
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
LAMMPS is an open-source code, available free-of-charge, and distributed
|
||||
under the terms of the `GNU Public License Version 2 <gpl_>`_ (GPLv2),
|
||||
which means you can use or modify the code however you wish for your own
|
||||
purposes, but have to adhere to certain rules when redistributing it -
|
||||
specifically in binary form - or are distributing software derived
|
||||
from it or that includes parts of it.
|
||||
|
||||
LAMMPS comes with no warranty of any kind. As each source file states
|
||||
in its header, it is a copyrighted code that is distributed free-of-
|
||||
charge, under the terms of the `GNU Public License Version 2 <gpl_>`_
|
||||
(GPLv2). This is often referred to as open-source distribution - see
|
||||
`www.gnu.org <gnuorg_>`_ or `www.opensource.org <opensource_>`_. The
|
||||
legal text of the GPL is in the LICENSE file included in the LAMMPS
|
||||
distribution.
|
||||
LAMMPS comes with no warranty of any kind.
|
||||
|
||||
As each source file states in its header, it is a copyrighted code, and
|
||||
thus not in the public domain. For more information about open-source
|
||||
software and open-source distribution, see `www.gnu.org <gnuorg_>`_
|
||||
or `www.opensource.org <opensource_>`_. The legal text of the GPL as it
|
||||
applies to LAMMPS is in the LICENSE file included in the LAMMPS distribution.
|
||||
|
||||
.. _gpl: https://github.com/lammps/lammps/blob/master/LICENSE
|
||||
|
||||
.. _lgpl: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
|
||||
|
||||
.. _gnuorg: http://www.gnu.org
|
||||
|
||||
.. _opensource: http://www.opensource.org
|
||||
|
||||
Here is a summary of what the GPL means for LAMMPS users:
|
||||
Here is a more specific summary of what the GPL means for LAMMPS users:
|
||||
|
||||
(1) Anyone is free to use, modify, or extend LAMMPS in any way they
|
||||
(1) Anyone is free to use, copy, modify, or extend LAMMPS in any way they
|
||||
choose, including for commercial purposes.
|
||||
|
||||
(2) If you **distribute** a modified version of LAMMPS, it must remain
|
||||
open-source, meaning you distribute **all** of it under the terms of
|
||||
the GPL. You should clearly annotate such a code as a derivative version
|
||||
of LAMMPS.
|
||||
open-source, meaning you are required to distribute **all** of it under
|
||||
the terms of the GPL. You should clearly annotate such a modified code
|
||||
as a derivative version of LAMMPS.
|
||||
|
||||
(3) If you release any code that includes or uses LAMMPS source code,
|
||||
then it must also be open-sourced, meaning you distribute it under
|
||||
the terms of the GPL.
|
||||
the terms of the GPL. You may write code that interfaces LAMMPS to
|
||||
a differently licensed library. In that case the code that provides
|
||||
the interface must be licensed GPL, but not necessarily that library
|
||||
unless you are distributing binaries that require the library to run.
|
||||
|
||||
(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
|
||||
source file headers (including the copyright and GPL notices) should
|
||||
remain part of the code.
|
||||
|
||||
|
||||
LGPL version of LAMMPS
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
We occasionally make stable LAMMPS releases available under the `GNU
|
||||
Lesser Public License v2.1 <lgpl_>`_. This is on request only and with
|
||||
non-LGPL compliant files removed. This allows uses linking non-GPL
|
||||
compatible software with the (otherwise unmodified) LAMMPS library
|
||||
or loading it dynamically at runtime. Any **modifications** to
|
||||
the LAMMPS code however, even with the LGPL licensed version, must still
|
||||
be made available under the same open source terms as LAMMPS itself.
|
||||
|
||||
@ -10,24 +10,26 @@ conditions. It can model 2d or 3d systems with only a few particles
|
||||
up to millions or billions.
|
||||
|
||||
LAMMPS can be built and run on a laptop or desktop machine, but is
|
||||
designed for parallel computers. It will run on any parallel machine
|
||||
that supports the `MPI <mpi_>`_ message-passing library. This includes
|
||||
shared-memory boxes and distributed-memory clusters and
|
||||
supercomputers.
|
||||
designed for parallel computers. It will run in serial and on any
|
||||
parallel machine that supports the `MPI <mpi_>`_ message-passing
|
||||
library. This includes shared-memory boxes and distributed-memory
|
||||
clusters and supercomputers. Parts of LAMMPS also support
|
||||
`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.
|
||||
|
||||
.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
|
||||
.. _lws: https://www.lammps.org
|
||||
.. _omp: https://www.openmp.org
|
||||
|
||||
LAMMPS is written in C++ and requires a compiler that is at least
|
||||
compatible with the C++-11 standard.
|
||||
Earlier versions were written in F77 and F90. See the `History page
|
||||
compatible with the C++-11 standard. Earlier versions were written in
|
||||
F77, F90, and C++-98. See the `History page
|
||||
<https://www.lammps.org/history.html>`_ of the website for details. All
|
||||
versions can be downloaded from the `LAMMPS website <lws_>`_.
|
||||
versions can be downloaded as source code from the `LAMMPS website
|
||||
<lws_>`_.
|
||||
|
||||
LAMMPS is designed to be easy to modify or extend with new
|
||||
capabilities, such as new force fields, atom types, boundary
|
||||
conditions, or diagnostics. See the :doc:`Modify <Modify>` page for
|
||||
more details.
|
||||
LAMMPS is designed to be easy to modify or extend with new capabilities,
|
||||
such as new force fields, atom types, boundary conditions, or
|
||||
diagnostics. See the :doc:`Modify <Modify>` page for more details.
|
||||
|
||||
In the most general sense, LAMMPS integrates Newton's equations of
|
||||
motion for a collection of interacting particles. A single particle
|
||||
@ -47,4 +49,5 @@ MPI parallelization to partition the simulation domain into small
|
||||
sub-domains of equal computational cost, one of which is assigned to
|
||||
each processor. Processors communicate and store "ghost" atom
|
||||
information for atoms that border their sub-domain. Multi-threading
|
||||
parallelization with with particle-decomposition can be used in addition.
|
||||
parallelization and GPU acceleration with with particle-decomposition
|
||||
can be used in addition.
|
||||
|
||||
@ -2,12 +2,21 @@ What does a LAMMPS version mean
|
||||
-------------------------------
|
||||
|
||||
The LAMMPS "version" is the date when it was released, such as 1 May
|
||||
2014. LAMMPS is updated continuously. Whenever we fix a bug or add a
|
||||
feature, we release it in the next *patch* release, which are
|
||||
typically made every couple of weeks. Info on patch releases are on
|
||||
`this website page <https://www.lammps.org/bug.html>`_. Every few
|
||||
months, the latest patch release is subjected to more thorough testing
|
||||
and labeled as a *stable* version.
|
||||
2014. LAMMPS is updated continuously and we aim to keep it working
|
||||
correctly and reliably at all times. You can follow its development
|
||||
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
||||
|
||||
Whenever we fix a bug or update or add a feature, it will be merged into
|
||||
the `master` branch of the git repository. When a sufficient number of
|
||||
changes have accumulated *and* the software passes a set of automated
|
||||
tests, we release it in the next *patch* release, which are made every
|
||||
few weeks. Info on patch releases are on `this website page
|
||||
<https://www.lammps.org/bug.html>`_.
|
||||
|
||||
Once or twice a year, only bug fixes and small, non-intrusive changes are
|
||||
included for a period of time, and the code is subjected to more detailed
|
||||
and thorough testing than the default automated testing. The latest
|
||||
patch release after such a period is then labeled as a *stable* version.
|
||||
|
||||
Each version of LAMMPS contains all the features and bug-fixes up to
|
||||
and including its version date.
|
||||
|
||||
@ -19,7 +19,7 @@ Syntax
|
||||
|
||||
bondmax = length of longest bond in the system (in length units)
|
||||
tlimit = elapsed CPU time (in seconds)
|
||||
diskfree = free disk space (in megabytes)
|
||||
diskfree = free disk space (in MBytes)
|
||||
v_name = name of :doc:`equal-style variable <variable>`
|
||||
|
||||
* operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "\|\^"
|
||||
@ -81,7 +81,7 @@ the timer frequently across a large number of processors may be
|
||||
non-negligible.
|
||||
|
||||
The *diskfree* attribute will check for available disk space (in
|
||||
megabytes) on supported operating systems. By default it will
|
||||
MBytes) on supported operating systems. By default it will
|
||||
check the file system of the current working directory. This
|
||||
can be changed with the optional *path* keyword, which will take
|
||||
the path to a file or folder on the file system to be checked
|
||||
|
||||
@ -128,7 +128,7 @@ spectrum while consumes more memory. With fixed *f_max* and
|
||||
:math:`\gamma`, *N_f* should be big enough to converge the classical
|
||||
temperature :math:`T^{cl}` as a function of target quantum bath
|
||||
temperature. Memory usage per processor could be from 10 to 100
|
||||
Mbytes.
|
||||
MBytes.
|
||||
|
||||
.. note::
|
||||
|
||||
|
||||
@ -135,7 +135,7 @@ with #) anywhere. Each non-blank non-comment line must contain one
|
||||
keyword/value pair. The required keywords are *rcutfac* and
|
||||
*twojmax*\ . Optional keywords are *rfac0*, *rmin0*,
|
||||
*switchflag*, *bzeroflag*, *quadraticflag*, *chemflag*,
|
||||
*bnormflag*, *wselfallflag*, and *chunksize*\ .
|
||||
*bnormflag*, *wselfallflag*, *chunksize*, and *parallelthresh*\ .
|
||||
|
||||
The default values for these keywords are
|
||||
|
||||
@ -147,7 +147,8 @@ The default values for these keywords are
|
||||
* *chemflag* = 0
|
||||
* *bnormflag* = 0
|
||||
* *wselfallflag* = 0
|
||||
* *chunksize* = 4096
|
||||
* *chunksize* = 32768
|
||||
* *parallelthresh* = 8192
|
||||
|
||||
If *quadraticflag* is set to 1, then the SNAP energy expression includes
|
||||
additional quadratic terms that have been shown to increase the overall
|
||||
@ -188,14 +189,24 @@ corresponding *K*-vector of linear coefficients for element
|
||||
which must equal the number of unique elements appearing in the LAMMPS
|
||||
pair_coeff command, to avoid ambiguity in the number of coefficients.
|
||||
|
||||
The keyword *chunksize* is only applicable when using the
|
||||
pair style *snap* with the KOKKOS package and is ignored otherwise.
|
||||
This keyword controls
|
||||
The keywords *chunksize* and *parallelthresh* are only applicable when
|
||||
using the pair style *snap* with the KOKKOS package on GPUs and are
|
||||
ignored otherwise.
|
||||
The *chunksize* keyword controls
|
||||
the number of atoms in each pass used to compute the bispectrum
|
||||
components and is used to avoid running out of memory. For example
|
||||
if there are 8192 atoms in the simulation and the *chunksize*
|
||||
is set to 4096, the bispectrum calculation will be broken up
|
||||
into two passes.
|
||||
into two passes (running on a single GPU).
|
||||
The *parallelthresh* keyword controls
|
||||
a crossover threshold for performing extra parallelism. For
|
||||
small systems, exposing additional parallism can be beneficial when
|
||||
there is not enough work to fully saturate the GPU threads otherwise.
|
||||
However, the extra parallelism also leads to more divergence
|
||||
and can hurt performance when the system is already large enough to
|
||||
saturate the GPU threads. Extra parallelism will be performed if the
|
||||
*chunksize* (or total number of atoms per GPU) is smaller than
|
||||
*parallelthresh*.
|
||||
|
||||
Detailed definitions for all the other keywords
|
||||
are given on the :doc:`compute sna/atom <compute_sna_atom>` doc page.
|
||||
|
||||
@ -1174,6 +1174,7 @@ googletest
|
||||
Gordan
|
||||
Goudeau
|
||||
GPa
|
||||
GPL
|
||||
gpu
|
||||
gpuID
|
||||
gpus
|
||||
@ -1689,6 +1690,7 @@ Lett
|
||||
Leuven
|
||||
Leven
|
||||
Lewy
|
||||
LGPL
|
||||
lgvdw
|
||||
Liang
|
||||
libatc
|
||||
@ -1889,7 +1891,6 @@ maxX
|
||||
Mayergoyz
|
||||
Mayoral
|
||||
mbt
|
||||
Mbytes
|
||||
MBytes
|
||||
mc
|
||||
McLachlan
|
||||
|
||||
@ -44,7 +44,8 @@ struct TagPairSNAPComputeForce{};
|
||||
struct TagPairSNAPComputeNeigh{};
|
||||
struct TagPairSNAPComputeCayleyKlein{};
|
||||
struct TagPairSNAPPreUi{};
|
||||
struct TagPairSNAPComputeUi{};
|
||||
struct TagPairSNAPComputeUiSmall{}; // more parallelism, more divergence
|
||||
struct TagPairSNAPComputeUiLarge{}; // less parallelism, no divergence
|
||||
struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
|
||||
struct TagPairSNAPComputeZi{};
|
||||
struct TagPairSNAPBeta{};
|
||||
@ -53,7 +54,9 @@ struct TagPairSNAPTransformBi{}; // re-order blist from AoSoA to AoS
|
||||
struct TagPairSNAPComputeYi{};
|
||||
struct TagPairSNAPComputeYiWithZlist{};
|
||||
template<int dir>
|
||||
struct TagPairSNAPComputeFusedDeidrj{};
|
||||
struct TagPairSNAPComputeFusedDeidrjSmall{}; // more parallelism, more divergence
|
||||
template<int dir>
|
||||
struct TagPairSNAPComputeFusedDeidrjLarge{}; // less parallelism, no divergence
|
||||
|
||||
// CPU backend only
|
||||
struct TagPairSNAPComputeNeighCPU{};
|
||||
@ -143,7 +146,10 @@ public:
|
||||
void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUi>::member_type& team) const;
|
||||
void operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiSmall>::member_type& team) const;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiLarge>::member_type& team) const;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
|
||||
@ -168,7 +174,11 @@ public:
|
||||
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const;
|
||||
void operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const;
|
||||
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const;
|
||||
|
||||
// CPU backend only
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
||||
@ -341,18 +341,32 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
|
||||
// ComputeUi w/vector parallelism, shared memory, direct atomicAdd into ulisttot
|
||||
{
|
||||
// team_size_compute_ui is defined in `pair_snap_kokkos.h`
|
||||
|
||||
// scratch size: 32 atoms * (twojmax+1) cached values, no double buffer
|
||||
const int tile_size = vector_length * (twojmax + 1);
|
||||
const int scratch_size = scratch_size_helper<complex>(team_size_compute_ui * tile_size);
|
||||
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
||||
if (chunk_size < parallel_thresh)
|
||||
{
|
||||
// Version with parallelism over j_bend
|
||||
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUi> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
||||
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeUi",policy_ui,*this);
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
||||
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiSmall> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
||||
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeUiSmall",policy_ui,*this);
|
||||
} else {
|
||||
// Version w/out parallelism over j_bend
|
||||
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs)
|
||||
const int n_teams = chunk_size_div * max_neighs;
|
||||
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
||||
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiLarge> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
||||
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeUiLarge",policy_ui,*this);
|
||||
}
|
||||
}
|
||||
|
||||
//TransformUi: un-"fold" ulisttot, zero ylist
|
||||
@ -412,25 +426,51 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
|
||||
const int tile_size = vector_length * (twojmax + 1);
|
||||
const int scratch_size = scratch_size_helper<complex>(2 * team_size_compute_fused_deidrj * tile_size);
|
||||
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
||||
if (chunk_size < parallel_thresh)
|
||||
{
|
||||
// Version with parallelism over j_bend
|
||||
|
||||
// x direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrj<0>",policy_fused_deidrj_x,*this);
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
||||
|
||||
// y direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrj<1>",policy_fused_deidrj_y,*this);
|
||||
// x direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjSmall<0>",policy_fused_deidrj_x,*this);
|
||||
|
||||
// z direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrj<2>",policy_fused_deidrj_z,*this);
|
||||
// y direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjSmall<1>",policy_fused_deidrj_y,*this);
|
||||
|
||||
// z direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjSmall<2>",policy_fused_deidrj_z,*this);
|
||||
} else {
|
||||
// Version w/out parallelism over j_bend
|
||||
|
||||
// total number of teams needed: (natoms / 32) * (max_neighs)
|
||||
const int n_teams = chunk_size_div * max_neighs;
|
||||
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
||||
|
||||
// x direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjLarge<0>",policy_fused_deidrj_x,*this);
|
||||
|
||||
// y direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjLarge<1>",policy_fused_deidrj_y,*this);
|
||||
|
||||
// z direction
|
||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||
Kokkos::parallel_for("ComputeFusedDeidrjLarge<2>",policy_fused_deidrj_z,*this);
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
#endif // LMP_KOKKOS_GPU
|
||||
@ -603,13 +643,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||
const auto idxb = icoeff % idxb_max;
|
||||
const auto idx_chem = icoeff / idxb_max;
|
||||
auto bveci = my_sna.blist(idxb, idx_chem, ii);
|
||||
real_type bveci = my_sna.blist(ii, idx_chem, idxb);
|
||||
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bveci;
|
||||
k++;
|
||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||
const auto jdxb = jcoeff % idxb_max;
|
||||
const auto jdx_chem = jcoeff / idxb_max;
|
||||
real_type bvecj = my_sna.blist(jdxb, jdx_chem, ii);
|
||||
real_type bvecj = my_sna.blist(ii, jdx_chem, jdxb);
|
||||
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bvecj;
|
||||
d_beta_pack(iatom_mod,jcoeff,iatom_div) += d_coeffi[k]*bveci;
|
||||
k++;
|
||||
@ -736,7 +776,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
|
||||
template<class DeviceType, typename real_type, int vector_length>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUi>::member_type& team) const {
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiSmall>::member_type& team) const {
|
||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||
|
||||
// extract flattened atom_div / neighbor number / bend location
|
||||
@ -756,11 +796,37 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
const int ninside = d_ninside(ii);
|
||||
if (jj >= ninside) return;
|
||||
|
||||
my_sna.compute_ui(team,iatom_mod, jbend, jj, iatom_div);
|
||||
my_sna.compute_ui_small(team, iatom_mod, jbend, jj, iatom_div);
|
||||
});
|
||||
|
||||
}
|
||||
|
||||
template<class DeviceType, typename real_type, int vector_length>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiLarge>::member_type& team) const {
|
||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||
|
||||
// extract flattened atom_div / neighbor number / bend location
|
||||
int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_ui;
|
||||
|
||||
// extract neighbor index, iatom_div
|
||||
int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
|
||||
int jj = flattened_idx - iatom_div * max_neighs;
|
||||
|
||||
Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
|
||||
[&] (const int iatom_mod) {
|
||||
const int ii = iatom_mod + vector_length * iatom_div;
|
||||
if (ii >= chunk_size) return;
|
||||
|
||||
const int ninside = d_ninside(ii);
|
||||
if (jj >= ninside) return;
|
||||
|
||||
my_sna.compute_ui_large(team,iatom_mod, jj, iatom_div);
|
||||
});
|
||||
|
||||
}
|
||||
|
||||
|
||||
template<class DeviceType, typename real_type, int vector_length>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPTransformUi,const int iatom_mod, const int idxu, const int iatom_div) const {
|
||||
@ -861,9 +927,9 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
|
||||
for (int itriple = 0; itriple < ntriples; itriple++) {
|
||||
|
||||
const auto blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
|
||||
const real_type blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
|
||||
|
||||
my_sna.blist(idxb, itriple, iatom) = blocal;
|
||||
my_sna.blist(iatom, itriple, idxb) = blocal;
|
||||
}
|
||||
|
||||
}
|
||||
@ -871,7 +937,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
template<class DeviceType, typename real_type, int vector_length>
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const {
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const {
|
||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||
|
||||
// extract flattened atom_div / neighbor number / bend location
|
||||
@ -891,12 +957,38 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
const int ninside = d_ninside(ii);
|
||||
if (jj >= ninside) return;
|
||||
|
||||
my_sna.template compute_fused_deidrj<dir>(team, iatom_mod, jbend, jj, iatom_div);
|
||||
my_sna.template compute_fused_deidrj_small<dir>(team, iatom_mod, jbend, jj, iatom_div);
|
||||
|
||||
});
|
||||
|
||||
}
|
||||
|
||||
template<class DeviceType, typename real_type, int vector_length>
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const {
|
||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||
|
||||
// extract flattened atom_div / neighbor number / bend location
|
||||
int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_fused_deidrj;
|
||||
|
||||
// extract neighbor index, iatom_div
|
||||
int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
|
||||
int jj = flattened_idx - max_neighs * iatom_div;
|
||||
|
||||
Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
|
||||
[&] (const int iatom_mod) {
|
||||
const int ii = iatom_mod + vector_length * iatom_div;
|
||||
if (ii >= chunk_size) return;
|
||||
|
||||
const int ninside = d_ninside(ii);
|
||||
if (jj >= ninside) return;
|
||||
|
||||
my_sna.template compute_fused_deidrj_large<dir>(team, iatom_mod, jj, iatom_div);
|
||||
|
||||
});
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
Begin routines that are unique to the CPU codepath. These do not take
|
||||
advantage of AoSoA data layouts, but that could be a good point of
|
||||
@ -925,13 +1017,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||
const auto idxb = icoeff % idxb_max;
|
||||
const auto idx_chem = icoeff / idxb_max;
|
||||
auto bveci = my_sna.blist(idxb,idx_chem,ii);
|
||||
real_type bveci = my_sna.blist(ii,idx_chem,idxb);
|
||||
d_beta(icoeff,ii) += d_coeffi[k]*bveci;
|
||||
k++;
|
||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||
const auto jdxb = jcoeff % idxb_max;
|
||||
const auto jdx_chem = jcoeff / idxb_max;
|
||||
auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
|
||||
real_type bvecj = my_sna.blist(ii,jdx_chem,jdxb);
|
||||
d_beta(icoeff,ii) += d_coeffi[k]*bvecj;
|
||||
d_beta(jcoeff,ii) += d_coeffi[k]*bveci;
|
||||
k++;
|
||||
@ -1221,7 +1313,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||
const auto idxb = icoeff % idxb_max;
|
||||
const auto idx_chem = icoeff / idxb_max;
|
||||
evdwl += d_coeffi[icoeff+1]*my_sna.blist(idxb,idx_chem,ii);
|
||||
evdwl += d_coeffi[icoeff+1]*my_sna.blist(ii,idx_chem,idxb);
|
||||
}
|
||||
|
||||
// quadratic contributions
|
||||
@ -1230,12 +1322,12 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||
const auto idxb = icoeff % idxb_max;
|
||||
const auto idx_chem = icoeff / idxb_max;
|
||||
auto bveci = my_sna.blist(idxb,idx_chem,ii);
|
||||
real_type bveci = my_sna.blist(ii,idx_chem,idxb);
|
||||
evdwl += 0.5*d_coeffi[k++]*bveci*bveci;
|
||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||
auto jdxb = jcoeff % idxb_max;
|
||||
auto jdx_chem = jcoeff / idxb_max;
|
||||
auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
|
||||
auto bvecj = my_sna.blist(ii,jdx_chem,jdxb);
|
||||
evdwl += d_coeffi[k++]*bveci*bvecj;
|
||||
}
|
||||
}
|
||||
|
||||
@ -45,12 +45,12 @@ struct WignerWrapper {
|
||||
{ ; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex get(const int& ma) {
|
||||
complex get(const int& ma) const {
|
||||
return complex(buffer[offset + 2 * vector_length * ma], buffer[offset + vector_length + 2 * vector_length * ma]);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void set(const int& ma, const complex& store) {
|
||||
void set(const int& ma, const complex& store) const {
|
||||
buffer[offset + 2 * vector_length * ma] = store.re;
|
||||
buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
|
||||
}
|
||||
@ -122,8 +122,14 @@ inline
|
||||
void compute_cayley_klein(const int&, const int&, const int&);
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
|
||||
|
||||
// version of the code with parallelism over j_bend
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_ui(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
|
||||
void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
|
||||
// version of the code without parallelism over j_bend
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); // ForceSNAP
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_zi(const int&, const int&, const int&); // ForceSNAP
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
@ -135,6 +141,35 @@ inline
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_bi(const int&, const int&, const int&); // ForceSNAP
|
||||
|
||||
// functions for derivatives, GPU only
|
||||
// version of the code with parallelism over j_bend
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
|
||||
// version of the code without parallelism over j_bend
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); //ForceSNAP
|
||||
|
||||
// core "evaluation" functions that get plugged into "compute" functions
|
||||
// plugged into compute_ui_small, compute_ui_large
|
||||
KOKKOS_FORCEINLINE_FUNCTION
|
||||
void evaluate_ui_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&, const int&,
|
||||
const int&, const int&, const int&);
|
||||
// plugged into compute_zi, compute_yi
|
||||
KOKKOS_FORCEINLINE_FUNCTION
|
||||
complex evaluate_zi(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
|
||||
const int&, const int&, const int&, const int&, const real_type*);
|
||||
// plugged into compute_yi, compute_yi_with_zlist
|
||||
KOKKOS_FORCEINLINE_FUNCTION
|
||||
real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
|
||||
const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &);
|
||||
// plugged into compute_fused_deidrj_small, compute_fused_deidrj_large
|
||||
KOKKOS_FORCEINLINE_FUNCTION
|
||||
real_type evaluate_duidrj_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
|
||||
const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
|
||||
const int&, const int&, const int&, const int&);
|
||||
|
||||
// functions for bispectrum coefficients, CPU only
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
|
||||
@ -148,11 +183,6 @@ inline
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int); // ForceSNAP
|
||||
|
||||
// functions for derivatives, GPU only
|
||||
template<int dir>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_fused_deidrj(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
|
||||
|
||||
// functions for derivatives, CPU only
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
|
||||
@ -168,23 +198,6 @@ inline
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void compute_s_dsfac(const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein
|
||||
|
||||
static KOKKOS_FORCEINLINE_FUNCTION
|
||||
void sincos_wrapper(double x, double* sin_, double *cos_) {
|
||||
#ifdef __SYCL_DEVICE_ONLY__
|
||||
*sin_ = sycl::sincos(x, cos_);
|
||||
#else
|
||||
sincos(x, sin_, cos_);
|
||||
#endif
|
||||
}
|
||||
static KOKKOS_FORCEINLINE_FUNCTION
|
||||
void sincos_wrapper(float x, float* sin_, float *cos_) {
|
||||
#ifdef __SYCL_DEVICE_ONLY__
|
||||
*sin_ = sycl::sincos(x, cos_);
|
||||
#else
|
||||
sincosf(x, sin_, cos_);
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef TIMING_INFO
|
||||
double* timers;
|
||||
timespec starttime, endtime;
|
||||
@ -207,7 +220,7 @@ inline
|
||||
|
||||
int twojmax, diagonalstyle;
|
||||
|
||||
t_sna_3d_ll blist;
|
||||
t_sna_3d blist;
|
||||
t_sna_3c_ll ulisttot;
|
||||
t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
|
||||
t_sna_3c_ll zlist;
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@ -628,7 +628,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
|
||||
chemflag = 0;
|
||||
bnormflag = 0;
|
||||
wselfallflag = 0;
|
||||
chunksize = 4096;
|
||||
chunksize = 32768;
|
||||
parallel_thresh = 8192;
|
||||
|
||||
// open SNAP parameter file on proc 0
|
||||
|
||||
@ -696,6 +697,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
|
||||
wselfallflag = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||
else if (keywd == "chunksize")
|
||||
chunksize = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||
else if (keywd == "parallelthresh")
|
||||
parallel_thresh = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||
else
|
||||
error->all(FLERR,"Unknown parameter '{}' in SNAP "
|
||||
"parameter file", keywd);
|
||||
|
||||
@ -59,7 +59,7 @@ class PairSNAP : public Pair {
|
||||
double **scale; // for thermodynamic integration
|
||||
int twojmax, switchflag, bzeroflag, bnormflag;
|
||||
int chemflag, wselfallflag;
|
||||
int chunksize;
|
||||
int chunksize,parallel_thresh;
|
||||
double rfac0, rmin0, wj1, wj2;
|
||||
int rcutfacflag, twojmaxflag; // flags for required parameters
|
||||
int beta_max; // length of beta
|
||||
|
||||
@ -20,7 +20,7 @@ charges (dsf and long-range treatment of charges)
|
||||
out-of-plane angle
|
||||
|
||||
See the file doc/drude_tutorial.html for getting started.
|
||||
See the doc pages for "pair_style buck6d/coul/gauss", "anlge_style class2",
|
||||
See the doc pages for "pair_style buck6d/coul/gauss", "angle_style class2",
|
||||
"angle_style cosine/buck6d", and "improper_style inversion/harmonic"
|
||||
commands to get started. Also see the above mentioned website and
|
||||
literature for further documentation about the force field.
|
||||
|
||||
@ -34,6 +34,7 @@ exclude:
|
||||
- lib/hdnnp
|
||||
- lib/kim
|
||||
- lib/kokkos
|
||||
- lib/latte
|
||||
- lib/machdyn
|
||||
- lib/mdi
|
||||
- lib/mscg
|
||||
@ -41,6 +42,7 @@ exclude:
|
||||
- lib/plumed
|
||||
- lib/quip
|
||||
- lib/scafacos
|
||||
- lib/voronoi
|
||||
- src/Make.sh
|
||||
patterns:
|
||||
- "*.c"
|
||||
|
||||
Reference in New Issue
Block a user