Merge branch 'master' into collected-small-changes
@ -22,4 +22,5 @@ page.
|
|||||||
Build_extras
|
Build_extras
|
||||||
Build_manual
|
Build_manual
|
||||||
Build_windows
|
Build_windows
|
||||||
|
Build_diskspace
|
||||||
Build_development
|
Build_development
|
||||||
|
|||||||
45
doc/src/Build_diskspace.rst
Normal file
@ -0,0 +1,45 @@
|
|||||||
|
Notes for saving disk space when building LAMMPS from source
|
||||||
|
------------------------------------------------------------
|
||||||
|
|
||||||
|
LAMMPS is a large software project with a large number of source files,
|
||||||
|
extensive documentation, and a large collection of example files.
|
||||||
|
When downloading LAMMPS by cloning the
|
||||||
|
`git repository from GitHub <https://github.com/lammps/lammps>`_ this
|
||||||
|
will by default also download the entire commit history since September 2006.
|
||||||
|
Compiling LAMMPS will add the storage requirements of the compiled object
|
||||||
|
files and libraries to the tally.
|
||||||
|
|
||||||
|
In a user account on an HPC cluster with filesystem quotas or in other
|
||||||
|
environments with restricted disk space capacity it may be necessary to
|
||||||
|
reduce the storage requirements. Here are some suggestions:
|
||||||
|
|
||||||
|
- Create a so-called shallow repository by cloning only the last commit
|
||||||
|
instead of the full project history by using ``git clone git@github.com:lammps/lammps --depth=1 --branch=master``.
|
||||||
|
This reduces the downloaded size to about half. With ``--depth=1`` it is not possible to check out different
|
||||||
|
versions/branches of LAMMPS; using ``--depth=1000`` will make multiple recent versions available at little
|
||||||
|
extra storage needs (the entire git history had nearly 30,000 commits in fall 2021).
|
||||||
|
|
||||||
|
- Download a tar archive from either the `download section on the LAMMPS homepage <https://www.lammps.org/download.html>`_
|
||||||
|
or from the `LAMMPS releases page on GitHub <https://github.com/lammps/lammps/releases>`_. These will not
|
||||||
|
contain the git history at all.
|
||||||
|
|
||||||
|
- Build LAMMPS without the debug flag (remove ``-g`` from the machine makefile or use ``-DCMAKE_BUILD_TYPE=Release``)
|
||||||
|
or use the ``strip`` command on the LAMMPS executable once debugging is no longer needed. The strip command
|
||||||
|
may also be applied to the LAMMPS shared library. The static library may be deleted entirely.
|
||||||
|
|
||||||
|
- Delete compiled object files and libraries after copying the LAMMPS executable to a permanent location.
|
||||||
|
When using the traditional build process, one may use ``make clean-<machine>`` or ``make clean-all``
|
||||||
|
to delete object files in the src folder. For CMake based builds, one may use ``make clean`` or just
|
||||||
|
delete the entire build folder.
|
||||||
|
|
||||||
|
- The folders containing the documentation tree (doc) and the examples (examples) are not needed to build and
|
||||||
|
run LAMMPS and can be safely deleted. Some files in the potentials folder are large and may be deleted,
|
||||||
|
if not needed. The largest of those files (occupying about 120 MBytes combined) will only be downloaded on
|
||||||
|
demand, when the corresponding package is installed.
|
||||||
|
|
||||||
|
- When using the CMake build procedure, the compilation can be done on a (local) scratch storage that will not
|
||||||
|
count toward the quota. A local scratch file system may offer the additional benefit of speeding up creating
|
||||||
|
object files and linking with libraries compared to a networked file system. Also with CMake (and unlike with
|
||||||
|
the traditional make) it is possible to compile LAMMPS executables with different settings and packages included
|
||||||
|
from the same source tree since all the configuration information is stored in the build folder. So it is
|
||||||
|
not necessary to have multiple copies of LAMMPS.
|
||||||
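To decide which of the suggestions above saves the most space, it can help to see how large each top-level folder of a checkout actually is. The following is a minimal sketch, not part of LAMMPS: the helper names ``folder_size_bytes`` and ``report_top_level`` and the checkout location ``lammps`` are assumptions made for illustration.

```python
import os


def folder_size_bytes(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symbolic links so that their targets are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def report_top_level(checkout):
    """Map each top-level folder of a source tree to its size in MBytes."""
    sizes = {}
    for entry in sorted(os.listdir(checkout)):
        full = os.path.join(checkout, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            sizes[entry] = folder_size_bytes(full) / 1.0e6
    return sizes


if __name__ == "__main__":
    # "lammps" is an assumed location of the checkout; adjust as needed.
    checkout = "lammps"
    if os.path.isdir(checkout):
        ranked = sorted(report_top_level(checkout).items(),
                        key=lambda item: item[1], reverse=True)
        for name, mbytes in ranked:
            print(f"{name:20s} {mbytes:10.1f} MBytes")
```

Run against a full clone, the listing includes the hidden ``.git`` folder, so the cost of the commit history can be compared directly with the doc, examples, and potentials folders.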
@ -29,7 +29,7 @@ The following folks deserve special recognition. Many of the packages
|
|||||||
they have written are unique for an MD code and LAMMPS would not be as
|
they have written are unique for an MD code and LAMMPS would not be as
|
||||||
general-purpose as it is without their expertise and efforts.
|
general-purpose as it is without their expertise and efforts.
|
||||||
|
|
||||||
* Metin Aktulga (MSU), REAXFF package for C version of ReaxFF
|
* Metin Aktulga (MSU), REAXFF package for C/C++ version of ReaxFF
|
||||||
* Mike Brown (Intel), GPU and INTEL packages
|
* Mike Brown (Intel), GPU and INTEL packages
|
||||||
* Colin Denniston (U Western Ontario), LATBOLTZ package
|
* Colin Denniston (U Western Ontario), LATBOLTZ package
|
||||||
* Georg Ganzenmuller (EMI), MACHDYN and SPH packages
|
* Georg Ganzenmuller (EMI), MACHDYN and SPH packages
|
||||||
@ -37,9 +37,10 @@ general-purpose as it is without their expertise and efforts.
|
|||||||
* Reese Jones (Sandia) and colleagues, ATC package for atom/continuum coupling
|
* Reese Jones (Sandia) and colleagues, ATC package for atom/continuum coupling
|
||||||
* Christoph Kloss (DCS Computing), LIGGGHTS code for granular materials, built on top of LAMMPS
|
* Christoph Kloss (DCS Computing), LIGGGHTS code for granular materials, built on top of LAMMPS
|
||||||
* Rudra Mukherjee (JPL), POEMS package for articulated rigid body motion
|
* Rudra Mukherjee (JPL), POEMS package for articulated rigid body motion
|
||||||
* Trung Ngyuen (Northwestern U), GPU and RIGID and BODY packages
|
* Trung Ngyuen (Northwestern U), GPU, RIGID, BODY, and DIELECTRIC packages
|
||||||
* Mike Parks (Sandia), PERI package for Peridynamics
|
* Mike Parks (Sandia), PERI package for Peridynamics
|
||||||
* Roy Pollock (LLNL), Ewald and PPPM solvers
|
* Roy Pollock (LLNL), Ewald and PPPM solvers
|
||||||
|
* Julien Tranchida (Sandia), SPIN package
|
||||||
* Christian Trott (Sandia), CUDA and KOKKOS packages
|
* Christian Trott (Sandia), CUDA and KOKKOS packages
|
||||||
* Ilya Valuev (JIHT), AWPMD package for wave packet MD
|
* Ilya Valuev (JIHT), AWPMD package for wave packet MD
|
||||||
* Greg Wagner (Northwestern U), MEAM package for MEAM potential
|
* Greg Wagner (Northwestern U), MEAM package for MEAM potential
|
||||||
|
|||||||
@ -27,19 +27,19 @@ General features
|
|||||||
* distributed memory message-passing parallelism (MPI)
|
* distributed memory message-passing parallelism (MPI)
|
||||||
* shared memory multi-threading parallelism (OpenMP)
|
* shared memory multi-threading parallelism (OpenMP)
|
||||||
* spatial decomposition of simulation domain for MPI parallelism
|
* spatial decomposition of simulation domain for MPI parallelism
|
||||||
* particle decomposition inside of spatial decomposition for OpenMP parallelism
|
* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
|
||||||
* GPLv2 licensed open-source distribution
|
* GPLv2 licensed open-source distribution
|
||||||
* highly portable C++-11
|
* highly portable C++-11
|
||||||
* modular code with most functionality in optional packages
|
* modular code with most functionality in optional packages
|
||||||
* only depends on MPI library for basic parallel functionality
|
* only depends on MPI library for basic parallel functionality, MPI stub for serial compilation
|
||||||
* other libraries are optional and only required for specific packages
|
* other libraries are optional and only required for specific packages
|
||||||
* GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
|
* GPU (CUDA, OpenCL, HIP, SYCL), Intel Xeon Phi, and OpenMP support for many code features
|
||||||
* easy to extend with new features and functionality
|
* easy to extend with new features and functionality
|
||||||
* runs from an input script
|
* runs from an input script
|
||||||
* syntax for defining and using variables and formulas
|
* syntax for defining and using variables and formulas
|
||||||
* syntax for looping over runs and breaking out of loops
|
* syntax for looping over runs and breaking out of loops
|
||||||
* run one or multiple simulations simultaneously (in parallel) from one script
|
* run one or multiple simulations simultaneously (in parallel) from one script
|
||||||
* build as library, invoke LAMMPS through library interface or provided Python wrapper
|
* build as library, invoke LAMMPS through library interface or provided Python wrapper or SWIG based wrappers
|
||||||
* couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both
|
* couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both
|
||||||
|
|
||||||
.. _particle:
|
.. _particle:
|
||||||
@ -57,9 +57,11 @@ Particle and model types
|
|||||||
* granular materials
|
* granular materials
|
||||||
* coarse-grained mesoscale models
|
* coarse-grained mesoscale models
|
||||||
* finite-size spherical and ellipsoidal particles
|
* finite-size spherical and ellipsoidal particles
|
||||||
* finite-size line segment (2d) and triangle (3d) particles
|
* finite-size line segment (2d) and triangle (3d) particles
|
||||||
|
* finite-size rounded polygons (2d) and polyhedra (3d) particles
|
||||||
* point dipole particles
|
* point dipole particles
|
||||||
* rigid collections of particles
|
* particles with magnetic spin
|
||||||
|
* rigid collections of n particles
|
||||||
* hybrid combinations of these
|
* hybrid combinations of these
|
||||||
|
|
||||||
.. _ff:
|
.. _ff:
|
||||||
@ -74,24 +76,28 @@ commands)
|
|||||||
|
|
||||||
* pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
|
* pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
|
||||||
* charged pairwise potentials: Coulombic, point-dipole
|
* charged pairwise potentials: Coulombic, point-dipole
|
||||||
* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
|
* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, Streitz-Mintmire, 3-body polymorphic, BOP, Vashishta
|
||||||
* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
|
* machine learning potentials: SNAP, GAP, ACE, N2P2, RANN, AGNI
|
||||||
|
* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald), MSM
|
||||||
* polarization models: :doc:`QEq <fix_qeq>`, :doc:`core/shell model <Howto_coreshell>`, :doc:`Drude dipole model <Howto_drude>`
|
* polarization models: :doc:`QEq <fix_qeq>`, :doc:`core/shell model <Howto_coreshell>`, :doc:`Drude dipole model <Howto_drude>`
|
||||||
* charge equilibration (QEq via dynamic, point, shielded, Slater methods)
|
* charge equilibration (QEq via dynamic, point, shielded, Slater methods)
|
||||||
* coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
|
* coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
|
||||||
* mesoscopic potentials: granular, Peridynamics, SPH
|
* mesoscopic potentials: granular, Peridynamics, SPH, mesoscopic tubular potential (MESONT)
|
||||||
|
* semi-empirical potentials: multi-ion generalized pseudopotential theory (MGPT), second moment tight binding + QEq (SMTB-Q), density functional tight-binding (LATTE)
|
||||||
* electron force field (eFF, AWPMD)
|
* electron force field (eFF, AWPMD)
|
||||||
* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable)
|
* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable), tabulated
|
||||||
* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS)
|
* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS), tabulated
|
||||||
* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS
|
* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS, tabulated
|
||||||
* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
|
* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS), tabulated
|
||||||
* polymer potentials: all-atom, united-atom, bead-spring, breakable
|
* polymer potentials: all-atom, united-atom, bead-spring, breakable
|
||||||
* water potentials: TIP3P, TIP4P, SPC
|
* water potentials: TIP3P, TIP4P, SPC, SPC/E and variants
|
||||||
|
* interlayer potentials for graphene and analogues
|
||||||
|
* metal-organic framework potentials (QuickFF, MO-FF)
|
||||||
* implicit solvent potentials: hydrodynamic lubrication, Debye
|
* implicit solvent potentials: hydrodynamic lubrication, Debye
|
||||||
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
|
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
|
||||||
* access to the `OpenKIM Repository <http://openkim.org>`_ of potentials via :doc:`kim command <kim_commands>`
|
* access to the `OpenKIM Repository <http://openkim.org>`_ of potentials via :doc:`kim command <kim_commands>`
|
||||||
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
|
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
|
||||||
* overlaid potentials: superposition of multiple pair potentials
|
* overlaid potentials: superposition of multiple pair potentials (including many-body) with optional scale factor
|
||||||
|
|
||||||
.. _create:
|
.. _create:
|
||||||
|
|
||||||
@ -124,9 +130,10 @@ Ensembles, constraints, and boundary conditions
|
|||||||
* harmonic (umbrella) constraint forces
|
* harmonic (umbrella) constraint forces
|
||||||
* rigid body constraints
|
* rigid body constraints
|
||||||
* SHAKE bond and angle constraints
|
* SHAKE bond and angle constraints
|
||||||
* Monte Carlo bond breaking, formation, swapping
|
* motion constraints to manifold surfaces
|
||||||
|
* Monte Carlo bond breaking, formation, swapping, template based reaction modeling
|
||||||
* atom/molecule insertion and deletion
|
* atom/molecule insertion and deletion
|
||||||
* walls of various kinds
|
* walls of various kinds, static and moving
|
||||||
* non-equilibrium molecular dynamics (NEMD)
|
* non-equilibrium molecular dynamics (NEMD)
|
||||||
* variety of additional boundary conditions and constraints
|
* variety of additional boundary conditions and constraints
|
||||||
|
|
||||||
@ -150,6 +157,7 @@ Diagnostics
|
|||||||
^^^^^^^^^^^
|
^^^^^^^^^^^
|
||||||
|
|
||||||
* see various flavors of the :doc:`fix <fix>` and :doc:`compute <compute>` commands
|
* see various flavors of the :doc:`fix <fix>` and :doc:`compute <compute>` commands
|
||||||
|
* introspection command for system, simulation, and compile time settings and configurations
|
||||||
|
|
||||||
.. _output:
|
.. _output:
|
||||||
|
|
||||||
@ -164,8 +172,9 @@ Output
|
|||||||
* parallel I/O of dump and restart files
|
* parallel I/O of dump and restart files
|
||||||
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
|
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
|
||||||
* user-defined system-wide (log file) or per-atom (dump file) calculations
|
* user-defined system-wide (log file) or per-atom (dump file) calculations
|
||||||
* spatial and time averaging of per-atom quantities
|
* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
|
||||||
* time averaging of system-wide quantities
|
* spatial, time, and per-chunk averaging of per-atom quantities
|
||||||
|
* time averaging and histogramming of system-wide quantities
|
||||||
* atom snapshots in native, XYZ, XTC, DCD, CFG formats
|
* atom snapshots in native, XYZ, XTC, DCD, CFG formats
|
||||||
|
|
||||||
.. _replica1:
|
.. _replica1:
|
||||||
@ -178,7 +187,7 @@ Multi-replica models
|
|||||||
* :doc:`parallel replica dynamics <prd>`
|
* :doc:`parallel replica dynamics <prd>`
|
||||||
* :doc:`temperature accelerated dynamics <tad>`
|
* :doc:`temperature accelerated dynamics <tad>`
|
||||||
* :doc:`parallel tempering <temper>`
|
* :doc:`parallel tempering <temper>`
|
||||||
* :doc:`path-integral MD <fix_pimd>`
|
* path-integral MD: :doc:`first variant <fix_pimd>`, :doc:`second variant <fix_ipi>`
|
||||||
* multi-walker collective variables with :doc:`Colvars <fix_colvars>` and :doc:`Plumed <fix_plumed>`
|
* multi-walker collective variables with :doc:`Colvars <fix_colvars>` and :doc:`Plumed <fix_plumed>`
|
||||||
|
|
||||||
.. _prepost:
|
.. _prepost:
|
||||||
@ -210,11 +219,12 @@ page for details.
|
|||||||
These are LAMMPS capabilities which you may not think of as typical
|
These are LAMMPS capabilities which you may not think of as typical
|
||||||
classical MD options:
|
classical MD options:
|
||||||
|
|
||||||
* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`
|
* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`, optionally with recursive bisectioning decomposition
|
||||||
* :doc:`generalized aspherical particles <Howto_body>`
|
* :doc:`generalized aspherical particles <Howto_body>`
|
||||||
* :doc:`stochastic rotation dynamics (SRD) <fix_srd>`
|
* :doc:`stochastic rotation dynamics (SRD) <fix_srd>`
|
||||||
* :doc:`real-time visualization and interactive MD <fix_imd>`
|
* :doc:`real-time visualization and interactive MD <fix_imd>`, :doc:`built-in renderer for images and movies <dump_image>`
|
||||||
* calculate :doc:`virtual diffraction patterns <compute_xrd>`
|
* calculate :doc:`virtual diffraction patterns <compute_xrd>`
|
||||||
|
* calculate :doc:`finite temperature phonon dispersion <fix_phonon>` and the :doc:`dynamical matrix of minimized structures <dynamical_matrix>`
|
||||||
* :doc:`atom-to-continuum coupling <fix_atc>` with finite elements
|
* :doc:`atom-to-continuum coupling <fix_atc>` with finite elements
|
||||||
* coupled rigid body integration via the :doc:`POEMS <fix_poems>` library
|
* coupled rigid body integration via the :doc:`POEMS <fix_poems>` library
|
||||||
* :doc:`QM/MM coupling <fix_qmmm>`
|
* :doc:`QM/MM coupling <fix_qmmm>`
|
||||||
|
|||||||
@ -1,40 +1,61 @@
|
|||||||
LAMMPS open-source license
|
LAMMPS open-source license
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
LAMMPS is a freely-available open-source code, distributed under the
|
GPL version of LAMMPS
|
||||||
terms of the `GNU Public License Version 2 <gpl_>`_, which means you can
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
use or modify the code however you wish for your own purposes, but have
|
|
||||||
to adhere to certain rules when redistributing it or software derived
|
LAMMPS is an open-source code, available free-of-charge, and distributed
|
||||||
|
under the terms of the `GNU Public License Version 2 <gpl_>`_ (GPLv2),
|
||||||
|
which means you can use or modify the code however you wish for your own
|
||||||
|
purposes, but have to adhere to certain rules when redistributing it -
|
||||||
|
specifically in binary form - or are distributing software derived
|
||||||
from it or that includes parts of it.
|
from it or that includes parts of it.
|
||||||
|
|
||||||
LAMMPS comes with no warranty of any kind. As each source file states
|
LAMMPS comes with no warranty of any kind.
|
||||||
in its header, it is a copyrighted code that is distributed free-of-
|
|
||||||
charge, under the terms of the `GNU Public License Version 2 <gpl_>`_
|
As each source file states in its header, it is a copyrighted code, and
|
||||||
(GPLv2). This is often referred to as open-source distribution - see
|
thus not in the public domain. For more information about open-source
|
||||||
`www.gnu.org <gnuorg_>`_ or `www.opensource.org <opensource_>`_. The
|
software and open-source distribution, see `www.gnu.org <gnuorg_>`_
|
||||||
legal text of the GPL is in the LICENSE file included in the LAMMPS
|
or `www.opensource.org <opensource_>`_. The legal text of the GPL as it
|
||||||
distribution.
|
applies to LAMMPS is in the LICENSE file included in the LAMMPS distribution.
|
||||||
|
|
||||||
.. _gpl: https://github.com/lammps/lammps/blob/master/LICENSE
|
.. _gpl: https://github.com/lammps/lammps/blob/master/LICENSE
|
||||||
|
|
||||||
|
.. _lgpl: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
|
||||||
|
|
||||||
.. _gnuorg: http://www.gnu.org
|
.. _gnuorg: http://www.gnu.org
|
||||||
|
|
||||||
.. _opensource: http://www.opensource.org
|
.. _opensource: http://www.opensource.org
|
||||||
|
|
||||||
Here is a summary of what the GPL means for LAMMPS users:
|
Here is a more specific summary of what the GPL means for LAMMPS users:
|
||||||
|
|
||||||
(1) Anyone is free to use, modify, or extend LAMMPS in any way they
|
(1) Anyone is free to use, copy, modify, or extend LAMMPS in any way they
|
||||||
choose, including for commercial purposes.
|
choose, including for commercial purposes.
|
||||||
|
|
||||||
(2) If you **distribute** a modified version of LAMMPS, it must remain
|
(2) If you **distribute** a modified version of LAMMPS, it must remain
|
||||||
open-source, meaning you distribute **all** of it under the terms of
|
open-source, meaning you are required to distribute **all** of it under
|
||||||
the GPL. You should clearly annotate such a code as a derivative version
|
the terms of the GPL. You should clearly annotate such a modified code
|
||||||
of LAMMPS.
|
as a derivative version of LAMMPS.
|
||||||
|
|
||||||
(3) If you release any code that includes or uses LAMMPS source code,
|
(3) If you release any code that includes or uses LAMMPS source code,
|
||||||
then it must also be open-sourced, meaning you distribute it under
|
then it must also be open-sourced, meaning you distribute it under
|
||||||
the terms of the GPL.
|
the terms of the GPL. You may write code that interfaces LAMMPS to
|
||||||
|
a differently licensed library. In that case the code that provides
|
||||||
|
the interface must be licensed GPL, but not necessarily that library
|
||||||
|
unless you are distributing binaries that require the library to run.
|
||||||
|
|
||||||
(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
|
(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
|
||||||
source file headers (including the copyright and GPL notices) should
|
source file headers (including the copyright and GPL notices) should
|
||||||
remain part of the code.
|
remain part of the code.
|
||||||
|
|
||||||
|
|
||||||
|
LGPL version of LAMMPS
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
We occasionally make stable LAMMPS releases available under the `GNU
|
||||||
|
Lesser Public License v2.1 <lgpl_>`_. This is on request only and with
|
||||||
|
non-LGPL compliant files removed. This allows users to link non-GPL
|
||||||
|
compatible software with the (otherwise unmodified) LAMMPS library
|
||||||
|
or loading it dynamically at runtime. Any **modifications** to
|
||||||
|
the LAMMPS code however, even with the LGPL licensed version, must still
|
||||||
|
be made available under the same open source terms as LAMMPS itself.
|
||||||
|
|||||||
@ -10,24 +10,26 @@ conditions. It can model 2d or 3d systems with only a few particles
|
|||||||
up to millions or billions.
|
up to millions or billions.
|
||||||
|
|
||||||
LAMMPS can be built and run on a laptop or desktop machine, but is
|
LAMMPS can be built and run on a laptop or desktop machine, but is
|
||||||
designed for parallel computers. It will run on any parallel machine
|
designed for parallel computers. It will run in serial and on any
|
||||||
that supports the `MPI <mpi_>`_ message-passing library. This includes
|
parallel machine that supports the `MPI <mpi_>`_ message-passing
|
||||||
shared-memory boxes and distributed-memory clusters and
|
library. This includes shared-memory boxes and distributed-memory
|
||||||
supercomputers.
|
clusters and supercomputers. Parts of LAMMPS also support
|
||||||
|
`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.
|
||||||
|
|
||||||
.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
|
.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
|
||||||
.. _lws: https://www.lammps.org
|
.. _lws: https://www.lammps.org
|
||||||
|
.. _omp: https://www.openmp.org
|
||||||
|
|
||||||
LAMMPS is written in C++ and requires a compiler that is at least
|
LAMMPS is written in C++ and requires a compiler that is at least
|
||||||
compatible with the C++-11 standard.
|
compatible with the C++-11 standard. Earlier versions were written in
|
||||||
Earlier versions were written in F77 and F90. See the `History page
|
F77, F90, and C++-98. See the `History page
|
||||||
<https://www.lammps.org/history.html>`_ of the website for details. All
|
<https://www.lammps.org/history.html>`_ of the website for details. All
|
||||||
versions can be downloaded from the `LAMMPS website <lws_>`_.
|
versions can be downloaded as source code from the `LAMMPS website
|
||||||
|
<lws_>`_.
|
||||||
|
|
||||||
LAMMPS is designed to be easy to modify or extend with new
|
LAMMPS is designed to be easy to modify or extend with new capabilities,
|
||||||
capabilities, such as new force fields, atom types, boundary
|
such as new force fields, atom types, boundary conditions, or
|
||||||
conditions, or diagnostics. See the :doc:`Modify <Modify>` page for
|
diagnostics. See the :doc:`Modify <Modify>` page for more details.
|
||||||
more details.
|
|
||||||
|
|
||||||
In the most general sense, LAMMPS integrates Newton's equations of
|
In the most general sense, LAMMPS integrates Newton's equations of
|
||||||
motion for a collection of interacting particles. A single particle
|
motion for a collection of interacting particles. A single particle
|
||||||
@ -47,4 +49,5 @@ MPI parallelization to partition the simulation domain into small
|
|||||||
sub-domains of equal computational cost, one of which is assigned to
|
sub-domains of equal computational cost, one of which is assigned to
|
||||||
each processor. Processors communicate and store "ghost" atom
|
each processor. Processors communicate and store "ghost" atom
|
||||||
information for atoms that border their sub-domain. Multi-threading
|
information for atoms that border their sub-domain. Multi-threading
|
||||||
parallelization with particle-decomposition can be used in addition.
|
parallelization and GPU acceleration with particle-decomposition
|
||||||
|
can be used in addition.
|
||||||
|
|||||||
@ -2,12 +2,21 @@ What does a LAMMPS version mean
|
|||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
The LAMMPS "version" is the date when it was released, such as 1 May
|
The LAMMPS "version" is the date when it was released, such as 1 May
|
||||||
2014. LAMMPS is updated continuously. Whenever we fix a bug or add a
|
2014. LAMMPS is updated continuously and we aim to keep it working
|
||||||
feature, we release it in the next *patch* release, which are
|
correctly and reliably at all times. You can follow its development
|
||||||
typically made every couple of weeks. Info on patch releases are on
|
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
||||||
`this website page <https://www.lammps.org/bug.html>`_. Every few
|
|
||||||
months, the latest patch release is subjected to more thorough testing
|
Whenever we fix a bug or update or add a feature, it will be merged into
|
||||||
and labeled as a *stable* version.
|
the ``master`` branch of the git repository. When a sufficient number of
|
||||||
|
changes have accumulated *and* the software passes a set of automated
|
||||||
|
tests, we release it in the next *patch* release, which are made every
|
||||||
|
few weeks. Information on patch releases is on `this website page
|
||||||
|
<https://www.lammps.org/bug.html>`_.
|
||||||
|
|
||||||
|
Once or twice a year, only bug fixes and small, non-intrusive changes are
|
||||||
|
included for a period of time, and the code is subjected to more detailed
|
||||||
|
and thorough testing than the default automated testing. The latest
|
||||||
|
patch release after such a period is then labeled as a *stable* version.
|
||||||
|
|
||||||
Each version of LAMMPS contains all the features and bug-fixes up to
|
Each version of LAMMPS contains all the features and bug-fixes up to
|
||||||
and including its version date.
|
and including its version date.
|
||||||
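Because a version is just a release date, checking whether a given version already contains a particular change reduces to a date comparison. The sketch below illustrates that rule only; the function names ``parse_version`` and ``includes_changes_from`` are hypothetical, it assumes the day, month name, and year are separated by spaces as in "1 May 2014", and the second date in the usage note is made up for the example.

```python
from datetime import date

# English month names keyed by their first three letters, so that both
# "Sep" and "September" are accepted independent of the system locale.
_MONTHS = {name: num for num, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_version(text):
    """Convert a version string like '1 May 2014' into a datetime.date."""
    day, month, year = text.split()
    return date(int(year), _MONTHS[month[:3].lower()], int(day))


def includes_changes_from(version, change_date):
    """True if a change made on change_date is part of the given version."""
    # A version contains everything up to and including its own date.
    return parse_version(change_date) <= parse_version(version)
```

For instance, ``includes_changes_from("1 May 2014", "1 May 2014")`` is true, while a change dated "2 May 2014" would not be part of the "1 May 2014" version.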
|
|||||||
@ -19,7 +19,7 @@ Syntax
|
|||||||
|
|
||||||
bondmax = length of longest bond in the system (in length units)
|
bondmax = length of longest bond in the system (in length units)
|
||||||
tlimit = elapsed CPU time (in seconds)
|
tlimit = elapsed CPU time (in seconds)
|
||||||
diskfree = free disk space (in megabytes)
|
diskfree = free disk space (in MBytes)
|
||||||
v_name = name of :doc:`equal-style variable <variable>`
|
v_name = name of :doc:`equal-style variable <variable>`
|
||||||
|
|
||||||
* operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "\|\^"
|
* operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "\|\^"
|
||||||
@ -81,7 +81,7 @@ the timer frequently across a large number of processors may be
|
|||||||
non-negligible.
|
non-negligible.
|
||||||
|
|
||||||
The *diskfree* attribute will check for available disk space (in
|
The *diskfree* attribute will check for available disk space (in
|
||||||
megabytes) on supported operating systems. By default it will
|
MBytes) on supported operating systems. By default it will
|
||||||
check the file system of the current working directory. This
|
check the file system of the current working directory. This
|
||||||
can be changed with the optional *path* keyword, which will take
|
can be changed with the optional *path* keyword, which will take
|
||||||
the path to a file or folder on the file system to be checked
|
the path to a file or folder on the file system to be checked
|
||||||
|
|||||||
@ -128,7 +128,7 @@ spectrum while consumes more memory. With fixed *f_max* and
|
|||||||
:math:`\gamma`, *N_f* should be big enough to converge the classical
|
:math:`\gamma`, *N_f* should be big enough to converge the classical
|
||||||
temperature :math:`T^{cl}` as a function of target quantum bath
|
temperature :math:`T^{cl}` as a function of target quantum bath
|
||||||
temperature. Memory usage per processor could be from 10 to 100
|
temperature. Memory usage per processor could be from 10 to 100
|
||||||
Mbytes.
|
MBytes.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
|
|||||||
@ -135,7 +135,7 @@ with #) anywhere. Each non-blank non-comment line must contain one
|
|||||||
keyword/value pair. The required keywords are *rcutfac* and
|
keyword/value pair. The required keywords are *rcutfac* and
|
||||||
*twojmax*\ . Optional keywords are *rfac0*, *rmin0*,
|
*twojmax*\ . Optional keywords are *rfac0*, *rmin0*,
|
||||||
*switchflag*, *bzeroflag*, *quadraticflag*, *chemflag*,
|
*switchflag*, *bzeroflag*, *quadraticflag*, *chemflag*,
|
||||||
*bnormflag*, *wselfallflag*, and *chunksize*\ .
|
*bnormflag*, *wselfallflag*, *chunksize*, and *parallelthresh*\ .
|
||||||
|
|
||||||
The default values for these keywords are
|
The default values for these keywords are
|
||||||
|
|
||||||
@ -147,7 +147,8 @@ The default values for these keywords are
|
|||||||
* *chemflag* = 0
|
* *chemflag* = 0
|
||||||
* *bnormflag* = 0
|
* *bnormflag* = 0
|
||||||
* *wselfallflag* = 0
|
* *wselfallflag* = 0
|
||||||
* *chunksize* = 4096
|
* *chunksize* = 32768
|
||||||
|
* *parallelthresh* = 8192
|
||||||
|
|
||||||
If *quadraticflag* is set to 1, then the SNAP energy expression includes
|
If *quadraticflag* is set to 1, then the SNAP energy expression includes
|
||||||
additional quadratic terms that have been shown to increase the overall
|
additional quadratic terms that have been shown to increase the overall
|
||||||
@ -188,14 +189,24 @@ corresponding *K*-vector of linear coefficients for element
|
|||||||
which must equal the number of unique elements appearing in the LAMMPS
|
which must equal the number of unique elements appearing in the LAMMPS
|
||||||
pair_coeff command, to avoid ambiguity in the number of coefficients.
|
pair_coeff command, to avoid ambiguity in the number of coefficients.
|
||||||
|
|
||||||
The keyword *chunksize* is only applicable when using the
|
The keywords *chunksize* and *parallelthresh* are only applicable when
|
||||||
pair style *snap* with the KOKKOS package and is ignored otherwise.
|
using the pair style *snap* with the KOKKOS package on GPUs and are
|
||||||
This keyword controls
|
ignored otherwise.
|
||||||
|
The *chunksize* keyword controls
|
||||||
the number of atoms in each pass used to compute the bispectrum
|
the number of atoms in each pass used to compute the bispectrum
|
||||||
components and is used to avoid running out of memory. For example
|
components and is used to avoid running out of memory. For example
|
||||||
if there are 8192 atoms in the simulation and the *chunksize*
|
if there are 8192 atoms in the simulation and the *chunksize*
|
||||||
is set to 4096, the bispectrum calculation will be broken up
|
is set to 4096, the bispectrum calculation will be broken up
|
||||||
into two passes.
|
into two passes (running on a single GPU).
|
||||||
|
The *parallelthresh* keyword controls
|
||||||
|
a crossover threshold for performing extra parallelism. For
|
||||||
|
small systems, exposing additional parallelism can be beneficial when
|
||||||
|
there is not enough work to fully saturate the GPU threads otherwise.
|
||||||
|
However, the extra parallelism also leads to more divergence
|
||||||
|
and can hurt performance when the system is already large enough to
|
||||||
|
saturate the GPU threads. Extra parallelism will be performed if the
|
||||||
|
*chunksize* (or total number of atoms per GPU) is smaller than
|
||||||
|
*parallelthresh*.
|
||||||
|
|
||||||
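The interplay of *chunksize* and *parallelthresh* described above can be sketched in a few lines of standalone C++. This is only an illustration of the documented rule, not the code used by the KOKKOS package; the helper names `num_passes`, `effective_chunk`, and `use_extra_parallelism` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: number of passes needed to process `natoms` atoms
// when at most `chunksize` atoms are handled per pass (ceiling division).
inline int num_passes(int natoms, int chunksize) {
  assert(natoms >= 0 && chunksize > 0);
  return (natoms + chunksize - 1) / chunksize;
}

// Hypothetical helper: atoms handled in one pass, i.e. the smaller of the
// chunk size and the number of atoms on this GPU.
inline int effective_chunk(int natoms, int chunksize) {
  return natoms < chunksize ? natoms : chunksize;
}

// Documented rule: extra parallelism is used when the chunk size
// (or the total number of atoms per GPU) is smaller than parallelthresh.
inline bool use_extra_parallelism(int natoms, int chunksize, int parallelthresh) {
  return effective_chunk(natoms, chunksize) < parallelthresh;
}
```

With the example from the text, `num_passes(8192, 4096)` gives two passes. With the new defaults (*chunksize* = 32768, *parallelthresh* = 8192), a 4000-atom system would take the extra-parallelism path under this sketch, while a 100000-atom system would not.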
Detailed definitions for all the other keywords
|
Detailed definitions for all the other keywords
|
||||||
are given on the :doc:`compute sna/atom <compute_sna_atom>` doc page.
|
are given on the :doc:`compute sna/atom <compute_sna_atom>` doc page.
|
||||||
|
|||||||
@ -1174,6 +1174,7 @@ googletest
|
|||||||
Gordan
|
Gordan
|
||||||
Goudeau
|
Goudeau
|
||||||
GPa
|
GPa
|
||||||
|
GPL
|
||||||
gpu
|
gpu
|
||||||
gpuID
|
gpuID
|
||||||
gpus
|
gpus
|
||||||
@ -1689,6 +1690,7 @@ Lett
|
|||||||
Leuven
|
Leuven
|
||||||
Leven
|
Leven
|
||||||
Lewy
|
Lewy
|
||||||
|
LGPL
|
||||||
lgvdw
|
lgvdw
|
||||||
Liang
|
Liang
|
||||||
libatc
|
libatc
|
||||||
@ -1889,7 +1891,6 @@ maxX
|
|||||||
Mayergoyz
|
Mayergoyz
|
||||||
Mayoral
|
Mayoral
|
||||||
mbt
|
mbt
|
||||||
Mbytes
|
|
||||||
MBytes
|
MBytes
|
||||||
mc
|
mc
|
||||||
McLachlan
|
McLachlan
|
||||||
|
|||||||
@ -44,7 +44,8 @@ struct TagPairSNAPComputeForce{};
|
|||||||
struct TagPairSNAPComputeNeigh{};
|
struct TagPairSNAPComputeNeigh{};
|
||||||
struct TagPairSNAPComputeCayleyKlein{};
|
struct TagPairSNAPComputeCayleyKlein{};
|
||||||
struct TagPairSNAPPreUi{};
|
struct TagPairSNAPPreUi{};
|
||||||
struct TagPairSNAPComputeUi{};
|
struct TagPairSNAPComputeUiSmall{}; // more parallelism, more divergence
|
||||||
|
struct TagPairSNAPComputeUiLarge{}; // less parallelism, no divergence
|
||||||
struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
|
struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
|
||||||
struct TagPairSNAPComputeZi{};
|
struct TagPairSNAPComputeZi{};
|
||||||
struct TagPairSNAPBeta{};
|
struct TagPairSNAPBeta{};
|
||||||
@ -53,7 +54,9 @@ struct TagPairSNAPTransformBi{}; // re-order blist from AoSoA to AoS
|
|||||||
struct TagPairSNAPComputeYi{};
|
struct TagPairSNAPComputeYi{};
|
||||||
struct TagPairSNAPComputeYiWithZlist{};
|
struct TagPairSNAPComputeYiWithZlist{};
|
||||||
template<int dir>
|
template<int dir>
|
||||||
struct TagPairSNAPComputeFusedDeidrj{};
|
struct TagPairSNAPComputeFusedDeidrjSmall{}; // more parallelism, more divergence
|
||||||
|
template<int dir>
|
||||||
|
struct TagPairSNAPComputeFusedDeidrjLarge{}; // less parallelism, no divergence
|
||||||
|
|
||||||
// CPU backend only
|
// CPU backend only
|
||||||
struct TagPairSNAPComputeNeighCPU{};
|
struct TagPairSNAPComputeNeighCPU{};
|
||||||
@ -143,7 +146,10 @@ public:
|
|||||||
void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
|
void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUi>::member_type& team) const;
|
void operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiSmall>::member_type& team) const;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiLarge>::member_type& team) const;
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
|
void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
|
||||||
@ -168,7 +174,11 @@ public:
|
|||||||
|
|
||||||
template<int dir>
|
template<int dir>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const;
|
void operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const;
|
||||||
|
|
||||||
|
template<int dir>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const;
|
||||||
|
|
||||||
// CPU backend only
|
// CPU backend only
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
|||||||
@ -341,18 +341,32 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
|
|||||||
// ComputeUi w/vector parallelism, shared memory, direct atomicAdd into ulisttot
|
// ComputeUi w/vector parallelism, shared memory, direct atomicAdd into ulisttot
|
||||||
{
|
{
|
||||||
// team_size_compute_ui is defined in `pair_snap_kokkos.h`
|
// team_size_compute_ui is defined in `pair_snap_kokkos.h`
|
||||||
|
|
||||||
// scratch size: 32 atoms * (twojmax+1) cached values, no double buffer
|
// scratch size: 32 atoms * (twojmax+1) cached values, no double buffer
|
||||||
const int tile_size = vector_length * (twojmax + 1);
|
const int tile_size = vector_length * (twojmax + 1);
|
||||||
const int scratch_size = scratch_size_helper<complex>(team_size_compute_ui * tile_size);
|
const int scratch_size = scratch_size_helper<complex>(team_size_compute_ui * tile_size);
|
||||||
|
|
||||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
if (chunk_size < parallel_thresh)
|
||||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
{
|
||||||
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
// Version with parallelism over j_bend
|
||||||
|
|
||||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUi> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||||
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||||
Kokkos::parallel_for("ComputeUi",policy_ui,*this);
|
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
||||||
|
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiSmall> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
||||||
|
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeUiSmall",policy_ui,*this);
|
||||||
|
} else {
|
||||||
|
// Version w/out parallelism over j_bend
|
||||||
|
|
||||||
|
// total number of teams needed: (natoms / 32) * (max_neighs)
|
||||||
|
const int n_teams = chunk_size_div * max_neighs;
|
||||||
|
const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
|
||||||
|
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiLarge> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
|
||||||
|
policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeUiLarge",policy_ui,*this);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
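As a rough illustration of the branch above, the team counts for the two paths differ only by the factor `(twojmax + 1)` for the "bend" locations. The following standalone sketch reproduces just that arithmetic. `TeamCounts` and `count_teams` are hypothetical names, and `team_size` stands in for `team_size_compute_ui`, whose actual value is not shown in this diff.

```cpp
#include <cassert>

// Hypothetical container for the two quantities computed in each branch.
struct TeamCounts {
  int n_teams;      // total work items before grouping into teams
  int n_teams_div;  // league size: n_teams rounded up to whole teams
};

// small_path == true  : parallelism over j_bend (chunk_size < parallel_thresh)
// small_path == false : no parallelism over j_bend
inline TeamCounts count_teams(int chunk_size_div, int max_neighs, int twojmax,
                              int team_size, bool small_path) {
  assert(team_size > 0);
  const int bend = small_path ? (twojmax + 1) : 1;
  const int n_teams = chunk_size_div * max_neighs * bend;
  const int n_teams_div = (n_teams + team_size - 1) / team_size;  // ceiling division
  return {n_teams, n_teams_div};
}
```

For example, with `chunk_size_div = 4`, `max_neighs = 10`, `twojmax = 8`, and an assumed team size of 4, the small path yields 360 work items in 90 teams, while the large path yields 40 work items in 10 teams.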
|
|
||||||
//TransformUi: un-"fold" ulisttot, zero ylist
|
//TransformUi: un-"fold" ulisttot, zero ylist
|
||||||
@ -412,25 +426,51 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
|
|||||||
const int tile_size = vector_length * (twojmax + 1);
|
const int tile_size = vector_length * (twojmax + 1);
|
||||||
const int scratch_size = scratch_size_helper<complex>(2 * team_size_compute_fused_deidrj * tile_size);
|
const int scratch_size = scratch_size_helper<complex>(2 * team_size_compute_fused_deidrj * tile_size);
|
||||||
|
|
||||||
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
if (chunk_size < parallel_thresh)
|
||||||
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
{
|
||||||
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
// Version with parallelism over j_bend
|
||||||
|
|
||||||
// x direction
|
// total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
|
||||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
|
||||||
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
||||||
Kokkos::parallel_for("ComputeFusedDeidrj<0>",policy_fused_deidrj_x,*this);
|
|
||||||
|
|
||||||
// y direction
|
// x direction
|
||||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
Kokkos::parallel_for("ComputeFusedDeidrj<1>",policy_fused_deidrj_y,*this);
|
Kokkos::parallel_for("ComputeFusedDeidrjSmall<0>",policy_fused_deidrj_x,*this);
|
||||||
|
|
||||||
// z direction
|
// y direction
|
||||||
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
Kokkos::parallel_for("ComputeFusedDeidrj<2>",policy_fused_deidrj_z,*this);
|
Kokkos::parallel_for("ComputeFusedDeidrjSmall<1>",policy_fused_deidrj_y,*this);
|
||||||
|
|
||||||
|
// z direction
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
|
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeFusedDeidrjSmall<2>",policy_fused_deidrj_z,*this);
|
||||||
|
} else {
|
||||||
|
// Version w/out parallelism over j_bend
|
||||||
|
|
||||||
|
// total number of teams needed: (natoms / 32) * (max_neighs)
|
||||||
|
const int n_teams = chunk_size_div * max_neighs;
|
||||||
|
const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
|
||||||
|
|
||||||
|
// x direction
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
|
policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeFusedDeidrjLarge<0>",policy_fused_deidrj_x,*this);
|
||||||
|
|
||||||
|
// y direction
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
|
policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeFusedDeidrjLarge<1>",policy_fused_deidrj_y,*this);
|
||||||
|
|
||||||
|
// z direction
|
||||||
|
SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
|
||||||
|
policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
|
||||||
|
Kokkos::parallel_for("ComputeFusedDeidrjLarge<2>",policy_fused_deidrj_z,*this);
|
||||||
|
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif // LMP_KOKKOS_GPU
|
#endif // LMP_KOKKOS_GPU
|
||||||
@ -603,13 +643,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||||
const auto idxb = icoeff % idxb_max;
|
const auto idxb = icoeff % idxb_max;
|
||||||
const auto idx_chem = icoeff / idxb_max;
|
const auto idx_chem = icoeff / idxb_max;
|
||||||
auto bveci = my_sna.blist(idxb, idx_chem, ii);
|
real_type bveci = my_sna.blist(ii, idx_chem, idxb);
|
||||||
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bveci;
|
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bveci;
|
||||||
k++;
|
k++;
|
||||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||||
const auto jdxb = jcoeff % idxb_max;
|
const auto jdxb = jcoeff % idxb_max;
|
||||||
const auto jdx_chem = jcoeff / idxb_max;
|
const auto jdx_chem = jcoeff / idxb_max;
|
||||||
real_type bvecj = my_sna.blist(jdxb, jdx_chem, ii);
|
real_type bvecj = my_sna.blist(ii, jdx_chem, jdxb);
|
||||||
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bvecj;
|
d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bvecj;
|
||||||
d_beta_pack(iatom_mod,jcoeff,iatom_div) += d_coeffi[k]*bveci;
|
d_beta_pack(iatom_mod,jcoeff,iatom_div) += d_coeffi[k]*bveci;
|
||||||
k++;
|
k++;
|
||||||
@ -736,7 +776,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
|
|
||||||
template<class DeviceType, typename real_type, int vector_length>
|
template<class DeviceType, typename real_type, int vector_length>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUi>::member_type& team) const {
|
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiSmall>::member_type& team) const {
|
||||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||||
|
|
||||||
// extract flattened atom_div / neighbor number / bend location
|
// extract flattened atom_div / neighbor number / bend location
|
||||||
@ -756,11 +796,37 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
const int ninside = d_ninside(ii);
|
const int ninside = d_ninside(ii);
|
||||||
if (jj >= ninside) return;
|
if (jj >= ninside) return;
|
||||||
|
|
||||||
my_sna.compute_ui(team,iatom_mod, jbend, jj, iatom_div);
|
my_sna.compute_ui_small(team, iatom_mod, jbend, jj, iatom_div);
|
||||||
});
|
});
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template<class DeviceType, typename real_type, int vector_length>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiLarge>::member_type& team) const {
|
||||||
|
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||||
|
|
||||||
|
// extract flattened atom_div / neighbor number / bend location
|
||||||
|
int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_ui;
|
||||||
|
|
||||||
|
// extract neighbor index, iatom_div
|
||||||
|
int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
|
||||||
|
int jj = flattened_idx - iatom_div * max_neighs;
|
||||||
|
|
||||||
|
Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
|
||||||
|
[&] (const int iatom_mod) {
|
||||||
|
const int ii = iatom_mod + vector_length * iatom_div;
|
||||||
|
if (ii >= chunk_size) return;
|
||||||
|
|
||||||
|
const int ninside = d_ninside(ii);
|
||||||
|
if (jj >= ninside) return;
|
||||||
|
|
||||||
|
my_sna.compute_ui_large(team,iatom_mod, jj, iatom_div);
|
||||||
|
});
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
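The index arithmetic in the `TagPairSNAPComputeUiLarge` operator can be checked in isolation. The sketch below mirrors how the flattened index is split into `iatom_div` and `jj`, and how the atom index `ii` is rebuilt from `iatom_mod`. The struct `LargeIndex` and the function names are hypothetical, and no Kokkos types are used.

```cpp
#include <cassert>

// Hypothetical container for the decoded indices.
struct LargeIndex {
  int iatom_div;  // which block of vector_length atoms
  int jj;         // neighbor index within that block
};

// Mirrors: flattened_idx = team_rank + league_rank * team_size,
// then a divide/remainder split by max_neighs.
inline LargeIndex decode_large(int team_rank, int league_rank,
                               int team_size, int max_neighs) {
  assert(max_neighs > 0);
  const int flattened_idx = team_rank + league_rank * team_size;
  const int iatom_div = flattened_idx / max_neighs;
  const int jj = flattened_idx - iatom_div * max_neighs;
  return {iatom_div, jj};
}

// Mirrors: ii = iatom_mod + vector_length * iatom_div
inline int atom_index(int iatom_mod, int iatom_div, int vector_length) {
  return iatom_mod + vector_length * iatom_div;
}
```

Compared with the small variant, there is no third `jbend` component to extract here, which is why `compute_ui_large` takes one fewer index argument.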
template<class DeviceType, typename real_type, int vector_length>
|
template<class DeviceType, typename real_type, int vector_length>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPTransformUi,const int iatom_mod, const int idxu, const int iatom_div) const {
|
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPTransformUi,const int iatom_mod, const int idxu, const int iatom_div) const {
|
||||||
@ -861,9 +927,9 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
|
|
||||||
for (int itriple = 0; itriple < ntriples; itriple++) {
|
for (int itriple = 0; itriple < ntriples; itriple++) {
|
||||||
|
|
||||||
const auto blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
|
const real_type blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
|
||||||
|
|
||||||
my_sna.blist(idxb, itriple, iatom) = blocal;
|
my_sna.blist(iatom, itriple, idxb) = blocal;
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
@ -871,7 +937,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
template<class DeviceType, typename real_type, int vector_length>
|
template<class DeviceType, typename real_type, int vector_length>
|
||||||
template<int dir>
|
template<int dir>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const {
|
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const {
|
||||||
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||||
|
|
||||||
// extract flattened atom_div / neighbor number / bend location
|
// extract flattened atom_div / neighbor number / bend location
|
||||||
@ -891,12 +957,38 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
const int ninside = d_ninside(ii);
|
const int ninside = d_ninside(ii);
|
||||||
if (jj >= ninside) return;
|
if (jj >= ninside) return;
|
||||||
|
|
||||||
my_sna.template compute_fused_deidrj<dir>(team, iatom_mod, jbend, jj, iatom_div);
|
my_sna.template compute_fused_deidrj_small<dir>(team, iatom_mod, jbend, jj, iatom_div);
|
||||||
|
|
||||||
});
|
});
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template<class DeviceType, typename real_type, int vector_length>
|
||||||
|
template<int dir>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const {
|
||||||
|
SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
|
||||||
|
|
||||||
|
// extract flattened atom_div / neighbor number / bend location
|
||||||
|
int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_fused_deidrj;
|
||||||
|
|
||||||
|
// extract neighbor index, iatom_div
|
||||||
|
int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
|
||||||
|
int jj = flattened_idx - max_neighs * iatom_div;
|
||||||
|
|
||||||
|
Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
|
||||||
|
[&] (const int iatom_mod) {
|
||||||
|
const int ii = iatom_mod + vector_length * iatom_div;
|
||||||
|
if (ii >= chunk_size) return;
|
||||||
|
|
||||||
|
const int ninside = d_ninside(ii);
|
||||||
|
if (jj >= ninside) return;
|
||||||
|
|
||||||
|
my_sna.template compute_fused_deidrj_large<dir>(team, iatom_mod, jj, iatom_div);
|
||||||
|
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
/* ----------------------------------------------------------------------
|
/* ----------------------------------------------------------------------
|
||||||
Begin routines that are unique to the CPU codepath. These do not take
|
Begin routines that are unique to the CPU codepath. These do not take
|
||||||
advantage of AoSoA data layouts, but that could be a good point of
|
advantage of AoSoA data layouts, but that could be a good point of
|
||||||
@ -925,13 +1017,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||||
const auto idxb = icoeff % idxb_max;
|
const auto idxb = icoeff % idxb_max;
|
||||||
const auto idx_chem = icoeff / idxb_max;
|
const auto idx_chem = icoeff / idxb_max;
|
||||||
auto bveci = my_sna.blist(idxb,idx_chem,ii);
|
real_type bveci = my_sna.blist(ii,idx_chem,idxb);
|
||||||
d_beta(icoeff,ii) += d_coeffi[k]*bveci;
|
d_beta(icoeff,ii) += d_coeffi[k]*bveci;
|
||||||
k++;
|
k++;
|
||||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||||
const auto jdxb = jcoeff % idxb_max;
|
const auto jdxb = jcoeff % idxb_max;
|
||||||
const auto jdx_chem = jcoeff / idxb_max;
|
const auto jdx_chem = jcoeff / idxb_max;
|
||||||
auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
|
real_type bvecj = my_sna.blist(ii,jdx_chem,jdxb);
|
||||||
d_beta(icoeff,ii) += d_coeffi[k]*bvecj;
|
d_beta(icoeff,ii) += d_coeffi[k]*bvecj;
|
||||||
d_beta(jcoeff,ii) += d_coeffi[k]*bveci;
|
d_beta(jcoeff,ii) += d_coeffi[k]*bveci;
|
||||||
k++;
|
k++;
|
||||||
@ -1221,7 +1313,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||||
const auto idxb = icoeff % idxb_max;
|
const auto idxb = icoeff % idxb_max;
|
||||||
const auto idx_chem = icoeff / idxb_max;
|
const auto idx_chem = icoeff / idxb_max;
|
||||||
evdwl += d_coeffi[icoeff+1]*my_sna.blist(idxb,idx_chem,ii);
|
evdwl += d_coeffi[icoeff+1]*my_sna.blist(ii,idx_chem,idxb);
|
||||||
}
|
}
|
||||||
|
|
||||||
// quadratic contributions
|
// quadratic contributions
|
||||||
@ -1230,12 +1322,12 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
|
|||||||
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
|
||||||
const auto idxb = icoeff % idxb_max;
|
const auto idxb = icoeff % idxb_max;
|
||||||
const auto idx_chem = icoeff / idxb_max;
|
const auto idx_chem = icoeff / idxb_max;
|
||||||
auto bveci = my_sna.blist(idxb,idx_chem,ii);
|
real_type bveci = my_sna.blist(ii,idx_chem,idxb);
|
||||||
evdwl += 0.5*d_coeffi[k++]*bveci*bveci;
|
evdwl += 0.5*d_coeffi[k++]*bveci*bveci;
|
||||||
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
|
||||||
auto jdxb = jcoeff % idxb_max;
|
auto jdxb = jcoeff % idxb_max;
|
||||||
auto jdx_chem = jcoeff / idxb_max;
|
auto jdx_chem = jcoeff / idxb_max;
|
||||||
auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
|
auto bvecj = my_sna.blist(ii,jdx_chem,jdxb);
|
||||||
evdwl += d_coeffi[k++]*bveci*bvecj;
|
evdwl += d_coeffi[k++]*bveci*bvecj;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
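For readers following the coefficient indexing in the linear and quadratic loops above, here is a minimal host-only sketch of the same accumulation on plain `std::vector` data. It assumes that `coeff[0]` is a constant offset and that the quadratic coefficients start at index `ncoeff + 1`; neither is visible in this hunk, so treat both as assumptions. The function name `snap_energy_sketch` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// b     : bispectrum components of one atom (length ncoeff)
// coeff : [0] constant (assumed), [1..ncoeff] linear terms,
//         then the upper triangle of the quadratic terms, row by row
inline double snap_energy_sketch(const std::vector<double>& coeff,
                                 const std::vector<double>& b,
                                 bool quadratic) {
  const int ncoeff = static_cast<int>(b.size());
  const int needed = 1 + ncoeff + (quadratic ? ncoeff * (ncoeff + 1) / 2 : 0);
  assert(static_cast<int>(coeff.size()) >= needed);

  double e = coeff[0];
  // linear contributions
  for (int i = 0; i < ncoeff; i++) e += coeff[i + 1] * b[i];

  // quadratic contributions: diagonal terms carry a factor 0.5
  if (quadratic) {
    int k = ncoeff + 1;  // assumed start of the quadratic block
    for (int i = 0; i < ncoeff; i++) {
      e += 0.5 * coeff[k++] * b[i] * b[i];
      for (int j = i + 1; j < ncoeff; j++) e += coeff[k++] * b[i] * b[j];
    }
  }
  return e;
}
```

The index swap in the diff, from `blist(idxb,idx_chem,ii)` to `blist(ii,idx_chem,idxb)`, changes only where each component is read from, not this arithmetic.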
|
|||||||
@ -45,12 +45,12 @@ struct WignerWrapper {
|
|||||||
{ ; }
|
{ ; }
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
complex get(const int& ma) {
|
complex get(const int& ma) const {
|
||||||
return complex(buffer[offset + 2 * vector_length * ma], buffer[offset + vector_length + 2 * vector_length * ma]);
|
return complex(buffer[offset + 2 * vector_length * ma], buffer[offset + vector_length + 2 * vector_length * ma]);
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void set(const int& ma, const complex& store) {
|
void set(const int& ma, const complex& store) const {
|
||||||
buffer[offset + 2 * vector_length * ma] = store.re;
|
buffer[offset + 2 * vector_length * ma] = store.re;
|
||||||
buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
|
buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
|
||||||
}
|
}
|
||||||
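The `const` qualifiers added to `get` and `set` are valid because both methods only read the wrapper's own members and write through a pointer to external storage. The standalone sketch below imitates the indexing shown in the diff with hypothetical types (`Complex`, `WignerBufferSketch`); it is not the actual `WignerWrapper`, whose constructor and members are not visible here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the complex type used in the diff.
struct Complex {
  double re, im;
};

// Hypothetical stand-in for WignerWrapper: real and imaginary parts of
// entry `ma` are stored vector_length elements apart, and consecutive
// entries are 2 * vector_length elements apart.
struct WignerBufferSketch {
  double* buffer;     // external storage, not owned
  int offset;         // lane of this atom within the vector
  int vector_length;  // number of atoms interleaved per entry

  Complex get(int ma) const {
    return {buffer[offset + 2 * vector_length * ma],
            buffer[offset + vector_length + 2 * vector_length * ma]};
  }

  // const is fine: the pointee changes, the wrapper's members do not
  void set(int ma, const Complex& store) const {
    buffer[offset + 2 * vector_length * ma] = store.re;
    buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
  }
};
```

With `vector_length = 4` and `offset = 1`, entry `ma = 2` would land at buffer positions 17 (real part) and 21 (imaginary part) under this layout.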
@ -122,8 +122,14 @@ inline
|
|||||||
void compute_cayley_klein(const int&, const int&, const int&);
|
void compute_cayley_klein(const int&, const int&, const int&);
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
|
void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
|
||||||
|
|
||||||
|
// version of the code with parallelism over j_bend
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_ui(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
|
void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
|
||||||
|
// version of the code without parallelism over j_bend
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); // ForceSNAP
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_zi(const int&, const int&, const int&); // ForceSNAP
|
void compute_zi(const int&, const int&, const int&); // ForceSNAP
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
@ -135,6 +141,35 @@ inline
|
|||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_bi(const int&, const int&, const int&); // ForceSNAP
|
void compute_bi(const int&, const int&, const int&); // ForceSNAP
|
||||||
|
|
||||||
|
// functions for derivatives, GPU only
|
||||||
|
// version of the code with parallelism over j_bend
|
||||||
|
template<int dir>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
|
||||||
|
// version of the code without parallelism over j_bend
|
||||||
|
template<int dir>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); //ForceSNAP
|
||||||
|
|
||||||
|
// core "evaluation" functions that get plugged into "compute" functions
|
||||||
|
// plugged into compute_ui_small, compute_ui_large
|
||||||
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
|
void evaluate_ui_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&, const int&,
|
||||||
|
const int&, const int&, const int&);
|
||||||
|
// plugged into compute_zi, compute_yi
|
||||||
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
|
complex evaluate_zi(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
|
||||||
|
const int&, const int&, const int&, const int&, const real_type*);
|
||||||
|
// plugged into compute_yi, compute_yi_with_zlist
|
||||||
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
|
real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
|
||||||
|
const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &);
|
||||||
|
// plugged into compute_fused_deidrj_small, compute_fused_deidrj_large
|
||||||
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
|
real_type evaluate_duidrj_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
|
||||||
|
const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
|
||||||
|
const int&, const int&, const int&, const int&);
|
||||||
|
|
||||||
// functions for bispectrum coefficients, CPU only
|
// functions for bispectrum coefficients, CPU only
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
|
void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
|
||||||
@ -148,11 +183,6 @@ inline
|
|||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int); // ForceSNAP
|
void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int); // ForceSNAP
|
||||||
|
|
||||||
// functions for derivatives, GPU only
|
|
||||||
template<int dir>
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void compute_fused_deidrj(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
|
|
||||||
|
|
||||||
// functions for derivatives, CPU only
|
// functions for derivatives, CPU only
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
|
void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
|
||||||
@ -168,23 +198,6 @@ inline
|
|||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void compute_s_dsfac(const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein
|
void compute_s_dsfac(const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein
|
||||||
|
|
||||||
static KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
void sincos_wrapper(double x, double* sin_, double *cos_) {
|
|
||||||
#ifdef __SYCL_DEVICE_ONLY__
|
|
||||||
*sin_ = sycl::sincos(x, cos_);
|
|
||||||
#else
|
|
||||||
sincos(x, sin_, cos_);
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
static KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
void sincos_wrapper(float x, float* sin_, float *cos_) {
|
|
||||||
#ifdef __SYCL_DEVICE_ONLY__
|
|
||||||
*sin_ = sycl::sincos(x, cos_);
|
|
||||||
#else
|
|
||||||
sincosf(x, sin_, cos_);
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
|
|
||||||
#ifdef TIMING_INFO
|
#ifdef TIMING_INFO
|
||||||
double* timers;
|
double* timers;
|
||||||
timespec starttime, endtime;
|
timespec starttime, endtime;
|
||||||
@ -207,7 +220,7 @@ inline
|
|||||||
|
|
||||||
int twojmax, diagonalstyle;
|
int twojmax, diagonalstyle;
|
||||||
|
|
||||||
t_sna_3d_ll blist;
|
t_sna_3d blist;
|
||||||
t_sna_3c_ll ulisttot;
|
t_sna_3c_ll ulisttot;
|
||||||
t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
|
t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
|
||||||
t_sna_3c_ll zlist;
|
t_sna_3c_ll zlist;
|
||||||
|
|||||||
File diff suppressed because it is too large
@ -628,7 +628,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
|
|||||||
chemflag = 0;
|
chemflag = 0;
|
||||||
bnormflag = 0;
|
bnormflag = 0;
|
||||||
wselfallflag = 0;
|
wselfallflag = 0;
|
||||||
chunksize = 4096;
|
chunksize = 32768;
|
||||||
|
parallel_thresh = 8192;
|
||||||
|
|
||||||
// open SNAP parameter file on proc 0
|
// open SNAP parameter file on proc 0
|
||||||
|
|
||||||
@ -696,6 +697,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
|
|||||||
wselfallflag = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
wselfallflag = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||||
else if (keywd == "chunksize")
|
else if (keywd == "chunksize")
|
||||||
chunksize = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
chunksize = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||||
|
else if (keywd == "parallelthresh")
|
||||||
|
parallel_thresh = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
|
||||||
else
|
else
|
||||||
error->all(FLERR,"Unknown parameter '{}' in SNAP "
|
error->all(FLERR,"Unknown parameter '{}' in SNAP "
|
||||||
"parameter file", keywd);
|
"parameter file", keywd);
|
||||||
|
|||||||
@ -59,7 +59,7 @@ class PairSNAP : public Pair {
|
|||||||
double **scale; // for thermodynamic integration
|
double **scale; // for thermodynamic integration
|
||||||
int twojmax, switchflag, bzeroflag, bnormflag;
|
int twojmax, switchflag, bzeroflag, bnormflag;
|
||||||
int chemflag, wselfallflag;
|
int chemflag, wselfallflag;
|
||||||
int chunksize;
|
int chunksize,parallel_thresh;
|
||||||
double rfac0, rmin0, wj1, wj2;
|
double rfac0, rmin0, wj1, wj2;
|
||||||
int rcutfacflag, twojmaxflag; // flags for required parameters
|
int rcutfacflag, twojmaxflag; // flags for required parameters
|
||||||
int beta_max; // length of beta
|
int beta_max; // length of beta
|
||||||
|
|||||||
@ -20,7 +20,7 @@ charges (dsf and long-range treatment of charges)
|
|||||||
out-of-plane angle
|
out-of-plane angle
|
||||||
|
|
||||||
See the file doc/drude_tutorial.html for getting started.
|
See the file doc/drude_tutorial.html for getting started.
|
||||||
See the doc pages for "pair_style buck6d/coul/gauss", "anlge_style class2",
|
See the doc pages for "pair_style buck6d/coul/gauss", "angle_style class2",
|
||||||
"angle_style cosine/buck6d", and "improper_style inversion/harmonic"
|
"angle_style cosine/buck6d", and "improper_style inversion/harmonic"
|
||||||
commands to get started. Also see the above mentioned website and
|
commands to get started. Also see the above mentioned website and
|
||||||
literature for further documentation about the force field.
|
literature for further documentation about the force field.
|
||||||
|
|||||||
@ -34,6 +34,7 @@ exclude:
|
|||||||
- lib/hdnnp
|
- lib/hdnnp
|
||||||
- lib/kim
|
- lib/kim
|
||||||
- lib/kokkos
|
- lib/kokkos
|
||||||
|
- lib/latte
|
||||||
- lib/machdyn
|
- lib/machdyn
|
||||||
- lib/mdi
|
- lib/mdi
|
||||||
- lib/mscg
|
- lib/mscg
|
||||||
@ -41,6 +42,7 @@ exclude:
|
|||||||
- lib/plumed
|
- lib/plumed
|
||||||
- lib/quip
|
- lib/quip
|
||||||
- lib/scafacos
|
- lib/scafacos
|
||||||
|
- lib/voronoi
|
||||||
- src/Make.sh
|
- src/Make.sh
|
||||||
patterns:
|
patterns:
|
||||||
- "*.c"
|
- "*.c"
|
||||||
|
|||||||