Merge branch 'master' into collected-small-changes

Axel Kohlmeyer
2021-08-27 14:58:42 -04:00
19 changed files with 822 additions and 533 deletions

View File

@ -22,4 +22,5 @@ page.
   Build_extras
   Build_manual
   Build_windows
+  Build_diskspace
   Build_development

View File

@ -0,0 +1,45 @@
Notes for saving disk space when building LAMMPS from source
------------------------------------------------------------

LAMMPS is a large software project with a large number of source files,
extensive documentation, and a large collection of example files.
Downloading LAMMPS by cloning the
`git repository from GitHub <https://github.com/lammps/lammps>`_ will by
default also download the entire commit history since September 2006.
Compiling LAMMPS will add the storage requirements of the compiled object
files and libraries to the tally.
In a user account on an HPC cluster with file system quotas, or in other
environments with restricted disk space, it may be necessary to reduce
the storage requirements.  Here are some suggestions:

- Create a so-called shallow repository by cloning only the last commit
  instead of the full project history using ``git clone git@github.com:lammps/lammps --depth=1 --branch=master``.
  This reduces the downloaded size to about half.  With ``--depth=1`` it is not possible to check out other
  versions or branches of LAMMPS; using ``--depth=1000`` makes multiple recent versions available at little
  extra storage cost (the entire git history had nearly 30,000 commits in fall 2021).
- Download a tar archive from either the `download section on the LAMMPS homepage <https://www.lammps.org/download.html>`_
  or the `LAMMPS releases page on GitHub <https://github.com/lammps/lammps/releases>`_; these archives do not
  contain the git history at all.
- Build LAMMPS without the debug flag (remove ``-g`` from the machine makefile or use ``-DCMAKE_BUILD_TYPE=Release``),
  or run the ``strip`` command on the LAMMPS executable once debugging is no longer needed.  The strip command
  may also be applied to the LAMMPS shared library.  The static library may be deleted entirely.
- Delete compiled object files and libraries after copying the LAMMPS executable to a permanent location.
  When using the traditional build process, ``make clean-<machine>`` or ``make clean-all``
  deletes the object files in the src folder.  For CMake based builds, use ``make clean`` or simply
  delete the entire build folder.
- The folders containing the documentation tree (doc) and the examples (examples) are not needed to build or
  run LAMMPS and can be safely deleted.  Some files in the potentials folder are large and may be deleted
  if they are not needed.  The largest of those files (occupying about 120 MBytes combined) are only downloaded
  on demand, when the corresponding package is installed.
- When using the CMake build procedure, the compilation can be done on a (local) scratch file system that does
  not count toward the quota.  A local scratch file system may also speed up compiling object files and linking
  with libraries compared to a networked file system.  In addition, with CMake (and unlike with the traditional
  make procedure) it is possible to compile LAMMPS executables with different settings and packages from the
  same source tree, since all configuration information is stored in the build folder, so it is not necessary
  to keep multiple copies of the LAMMPS sources.

View File

@ -29,7 +29,7 @@ The following folks deserve special recognition. Many of the packages
they have written are unique for an MD code and LAMMPS would not be as
general-purpose as it is without their expertise and efforts.
-* Metin Aktulga (MSU), REAXFF package for C version of ReaxFF
+* Metin Aktulga (MSU), REAXFF package for C/C++ version of ReaxFF
* Mike Brown (Intel), GPU and INTEL packages
* Colin Denniston (U Western Ontario), LATBOLTZ package
* Georg Ganzenmuller (EMI), MACHDYN and SPH packages

@ -37,9 +37,10 @@ general-purpose as it is without their expertise and efforts.
* Reese Jones (Sandia) and colleagues, ATC package for atom/continuum coupling
* Christoph Kloss (DCS Computing), LIGGGHTS code for granular materials, built on top of LAMMPS
* Rudra Mukherjee (JPL), POEMS package for articulated rigid body motion
-* Trung Ngyuen (Northwestern U), GPU and RIGID and BODY packages
+* Trung Ngyuen (Northwestern U), GPU, RIGID, BODY, and DIELECTRIC packages
* Mike Parks (Sandia), PERI package for Peridynamics
* Roy Pollock (LLNL), Ewald and PPPM solvers
+* Julien Tranchida (Sandia), SPIN package
* Christian Trott (Sandia), CUDA and KOKKOS packages
* Ilya Valuev (JIHT), AWPMD package for wave packet MD
* Greg Wagner (Northwestern U), MEAM package for MEAM potential

View File

@ -27,19 +27,19 @@ General features
* distributed memory message-passing parallelism (MPI)
* shared memory multi-threading parallelism (OpenMP)
* spatial decomposition of simulation domain for MPI parallelism
-* particle decomposition inside of spatial decomposition for OpenMP parallelism
+* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
* GPLv2 licensed open-source distribution
* highly portable C++-11
* modular code with most functionality in optional packages
-* only depends on MPI library for basic parallel functionality
+* only depends on MPI library for basic parallel functionality, MPI stub for serial compilation
* other libraries are optional and only required for specific packages
-* GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
+* GPU (CUDA, OpenCL, HIP, SYCL), Intel Xeon Phi, and OpenMP support for many code features
* easy to extend with new features and functionality
* runs from an input script
* syntax for defining and using variables and formulas
* syntax for looping over runs and breaking out of loops
* run one or multiple simulations simultaneously (in parallel) from one script
-* build as library, invoke LAMMPS through library interface or provided Python wrapper
+* build as library, invoke LAMMPS through library interface or provided Python wrapper or SWIG based wrappers
* couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both

.. _particle:

@ -57,9 +57,11 @@ Particle and model types
* granular materials
* coarse-grained mesoscale models
* finite-size spherical and ellipsoidal particles
* finite-size line segment (2d) and triangle (3d) particles
+* finite-size rounded polygons (2d) and polyhedra (3d) particles
* point dipole particles
-* rigid collections of particles
+* particles with magnetic spin
+* rigid collections of n particles
* hybrid combinations of these

.. _ff:

@ -74,24 +76,28 @@ commands)
* pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
* charged pairwise potentials: Coulombic, point-dipole
-* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
-* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
+* many-body potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, REBO, AIREBO, ReaxFF, COMB, Streitz-Mintmire, 3-body polymorphic, BOP, Vashishta
+* machine learning potentials: SNAP, GAP, ACE, N2P2, RANN, AGNI
+* long-range interactions for charge, point-dipoles, and LJ dispersion: Ewald, Wolf, PPPM (similar to particle-mesh Ewald), MSM
* polarization models: :doc:`QEq <fix_qeq>`, :doc:`core/shell model <Howto_coreshell>`, :doc:`Drude dipole model <Howto_drude>`
* charge equilibration (QEq via dynamic, point, shielded, Slater methods)
* coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
-* mesoscopic potentials: granular, Peridynamics, SPH
+* mesoscopic potentials: granular, Peridynamics, SPH, mesoscopic tubular potential (MESONT)
+* semi-empirical potentials: multi-ion generalized pseudopotential theory (MGPT), second moment tight binding + QEq (SMTB-Q), density functional tight-binding (LATTE)
* electron force field (eFF, AWPMD)
-* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable)
+* bond potentials: harmonic, FENE, Morse, nonlinear, class 2, quartic (breakable), tabulated
-* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS)
+* angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, class 2 (COMPASS), tabulated
-* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS
+* dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, class 2 (COMPASS), OPLS, tabulated
-* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
+* improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS), tabulated
* polymer potentials: all-atom, united-atom, bead-spring, breakable
-* water potentials: TIP3P, TIP4P, SPC
+* water potentials: TIP3P, TIP4P, SPC, SPC/E and variants
+* interlayer potentials for graphene and analogues
+* metal-organic framework potentials (QuickFF, MO-FF)
* implicit solvent potentials: hydrodynamic lubrication, Debye
* force-field compatibility with common CHARMM, AMBER, DREIDING, OPLS, GROMACS, COMPASS options
* access to the `OpenKIM Repository <http://openkim.org>`_ of potentials via :doc:`kim command <kim_commands>`
* hybrid potentials: multiple pair, bond, angle, dihedral, improper potentials can be used in one simulation
-* overlaid potentials: superposition of multiple pair potentials
+* overlaid potentials: superposition of multiple pair potentials (including many-body) with optional scale factor

.. _create:

@ -124,9 +130,10 @@ Ensembles, constraints, and boundary conditions
* harmonic (umbrella) constraint forces
* rigid body constraints
* SHAKE bond and angle constraints
-* Monte Carlo bond breaking, formation, swapping
+* motion constraints to manifold surfaces
+* Monte Carlo bond breaking, formation, swapping, template based reaction modeling
* atom/molecule insertion and deletion
-* walls of various kinds
+* walls of various kinds, static and moving
* non-equilibrium molecular dynamics (NEMD)
* variety of additional boundary conditions and constraints

@ -150,6 +157,7 @@ Diagnostics
^^^^^^^^^^^
* see various flavors of the :doc:`fix <fix>` and :doc:`compute <compute>` commands
+* introspection command for system, simulation, and compile time settings and configurations

.. _output:

@ -164,8 +172,9 @@ Output
* parallel I/O of dump and restart files
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
* user-defined system-wide (log file) or per-atom (dump file) calculations
-* spatial and time averaging of per-atom quantities
-* time averaging of system-wide quantities
+* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
+* spatial, time, and per-chunk averaging of per-atom quantities
+* time averaging and histogramming of system-wide quantities
* atom snapshots in native, XYZ, XTC, DCD, CFG formats

.. _replica1:

@ -178,7 +187,7 @@ Multi-replica models
* :doc:`parallel replica dynamics <prd>`
* :doc:`temperature accelerated dynamics <tad>`
* :doc:`parallel tempering <temper>`
-* :doc:`path-integral MD <fix_pimd>`
+* path-integral MD: `first variant <fix_pimd>`, `second variant <fix_ipi>`
* multi-walker collective variables with :doc:`Colvars <fix_colvars>` and :doc:`Plumed <fix_plumed>`

.. _prepost:

@ -210,11 +219,12 @@ page for details.
These are LAMMPS capabilities which you may not think of as typical
classical MD options:
-* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`
+* :doc:`static <balance>` and :doc:`dynamic load-balancing <fix_balance>`, optional with recursive bisectioning decomposition
* :doc:`generalized aspherical particles <Howto_body>`
* :doc:`stochastic rotation dynamics (SRD) <fix_srd>`
-* :doc:`real-time visualization and interactive MD <fix_imd>`
+* :doc:`real-time visualization and interactive MD <fix_imd>`, :doc:`built-in renderer for images and movies <dump_image>`
* calculate :doc:`virtual diffraction patterns <compute_xrd>`
+* calculate :doc:`finite temperature phonon dispersion <fix_phonon>` and the :doc:`dynamical matrix of minimized structures <dynamical_matrix>`
* :doc:`atom-to-continuum coupling <fix_atc>` with finite elements
* coupled rigid body integration via the :doc:`POEMS <fix_poems>` library
* :doc:`QM/MM coupling <fix_qmmm>`
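One of the bullets above mentions building LAMMPS as a library and driving it through the library interface. As an illustrative aside (not part of this commit), here is a minimal sketch of that usage based on functions declared in src/library.h; exact signatures can vary slightly between LAMMPS versions, so treat the details as assumptions rather than a reference:

    // Minimal sketch of the LAMMPS C library interface (illustrative only).
    // Compile and link against the LAMMPS library; signatures may differ
    // slightly between LAMMPS versions.
    #include "library.h"
    #include <cstdio>

    int main()
    {
      void *handle = nullptr;
      lammps_open_no_mpi(0, nullptr, &handle);   // create a LAMMPS instance without MPI
      if (!handle) return 1;

      lammps_command(handle, "units lj");        // pass input script commands one at a time
      lammps_command(handle, "region box block 0 5 0 5 0 5");
      lammps_command(handle, "create_box 1 box");

      printf("natoms = %g\n", lammps_get_natoms(handle));
      lammps_close(handle);                      // destroy the instance and free its memory
      return 0;
    }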

View File

@ -1,40 +1,61 @@
LAMMPS open-source license
--------------------------

-LAMMPS is a freely-available open-source code, distributed under the
-terms of the `GNU Public License Version 2 <gpl_>`_, which means you can
-use or modify the code however you wish for your own purposes, but have
-to adhere to certain rules when redistributing it or software derived
+GPL version of LAMMPS
+^^^^^^^^^^^^^^^^^^^^^
+
+LAMMPS is an open-source code, available free-of-charge, and distributed
+under the terms of the `GNU Public License Version 2 <gpl_>`_ (GPLv2),
+which means you can use or modify the code however you wish for your own
+purposes, but have to adhere to certain rules when redistributing it -
+specifically in binary form - or are distributing software derived
from it or that includes parts of it.

-LAMMPS comes with no warranty of any kind. As each source file states
-in its header, it is a copyrighted code that is distributed free-of-
-charge, under the terms of the `GNU Public License Version 2 <gpl_>`_
-(GPLv2). This is often referred to as open-source distribution - see
-`www.gnu.org <gnuorg_>`_ or `www.opensource.org <opensource_>`_. The
-legal text of the GPL is in the LICENSE file included in the LAMMPS
-distribution.
+LAMMPS comes with no warranty of any kind.
+
+As each source file states in its header, it is a copyrighted code, and
+thus not in the public domain. For more information about open-source
+software and open-source distribution, see `www.gnu.org <gnuorg_>`_
+or `www.opensource.org <opensource_>`_. The legal text of the GPL as it
+applies to LAMMPS is in the LICENSE file included in the LAMMPS distribution.

.. _gpl: https://github.com/lammps/lammps/blob/master/LICENSE
+.. _lgpl: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
.. _gnuorg: http://www.gnu.org
.. _opensource: http://www.opensource.org

-Here is a summary of what the GPL means for LAMMPS users:
+Here is a more specific summary of what the GPL means for LAMMPS users:

-(1) Anyone is free to use, modify, or extend LAMMPS in any way they
+(1) Anyone is free to use, copy, modify, or extend LAMMPS in any way they
choose, including for commercial purposes.

(2) If you **distribute** a modified version of LAMMPS, it must remain
-open-source, meaning you distribute **all** of it under the terms of
-the GPL. You should clearly annotate such a code as a derivative version
-of LAMMPS.
+open-source, meaning you are required to distribute **all** of it under
+the terms of the GPL. You should clearly annotate such a modified code
+as a derivative version of LAMMPS.

(3) If you release any code that includes or uses LAMMPS source code,
then it must also be open-sourced, meaning you distribute it under
-the terms of the GPL.
+the terms of the GPL. You may write code that interfaces LAMMPS to
+a differently licensed library. In that case the code that provides
+the interface must be licensed GPL, but not necessarily that library
+unless you are distributing binaries that require the library to run.

(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
source file headers (including the copyright and GPL notices) should
remain part of the code.

+LGPL version of LAMMPS
+^^^^^^^^^^^^^^^^^^^^^^
+
+We occasionally make stable LAMMPS releases available under the `GNU
+Lesser Public License v2.1 <lgpl_>`_. This is on request only and with
+non-LGPL compliant files removed. This allows uses linking non-GPL
+compatible software with the (otherwise unmodified) LAMMPS library
+or loading it dynamically at runtime. Any **modifications** to
+the LAMMPS code however, even with the LGPL licensed version, must still
+be made available under the same open source terms as LAMMPS itself.

View File

@ -10,24 +10,26 @@ conditions. It can model 2d or 3d systems with only a few particles
up to millions or billions.

LAMMPS can be built and run on a laptop or desktop machine, but is
-designed for parallel computers. It will run on any parallel machine
-that supports the `MPI <mpi_>`_ message-passing library. This includes
-shared-memory boxes and distributed-memory clusters and
-supercomputers.
+designed for parallel computers. It will run in serial and on any
+parallel machine that supports the `MPI <mpi_>`_ message-passing
+library. This includes shared-memory boxes and distributed-memory
+clusters and supercomputers. Parts of LAMMPS also support
+`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.

.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
.. _lws: https://www.lammps.org
+.. _omp: https://www.openmp.org

LAMMPS is written in C++ and requires a compiler that is at least
-compatible with the C++-11 standard.
-Earlier versions were written in F77 and F90. See the `History page
+compatible with the C++-11 standard. Earlier versions were written in
+F77, F90, and C++-98. See the `History page
<https://www.lammps.org/history.html>`_ of the website for details. All
-versions can be downloaded from the `LAMMPS website <lws_>`_.
+versions can be downloaded as source code from the `LAMMPS website
+<lws_>`_.

-LAMMPS is designed to be easy to modify or extend with new
-capabilities, such as new force fields, atom types, boundary
-conditions, or diagnostics. See the :doc:`Modify <Modify>` page for
-more details.
+LAMMPS is designed to be easy to modify or extend with new capabilities,
+such as new force fields, atom types, boundary conditions, or
+diagnostics. See the :doc:`Modify <Modify>` page for more details.

In the most general sense, LAMMPS integrates Newton's equations of
motion for a collection of interacting particles. A single particle

@ -47,4 +49,5 @@ MPI parallelization to partition the simulation domain into small
sub-domains of equal computational cost, one of which is assigned to
each processor. Processors communicate and store "ghost" atom
information for atoms that border their sub-domain. Multi-threading
-parallelization with with particle-decomposition can be used in addition.
+parallelization and GPU acceleration with with particle-decomposition
+can be used in addition.
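The paragraphs above describe how MPI parallelization partitions the simulation domain into sub-domains, one per processor. As a toy illustration only (not LAMMPS code, and much simpler than the real decomposition into sub-domains of equal computational cost), the following sketch splits a one-dimensional box of length L into equal slabs, one per MPI rank:

    // Toy sketch: equal 1d sub-domains per MPI rank (illustrative only, not LAMMPS code).
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);
      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      const double L = 100.0;             // box length in arbitrary units (made up)
      const double dx = L / nprocs;       // equal-size slabs
      const double lo = rank * dx;
      const double hi = lo + dx;
      printf("rank %d owns sub-domain [%g, %g)\n", rank, lo, hi);

      MPI_Finalize();
      return 0;
    }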

View File

@ -2,12 +2,21 @@ What does a LAMMPS version mean
-------------------------------

The LAMMPS "version" is the date when it was released, such as 1 May
-2014. LAMMPS is updated continuously. Whenever we fix a bug or add a
-feature, we release it in the next *patch* release, which are
-typically made every couple of weeks. Info on patch releases are on
-`this website page <https://www.lammps.org/bug.html>`_. Every few
-months, the latest patch release is subjected to more thorough testing
-and labeled as a *stable* version.
+2014. LAMMPS is updated continuously and we aim to keep it working
+correctly and reliably at all times. You can follow its development
+in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
+
+Whenever we fix a bug or update or add a feature, it will be merged into
+the `master` branch of the git repository. When a sufficient number of
+changes have accumulated *and* the software passes a set of automated
+tests, we release it in the next *patch* release, which are made every
+few weeks. Info on patch releases are on `this website page
+<https://www.lammps.org/bug.html>`_.
+
+Once or twice a year, only bug fixes and small, non-intrusive changes are
+included for a period of time, and the code is subjected to more detailed
+and thorough testing than the default automated testing. The latest
+patch release after such a period is then labeled as a *stable* version.

Each version of LAMMPS contains all the features and bug-fixes up to
and including its version date.

View File

@ -19,7 +19,7 @@ Syntax
   bondmax = length of longest bond in the system (in length units)
   tlimit = elapsed CPU time (in seconds)
-  diskfree = free disk space (in megabytes)
+  diskfree = free disk space (in MBytes)
   v_name = name of :doc:`equal-style variable <variable>`
* operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "\|\^"

@ -81,7 +81,7 @@ the timer frequently across a large number of processors may be
non-negligible.

The *diskfree* attribute will check for available disk space (in
-megabytes) on supported operating systems. By default it will
+MBytes) on supported operating systems. By default it will
check the file system of the current working directory. This
can be changed with the optional *path* keyword, which will take
the path to a file or folder on the file system to be checked
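The *diskfree* check above reports free space in MBytes for a chosen path. As a hedged illustration of the underlying idea (this is not the LAMMPS implementation), free disk space for a path can be queried on POSIX systems roughly like this:

    // Illustrative only: query free disk space in MBytes for a path on POSIX systems.
    // Not the LAMMPS implementation of the diskfree attribute.
    #include <sys/statvfs.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      const char *path = (argc > 1) ? argv[1] : ".";   // default: current working directory
      struct statvfs fs;
      if (statvfs(path, &fs) != 0) {
        perror("statvfs");
        return 1;
      }
      // blocks available to unprivileged users times fragment size, converted to MBytes
      const double free_mb = (double) fs.f_bavail * (double) fs.f_frsize / (1024.0 * 1024.0);
      printf("free disk space at '%s': %.1f MBytes\n", path, free_mb);
      return 0;
    }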

View File

@ -128,7 +128,7 @@ spectrum while consumes more memory. With fixed *f_max* and
:math:`\gamma`, *N_f* should be big enough to converge the classical
temperature :math:`T^{cl}` as a function of target quantum bath
temperature. Memory usage per processor could be from 10 to 100
-Mbytes.
+MBytes.

.. note::

View File

@ -135,7 +135,7 @@ with #) anywhere. Each non-blank non-comment line must contain one
keyword/value pair. The required keywords are *rcutfac* and
*twojmax*\ . Optional keywords are *rfac0*, *rmin0*,
*switchflag*, *bzeroflag*, *quadraticflag*, *chemflag*,
-*bnormflag*, *wselfallflag*, and *chunksize*\ .
+*bnormflag*, *wselfallflag*, *chunksize*, and *parallelthresh*\ .

The default values for these keywords are

@ -147,7 +147,8 @@ The default values for these keywords are
* *chemflag* = 0
* *bnormflag* = 0
* *wselfallflag* = 0
-* *chunksize* = 4096
+* *chunksize* = 32768
+* *parallelthresh* = 8192

If *quadraticflag* is set to 1, then the SNAP energy expression includes
additional quadratic terms that have been shown to increase the overall

@ -188,14 +189,24 @@ corresponding *K*-vector of linear coefficients for element
which must equal the number of unique elements appearing in the LAMMPS
pair_coeff command, to avoid ambiguity in the number of coefficients.

-The keyword *chunksize* is only applicable when using the
-pair style *snap* with the KOKKOS package and is ignored otherwise.
-This keyword controls
+The keywords *chunksize* and *parallelthresh* are only applicable when
+using the pair style *snap* with the KOKKOS package on GPUs and are
+ignored otherwise.
+The *chunksize* keyword controls
the number of atoms in each pass used to compute the bispectrum
components and is used to avoid running out of memory. For example
if there are 8192 atoms in the simulation and the *chunksize*
is set to 4096, the bispectrum calculation will be broken up
-into two passes.
+into two passes (running on a single GPU).
+The *parallelthresh* keyword controls
+a crossover threshold for performing extra parallelism. For
+small systems, exposing additional parallelism can be beneficial when
+there is not enough work to fully saturate the GPU threads otherwise.
+However, the extra parallelism also leads to more divergence
+and can hurt performance when the system is already large enough to
+saturate the GPU threads. Extra parallelism will be performed if the
+*chunksize* (or total number of atoms per GPU) is smaller than
+*parallelthresh*.

Detailed definitions for all the other keywords
are given on the :doc:`compute sna/atom <compute_sna_atom>` doc page.
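The crossover described above is implemented later in this commit in pair_snap_kokkos_impl.h as a simple comparison between the chunk size and *parallelthresh*. Condensed into a stand-alone sketch (names are illustrative; the real code dispatches different Kokkos kernels in each branch):

    // Condensed sketch of the chunksize/parallelthresh crossover (illustrative only).
    #include <cstdio>

    static const char *select_kernel(int chunk_size, int parallel_thresh)
    {
      // small chunks: expose extra parallelism over j_bend to keep the GPU busy
      // large chunks: enough atom/neighbor work already, so avoid the extra divergence
      return (chunk_size < parallel_thresh) ? "ComputeUiSmall" : "ComputeUiLarge";
    }

    int main()
    {
      printf("%s\n", select_kernel(4096, 8192));    // prints ComputeUiSmall
      printf("%s\n", select_kernel(32768, 8192));   // prints ComputeUiLarge
      return 0;
    }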

View File

@ -1174,6 +1174,7 @@ googletest
Gordan
Goudeau
GPa
+GPL
gpu
gpuID
gpus
@ -1689,6 +1690,7 @@ Lett
Leuven
Leven
Lewy
+LGPL
lgvdw
Liang
libatc
@ -1889,7 +1891,6 @@ maxX
Mayergoyz
Mayoral
mbt
-Mbytes
MBytes
mc
McLachlan

View File

@ -44,7 +44,8 @@ struct TagPairSNAPComputeForce{};
struct TagPairSNAPComputeNeigh{};
struct TagPairSNAPComputeCayleyKlein{};
struct TagPairSNAPPreUi{};
-struct TagPairSNAPComputeUi{};
+struct TagPairSNAPComputeUiSmall{}; // more parallelism, more divergence
+struct TagPairSNAPComputeUiLarge{}; // less parallelism, no divergence
struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
struct TagPairSNAPComputeZi{};
struct TagPairSNAPBeta{};
@ -53,7 +54,9 @@ struct TagPairSNAPTransformBi{}; // re-order blist from AoSoA to AoS
struct TagPairSNAPComputeYi{};
struct TagPairSNAPComputeYiWithZlist{};
template<int dir>
-struct TagPairSNAPComputeFusedDeidrj{};
+struct TagPairSNAPComputeFusedDeidrjSmall{}; // more parallelism, more divergence
+template<int dir>
+struct TagPairSNAPComputeFusedDeidrjLarge{}; // less parallelism, no divergence
// CPU backend only
struct TagPairSNAPComputeNeighCPU{};
@ -143,7 +146,10 @@ public:
void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
KOKKOS_INLINE_FUNCTION
-void operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUi>::member_type& team) const;
+void operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiSmall>::member_type& team) const;
+KOKKOS_INLINE_FUNCTION
+void operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiLarge>::member_type& team) const;
KOKKOS_INLINE_FUNCTION
void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
@ -168,7 +174,11 @@ public:
template<int dir>
KOKKOS_INLINE_FUNCTION
-void operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const;
+void operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const;
+template<int dir>
+KOKKOS_INLINE_FUNCTION
+void operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const;
// CPU backend only
KOKKOS_INLINE_FUNCTION

View File

@ -341,18 +341,32 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
  // ComputeUi w/vector parallelism, shared memory, direct atomicAdd into ulisttot
  {
    // team_size_compute_ui is defined in `pair_snap_kokkos.h`
    // scratch size: 32 atoms * (twojmax+1) cached values, no double buffer
    const int tile_size = vector_length * (twojmax + 1);
    const int scratch_size = scratch_size_helper<complex>(team_size_compute_ui * tile_size);
-   // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
-   const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
-   const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUi> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
-   policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeUi",policy_ui,*this);
+   if (chunk_size < parallel_thresh)
+   {
+     // Version with parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
+     const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
+     const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiSmall> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
+     policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeUiSmall",policy_ui,*this);
+   } else {
+     // Version w/out parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs)
+     const int n_teams = chunk_size_div * max_neighs;
+     const int n_teams_div = (n_teams + team_size_compute_ui - 1) / team_size_compute_ui;
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_ui, TagPairSNAPComputeUiLarge> policy_ui(n_teams_div, team_size_compute_ui, vector_length);
+     policy_ui = policy_ui.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeUiLarge",policy_ui,*this);
+   }
  }
//TransformUi: un-"fold" ulisttot, zero ylist //TransformUi: un-"fold" ulisttot, zero ylist
@ -412,25 +426,51 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::compute(int eflag_in,
    const int tile_size = vector_length * (twojmax + 1);
    const int scratch_size = scratch_size_helper<complex>(2 * team_size_compute_fused_deidrj * tile_size);
-   // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
-   const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
-   const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
-   // x direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<0>",policy_fused_deidrj_x,*this);
-   // y direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<1>",policy_fused_deidrj_y,*this);
-   // z direction
-   SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrj<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
-   policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
-   Kokkos::parallel_for("ComputeFusedDeidrj<2>",policy_fused_deidrj_z,*this);
+   if (chunk_size < parallel_thresh)
+   {
+     // Version with parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs) * ("bend" locations)
+     const int n_teams = chunk_size_div * max_neighs * (twojmax + 1);
+     const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
+     // x direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<0>",policy_fused_deidrj_x,*this);
+     // y direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<1>",policy_fused_deidrj_y,*this);
+     // z direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjSmall<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjSmall<2>",policy_fused_deidrj_z,*this);
+   } else {
+     // Version w/out parallelism over j_bend
+     // total number of teams needed: (natoms / 32) * (max_neighs)
+     const int n_teams = chunk_size_div * max_neighs;
+     const int n_teams_div = (n_teams + team_size_compute_fused_deidrj - 1) / team_size_compute_fused_deidrj;
+     // x direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<0> > policy_fused_deidrj_x(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_x = policy_fused_deidrj_x.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<0>",policy_fused_deidrj_x,*this);
+     // y direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<1> > policy_fused_deidrj_y(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_y = policy_fused_deidrj_y.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<1>",policy_fused_deidrj_y,*this);
+     // z direction
+     SnapAoSoATeamPolicy<DeviceType, team_size_compute_fused_deidrj, TagPairSNAPComputeFusedDeidrjLarge<2> > policy_fused_deidrj_z(n_teams_div,team_size_compute_fused_deidrj,vector_length);
+     policy_fused_deidrj_z = policy_fused_deidrj_z.set_scratch_size(0, Kokkos::PerTeam(scratch_size));
+     Kokkos::parallel_for("ComputeFusedDeidrjLarge<2>",policy_fused_deidrj_z,*this);
+   }
  }
#endif // LMP_KOKKOS_GPU #endif // LMP_KOKKOS_GPU
@ -603,13 +643,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb, idx_chem, ii);
+     real_type bveci = my_sna.blist(ii, idx_chem, idxb);
      d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bveci;
      k++;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        const auto jdxb = jcoeff % idxb_max;
        const auto jdx_chem = jcoeff / idxb_max;
-       real_type bvecj = my_sna.blist(jdxb, jdx_chem, ii);
+       real_type bvecj = my_sna.blist(ii, jdx_chem, jdxb);
        d_beta_pack(iatom_mod,icoeff,iatom_div) += d_coeffi[k]*bvecj;
        d_beta_pack(iatom_mod,jcoeff,iatom_div) += d_coeffi[k]*bveci;
        k++;
@ -736,7 +776,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
template<class DeviceType, typename real_type, int vector_length>
KOKKOS_INLINE_FUNCTION
-void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUi,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUi>::member_type& team) const {
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiSmall>::member_type& team) const {
  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
  // extract flattened atom_div / neighbor number / bend location
@ -756,11 +796,37 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    const int ninside = d_ninside(ii);
    if (jj >= ninside) return;
-   my_sna.compute_ui(team,iatom_mod, jbend, jj, iatom_div);
+   my_sna.compute_ui_small(team, iatom_mod, jbend, jj, iatom_div);
  });
}

+template<class DeviceType, typename real_type, int vector_length>
+KOKKOS_INLINE_FUNCTION
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeUiLarge>::member_type& team) const {
+  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
+  // extract flattened atom_div / neighbor number / bend location
+  int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_ui;
+  // extract neighbor index, iatom_div
+  int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
+  int jj = flattened_idx - iatom_div * max_neighs;
+  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
+    [&] (const int iatom_mod) {
+    const int ii = iatom_mod + vector_length * iatom_div;
+    if (ii >= chunk_size) return;
+    const int ninside = d_ninside(ii);
+    if (jj >= ninside) return;
+    my_sna.compute_ui_large(team,iatom_mod, jj, iatom_div);
+  });
+}

template<class DeviceType, typename real_type, int vector_length>
KOKKOS_INLINE_FUNCTION
void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPTransformUi,const int iatom_mod, const int idxu, const int iatom_div) const {
@ -861,9 +927,9 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int itriple = 0; itriple < ntriples; itriple++) {
-     const auto blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
-     my_sna.blist(idxb, itriple, iatom) = blocal;
+     const real_type blocal = my_sna.blist_pack(iatom_mod, idxb, itriple, iatom_div);
+     my_sna.blist(iatom, itriple, idxb) = blocal;
    }
  }
@ -871,7 +937,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
template<class DeviceType, typename real_type, int vector_length>
template<int dir>
KOKKOS_INLINE_FUNCTION
-void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrj<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrj<dir> >::member_type& team) const {
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjSmall<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjSmall<dir> >::member_type& team) const {
  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
  // extract flattened atom_div / neighbor number / bend location
@ -891,12 +957,38 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    const int ninside = d_ninside(ii);
    if (jj >= ninside) return;
-   my_sna.template compute_fused_deidrj<dir>(team, iatom_mod, jbend, jj, iatom_div);
+   my_sna.template compute_fused_deidrj_small<dir>(team, iatom_mod, jbend, jj, iatom_div);
  });
}

+template<class DeviceType, typename real_type, int vector_length>
+template<int dir>
+KOKKOS_INLINE_FUNCTION
+void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSNAPComputeFusedDeidrjLarge<dir>,const typename Kokkos::TeamPolicy<DeviceType,TagPairSNAPComputeFusedDeidrjLarge<dir> >::member_type& team) const {
+  SNAKokkos<DeviceType, real_type, vector_length> my_sna = snaKK;
+  // extract flattened atom_div / neighbor number / bend location
+  int flattened_idx = team.team_rank() + team.league_rank() * team_size_compute_fused_deidrj;
+  // extract neighbor index, iatom_div
+  int iatom_div = flattened_idx / max_neighs; // removed "const" to work around GCC 7 bug
+  int jj = flattened_idx - max_neighs * iatom_div;
+  Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, vector_length),
+    [&] (const int iatom_mod) {
+    const int ii = iatom_mod + vector_length * iatom_div;
+    if (ii >= chunk_size) return;
+    const int ninside = d_ninside(ii);
+    if (jj >= ninside) return;
+    my_sna.template compute_fused_deidrj_large<dir>(team, iatom_mod, jj, iatom_div);
+  });
+}

/* ----------------------------------------------------------------------
   Begin routines that are unique to the CPU codepath. These do not take
   advantage of AoSoA data layouts, but that could be a good point of
@ -925,13 +1017,13 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb,idx_chem,ii);
+     real_type bveci = my_sna.blist(ii,idx_chem,idxb);
      d_beta(icoeff,ii) += d_coeffi[k]*bveci;
      k++;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        const auto jdxb = jcoeff % idxb_max;
        const auto jdx_chem = jcoeff / idxb_max;
-       auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
+       real_type bvecj = my_sna.blist(ii,jdx_chem,jdxb);
        d_beta(icoeff,ii) += d_coeffi[k]*bvecj;
        d_beta(jcoeff,ii) += d_coeffi[k]*bveci;
        k++;
@ -1221,7 +1313,7 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     evdwl += d_coeffi[icoeff+1]*my_sna.blist(idxb,idx_chem,ii);
+     evdwl += d_coeffi[icoeff+1]*my_sna.blist(ii,idx_chem,idxb);
    }

    // quadratic contributions
@ -1230,12 +1322,12 @@ void PairSNAPKokkos<DeviceType, real_type, vector_length>::operator() (TagPairSN
    for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
      const auto idxb = icoeff % idxb_max;
      const auto idx_chem = icoeff / idxb_max;
-     auto bveci = my_sna.blist(idxb,idx_chem,ii);
+     real_type bveci = my_sna.blist(ii,idx_chem,idxb);
      evdwl += 0.5*d_coeffi[k++]*bveci*bveci;
      for (int jcoeff = icoeff+1; jcoeff < ncoeff; jcoeff++) {
        auto jdxb = jcoeff % idxb_max;
        auto jdx_chem = jcoeff / idxb_max;
-       auto bvecj = my_sna.blist(jdxb,jdx_chem,ii);
+       auto bvecj = my_sna.blist(ii,jdx_chem,jdxb);
        evdwl += d_coeffi[k++]*bveci*bvecj;
      }
    }
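Several hunks above change the blist index order from (idxb, idx_chem, ii) to (ii, idx_chem, idxb), i.e. the atom index becomes the leading dimension, with the matching type change in sna_kokkos.h below. For readers less familiar with multi-dimensional Kokkos::View indexing, here is a small self-contained toy sketch of a 3d view indexed in that order (extents and values are made up; this is not LAMMPS code):

    // Toy sketch: a 3d Kokkos::View indexed as (atom, triple, bispectrum index).
    // Illustrative only; not the LAMMPS data structures.
    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      Kokkos::initialize(argc, argv);
      {
        const int natoms = 8, ntriples = 2, idxb_max = 5;   // made-up extents
        Kokkos::View<double ***> blist("blist", natoms, ntriples, idxb_max);

        // fill the view with the atom index as the leftmost dimension
        Kokkos::parallel_for("fill", natoms, KOKKOS_LAMBDA(const int ii) {
          for (int itriple = 0; itriple < ntriples; itriple++)
            for (int idxb = 0; idxb < idxb_max; idxb++)
              blist(ii, itriple, idxb) = ii + 0.1 * itriple + 0.01 * idxb;
        });

        auto h_blist = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), blist);
        printf("blist(3,1,2) = %g\n", h_blist(3, 1, 2));
      }
      Kokkos::finalize();
      return 0;
    }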

View File

@ -45,12 +45,12 @@ struct WignerWrapper {
  { ; }

  KOKKOS_INLINE_FUNCTION
-  complex get(const int& ma) {
+  complex get(const int& ma) const {
    return complex(buffer[offset + 2 * vector_length * ma], buffer[offset + vector_length + 2 * vector_length * ma]);
  }

  KOKKOS_INLINE_FUNCTION
-  void set(const int& ma, const complex& store) {
+  void set(const int& ma, const complex& store) const {
    buffer[offset + 2 * vector_length * ma] = store.re;
    buffer[offset + vector_length + 2 * vector_length * ma] = store.im;
  }
@ -122,8 +122,14 @@ inline
  void compute_cayley_klein(const int&, const int&, const int&);
  KOKKOS_INLINE_FUNCTION
  void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
+  // version of the code with parallelism over j_bend
  KOKKOS_INLINE_FUNCTION
-  void compute_ui(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
+  void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
+  // version of the code without parallelism over j_bend
+  KOKKOS_INLINE_FUNCTION
+  void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); // ForceSNAP
  KOKKOS_INLINE_FUNCTION
  void compute_zi(const int&, const int&, const int&); // ForceSNAP
  KOKKOS_INLINE_FUNCTION
@ -135,6 +141,35 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_bi(const int&, const int&, const int&); // ForceSNAP

+  // functions for derivatives, GPU only
+  // version of the code with parallelism over j_bend
+  template<int dir>
+  KOKKOS_INLINE_FUNCTION
+  void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
+  // version of the code without parallelism over j_bend
+  template<int dir>
+  KOKKOS_INLINE_FUNCTION
+  void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); //ForceSNAP
+
+  // core "evaluation" functions that get plugged into "compute" functions
+  // plugged into compute_ui_small, compute_ui_large
+  KOKKOS_FORCEINLINE_FUNCTION
+  void evaluate_ui_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&, const int&,
+                         const int&, const int&, const int&);
+  // plugged into compute_zi, compute_yi
+  KOKKOS_FORCEINLINE_FUNCTION
+  complex evaluate_zi(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
+                      const int&, const int&, const int&, const int&, const real_type*);
+  // plugged into compute_yi, compute_yi_with_zlist
+  KOKKOS_FORCEINLINE_FUNCTION
+  real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
+                                 const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &);
+  // plugged into compute_fused_deidrj_small, compute_fused_deidrj_large
+  KOKKOS_FORCEINLINE_FUNCTION
+  real_type evaluate_duidrj_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
+                                  const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
+                                  const int&, const int&, const int&, const int&);

  // functions for bispectrum coefficients, CPU only
  KOKKOS_INLINE_FUNCTION
  void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
@ -148,11 +183,6 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int); // ForceSNAP

-  // functions for derivatives, GPU only
-  template<int dir>
-  KOKKOS_INLINE_FUNCTION
-  void compute_fused_deidrj(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP

  // functions for derivatives, CPU only
  KOKKOS_INLINE_FUNCTION
  void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
@ -168,23 +198,6 @@ inline
  KOKKOS_INLINE_FUNCTION
  void compute_s_dsfac(const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein

-  static KOKKOS_FORCEINLINE_FUNCTION
-  void sincos_wrapper(double x, double* sin_, double *cos_) {
-  #ifdef __SYCL_DEVICE_ONLY__
-    *sin_ = sycl::sincos(x, cos_);
-  #else
-    sincos(x, sin_, cos_);
-  #endif
-  }
-  static KOKKOS_FORCEINLINE_FUNCTION
-  void sincos_wrapper(float x, float* sin_, float *cos_) {
-  #ifdef __SYCL_DEVICE_ONLY__
-    *sin_ = sycl::sincos(x, cos_);
-  #else
-    sincosf(x, sin_, cos_);
-  #endif
-  }

#ifdef TIMING_INFO
  double* timers;
  timespec starttime, endtime;
@ -207,7 +220,7 @@ inline
  int twojmax, diagonalstyle;
-  t_sna_3d_ll blist;
+  t_sna_3d blist;
  t_sna_3c_ll ulisttot;
  t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
  t_sna_3c_ll zlist;

File diff suppressed because it is too large.

View File

@ -628,7 +628,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
  chemflag = 0;
  bnormflag = 0;
  wselfallflag = 0;
-  chunksize = 4096;
+  chunksize = 32768;
+  parallel_thresh = 8192;

  // open SNAP parameter file on proc 0
@ -696,6 +697,8 @@ void PairSNAP::read_files(char *coefffilename, char *paramfilename)
      wselfallflag = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
    else if (keywd == "chunksize")
      chunksize = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
+    else if (keywd == "parallelthresh")
+      parallel_thresh = utils::inumeric(FLERR,keyval.c_str(),false,lmp);
    else
      error->all(FLERR,"Unknown parameter '{}' in SNAP "
                 "parameter file", keywd);

View File

@ -59,7 +59,7 @@ class PairSNAP : public Pair {
  double **scale;         // for thermodynamic integration
  int twojmax, switchflag, bzeroflag, bnormflag;
  int chemflag, wselfallflag;
-  int chunksize;
+  int chunksize,parallel_thresh;
  double rfac0, rmin0, wj1, wj2;
  int rcutfacflag, twojmaxflag; // flags for required parameters
  int beta_max;                 // length of beta

View File

@ -20,7 +20,7 @@ charges (dsf and long-range treatment of charges)
out-of-plane angle

See the file doc/drude_tutorial.html for getting started.

-See the doc pages for "pair_style buck6d/coul/gauss", "anlge_style class2",
+See the doc pages for "pair_style buck6d/coul/gauss", "angle_style class2",
"angle_style cosine/buck6d", and "improper_style inversion/harmonic"
commands to get started. Also see the above mentioned website and
literature for further documentation about the force field.

View File

@ -34,6 +34,7 @@ exclude:
  - lib/hdnnp
  - lib/kim
  - lib/kokkos
+  - lib/latte
  - lib/machdyn
  - lib/mdi
  - lib/mscg
@ -41,6 +42,7 @@ exclude:
  - lib/plumed
  - lib/quip
  - lib/scafacos
+  - lib/voronoi
  - src/Make.sh
patterns:
  - "*.c"