Merge branch 'develop' of https://www.github.com/lammps/lammps into kmc

2024-12-19 11:29:52 -07:00
parent 82569f4448 aeb2190582
commit acb107af5e
39 changed files with 1810 additions and 6077 deletions
--- a/doc/doxygen/Doxyfile.in
+++ b/doc/doxygen/Doxyfile.in
@ -2,7 +2,7 @@

 DOXYFILE_ENCODING      = UTF-8
 PROJECT_NAME           = "LAMMPS Programmer's Guide"
-PROJECT_NUMBER         = "4 May 2022"
+PROJECT_NUMBER         = "19 November 2024"
 PROJECT_BRIEF          = "Documentation of the LAMMPS library interface and Python wrapper"
 PROJECT_LOGO           = lammps-logo.png
 CREATE_SUBDIRS         = NO
--- a/doc/src/Intro_authors.rst
+++ b/doc/src/Intro_authors.rst
@ -8,6 +8,8 @@ send an email to all of them at this address: "developers at
 lammps.org".  General questions about LAMMPS should be posted in the
 `LAMMPS forum on MatSci <https://matsci.org/lammps/>`_.

+.. We need to keep this file in sync with https://www.lammps.org/authors.html
+
 .. raw:: latex

   \small
@ -27,7 +29,7 @@ lammps.org".  General questions about LAMMPS should be posted in the
   * - `Steve Plimpton <sjp_>`_
     - SNL (retired)
     - sjplimp at gmail.com
-     - MD kernels, parallel algorithms & scalability, code structure and design
+     - original author, MD kernels, parallel algorithms & scalability, code structure and design
   * - `Aidan Thompson <at_>`_
     - SNL
     - athomps at sandia.gov
--- a/doc/src/Packages_details.rst
+++ b/doc/src/Packages_details.rst
@ -2789,14 +2789,15 @@ implements smoothed particle hydrodynamics (SPH) for liquids.  See the
 related :ref:`MACHDYN package <PKG-MACHDYN>` package for smooth Mach dynamics
 (SMD) for solids.

-This package contains ideal gas, Lennard-Jones equation of states,
-Tait, and full support for complete (i.e. internal-energy dependent)
-equations of state.  It allows for plain or Monaghans XSPH integration
-of the equations of motion.  It has options for density continuity or
-density summation to propagate the density field.  It has
-:doc:`set <set>` command options to set the internal energy and density
-of particles from the input script and allows the same quantities to
-be output with thermodynamic output or to dump files via the :doc:`compute property/atom <compute_property_atom>` command.
+This package contains ideal gas, Lennard-Jones, and Tait equations of
+state, and full support for complete (i.e. internal-energy dependent)
+equations of state.  It allows for plain or Monaghan's XSPH integration
+of the equations of motion.  It has options for density continuity or
+density summation to propagate the density field.  It has
+:doc:`set <set>` command options to set the internal energy and density
+of particles from the input script and allows the same quantities to be
+output with thermodynamic output or to dump files via the
+:doc:`compute property/atom <compute_property_atom>` command.

 **Author:** Georg Ganzenmuller (Fraunhofer-Institute for High-Speed
 Dynamics, Ernst Mach Institute, Germany).
@ -2809,6 +2810,17 @@ Dynamics, Ernst Mach Institute, Germany).
 * ``examples/PACKAGES/sph``
 * https://www.lammps.org/movies.html#sph

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
+.. note::
+
+   Please also note that the :ref:`RHEO package <PKG-RHEO>` offers
+   similar functionality in a more modern and flexible implementation.
+
 ----------

 .. _PKG-SPIN:
--- a/doc/src/bond_bpm_rotational.rst
+++ b/doc/src/bond_bpm_rotational.rst
@ -155,7 +155,7 @@ page on BPMs.
 If the *break* keyword is set to *no*, LAMMPS assumes bonds should not break
 during a simulation run. This will prevent some unnecessary calculation.
 The recommended bond communication distance no longer depends on bond failure
-coefficients (which are ignored) but instead corresponds to the typical heurestic
+coefficients (which are ignored) but instead corresponds to the typical heuristic
 maximum strain used by typical non-bpm bond styles. Similar behavior to *break no*
 can also be attained by setting arbitrarily high values for all four failure
 coefficients. One cannot use *break no* with *smooth yes*.
--- a/doc/src/bond_bpm_spring.rst
+++ b/doc/src/bond_bpm_spring.rst
@ -119,7 +119,7 @@ If the *break* keyword is set to *no*, LAMMPS assumes bonds should not break
 during a simulation run. This will prevent some unnecessary calculation.
 The recommended bond communication distance no longer depends on the value of
 :math:`\epsilon_c` (which is ignored) but instead corresponds to the typical
-heurestic maximum strain used by typical non-bpm bond styles. Similar behavior
+heuristic maximum strain used by typical non-bpm bond styles. Similar behavior
 to *break no* can also be attained by setting an arbitrarily high value of
 :math:`\epsilon_c`. One cannot use *break no* with *smooth yes*.

--- a/doc/src/compute_sph_e_atom.rst
+++ b/doc/src/compute_sph_e_atom.rst
@ -33,6 +33,12 @@ particle.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The value of the internal energy will be 0.0 for atoms not in the
 specified compute group.

--- a/doc/src/compute_sph_rho_atom.rst
+++ b/doc/src/compute_sph_rho_atom.rst
@ -32,6 +32,12 @@ kernel function interpolation using "pair style sph/rhosum".
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The value of the SPH density will be 0.0 for atoms not in the
 specified compute group.

--- a/doc/src/compute_sph_t_atom.rst
+++ b/doc/src/compute_sph_t_atom.rst
@ -37,6 +37,12 @@ particles, i.e. a Smooth-Particle Hydrodynamics particle.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The value of the internal energy will be 0.0 for atoms not in the
 specified compute group.

--- a/doc/src/fix_rheo.rst
+++ b/doc/src/fix_rheo.rst
@ -64,7 +64,7 @@ Description

 Perform time integration for RHEO particles, updating positions, velocities,
 and densities. For a detailed breakdown of the integration timestep and
-numerical details, see :ref:`(Palermo) <rheo_palermo>`. For an overview
+numerical details, see :ref:`(Palermo) <fix_rheo_palermo>`. For an overview
 and list of other features available in the RHEO package, see
 :doc:`the RHEO howto <Howto_rheo>`.

@ -101,7 +101,7 @@ A modified form of Fickian particle shifting can be enabled with the
 more uniform spatial distribution. By default, shifting does not consider the
 type of a particle and therefore may be inappropriate in systems consisting
 of multiple atom types representing multiple fluid phases. However, two
-optional subarguments can follow the *shift* keyword, *exclude/type* and
+optional sub-arguments can follow the *shift* keyword, *exclude/type* and
 *scale/cross/type* to adjust shifting at fluid interfaces.

 The *exclude/type* option lets the user specify a list of atom types which
@ -155,7 +155,7 @@ threshold for this classification is set by the numerical value of
 By default, RHEO integrates particles' densities using a mass diffusion
 equation. Alternatively, one can update densities every timestep by performing
 a kernel summation of the masses of neighboring particles by specifying the *rho/sum*
-keyword. Following this keyword, one may include the optional *self/mass* subargument
+keyword. Following this keyword, one may include the optional *self/mass* sub-argument
 which modifies the behavior of the density summation. Typically, the density
 :math:`\rho` of a particle is calculated as the sum over neighbors

@ -218,11 +218,11 @@ Default

 ----------

-.. _rheo_palermo:
+.. _fix_rheo_palermo:

 **(Palermo)** Palermo, Wolf, Clemmer, O'Connor, Phys. Fluids, 36, 113337 (2024).

-.. _rheo_yang:
+.. _fix_rheo_yang:

 **(Yang)** Yang, Rakhsha, Hu, Negrut, J. Comp. Physics, 458, 111079 (2022).

--- a/doc/src/fix_sph.rst
+++ b/doc/src/fix_sph.rst
@ -32,6 +32,12 @@ Hydrodynamics.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 Restart, fix_modify, output, run start/stop, minimize info
 """""""""""""""""""""""""""""""""""""""""""""""""""""""""""

--- a/doc/src/fix_sph_stationary.rst
+++ b/doc/src/fix_sph_stationary.rst
@ -32,6 +32,12 @@ space.  SPH stands for Smoothed Particle Hydrodynamics.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 Restart, fix_modify, output, run start/stop, minimize info
 """""""""""""""""""""""""""""""""""""""""""""""""""""""""""

--- a/doc/src/pair_sph_heatconduction.rst
+++ b/doc/src/pair_sph_heatconduction.rst
@ -30,6 +30,12 @@ The transport model is the diffusion equation for the internal energy.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/pair_sph_idealgas.rst
+++ b/doc/src/pair_sph_idealgas.rst
@ -36,6 +36,12 @@ particles from interpenetrating :ref:`(Monaghan) <ideal-Monoghan>`.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/pair_sph_lj.rst
+++ b/doc/src/pair_sph_lj.rst
@ -34,6 +34,12 @@ interpenetrating :ref:`(Monaghan) <Monoghan>`.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/pair_sph_rhosum.rst
+++ b/doc/src/pair_sph_rhosum.rst
@ -29,6 +29,12 @@ SPH particles by kernel function interpolation, every Nstep timesteps.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/pair_sph_taitwater.rst
+++ b/doc/src/pair_sph_taitwater.rst
@ -41,6 +41,12 @@ prevent particles from interpenetrating :ref:`(Monaghan) <Monaghan>`.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/pair_sph_taitwater_morris.rst
+++ b/doc/src/pair_sph_taitwater_morris.rst
@ -37,6 +37,12 @@ This pair style also computes laminar viscosity :ref:`(Morris) <Morris>`.
 See `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in
 LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 The following coefficients must be defined for each pair of atoms
 types via the :doc:`pair_coeff <pair_coeff>` command as in the examples
 above.
--- a/doc/src/set.rst
+++ b/doc/src/set.rst
@ -516,6 +516,12 @@ Keywords *sph/e*, *sph/cv*, and *sph/rho* set the energy, heat capacity,
 and density of smoothed particle hydrodynamics (SPH) particles.  See
 `this PDF guide <PDF/SPH_LAMMPS_userguide.pdf>`_ to using SPH in LAMMPS.

+.. note::
+
+   Please note that the SPH PDF guide file has not been updated for
+   many years and thus does not reflect the current *syntax* of the
+   SPH package commands. For that please refer to the LAMMPS manual.
+
 Keyword *smd/mass/density* sets the mass of all selected particles, but
 it is only applicable to the Smooth Mach Dynamics package MACHDYN.  It
 assumes that the particle volume has already been correctly set and
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@ -2499,6 +2499,7 @@ neel
 Neel
 Neelov
 Negre
+Negrut
 nelem
 Nelement
 Nelements
@ -3116,6 +3117,7 @@ Rafferty
 rahman
 Rahman
 Rajamanickam
+Rakhsha
 Ralf
 Raman
 ramped
--- a/examples/comb/README
+++ b/examples/comb/README
@ -20,5 +20,3 @@ Examples:
 4. in.comb.Cu2O.elastic: Cu2O crystal, qeq on, minimizes, then calculates
   elastic constants
 5. in.comb.HfO2: HfO2 polymorphs: Monoclinic HfO2 NVT @ 300K
-6. in.comb.CuaS: Metallic Cu and amorphous silica interface, qeq on,
-	five step NVE run
--- a/examples/comb/data.CuaS
+++ b/examples/comb/data.CuaS
--- a/src/COLVARS/colvarproxy_lammps.cpp
+++ b/src/COLVARS/colvarproxy_lammps.cpp
@ -204,7 +204,7 @@ cvm::rvector colvarproxy_lammps::position_distance(cvm::atom_pos const &pos1,
  double xtmp = pos2.x - pos1.x;
  double ytmp = pos2.y - pos1.y;
  double ztmp = pos2.z - pos1.z;
-  _lmp->domain->minimum_image(xtmp,ytmp,ztmp);
+  _lmp->domain->minimum_image_big(xtmp,ytmp,ztmp);
  return {xtmp, ytmp, ztmp};
 }
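
Reviewer note on the hunk above: the change swaps `minimum_image` for `minimum_image_big`. Assuming, as the name suggests, that the `_big` variant is meant for displacements that may span more than one box length, the difference can be sketched for a single orthogonal periodic dimension as follows. Both helper names are hypothetical illustrations and are not LAMMPS functions; the real `Domain` methods also handle triclinic boxes and per-dimension periodicity flags.

```cpp
#include <cmath>

// Hypothetical helper (not LAMMPS code): wrap one displacement component
// with at most a single box-length shift. The result is only a true
// minimum image when |dx| is smaller than 1.5 box lengths.
inline double wrap_single_shift(double dx, double box_length) {
  if (dx > 0.5 * box_length) dx -= box_length;
  else if (dx < -0.5 * box_length) dx += box_length;
  return dx;
}

// Hypothetical helper (not LAMMPS code): wrap a displacement of any
// magnitude by subtracting the nearest whole number of box lengths.
inline double wrap_any_distance(double dx, double box_length) {
  return dx - box_length * std::round(dx / box_length);
}
```

With a box length of 10, a displacement of 26 is left at 16 by the single-shift version but reduced to -4 by the rounding version, which is the kind of case a collective-variable proxy can hit when atom positions are not wrapped into the primary cell.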

--- a/src/KOKKOS/pair_snap_kokkos.h
+++ b/src/KOKKOS/pair_snap_kokkos.h
@ -30,29 +30,34 @@ PairStyle(snap/kk/host,PairSNAPKokkosDevice<LMPHostType>);
 #include "pair_snap.h"
 #include "kokkos_type.h"
 #include "neigh_list_kokkos.h"
-#include "sna_kokkos.h"
 #include "pair_kokkos.h"

+namespace LAMMPS_NS {
+// pre-declare so sna_kokkos.h can refer to it
+template<class DeviceType, typename real_type_, int vector_length_> class PairSNAPKokkos;
+}
+
+#include "sna_kokkos.h"
+
 namespace LAMMPS_NS {

 // Routines for both the CPU and GPU backend
+struct TagPairSNAPPreUi{};
+struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
+template <bool chemsnap> struct TagPairSNAPComputeZi{};
+template <bool chemsnap> struct TagPairSNAPComputeBi{};
+struct TagPairSNAPComputeBetaLinear{};
+struct TagPairSNAPComputeBetaQuadratic{};
+template <bool chemsnap> struct TagPairSNAPComputeYi{};
+template <bool chemsnap> struct TagPairSNAPComputeYiWithZlist{};
 template<int NEIGHFLAG, int EVFLAG>
 struct TagPairSNAPComputeForce{};

-
 // GPU backend only
 struct TagPairSNAPComputeNeigh{};
 struct TagPairSNAPComputeCayleyKlein{};
-struct TagPairSNAPPreUi{};
 struct TagPairSNAPComputeUiSmall{}; // more parallelism, more divergence
 struct TagPairSNAPComputeUiLarge{}; // less parallelism, no divergence
-struct TagPairSNAPTransformUi{}; // re-order ulisttot from SoA to AoSoA, zero ylist
-struct TagPairSNAPComputeZi{};
-struct TagPairSNAPBeta{};
-struct TagPairSNAPComputeBi{};
-struct TagPairSNAPTransformBi{}; // re-order blist from AoSoA to AoS
-struct TagPairSNAPComputeYi{};
-struct TagPairSNAPComputeYiWithZlist{};
 template<int dir>
 struct TagPairSNAPComputeFusedDeidrjSmall{}; // more parallelism, more divergence
 template<int dir>
@ -60,14 +65,7 @@ struct TagPairSNAPComputeFusedDeidrjLarge{}; // less parallelism, no divergence

 // CPU backend only
 struct TagPairSNAPComputeNeighCPU{};
-struct TagPairSNAPPreUiCPU{};
 struct TagPairSNAPComputeUiCPU{};
-struct TagPairSNAPTransformUiCPU{};
-struct TagPairSNAPComputeZiCPU{};
-struct TagPairSNAPBetaCPU{};
-struct TagPairSNAPComputeBiCPU{};
-struct TagPairSNAPZeroYiCPU{};
-struct TagPairSNAPComputeYiCPU{};
 struct TagPairSNAPComputeDuidrjCPU{};
 struct TagPairSNAPComputeDeidrjCPU{};

@ -80,6 +78,8 @@ class PairSNAPKokkos : public PairSNAP {
  typedef ArrayTypes<DeviceType> AT;
  typedef EV_FLOAT value_type;

+  static constexpr LAMMPS_NS::ExecutionSpace execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
+  static constexpr int host_flag = (execution_space == LAMMPS_NS::Host);
  static constexpr int vector_length = vector_length_;
  using real_type = real_type_;
  using complex = SNAComplex<real_type>;
@ -93,9 +93,11 @@ class PairSNAPKokkos : public PairSNAP {
  static constexpr int team_size_compute_ui = 2;
  static constexpr int tile_size_transform_ui = 2;
  static constexpr int tile_size_compute_zi = 2;
+  static constexpr int min_blocks_compute_zi = 0; // no minimum bound
  static constexpr int tile_size_compute_bi = 2;
-  static constexpr int tile_size_transform_bi = 2;
+  static constexpr int tile_size_compute_beta = 2;
  static constexpr int tile_size_compute_yi = 2;
+  static constexpr int min_blocks_compute_yi = 0; // no minimum bound
  static constexpr int team_size_compute_fused_deidrj = 2;
 #elif defined(KOKKOS_ENABLE_SYCL)
  static constexpr int team_size_compute_neigh = 4;
@ -104,9 +106,11 @@ class PairSNAPKokkos : public PairSNAP {
  static constexpr int team_size_compute_ui = 8;
  static constexpr int tile_size_transform_ui = 8;
  static constexpr int tile_size_compute_zi = 4;
+  static constexpr int min_blocks_compute_zi = 0; // no minimum bound
  static constexpr int tile_size_compute_bi = 4;
-  static constexpr int tile_size_transform_bi = 4;
+  static constexpr int tile_size_compute_beta = 8;
  static constexpr int tile_size_compute_yi = 8;
+  static constexpr int min_blocks_compute_yi = 0; // no minimum bound
  static constexpr int team_size_compute_fused_deidrj = 4;
 #else
  static constexpr int team_size_compute_neigh = 4;
@ -116,17 +120,21 @@ class PairSNAPKokkos : public PairSNAP {
  static constexpr int tile_size_transform_ui = 4;
  static constexpr int tile_size_compute_zi = 8;
  static constexpr int tile_size_compute_bi = 4;
-  static constexpr int tile_size_transform_bi = 4;
+  static constexpr int tile_size_compute_beta = 4;
  static constexpr int tile_size_compute_yi = 8;
  static constexpr int team_size_compute_fused_deidrj = sizeof(real_type) == 4 ? 4 : 2;
+
+  // this empirically reduces perf fluctuations from compiler version to compiler version
+  static constexpr int min_blocks_compute_zi = 4;
+  static constexpr int min_blocks_compute_yi = 4;
 #endif

  // Custom MDRangePolicy, Rank3, to reduce verbosity of kernel launches
  // This hides the Kokkos::IndexType<int> and Kokkos::Rank<3...>
  // and reduces the verbosity of the LaunchBound by hiding the explicit
  // multiplication by vector_length
-  template <class Device, int num_tiles, class TagPairSNAP>
-  using Snap3DRangePolicy = typename Kokkos::MDRangePolicy<Device, Kokkos::IndexType<int>, Kokkos::Rank<3, Kokkos::Iterate::Left, Kokkos::Iterate::Left>, Kokkos::LaunchBounds<vector_length * num_tiles>, TagPairSNAP>;
+  template <class Device, int num_tiles, class TagPairSNAP, int min_blocks = 0>
+  using Snap3DRangePolicy = typename Kokkos::MDRangePolicy<Device, Kokkos::IndexType<int>, Kokkos::Rank<3, Kokkos::Iterate::Left, Kokkos::Iterate::Left>, Kokkos::LaunchBounds<vector_length * num_tiles, min_blocks>, TagPairSNAP>;

  // Custom SnapAoSoATeamPolicy to reduce the verbosity of kernel launches
  // This hides the LaunchBounds abstraction by hiding the explicit
@ -134,6 +142,29 @@ class PairSNAPKokkos : public PairSNAP {
  template <class Device, int num_teams, class TagPairSNAP>
  using SnapAoSoATeamPolicy = typename Kokkos::TeamPolicy<Device, Kokkos::LaunchBounds<vector_length * num_teams>, TagPairSNAP>;

+  // Custom MDRangePolicy, Rank2, on the host, to reduce verbosity of kernel launches. The striding of this launch is intentionally
+  // different from the tiled 3D range policy on the device.
+  template <class Device, class TagPairSNAP>
+  using Snap2DHostRangePolicy = typename Kokkos::MDRangePolicy<Device, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<int>, Kokkos::Rank<2, Kokkos::Iterate::Right, Kokkos::Iterate::Right>, TagPairSNAP>;
+
+  // Custom 1D RangePolicy on the host, to reduce verbosity of kernel launches
+  template <class Device, class TagPairSNAP>
+  using Snap1DHostRangePolicy = typename Kokkos::RangePolicy<Device, Kokkos::Schedule<Kokkos::Dynamic>, TagPairSNAP>;
+
+  // Helper routine that returns a CPU or a GPU policy as appropriate
+  template <class Device, int num_tiles, class TagPairSNAP, int min_blocks = 0>
+  auto snap_get_policy(const int& chunk_size_div, const int& second_loop) {
+    if constexpr (host_flag) {
+      return Snap1DHostRangePolicy<Device, TagPairSNAP>(0, chunk_size_div * vector_length);
+
+      // the 2-d policy is still correct but it has atomics so it's slower on the CPU
+      //return Snap2DHostRangePolicy<Device, TagPairSNAP>({0, 0}, {chunk_size_div * vector_length, second_loop});
+    } else
+      return Snap3DRangePolicy<Device, num_tiles, TagPairSNAP, min_blocks>({0, 0, 0},
+                                                                   {vector_length, second_loop, chunk_size_div},
+                                                                   {vector_length, num_tiles, 1});
+  }
+
  PairSNAPKokkos(class LAMMPS *);
  ~PairSNAPKokkos() override;

@ -149,6 +180,7 @@ class PairSNAPKokkos : public PairSNAP {
  template<class TagStyle>
  void check_team_size_reduce(int, int&);

+  // CPU and GPU backend
  template<int NEIGHFLAG, int EVFLAG>
  KOKKOS_INLINE_FUNCTION
  void operator() (TagPairSNAPComputeForce<NEIGHFLAG,EVFLAG>,const int& ii) const;
@ -157,18 +189,23 @@ class PairSNAPKokkos : public PairSNAP {
  KOKKOS_INLINE_FUNCTION
  void operator() (TagPairSNAPComputeForce<NEIGHFLAG,EVFLAG>,const int& ii, EV_FLOAT&) const;

-  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPBetaCPU,const int& ii) const;
-
  // GPU backend only
  KOKKOS_INLINE_FUNCTION
  void operator() (TagPairSNAPComputeNeigh,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeNeigh>::member_type& team) const;

+  // GPU backend only
  KOKKOS_INLINE_FUNCTION
  void operator() (TagPairSNAPComputeCayleyKlein, const int iatom_mod, const int jnbor, const int iatom_div) const;

+  // CPU and GPU
  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPPreUi,const int iatom_mod, const int j, const int iatom_div) const;
+  void operator() (TagPairSNAPPreUi, const int& iatom_mod, const int& j, const int& iatom_div) const;
+
+  KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPPreUi, const int& iatom, const int& j) const;
+
+  KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPPreUi, const int& iatom) const;

  KOKKOS_INLINE_FUNCTION
  void operator() (TagPairSNAPComputeUiSmall,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiSmall>::member_type& team) const;
@ -177,25 +214,67 @@ class PairSNAPKokkos : public PairSNAP {
  void operator() (TagPairSNAPComputeUiLarge,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiLarge>::member_type& team) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPTransformUi,const int iatom_mod, const int j, const int iatom_div) const;
+  void operator() (TagPairSNAPTransformUi, const int& iatom_mod, const int& idxu, const int& iatom_div) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeZi,const int iatom_mod, const int idxz, const int iatom_div) const;
+  void operator() (TagPairSNAPTransformUi, const int& iatom, const int& idxu) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPBeta, const int& ii) const;
+  void operator() (TagPairSNAPTransformUi, const int& iatom) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeZi<chemsnap>, const int& iatom_mod, const int& idxz, const int& iatom_div) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeZi<chemsnap>, const int& iatom, const int& idxz) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeZi<chemsnap>, const int& iatom) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeBi<chemsnap>, const int& iatom_mod, const int& idxb, const int& iatom_div) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeBi<chemsnap>, const int& iatom, const int& idxb) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeBi<chemsnap>, const int& iatom) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeBi,const int iatom_mod, const int idxb, const int iatom_div) const;
+  void operator() (TagPairSNAPComputeBetaLinear, const int& iatom_mod, const int& idxb, const int& iatom_div) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPTransformBi,const int iatom_mod, const int idxb, const int iatom_div) const;
+  void operator() (TagPairSNAPComputeBetaLinear, const int& iatom, const int& idxb) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeYi,const int iatom_mod, const int idxz, const int iatom_div) const;
+  void operator() (TagPairSNAPComputeBetaLinear, const int& iatom) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeYiWithZlist,const int iatom_mod, const int idxz, const int iatom_div) const;
+  void operator() (TagPairSNAPComputeBetaQuadratic, const int& iatom_mod, const int& idxb, const int& iatom_div) const;
+
+  KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeBetaQuadratic, const int& iatom, const int& idxb) const;
+
+  KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeBetaQuadratic, const int& iatom) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYi<chemsnap>, const int& iatom_mod, const int& idxz, const int& iatom_div) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYi<chemsnap>, const int& iatom, const int& idxz) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYi<chemsnap>, const int& iatom) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYiWithZlist<chemsnap>, const int& iatom_mod, const int& idxz, const int& iatom_div) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYiWithZlist<chemsnap>, const int& iatom, const int& idxz) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void operator() (TagPairSNAPComputeYiWithZlist<chemsnap>, const int& iatom) const;

  template<int dir>
  KOKKOS_INLINE_FUNCTION
@ -210,28 +289,22 @@ class PairSNAPKokkos : public PairSNAP {
  void operator() (TagPairSNAPComputeNeighCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeNeighCPU>::member_type& team) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPPreUiCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPPreUiCPU>::member_type& team) const;
+  void operator() (TagPairSNAPComputeUiCPU, const int& iatom, const int& jnbor) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeUiCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeUiCPU>::member_type& team) const;
+  void operator() (TagPairSNAPComputeUiCPU, const int& iatom) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPTransformUiCPU, const int j, const int iatom) const;
+  void operator() (TagPairSNAPComputeDuidrjCPU, const int& iatom, const int& jnbor) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeZiCPU,const int& ii) const;
+  void operator() (TagPairSNAPComputeDuidrjCPU, const int& iatom) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeBiCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeBiCPU>::member_type& team) const;
+  void operator() (TagPairSNAPComputeDeidrjCPU, const int& iatom, const int& jnbor) const;

  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeYiCPU,const int& ii) const;
-
-  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeDuidrjCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeDuidrjCPU>::member_type& team) const;
-
-  KOKKOS_INLINE_FUNCTION
-  void operator() (TagPairSNAPComputeDeidrjCPU,const typename Kokkos::TeamPolicy<DeviceType, TagPairSNAPComputeDeidrjCPU>::member_type& team) const;
+  void operator() (TagPairSNAPComputeDeidrjCPU, const int& iatom) const;

  template<int NEIGHFLAG>
  KOKKOS_INLINE_FUNCTION
@ -252,7 +325,7 @@ class PairSNAPKokkos : public PairSNAP {
  SNAKokkos<DeviceType, real_type, vector_length> snaKK;

  int inum,max_neighs,chunk_size,chunk_offset;
-  int host_flag,neighflag;
+  int neighflag;

  int eflag,vflag;

@ -260,13 +333,12 @@ class PairSNAPKokkos : public PairSNAP {

  Kokkos::View<real_type*, DeviceType> d_radelem;              // element radii
  Kokkos::View<real_type*, DeviceType> d_wjelem;               // elements weights
-  Kokkos::View<real_type**, Kokkos::LayoutRight, DeviceType> d_coeffelem;           // element bispectrum coefficients
+  typename SNAKokkos<DeviceType, real_type, vector_length>::t_sna_2d_lr d_coeffelem; // element bispectrum coefficients
  Kokkos::View<real_type*, DeviceType> d_sinnerelem;           // element inner cutoff midpoint
  Kokkos::View<real_type*, DeviceType> d_dinnerelem;           // element inner cutoff half-width
  Kokkos::View<T_INT*, DeviceType> d_map;                    // mapping from atom types to elements
  Kokkos::View<T_INT*, DeviceType> d_ninside;                // ninside for all atoms in list
-  Kokkos::View<real_type**, DeviceType> d_beta;                // betas for all atoms in list
-  Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> d_beta_pack;          // betas for all atoms in list, GPU
+  typename SNAKokkos<DeviceType, real_type, vector_length>::t_sna_2d d_beta;                // betas for all atoms in list

  typedef Kokkos::DualView<F_FLOAT**, DeviceType> tdual_fparams;
  tdual_fparams k_cutsq;
@ -301,6 +373,9 @@ class PairSNAPKokkos : public PairSNAP {
  template <typename scratch_type>
  int scratch_size_helper(int values_per_team);

+  // Make SNAKokkos a friend
+  friend class SNAKokkos<DeviceType, real_type, vector_length>;
+
 };
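
Reviewer note on `snap_get_policy` in the header above: the helper relies on `if constexpr` over a compile-time `host_flag` so that one `auto` function can return two unrelated policy types. The following is a minimal sketch of that pattern without Kokkos. `HostPolicy1D`, `DevicePolicy3D`, and `PolicySelector` are hypothetical stand-ins for the Kokkos policy aliases and for `PairSNAPKokkos`; only the branching structure and the extents mirror the diff.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the host 1D range policy and the tiled
// device 3D range policy; they only record the launch extents.
struct HostPolicy1D { int begin, end; };
struct DevicePolicy3D { int lower[3]; int upper[3]; int tile[3]; };

// Hypothetical stand-in for the pair style class: host_flag and
// vector_length are compile-time constants, as in the diff.
template <bool host_flag, int vector_length>
struct PolicySelector {
  // The return type is deduced from whichever branch survives; the
  // discarded branch is never instantiated, so the types may differ.
  template <int num_tiles>
  static auto get_policy(int chunk_size_div, int second_loop) {
    if constexpr (host_flag) {
      (void) second_loop;  // unused by the flat host loop
      return HostPolicy1D{0, chunk_size_div * vector_length};
    } else {
      return DevicePolicy3D{{0, 0, 0},
                            {vector_length, second_loop, chunk_size_div},
                            {vector_length, num_tiles, 1}};
    }
  }
};
```

Because the flag is a template parameter rather than a runtime member, each instantiation compiles only the branch it needs, which is consistent with the diff replacing the `int host_flag` data member by a `static constexpr` value.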


--- a/src/KOKKOS/pair_snap_kokkos_impl.h
+++ b/src/KOKKOS/pair_snap_kokkos_impl.h
--- a/src/KOKKOS/sna_kokkos.h
+++ b/src/KOKKOS/sna_kokkos.h
@ -134,6 +134,8 @@ class SNAKokkos {
  static constexpr int vector_length = vector_length_;

  using KKDeviceType = typename KKDevice<DeviceType>::value;
+  static constexpr LAMMPS_NS::ExecutionSpace execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
+  static constexpr int host_flag = (execution_space == LAMMPS_NS::Host);

  typedef Kokkos::View<int*, DeviceType> t_sna_1i;
  typedef Kokkos::View<real_type*, DeviceType> t_sna_1d;
@ -141,6 +143,7 @@ class SNAKokkos {
  typedef Kokkos::View<int**, DeviceType> t_sna_2i;
  typedef Kokkos::View<real_type**, DeviceType> t_sna_2d;
  typedef Kokkos::View<real_type**, Kokkos::LayoutLeft, DeviceType> t_sna_2d_ll;
+  typedef Kokkos::View<real_type**, Kokkos::LayoutRight, DeviceType> t_sna_2d_lr;
  typedef Kokkos::View<real_type***, DeviceType> t_sna_3d;
  typedef Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> t_sna_3d_ll;
  typedef Kokkos::View<real_type***[3], DeviceType> t_sna_4d;
@ -156,7 +159,7 @@ class SNAKokkos {
  typedef Kokkos::View<complex***, DeviceType> t_sna_3c;
  typedef Kokkos::View<complex***, Kokkos::LayoutLeft, DeviceType> t_sna_3c_ll;
  typedef Kokkos::View<complex***[3], DeviceType> t_sna_4c;
-  typedef Kokkos::View<complex***[3], Kokkos::LayoutLeft, DeviceType> t_sna_4c3_ll;
+  typedef Kokkos::View<complex***[3], DeviceType> t_sna_4c3;
  typedef Kokkos::View<complex****, Kokkos::LayoutLeft, DeviceType> t_sna_4c_ll;
  typedef Kokkos::View<complex**[3], DeviceType> t_sna_3c3;
  typedef Kokkos::View<complex*****, DeviceType> t_sna_5c;
@ -168,7 +171,8 @@ class SNAKokkos {
  SNAKokkos(const SNAKokkos<DeviceType,real_type,vector_length>& sna, const typename Kokkos::TeamPolicy<DeviceType>::member_type& team);

  inline
-  SNAKokkos(real_type, int, real_type, int, int, int, int, int, int, int);
+  //SNAKokkos(real_type, int, real_type, int, int, int, int, int, int, int);
+  SNAKokkos(const PairSNAPKokkos<DeviceType, real_type, vector_length>&);

  KOKKOS_INLINE_FUNCTION
  ~SNAKokkos();
@ -182,88 +186,87 @@ class SNAKokkos {
  double memory_usage();

  int ncoeff;
-  int host_flag;

  // functions for bispectrum coefficients, GPU only
  KOKKOS_INLINE_FUNCTION
-  void compute_cayley_klein(const int&, const int&, const int&);
+  void compute_cayley_klein(const int&, const int&) const;
  KOKKOS_INLINE_FUNCTION
-  void pre_ui(const int&, const int&, const int&, const int&); // ForceSNAP
+  void pre_ui(const int&, const int&, const int&) const; // ForceSNAP

  // version of the code with parallelism over j_bend
  KOKKOS_INLINE_FUNCTION
-  void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); // ForceSNAP
+  void compute_ui_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int) const; // ForceSNAP
  // version of the code without parallelism over j_bend
  KOKKOS_INLINE_FUNCTION
-  void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); // ForceSNAP
+  void compute_ui_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int) const; // ForceSNAP

+  // desymmetrize ulisttot
  KOKKOS_INLINE_FUNCTION
-  void compute_zi(const int&, const int&, const int&);    // ForceSNAP
+  void transform_ui(const int&, const int&) const;
+
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void compute_zi(const int&, const int&) const;    // ForceSNAP
+  template <bool chemsnap, bool need_atomics> KOKKOS_INLINE_FUNCTION
+  void compute_yi(const int&, const int&) const; // ForceSNAP
+  template <bool chemsnap, bool need_atomics> KOKKOS_INLINE_FUNCTION
+  void compute_yi_with_zlist(const int&, const int&) const; // ForceSNAP
+  template <bool chemsnap> KOKKOS_INLINE_FUNCTION
+  void compute_bi(const int&, const int&) const;    // ForceSNAP
  KOKKOS_INLINE_FUNCTION
-  void compute_yi(int,int,int,
-   const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &beta_pack); // ForceSNAP
-  KOKKOS_INLINE_FUNCTION
-  void compute_yi_with_zlist(int,int,int,
-   const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &beta_pack); // ForceSNAP
-  KOKKOS_INLINE_FUNCTION
-  void compute_bi(const int&, const int&, const int&);    // ForceSNAP
+  void compute_beta_linear(const int&, const int&, const int&) const;
+  template <bool need_atomics> KOKKOS_INLINE_FUNCTION
+  void compute_beta_quadratic(const int&, const int&, const int&) const;

  // functions for derivatives, GPU only
  // version of the code with parallelism over j_bend
  template<int dir>
  KOKKOS_INLINE_FUNCTION
-  void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int); //ForceSNAP
+  void compute_fused_deidrj_small(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int, const int) const; //ForceSNAP
  // version of the code without parallelism over j_bend
  template<int dir>
  KOKKOS_INLINE_FUNCTION
-  void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int); //ForceSNAP
+  void compute_fused_deidrj_large(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, const int, const int, const int) const; //ForceSNAP

  // core "evaluation" functions that get plugged into "compute" functions
  // plugged into compute_ui_small, compute_ui_large
  KOKKOS_FORCEINLINE_FUNCTION
  void evaluate_ui_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&, const int&,
-                        const int&, const int&, const int&);
+                        const int&, const int&) const;
  // plugged into compute_zi, compute_yi
  KOKKOS_FORCEINLINE_FUNCTION
  complex evaluate_zi(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
-                        const int&, const int&, const int&, const int&, const real_type*);
-  // plugged into compute_yi, compute_yi_with_zlist
+                        const int&, const int&, const int&, const real_type*) const;
+  // plugged into compute_bi
  KOKKOS_FORCEINLINE_FUNCTION
-  real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&, const int&,
-                        const Kokkos::View<real_type***, Kokkos::LayoutLeft, DeviceType> &);
+  real_type evaluate_bi(const int&, const int&, const int&, const int&,
+                          const int&, const int&, const int&) const;
+  // plugged into compute_yi, compute_yi_with_zlist
+  template <bool chemsnap> KOKKOS_FORCEINLINE_FUNCTION
+  real_type evaluate_beta_scaled(const int&, const int&, const int&, const int&, const int&, const int&, const int&) const;
  // plugged into compute_fused_deidrj_small, compute_fused_deidrj_large
  KOKKOS_FORCEINLINE_FUNCTION
  real_type evaluate_duidrj_jbend(const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
                        const WignerWrapper<real_type, vector_length>&, const complex&, const complex&, const real_type&,
-                        const int&, const int&, const int&, const int&);
+                        const int&, const int&, const int&) const;

  // functions for bispectrum coefficients, CPU only
-  KOKKOS_INLINE_FUNCTION
-  void pre_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team,const int&,const int&); // ForceSNAP
-  KOKKOS_INLINE_FUNCTION
-  void compute_ui_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); // ForceSNAP
-  KOKKOS_INLINE_FUNCTION
-  void compute_zi_cpu(const int&);    // ForceSNAP
-  KOKKOS_INLINE_FUNCTION
-  void compute_yi_cpu(int,
-   const Kokkos::View<real_type**, DeviceType> &beta); // ForceSNAP
-    KOKKOS_INLINE_FUNCTION
-  void compute_bi_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int);    // ForceSNAP
+  template <bool need_atomics> KOKKOS_INLINE_FUNCTION
+  void compute_ui_cpu(const int&, const int&) const; // ForceSNAP

  // functions for derivatives, CPU only
  KOKKOS_INLINE_FUNCTION
-  void compute_duidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); //ForceSNAP
+  void compute_duidrj_cpu(const int&, const int&) const; //ForceSNAP
  KOKKOS_INLINE_FUNCTION
-  void compute_deidrj_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int); // ForceSNAP
+  void compute_deidrj_cpu(const int&, const int&) const; // ForceSNAP

  KOKKOS_INLINE_FUNCTION
-  real_type compute_sfac(real_type, real_type, real_type, real_type); // add_uarraytot, compute_duarray
+  real_type compute_sfac(real_type, real_type, real_type, real_type) const; // add_uarraytot, compute_duarray

  KOKKOS_INLINE_FUNCTION
-  real_type compute_dsfac(real_type, real_type, real_type, real_type); // compute_duarray
+  real_type compute_dsfac(real_type, real_type, real_type, real_type) const; // compute_duarray

  KOKKOS_INLINE_FUNCTION
-  void compute_s_dsfac(const real_type, const real_type, const real_type, const real_type, real_type&, real_type&); // compute_cayley_klein
+  void compute_s_dsfac(const real_type, const real_type, const real_type, const real_type, real_type&, real_type&) const; // compute_cayley_klein

 #ifdef TIMING_INFO
  double* timers;
@ -283,37 +286,41 @@ class SNAKokkos {
  t_sna_2d dinnerij;
  t_sna_2i element;
  t_sna_3d dedr;
-  int natom, nmax;
+  int natom, natom_pad, nmax;

  void grow_rij(int, int);

  int twojmax, diagonalstyle;

+  // Input beta coefficients; aliases the object in PairSNAPKokkos
+  t_sna_2d_lr d_coeffelem;
+
+  // Beta for all atoms in list; aliases the object in PairSNAPKokkos
+  // for qSNAP the quadratic terms get accumulated into it
+  // in compute_bi
+  t_sna_2d d_beta;
+
+  // Structures for both the CPU and GPU backends
+  t_sna_3d ulisttot_re;
+  t_sna_3d ulisttot_im;
+  t_sna_3c ulisttot; // un-folded ulisttot
+
+  t_sna_3c zlist;
  t_sna_3d blist;
-  t_sna_3c_ll ulisttot;
-  t_sna_3c_ll ulisttot_full; // un-folded ulisttot, cpu only
-  t_sna_3c_ll zlist;

-  t_sna_3c_ll ulist;
-  t_sna_3c_ll ylist;
+  t_sna_3d ylist_re;
+  t_sna_3d ylist_im;

-  // derivatives of data
-  t_sna_4c3_ll dulist;
+  // Structures for the CPU backend only
+  t_sna_3c ulist_cpu;
+  t_sna_4c3 dulist_cpu;

  // Modified structures for GPU backend
-  t_sna_3c_ll a_pack; // Cayley-Klein `a`
-  t_sna_3c_ll b_pack; // `b`
-  t_sna_4c_ll da_pack; // `da`
-  t_sna_4c_ll db_pack; // `db`
-  t_sna_4d_ll sfac_pack; // sfac, dsfac_{x,y,z}
-
-  t_sna_4d_ll ulisttot_re_pack; // split real,
-  t_sna_4d_ll ulisttot_im_pack; // imag, AoSoA, flattened
-  t_sna_4c_ll ulisttot_pack; // AoSoA layout
-  t_sna_4c_ll zlist_pack; // AoSoA layout
-  t_sna_4d_ll blist_pack;
-  t_sna_4d_ll ylist_pack_re; // split real,
-  t_sna_4d_ll ylist_pack_im; // imag AoSoA layout
+  t_sna_2c a_gpu; // Cayley-Klein `a`
+  t_sna_2c b_gpu; // `b`
+  t_sna_3c da_gpu; // `da`
+  t_sna_3c db_gpu; // `db`
+  t_sna_3d sfac_gpu; // sfac, dsfac_{x,y,z}

  int idxcg_max, idxu_max, idxu_half_max, idxu_cache_max, idxz_max, idxb_max;

@ -363,25 +370,11 @@ class SNAKokkos {
  inline
  void init_rootpqarray();    // init()

-  KOKKOS_INLINE_FUNCTION
-  void add_uarraytot(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int, const real_type&, const real_type&, const real_type&, const real_type&, const real_type&, int); // compute_ui
-
-  KOKKOS_INLINE_FUNCTION
-  void compute_uarray_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int,
-                      const real_type&, const real_type&, const real_type&,
-                      const real_type&, const real_type&); // compute_ui_cpu
-
-
  inline
  double deltacg(int, int, int);  // init_clebsch_gordan

  inline
  int compute_ncoeff();           // SNAKokkos()
-  KOKKOS_INLINE_FUNCTION
-  void compute_duarray_cpu(const typename Kokkos::TeamPolicy<DeviceType>::member_type& team, int, int,
-                       const real_type&, const real_type&, const real_type&, // compute_duidrj_cpu
-                           const real_type&, const real_type&, const real_type&, const real_type&, const real_type&,
-                           const real_type&, const real_type&);

  // Sets the style for the switching function
  // 0 = none
@ -401,6 +394,9 @@ class SNAKokkos {
  real_type wself;
  int wselfall_flag;

+  // quadratic flag
+  int quadratic_flag;
+
  int bzero_flag; // 1 if bzero subtracted from barray
  Kokkos::View<real_type*, DeviceType> bzero; // array of B values for isolated atoms
 };
@ -409,4 +405,3 @@ class SNAKokkos {

 #include "sna_kokkos_impl.h"
 #endif
-
--- a/src/KOKKOS/sna_kokkos_impl.h
+++ b/src/KOKKOS/sna_kokkos_impl.h
--- a/src/PHONON/fix_phonon.cpp
+++ b/src/PHONON/fix_phonon.cpp
@ -400,7 +400,7 @@ void FixPhonon::end_of_step()
      ndim = sysdim;
      for (i = 1; i < nucell; ++i) {
        for (idim = 0; idim < sysdim; ++idim) dist2orig[idim] = Rnow[idx][ndim++] - Rnow[idx][idim];
-        domain->minimum_image(dist2orig);
+        domain->minimum_image_big(dist2orig);
        for (idim = 0; idim < sysdim; ++idim) basis[i][idim] += dist2orig[idim];
      }
    }
--- a/src/REAXFF/fix_qtpie_reaxff.cpp
+++ b/src/REAXFF/fix_qtpie_reaxff.cpp
@ -49,7 +49,6 @@ using namespace LAMMPS_NS;
 using namespace FixConst;

 static constexpr double CONV_TO_EV = 14.4;
-static constexpr double SMALL = 1.0e-14;
 static constexpr double QSUMSMALL = 0.00001;
 static constexpr double ANGSTROM_TO_BOHRRADIUS = 1.8897261259;

@ -1101,7 +1100,6 @@ void FixQtpieReaxFF::calc_chi_eff()
  memset(&chi_eff[0],0,atom->nmax*sizeof(double));

  const auto x = (const double * const *)atom->x;
-  const int ntypes = atom->ntypes;
  const int *type = atom->type;

  double dist,overlap,sum_n,sum_d,expa,expb,chia,chib,phia,phib,p,m;
--- a/src/RIGID/fix_ehex.cpp
+++ b/src/RIGID/fix_ehex.cpp
@ -434,7 +434,7 @@ bool FixEHEX::check_cluster(tagint *shake_atom, int n, Region *region)

      // take into account pbc

-      domain->minimum_image(xtemp);
+      domain->minimum_image_big(xtemp);

      for (int k = 0; k < 3; k++) xcom[k] += mi * (x[lid[0]][k] + xtemp[k]);
    }
--- a/src/angle_zero.cpp
+++ b/src/angle_zero.cpp
@ -54,13 +54,13 @@ void AngleZero::compute(int eflag, int vflag)

 void AngleZero::settings(int narg, char **arg)
 {
-  if ((narg != 0) && (narg != 1)) error->all(FLERR, "Illegal angle_style command");
+  if (narg > 1) error->all(FLERR, "Too many angle_style zero keywords");

  if (narg == 1) {
    if (strcmp("nocoeff", arg[0]) == 0)
      coeffflag = 0;
    else
-      error->all(FLERR, "Illegal angle_style command");
+      error->all(FLERR, "Unknown angle_style zero keyword {}", arg[0]);
  }
 }

--- a/src/dump.cpp
+++ b/src/dump.cpp
@ -73,6 +73,7 @@ Dump::Dump(LAMMPS *lmp, int /*narg*/, char **arg) :

  clearstep = 0;
  sort_flag = 0;
+  sortcol = 0;
  balance_flag = 0;
  append_flag = 0;
  buffer_allow = 0;
--- a/src/fix_vector.cpp
+++ b/src/fix_vector.cpp
@ -114,9 +114,11 @@ FixVector::FixVector(LAMMPS *lmp, int narg, char **arg) :
        error->all(FLERR, "Fix for fix {} vector not computed at compatible time", val.id);

      if (val.argindex == 0)
+        value = ifix->extscalar;
+      else if (ifix->extvector >= 0)
        value = ifix->extvector;
      else
-        value = ifix->extarray;
+        value = ifix->extlist[val.argindex - 1];
      val.val.f = ifix;

    } else if (val.which == ArgInfo::VARIABLE) {
--- a/src/library.cpp
+++ b/src/library.cpp
@ -519,7 +519,7 @@ must be freed with :cpp:func:`lammps_free` after use to avoid a memory leak.
 \endverbatim
 *
 * \param  handle  pointer to a previously created LAMMPS instance
- * \param  cmd     string with a single LAMMPS input line
+ * \param  line    string with a single LAMMPS input line
 * \return         string with expanded line */

 char *lammps_expand(void *handle, const char *line)
--- a/src/lmptype.h
+++ b/src/lmptype.h
@ -287,7 +287,7 @@ struct multitype {
    int64_t b;
  } data;

-  multitype() : type(LAMMPS_NONE) { data.d = 0.0; }
+  multitype() noexcept : type(LAMMPS_NONE) { data.d = 0.0; }
  multitype(const multitype &) = default;
  multitype(multitype &&) = default;
  ~multitype() = default;
--- a/src/tokenizer.h
+++ b/src/tokenizer.h
@ -59,17 +59,21 @@ class TokenizerException : public std::exception {
  std::string message;

 public:
-  // remove unused default constructor
+  /** The default constructor is disabled */
  TokenizerException() = delete;

  /** Thrown during retrieving or skipping tokens
   *
-   * \param  msg    String with error message
-   * \param  token  String of the token/word that caused the error */
+   * \param   msg     String with error message
+   * \param   token   String of the token or word that caused the error */
  explicit TokenizerException(const std::string &msg, const std::string &token);

  /** Retrieve message describing the thrown exception
-   * \return string with error message */
+   *
+   * This function provides the message that can be retrieved when the corresponding
+   * exception is caught.
+   *
+   * \return  String with error message */
  const char *what() const noexcept override { return message.c_str(); }
 };

--- a/src/utils.h
+++ b/src/utils.h
@ -59,7 +59,7 @@ namespace utils {

  void missing_cmd_args(const std::string &file, int line, const std::string &cmd, Error *error);

-  /* Internal function handling the argument list for logmesg(). */
+  /*! Internal function handling the argument list for logmesg(). */

  void fmtargs_logmesg(LAMMPS *lmp, fmt::string_view format, fmt::format_args args);

@ -426,12 +426,11 @@ This functions adds the following case to :cpp:func:`utils::bounds() <LAMMPS_NS:
   * \param ref     per-grid reference from input script, e.g. "c_10:grid:data[2]"
   * \param nevery  frequency at which caller will access fix for per-grid info,
   *                ignored when reference is to a compute
+   * \param id     ID of Compute or Fix
+   * \param igrid  which grid is referenced (0 to N-1)
+   * \param idata  which data on grid is referenced (0 to N-1)
+   * \param index  which column of data is referenced (0 for vec, 1-N for array)
   * \param lmp     pointer to top-level LAMMPS class instance
-   * \param verify  check bounds for interaction type
-   * \return id     ID of Compute or Fix
-   * \return igrid  which grid is referenced (0 to N-1)
-   * \return idata  which data on grid is referenced (0 to N-1)
-   * \return index  which column of data is referenced (0 for vec, 1-N for array)
   * \return        ArgINFO::COMPUTE or FIX or UNKNOWN or NONE */

  int check_grid_reference(char *errstr, char *ref, int nevery, char *&id, int &igrid, int &idata,
@ -442,7 +441,10 @@ This functions adds the following case to :cpp:func:`utils::bounds() <LAMMPS_NS:
   * Format of grid ID reference = id:gname:dname.
   * Return vector with the 3 sub-strings.
   *
-   * \param name = complete grid ID
+   * \param file     name of source file for error message
+   * \param line     line number in source file for error message
+   * \param name     complete grid ID
+   * \param error    pointer to Error class
   * \return std::vector<std::string> containing the 3 sub-strings  */

  std::vector<std::string> parse_grid_id(const char *file, int line, const std::string &name,
--- a/tools/lammps-gui/CMakeLists.txt
+++ b/tools/lammps-gui/CMakeLists.txt
@ -205,7 +205,7 @@ if(FLATPAK_COMMAND AND FLATPAK_BUILDER)
  file(STRINGS ${LAMMPS_DIR}/src/version.h line REGEX LAMMPS_VERSION)
  string(REGEX REPLACE "#define LAMMPS_VERSION \"([0-9]+) ([A-Za-z][A-Za-z][A-Za-z])[A-Za-z]* ([0-9]+)\""
                        "\\1\\2\\3" LAMMPS_RELEASE "${line}")
-  set(FLATPAK_BUNDLE "LAMMPS_GUI-Linux-amd64-${LAMMPS_RELEASE}.flatpak")
+  set(FLATPAK_BUNDLE "LAMMPS-Linux-x86_64-GUI-${LAMMPS_RELEASE}.flatpak")
  add_custom_target(flatpak
    COMMAND ${FLATPAK_COMMAND} --user remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
    COMMAND ${FLATPAK_BUILDER} --force-clean --verbose --repo=${CMAKE_CURRENT_BINARY_DIR}/flatpak-repo
--- a/tools/lammps-gui/lammps-gui.appdata.xml
+++ b/tools/lammps-gui/lammps-gui.appdata.xml
@ -56,6 +56,9 @@
  <releases>
    <release version="1.6.11" timestamp="1725080055">
      <description>
+        move cursor to end of log buffer before inserting new text
+        remove empirical filter to remove outliers from corrupted data
+        change tutorial download URL to tutorial website
      </description>
    </release>
    <release version="1.6.10" timestamp="1724585189">
--- a/tools/lammps-gui/org.lammps.lammps-gui.yml
+++ b/tools/lammps-gui/org.lammps.lammps-gui.yml
@ -88,6 +88,7 @@ modules:
      - -D PKG_QTB=yes
      - -D PKG_REACTION=yes
      - -D PKG_REAXFF=yes
+      - -D PKG_RHEO=yes
      - -D PKG_RIGID=yes
      - -D PKG_SHOCK=yes
      - -D PKG_SMTBQ=yes