diff --git a/doc/src/Commands_pair.rst b/doc/src/Commands_pair.rst
index fd1ef692c5..7f7b2d4b7d 100644
--- a/doc/src/Commands_pair.rst
+++ b/doc/src/Commands_pair.rst
@@ -200,6 +200,7 @@ OPT.
    * :doc:`mdpd <pair_mesodpd>`
    * :doc:`mdpd/rhosum <pair_mesodpd>`
    * :doc:`meam (k) <pair_meam>`
+   * :doc:`meam/ms (k) <pair_meam>`
    * :doc:`meam/spline (o) <pair_meam_spline>`
    * :doc:`meam/sw/spline <pair_meam_sw_spline>`
    * :doc:`mesocnt <pair_mesocnt>`
diff --git a/doc/src/pair_meam.rst b/doc/src/pair_meam.rst
index 6a3d52c4d5..57c40aa6ee 100644
--- a/doc/src/pair_meam.rst
+++ b/doc/src/pair_meam.rst
@@ -1,17 +1,26 @@
 .. index:: pair_style meam
 .. index:: pair_style meam/kk
+.. index:: pair_style meam/ms
+.. index:: pair_style meam/ms/kk
 
 pair_style meam command
 =========================
 
 Accelerator Variants: *meam/kk*
 
+pair_style meam/ms command
+==========================
+
+Accelerator Variants: *meam/ms/kk*
+
 Syntax
 """"""
 
 .. code-block:: LAMMPS
 
-   pair_style meam
+   pair_style style
+
+* style = *meam* or *meam/ms*
 
 Examples
 """"""""
@@ -22,6 +31,9 @@ Examples
    pair_coeff * * ../potentials/library.meam Si ../potentials/si.meam Si
    pair_coeff * * ../potentials/library.meam Ni Al NULL Ni Al Ni Ni
 
+   pair_style meam/ms
+   pair_coeff * * ../potentials/library.msmeam H Ga ../potentials/HGa.meam H Ga
+
 Description
 """""""""""
 
@@ -31,16 +43,23 @@ Description
    as of November 2010; see description below of the mixture_ref_t
    parameter
 
-Pair style *meam* computes non-bonded interactions for a variety of materials
-using the modified embedded-atom method (MEAM)
-:ref:`(Baskes) <Baskes>`.  Conceptually, it is an extension to the original
-:doc:`EAM method <pair_eam>` which adds angular forces.  It is
-thus suitable for modeling metals and alloys with fcc, bcc, hcp and
-diamond cubic structures, as well as materials with covalent interactions
-like silicon and carbon. This *meam* pair style is a translation of the
-original Fortran version to C++. It is functionally equivalent but more
-efficient and has additional features. The Fortran version of the *meam*
-pair style has been removed from LAMMPS after the 12 December 2018 release.
+Pair style *meam* computes non-bonded interactions for a variety of
+materials using the modified embedded-atom method (MEAM) :ref:`(Baskes)
+<Baskes>`.  Conceptually, it is an extension to the original :doc:`EAM
+method <pair_eam>` which adds angular forces.  It is thus suitable for
+modeling metals and alloys with fcc, bcc, hcp and diamond cubic
+structures, as well as materials with covalent interactions like silicon
+and carbon.
+
+The *meam* pair style is a translation of the original Fortran version
+to C++. It is functionally equivalent but more efficient and has
+additional features. The Fortran version of the *meam* pair style has
+been removed from LAMMPS after the 12 December 2018 release.
+
+Pair style *meam/ms* uses the multi-state MEAM (MS-MEAM) method
+according to :ref:`(Baskes2) <Baskes2>`, which is an extension to MEAM.
+This pair style is mostly equivalent to *meam* and differs only
+where noted in the documentation below.
 
 In the MEAM formulation, the total energy E of a system of atoms is
 given by:
@@ -351,6 +370,16 @@ Most published MEAM parameter sets use the default values *attrac* = *repulse* =
 Setting *repuls* = *attrac* = *delta* corresponds to the form used in several
 recent published MEAM parameter sets, such as :ref:`(Valone) <Valone>`
 
+Then using *meam/ms* pair style the multi-state MEAM (MS-MEAM) method is
+activated.  This requires 6 extra parameters in the MEAM library file,
+resulting in 25 parameters ordered that are ordered like this:
+
+elt, lat, z, ielement, atwt, alpha, b0, b1, b2, b3, b1m, b2m, b3m, alat, esub, asub,
+t0, t1, t2, t3, t1m, t2m, t3m, rozero, ibar
+
+The 6 extra MS-MEAM parameters are *b1m, b2m, b3m, t1m, t2m, t3m*.
+In the LAMMPS ``potentials`` folder, compatible files have an ".msmeam" extension.
+
 ----------
 
 .. include:: accel_styles.rst
@@ -393,16 +422,15 @@ This pair style can only be used via the *pair* keyword of the
 Restrictions
 """"""""""""
 
-The *meam* style is provided in the MEAM package. It is
-only enabled if LAMMPS was built with that package.
+The *meam* and *meam/ms* pair styles are provided in the MEAM
+package. They are only enabled if LAMMPS was built with that package.
 See the :doc:`Build package <Build_package>` page for more info.
 
-The maximum number of elements, that can be read from the MEAM
-library file, is determined at compile time. The default is 5.
-If you need support for more elements, you have to change the
-define for the constant 'maxelt' at the beginning of the file
-src/MEAM/meam.h and update/recompile LAMMPS. There is no
-limit on the number of atoms types.
+The maximum number of elements, that can be read from the MEAM library
+file, is determined at compile time. The default is 5.  If you need
+support for more elements, you have to change the the constant 'maxelt'
+at the beginning of the file ``src/MEAM/meam.h`` and update/recompile
+LAMMPS.  There is no limit on the number of atoms types.
 
 Related commands
 """"""""""""""""
@@ -421,6 +449,10 @@ none
 
 **(Baskes)** Baskes, Phys Rev B, 46, 2727-2742 (1992).
 
+.. _Baskes2:
+
+**(Baskes2)** Baskes, Phys Rev B, 75, 094113 (2007).
+
 .. _Gullet:
 
 **(Gullet)** Gullet, Wagner, Slepoy, SANDIA Report 2003-8782 (2003). DOI:10.2172/918395
diff --git a/doc/src/pair_style.rst b/doc/src/pair_style.rst
index facfadeb9b..b3f7276480 100644
--- a/doc/src/pair_style.rst
+++ b/doc/src/pair_style.rst
@@ -277,7 +277,8 @@ accelerated styles exist.
 * :doc:`lubricateU/poly <pair_lubricateU>` - hydrodynamic lubrication forces for Fast Lubrication with polydispersity
 * :doc:`mdpd <pair_mesodpd>` - mDPD particle interactions
 * :doc:`mdpd/rhosum <pair_mesodpd>` - mDPD particle interactions for mass density
-* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM) in C
+* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM)
+* :doc:`meam/ms <pair_meam>` - multi-state modified embedded atom method (MS-MEAM)
 * :doc:`meam/spline <pair_meam_spline>` - splined version of MEAM
 * :doc:`meam/sw/spline <pair_meam_sw_spline>` - splined version of MEAM with a Stillinger-Weber term
 * :doc:`mesocnt <pair_mesocnt>` - mesoscopic vdW potential for (carbon) nanotubes
diff --git a/examples/meam/msmeam/HGa.meam b/examples/meam/msmeam/HGa.meam
new file mode 100644
index 0000000000..9f01501c16
--- /dev/null
+++ b/examples/meam/msmeam/HGa.meam
@@ -0,0 +1,30 @@
+bkgd_dyn        =       1
+emb_lin_neg = 1
+augt1=0 
+ialloy=1 
+rc	=	 5.9 
+#H
+attrac(1,1)=0.460 
+repuls(1,1)=0.460 
+Cmin(1,1,1)=1.3 # PuMS
+Cmax(1,1,1)= 2.80 
+nn2(1,1)=1
+#Ga
+rho0(2)         =       0.6
+attrac(2,2)=0.097 
+repuls(2,2)=0.097 
+nn2(2,2)=1
+#HGa
+attrac(1,2)=0.300 
+repuls(1,2)=0.300 
+lattce(1,2)=l12 
+re(1,2)=3.19 
+delta(1,2)=-0.48  
+alpha(1,2)=6.6 
+Cmin(1,1,2)=2.0 
+Cmin(2,1,2)= 2.0 
+Cmin(1,2,1)=2.0 
+Cmin(2,2,1)     =       1.4
+Cmin(1,2,2)     =       1.4
+Cmin(1,1,2)     =       1.4
+nn2(1,2)=1
diff --git a/examples/meam/msmeam/README.md b/examples/meam/msmeam/README.md
new file mode 100644
index 0000000000..dbf569d4b3
--- /dev/null
+++ b/examples/meam/msmeam/README.md
@@ -0,0 +1,9 @@
+To run Baske's test, do
+
+    lmp -in in.msmeam
+
+Then 
+
+    diff dump.msmeam dump.msmeam.bu
+
+
diff --git a/examples/meam/msmeam/data.msmeam.bu b/examples/meam/msmeam/data.msmeam.bu
new file mode 100644
index 0000000000..576a3c50de
--- /dev/null
+++ b/examples/meam/msmeam/data.msmeam.bu
@@ -0,0 +1,25 @@
+LAMMPS data file via write_data, version 16 Feb 2016, timestep = 1
+
+3 atoms
+2 atom types
+
+-4.0000000000000000e+00 4.0000000000000000e+00 xlo xhi
+-4.0000000000000000e+00 4.0000000000000000e+00 ylo yhi
+-4.0000000000000000e+00 4.0000000000000000e+00 zlo zhi
+
+Masses
+
+1 1.0079
+2 69.723
+
+Atoms # atomic
+
+1 1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
+2 2 2.2000000000000002e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
+3 2 2.9999999999999999e-01 2.2999999999999998e+00 0.0000000000000000e+00 0 0 0
+
+Velocities
+
+1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
+2 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
+3 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
diff --git a/examples/meam/msmeam/dump.msmeam.bu b/examples/meam/msmeam/dump.msmeam.bu
new file mode 100644
index 0000000000..039f630073
--- /dev/null
+++ b/examples/meam/msmeam/dump.msmeam.bu
@@ -0,0 +1,24 @@
+ITEM: TIMESTEP
+0
+ITEM: NUMBER OF ATOMS
+3
+ITEM: BOX BOUNDS pp pp pp
+-4 4
+-4 4
+-4 4
+ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] 
+1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 
+2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 
+3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 
+ITEM: TIMESTEP
+1
+ITEM: NUMBER OF ATOMS
+3
+ITEM: BOX BOUNDS pp pp pp
+-4 4
+-4 4
+-4 4
+ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] 
+1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 
+2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 
+3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 
diff --git a/examples/meam/msmeam/in.msmeam b/examples/meam/msmeam/in.msmeam
new file mode 100644
index 0000000000..82ffb89a13
--- /dev/null
+++ b/examples/meam/msmeam/in.msmeam
@@ -0,0 +1,31 @@
+echo	both
+log	log.msmeam
+# Test of MEAM potential for HGa
+
+# ------------------------ INITIALIZATION ----------------------------
+units           metal
+dimension       3
+boundary        p       p       p
+atom_style      atomic
+variable latparam equal 4.646
+variable ncell equal 3
+
+# ----------------------- ATOM DEFINITION ----------------------------
+region          box block -4 4 -4 4 -4 4
+create_box      2 box
+
+#
+
+include potential.mod
+create_atoms    1 single 0 0 0  units box
+create_atoms    2 single 2.2 0 0  units box
+create_atoms    2 single 0.3 2.3 0  units box
+# ---------- Define Settings ---------------------
+variable	teng equal "c_eatoms"
+compute pot_energy all pe/atom
+compute stress all stress/atom NULL
+dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
+run	1
+write_data	data.msmeam
+
+print "All done!"
diff --git a/examples/meam/msmeam/library.msmeam b/examples/meam/msmeam/library.msmeam
new file mode 100644
index 0000000000..9937eaee08
--- /dev/null
+++ b/examples/meam/msmeam/library.msmeam
@@ -0,0 +1,14 @@
+# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
+# ms-meam data format May 2010
+#  elt   lat    z     ielement      atwt
+#  alpha b0     b1    b2     b3     b1m     b2m   b3m      alat   esub   asub
+#    -   t0     t1    t2     t3     t1m     t2m   t3m      rozero ibar
+#  NOTE:  leading character cannot be a space
+
+'H'    'dim'  1.0   1      1.0079
+2.960  2.960  3.0   1.0    1.0    1.0    3.0  1.0       0.741  2.235  2.50
+1.0    0.44721 0.0  0.00   0.0    0.31623 0  6.70 0
+
+'Ga4'  'fcc'  12.0  31     69.723
+4.42   4.80   3.10  6.00   0.00   0.0    0.0  0.5       4.247  2.897  0.97
+1.0    1.649 1.435  0.00   0.0    0.0  2.0       0.70   0
diff --git a/examples/meam/msmeam/log.msmeam.bu b/examples/meam/msmeam/log.msmeam.bu
new file mode 100644
index 0000000000..8eac453c1e
--- /dev/null
+++ b/examples/meam/msmeam/log.msmeam.bu
@@ -0,0 +1,107 @@
+# Test of MEAM potential for HGa
+
+# ------------------------ INITIALIZATION ----------------------------
+units           metal
+dimension       3
+boundary        p       p       p
+atom_style      atomic
+variable latparam equal 4.646
+variable ncell equal 3
+
+# ----------------------- ATOM DEFINITION ----------------------------
+region          box block -4 4 -4 4 -4 4
+create_box      2 box
+Created orthogonal box = (-4 -4 -4) to (4 4 4)
+  1 by 1 by 1 MPI processor grid
+
+#
+
+include potential.mod
+# NOTE: This script can be modified for different pair styles
+# See in.elastic for more info.
+
+variable Pu string H
+print "potential chosen ${Pu}"
+potential chosen H
+# Choose potential
+pair_style      MSmeam
+print		"we just executed"
+we just executed
+
+pair_coeff      * * library.MSmeam ${Pu} Ga4  HGaMS.meam ${Pu} Ga4
+pair_coeff      * * library.MSmeam H Ga4  HGaMS.meam ${Pu} Ga4
+pair_coeff      * * library.MSmeam H Ga4  HGaMS.meam H Ga4
+Reading potential file library.MSmeam with DATE: 2018-09-22
+# Setup neighbor style
+neighbor 1.0 nsq
+neigh_modify once no every 1 delay 0 check yes
+
+# Setup minimization style
+variable dmax equal 1.0e-2
+min_style	     cg
+min_modify	     dmax ${dmax} line quadratic
+min_modify	     dmax 0.01 line quadratic
+compute eng all pe/atom
+compute eatoms all reduce sum c_eng
+
+# Setup output
+thermo		100
+thermo_style custom step temp etotal  press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
+thermo_modify norm yes
+create_atoms    1 single 0 0 0  units box
+Created 1 atoms
+create_atoms    2 single 2.2 0 0  units box
+Created 1 atoms
+create_atoms    2 single 0.3 2.3 0  units box
+Created 1 atoms
+# ---------- Define Settings ---------------------
+variable	teng equal "c_eatoms"
+compute pot_energy all pe/atom
+compute stress all stress/atom NULL
+dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
+run	1
+WARNING: No fixes defined, atoms won't move (../verlet.cpp:55)
+Neighbor list info ...
+  2 neighbor list requests
+  update every 1 steps, delay 0 steps, check yes
+  max neighbors/atom: 2000, page size: 100000
+  master list distance cutoff = 6.9
+  ghost atom cutoff = 6.9
+Memory usage per processor = 12.9295 Mbytes
+Step Temp TotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume eatoms 
+       0            0    15.433079    491354.68    838670.91    635393.13            0    80195.793            0            0            8            8            8          512    15.433079 
+       1            0    15.433079    491354.68    838670.91    635393.13            0    80195.793            0            0            8            8            8          512    15.433079 
+Loop time of 0.000172138 on 1 procs for 1 steps with 3 atoms
+
+Performance: 501.922 ns/day, 0.048 hours/ns, 5809.285 timesteps/s
+81.3% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section |  min time  |  avg time  |  max time  |%varavg| %total
+---------------------------------------------------------------
+Pair    | 6.6996e-05 | 6.6996e-05 | 6.6996e-05 |   0.0 | 38.92
+Neigh   | 0          | 0          | 0          |   0.0 |  0.00
+Comm    | 1.9073e-06 | 1.9073e-06 | 1.9073e-06 |   0.0 |  1.11
+Output  | 9.7036e-05 | 9.7036e-05 | 9.7036e-05 |   0.0 | 56.37
+Modify  | 0          | 0          | 0          |   0.0 |  0.00
+Other   |            | 6.199e-06  |            |       |  3.60
+
+Nlocal:    3 ave 3 max 3 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost:    78 ave 78 max 78 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs:    7 ave 7 max 7 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+FullNghs:  14 ave 14 max 14 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 14
+Ave neighs/atom = 4.66667
+Neighbor list builds = 0
+Dangerous builds = 0
+write_data	data.msmeam
+
+print "All done!"
+All done!
+Total wall time: 0:00:00
+
diff --git a/examples/meam/msmeam/msmeam.dump.bu b/examples/meam/msmeam/msmeam.dump.bu
new file mode 100644
index 0000000000..039f630073
--- /dev/null
+++ b/examples/meam/msmeam/msmeam.dump.bu
@@ -0,0 +1,24 @@
+ITEM: TIMESTEP
+0
+ITEM: NUMBER OF ATOMS
+3
+ITEM: BOX BOUNDS pp pp pp
+-4 4
+-4 4
+-4 4
+ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] 
+1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 
+2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 
+3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 
+ITEM: TIMESTEP
+1
+ITEM: NUMBER OF ATOMS
+3
+ITEM: BOX BOUNDS pp pp pp
+-4 4
+-4 4
+-4 4
+ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] 
+1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 
+2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 
+3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 
diff --git a/examples/meam/msmeam/potential.mod b/examples/meam/msmeam/potential.mod
new file mode 100644
index 0000000000..760cc93503
--- /dev/null
+++ b/examples/meam/msmeam/potential.mod
@@ -0,0 +1,25 @@
+# NOTE: This script can be modified for different pair styles 
+# See in.elastic for more info.
+
+variable Pu string H
+print "potential chosen ${Pu}"
+# Choose potential
+pair_style meam/ms
+print		"we just executed"
+
+pair_coeff      * * library.msmeam ${Pu} Ga4  HGa.meam ${Pu} Ga4
+# Setup neighbor style
+neighbor 1.0 bin
+neigh_modify once no every 1 delay 0 check yes
+
+# Setup minimization style
+variable dmax equal 1.0e-2
+min_style	     cg
+min_modify	     dmax ${dmax} line quadratic
+compute eng all pe/atom
+compute eatoms all reduce sum c_eng
+
+# Setup output
+thermo		100
+thermo_style custom step temp etotal  press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
+thermo_modify norm yes
diff --git a/lib/kokkos/kokkos_5538.diff b/lib/kokkos/kokkos_5538.diff
deleted file mode 100644
index 6bf2ccf6a4..0000000000
--- a/lib/kokkos/kokkos_5538.diff
+++ /dev/null
@@ -1,199 +0,0 @@
-diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
-index 22af411f32..530510a0d1 100644
---- a/lib/kokkos/Makefile.kokkos
-+++ b/lib/kokkos/Makefile.kokkos
-@@ -20,7 +20,7 @@ KOKKOS_DEVICES ?= "OpenMP"
- #KOKKOS_DEVICES ?= "Threads"
- # Options:
- # Intel:    KNC,KNL,SNB,HSW,BDW,SKL,SKX,ICL,ICX,SPR
--# NVIDIA:   Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86
-+# NVIDIA:   Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Hopper90
- # ARM:      ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
- # IBM:      BGQ,Power7,Power8,Power9
- # AMD-GPUS: Vega900,Vega906,Vega908,Vega90A
-@@ -401,6 +401,7 @@ KOKKOS_INTERNAL_USE_ARCH_VOLTA72 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volt
- KOKKOS_INTERNAL_USE_ARCH_TURING75 := $(call kokkos_has_string,$(KOKKOS_ARCH),Turing75)
- KOKKOS_INTERNAL_USE_ARCH_AMPERE80 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere80)
- KOKKOS_INTERNAL_USE_ARCH_AMPERE86 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere86)
-+KOKKOS_INTERNAL_USE_ARCH_HOPPER90 := $(call kokkos_has_string,$(KOKKOS_ARCH),Hopper90)
- KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30)  \
-                                               + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32)  \
-                                               + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35)  \
-@@ -414,7 +415,8 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLE
-                                               + $(KOKKOS_INTERNAL_USE_ARCH_VOLTA72)   \
-                                               + $(KOKKOS_INTERNAL_USE_ARCH_TURING75)  \
-                                               + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80)  \
--                                              + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86))
-+                                              + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86)  \
-+                                              + $(KOKKOS_INTERNAL_USE_ARCH_HOPPER90))
- 
- #SEK: This seems like a bug to me
- ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
-@@ -1194,6 +1196,11 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA_ARCH), 1)
-     tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE86")
-     KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_86
-   endif
-+  ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HOPPER90), 1)
-+    tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER")
-+    tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER90")
-+    KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_90
-+  endif
- 
-   ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
-     KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
-diff --git a/lib/kokkos/cmake/KokkosCore_config.h.in b/lib/kokkos/cmake/KokkosCore_config.h.in
-index 88ddc48378..b83ced9243 100644
---- a/lib/kokkos/cmake/KokkosCore_config.h.in
-+++ b/lib/kokkos/cmake/KokkosCore_config.h.in
-@@ -102,6 +102,7 @@
- #cmakedefine KOKKOS_ARCH_AMPERE
- #cmakedefine KOKKOS_ARCH_AMPERE80
- #cmakedefine KOKKOS_ARCH_AMPERE86
-+#cmakedefine KOKKOS_ARCH_HOPPER90
- #cmakedefine KOKKOS_ARCH_AMD_ZEN
- #cmakedefine KOKKOS_ARCH_AMD_ZEN2
- #cmakedefine KOKKOS_ARCH_AMD_ZEN3
-diff --git a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
-index f56cef1651..2585a6a64c 100644
---- a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
-+++ b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
-@@ -74,6 +74,7 @@ int main() {
-     case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break;
-     case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break;
-     case 86: std::cout << "Set -DKokkos_ARCH_AMPERE86=ON ." << std::endl; break;
-+    case 90: std::cout << "Set -DKokkos_ARCH_HOPPER90=ON ." << std::endl; break;
-     default:
-       std::cout << "Compute capability " << compute_capability
-                 << " is not supported" << std::endl;
-diff --git a/lib/kokkos/cmake/kokkos_arch.cmake b/lib/kokkos/cmake/kokkos_arch.cmake
-index ef16aad047..c1d76cceeb 100644
---- a/lib/kokkos/cmake/kokkos_arch.cmake
-+++ b/lib/kokkos/cmake/kokkos_arch.cmake
-@@ -86,6 +86,7 @@ KOKKOS_ARCH_OPTION(VOLTA72         GPU  "NVIDIA Volta generation CC 7.2"   "KOKK
- KOKKOS_ARCH_OPTION(TURING75        GPU  "NVIDIA Turing generation CC 7.5"  "KOKKOS_SHOW_CUDA_ARCHS")
- KOKKOS_ARCH_OPTION(AMPERE80        GPU  "NVIDIA Ampere generation CC 8.0"  "KOKKOS_SHOW_CUDA_ARCHS")
- KOKKOS_ARCH_OPTION(AMPERE86        GPU  "NVIDIA Ampere generation CC 8.6"  "KOKKOS_SHOW_CUDA_ARCHS")
-+KOKKOS_ARCH_OPTION(HOPPER90        GPU  "NVIDIA Hopper generation CC 9.0"  "KOKKOS_SHOW_CUDA_ARCHS")
- 
- IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET OR Kokkos_ENABLE_UNSUPPORTED_ARCHS)
-   SET(KOKKOS_SHOW_HIP_ARCHS ON)
-@@ -544,6 +545,7 @@ CHECK_CUDA_ARCH(VOLTA72   sm_72)
- CHECK_CUDA_ARCH(TURING75  sm_75)
- CHECK_CUDA_ARCH(AMPERE80  sm_80)
- CHECK_CUDA_ARCH(AMPERE86  sm_86)
-+CHECK_CUDA_ARCH(HOPPER90  sm_90)
- 
- SET(AMDGPU_ARCH_ALREADY_SPECIFIED "")
- FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
-@@ -806,6 +808,10 @@ IF (KOKKOS_ARCH_AMPERE80 OR KOKKOS_ARCH_AMPERE86)
-   SET(KOKKOS_ARCH_AMPERE ON)
- ENDIF()
- 
-+IF (KOKKOS_ARCH_HOPPER90)
-+  SET(KOKKOS_ARCH_HOPPER ON)
-+ENDIF()
-+
- #Regardless of version, make sure we define the general architecture name
- IF (KOKKOS_ARCH_VEGA900 OR KOKKOS_ARCH_VEGA906 OR KOKKOS_ARCH_VEGA908 OR KOKKOS_ARCH_VEGA90A)
-   SET(KOKKOS_ARCH_VEGA ON)
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-index 56f9117844..fcd4773dbc 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-@@ -232,7 +232,8 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
-       case 61: return 96;
-       case 70:
-       case 80:
--      case 86: return 8;
-+      case 86:
-+      case 90: return 8;
-       case 75: return 32;
-       default:
-         Kokkos::Impl::throw_runtime_exception(
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
-index 40a263561f..8c40ebd60d 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
-@@ -418,7 +418,7 @@ KOKKOS_INLINE_FUNCTION
- #endif  // CUDA_VERSION >= 11000 && CUDA_VERSION < 11010
- 
- #if CUDA_VERSION >= 11010 && \
--    ((defined(KOKKOS_ARCH_AMPERE80) || defined(KOKKOS_ARCH_AMPERE86)))
-+    ((defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)))
- KOKKOS_INLINE_FUNCTION
- bhalf_t cast_to_bhalf(bhalf_t val) { return val; }
- KOKKOS_INLINE_FUNCTION
-diff --git a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
-index f9451ecfe6..2ce1efb98c 100644
---- a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
-+++ b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
-@@ -51,7 +51,7 @@ namespace Kokkos::Experimental::Impl {
- 
- struct OpenACC_Traits {
- #if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
--    defined(KOKKOS_ARCH_AMPERE)
-+    defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)
-   static constexpr acc_device_t dev_type     = acc_device_nvidia;
-   static constexpr bool may_fallback_to_host = false;
- #else
-diff --git a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
-index a9bc085912..27ee1d4232 100644
---- a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
-+++ b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
-@@ -115,8 +115,9 @@ void OpenMPTargetInternal::impl_initialize() {
- 
-   // FIXME_OPENMPTARGET:  Only fix the number of teams for NVIDIA architectures
-   // from Pascal and upwards.
--#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
--    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
-+#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) ||    \
-+    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
-+    defined(KOKKOS_ARCH_HOPPER)
- #if defined(KOKKOS_COMPILER_CLANG) && (KOKKOS_COMPILER_CLANG >= 1300)
-   omp_set_num_teams(512);
- #endif
-diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
-index 840db4327c..7e5addbc5b 100644
---- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
-+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
-@@ -155,7 +155,7 @@ void SYCL::impl_initialize(InitializationSettings const& settings) {
- #if !defined(KOKKOS_ARCH_INTEL_GPU) && !defined(KOKKOS_ARCH_KEPLER) && \
-     !defined(KOKKOS_ARCH_MAXWELL) && !defined(KOKKOS_ARCH_PASCAL) &&   \
-     !defined(KOKKOS_ARCH_VOLTA) && !defined(KOKKOS_ARCH_TURING75) &&   \
--    !defined(KOKKOS_ARCH_AMPERE)
-+    !defined(KOKKOS_ARCH_AMPERE) && !defined(KOKKOS_ARCH_HOPPER)
-   if (!settings.has_device_id() && gpu_devices.empty()) {
-     Impl::SYCLInternal::singleton().initialize(sycl::device());
-     return;
-diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
-index 5ac7d8af30..ba101f699e 100644
---- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
-+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
-@@ -335,9 +335,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
-     return std::min({
-              int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
-       // FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
--#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
--    defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) ||   \
--    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
-+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) ||  \
-+    defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) ||    \
-+    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
-+    defined(KOKKOS_ARCH_HOPPER)
-                  256,
- #endif
-                  max_threads_for_memory
-@@ -367,9 +368,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
-     return std::min<int>({
-              int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
-       // FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
--#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
--    defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) ||   \
--    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
-+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) ||  \
-+    defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) ||    \
-+    defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
-+    defined(KOKKOS_ARCH_HOPPER)
-                  256,
- #endif
-                  max_threads_for_memory
diff --git a/lib/kokkos/kokkos_5706.diff b/lib/kokkos/kokkos_5706.diff
deleted file mode 100644
index 2bfbb35b06..0000000000
--- a/lib/kokkos/kokkos_5706.diff
+++ /dev/null
@@ -1,523 +0,0 @@
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-index fcd4773dbc..30b6958a67 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-@@ -207,7 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
-                                 LaunchBounds{});
- }
- 
--// Assuming cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1)
- // NOTE these number can be obtained several ways:
- // * One option is to download the CUDA Occupancy Calculator spreadsheet, select
- // "Compute Capability" first and check what is the smallest "Shared Memory
-@@ -242,6 +241,7 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
-     return 0;
-   }() * 1024;
- }
-+
- }  // namespace Impl
- }  // namespace Kokkos
- 
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
-index 5811498e01..e22eb3b842 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
-@@ -569,12 +569,6 @@ Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
-   }
- #endif
- 
--#ifdef KOKKOS_ENABLE_PRE_CUDA_10_DEPRECATION_API
--  cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
--#else
--  cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
--#endif
--
-   // Init the array for used for arbitrarily sized atomics
-   if (stream == nullptr) Impl::initialize_host_cuda_lock_arrays();
- 
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-index b7a80ad84f..5c4c3a7d39 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-@@ -93,10 +93,6 @@ namespace Impl {
- //   __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
- // function qualifier which could be used to improve performance.
- //----------------------------------------------------------------------------
--// Maximize L1 cache and minimize shared memory:
--//   cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 );
--// For 2.0 capability: 48 KB L1 and 16 KB shared
--//----------------------------------------------------------------------------
- 
- template <class DriverType>
- __global__ static void cuda_parallel_launch_constant_memory() {
-@@ -158,63 +154,105 @@ inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) {
-   }
- }
- 
--// This function needs to be template on DriverType and LaunchBounds
-+// These functions needs to be template on DriverType and LaunchBounds
- // so that the static bool is unique for each type combo
- // KernelFuncPtr does not necessarily contain that type information.
-+
- template <class DriverType, class LaunchBounds, class KernelFuncPtr>
--inline void configure_shmem_preference(KernelFuncPtr const& func,
--                                       bool prefer_shmem) {
-+const cudaFuncAttributes& get_cuda_kernel_func_attributes(
-+    const KernelFuncPtr& func) {
-+  // Only call cudaFuncGetAttributes once for each unique kernel
-+  // by leveraging static variable initialization rules
-+  auto wrap_get_attributes = [&]() -> cudaFuncAttributes {
-+    cudaFuncAttributes attr;
-+    KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncGetAttributes(&attr, func));
-+    return attr;
-+  };
-+  static cudaFuncAttributes func_attr = wrap_get_attributes();
-+  return func_attr;
-+}
-+
-+template <class DriverType, class LaunchBounds, class KernelFuncPtr>
-+inline void configure_shmem_preference(const KernelFuncPtr& func,
-+                                       const cudaDeviceProp& device_props,
-+                                       const size_t block_size, int& shmem,
-+                                       const size_t occupancy) {
- #ifndef KOKKOS_ARCH_KEPLER
--  // On Kepler the L1 has no benefit since it doesn't cache reads
-+
-+  const auto& func_attr =
-+      get_cuda_kernel_func_attributes<DriverType, LaunchBounds>(func);
-+
-+  // Compute limits for number of blocks due to registers/SM
-+  const size_t regs_per_sm     = device_props.regsPerMultiprocessor;
-+  const size_t regs_per_thread = func_attr.numRegs;
-+  // The granularity of register allocation is chunks of 256 registers per warp
-+  // -> 8 registers per thread
-+  const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
-+  const size_t max_blocks_regs =
-+      regs_per_sm / (allocated_regs_per_thread * block_size);
-+
-+  // Compute how many threads per sm we actually want
-+  const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
-+  // only allocate multiples of warp size
-+  const size_t num_threads_desired =
-+      ((max_threads_per_sm * occupancy / 100 + 31) / 32) * 32;
-+  // Get close to the desired occupancy,
-+  // don't undershoot by much but also don't allocate a whole new block just
-+  // because one is a few threads over otherwise.
-+  size_t num_blocks_desired =
-+      (num_threads_desired + block_size * 0.8) / block_size;
-+  num_blocks_desired = ::std::min(max_blocks_regs, num_blocks_desired);
-+  if (num_blocks_desired == 0) num_blocks_desired = 1;
-+
-+  // Calculate how much shared memory we need per block
-+  size_t shmem_per_block = shmem + func_attr.sharedSizeBytes;
-+
-+  // The minimum shared memory allocation we can have in total per SM is 8kB.
-+  // If we want to lower occupancy we have to make sure we request at least that
-+  // much in aggregate over all blocks, so that shared memory actually becomes a
-+  // limiting factor for occupancy
-+  constexpr size_t min_shmem_size_per_sm = 8192;
-+  if ((occupancy < 100) &&
-+      (shmem_per_block * num_blocks_desired < min_shmem_size_per_sm)) {
-+    shmem_per_block = min_shmem_size_per_sm / num_blocks_desired;
-+    // Need to set the caller's shmem variable so that the
-+    // kernel launch uses the correct dynamic shared memory request
-+    shmem = shmem_per_block - func_attr.sharedSizeBytes;
-+  }
-+
-+  // Compute the carveout fraction we need based on occupancy
-+  // Use multiples of 8kB
-+  const size_t max_shmem_per_sm = device_props.sharedMemPerMultiprocessor;
-+  size_t carveout               = shmem_per_block == 0
-+                        ? 0
-+                        : 100 *
-+                              (((num_blocks_desired * shmem_per_block +
-+                                 min_shmem_size_per_sm - 1) /
-+                                min_shmem_size_per_sm) *
-+                               min_shmem_size_per_sm) /
-+                              max_shmem_per_sm;
-+  if (carveout > 100) carveout = 100;
-+
-+  // Set the carveout, but only call it once per kernel or when it changes
-   auto set_cache_config = [&] {
--    KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
--        func,
--        (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)));
--    return prefer_shmem;
-+    KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetAttribute(
-+        func, cudaFuncAttributePreferredSharedMemoryCarveout, carveout));
-+    return carveout;
-   };
--  static bool cache_config_preference_cached = set_cache_config();
--  if (cache_config_preference_cached != prefer_shmem) {
-+  // Store the value in a static variable so we only reset if needed
-+  static size_t cache_config_preference_cached = set_cache_config();
-+  if (cache_config_preference_cached != carveout) {
-     cache_config_preference_cached = set_cache_config();
-   }
- #else
-   // Use the parameters so we don't get a warning
-   (void)func;
--  (void)prefer_shmem;
-+  (void)device_props;
-+  (void)block_size;
-+  (void)occupancy;
- #endif
- }
- 
--template <class Policy>
--std::enable_if_t<Policy::experimental_contains_desired_occupancy>
--modify_launch_configuration_if_desired_occupancy_is_specified(
--    Policy const& policy, cudaDeviceProp const& properties,
--    cudaFuncAttributes const& attributes, dim3 const& block, int& shmem,
--    bool& prefer_shmem) {
--  int const block_size        = block.x * block.y * block.z;
--  int const desired_occupancy = policy.impl_get_desired_occupancy().value();
--
--  size_t const shmem_per_sm_prefer_l1 = get_shmem_per_sm_prefer_l1(properties);
--  size_t const static_shmem           = attributes.sharedSizeBytes;
--
--  // round to nearest integer and avoid division by zero
--  int active_blocks = std::max(
--      1, static_cast<int>(std::round(
--             static_cast<double>(properties.maxThreadsPerMultiProcessor) /
--             block_size * desired_occupancy / 100)));
--  int const dynamic_shmem =
--      shmem_per_sm_prefer_l1 / active_blocks - static_shmem;
--
--  if (dynamic_shmem > shmem) {
--    shmem        = dynamic_shmem;
--    prefer_shmem = false;
--  }
--}
--
--template <class Policy>
--std::enable_if_t<!Policy::experimental_contains_desired_occupancy>
--modify_launch_configuration_if_desired_occupancy_is_specified(
--    Policy const&, cudaDeviceProp const&, cudaFuncAttributes const&,
--    dim3 const& /*block*/, int& /*shmem*/, bool& /*prefer_shmem*/) {}
--
- // </editor-fold> end Some helper functions for launch code readability }}}1
- //==============================================================================
- 
-@@ -348,7 +386,7 @@ struct CudaParallelLaunchKernelInvoker<
- #ifdef KOKKOS_CUDA_ENABLE_GRAPHS
-   inline static void create_parallel_launch_graph_node(
-       DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
--      CudaInternal const* cuda_instance, bool prefer_shmem) {
-+      CudaInternal const* cuda_instance) {
-     //----------------------------------------
-     auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
-     KOKKOS_EXPECTS(bool(graph));
-@@ -358,8 +396,15 @@ struct CudaParallelLaunchKernelInvoker<
- 
-     if (!Impl::is_empty_launch(grid, block)) {
-       Impl::check_shmem_request(cuda_instance, shmem);
--      Impl::configure_shmem_preference<DriverType, LaunchBounds>(
--          base_t::get_kernel_func(), prefer_shmem);
-+      if (DriverType::Policy::
-+                        experimental_contains_desired_occupancy) {
-+        int desired_occupancy =
-+            driver.get_policy().impl_get_desired_occupancy().value();
-+        size_t block_size = block.x * block.y * block.z;
-+        Impl::configure_shmem_preference<DriverType, LaunchBounds>(
-+            base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
-+            shmem, desired_occupancy);
-+      }
- 
-       void const* args[] = {&driver};
- 
-@@ -442,7 +487,7 @@ struct CudaParallelLaunchKernelInvoker<
- #ifdef KOKKOS_CUDA_ENABLE_GRAPHS
-   inline static void create_parallel_launch_graph_node(
-       DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
--      CudaInternal const* cuda_instance, bool prefer_shmem) {
-+      CudaInternal const* cuda_instance) {
-     //----------------------------------------
-     auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
-     KOKKOS_EXPECTS(bool(graph));
-@@ -452,8 +497,15 @@ struct CudaParallelLaunchKernelInvoker<
- 
-     if (!Impl::is_empty_launch(grid, block)) {
-       Impl::check_shmem_request(cuda_instance, shmem);
--      Impl::configure_shmem_preference<DriverType, LaunchBounds>(
--          base_t::get_kernel_func(), prefer_shmem);
-+      if constexpr (DriverType::Policy::
-+                        experimental_contains_desired_occupancy) {
-+        int desired_occupancy =
-+            driver.get_policy().impl_get_desired_occupancy().value();
-+        size_t block_size = block.x * block.y * block.z;
-+        Impl::configure_shmem_preference<DriverType, LaunchBounds>(
-+            base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
-+            shmem, desired_occupancy);
-+      }
- 
-       auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
- 
-@@ -566,7 +618,7 @@ struct CudaParallelLaunchKernelInvoker<
- #ifdef KOKKOS_CUDA_ENABLE_GRAPHS
-   inline static void create_parallel_launch_graph_node(
-       DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
--      CudaInternal const* cuda_instance, bool prefer_shmem) {
-+      CudaInternal const* cuda_instance) {
-     // Just use global memory; coordinating through events to share constant
-     // memory with the non-graph interface is not really reasonable since
-     // events don't work with Graphs directly, and this would anyway require
-@@ -580,7 +632,7 @@ struct CudaParallelLaunchKernelInvoker<
-         DriverType, LaunchBounds,
-         Experimental::CudaLaunchMechanism::GlobalMemory>;
-     global_launch_impl_t::create_parallel_launch_graph_node(
--        driver, grid, block, shmem, cuda_instance, prefer_shmem);
-+        driver, grid, block, shmem, cuda_instance);
-   }
- #endif
- };
-@@ -613,8 +665,7 @@ struct CudaParallelLaunchImpl<
- 
-   inline static void launch_kernel(const DriverType& driver, const dim3& grid,
-                                    const dim3& block, int shmem,
--                                   const CudaInternal* cuda_instance,
--                                   bool prefer_shmem) {
-+                                   const CudaInternal* cuda_instance) {
-     if (!Impl::is_empty_launch(grid, block)) {
-       // Prevent multiple threads to simultaneously set the cache configuration
-       // preference and launch the same kernel
-@@ -623,18 +674,17 @@ struct CudaParallelLaunchImpl<
- 
-       Impl::check_shmem_request(cuda_instance, shmem);
- 
--      // If a desired occupancy is specified, we compute how much shared memory
--      // to ask for to achieve that occupancy, assuming that the cache
--      // configuration is `cudaFuncCachePreferL1`.  If the amount of dynamic
--      // shared memory computed is actually smaller than `shmem` we overwrite
--      // `shmem` and set `prefer_shmem` to `false`.
--      modify_launch_configuration_if_desired_occupancy_is_specified(
--          driver.get_policy(), cuda_instance->m_deviceProp,
--          get_cuda_func_attributes(), block, shmem, prefer_shmem);
--
--      Impl::configure_shmem_preference<
--          DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
--          base_t::get_kernel_func(), prefer_shmem);
-+      if (DriverType::Policy::
-+                        experimental_contains_desired_occupancy) {
-+        int desired_occupancy =
-+            driver.get_policy().impl_get_desired_occupancy().value();
-+        size_t block_size = block.x * block.y * block.z;
-+        Impl::configure_shmem_preference<
-+            DriverType,
-+            Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
-+            base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
-+            shmem, desired_occupancy);
-+      }
- 
-       KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
- 
-@@ -650,18 +700,9 @@ struct CudaParallelLaunchImpl<
-   }
- 
-   static cudaFuncAttributes get_cuda_func_attributes() {
--    // Race condition inside of cudaFuncGetAttributes if the same address is
--    // given requires using a local variable as input instead of a static Rely
--    // on static variable initialization to make sure only one thread executes
--    // the code and the result is visible.
--    auto wrap_get_attributes = []() -> cudaFuncAttributes {
--      cudaFuncAttributes attr_tmp;
--      KOKKOS_IMPL_CUDA_SAFE_CALL(
--          cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()));
--      return attr_tmp;
--    };
--    static cudaFuncAttributes attr = wrap_get_attributes();
--    return attr;
-+    return get_cuda_kernel_func_attributes<
-+        DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
-+        base_t::get_kernel_func());
-   }
- };
- 
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-index e586bb4cc6..0e348c092a 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-@@ -121,8 +121,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-               maxblocks[1]),
-           1);
-       CudaParallelLaunch<ParallelFor, LaunchBounds>(
--          *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
--          false);
-+          *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
-     } else if (RP::rank == 3) {
-       const dim3 block(m_rp.m_tile[0], m_rp.m_tile[1], m_rp.m_tile[2]);
-       KOKKOS_ASSERT(block.x > 0);
-@@ -139,8 +138,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-               (m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1) / block.z,
-               maxblocks[2]));
-       CudaParallelLaunch<ParallelFor, LaunchBounds>(
--          *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
--          false);
-+          *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
-     } else if (RP::rank == 4) {
-       // id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to
-       // threadIdx.z
-@@ -158,8 +156,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-               (m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1) / block.z,
-               maxblocks[2]));
-       CudaParallelLaunch<ParallelFor, LaunchBounds>(
--          *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
--          false);
-+          *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
-     } else if (RP::rank == 5) {
-       // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to
-       // threadIdx.z
-@@ -175,8 +172,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-               (m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1) / block.z,
-               maxblocks[2]));
-       CudaParallelLaunch<ParallelFor, LaunchBounds>(
--          *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
--          false);
-+          *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
-     } else if (RP::rank == 6) {
-       // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to
-       // threadIdx.z
-@@ -191,8 +187,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-           std::min<array_index_type>(m_rp.m_tile_end[4] * m_rp.m_tile_end[5],
-                                      maxblocks[2]));
-       CudaParallelLaunch<ParallelFor, LaunchBounds>(
--          *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
--          false);
-+          *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
-     } else {
-       Kokkos::abort("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
-     }
-@@ -405,8 +400,8 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
- 
-       CudaParallelLaunch<ParallelReduce, LaunchBounds>(
-           *this, grid, block, shmem,
--          m_policy.space().impl_internal_space_instance(),
--          false);  // copy to device and execute
-+          m_policy.space()
-+	      .impl_internal_space_instance()); // copy to device and execute
- 
-       if (!m_result_ptr_device_accessible) {
-         if (m_result_ptr) {
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
-index ac160f8fe2..d1031751c2 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
-@@ -135,8 +135,7 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
- #endif
- 
-     CudaParallelLaunch<ParallelFor, LaunchBounds>(
--        *this, grid, block, 0, m_policy.space().impl_internal_space_instance(),
--        false);
-+        *this, grid, block, 0, m_policy.space().impl_internal_space_instance());
-   }
- 
-   ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
-@@ -375,8 +374,8 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
- 
-       CudaParallelLaunch<ParallelReduce, LaunchBounds>(
-           *this, grid, block, shmem,
--          m_policy.space().impl_internal_space_instance(),
--          false);  // copy to device and execute
-+          m_policy.space()
-+	      .impl_internal_space_instance());  // copy to device and execute
- 
-       if (!m_result_ptr_device_accessible) {
-         if (m_result_ptr) {
-@@ -726,16 +725,16 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
-         m_final = false;
-         CudaParallelLaunch<ParallelScan, LaunchBounds>(
-             *this, grid, block, shmem,
--            m_policy.space().impl_internal_space_instance(),
--            false);  // copy to device and execute
-+            m_policy.space()
-+	        .impl_internal_space_instance());  // copy to device and execute
- #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
-       }
- #endif
-       m_final = true;
-       CudaParallelLaunch<ParallelScan, LaunchBounds>(
-           *this, grid, block, shmem,
--          m_policy.space().impl_internal_space_instance(),
--          false);  // copy to device and execute
-+          m_policy.space()
-+	      .impl_internal_space_instance());  // copy to device and execute
-     }
-   }
- 
-@@ -1038,16 +1037,16 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
-         m_final = false;
-         CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
-             *this, grid, block, shmem,
--            m_policy.space().impl_internal_space_instance(),
--            false);  // copy to device and execute
-+            m_policy.space()
-+	        .impl_internal_space_instance());  // copy to device and execute
- #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
-       }
- #endif
-       m_final = true;
-       CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
-           *this, grid, block, shmem,
--          m_policy.space().impl_internal_space_instance(),
--          false);  // copy to device and execute
-+          m_policy.space()
-+              .impl_internal_space_instance());  // copy to device and execute
- 
-       const int size = Analysis::value_size(m_functor);
- #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
-index cdd16085b3..ea9430b812 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
-@@ -552,8 +552,8 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
- 
-     CudaParallelLaunch<ParallelFor, LaunchBounds>(
-         *this, grid, block, shmem_size_total,
--        m_policy.space().impl_internal_space_instance(),
--        true);  // copy to device and execute
-+        m_policy.space()
-+	    .impl_internal_space_instance());  // copy to device and execute
-   }
- 
-   ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
-@@ -878,8 +878,8 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
- 
-       CudaParallelLaunch<ParallelReduce, LaunchBounds>(
-           *this, grid, block, shmem_size_total,
--          m_policy.space().impl_internal_space_instance(),
--          true);  // copy to device and execute
-+          m_policy.space()
-+	      .impl_internal_space_instance());  // copy to device and execute
- 
-       if (!m_result_ptr_device_accessible) {
-         m_policy.space().fence(
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
-index 34d4bef9fd..178012431c 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
-@@ -428,11 +428,6 @@ struct CudaReductionsFunctor<FunctorType, false, false> {
- //   __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
- // function qualifier which could be used to improve performance.
- //----------------------------------------------------------------------------
--// Maximize shared memory and minimize L1 cache:
--//   cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
--// For 2.0 capability: 48 KB shared and 16 KB L1
--//----------------------------------------------------------------------------
--//----------------------------------------------------------------------------
- /*
-  *  Algorithmic constraints:
-  *   (a) blockDim.y <= 1024
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
-index fb3a6b138f..a12378a891 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
-@@ -100,8 +100,7 @@ class ParallelFor<FunctorType, Kokkos::WorkGraphPolicy<Traits...>,
-     const int shared = 0;
- 
-     Kokkos::Impl::CudaParallelLaunch<Self>(
--        *this, grid, block, shared, Cuda().impl_internal_space_instance(),
--        false);
-+        *this, grid, block, shared, Cuda().impl_internal_space_instance());
-   }
- 
-   inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
diff --git a/lib/kokkos/kokkos_5731.diff b/lib/kokkos/kokkos_5731.diff
deleted file mode 100644
index e95f4a1546..0000000000
--- a/lib/kokkos/kokkos_5731.diff
+++ /dev/null
@@ -1,46 +0,0 @@
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-index 30b6958a67..b94f053272 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-@@ -207,41 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
-                                 LaunchBounds{});
- }
- 
--// NOTE these number can be obtained several ways:
--// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
--// "Compute Capability" first and check what is the smallest "Shared Memory
--// Size Config" that is available.  The "Shared Memory Per Multiprocessor" in
--// bytes is then to be found below in the summary.
--// * Another option would be to look for the information in the "Tuning
--// Guide(s)" of the CUDA Toolkit Documentation for each GPU architecture, in
--// the "Shared Memory" section (more tedious)
--inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
--  int const compute_capability = properties.major * 10 + properties.minor;
--  return [compute_capability]() {
--    switch (compute_capability) {
--      case 30:
--      case 32:
--      case 35: return 16;
--      case 37: return 80;
--      case 50:
--      case 53:
--      case 60:
--      case 62: return 64;
--      case 52:
--      case 61: return 96;
--      case 70:
--      case 80:
--      case 86:
--      case 90: return 8;
--      case 75: return 32;
--      default:
--        Kokkos::Impl::throw_runtime_exception(
--            "Unknown device in cuda block size deduction");
--    }
--    return 0;
--  }() * 1024;
--}
--
- }  // namespace Impl
- }  // namespace Kokkos
- 
diff --git a/lib/kokkos/kokkos_5739.diff b/lib/kokkos/kokkos_5739.diff
deleted file mode 100644
index fe7a1ff551..0000000000
--- a/lib/kokkos/kokkos_5739.diff
+++ /dev/null
@@ -1,204 +0,0 @@
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-index b94f053272..252c13c524 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
-@@ -53,17 +53,69 @@
- namespace Kokkos {
- namespace Impl {
- 
-+inline int cuda_warp_per_sm_allocation_granularity(
-+    cudaDeviceProp const& properties) {
-+  // Allocation granularity of warps in each sm
-+  switch (properties.major) {
-+    case 3:
-+    case 5:
-+    case 7:
-+    case 8:
-+    case 9: return 4;
-+    case 6: return (properties.minor == 0 ? 2 : 4);
-+    default:
-+      throw_runtime_exception(
-+          "Unknown device in cuda warp per sm allocation granularity");
-+      return 0;
-+  }
-+}
-+
-+inline int cuda_max_warps_per_sm_registers(
-+    cudaDeviceProp const& properties, cudaFuncAttributes const& attributes) {
-+  // Maximum number of warps per sm as a function of register counts,
-+  // subject to the constraint that warps are allocated with a fixed granularity
-+  int const max_regs_per_block = properties.regsPerBlock;
-+  int const regs_per_warp      = attributes.numRegs * properties.warpSize;
-+  int const warp_granularity =
-+      cuda_warp_per_sm_allocation_granularity(properties);
-+  // The granularity of register allocation is chunks of 256 registers per warp,
-+  // which implies a need to over-allocate, so we round up
-+  int const allocated_regs_per_warp = (regs_per_warp + 256 - 1) / 256;
-+
-+  // The maximum number of warps per SM is constrained from above by register
-+  // allocation. To satisfy the constraint that warps per SM is allocated at a
-+  // finite granularity, we need to round down.
-+  int const max_warps_per_sm =
-+      warp_granularity *
-+      (max_regs_per_block / (allocated_regs_per_warp * warp_granularity));
-+
-+  return max_warps_per_sm;
-+}
-+
- inline int cuda_max_active_blocks_per_sm(cudaDeviceProp const& properties,
-                                          cudaFuncAttributes const& attributes,
-                                          int block_size, size_t dynamic_shmem) {
--  // Limits due do registers/SM
-+  // Limits due to registers/SM
-   int const regs_per_sm     = properties.regsPerMultiprocessor;
-   int const regs_per_thread = attributes.numRegs;
-   // The granularity of register allocation is chunks of 256 registers per warp
-   // -> 8 registers per thread
-   int const allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
--  int const max_blocks_regs =
--      regs_per_sm / (allocated_regs_per_thread * block_size);
-+  int max_blocks_regs = regs_per_sm / (allocated_regs_per_thread * block_size);
-+
-+  // Compute the maximum number of warps as a function of the number of
-+  // registers
-+  int const max_warps_per_sm_registers =
-+      cuda_max_warps_per_sm_registers(properties, attributes);
-+
-+  // Constrain the number of blocks to respect the maximum number of warps per
-+  // SM On face value this should be an equality, but due to the warp
-+  // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
-+  // left-hand-side of this comparison can overshoot what the hardware allows
-+  // based on register counts alone
-+  while ((max_blocks_regs * block_size / properties.warpSize) >
-+         max_warps_per_sm_registers)
-+    max_blocks_regs--;
- 
-   // Limits due to shared memory/SM
-   size_t const shmem_per_sm            = properties.sharedMemPerMultiprocessor;
-@@ -207,6 +259,19 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
-                                 LaunchBounds{});
- }
- 
-+template <class LaunchBounds>
-+int cuda_get_opt_block_size_no_shmem(const cudaFuncAttributes& attr,
-+                                     LaunchBounds) {
-+  auto const& prop = Kokkos::Cuda().cuda_device_prop();
-+
-+  // Thin version of cuda_get_opt_block_size for cases where there is no shared
-+  // memory
-+  auto const block_size_to_no_shmem = [&](int /*block_size*/) { return 0; };
-+
-+  return cuda_deduce_block_size(false, prop, attr, block_size_to_no_shmem,
-+                                LaunchBounds{});
-+}
-+
- }  // namespace Impl
- }  // namespace Kokkos
- 
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-index 5c4c3a7d39..170183ca0a 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-@@ -188,9 +188,23 @@ inline void configure_shmem_preference(const KernelFuncPtr& func,
-   // The granularity of register allocation is chunks of 256 registers per warp
-   // -> 8 registers per thread
-   const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
--  const size_t max_blocks_regs =
-+  size_t max_blocks_regs =
-       regs_per_sm / (allocated_regs_per_thread * block_size);
- 
-+  // Compute the maximum number of warps as a function of the number of
-+  // registers
-+  const size_t max_warps_per_sm_registers =
-+      cuda_max_warps_per_sm_registers(device_props, func_attr);
-+
-+  // Constrain the number of blocks to respect the maximum number of warps per
-+  // SM On face value this should be an equality, but due to the warp
-+  // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
-+  // left-hand-side of this comparison can overshoot what the hardware allows
-+  // based on register counts alone
-+  while ((max_blocks_regs * block_size / device_props.warpSize) >
-+         max_warps_per_sm_registers)
-+    max_blocks_regs--;
-+
-   // Compute how many threads per sm we actually want
-   const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
-   // only allocate multiples of warp size
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-index 0e348c092a..7e4f62f12e 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
-@@ -67,6 +67,34 @@
- namespace Kokkos {
- namespace Impl {
- 
-+template <typename ParallelType, typename Policy, typename LaunchBounds>
-+int max_tile_size_product_helper(const Policy& pol, const LaunchBounds&) {
-+  cudaFuncAttributes attr =
-+      CudaParallelLaunch<ParallelType,
-+                         LaunchBounds>::get_cuda_func_attributes();
-+  auto const& prop = pol.space().cuda_device_prop();
-+
-+  // Limits due to registers/SM, MDRange doesn't have
-+  // shared memory constraints
-+  int const optimal_block_size =
-+      Kokkos::Impl::cuda_get_opt_block_size_no_shmem(attr, LaunchBounds{});
-+
-+  // Compute how many blocks of this size we can launch, based on warp
-+  // constraints
-+  int const max_warps_per_sm_registers =
-+      Kokkos::Impl::cuda_max_warps_per_sm_registers(prop, attr);
-+  int const max_num_threads_from_warps =
-+      max_warps_per_sm_registers * prop.warpSize;
-+  int const max_num_blocks = max_num_threads_from_warps / optimal_block_size;
-+
-+  // Compute the total number of threads
-+  int const max_threads_per_sm = optimal_block_size * max_num_blocks;
-+
-+  return std::min(
-+      max_threads_per_sm,
-+      static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
-+}
-+
- template <class FunctorType, class... Traits>
- class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-  public:
-@@ -85,18 +113,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
-  public:
-   template <typename Policy, typename Functor>
-   static int max_tile_size_product(const Policy& pol, const Functor&) {
--    cudaFuncAttributes attr =
--        CudaParallelLaunch<ParallelFor,
--                           LaunchBounds>::get_cuda_func_attributes();
--    auto const& prop = pol.space().cuda_device_prop();
--    // Limits due to registers/SM, MDRange doesn't have
--    // shared memory constraints
--    int const regs_per_sm        = prop.regsPerMultiprocessor;
--    int const regs_per_thread    = attr.numRegs;
--    int const max_threads_per_sm = regs_per_sm / regs_per_thread;
--    return std::min(
--        max_threads_per_sm,
--        static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
-+    return max_tile_size_product_helper<ParallelFor>(pol, LaunchBounds{});
-   }
-   Policy const& get_policy() const { return m_rp; }
-   inline __device__ void operator()() const {
-@@ -258,17 +275,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
-  public:
-   template <typename Policy, typename Functor>
-   static int max_tile_size_product(const Policy& pol, const Functor&) {
--    cudaFuncAttributes attr =
--        CudaParallelLaunch<ParallelReduce,
--                           LaunchBounds>::get_cuda_func_attributes();
--    auto const& prop = pol.space().cuda_device_prop();
--    // Limits due do registers/SM
--    int const regs_per_sm        = prop.regsPerMultiprocessor;
--    int const regs_per_thread    = attr.numRegs;
--    int const max_threads_per_sm = regs_per_sm / regs_per_thread;
--    return std::min(
--        max_threads_per_sm,
--        static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
-+    return max_tile_size_product_helper<ParallelReduce>(pol, LaunchBounds{});
-   }
-   Policy const& get_policy() const { return m_policy; }
-   inline __device__ void exec_range(reference_type update) const {
diff --git a/lib/kokkos/kokkos_fix_5706_apply_last.diff b/lib/kokkos/kokkos_fix_5706_apply_last.diff
deleted file mode 100644
index 5d298323fd..0000000000
--- a/lib/kokkos/kokkos_fix_5706_apply_last.diff
+++ /dev/null
@@ -1,63 +0,0 @@
-diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-index 170183ca0a..ba43e362bb 100644
---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
-@@ -412,12 +412,16 @@ struct CudaParallelLaunchKernelInvoker<
-       Impl::check_shmem_request(cuda_instance, shmem);
-       if (DriverType::Policy::
-                         experimental_contains_desired_occupancy) {
-+      /*
-         int desired_occupancy =
-             driver.get_policy().impl_get_desired_occupancy().value();
-         size_t block_size = block.x * block.y * block.z;
-         Impl::configure_shmem_preference<DriverType, LaunchBounds>(
-             base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
--            shmem, desired_occupancy);
-+            shmem, desired_occupancy);*/
-+        Kokkos::Impl::throw_runtime_exception(
-+        std::string("Cuda graph node creation FAILED:"
-+                    " occupancy requests are currently broken."));
-       }
- 
-       void const* args[] = {&driver};
-@@ -511,14 +515,17 @@ struct CudaParallelLaunchKernelInvoker<
- 
-     if (!Impl::is_empty_launch(grid, block)) {
-       Impl::check_shmem_request(cuda_instance, shmem);
--      if constexpr (DriverType::Policy::
-+      if (DriverType::Policy::
-                         experimental_contains_desired_occupancy) {
--        int desired_occupancy =
-+        /*int desired_occupancy =
-             driver.get_policy().impl_get_desired_occupancy().value();
-         size_t block_size = block.x * block.y * block.z;
-         Impl::configure_shmem_preference<DriverType, LaunchBounds>(
-             base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
--            shmem, desired_occupancy);
-+            shmem, desired_occupancy);*/
-+        Kokkos::Impl::throw_runtime_exception(
-+        std::string("Cuda graph node creation FAILED:"
-+                    " occupancy requests are currently broken."));
-       }
- 
-       auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
-@@ -690,14 +697,17 @@ struct CudaParallelLaunchImpl<
- 
-       if (DriverType::Policy::
-                         experimental_contains_desired_occupancy) {
--        int desired_occupancy =
-+        /*int desired_occupancy =
-             driver.get_policy().impl_get_desired_occupancy().value();
-         size_t block_size = block.x * block.y * block.z;
-         Impl::configure_shmem_preference<
-             DriverType,
-             Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
-             base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
--            shmem, desired_occupancy);
-+            shmem, desired_occupancy);*/
-+        Kokkos::Impl::throw_runtime_exception(
-+        std::string("Cuda graph node creation FAILED:"
-+                    " occupancy requests are currently broken."));
-       }
- 
-       KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
diff --git a/potentials/HGa.msmeam b/potentials/HGa.msmeam
new file mode 100644
index 0000000000..9f01501c16
--- /dev/null
+++ b/potentials/HGa.msmeam
@@ -0,0 +1,30 @@
+bkgd_dyn        =       1
+emb_lin_neg = 1
+augt1=0 
+ialloy=1 
+rc	=	 5.9 
+#H
+attrac(1,1)=0.460 
+repuls(1,1)=0.460 
+Cmin(1,1,1)=1.3 # PuMS
+Cmax(1,1,1)= 2.80 
+nn2(1,1)=1
+#Ga
+rho0(2)         =       0.6
+attrac(2,2)=0.097 
+repuls(2,2)=0.097 
+nn2(2,2)=1
+#HGa
+attrac(1,2)=0.300 
+repuls(1,2)=0.300 
+lattce(1,2)=l12 
+re(1,2)=3.19 
+delta(1,2)=-0.48  
+alpha(1,2)=6.6 
+Cmin(1,1,2)=2.0 
+Cmin(2,1,2)= 2.0 
+Cmin(1,2,1)=2.0 
+Cmin(2,2,1)     =       1.4
+Cmin(1,2,2)     =       1.4
+Cmin(1,1,2)     =       1.4
+nn2(1,2)=1
diff --git a/potentials/library.msmeam b/potentials/library.msmeam
new file mode 100644
index 0000000000..9937eaee08
--- /dev/null
+++ b/potentials/library.msmeam
@@ -0,0 +1,14 @@
+# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
+# ms-meam data format May 2010
+#  elt   lat    z     ielement      atwt
+#  alpha b0     b1    b2     b3     b1m     b2m   b3m      alat   esub   asub
+#    -   t0     t1    t2     t3     t1m     t2m   t3m      rozero ibar
+#  NOTE:  leading character cannot be a space
+
+'H'    'dim'  1.0   1      1.0079
+2.960  2.960  3.0   1.0    1.0    1.0    3.0  1.0       0.741  2.235  2.50
+1.0    0.44721 0.0  0.00   0.0    0.31623 0  6.70 0
+
+'Ga4'  'fcc'  12.0  31     69.723
+4.42   4.80   3.10  6.00   0.00   0.0    0.0  0.5       4.247  2.897  0.97
+1.0    1.649 1.435  0.00   0.0    0.0  2.0       0.70   0
diff --git a/src/KOKKOS/Install.sh b/src/KOKKOS/Install.sh
index 1b58abe3a2..f493b5438a 100755
--- a/src/KOKKOS/Install.sh
+++ b/src/KOKKOS/Install.sh
@@ -316,6 +316,8 @@ action pair_lj_spica_kokkos.cpp pair_lj_spica.cpp
 action pair_lj_spica_kokkos.h pair_lj_spica.h
 action pair_meam_kokkos.cpp pair_meam.cpp
 action pair_meam_kokkos.h pair_meam.h
+action pair_meam_ms_kokkos.cpp pair_meam_ms.cpp
+action pair_meam_ms_kokkos.h pair_meam_ms.h
 action pair_mliap_kokkos.cpp pair_mliap.cpp
 action pair_mliap_kokkos.h pair_mliap.h
 action pair_morse_kokkos.cpp
diff --git a/src/KOKKOS/meam_dens_final_kokkos.h b/src/KOKKOS/meam_dens_final_kokkos.h
index bcc7b558dc..5e7ffdec20 100644
--- a/src/KOKKOS/meam_dens_final_kokkos.h
+++ b/src/KOKKOS/meam_dens_final_kokkos.h
@@ -61,34 +61,61 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMDensFinal, const int &i, EV_FLOAT
   if (elti >= 0) {
     scaleii = d_scale(type[i],type[i]);
     d_rho1[i] = 0.0;
-    d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
+    if (msmeamflag) {
+      d_rho2[i] = -1.0 / 3.0 * (d_arho2b[i] * d_arho2b[i]
+                              - d_arho2mb[i] * d_arho2mb[i]);
+    } else{
+      d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
+    }
     d_rho3[i] = 0.0;
     for (int m = 0; m < 3; m++) {
-      d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
-      d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
-    }
-    for (int m = 0; m < 6; m++)
-      d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
-    for (int m = 0; m < 10; m++)
-      d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
-
-    if (d_rho0[i] > 0.0) {
-      if (ialloy == 1) {
-        d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
-        d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
-        d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
-      } else if (ialloy == 2) {
-        d_t_ave(i,0) = t1_meam[elti];
-        d_t_ave(i,1) = t2_meam[elti];
-        d_t_ave(i,2) = t3_meam[elti];
-      } else {
-        d_t_ave(i,0) /= d_rho0[i];
-        d_t_ave(i,1) /= d_rho0[i];
-        d_t_ave(i,2) /= d_rho0[i];
+      if (msmeamflag) {
+        d_rho1[i] = d_rho1[i] + d_arho1(i, m) * d_arho1(i, m)
+                             - d_arho1m(i, m) * d_arho1m(i, m);
+        d_rho3[i] = d_rho3[i] - 3.0 / 5.0 * (d_arho3b(i, m) * d_arho3b(i, m)
+                                          - d_arho3mb(i, m) * d_arho3mb(i, m));
+      } else{
+        d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
+        d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
       }
     }
+    for (int m = 0; m < 6; m++){
+      if (msmeamflag) {
+        d_rho2[i] = d_rho2[i] + v2D[m] * (d_arho2(i, m) * d_arho2(i, m)
+                                         - d_arho2m(i, m) * d_arho2m(i, m));
+      } else{
+        d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
+      }
+    }
+    for (int m = 0; m < 10; m++)
+      if (msmeamflag) {
+        d_rho3[i] = d_rho3[i] + v3D[m] * (d_arho3(i, m) * d_arho3(i, m)
+                                        - d_arho3m(i, m) * d_arho3m(i, m));
+      } else{
+        d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
+      }
 
-    d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
+    if (msmeamflag) {
+      // with msmeam all t weights are already accounted for in rho
+      d_gamma[i] = d_rho1[i] + d_rho2[i] + d_rho3[i];
+    } else{
+      if (d_rho0[i] > 0.0) {
+        if (ialloy == 1) {
+          d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
+          d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
+          d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
+        } else if (ialloy == 2) {
+          d_t_ave(i,0) = t1_meam[elti];
+          d_t_ave(i,1) = t2_meam[elti];
+          d_t_ave(i,2) = t3_meam[elti];
+        } else {
+          d_t_ave(i,0) /= d_rho0[i];
+          d_t_ave(i,1) /= d_rho0[i];
+          d_t_ave(i,2) /= d_rho0[i];
+        }
+      }
+      d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
+    }
 
     if (d_rho0[i] > 0.0)
       d_gamma[i] /= (d_rho0[i] * d_rho0[i]);
diff --git a/src/KOKKOS/meam_dens_init_kokkos.h b/src/KOKKOS/meam_dens_init_kokkos.h
index 31ac046dcf..60bb6553d8 100644
--- a/src/KOKKOS/meam_dens_init_kokkos.h
+++ b/src/KOKKOS/meam_dens_init_kokkos.h
@@ -43,11 +43,23 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMZero, const int &i) const {
   d_rho0[i] = 0.0;
   d_arho2b[i] = 0.0;
   d_arho1(i,0) = d_arho1(i,1) = d_arho1(i,2) = 0.0;
-  for (int j = 0; j < 6; j++)
+  if (msmeamflag) {
+    d_arho2mb[i] = 0.0;
+    d_arho1m(i,0) = d_arho1m(i,1) = d_arho1m(i,2) = 0.0;
+  }
+  for (int j = 0; j < 6; j++) {
     d_arho2(i,j) = 0.0;
-  for (int j = 0; j < 10; j++)
+    if (msmeamflag)
+      d_arho2m(i,j) = 0.0;
+  }
+  for (int j = 0; j < 10; j++) {
     d_arho3(i,j) = 0.0;
+    if (msmeamflag)
+      d_arho3m(i,j) = 0.0;
+  }
   d_arho3b(i,0) = d_arho3b(i,1) = d_arho3b(i,2) = 0.0;
+  if (msmeamflag)
+    d_arho3mb(i,0) = d_arho3mb(i,1) = d_arho3mb(i,2) = 0.0;
   d_t_ave(i,0) = d_t_ave(i,1) = d_t_ave(i,2) = 0.0;
   d_tsq_ave(i,0) = d_tsq_ave(i,1) = d_tsq_ave(i,2) = 0.0;
 }
@@ -80,13 +92,20 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
     memoryKK->destroy_kokkos(k_arho3b,arho3b);
     memoryKK->destroy_kokkos(k_t_ave,t_ave);
     memoryKK->destroy_kokkos(k_tsq_ave,tsq_ave);
+    // msmeam
+    memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
+    memoryKK->destroy_kokkos(k_arho1m, arho1m);
+    memoryKK->destroy_kokkos(k_arho2m, arho2m);
+    memoryKK->destroy_kokkos(k_arho3m, arho3m);
+    memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
 
     nmax = atom_nmax;
-//    memory->create(rho, nmax, "pair:rho");
+
+    //memory->create(rho, nmax, "pair:rho");
     k_rho = DAT::tdual_ffloat_1d("pair:rho",nmax);
     d_rho = k_rho.template view<DeviceType>();
     h_rho = k_rho.h_view;
- //   memory->create(rho0, nmax, "pair:rho0");
+    //memory->create(rho0, nmax, "pair:rho0");
     k_rho0 = DAT::tdual_ffloat_1d("pair:rho0",nmax);
     d_rho0 = k_rho0.template view<DeviceType>();
     h_rho0 = k_rho0.h_view;
@@ -150,6 +169,28 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
     k_tsq_ave = DAT::tdual_ffloat_2d("pair:tsq_ave",nmax, 3);
     d_tsq_ave = k_tsq_ave.template view<DeviceType>();
     h_tsq_ave = k_tsq_ave.h_view;
+
+    // msmeam
+    //memory->create(arho2mb, nmax, "pair:arho2mb");
+    k_arho2mb = DAT::tdual_ffloat_1d("pair:arho2mb",nmax);
+    d_arho2mb = k_arho2mb.template view<DeviceType>();
+    h_arho2mb = k_arho2mb.h_view;
+    //memory->create(arho1m, nmax, 3, "pair:arho1m");
+    k_arho1m = DAT::tdual_ffloat_2d("pair:arho1m", nmax, 3);
+    d_arho1m = k_arho1m.template view<DeviceType>();
+    h_arho1m = k_arho1m.h_view;
+    //memory->create(arho2m, nmax, 6, "pair:arho2m");
+    k_arho2m = DAT::tdual_ffloat_2d("pair:arho2m", nmax, 6);
+    d_arho2m = k_arho2m.template view<DeviceType>();
+    h_arho2m = k_arho2m.h_view;
+    //memory->create(arho3m, nmax, 10, "pair:arho3m");
+    k_arho3m = DAT::tdual_ffloat_2d("pair:arho3m", nmax, 10);
+    d_arho3m = k_arho3m.template view<DeviceType>();
+    h_arho3m = k_arho3m.h_view;
+    //memory->create(arho3mb, nmax, 3, "pair:arho3mb");
+    k_arho3mb = DAT::tdual_ffloat_2d("pair:arho3mb", nmax, 3);
+    d_arho3mb = k_arho3mb.template view<DeviceType>();
+    h_arho3mb = k_arho3mb.h_view;
   }
 
   if (n_neigh > maxneigh) {
@@ -206,6 +247,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
     dup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3b);
     dup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_t_ave);
     dup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_tsq_ave);
+    // msmeam
+    dup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2mb);
+    dup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho1m);
+    dup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2m);
+    dup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3m);
+    dup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3mb);
   } else {
     ndup_rho0 = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_rho0);
     ndup_arho2b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2b);
@@ -215,6 +262,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
     ndup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3b);
     ndup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_t_ave);
     ndup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_tsq_ave);
+    // msmeam
+    ndup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2mb);
+    ndup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho1m);
+    ndup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2m);
+    ndup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3m);
+    ndup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3mb);
   }
 
   copymode = 1;
@@ -233,6 +286,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
     Kokkos::Experimental::contribute(d_arho3b, dup_arho3b);
     Kokkos::Experimental::contribute(d_t_ave, dup_t_ave);
     Kokkos::Experimental::contribute(d_tsq_ave, dup_tsq_ave);
+    // msmeam
+    Kokkos::Experimental::contribute(d_arho2mb, dup_arho2mb);
+    Kokkos::Experimental::contribute(d_arho1m, dup_arho1m);
+    Kokkos::Experimental::contribute(d_arho2m, dup_arho2m);
+    Kokkos::Experimental::contribute(d_arho3m, dup_arho3m);
+    Kokkos::Experimental::contribute(d_arho3mb, dup_arho3mb);
 
     // free duplicated memory
     dup_rho0 = decltype(dup_rho0)();
@@ -243,6 +302,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
     dup_arho3b = decltype(dup_arho3b)();
     dup_t_ave = decltype(dup_t_ave)();
     dup_tsq_ave = decltype(dup_tsq_ave)();
+    // msmeam
+    dup_arho2mb = decltype(dup_arho2mb)();
+    dup_arho1m = decltype(dup_arho1m)();
+    dup_arho2m = decltype(dup_arho2m)();
+    dup_arho3m = decltype(dup_arho3m)();
+    dup_arho3mb = decltype(dup_arho3mb)();
   }
 }
 
@@ -417,7 +482,6 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
                 int offset) const
 {
   // The rho0, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
-
   auto v_rho0 = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_rho0),decltype(ndup_rho0)>::get(dup_rho0,ndup_rho0);
   auto a_rho0 = v_rho0.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
   auto v_arho2b = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2b),decltype(ndup_arho2b)>::get(dup_arho2b,ndup_arho2b);
@@ -434,6 +498,17 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
   auto a_t_ave = v_t_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
   auto v_tsq_ave = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_tsq_ave),decltype(ndup_tsq_ave)>::get(dup_tsq_ave,ndup_tsq_ave);
   auto a_tsq_ave = v_tsq_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
+  // msmeam
+  auto v_arho2mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2mb),decltype(ndup_arho2mb)>::get(dup_arho2mb,ndup_arho2mb);
+  auto a_arho2mb = v_arho2mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
+  auto v_arho1m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho1m),decltype(ndup_arho1m)>::get(dup_arho1m,ndup_arho1m);
+  auto a_arho1m = v_arho1m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
+  auto v_arho2m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2m),decltype(ndup_arho2m)>::get(dup_arho2m,ndup_arho2m);
+  auto a_arho2m = v_arho2m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
+  auto v_arho3m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3m),decltype(ndup_arho3m)>::get(dup_arho3m,ndup_arho3m);
+  auto a_arho3m = v_arho3m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
+  auto v_arho3mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3mb),decltype(ndup_arho3mb)>::get(dup_arho3mb,ndup_arho3mb);
+  auto a_arho3mb = v_arho3mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
 
   const int elti = d_map[type[i]];
   const double xtmp = x(i,0);
@@ -463,6 +538,16 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
         double rhoa1i = ro0i * MathSpecialKokkos::fm_exp(-beta1_meam[elti] * ai) * sij;
         double rhoa2i = ro0i * MathSpecialKokkos::fm_exp(-beta2_meam[elti] * ai) * sij;
         double rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai) * sij;
+        // msmeam
+        double rhoa1mj, rhoa2mj, rhoa3mj, rhoa1mi, rhoa2mi, rhoa3mi;
+        if (msmeamflag) {
+          rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * sij;
+          rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * sij;
+          rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * sij;
+          rhoa1mi = ro0i * t1m_meam[elti] * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * sij;
+          rhoa2mi = ro0i * t2m_meam[elti] * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * sij;
+          rhoa3mi = ro0i * t3m_meam[elti] * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * sij;
+        }
         if (ialloy == 1) {
           rhoa1j *= t1_meam[eltj];
           rhoa2j *= t2_meam[eltj];
@@ -499,20 +584,45 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
         const double A1i = rhoa1i / rij;
         const double A2i = rhoa2i / rij2;
         const double A3i = rhoa3i / (rij2 * rij);
+        double A1mj, A2mj, A3mj, A1mi, A2mi, A3mi;
+        if (msmeamflag) {
+          a_arho2mb[i] += rhoa2mj;
+          a_arho2mb[j] += rhoa2mi;
+          A1mj = rhoa1mj / rij;
+          A2mj = rhoa2mj / rij2;
+          A3mj = rhoa3mj / (rij2 * rij);
+          A1mi = rhoa1mi / rij;
+          A2mi = rhoa2mi / rij2;
+          A3mi = rhoa3mi / (rij2 * rij);
+        }
         int nv2 = 0;
         int nv3 = 0;
         for (int m = 0; m < 3; m++) {
-          a_arho1(i,m) += A1j * delij[m];
+          a_arho1(i,m) +=  A1j * delij[m];
           a_arho1(j,m) += -A1i * delij[m];
-          a_arho3b(i,m) += rhoa3j * delij[m] / rij;
+          a_arho3b(i,m) +=  rhoa3j * delij[m] / rij;
           a_arho3b(j,m) += -rhoa3i * delij[m] / rij;
+          if (msmeamflag) {
+            a_arho1m(i,m) +=  A1mj * delij[m];
+            a_arho1m(j,m) += -A1mi * delij[m];
+            a_arho3mb(i,m) +=  rhoa3mj * delij[m] / rij;
+            a_arho3mb(j,m) += -rhoa3mi * delij[m] / rij;
+          }
           for (int n = m; n < 3; n++) {
             a_arho2(i,nv2) += A2j * delij[m] * delij[n];
             a_arho2(j,nv2) += A2i * delij[m] * delij[n];
+            if (msmeamflag) {
+              a_arho2m(i,nv2) += A2mj * delij[m] * delij[n];
+              a_arho2m(j,nv2) += A2mi * delij[m] * delij[n];
+            }
             nv2++;
             for (int p = n; p < 3; p++) {
-              a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p];
+              a_arho3(i,nv3) +=  A3j * delij[m] * delij[n] * delij[p];
               a_arho3(j,nv3) += -A3i * delij[m] * delij[n] * delij[p];
+              if (msmeamflag) {
+                a_arho3m(i,nv3) +=  A3mj * delij[m] * delij[n] * delij[p];
+                a_arho3m(j,nv3) += -A3mi * delij[m] * delij[n] * delij[p];
+              }
               nv3++;
             }
           }
diff --git a/src/KOKKOS/meam_force_kokkos.h b/src/KOKKOS/meam_force_kokkos.h
index e7e6c64231..5c4244e99b 100644
--- a/src/KOKKOS/meam_force_kokkos.h
+++ b/src/KOKKOS/meam_force_kokkos.h
@@ -119,6 +119,17 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
   double dsij1, dsij2, force1, force2;
   double t1i, t2i, t3i, t1j, t2j, t3j;
   int fnoffset;
+  // msmeam
+  double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
+  double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
+  double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
+  double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
+  double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
+  double drho1mdrm1[3], drho1mdrm2[3];
+  double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
+  double drho2mdrm1[3], drho2mdrm2[3];
+  double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
+  double drho3mdrm1[3], drho3mdrm2[3];
 
   // The f, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
 
@@ -197,6 +208,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
         drhoa2i = -beta2_meam[elti] * invrei * rhoa2i;
         rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai);
         drhoa3i = -beta3_meam[elti] * invrei * rhoa3i;
+        if (msmeamflag) {
+          rhoa1mi = ro0i * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * t1m_meam[elti];
+          drhoa1mi = -beta1m_meam[elti] * invrei * rhoa1mi;
+          rhoa2mi = ro0i * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * t2m_meam[elti];
+          drhoa2mi = -beta2m_meam[elti] * invrei * rhoa2mi;
+          rhoa3mi = ro0i * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * t3m_meam[elti];
+          drhoa3mi = -beta3m_meam[elti] * invrei * rhoa3mi;
+        }
 
         if (elti != eltj) {
           invrej = 1.0 / re_meam[eltj][eltj];
@@ -210,6 +229,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
           drhoa2j = -beta2_meam[eltj] * invrej * rhoa2j;
           rhoa3j = ro0j * MathSpecialKokkos::fm_exp(-beta3_meam[eltj] * aj);
           drhoa3j = -beta3_meam[eltj] * invrej * rhoa3j;
+          if (msmeamflag) {
+            rhoa1mj = ro0j * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * t1m_meam[eltj];
+            drhoa1mj = -beta1m_meam[eltj] * invrej * rhoa1mj;
+            rhoa2mj = ro0j * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * t2m_meam[eltj];
+            drhoa2mj = -beta2m_meam[eltj] * invrej * rhoa2mj;
+            rhoa3mj = ro0j * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * t3m_meam[eltj];
+            drhoa3mj = -beta3m_meam[eltj] * invrej * rhoa3mj;
+          }
         } else {
           rhoa0j = rhoa0i;
           drhoa0j = drhoa0i;
@@ -219,6 +246,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
           drhoa2j = drhoa2i;
           rhoa3j = rhoa3i;
           drhoa3j = drhoa3i;
+          if (msmeamflag) {
+            rhoa1mj = rhoa1mi;
+            drhoa1mj = drhoa1mi;
+            rhoa2mj = rhoa2mi;
+            drhoa2mj = drhoa2mi;
+            rhoa3mj = rhoa3mi;
+            drhoa3mj = drhoa3mi;
+          }
         }
 
         const double t1mi = t1_meam[elti];
@@ -228,7 +263,10 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
         const double t2mj = t2_meam[eltj];
         const double t3mj = t3_meam[eltj];
 
-        if (ialloy == 1) {
+        // ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
+        // We did this above with rhoa1mj, rhoa2mj, etc.
+
+        if (ialloy == 1 || msmeamflag) {
           rhoa1j *= t1mj;
           rhoa2j *= t2mj;
           rhoa3j *= t3mj;
@@ -272,6 +310,38 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
           arg3j3 = arg3j3 - d_arho3b(j, n) * delij[n];
         }
 
+        // msmeam arhom args
+        nv2 = 0.0;
+        nv3 = 0.0;
+        arg1i1m = 0.0;
+        arg1j1m = 0.0;
+        arg1i2m = 0.0;
+        arg1j2m = 0.0;
+        arg1i3m = 0.0;
+        arg1j3m = 0.0;
+        arg3i3m = 0.0;
+        arg3j3m = 0.0;
+        if (msmeamflag) {
+          for (n = 0; n < 3; n++) {
+            for (p = n; p < 3; p++) {
+              for (q = p; q < 3; q++) {
+                arg = delij[n] * delij[p] * delij[q] * v3D[nv3];
+                arg1i3m = arg1i3m - d_arho3m(i, nv3) * arg;
+                arg1j3m = arg1j3m + d_arho3m(j, nv3) * arg;
+                nv3 = nv3 + 1;
+              }
+              arg = delij[n] * delij[p] * v2D[nv2];
+              arg1i2m = arg1i2m + d_arho2m(i, nv2) * arg;
+              arg1j2m = arg1j2m + d_arho2m(j, nv2) * arg;
+              nv2 = nv2 + 1;
+            }
+            arg1i1m = arg1i1m - d_arho1m(i, n) * delij[n];
+            arg1j1m = arg1j1m + d_arho1m(j, n) * delij[n];
+            arg3i3m = arg3i3m - d_arho3mb(i, n) * delij[n];
+            arg3j3m = arg3j3m + d_arho3mb(j, n) * delij[n];
+          }
+        }
+
         // rho0 terms
         drho0dr1 = drhoa0j * sij;
         drho0dr2 = drhoa0i * sij;
@@ -330,75 +400,183 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
           drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * d_arho3b(j, m)) * rhoa3i;
         }
 
+        if (msmeamflag) {
+          //     rho1m terms
+          a1 = 2 * sij / rij;
+          drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
+          drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
+          drho1mdr1 *= -1.0;
+          drho1mdr2 *= -1.0;
+          a1 = 2.0 * sij / rij;
+          for (m = 0; m < 3; m++) {
+            drho1mdrm1[m] = a1 * rhoa1mj * d_arho1m(i, m);
+            drho1mdrm2[m] = -a1 * rhoa1mi * d_arho1m(j, m);
+          }
+
+          //     rho2m terms
+          a2 = 2 * sij / rij2;
+          drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * drhoa2mj * sij;
+          drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * drhoa2mi * sij;
+          a2 = 4 * sij / rij2;
+          for (m = 0; m < 3; m++) {
+            drho2mdrm1[m] = 0.0;
+            drho2mdrm2[m] = 0.0;
+            for (n = 0; n < 3; n++) {
+              drho2mdrm1[m] = drho2mdrm1[m] + d_arho2m(i, vind2D[m][n]) * delij[n];
+              drho2mdrm2[m] = drho2mdrm2[m] - d_arho2m(j, vind2D[m][n]) * delij[n];
+            }
+            drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
+            drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
+          }
+
+          //     rho3m terms
+          rij3 = rij * rij2;
+          a3 = 2 * sij / rij3;
+          a3a = 6.0 / 5.0 * sij / rij;
+          drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
+          drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
+          drho3mdr1 *= -1.0;
+          drho3mdr2 *= -1.0;
+
+          a3 = 6 * sij / rij3;
+          a3a = 6 * sij / (5 * rij);
+          for (m = 0; m < 3; m++) {
+            drho3mdrm1[m] = 0.0;
+            drho3mdrm2[m] = 0.0;
+            nv2 = 0;
+            for (n = 0; n < 3; n++) {
+              for (p = n; p < 3; p++) {
+                arg = delij[n] * delij[p] * this->v2D[nv2];
+                drho3mdrm1[m] = drho3mdrm1[m] + d_arho3m(i, vind3D[m][n][p]) * arg;
+                drho3mdrm2[m] = drho3mdrm2[m] + d_arho3m(j, vind3D[m][n][p]) * arg;
+                nv2 = nv2 + 1;
+              }
+            }
+            drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * d_arho3mb(i, m)) * rhoa3mj;
+            drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * d_arho3mb(j, m)) * rhoa3mi;
+          }
+        } else {
+          for (m = 0; m < 3; m++) {
+            drho1mdrm1[m] = 0.0;
+            drho1mdrm2[m] = 0.0;
+            drho2mdrm1[m] = 0.0;
+            drho2mdrm2[m] = 0.0;
+            drho3mdrm1[m] = 0.0;
+            drho3mdrm2[m] = 0.0;
+          }
+        }
+
         // Compute derivatives of weighting functions t wrt rij
-        t1i = d_t_ave(i, 0);
-        t2i = d_t_ave(i, 1);
-        t3i = d_t_ave(i, 2);
-        t1j = d_t_ave(j, 0);
-        t2j = d_t_ave(j, 1);
-        t3j = d_t_ave(j, 2);
-
-        if (ialloy == 1) {
-
-          a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
-          a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
-          a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
-          a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
-          a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
-          a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
-
-          dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
-          dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
-          dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
-          dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
-          dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
-          dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
-
-        } else if (ialloy == 2) {
+        // Weighting functions t set to unity for msmeam
 
+        if (msmeamflag) {
+          t1i = 1.0;
+          t2i = 1.0;
+          t3i = 1.0;
+          t1j = 1.0;
+          t2j = 1.0;
+          t3j = 1.0;
           dt1dr1 = 0.0;
           dt1dr2 = 0.0;
           dt2dr1 = 0.0;
           dt2dr2 = 0.0;
           dt3dr1 = 0.0;
           dt3dr2 = 0.0;
-
         } else {
 
-          ai = 0.0;
-          if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
-          aj = 0.0;
-          if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
+          t1i = d_t_ave(i, 0);
+          t2i = d_t_ave(i, 1);
+          t3i = d_t_ave(i, 2);
+          t1j = d_t_ave(j, 0);
+          t2j = d_t_ave(j, 1);
+          t3j = d_t_ave(j, 2);
 
-          dt1dr1 = ai * (t1mj - t1i);
-          dt1dr2 = aj * (t1mi - t1j);
-          dt2dr1 = ai * (t2mj - t2i);
-          dt2dr2 = aj * (t2mi - t2j);
-          dt3dr1 = ai * (t3mj - t3i);
-          dt3dr2 = aj * (t3mi - t3j);
+          if (ialloy == 1) {
+
+            a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
+            a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
+            a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
+            a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
+            a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
+            a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
+
+            dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
+            dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
+            dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
+            dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
+            dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
+            dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
+
+          } else if (ialloy == 2) {
+
+            dt1dr1 = 0.0;
+            dt1dr2 = 0.0;
+            dt2dr1 = 0.0;
+            dt2dr2 = 0.0;
+            dt3dr1 = 0.0;
+            dt3dr2 = 0.0;
+
+          } else {
+
+            ai = 0.0;
+            if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
+            aj = 0.0;
+            if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
+
+            dt1dr1 = ai * (t1mj - t1i);
+            dt1dr2 = aj * (t1mi - t1j);
+            dt2dr1 = ai * (t2mj - t2i);
+            dt2dr2 = aj * (t2mi - t2j);
+            dt3dr1 = ai * (t3mj - t3i);
+            dt3dr2 = aj * (t3mi - t3j);
+          }
         }
 
         // Compute derivatives of total density wrt rij, sij and rij(3)
         get_shpfcn(lattce_meam[elti][elti], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpi);
         get_shpfcn(lattce_meam[eltj][eltj], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpj);
 
-        drhodr1 = d_dgamma1[i] * drho0dr1 +
-            d_dgamma2[i] *
-                (dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
-                 dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
-            d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
-        drhodr2 = d_dgamma1[j] * drho0dr2 +
-            d_dgamma2[j] *
-                (dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
-                 dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
-            d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
-        for (m = 0; m < 3; m++) {
-          drhodrm1[m] = 0.0;
-          drhodrm2[m] = 0.0;
-          drhodrm1[m] =
-              d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
-          drhodrm2[m] =
-              d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
+
+        if (msmeamflag) {
+          drhodr1 = d_dgamma1[i] * drho0dr1 +
+                    d_dgamma2[i] * (dt1dr1 * d_rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
+                                    dt2dr1 * d_rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
+                                    dt3dr1 * d_rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
+                    d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
+          drhodr2 = d_dgamma1[j] * drho0dr2 +
+                    d_dgamma2[j] * (dt1dr2 * d_rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
+                                    dt2dr2 * d_rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
+                                    dt3dr2 * d_rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
+                    d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
+          for (m = 0; m < 3; m++) {
+            drhodrm1[m] = 0.0;
+            drhodrm2[m] = 0.0;
+            drhodrm1[m] = d_dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
+                                          t2i * (drho2drm1[m] - drho2mdrm1[m]) +
+                                          t3i * (drho3drm1[m] - drho3mdrm1[m]) );
+            drhodrm2[m] = d_dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
+                                          t2j * (drho2drm2[m] - drho2mdrm2[m]) +
+                                          t3j * (drho3drm2[m] - drho3mdrm2[m]) );
+          }
+        } else {
+          drhodr1 = d_dgamma1[i] * drho0dr1 +
+              d_dgamma2[i] *
+                  (dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
+                   dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
+              d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
+          drhodr2 = d_dgamma1[j] * drho0dr2 +
+              d_dgamma2[j] *
+                  (dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
+                   dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
+              d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
+          for (m = 0; m < 3; m++) {
+            drhodrm1[m] = 0.0;
+            drhodrm2[m] = 0.0;
+            drhodrm1[m] =
+                d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
+            drhodrm2[m] =
+                d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
+          }
         }
 
         // Compute derivatives wrt sij, but only if necessary
@@ -416,6 +594,24 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
           drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
           drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
 
+          if (msmeamflag) {
+            drho1mds1 = a1 * rhoa1mj * arg1i1m;
+            drho1mds2 = a1 * rhoa1mi * arg1j1m;
+            drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * rhoa2mj;
+            drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * rhoa2mi;
+            drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
+            drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
+            drho3mds1 *= -1;
+            drho3mds2 *= -1;
+          } else {
+            drho1mds1 = 0.0;
+            drho1mds2 = 0.0;
+            drho2mds1 = 0.0;
+            drho2mds2 = 0.0;
+            drho3mds1 = 0.0;
+            drho3mds2 = 0.0;
+          }
+
           if (ialloy == 1) {
             a1i = fdiv_zero_kk(rhoa0j, d_tsq_ave(i, 0));
             a1j = fdiv_zero_kk(rhoa0i, d_tsq_ave(j, 0));
@@ -455,19 +651,33 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
             dt3ds2 = aj * (t3mi - t3j);
           }
 
-          drhods1 = d_dgamma1[i] * drho0ds1 +
-              d_dgamma2[i] *
-                  (dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
-                   dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
+          if (msmeamflag) {
+            drhods1 = d_dgamma1[i] * drho0ds1 +
+              d_dgamma2[i] * (dt1ds1 * d_rho1[i] + t1i * (drho1ds1 - drho1mds1) +
+                              dt2ds1 * d_rho2[i] + t2i * (drho2ds1 - drho2mds1) +
+                              dt3ds1 * d_rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
               d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
-          drhods2 = d_dgamma1[j] * drho0ds2 +
-              d_dgamma2[j] *
-                  (dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
-                   dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
+            drhods2 = d_dgamma1[j] * drho0ds2 +
+              d_dgamma2[j] * (dt1ds2 * d_rho1[j] + t1j * (drho1ds2 - drho1mds2) +
+                              dt2ds2 * d_rho2[j] + t2j * (drho2ds2 - drho2mds2) +
+                              dt3ds2 * d_rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
               d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
+          } else {
+            drhods1 = d_dgamma1[i] * drho0ds1 +
+                d_dgamma2[i] *
+                    (dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
+                     dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
+                d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
+            drhods2 = d_dgamma1[j] * drho0ds2 +
+                d_dgamma2[j] *
+                    (dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
+                     dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
+                d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
+          }
         }
 
         // Compute derivatives of energy wrt rij, sij and rij[3]
+
         dUdrij = phip * sij + d_frhop[i] * drhodr1 + d_frhop[j] * drhodr2;
         dUdsij = 0.0;
         if (!iszero_kk(d_dscrfcn[fnoffset + jn])) {
diff --git a/src/KOKKOS/meam_impl_kokkos.h b/src/KOKKOS/meam_impl_kokkos.h
index bdd8728356..9d630d724e 100644
--- a/src/KOKKOS/meam_impl_kokkos.h
+++ b/src/KOKKOS/meam_impl_kokkos.h
@@ -58,6 +58,14 @@ MEAMKokkos<DeviceType>::~MEAMKokkos()
   memoryKK->destroy_kokkos(k_scrfcn,scrfcn);
   memoryKK->destroy_kokkos(k_dscrfcn,dscrfcn);
   memoryKK->destroy_kokkos(k_fcpair,fcpair);
+
+  // msmeam
+
+  memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
+  memoryKK->destroy_kokkos(k_arho1m, arho1m);
+  memoryKK->destroy_kokkos(k_arho2m, arho2m);
+  memoryKK->destroy_kokkos(k_arho3m, arho3m);
+  memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
 }
 
 #include "meam_setup_done_kokkos.h"
diff --git a/src/KOKKOS/meam_kokkos.h b/src/KOKKOS/meam_kokkos.h
index cc75023810..2203355641 100644
--- a/src/KOKKOS/meam_kokkos.h
+++ b/src/KOKKOS/meam_kokkos.h
@@ -136,6 +136,13 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
   DAT::tdual_ffloat_1d k_scrfcn, k_dscrfcn, k_fcpair;
   typename ArrayTypes<DeviceType>::t_ffloat_1d d_scrfcn, d_dscrfcn, d_fcpair;
   HAT::t_ffloat_1d h_scrfcn, h_dscrfcn, h_fcpair;
+  // msmeam
+  DAT::tdual_ffloat_2d k_arho1m, k_arho2m, k_arho3m, k_arho3mb;
+  typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
+  HAT::t_ffloat_2d h_arho1m, h_arho2m, h_arho3m, h_arho3mb;
+  DAT::tdual_ffloat_1d k_arho2mb;
+  typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
+  HAT::t_ffloat_1d h_arho2mb;
 
  protected:
   int need_dup;
@@ -195,6 +202,31 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
       dup_vatom;
   NonDupScatterView<typename decltype(d_vatom)::data_type, typename decltype(d_vatom)::array_layout>
       ndup_vatom;
+
+  // msmeam
+
+  DupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
+      dup_arho1m;
+  NonDupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
+      ndup_arho1m;
+  DupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
+      dup_arho2m;
+  NonDupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
+      ndup_arho2m;
+  DupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
+      dup_arho3m;
+  NonDupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
+      ndup_arho3m;
+  DupScatterView<typename decltype(d_arho2mb)::data_type, typename decltype(d_arho2mb)::array_layout>
+      dup_arho2mb;
+  NonDupScatterView<typename decltype(d_arho2mb)::data_type,
+                    typename decltype(d_arho2mb)::array_layout>
+      ndup_arho2mb;
+  DupScatterView<typename decltype(d_arho3mb)::data_type, typename decltype(d_arho3mb)::array_layout>
+      dup_arho3mb;
+  NonDupScatterView<typename decltype(d_arho3mb)::data_type,
+                    typename decltype(d_arho3mb)::array_layout>
+      ndup_arho3mb;
 };
 
 KOKKOS_INLINE_FUNCTION
diff --git a/src/KOKKOS/pair_meam_kokkos.cpp b/src/KOKKOS/pair_meam_kokkos.cpp
index 90e714cefe..c2b03c2054 100644
--- a/src/KOKKOS/pair_meam_kokkos.cpp
+++ b/src/KOKKOS/pair_meam_kokkos.cpp
@@ -51,6 +51,7 @@ PairMEAMKokkos<DeviceType>::PairMEAMKokkos(LAMMPS *lmp) : PairMEAM(lmp)
   delete meam_inst;
   meam_inst_kk = new MEAMKokkos<DeviceType>(memory);
   meam_inst = meam_inst_kk;
+  myname = "meam/kk";
 }
 
 /* ---------------------------------------------------------------------- */
@@ -156,7 +157,8 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
 
   int need_dup = lmp->kokkos->need_dup<DeviceType>();
 
-  meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
+  meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,
+    d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
 
   meam_inst_kk->k_rho0.template modify<DeviceType>();
   meam_inst_kk->k_arho2b.template modify<DeviceType>();
@@ -166,6 +168,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
   meam_inst_kk->k_arho3b.template modify<DeviceType>();
   meam_inst_kk->k_t_ave.template modify<DeviceType>();
   meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.template modify<DeviceType>();
+    meam_inst_kk->k_arho1m.template modify<DeviceType>();
+    meam_inst_kk->k_arho2m.template modify<DeviceType>();
+    meam_inst_kk->k_arho3m.template modify<DeviceType>();
+    meam_inst_kk->k_arho3mb.template modify<DeviceType>();
+  }
 
   comm->reverse_comm(this);
 
@@ -177,6 +186,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
   meam_inst_kk->k_arho3b.template sync<DeviceType>();
   meam_inst_kk->k_t_ave.template sync<DeviceType>();
   meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.template sync<DeviceType>();
+    meam_inst_kk->k_arho1m.template sync<DeviceType>();
+    meam_inst_kk->k_arho2m.template sync<DeviceType>();
+    meam_inst_kk->k_arho3m.template sync<DeviceType>();
+    meam_inst_kk->k_arho3mb.template sync<DeviceType>();
+  }
 
   meam_inst_kk->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
                    d_eatom,ntype,type,d_map,d_scale,errorflag,ev);
@@ -200,6 +216,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
   meam_inst_kk->k_arho3b.template modify<DeviceType>();
   meam_inst_kk->k_t_ave.template modify<DeviceType>();
   meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.template modify<DeviceType>();
+    meam_inst_kk->k_arho1m.template modify<DeviceType>();
+    meam_inst_kk->k_arho2m.template modify<DeviceType>();
+    meam_inst_kk->k_arho3m.template modify<DeviceType>();
+    meam_inst_kk->k_arho3mb.template modify<DeviceType>();
+  }
 
   comm->forward_comm(this);
 
@@ -219,6 +242,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
   meam_inst_kk->k_arho3b.template sync<DeviceType>();
   meam_inst_kk->k_t_ave.template sync<DeviceType>();
   meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.template sync<DeviceType>();
+    meam_inst_kk->k_arho1m.template sync<DeviceType>();
+    meam_inst_kk->k_arho2m.template sync<DeviceType>();
+    meam_inst_kk->k_arho3m.template sync<DeviceType>();
+    meam_inst_kk->k_arho3mb.template sync<DeviceType>();
+  }
 
   meam_inst_kk->meam_force(inum_half,eflag_global,eflag_atom,vflag_global,
                            vflag_atom,d_eatom,ntype,type,d_map,x,
@@ -315,7 +345,7 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm_kokkos(int n, DAT::tdual_int_2
   iswap = iswap_in;
   v_buf = buf.view<DeviceType>();
   Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackForwardComm>(0,n),*this);
-  return n*38;
+  return n*comm_forward;
 }
 
 /* ---------------------------------------------------------------------- */
@@ -324,7 +354,7 @@ template<class DeviceType>
 KOKKOS_INLINE_FUNCTION
 void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const int &i) const {
   int j = d_sendlist(iswap, i);
-  int m = i*38;
+  int m = i*comm_forward;
   v_buf[m++] = d_rho0[j];
   v_buf[m++] = d_rho1[j];
   v_buf[m++] = d_rho2[j];
@@ -354,6 +384,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const in
   v_buf[m++] = d_tsq_ave(j,0);
   v_buf[m++] = d_tsq_ave(j,1);
   v_buf[m++] = d_tsq_ave(j,2);
+  if (msmeamflag) {
+    v_buf[m++] = d_arho2mb[j];
+    v_buf[m++] = d_arho1m(j,0);
+    v_buf[m++] = d_arho1m(j,1);
+    v_buf[m++] = d_arho1m(j,2);
+    v_buf[m++] = d_arho2m(j,0);
+    v_buf[m++] = d_arho2m(j,1);
+    v_buf[m++] = d_arho2m(j,2);
+    v_buf[m++] = d_arho2m(j,3);
+    v_buf[m++] = d_arho2m(j,4);
+    v_buf[m++] = d_arho2m(j,5);
+    for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(j,k);
+    v_buf[m++] = d_arho3mb(j,0);
+    v_buf[m++] = d_arho3mb(j,1);
+    v_buf[m++] = d_arho3mb(j,2);
+  }
 }
 
 /* ---------------------------------------------------------------------- */
@@ -371,7 +417,8 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm_kokkos(int n, int first_in,
 template<class DeviceType>
 KOKKOS_INLINE_FUNCTION
 void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const int &i) const{
-   int m = i*38;
+  //int m = i*38;
+  int m = i*comm_forward;
 
     d_rho0[i+first] = v_buf[m++];
     d_rho1[i+first] = v_buf[m++];
@@ -402,6 +449,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const
     d_tsq_ave(i+first,0) = v_buf[m++];
     d_tsq_ave(i+first,1) = v_buf[m++];
     d_tsq_ave(i+first,2) = v_buf[m++];
+    if (msmeamflag) {
+      d_arho2mb[i+first] = v_buf[m++];
+      d_arho1m(i+first,0) = v_buf[m++];
+      d_arho1m(i+first,1) = v_buf[m++];
+      d_arho1m(i+first,2) = v_buf[m++];
+      d_arho2m(i+first,0) = v_buf[m++];
+      d_arho2m(i+first,1) = v_buf[m++];
+      d_arho2m(i+first,2) = v_buf[m++];
+      d_arho2m(i+first,3) = v_buf[m++];
+      d_arho2m(i+first,4) = v_buf[m++];
+      d_arho2m(i+first,5) = v_buf[m++];
+      for (int k = 0; k < 10; k++) d_arho3m(i+first,k) = v_buf[m++];
+      d_arho3mb(i+first,0) = v_buf[m++];
+      d_arho3mb(i+first,1) = v_buf[m++];
+      d_arho3mb(i+first,2) = v_buf[m++];
+    }
  }
 
 /* ---------------------------------------------------------------------- */
@@ -426,6 +489,13 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
   meam_inst_kk->k_arho3b.sync_host();
   meam_inst_kk->k_t_ave.sync_host();
   meam_inst_kk->k_tsq_ave.sync_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.sync_host();
+    meam_inst_kk->k_arho1m.sync_host();
+    meam_inst_kk->k_arho2m.sync_host();
+    meam_inst_kk->k_arho3m.sync_host();
+    meam_inst_kk->k_arho3mb.sync_host();
+  }
 
   int m = 0;
   for (int i = 0; i < n; i++) {
@@ -459,6 +529,22 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
     buf[m++] = meam_inst_kk->h_tsq_ave(j,0);
     buf[m++] = meam_inst_kk->h_tsq_ave(j,1);
     buf[m++] = meam_inst_kk->h_tsq_ave(j,2);
+    if (msmeamflag) {
+      buf[m++] = meam_inst_kk->h_arho2mb[j];
+      buf[m++] = meam_inst_kk->h_arho1m(j,0);
+      buf[m++] = meam_inst_kk->h_arho1m(j,1);
+      buf[m++] = meam_inst_kk->h_arho1m(j,2);
+      buf[m++] = meam_inst_kk->h_arho2m(j,0);
+      buf[m++] = meam_inst_kk->h_arho2m(j,1);
+      buf[m++] = meam_inst_kk->h_arho2m(j,2);
+      buf[m++] = meam_inst_kk->h_arho2m(j,3);
+      buf[m++] = meam_inst_kk->h_arho2m(j,4);
+      buf[m++] = meam_inst_kk->h_arho2m(j,5);
+      for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(j,k);
+      buf[m++] = meam_inst_kk->h_arho3mb(j,0);
+      buf[m++] = meam_inst_kk->h_arho3mb(j,1);
+      buf[m++] = meam_inst_kk->h_arho3mb(j,2);
+    }
   }
 
   return m;
@@ -485,6 +571,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
   meam_inst_kk->k_arho3b.sync_host();
   meam_inst_kk->k_t_ave.sync_host();
   meam_inst_kk->k_tsq_ave.sync_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.sync_host();
+    meam_inst_kk->k_arho1m.sync_host();
+    meam_inst_kk->k_arho2m.sync_host();
+    meam_inst_kk->k_arho3m.sync_host();
+    meam_inst_kk->k_arho3mb.sync_host();
+  }
 
   int m = 0;
   const int last = first + n;
@@ -518,6 +611,22 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
     meam_inst_kk->h_tsq_ave(i,0) = buf[m++];
     meam_inst_kk->h_tsq_ave(i,1) = buf[m++];
     meam_inst_kk->h_tsq_ave(i,2) = buf[m++];
+    if (msmeamflag) {
+      meam_inst_kk->h_arho2mb[i] = buf[m++];
+      meam_inst_kk->h_arho1m(i,0) = buf[m++];
+      meam_inst_kk->h_arho1m(i,1) = buf[m++];
+      meam_inst_kk->h_arho1m(i,2) = buf[m++];
+      meam_inst_kk->h_arho2m(i,0) = buf[m++];
+      meam_inst_kk->h_arho2m(i,1) = buf[m++];
+      meam_inst_kk->h_arho2m(i,2) = buf[m++];
+      meam_inst_kk->h_arho2m(i,3) = buf[m++];
+      meam_inst_kk->h_arho2m(i,4) = buf[m++];
+      meam_inst_kk->h_arho2m(i,5) = buf[m++];
+      for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(i,k) = buf[m++];
+      meam_inst_kk->h_arho3mb(i,0) = buf[m++];
+      meam_inst_kk->h_arho3mb(i,1) = buf[m++];
+      meam_inst_kk->h_arho3mb(i,2) = buf[m++];
+    }
   }
 
   meam_inst_kk->k_rho0.modify_host();
@@ -536,6 +645,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
   meam_inst_kk->k_arho3b.modify_host();
   meam_inst_kk->k_t_ave.modify_host();
   meam_inst_kk->k_tsq_ave.modify_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.modify_host();
+    meam_inst_kk->k_arho1m.modify_host();
+    meam_inst_kk->k_arho2m.modify_host();
+    meam_inst_kk->k_arho3m.modify_host();
+    meam_inst_kk->k_arho3mb.modify_host();
+  }
 }
 
 /* ---------------------------------------------------------------------- */
@@ -546,7 +662,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
   first = first_in;
   v_buf = buf.view<DeviceType>();
   Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackReverseComm>(0,n),*this);
-  return n*30;
+  //return n*30;
+  return n*comm_reverse;
 }
 
 /* ---------------------------------------------------------------------- */
@@ -554,7 +671,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
 template<class DeviceType>
 KOKKOS_INLINE_FUNCTION
 void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const int &i) const {
-  int m = i*30;
+  //int m = i*30;
+  int m = i*comm_reverse;
 
   v_buf[m++] = d_rho0[i+first];
   v_buf[m++] = d_arho2b[i+first];
@@ -577,6 +695,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const in
   v_buf[m++] = d_tsq_ave(i+first,0);
   v_buf[m++] = d_tsq_ave(i+first,1);
   v_buf[m++] = d_tsq_ave(i+first,2);
+  if (msmeamflag) {
+    v_buf[m++] = d_arho2mb[i+first];
+    v_buf[m++] = d_arho1m(i+first,0);
+    v_buf[m++] = d_arho1m(i+first,1);
+    v_buf[m++] = d_arho1m(i+first,2);
+    v_buf[m++] = d_arho2m(i+first,0);
+    v_buf[m++] = d_arho2m(i+first,1);
+    v_buf[m++] = d_arho2m(i+first,2);
+    v_buf[m++] = d_arho2m(i+first,3);
+    v_buf[m++] = d_arho2m(i+first,4);
+    v_buf[m++] = d_arho2m(i+first,5);
+    for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(i+first,k);
+    v_buf[m++] = d_arho3mb(i+first,0);
+    v_buf[m++] = d_arho3mb(i+first,1);
+    v_buf[m++] = d_arho3mb(i+first,2);
+  }
 }
 
 /* ---------------------------------------------------------------------- */
@@ -592,6 +726,13 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
   meam_inst_kk->k_arho3b.sync_host();
   meam_inst_kk->k_t_ave.sync_host();
   meam_inst_kk->k_tsq_ave.sync_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.sync_host();
+    meam_inst_kk->k_arho1m.sync_host();
+    meam_inst_kk->k_arho2m.sync_host();
+    meam_inst_kk->k_arho3m.sync_host();
+    meam_inst_kk->k_arho3mb.sync_host();
+  }
 
   int m = 0;
   const int last = first + n;
@@ -617,6 +758,22 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
     buf[m++] = meam_inst_kk->h_tsq_ave(i,0);
     buf[m++] = meam_inst_kk->h_tsq_ave(i,1);
     buf[m++] = meam_inst_kk->h_tsq_ave(i,2);
+    if (msmeamflag) {
+      buf[m++] = meam_inst_kk->h_arho2mb[i];
+      buf[m++] = meam_inst_kk->h_arho1m(i,0);
+      buf[m++] = meam_inst_kk->h_arho1m(i,1);
+      buf[m++] = meam_inst_kk->h_arho1m(i,2);
+      buf[m++] = meam_inst_kk->h_arho2m(i,0);
+      buf[m++] = meam_inst_kk->h_arho2m(i,1);
+      buf[m++] = meam_inst_kk->h_arho2m(i,2);
+      buf[m++] = meam_inst_kk->h_arho2m(i,3);
+      buf[m++] = meam_inst_kk->h_arho2m(i,4);
+      buf[m++] = meam_inst_kk->h_arho2m(i,5);
+      for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(i,k);
+      buf[m++] = meam_inst_kk->h_arho3mb(i,0);
+      buf[m++] = meam_inst_kk->h_arho3mb(i,1);
+      buf[m++] = meam_inst_kk->h_arho3mb(i,2);
+    }
   }
 
   return m;
@@ -639,7 +796,8 @@ template<class DeviceType>
 KOKKOS_INLINE_FUNCTION
 void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const int &i) const {
   int j = d_sendlist(iswap, i);
-  int m = i*30;
+  //int m = i*30;
+  int m = i*comm_reverse;
 
   d_rho0[j] += v_buf[m++];
   d_arho2b[j] += v_buf[m++];
@@ -662,6 +820,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const
   d_tsq_ave(j,0) += v_buf[m++];
   d_tsq_ave(j,1) += v_buf[m++];
   d_tsq_ave(j,2) += v_buf[m++];
+  if (msmeamflag) {
+    d_arho2mb[j] += v_buf[m++];
+    d_arho1m(j,0) += v_buf[m++];
+    d_arho1m(j,1) += v_buf[m++];
+    d_arho1m(j,2) += v_buf[m++];
+    d_arho2m(j,0) += v_buf[m++];
+    d_arho2m(j,1) += v_buf[m++];
+    d_arho2m(j,2) += v_buf[m++];
+    d_arho2m(j,3) += v_buf[m++];
+    d_arho2m(j,4) += v_buf[m++];
+    d_arho2m(j,5) += v_buf[m++];
+    for (int k = 0; k < 10; k++) d_arho3m(j,k) += v_buf[m++];
+    d_arho3mb(j,0) += v_buf[m++];
+    d_arho3mb(j,1) += v_buf[m++];
+    d_arho3mb(j,2) += v_buf[m++];
+  }
 }
 
 /* ---------------------------------------------------------------------- */
@@ -677,6 +851,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
   meam_inst_kk->k_arho3b.sync_host();
   meam_inst_kk->k_t_ave.sync_host();
   meam_inst_kk->k_tsq_ave.sync_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.sync_host();
+    meam_inst_kk->k_arho1m.sync_host();
+    meam_inst_kk->k_arho2m.sync_host();
+    meam_inst_kk->k_arho3m.sync_host();
+    meam_inst_kk->k_arho3mb.sync_host();
+  }
 
   int m = 0;
   for (int i = 0; i < n; i++) {
@@ -702,6 +883,22 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
     meam_inst_kk->h_tsq_ave(j,0) += buf[m++];
     meam_inst_kk->h_tsq_ave(j,1) += buf[m++];
     meam_inst_kk->h_tsq_ave(j,2) += buf[m++];
+    if (msmeamflag) {
+      meam_inst_kk->h_arho2mb[j] += buf[m++];
+      meam_inst_kk->h_arho1m(j,0) += buf[m++];
+      meam_inst_kk->h_arho1m(j,1) += buf[m++];
+      meam_inst_kk->h_arho1m(j,2) += buf[m++];
+      meam_inst_kk->h_arho2m(j,0) += buf[m++];
+      meam_inst_kk->h_arho2m(j,1) += buf[m++];
+      meam_inst_kk->h_arho2m(j,2) += buf[m++];
+      meam_inst_kk->h_arho2m(j,3) += buf[m++];
+      meam_inst_kk->h_arho2m(j,4) += buf[m++];
+      meam_inst_kk->h_arho2m(j,5) += buf[m++];
+      for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(j,k) += buf[m++];
+      meam_inst_kk->h_arho3mb(j,0) += buf[m++];
+      meam_inst_kk->h_arho3mb(j,1) += buf[m++];
+      meam_inst_kk->h_arho3mb(j,2) += buf[m++];
+    }
   }
 
   meam_inst_kk->k_rho0.modify_host();
@@ -712,6 +909,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
   meam_inst_kk->k_arho3b.modify_host();
   meam_inst_kk->k_t_ave.modify_host();
   meam_inst_kk->k_tsq_ave.modify_host();
+  if (msmeamflag) {
+    meam_inst_kk->k_arho2mb.modify_host();
+    meam_inst_kk->k_arho1m.modify_host();
+    meam_inst_kk->k_arho2m.modify_host();
+    meam_inst_kk->k_arho3m.modify_host();
+    meam_inst_kk->k_arho3mb.modify_host();
+  }
 }
 
 /* ----------------------------------------------------------------------
@@ -764,6 +968,12 @@ void PairMEAMKokkos<DeviceType>::update_meam_views()
   d_arho3b = meam_inst_kk->d_arho3b;
   d_t_ave = meam_inst_kk->d_t_ave;
   d_tsq_ave = meam_inst_kk->d_tsq_ave;
+  // msmeam
+  d_arho1m = meam_inst_kk->d_arho1m;
+  d_arho2m = meam_inst_kk->d_arho2m;
+  d_arho3m = meam_inst_kk->d_arho3m;
+  d_arho2mb = meam_inst_kk->d_arho2mb;
+  d_arho3mb = meam_inst_kk->d_arho3mb;
 }
 
 /* ---------------------------------------------------------------------- */
diff --git a/src/KOKKOS/pair_meam_kokkos.h b/src/KOKKOS/pair_meam_kokkos.h
index c5fe82fa79..0d0d7667f3 100644
--- a/src/KOKKOS/pair_meam_kokkos.h
+++ b/src/KOKKOS/pair_meam_kokkos.h
@@ -13,12 +13,12 @@
 
 #ifdef PAIR_CLASS
 // clang-format off
-PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>)
-PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>)
-PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>)
-PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>)
-PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>)
-PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>)
+PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>);
+PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>);
+PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>);
+PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>);
+PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>);
+PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>);
 // clang-format on
 #else
 
@@ -117,6 +117,9 @@ class PairMEAMKokkos : public PairMEAM, public KokkosBase {
   typename ArrayTypes<DeviceType>::t_ffloat_1d d_rho, d_rho0, d_rho1, d_rho2, d_rho3, d_frhop;
   typename ArrayTypes<DeviceType>::t_ffloat_1d d_gamma, d_dgamma1, d_dgamma2, d_dgamma3, d_arho2b;
   typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1, d_arho2, d_arho3, d_arho3b, d_t_ave, d_tsq_ave;
+  // msmeam params
+  typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
+  typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
 
   void update_meam_views();
 
diff --git a/src/KOKKOS/pair_meam_ms_kokkos.cpp b/src/KOKKOS/pair_meam_ms_kokkos.cpp
new file mode 100644
index 0000000000..491fc0273c
--- /dev/null
+++ b/src/KOKKOS/pair_meam_ms_kokkos.cpp
@@ -0,0 +1,32 @@
+/* ----------------------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/ Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+#include "pair_meam_ms_kokkos.h"
+#include "meam.h"
+
+using namespace LAMMPS_NS;
+
+/* ---------------------------------------------------------------------- */
+template<class DeviceType>
+PairMEAMMSKokkos<DeviceType>::PairMEAMMSKokkos(LAMMPS *lmp) : PairMEAMKokkos<DeviceType>(lmp)
+{
+  this->meam_inst->msmeamflag = this->msmeamflag = 1;
+  this->myname = "meam/ms/kk";
+}
+
+namespace LAMMPS_NS {
+template class PairMEAMMSKokkos<LMPDeviceType>;
+#ifdef KOKKOS_ENABLE_CUDA
+template class PairMEAMMSKokkos<LMPHostType>;
+#endif
+}
diff --git a/src/KOKKOS/pair_meam_ms_kokkos.h b/src/KOKKOS/pair_meam_ms_kokkos.h
new file mode 100644
index 0000000000..a2cefc2c16
--- /dev/null
+++ b/src/KOKKOS/pair_meam_ms_kokkos.h
@@ -0,0 +1,36 @@
+/* -*- c++ -*- ----------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/ Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+#ifdef PAIR_CLASS
+// clang-format off
+PairStyle(meam/ms/kk,PairMEAMMSKokkos<LMPDeviceType>);
+PairStyle(meam/ms/kk/device,PairMEAMMSKokkos<LMPDeviceType>);
+PairStyle(meam/ms/kk/host,PairMEAMMSKokkos<LMPHostType>);
+// clang-format on
+#else
+
+#ifndef LMP_PAIR_MEAM_MS_KOKKOS_H
+#define LMP_PAIR_MEAM_MS_KOKKOS_H
+
+#include "pair_meam_kokkos.h"
+
+namespace LAMMPS_NS {
+
+template <class DeviceType>
+class PairMEAMMSKokkos : public PairMEAMKokkos<DeviceType> {
+ public:
+  PairMEAMMSKokkos(class LAMMPS *);
+};
+}    // namespace LAMMPS_NS
+#endif
+#endif
diff --git a/src/MEAM/meam.h b/src/MEAM/meam.h
index 9ec7de3426..5a131bdc34 100644
--- a/src/MEAM/meam.h
+++ b/src/MEAM/meam.h
@@ -17,7 +17,7 @@
 #include <cmath>
 #include <string>
 
-#define maxelt 5
+constexpr int maxelt = 5;
 
 namespace LAMMPS_NS {
 class Memory;
@@ -30,6 +30,7 @@ class MEAM {
   virtual ~MEAM();
 
   int copymode;
+  int msmeamflag;
 
  protected:
   Memory *memory;
@@ -74,6 +75,12 @@ class MEAM {
   // vind[23]D = Voight notation index maps for 2 and 3D
   // v2D,v3D = array of factors to apply for Voight notation
 
+  // MS-MEAM parameters
+
+  // msmeamflag = flag to activate MS-MEAM
+  // betam[1-3]_meam = MS-MEAM electron density constants
+  // tm[1-3]_meam = MS-MEAM coefficients on densities in Gamma computation
+
   // nr,dr = pair function discretization parameters
   // nrar,rdrar = spline coeff array parameters
 
@@ -115,12 +122,22 @@ class MEAM {
   int nr, nrar;
   double dr, rdrar;
 
+  // MS-MEAM parameters
+
+  double t1m_meam[maxelt], t2m_meam[maxelt], t3m_meam[maxelt];
+  double beta1m_meam[maxelt], beta2m_meam[maxelt], beta3m_meam[maxelt];
+  //int msmeamflag; // made public for pair style settings
+
  public:
   int nmax;
   double *rho, *rho0, *rho1, *rho2, *rho3, *frhop;
   double *gamma, *dgamma1, *dgamma2, *dgamma3, *arho2b;
   double **arho1, **arho2, **arho3, **arho3b, **t_ave, **tsq_ave;
 
+  // MS-MEAM arrays
+
+  double **arho1m, **arho2m, *arho2mb, **arho3m, **arho3mb;
+
   int maxneigh;
   double *scrfcn, *dscrfcn, *fcpair;
 
@@ -242,7 +259,7 @@ class MEAM {
                   double, double, double, double, double, int, int, lattice_t);
   void get_sijk(double, int, int, int, double *);
   void get_densref(double, int, int, double *, double *, double *, double *, double *, double *,
-                   double *, double *);
+                   double *, double *, double *, double *, double *, double *, double *, double *); // last 6 args for msmeam
   void interpolate_meam(int);
 
  public:
@@ -282,10 +299,12 @@ class MEAM {
   }
   // clang-format on
   static int get_Zij(const lattice_t latt);
+  // last 6 args are optional msmeam parameters
   void meam_setup_global(int nelt, lattice_t *lat, int *ielement, double *atwt, double *alpha,
                          double *b0, double *b1, double *b2, double *b3, double *alat, double *esub,
                          double *asub, double *t0, double *t1, double *t2, double *t3,
-                         double *rozero, int *ibar);
+                         double *rozero, int *ibar, double *b1m, double *b2m, double *b3m,
+                         double *t1m, double *t2m, double *t3m);
   void meam_setup_param(int which, double value, int nindex, int *index /*index(3)*/,
                         int *errorflag);
   virtual void meam_setup_done(double *cutmax);
diff --git a/src/MEAM/meam_dens_final.cpp b/src/MEAM/meam_dens_final.cpp
index cf964a4724..ab0ac8c53f 100644
--- a/src/MEAM/meam_dens_final.cpp
+++ b/src/MEAM/meam_dens_final.cpp
@@ -27,115 +27,222 @@ MEAM::meam_dens_final(int nlocal, int eflag_either, int eflag_global, int eflag_
 
   //     Complete the calculation of density
 
-  for (i = 0; i < nlocal; i++) {
-    elti = fmap[type[i]];
-    if (elti >= 0) {
-      scaleii = scale[type[i]][type[i]];
-      rho1[i] = 0.0;
-      rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
-      rho3[i] = 0.0;
-      for (m = 0; m < 3; m++) {
-        rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
-        rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
-      }
-      for (m = 0; m < 6; m++) {
-        rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
-      }
-      for (m = 0; m < 10; m++) {
-        rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
-      }
-
-      if (rho0[i] > 0.0) {
-        if (this->ialloy == 1) {
-          t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
-          t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
-          t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
-        } else if (this->ialloy == 2) {
-          t_ave[i][0] = this->t1_meam[elti];
-          t_ave[i][1] = this->t2_meam[elti];
-          t_ave[i][2] = this->t3_meam[elti];
-        } else {
-          t_ave[i][0] = t_ave[i][0] / rho0[i];
-          t_ave[i][1] = t_ave[i][1] / rho0[i];
-          t_ave[i][2] = t_ave[i][2] / rho0[i];
+  if (this->msmeamflag) {
+    for (i = 0; i < nlocal; i++) {
+      elti = fmap[type[i]];
+      if (elti >= 0) {
+        scaleii = scale[type[i]][type[i]];
+        rho1[i] = 0.0;
+        rho2[i] = -1.0 / 3.0 * (arho2b[i] * arho2b[i]
+                              - arho2mb[i] * arho2mb[i]);
+        rho3[i] = 0.0;
+        for (m = 0; m < 3; m++) {
+          rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m]
+                            - arho1m[i][m] * arho1m[i][m];
+          rho3[i] = rho3[i] - 3.0 / 5.0 * (arho3b[i][m] * arho3b[i][m]
+                                         - arho3mb[i][m] * arho3mb[i][m]);
         }
-      }
-
-      gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
-
-      if (rho0[i] > 0.0) {
-        gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
-      }
-
-      Z = get_Zij(this->lattce_meam[elti][elti]);
-
-      G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
-      if (errorflag != 0)
-        return;
-
-      get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
-
-      if (this->ibar_meam[elti] <= 0) {
-        Gbar = 1.0;
-        dGbar = 0.0;
-      } else {
-        if (this->mix_ref_t == 1) {
-          gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
-        } else {
-          gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
-                (Z * Z);
+        for (m = 0; m < 6; m++) {
+          rho2[i] = rho2[i] + this->v2D[m] * (arho2[i][m] * arho2[i][m]
+                                            - arho2m[i][m] * arho2m[i][m]);
         }
-        Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
-      }
-      rho[i] = rho0[i] * G;
 
-      if (this->mix_ref_t == 1) {
+        for (m = 0; m < 10; m++) {
+          rho3[i] = rho3[i] + this->v3D[m] * (arho3[i][m] * arho3[i][m]
+                                            - arho3m[i][m] * arho3m[i][m]);
+        }
+
+        // all the t weights are already accounted for with msmeam
+        gamma[i] = rho1[i] + rho2[i] + rho3[i];
+
+        if (rho0[i] > 0.0) {
+          gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
+        }
+
+        Z = get_Zij(this->lattce_meam[elti][elti]);
+
+        G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
+        if (errorflag != 0)
+          return;
+
+        get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
+
         if (this->ibar_meam[elti] <= 0) {
           Gbar = 1.0;
           dGbar = 0.0;
         } else {
-          gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
-          Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
+          if (this->mix_ref_t == 1) {
+            gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
+          } else {
+            gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
+                  (Z * Z);
+          }
+          Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
         }
-        rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
-      } else {
-        if (this->bkgd_dyn == 1) {
-          rho_bkgd = this->rho0_meam[elti] * Z;
+        rho[i] = rho0[i] * G;
+
+        if (this->mix_ref_t == 1) {
+          if (this->ibar_meam[elti] <= 0) {
+            Gbar = 1.0;
+            dGbar = 0.0;
+          } else {
+            gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
+            Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
+          }
+          rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
         } else {
-          rho_bkgd = this->rho_ref_meam[elti];
+          if (this->bkgd_dyn == 1) {
+            rho_bkgd = this->rho0_meam[elti] * Z;
+          } else {
+            rho_bkgd = this->rho_ref_meam[elti];
+          }
+        }
+        rhob = rho[i] / rho_bkgd;
+        denom = 1.0 / rho_bkgd;
+
+        G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
+
+        dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
+
+        if (!iszero(rho0[i])) {
+          dgamma2[i] = (dG / rho0[i]) * denom;
+        } else {
+          dgamma2[i] = 0.0;
+        }
+
+        //     dgamma3 is nonzero only if we are using the "mixed" rule for
+        //     computing t in the reference system (which is not correct, but
+        //     included for backward compatibility
+        if (this->mix_ref_t == 1) {
+          dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
+        } else {
+          dgamma3[i] = 0.0;
+        }
+
+        Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
+        if (eflag_either != 0) {
+          Fl *= scaleii;
+          if (eflag_global != 0) {
+            *eng_vdwl = *eng_vdwl + Fl;
+          }
+          if (eflag_atom != 0) {
+            eatom[i] = eatom[i] + Fl;
+          }
         }
       }
-      rhob = rho[i] / rho_bkgd;
-      denom = 1.0 / rho_bkgd;
-
-      G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
-
-      dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
-
-      if (!iszero(rho0[i])) {
-        dgamma2[i] = (dG / rho0[i]) * denom;
-      } else {
-        dgamma2[i] = 0.0;
-      }
-
-      //     dgamma3 is nonzero only if we are using the "mixed" rule for
-      //     computing t in the reference system (which is not correct, but
-      //     included for backward compatibility
-      if (this->mix_ref_t == 1) {
-        dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
-      } else {
-        dgamma3[i] = 0.0;
-      }
-
-      Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
-
-      if (eflag_either != 0) {
-        Fl *= scaleii;
-        if (eflag_global != 0) {
-          *eng_vdwl = *eng_vdwl + Fl;
+    }
+  } else {
+    for (i = 0; i < nlocal; i++) {
+      elti = fmap[type[i]];
+      if (elti >= 0) {
+        scaleii = scale[type[i]][type[i]];
+        rho1[i] = 0.0;
+        rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
+        rho3[i] = 0.0;
+        for (m = 0; m < 3; m++) {
+          rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
+          rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
         }
-        if (eflag_atom != 0) {
-          eatom[i] = eatom[i] + Fl;
+        for (m = 0; m < 6; m++) {
+          rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
+        }
+        for (m = 0; m < 10; m++) {
+          rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
+        }
+
+        if (rho0[i] > 0.0) {
+          if (this->ialloy == 1) {
+            t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
+            t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
+            t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
+          } else if (this->ialloy == 2) {
+            t_ave[i][0] = this->t1_meam[elti];
+            t_ave[i][1] = this->t2_meam[elti];
+            t_ave[i][2] = this->t3_meam[elti];
+          } else {
+            t_ave[i][0] = t_ave[i][0] / rho0[i];
+            t_ave[i][1] = t_ave[i][1] / rho0[i];
+            t_ave[i][2] = t_ave[i][2] / rho0[i];
+          }
+        }
+
+        gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
+
+        if (rho0[i] > 0.0) {
+          gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
+        }
+
+        Z = get_Zij(this->lattce_meam[elti][elti]);
+
+        G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
+        if (errorflag != 0)
+          return;
+
+        get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
+
+        if (this->ibar_meam[elti] <= 0) {
+          Gbar = 1.0;
+          dGbar = 0.0;
+        } else {
+          if (this->mix_ref_t == 1) {
+            gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
+          } else {
+            gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
+                  (Z * Z);
+          }
+          Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
+        }
+        rho[i] = rho0[i] * G;
+
+        if (this->mix_ref_t == 1) {
+          if (this->ibar_meam[elti] <= 0) {
+            Gbar = 1.0;
+            dGbar = 0.0;
+          } else {
+            gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
+            Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
+          }
+          rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
+        } else {
+          if (this->bkgd_dyn == 1) {
+            rho_bkgd = this->rho0_meam[elti] * Z;
+          } else {
+            rho_bkgd = this->rho_ref_meam[elti];
+          }
+        }
+        rhob = rho[i] / rho_bkgd;
+        denom = 1.0 / rho_bkgd;
+
+        G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
+
+        dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
+
+        if (!iszero(rho0[i])) {
+          dgamma2[i] = (dG / rho0[i]) * denom;
+        } else {
+          dgamma2[i] = 0.0;
+        }
+
+        //     dgamma3 is nonzero only if we are using the "mixed" rule for
+        //     computing t in the reference system (which is not correct, but
+        //     included for backward compatibility
+        if (this->mix_ref_t == 1) {
+          dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
+        } else {
+          dgamma3[i] = 0.0;
+        }
+
+        Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
+
+        if (eflag_either != 0) {
+          Fl *= scaleii;
+          if (eflag_global != 0) {
+            *eng_vdwl = *eng_vdwl + Fl;
+          }
+          if (eflag_atom != 0) {
+            eatom[i] = eatom[i] + Fl;
+
+          }
         }
       }
     }
diff --git a/src/MEAM/meam_dens_init.cpp b/src/MEAM/meam_dens_init.cpp
index b60e1a7a17..00ad276ad7 100644
--- a/src/MEAM/meam_dens_init.cpp
+++ b/src/MEAM/meam_dens_init.cpp
@@ -45,6 +45,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
     memory->destroy(arho3b);
     memory->destroy(t_ave);
     memory->destroy(tsq_ave);
+    // msmeam params
+    if (this->msmeamflag) {
+      memory->destroy(arho1m);
+      memory->destroy(arho2m);
+      memory->destroy(arho3m);
+      memory->destroy(arho2mb);
+      memory->destroy(arho3mb);
+    }
 
     nmax = atom_nmax;
 
@@ -65,6 +73,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
     memory->create(arho3b, nmax, 3, "pair:arho3b");
     memory->create(t_ave, nmax, 3, "pair:t_ave");
     memory->create(tsq_ave, nmax, 3, "pair:tsq_ave");
+    // msmeam params
+    if (this->msmeamflag) {
+      memory->create(arho1m, nmax, 3, "pair:arho1m");
+      memory->create(arho2m, nmax, 6, "pair:arho2m");
+      memory->create(arho3m, nmax, 10, "pair:arho3m");
+      memory->create(arho2mb, nmax, "pair:arho2mb");
+      memory->create(arho3mb, nmax, 3, "pair:arho3mb");
+    }
   }
 
   if (n_neigh > maxneigh) {
@@ -83,14 +99,30 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
     rho0[i] = 0.0;
     arho2b[i] = 0.0;
     arho1[i][0] = arho1[i][1] = arho1[i][2] = 0.0;
-    for (j = 0; j < 6; j++)
+    if (this->msmeamflag) {
+      arho2mb[i] = 0.0;
+      arho1m[i][0] = arho1m[i][1] = arho1m[i][2] = 0.0;
+    }
+    for (j = 0; j < 6; j++) {
       arho2[i][j] = 0.0;
-    for (j = 0; j < 10; j++)
+      if (this->msmeamflag) {
+        arho2m[i][j] = 0.0;
+      }
+    }
+    for (j = 0; j < 10; j++) {
       arho3[i][j] = 0.0;
+      if (this->msmeamflag) {
+        arho3m[i][j] = 0.0;
+      }
+    }
     arho3b[i][0] = arho3b[i][1] = arho3b[i][2] = 0.0;
+    if (this->msmeamflag) {
+      arho3mb[i][0] = arho3mb[i][1] = arho3mb[i][2] = 0.0;
+    }
     t_ave[i][0] = t_ave[i][1] = t_ave[i][2] = 0.0;
     tsq_ave[i][0] = tsq_ave[i][1] = tsq_ave[i][2] = 0.0;
   }
+
 }
 
 void
@@ -282,6 +314,9 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
   // double G,Gbar,gam,shp[3+1];
   double ro0i, ro0j;
   double rhoa0i, rhoa1i, rhoa2i, rhoa3i, A1i, A2i, A3i;
+  // msmeam params
+  double rhoa1mj, rhoa2mj, rhoa3mj, A1mj, A2mj, A3mj;
+  double rhoa1mi, rhoa2mi, rhoa3mi, A1mi, A2mi, A3mi;
 
   elti = fmap[type[i]];
   xtmp = x[i][0];
@@ -306,10 +341,20 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
         rhoa1j = ro0j * MathSpecial::fm_exp(-this->beta1_meam[eltj] * aj) * sij;
         rhoa2j = ro0j * MathSpecial::fm_exp(-this->beta2_meam[eltj] * aj) * sij;
         rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj) * sij;
+        if (this->msmeamflag){
+          rhoa1mj = ro0j * this->t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj) * sij;
+          rhoa2mj = ro0j * this->t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj) * sij;
+          rhoa3mj = ro0j * this->t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj) * sij;
+        }
         rhoa0i = ro0i * MathSpecial::fm_exp(-this->beta0_meam[elti] * ai) * sij;
         rhoa1i = ro0i * MathSpecial::fm_exp(-this->beta1_meam[elti] * ai) * sij;
         rhoa2i = ro0i * MathSpecial::fm_exp(-this->beta2_meam[elti] * ai) * sij;
         rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai) * sij;
+        if (this->msmeamflag){
+          rhoa1mi = ro0i * this->t1m_meam[elti] * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * sij;
+          rhoa2mi = ro0i * this->t2m_meam[elti] * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * sij;
+          rhoa3mi = ro0i * this->t3m_meam[elti] * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * sij;
+        }
         if (this->ialloy == 1) {
           rhoa1j = rhoa1j * this->t1_meam[eltj];
           rhoa2j = rhoa2j * this->t2_meam[eltj];
@@ -321,6 +366,7 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
         rho0[i] = rho0[i] + rhoa0j;
         rho0[j] = rho0[j] + rhoa0i;
         // For ialloy = 2, use single-element value (not average)
+        // For ialloy = 2, use single-element value (not average)
         if (this->ialloy != 2) {
           t_ave[i][0] = t_ave[i][0] + this->t1_meam[eltj] * rhoa0j;
           t_ave[i][1] = t_ave[i][1] + this->t2_meam[eltj] * rhoa0j;
@@ -348,18 +394,42 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
         A3i = rhoa3i / (rij2 * rij);
         nv2 = 0;
         nv3 = 0;
+        if (this->msmeamflag) {
+          arho2mb[i] = arho2mb[i] + rhoa2mj;
+          arho2mb[j] = arho2mb[j] + rhoa2mi;
+          A1mj = rhoa1mj/rij;
+          A2mj = rhoa2mj/rij2;
+          A3mj = rhoa3mj/(rij2*rij);
+          A1mi = rhoa1mi/rij;
+          A2mi = rhoa2mi/rij2;
+          A3mi = rhoa3mi/(rij2*rij);
+        }
         for (m = 0; m < 3; m++) {
           arho1[i][m] = arho1[i][m] + A1j * delij[m];
           arho1[j][m] = arho1[j][m] - A1i * delij[m];
           arho3b[i][m] = arho3b[i][m] + rhoa3j * delij[m] / rij;
           arho3b[j][m] = arho3b[j][m] - rhoa3i * delij[m] / rij;
+          if (this->msmeamflag) {
+            arho1m[i][m] = arho1m[i][m] + A1mj*delij[m];
+            arho1m[j][m] = arho1m[j][m] - A1mi*delij[m];
+            arho3mb[i][m] = arho3mb[i][m] + rhoa3mj*delij[m] / rij;
+            arho3mb[j][m] = arho3mb[j][m] - rhoa3mi*delij[m] / rij;
+          }
           for (n = m; n < 3; n++) {
             arho2[i][nv2] = arho2[i][nv2] + A2j * delij[m] * delij[n];
             arho2[j][nv2] = arho2[j][nv2] + A2i * delij[m] * delij[n];
+            if (this->msmeamflag) {
+              arho2m[i][nv2] = arho2m[i][nv2] + A2mj*delij[m] * delij[n];
+              arho2m[j][nv2] = arho2m[j][nv2] + A2mi*delij[m] * delij[n];
+            }
             nv2 = nv2 + 1;
             for (p = n; p < 3; p++) {
               arho3[i][nv3] = arho3[i][nv3] + A3j * delij[m] * delij[n] * delij[p];
               arho3[j][nv3] = arho3[j][nv3] - A3i * delij[m] * delij[n] * delij[p];
+              if (this->msmeamflag) {
+                arho3m[i][nv3] = arho3m[i][nv3] + A3mj*delij[m]*delij[n]*delij[p];
+                arho3m[j][nv3] = arho3m[j][nv3] - A3mi*delij[m]*delij[n]*delij[p];
+              }
               nv3 = nv3 + 1;
             }
           }
diff --git a/src/MEAM/meam_force.cpp b/src/MEAM/meam_force.cpp
index acc3d5672a..4bc7380898 100644
--- a/src/MEAM/meam_force.cpp
+++ b/src/MEAM/meam_force.cpp
@@ -61,6 +61,17 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
   double t1i, t2i, t3i, t1j, t2j, t3j;
   double scaleij;
 
+  double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
+  double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
+  double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
+  double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
+  double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
+  double drho1mdrm1[3], drho1mdrm2[3];
+  double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
+  double drho2mdrm1[3], drho2mdrm2[3];
+  double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
+  double drho3mdrm1[3], drho3mdrm2[3];
+
   third = 1.0 / 3.0;
   sixth = 1.0 / 6.0;
 
@@ -74,6 +85,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
   zitmp = x[i][2];
 
   //     Treat each pair
+
   for (jn = 0; jn < numneigh; jn++) {
     j = firstneigh[jn];
     eltj = fmap[type[j]];
@@ -89,7 +101,6 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
       if (rij2 < this->cutforcesq) {
         rij = sqrt(rij2);
         recip = 1.0 / rij;
-
         //     Compute phi and phip
         ind = this->eltind[elti][eltj];
         pp = rij * this->rdrar;
@@ -114,6 +125,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
         //     write(1,*) "force_meamf: phip: ",phip
 
         //     Compute pair densities and derivatives
+
         invrei = 1.0 / this->re_meam[elti][elti];
         ai = rij * invrei - 1.0;
         ro0i = this->rho0_meam[elti];
@@ -126,6 +138,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
         rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai);
         drhoa3i = -this->beta3_meam[elti] * invrei * rhoa3i;
 
+        if (this->msmeamflag) {
+          rhoa1mi = ro0i * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * t1m_meam[elti];
+          drhoa1mi = -this->beta1m_meam[elti] * invrei * rhoa1mi;
+          rhoa2mi = ro0i * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * t2m_meam[elti];
+          drhoa2mi = -this->beta2m_meam[elti] * invrei * rhoa2mi;
+          rhoa3mi = ro0i * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * t3m_meam[elti];
+          drhoa3mi = -this->beta3m_meam[elti] * invrei * rhoa3mi;
+        }
+
         if (elti != eltj) {
           invrej = 1.0 / this->re_meam[eltj][eltj];
           aj = rij * invrej - 1.0;
@@ -138,6 +159,16 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
           drhoa2j = -this->beta2_meam[eltj] * invrej * rhoa2j;
           rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj);
           drhoa3j = -this->beta3_meam[eltj] * invrej * rhoa3j;
+
+          if (this->msmeamflag) {
+            rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj);
+            drhoa1mj = -this->beta1m_meam[eltj] * invrej * rhoa1mj;
+            rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj);
+            drhoa2mj = -this->beta2m_meam[eltj] * invrej * rhoa2mj;
+            rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj);
+            drhoa3mj = -this->beta3m_meam[eltj] * invrej * rhoa3mj;
+          }
+
         } else {
           rhoa0j = rhoa0i;
           drhoa0j = drhoa0i;
@@ -147,6 +178,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
           drhoa2j = drhoa2i;
           rhoa3j = rhoa3i;
           drhoa3j = drhoa3i;
+
+          if (this->msmeamflag) {
+            rhoa1mj = rhoa1mi;
+            drhoa1mj = drhoa1mi;
+            rhoa2mj = rhoa2mi;
+            drhoa2mj = drhoa2mi;
+            rhoa3mj = rhoa3mi;
+            drhoa3mj = drhoa3mi;
+          }
         }
 
         const double t1mi = this->t1_meam[elti];
@@ -156,7 +196,10 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
         const double t2mj = this->t2_meam[eltj];
         const double t3mj = this->t3_meam[eltj];
 
-        if (this->ialloy == 1) {
+        // ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
+        // We did this above with rhoa1mj, rhoa2mj, etc.
+
+        if (this->ialloy == 1 || this->msmeamflag) {
           rhoa1j  *= t1mj;
           rhoa2j  *= t2mj;
           rhoa3j  *= t3mj;
@@ -200,6 +243,39 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
           arg3j3 = arg3j3 - arho3b[j][n] * delij[n];
         }
 
+        // msmeam arhom args
+
+        nv2 = 0;
+        nv3 = 0;
+        arg1i1m = 0.0;
+        arg1j1m = 0.0;
+        arg1i2m = 0.0;
+        arg1j2m = 0.0;
+        arg1i3m = 0.0;
+        arg1j3m = 0.0;
+        arg3i3m = 0.0;
+        arg3j3m = 0.0;
+        if (this->msmeamflag) {
+          for (n = 0; n < 3; n++) {
+            for (p = n; p < 3; p++) {
+              for (q = p; q < 3; q++) {
+                arg = delij[n] * delij[p] * delij[q] * this->v3D[nv3];
+                arg1i3m = arg1i3m - arho3m[i][nv3] * arg;
+                arg1j3m = arg1j3m + arho3m[j][nv3] * arg;
+                nv3 = nv3 + 1;
+              }
+              arg = delij[n] * delij[p] * this->v2D[nv2];
+              arg1i2m = arg1i2m + arho2m[i][nv2] * arg;
+              arg1j2m = arg1j2m + arho2m[j][nv2] * arg;
+              nv2 = nv2 + 1;
+            }
+            arg1i1m = arg1i1m - arho1m[i][n] * delij[n];
+            arg1j1m = arg1j1m + arho1m[j][n] * delij[n];
+            arg3i3m = arg3i3m - arho3mb[i][n] * delij[n];
+            arg3j3m = arg3j3m + arho3mb[j][n] * delij[n];
+          }
+        }
+
         //     rho0 terms
         drho0dr1 = drhoa0j * sij;
         drho0dr2 = drhoa0i * sij;
@@ -254,32 +330,83 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
           drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * arho3b[j][m]) * rhoa3i;
         }
 
-        //     Compute derivatives of weighting functions t wrt rij
-        t1i = t_ave[i][0];
-        t2i = t_ave[i][1];
-        t3i = t_ave[i][2];
-        t1j = t_ave[j][0];
-        t2j = t_ave[j][1];
-        t3j = t_ave[j][2];
+        if (this->msmeamflag) {
+          //     rho1m terms
+          a1 = 2 * sij / rij;
+          drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
+          drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
+          drho1mdr1 *= -1.0;
+          drho1mdr2 *= -1.0;
+          a1 = 2.0 * sij / rij;
+          for (m = 0; m < 3; m++) {
+            drho1mdrm1[m] = a1 * rhoa1mj * arho1m[i][m];
+            drho1mdrm2[m] = -a1 * rhoa1mi * arho1m[j][m];
+          }
 
-        if (this->ialloy == 1) {
+          //     rho2m terms
+          a2 = 2 * sij / rij2;
+          drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * arho2mb[i] * drhoa2mj * sij;
+          drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * arho2mb[j] * drhoa2mi * sij;
+          a2 = 4 * sij / rij2;
+          for (m = 0; m < 3; m++) {
+            drho2mdrm1[m] = 0.0;
+            drho2mdrm2[m] = 0.0;
+            for (n = 0; n < 3; n++) {
+              drho2mdrm1[m] += arho2m[i][this->vind2D[m][n]] * delij[n];
+              drho2mdrm2[m] -= arho2m[j][this->vind2D[m][n]] * delij[n];
+            }
+            drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
+            drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
+          }
 
-          a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
-          a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
-          a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
-          a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
-          a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
-          a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
+          //     rho3m terms
+          rij3 = rij * rij2;
+          a3 = 2 * sij / rij3;
+          a3a = 6.0 / 5.0 * sij / rij;
+          drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
+          drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
+          drho3mdr1 *= -1.0;
+          drho3mdr2 *= -1.0;
 
-          dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
-          dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
-          dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
-          dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
-          dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
-          dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
+          a3 = 6 * sij / rij3;
+          a3a = 6 * sij / (5 * rij);
+          for (m = 0; m < 3; m++) {
+            drho3mdrm1[m] = 0.0;
+            drho3mdrm2[m] = 0.0;
+            nv2 = 0;
+            for (n = 0; n < 3; n++) {
+              for (p = n; p < 3; p++) {
+                arg = delij[n] * delij[p] * this->v2D[nv2];
+                drho3mdrm1[m] += arho3m[i][this->vind3D[m][n][p]] * arg;
+                drho3mdrm2[m] += arho3m[j][this->vind3D[m][n][p]] * arg;
+                nv2 = nv2 + 1;
+              }
+            }
+            drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * arho3mb[i][m]) * rhoa3mj;
+            drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * arho3mb[j][m]) * rhoa3mi;
+          }
+        } else {
+          for (m = 0; m < 3; m++) {
+            drho1mdrm1[m] = 0.0;
+            drho1mdrm2[m] = 0.0;
+            drho2mdrm1[m] = 0.0;
+            drho2mdrm2[m] = 0.0;
+            drho3mdrm1[m] = 0.0;
+            drho3mdrm2[m] = 0.0;
+          }
+        }
 
-        } else if (this->ialloy == 2) {
+        // compute derivatives of weighting functions t wrt rij
+        // weighting functions t set to unity for MS-MEAM
 
+        if (this->msmeamflag) {
+
+          t1i = 1.0;
+          t2i = 1.0;
+          t3i = 1.0;
+          t1j = 1.0;
+          t2j = 1.0;
+          t3j = 1.0;
           dt1dr1 = 0.0;
           dt1dr2 = 0.0;
           dt2dr1 = 0.0;
@@ -289,38 +416,98 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
 
         } else {
 
-          ai = 0.0;
-          if (!iszero(rho0[i]))
-            ai = drhoa0j * sij / rho0[i];
-          aj = 0.0;
-          if (!iszero(rho0[j]))
-            aj = drhoa0i * sij / rho0[j];
+          t1i = t_ave[i][0];
+          t2i = t_ave[i][1];
+          t3i = t_ave[i][2];
+          t1j = t_ave[j][0];
+          t2j = t_ave[j][1];
+          t3j = t_ave[j][2];
+
+          if (this->ialloy == 1) {
+
+            a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
+            a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
+            a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
+            a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
+            a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
+            a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
+
+            dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
+            dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
+            dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
+            dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
+            dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
+            dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
+
+          } else if (this->ialloy == 2) {
+
+            dt1dr1 = 0.0;
+            dt1dr2 = 0.0;
+            dt2dr1 = 0.0;
+            dt2dr2 = 0.0;
+            dt3dr1 = 0.0;
+            dt3dr2 = 0.0;
+
+          } else {
+
+            ai = 0.0;
+            if (!iszero(rho0[i]))
+              ai = drhoa0j * sij / rho0[i];
+            aj = 0.0;
+            if (!iszero(rho0[j]))
+              aj = drhoa0i * sij / rho0[j];
+
+            dt1dr1 = ai * (t1mj - t1i);
+            dt1dr2 = aj * (t1mi - t1j);
+            dt2dr1 = ai * (t2mj - t2i);
+            dt2dr2 = aj * (t2mi - t2j);
+            dt3dr1 = ai * (t3mj - t3i);
+            dt3dr2 = aj * (t3mi - t3j);
+          }
 
-          dt1dr1 = ai * (t1mj - t1i);
-          dt1dr2 = aj * (t1mi - t1j);
-          dt2dr1 = ai * (t2mj - t2i);
-          dt2dr2 = aj * (t2mi - t2j);
-          dt3dr1 = ai * (t3mj - t3i);
-          dt3dr2 = aj * (t3mi - t3j);
         }
 
         //     Compute derivatives of total density wrt rij, sij and rij(3)
         get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpi);
         get_shpfcn(this->lattce_meam[eltj][eltj], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpj);
 
-        drhodr1 = dgamma1[i] * drho0dr1 +
-          dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
-                        dt3dr1 * rho3[i] + t3i * drho3dr1) -
-          dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
-        drhodr2 = dgamma1[j] * drho0dr2 +
-          dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
-                        dt3dr2 * rho3[j] + t3j * drho3dr2) -
-          dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
-        for (m = 0; m < 3; m++) {
-          drhodrm1[m] = 0.0;
-          drhodrm2[m] = 0.0;
-          drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
-          drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
+        if (this->msmeamflag) {
+          drhodr1 = dgamma1[i] * drho0dr1 +
+            dgamma2[i] * (dt1dr1 * rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
+                          dt2dr1 * rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
+                          dt3dr1 * rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
+            dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
+          drhodr2 = dgamma1[j] * drho0dr2 +
+            dgamma2[j] * (dt1dr2 * rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
+                          dt2dr2 * rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
+                          dt3dr2 * rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
+            dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
+          for (m = 0; m < 3; m++) {
+            drhodrm1[m] = 0.0;
+            drhodrm2[m] = 0.0;
+            drhodrm1[m] = dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
+                                        t2i * (drho2drm1[m] - drho2mdrm1[m]) +
+                                        t3i * (drho3drm1[m] - drho3mdrm1[m]) );
+            drhodrm2[m] = dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
+                                        t2j * (drho2drm2[m] - drho2mdrm2[m]) +
+                                        t3j * (drho3drm2[m] - drho3mdrm2[m]) );
+          }
+        } else {
+
+          drhodr1 = dgamma1[i] * drho0dr1 +
+            dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
+                          dt3dr1 * rho3[i] + t3i * drho3dr1) -
+            dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
+          drhodr2 = dgamma1[j] * drho0dr2 +
+            dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
+                          dt3dr2 * rho3[j] + t3j * drho3dr2) -
+            dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
+          for (m = 0; m < 3; m++) {
+            drhodrm1[m] = 0.0;
+            drhodrm2[m] = 0.0;
+            drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
+            drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
+          }
         }
 
         //     Compute derivatives wrt sij, but only if necessary
@@ -328,17 +515,37 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
           drho0ds1 = rhoa0j;
           drho0ds2 = rhoa0i;
           a1 = 2.0 / rij;
-          drho1ds1 = a1 * rhoa1j * arg1i1;
-          drho1ds2 = a1 * rhoa1i * arg1j1;
           a2 = 2.0 / rij2;
-          drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
-          drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
           a3 = 2.0 / rij3;
           a3a = 6.0 / (5.0 * rij);
+
+          drho1ds1 = a1 * rhoa1j * arg1i1;
+          drho1ds2 = a1 * rhoa1i * arg1j1;
+          drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
+          drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
           drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
           drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
 
+          if (this->msmeamflag) {
+            drho1mds1 = a1 * rhoa1mj * arg1i1m;
+            drho1mds2 = a1 * rhoa1mi * arg1j1m;
+            drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * arho2mb[i] * rhoa2mj;
+            drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * arho2mb[j] * rhoa2mi;
+            drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
+            drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
+            drho3mds1 *= -1;
+            drho3mds2 *= -1;
+          } else {
+            drho1mds1 = 0.0;
+            drho1mds2 = 0.0;
+            drho2mds1 = 0.0;
+            drho2mds2 = 0.0;
+            drho3mds1 = 0.0;
+            drho3mds2 = 0.0;
+          }
+
           if (this->ialloy == 1) {
+
             a1i = fdiv_zero(rhoa0j, tsq_ave[i][0]);
             a1j = fdiv_zero(rhoa0i, tsq_ave[j][0]);
             a2i = fdiv_zero(rhoa0j, tsq_ave[i][1]);
@@ -379,19 +586,36 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
             dt3ds2 = aj * (t3mi - t3j);
           }
 
-          drhods1 = dgamma1[i] * drho0ds1 +
-            dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
-                          dt3ds1 * rho3[i] + t3i * drho3ds1) -
-            dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
-          drhods2 = dgamma1[j] * drho0ds2 +
-            dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
-                          dt3ds2 * rho3[j] + t3j * drho3ds2) -
-            dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
+          if (this->msmeamflag) {
+            drhods1 = dgamma1[i] * drho0ds1 +
+              dgamma2[i] * (dt1ds1 * rho1[i] + t1i * (drho1ds1 - drho1mds1) +
+                            dt2ds1 * rho2[i] + t2i * (drho2ds1 - drho2mds1) +
+                            dt3ds1 * rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
+              dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
+            drhods2 = dgamma1[j] * drho0ds2 +
+              dgamma2[j] * (dt1ds2 * rho1[j] + t1j * (drho1ds2 - drho1mds2) +
+                            dt2ds2 * rho2[j] + t2j * (drho2ds2 - drho2mds2) +
+                            dt3ds2 * rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
+              dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
+          }
+          else {
+            drhods1 = dgamma1[i] * drho0ds1 +
+              dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
+                            dt3ds1 * rho3[i] + t3i * drho3ds1) -
+              dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
+            drhods2 = dgamma1[j] * drho0ds2 +
+              dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
+                            dt3ds2 * rho3[j] + t3j * drho3ds2) -
+              dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
+          }
         }
 
-        //     Compute derivatives of energy wrt rij, sij and rij[3]
+        // Compute derivatives of energy wrt rij, sij and rij[3]
+        // MS-MEAM affects phip
+
         dUdrij = phip * sij + frhop[i] * drhodr1 + frhop[j] * drhodr2;
         dUdsij = 0.0;
+
         if (!iszero(dscrfcn[fnoffset + jn])) {
           dUdsij = phi + frhop[i] * drhods1 + frhop[j] * drhods2;
         }
diff --git a/src/MEAM/meam_impl.cpp b/src/MEAM/meam_impl.cpp
index bbfb83e94a..5290647b18 100644
--- a/src/MEAM/meam_impl.cpp
+++ b/src/MEAM/meam_impl.cpp
@@ -34,6 +34,11 @@ MEAM::MEAM(Memory* mem)
   gamma = dgamma1 = dgamma2 = dgamma3 = arho2b = nullptr;
   arho1 = arho2 = arho3 = arho3b = t_ave = tsq_ave = nullptr;
 
+  // msmeam arrays
+  msmeamflag = 0;
+  arho2mb = nullptr;
+  arho1m = arho2m = arho3m = arho3mb = nullptr;
+
   maxneigh = 0;
   scrfcn = dscrfcn = fcpair = nullptr;
   copymode = 0;
@@ -43,7 +48,9 @@ MEAM::MEAM(Memory* mem)
     A_meam[i] = rho0_meam[i] = beta0_meam[i] =
       beta1_meam[i]= beta2_meam[i] = beta3_meam[i] =
       t0_meam[i] = t1_meam[i] = t2_meam[i] = t3_meam[i] =
-      rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] = 0.0;
+      rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] =
+      t1m_meam[i] = t2m_meam[i] = t3m_meam[i] =
+      beta1m_meam[i] = beta2m_meam[i] = beta3m_meam[i] = 0.0;
     for (int j = 0; j < maxelt; j++) {
       lattce_meam[i][j] = FCC;
       Ec_meam[i][j] = re_meam[i][j] = alpha_meam[i][j] = delta_meam[i][j] = ebound_meam[i][j] = attrac_meam[i][j] = repuls_meam[i][j] = 0.0;
@@ -87,4 +94,13 @@ MEAM::~MEAM()
   memory->destroy(this->scrfcn);
   memory->destroy(this->dscrfcn);
   memory->destroy(this->fcpair);
+
+  // msmeam
+  if (this->msmeamflag){
+    memory->destroy(this->arho1m);
+    memory->destroy(this->arho2m);
+    memory->destroy(this->arho3m);
+    memory->destroy(this->arho2mb);
+    memory->destroy(this->arho3mb);
+  }
 }
diff --git a/src/MEAM/meam_setup_done.cpp b/src/MEAM/meam_setup_done.cpp
index 93f2552465..de1188349c 100644
--- a/src/MEAM/meam_setup_done.cpp
+++ b/src/MEAM/meam_setup_done.cpp
@@ -220,7 +220,6 @@ void MEAM::compute_pair_meam()
       // loop over r values and compute
       for (j = 0; j < this->nr; j++) {
         r = j * this->dr;
-
         this->phir[nv2][j] = phi_meam(r, a, b);
 
         // if using second-nearest neighbor, solve recursive problem
@@ -333,9 +332,12 @@ double MEAM::phi_meam(double r, int a, int b)
   lattice_t latta /*unused:,lattb*/;
   double rho_bkgd1, rho_bkgd2;
   double b11s, b22s;
+  // msmeam
+  double t1m1av, t2m1av, t3m1av, t1m2av, t2m2av, t3m2av;
+  double rho1m1, rho2m1, rho3m1;
+  double rho1m2, rho2m2, rho3m2;
 
   double phi_m = 0.0;
-
   // Equation numbers below refer to:
   //   I. Huang et.al., Modelling simul. Mater. Sci. Eng. 3:615
 
@@ -345,8 +347,16 @@ double MEAM::phi_meam(double r, int a, int b)
   Z2 = get_Zij(this->lattce_meam[b][b]);
   Z12 = get_Zij(this->lattce_meam[a][b]);
 
-  get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32);
-
+  // this function has extra args for msmeam
+  if (this->msmeamflag) {
+    get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
+                &rho1m1, &rho2m1, &rho3m1,
+                &rho1m2, &rho2m2, &rho3m2);
+  } else {
+    get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
+                nullptr, nullptr, nullptr,
+                nullptr, nullptr, nullptr);
+  }
   // if densities are too small, numerical problems may result; just return zero
   if (rho01 <= 1e-14 && rho02 <= 1e-14)
     return 0.0;
@@ -374,6 +384,12 @@ double MEAM::phi_meam(double r, int a, int b)
     get_tavref(&t11av, &t21av, &t31av, &t12av, &t22av, &t32av, this->t1_meam[a], this->t2_meam[a],
                this->t3_meam[a], this->t1_meam[b], this->t2_meam[b], this->t3_meam[b], r, a, b,
                this->lattce_meam[a][b]);
+    // with msmeam call twice with different sets of variables
+    if (this->msmeamflag) {
+      get_tavref(&t1m1av, &t2m1av, &t3m1av, &t1m2av, &t2m2av, &t3m2av, this->t1m_meam[a], this->t2m_meam[a],
+                this->t3m_meam[a], this->t1m_meam[b], this->t2m_meam[b], this->t3m_meam[b], r, a, b,
+                this->lattce_meam[a][b]);
+    }
   }
 
   // for c11b structure, calculate background electron densities
@@ -420,17 +436,33 @@ double MEAM::phi_meam(double r, int a, int b)
       rho0_1 = this->rho0_meam[a] * Z1 * G1;
       rho0_2 = this->rho0_meam[b] * Z2 * G2;
     }
-    Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
-    if (rho01 < 1.0e-14)
-      Gam1 = 0.0;
-    else
-      Gam1 = Gam1 / (rho01 * rho01);
 
-    Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
-    if (rho02 < 1.0e-14)
-      Gam2 = 0.0;
-    else
-      Gam2 = Gam2 / (rho02 * rho02);
+    if (this->msmeamflag) {
+      // no additional use of t's here; all included in definitions of rho's for msmeam
+      Gam1 = rho11 + rho21 + rho31 - (rho1m1 + rho2m1 + rho3m1);
+      if (rho01 < 1.0e-14)
+        Gam1 = 0.0;
+      else
+        Gam1 = Gam1 / (rho01 * rho01);
+      Gam2 = rho12 + rho22 + rho32 - (rho1m2 + rho2m2 + rho3m2);
+      if (rho02 < 1.0e-14)
+        Gam2 = 0.0;
+      else
+        Gam2 = Gam2 / (rho02 * rho02);
+
+    } else {
+      Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
+      if (rho01 < 1.0e-14)
+        Gam1 = 0.0;
+      else
+        Gam1 = Gam1 / (rho01 * rho01);
+
+      Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
+      if (rho02 < 1.0e-14)
+        Gam2 = 0.0;
+      else
+        Gam2 = Gam2 / (rho02 * rho02);
+    }
 
     G1 = G_gam(Gam1, this->ibar_meam[a], errorflag);
     G2 = G_gam(Gam2, this->ibar_meam[b], errorflag);
@@ -655,7 +687,9 @@ void MEAM::get_sijk(double C, int i, int j, int k, double* sijk)
 //------------------------------------------------------------------------------c
 // Calculate density functions, assuming reference configuration
 void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, double* rho21, double* rho31,
-                       double* rho02, double* rho12, double* rho22, double* rho32)
+                       double* rho02, double* rho12, double* rho22, double* rho32,
+                       double* rho1m1, double* rho2m1, double* rho3m1,
+                       double* rho1m2, double* rho2m2, double* rho3m2)
 {
   double a1, a2;
   double s[3];
@@ -666,18 +700,39 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
   double rhoa02, rhoa12, rhoa22, rhoa32;
   double arat, scrn, denom;
   double C, s111, s112, s221, S11, S22;
+  // msmeam
+  double rhoa1m1, rhoa2m1, rhoa3m1, rhoa1m2, rhoa2m2, rhoa3m2;
 
   a1 = r / this->re_meam[a][a] - 1.0;
   a2 = r / this->re_meam[b][b] - 1.0;
 
   rhoa01 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta0_meam[a] * a1);
-  rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
-  rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
-  rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
-  rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
-  rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
-  rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
-  rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
+
+  if (this->msmeamflag) {
+    // the rho variables are multiplied by t here since ialloy not needed in msmeam
+    rhoa11 = this->rho0_meam[a] * this->t1_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
+    rhoa21 = this->rho0_meam[a] * this->t2_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
+    rhoa31 = this->rho0_meam[a] * this->t3_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
+    rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
+    rhoa12 = this->rho0_meam[b] * this->t1_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
+    rhoa22 = this->rho0_meam[b] * this->t2_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
+    rhoa32 = this->rho0_meam[b] * this->t3_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
+    // msmeam specific rho vars
+    rhoa1m1 = this->rho0_meam[a] * this->t1m_meam[a] * MathSpecial::fm_exp(-this->beta1m_meam[a] * a1);
+    rhoa2m1 = this->rho0_meam[a] * this->t2m_meam[a] * MathSpecial::fm_exp(-this->beta2m_meam[a] * a1);
+    rhoa3m1 = this->rho0_meam[a] * this->t3m_meam[a] * MathSpecial::fm_exp(-this->beta3m_meam[a] * a1);
+    rhoa1m2 = this->rho0_meam[b] * this->t1m_meam[b] * MathSpecial::fm_exp(-this->beta1m_meam[b] * a2);
+    rhoa2m2 = this->rho0_meam[b] * this->t2m_meam[b] * MathSpecial::fm_exp(-this->beta2m_meam[b] * a2);
+    rhoa3m2 = this->rho0_meam[b] * this->t3m_meam[b] * MathSpecial::fm_exp(-this->beta3m_meam[b] * a2);
+  } else {
+    rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
+    rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
+    rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
+    rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
+    rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
+    rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
+    rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
+  }
 
   lat = this->lattce_meam[a][b];
 
@@ -689,7 +744,16 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
   *rho12 = 0.0;
   *rho22 = 0.0;
   *rho32 = 0.0;
+  if (this->msmeamflag) {
+    *rho1m1 = 0.0;
+    *rho2m1 = 0.0;
+    *rho3m1 = 0.0;
+    *rho1m2 = 0.0;
+    *rho2m2 = 0.0;
+    *rho3m2 = 0.0;
+  }
 
+  // keep track of density components separately; combine in the calling subroutine
   switch (lat) {
     case FCC:
       *rho01 = 12.0 * rhoa02;
@@ -710,12 +774,20 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
       *rho02 = 4.0 * rhoa01;
       *rho31 = 32.0 / 9.0 * rhoa32 * rhoa32;
       *rho32 = 32.0 / 9.0 * rhoa31 * rhoa31;
+      if (this->msmeamflag) {
+        *rho3m1 = 32.0 / 9.0 * rhoa3m2 * rhoa3m2;
+        *rho3m2 = 32.0 / 9.0 * rhoa3m1 * rhoa3m1;
+      }
       break;
     case HCP:
       *rho01 = 12 * rhoa02;
       *rho02 = 12 * rhoa01;
       *rho31 = 1.0 / 3.0 * rhoa32 * rhoa32;
       *rho32 = 1.0 / 3.0 * rhoa31 * rhoa31;
+      if (this->msmeamflag) {
+        *rho3m1 = 1.0 / 3.0 * rhoa3m2 * rhoa3m2;
+        *rho3m2 = 1.0 / 3.0 * rhoa3m1 * rhoa3m1;
+      }
       break;
     case DIM:
       get_shpfcn(DIM, 0, 0, s);
@@ -727,6 +799,14 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
       *rho22 = s[1] * rhoa21 * rhoa21;
       *rho31 = s[2] * rhoa32 * rhoa32;
       *rho32 = s[2] * rhoa31 * rhoa31;
+      if (this->msmeamflag) {
+        *rho1m1 = s[0] * rhoa1m2 * rhoa1m2;
+        *rho1m2 = s[0] * rhoa1m1 * rhoa1m1;
+        *rho2m1 = s[1] * rhoa2m2 * rhoa2m2;
+        *rho2m2 = s[1] * rhoa2m1 * rhoa2m1;
+        *rho3m1 = s[2] * rhoa3m2 * rhoa3m2;
+        *rho3m2 = s[2] * rhoa3m1 * rhoa3m1;
+      }
       break;
     case C11:
       *rho01 = rhoa01;
@@ -737,17 +817,28 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
       *rho22 = rhoa22;
       *rho31 = rhoa31;
       *rho32 = rhoa32;
+      if (this->msmeamflag) {
+        *rho1m1 = rhoa1m1;
+        *rho1m2 = rhoa1m2;
+        *rho2m1 = rhoa2m1;
+        *rho2m2 = rhoa2m2;
+        *rho3m1 = rhoa3m1;
+        *rho3m2 = rhoa3m2;
+      }
       break;
     case L12:
       *rho01 = 8 * rhoa01 + 4 * rhoa02;
       *rho02 = 12 * rhoa01;
-      if (this->ialloy == 1) {
+      if (this->ialloy ==1){
         *rho21 = 8. / 3. * MathSpecial::square(rhoa21 * this->t2_meam[a] - rhoa22 * this->t2_meam[b]);
         denom = 8 * rhoa01 * MathSpecial::square(this->t2_meam[a]) + 4 * rhoa02 * MathSpecial::square(this->t2_meam[b]);
         if (denom > 0.)
           *rho21 = *rho21 / denom * *rho01;
       } else
         *rho21 = 8. / 3. * (rhoa21 - rhoa22) * (rhoa21 - rhoa22);
+      if (this->msmeamflag) {
+        *rho2m1 = 8. / 3. * (rhoa2m1 - rhoa2m2) * (rhoa2m1 - rhoa2m2);
+      }
       break;
     case B2:
       *rho01 = 8.0 * rhoa02;
@@ -864,6 +955,7 @@ void MEAM::interpolate_meam(int ind)
   this->rdrar = 1.0 / drar;
 
   // phir interp
+
   for (j = 0; j < this->nrar; j++) {
     this->phirar[ind][j] = this->phir[ind][j];
   }
diff --git a/src/MEAM/meam_setup_global.cpp b/src/MEAM/meam_setup_global.cpp
index 545a2ad3f4..5d35242e7c 100644
--- a/src/MEAM/meam_setup_global.cpp
+++ b/src/MEAM/meam_setup_global.cpp
@@ -36,7 +36,8 @@ void
 MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*/, double* alpha,
                         double* b0, double* b1, double* b2, double* b3, double* alat, double* esub,
                         double* asub, double* t0, double* t1, double* t2, double* t3, double* rozero,
-                        int* ibar)
+                        int* ibar, double* b1m, double *b2m, double *b3m, double *t1m, double *t2m,
+                        double *t3m)
 {
 
   int i;
@@ -53,6 +54,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
     this->beta1_meam[i] = b1[i];
     this->beta2_meam[i] = b2[i];
     this->beta3_meam[i] = b3[i];
+    if (this->msmeamflag){
+      this->beta1m_meam[i] = b1m[i];
+      this->beta2m_meam[i] = b2m[i];
+      this->beta3m_meam[i] = b3m[i];
+    }
     tmplat[i] = alat[i];
     this->Ec_meam[i][i] = esub[i];
     this->A_meam[i] = asub[i];
@@ -60,6 +66,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
     this->t1_meam[i] = t1[i];
     this->t2_meam[i] = t2[i];
     this->t3_meam[i] = t3[i];
+    if (this->msmeamflag){
+      this->t1m_meam[i] = t1m[i];
+      this->t2m_meam[i] = t2m[i];
+      this->t3m_meam[i] = t3m[i];
+    }
     this->rho0_meam[i] = rozero[i];
     this->ibar_meam[i] = ibar[i];
 
diff --git a/src/MEAM/pair_meam.cpp b/src/MEAM/pair_meam.cpp
index bcfffbe52b..c4a4cfa1d7 100644
--- a/src/MEAM/pair_meam.cpp
+++ b/src/MEAM/pair_meam.cpp
@@ -58,13 +58,12 @@ PairMEAM::PairMEAM(LAMMPS *lmp) : Pair(lmp)
   allocated = 0;
 
   nlibelements = 0;
+
   meam_inst = new MEAM(memory);
+  meam_inst->msmeamflag = msmeamflag = 0;
+  myname = "meam";
+
   scale = nullptr;
-
-  // set comm size needed by this Pair
-
-  comm_forward = 38;
-  comm_reverse = 30;
 }
 
 /* ----------------------------------------------------------------------
@@ -93,7 +92,6 @@ void PairMEAM::compute(int eflag, int vflag)
   int i,ii,n,inum_half,errorflag;
   int *ilist_half,*numneigh_half,**firstneigh_half;
   int *numneigh_full,**firstneigh_full;
-
   ev_init(eflag,vflag);
 
   // neighbor list info
@@ -133,7 +131,6 @@ void PairMEAM::compute(int eflag, int vflag)
 
   int offset = 0;
   errorflag = 0;
-
   for (ii = 0; ii < inum_half; ii++) {
     i = ilist_half[ii];
     meam_inst->meam_dens_init(i,ntype,type,map,x,
@@ -142,9 +139,7 @@ void PairMEAM::compute(int eflag, int vflag)
                     offset);
     offset += numneigh_half[i];
   }
-
   comm->reverse_comm(this);
-
   meam_inst->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
                    &eng_vdwl,eatom,ntype,type,map,scale,errorflag);
   if (errorflag)
@@ -159,7 +154,6 @@ void PairMEAM::compute(int eflag, int vflag)
 
   double **vptr = nullptr;
   if (vflag_atom) vptr = vatom;
-
   for (ii = 0; ii < inum_half; ii++) {
     i = ilist_half[ii];
     meam_inst->meam_force(i,eflag_global,eflag_atom,vflag_global,
@@ -169,7 +163,6 @@ void PairMEAM::compute(int eflag, int vflag)
                           offset,f,vptr,virial);
     offset += numneigh_half[i];
   }
-
   if (vflag_fdotr) virial_fdotr_compute();
 }
 
@@ -193,7 +186,17 @@ void PairMEAM::allocate()
 
 void PairMEAM::settings(int narg, char ** /*arg*/)
 {
-  if (narg != 0) error->all(FLERR,"Illegal pair_style command");
+  if (narg != 0) error->all(FLERR,"Illegal pair_style {} command", myname);
+
+  // set comm size needed by this Pair
+
+  if (msmeamflag) {
+    comm_forward = 38+23; // plus 23 for msmeam
+    comm_reverse = 30+23; // plus 23 for msmeam
+  } else {
+    comm_forward = 38;
+    comm_reverse = 30;
+  }
 }
 
 /* ----------------------------------------------------------------------
@@ -206,12 +209,7 @@ void PairMEAM::coeff(int narg, char **arg)
 
   if (!allocated) allocate();
 
-  if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients");
-
-  // ensure I,J args are * *
-
-  if (strcmp(arg[0],"*") != 0 || strcmp(arg[1],"*") != 0)
-    error->all(FLERR,"Incorrect args for pair coefficients");
+  if (narg < 6) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
 
   // check for presence of first meam file
 
@@ -239,7 +237,7 @@ void PairMEAM::coeff(int narg, char **arg)
   }
   if (paridx < 0) error->all(FLERR,"No MEAM parameter file in pair coefficients");
   if ((narg - paridx - 1) != atom->ntypes)
-    error->all(FLERR,"Incorrect args for pair coefficients");
+    error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
 
   // MEAM element names between 2 filenames
   // nlibelements = # of MEAM elements
@@ -282,7 +280,7 @@ void PairMEAM::coeff(int narg, char **arg)
       if (libelements[j] == arg[i]) break;
     if (j < nlibelements) map[m] = j;
     else if (strcmp(arg[i],"NULL") == 0) map[m] = -1;
-    else error->all(FLERR,"Incorrect args for pair coefficients");
+    else error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
   }
 
   // clear setflag since coeff() called once with I,J = * *
@@ -307,7 +305,7 @@ void PairMEAM::coeff(int narg, char **arg)
     }
   }
 
-  if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
+  if (count == 0) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
 }
 
 /* ----------------------------------------------------------------------
@@ -317,7 +315,7 @@ void PairMEAM::coeff(int narg, char **arg)
 void PairMEAM::init_style()
 {
   if (force->newton_pair == 0)
-    error->all(FLERR,"Pair style MEAM requires newton pair on");
+    error->all(FLERR,"Pair style {} requires newton pair on", myname);
 
   // need a full and a half neighbor list
 
@@ -360,7 +358,9 @@ void PairMEAM::read_files(const std::string &globalfile,
 
 void PairMEAM::read_global_meam_file(const std::string &globalfile)
 {
+
   // allocate parameter arrays
+
   std::vector<lattice_t> lat(nlibelements);
   std::vector<int> ielement(nlibelements);
   std::vector<int> ibar(nlibelements);
@@ -381,6 +381,15 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
   std::vector<double> rozero(nlibelements);
   std::vector<bool> found(nlibelements, false);
 
+  // allocate 6 extra arrays for msmeam
+
+  std::vector<double> b1m(nlibelements);
+  std::vector<double> b2m(nlibelements);
+  std::vector<double> b3m(nlibelements);
+  std::vector<double> t1m(nlibelements);
+  std::vector<double> t2m(nlibelements);
+  std::vector<double> t3m(nlibelements);
+
   // open global meamf file on proc 0
 
   if (comm->me == 0) {
@@ -416,8 +425,7 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
         std::string lattice_type = values.next_string();
 
         if (!MEAM::str_to_lat(lattice_type, true, lat[index]))
-          error->one(FLERR,"Unrecognized lattice type in MEAM "
-                                       "library file: {}", lattice_type);
+          error->one(FLERR,"Unrecognized lattice type in MEAM library file: {}", lattice_type);
 
         // store parameters
 
@@ -429,6 +437,11 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
         b1[index] = values.next_double();
         b2[index] = values.next_double();
         b3[index] = values.next_double();
+        if (msmeamflag) {
+          b1m[index] = values.next_double();
+          b2m[index] = values.next_double();
+          b3m[index] = values.next_double();
+        }
         alat[index] = values.next_double();
         esub[index] = values.next_double();
         asub[index] = values.next_double();
@@ -436,15 +449,20 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
         t1[index] = values.next_double();
         t2[index] = values.next_double();
         t3[index] = values.next_double();
+        if (msmeamflag) {
+          t1m[index] = values.next_double();
+          t2m[index] = values.next_double();
+          t3m[index] = values.next_double();
+        }
         rozero[index] = values.next_double();
         ibar[index] = values.next_int();
 
         if (!isone(t0[index]))
-          error->one(FLERR,"Unsupported parameter in MEAM library file: t0!=1");
+          error->one(FLERR,"Unsupported parameter in MEAM library file: t0 != 1");
 
         // z given is ignored: if this is mismatched, we definitely won't do what the user said -> fatal error
         if (z[index] != MEAM::get_Zij(lat[index]))
-          error->one(FLERR,"Mismatched parameter in MEAM library file: z!=lat");
+          error->one(FLERR,"Mismatched parameter in MEAM library file: z != lat");
 
         nset++;
       } catch (TokenizerException &e) {
@@ -484,13 +502,29 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
   MPI_Bcast(t2.data(), nlibelements, MPI_DOUBLE, 0, world);
   MPI_Bcast(t3.data(), nlibelements, MPI_DOUBLE, 0, world);
   MPI_Bcast(rozero.data(), nlibelements, MPI_DOUBLE, 0, world);
+  // distribute msmeam parameter sets
+  MPI_Bcast(b1m.data(), nlibelements, MPI_DOUBLE, 0, world);
+  MPI_Bcast(b2m.data(), nlibelements, MPI_DOUBLE, 0, world);
+  MPI_Bcast(b3m.data(), nlibelements, MPI_DOUBLE, 0, world);
+  MPI_Bcast(t1m.data(), nlibelements, MPI_DOUBLE, 0, world);
+  MPI_Bcast(t2m.data(), nlibelements, MPI_DOUBLE, 0, world);
+  MPI_Bcast(t3m.data(), nlibelements, MPI_DOUBLE, 0, world);
 
   // pass element parameters to MEAM package
 
-  meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
-                               alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
-                               alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
-                               t2.data(), t3.data(), rozero.data(), ibar.data());
+  if (msmeamflag) {
+    meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
+                                alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
+                                alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
+                                t2.data(), t3.data(), rozero.data(), ibar.data(), b1m.data(),
+                                b2m.data(), b3m.data(), t1m.data(), t2m.data(), t3m.data());
+  } else {
+    meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
+                                alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
+                                alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
+                                t2.data(), t3.data(), rozero.data(), ibar.data(), nullptr,
+                                nullptr, nullptr, nullptr, nullptr, nullptr);
+  }
 
   // set element masses
 
@@ -613,6 +647,23 @@ int PairMEAM::pack_forward_comm(int n, int *list, double *buf,
     buf[m++] = meam_inst->tsq_ave[j][0];
     buf[m++] = meam_inst->tsq_ave[j][1];
     buf[m++] = meam_inst->tsq_ave[j][2];
+    if (msmeamflag) {
+      buf[m++] = meam_inst->arho2mb[j];
+      buf[m++] = meam_inst->arho1m[j][0];
+      buf[m++] = meam_inst->arho1m[j][1];
+      buf[m++] = meam_inst->arho1m[j][2];
+      buf[m++] = meam_inst->arho2m[j][0];
+      buf[m++] = meam_inst->arho2m[j][1];
+      buf[m++] = meam_inst->arho2m[j][2];
+      buf[m++] = meam_inst->arho2m[j][3];
+      buf[m++] = meam_inst->arho2m[j][4];
+      buf[m++] = meam_inst->arho2m[j][5];
+      for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[j][k];
+      buf[m++] = meam_inst->arho3mb[j][0];
+      buf[m++] = meam_inst->arho3mb[j][1];
+      buf[m++] = meam_inst->arho3mb[j][2];
+    }
+
   }
 
   return m;
@@ -656,6 +707,22 @@ void PairMEAM::unpack_forward_comm(int n, int first, double *buf)
     meam_inst->tsq_ave[i][0] = buf[m++];
     meam_inst->tsq_ave[i][1] = buf[m++];
     meam_inst->tsq_ave[i][2] = buf[m++];
+    if (msmeamflag) {
+      meam_inst->arho2mb[i] = buf[m++];
+      meam_inst->arho1m[i][0] = buf[m++];
+      meam_inst->arho1m[i][1] = buf[m++];
+      meam_inst->arho1m[i][2] = buf[m++];
+      meam_inst->arho2m[i][0] = buf[m++];
+      meam_inst->arho2m[i][1] = buf[m++];
+      meam_inst->arho2m[i][2] = buf[m++];
+      meam_inst->arho2m[i][3] = buf[m++];
+      meam_inst->arho2m[i][4] = buf[m++];
+      meam_inst->arho2m[i][5] = buf[m++];
+      for (k = 0; k < 10; k++) meam_inst->arho3m[i][k] = buf[m++];
+      meam_inst->arho3mb[i][0] = buf[m++];
+      meam_inst->arho3mb[i][1] = buf[m++];
+      meam_inst->arho3mb[i][2] = buf[m++];
+    }
   }
 }
 
@@ -689,6 +756,22 @@ int PairMEAM::pack_reverse_comm(int n, int first, double *buf)
     buf[m++] = meam_inst->tsq_ave[i][0];
     buf[m++] = meam_inst->tsq_ave[i][1];
     buf[m++] = meam_inst->tsq_ave[i][2];
+    if (msmeamflag) {
+      buf[m++] = meam_inst->arho2mb[i];
+      buf[m++] = meam_inst->arho1m[i][0];
+      buf[m++] = meam_inst->arho1m[i][1];
+      buf[m++] = meam_inst->arho1m[i][2];
+      buf[m++] = meam_inst->arho2m[i][0];
+      buf[m++] = meam_inst->arho2m[i][1];
+      buf[m++] = meam_inst->arho2m[i][2];
+      buf[m++] = meam_inst->arho2m[i][3];
+      buf[m++] = meam_inst->arho2m[i][4];
+      buf[m++] = meam_inst->arho2m[i][5];
+      for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[i][k];
+      buf[m++] = meam_inst->arho3mb[i][0];
+      buf[m++] = meam_inst->arho3mb[i][1];
+      buf[m++] = meam_inst->arho3mb[i][2];
+    }
   }
 
   return m;
@@ -724,7 +807,25 @@ void PairMEAM::unpack_reverse_comm(int n, int *list, double *buf)
     meam_inst->tsq_ave[j][0] += buf[m++];
     meam_inst->tsq_ave[j][1] += buf[m++];
     meam_inst->tsq_ave[j][2] += buf[m++];
+    if (msmeamflag) {
+      meam_inst->arho2mb[j] += buf[m++];
+      meam_inst->arho1m[j][0] += buf[m++];
+      meam_inst->arho1m[j][1] += buf[m++];
+      meam_inst->arho1m[j][2] += buf[m++];
+      meam_inst->arho2m[j][0] += buf[m++];
+      meam_inst->arho2m[j][1] += buf[m++];
+      meam_inst->arho2m[j][2] += buf[m++];
+      meam_inst->arho2m[j][3] += buf[m++];
+      meam_inst->arho2m[j][4] += buf[m++];
+      meam_inst->arho2m[j][5] += buf[m++];
+      for (k = 0; k < 10; k++) meam_inst->arho3m[j][k] += buf[m++];
+      meam_inst->arho3mb[j][0] += buf[m++];
+      meam_inst->arho3mb[j][1] += buf[m++];
+      meam_inst->arho3mb[j][2] += buf[m++];
+    }
   }
+
+
 }
 
 /* ----------------------------------------------------------------------
diff --git a/src/MEAM/pair_meam.h b/src/MEAM/pair_meam.h
index 16ba38fcb2..a89714bfa9 100644
--- a/src/MEAM/pair_meam.h
+++ b/src/MEAM/pair_meam.h
@@ -47,6 +47,8 @@ class PairMEAM : public Pair {
   class MEAM *meam_inst;
   double cutmax;                           // max cutoff for all elements
   int nlibelements;                        // # of library elements
+  int msmeamflag;                          // 0 (default) for normal MEAM, 1 for MS-MEAM
+  std::string myname;                      // name of the pair style
   std::vector<std::string> libelements;    // names of library elements
   std::vector<double> mass;                // mass of library element
 
diff --git a/src/MEAM/pair_meam_ms.cpp b/src/MEAM/pair_meam_ms.cpp
new file mode 100644
index 0000000000..982a54f546
--- /dev/null
+++ b/src/MEAM/pair_meam_ms.cpp
@@ -0,0 +1,25 @@
+/* ----------------------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/ Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+#include "pair_meam_ms.h"
+#include "meam.h"
+
+using namespace LAMMPS_NS;
+
+/* ---------------------------------------------------------------------- */
+
+PairMEAMMS::PairMEAMMS(LAMMPS *lmp) : PairMEAM(lmp)
+{
+  meam_inst->msmeamflag = msmeamflag = 1;
+  myname = "meam/ms";
+}
diff --git a/src/MEAM/pair_meam_ms.h b/src/MEAM/pair_meam_ms.h
new file mode 100644
index 0000000000..25878203ed
--- /dev/null
+++ b/src/MEAM/pair_meam_ms.h
@@ -0,0 +1,33 @@
+/* -*- c++ -*- ----------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/ Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+#ifdef PAIR_CLASS
+// clang-format off
+PairStyle(meam/ms,PairMEAMMS);
+// clang-format on
+#else
+
+#ifndef LMP_PAIR_MEAM_MS_H
+#define LMP_PAIR_MEAM_MS_H
+
+#include "pair_meam.h"
+
+namespace LAMMPS_NS {
+
+class PairMEAMMS : public PairMEAM {
+ public:
+  PairMEAMMS(class LAMMPS *);
+};
+}    // namespace LAMMPS_NS
+#endif
+#endif
diff --git a/unittest/force-styles/tests/atomic-pair-meam_ms.yaml b/unittest/force-styles/tests/atomic-pair-meam_ms.yaml
new file mode 100644
index 0000000000..e479514017
--- /dev/null
+++ b/unittest/force-styles/tests/atomic-pair-meam_ms.yaml
@@ -0,0 +1,94 @@
+---
+lammps_version: 22 Dec 2022
+tags: slow
+date_generated: Thu Jan 26 15:27:03 2023
+epsilon: 2.5e-12
+skip_tests:
+prerequisites: ! |
+  pair meam/ms
+pre_commands: ! |
+  variable newton_pair delete
+  if "$(is_active(package,gpu)) > 0.0" then "variable newton_pair index off" else "variable newton_pair index on"
+post_commands: ! ""
+input_file: in.metal
+pair_style: meam/ms
+pair_coeff: ! |
+  * * library.msmeam H Ga4 HGa.msmeam H Ga4
+extract: ! |
+  scale 2
+natoms: 32
+init_vdwl: 785.6030480758675
+init_coul: 0
+init_stress: ! |2-
+   3.3502530994900699e+03  3.6405858278699407e+03  3.6349804214165547e+03 -3.1609283411508039e+02 -7.9448207656135153e+01 -1.9854140603340727e+02
+init_forces: ! |2
+    1  1.2872255079741514e+01 -7.5031848810810864e-01  4.5969595156096510e+01
+    2 -3.9028679722038632e+01 -1.5647800180326567e+02 -1.6643992152928173e+00
+    3 -6.1521549955194672e+01  2.6970968316419874e+02 -9.6866430262650326e+01
+    4  3.1462579880342336e+01  4.0240291291218455e+01  1.1654869213327775e+01
+    5  1.4859248182951113e+01 -3.4132880749392825e+01  6.7430378007130244e+01
+    6  6.4609571260694096e+00 -3.8973222482916441e+01 -2.8510000379627442e+01
+    7  7.8114612113500250e+00 -1.0421431668544374e+01 -4.2887607385766536e+01
+    8 -4.8934215863351795e+01 -6.3567347969802590e-01  1.1845972792272754e+02
+    9  9.4089549606898402e+01 -7.4342942103394511e+00  2.5331198575951383e+01
+   10  1.5130369934140692e+01 -5.9245630928969938e+01 -6.7469126603400198e+01
+   11 -2.5176547213746847e+01  1.1577205529172168e+02 -2.2897457133540517e+01
+   12  6.2237686199502349e+01  2.0501996047945163e+01 -2.8805091517252826e+01
+   13 -5.9438589221526925e+01  3.0453092653824072e+01 -1.9919245831196157e+01
+   14  6.9128305482543766e+01 -7.7400771634148342e+01  3.3376079908119145e+01
+   15 -4.9671207786831857e+01 -4.9520814527298228e+01  8.4325181097614305e+01
+   16 -1.1782591146017666e+01 -3.2478963020209051e+01  1.5503663677714293e+01
+   17  9.0881787245915220e+00  6.2377477671714963e+01 -4.0411006180232363e+01
+   18 -4.2285082775720454e+01  2.4883979527636967e+01 -4.4858149086530510e+00
+   19 -8.0259798420493979e+01  9.6356660229207137e+01  6.0543230952477984e+01
+   20  8.0924547938759346e+01  7.1034504027236025e+01 -7.1958482512489610e+01
+   21  1.0833434220705425e+02 -1.5973910256481020e+02 -2.5432700070393153e+01
+   22 -2.3754601906353900e+00  5.2216955012971823e+01  4.7112051341131576e+00
+   23 -2.7227169255996543e+01  8.1968603165764222e+01  4.6535834898716878e+01
+   24 -2.9230758067555616e+01  6.5909555829367733e+01 -2.8250697734131258e+01
+   25 -5.1310041582953993e+01 -3.0895272949222822e+01 -5.4271286813003794e+00
+   26  3.9605941911194620e+01 -5.5919050176828883e+01 -1.0209061328106253e+01
+   27  8.2934427989660890e+01  6.1956200199325636e+01  5.0072108788590960e+01
+   28 -7.8572755094413296e+01 -3.9613391730681300e+01 -2.6183413623428891e+00
+   29  6.9475725072041925e+01 -6.0535433603583563e+01 -1.4566536349135829e+01
+   30 -2.4347184151182930e+01 -1.9359391333689970e+02 -2.6718379302915952e+01
+   31  7.7351971629808688e+01 -7.0102650745312999e+01 -5.4615048867524763e+01
+   32 -1.5060591772899014e+02  8.4489763988097266e+01  2.9799482293372058e+01
+run_vdwl: 682.3107192428497
+run_coul: 0
+run_stress: ! |2-
+   3.2247564044913129e+03  3.3749506031067485e+03  3.3223794967215117e+03 -2.8460979167554797e+02 -7.2614457076660575e+00 -3.1510685747732862e+02
+run_forces: ! |2
+    1 -1.2037185973996296e+01 -2.5090364403764944e+01  1.4014184973113366e+01
+    2 -3.7365848425239264e+01 -1.5871199357658887e+02  3.7846333470446991e+00
+    3 -3.2057228694304293e+01  2.5316344962361612e+02 -6.0679585186816752e+01
+    4  2.9086197614116237e+01  4.8267528016068823e+01  4.3387429619749920e+00
+    5 -1.1672554618399744e+01 -2.6840760926124332e+01  4.9694308545223279e+01
+    6  1.1892092913978592e+01 -4.9360840569608243e+01 -2.3083171938147949e+01
+    7  2.1084251901459215e+01 -4.8251731643401072e+00 -3.8474871193885967e+01
+    8 -5.7775944085787714e+01  1.3522956442661442e+01  1.1661345819661486e+02
+    9  7.2926105059437930e+01  4.8686056096860133e+00  2.3817134806042311e+01
+   10  1.7307367990304396e+01 -3.0865570121704572e+01 -1.2314307646704794e+01
+   11 -1.1341297645054201e+01  9.1441145595173211e+01 -2.1806407500802493e+01
+   12  4.0645024127126625e+01  1.2207243511090397e+01 -2.6757649464936929e+01
+   13 -5.2283270287937697e+01  3.4023912643812679e+01 -1.9030352703627774e+01
+   14  8.4403128243303399e+01 -9.3773678297574406e+01  1.6481720093363641e+01
+   15 -4.2790833192154764e+01 -4.3242943642279130e+01  7.1075696811865868e+01
+   16 -1.5041912007490836e+01 -3.3544044565611586e+01  2.4823109532967212e+01
+   17 -9.6413207346836316e-01  4.5826021602656141e+01 -3.9155163702194102e+01
+   18 -2.0337015515785971e+01  7.2815285567550134e+00 -8.2049879725129813e+00
+   19 -6.4105384732081120e+01  1.1564665740933788e+02  2.4163791756721466e+01
+   20  8.5723654185276146e+01  8.3354105531647818e+01 -6.6380939444134356e+01
+   21  7.2614253221132458e+01 -1.0858997173537107e+02 -9.7505297776024449e+00
+   22 -7.0420361713052930e+00  5.3431098224890221e+01  3.3089063930822551e+00
+   23 -2.6591358240682062e+01  5.7408565880721866e+01  2.7437106471305679e+01
+   24 -4.1792038450554799e+01  5.1730557789864775e+01 -4.0814677464080816e+01
+   25 -4.1432062506590214e+01 -2.5839213423062226e+01  4.2240164846210408e+00
+   26  4.7210066329871566e+01 -5.2462761136081880e+01 -7.3222050314410501e+00
+   27  7.1880187551772764e+01  6.4264938765955392e+01  4.3600944370341068e+01
+   28 -8.4540787660053340e+01 -3.5402262816619938e+01 -1.8100280797937039e+01
+   29  6.9538301274653790e+01 -6.3441028093040622e+01 -1.4636386232064458e+01
+   30 -1.0347208112535196e+01 -1.7647584813608077e+02  7.2581082578181517e+00
+   31  5.5139777976761025e+01 -4.2081916983382541e+01 -4.6602437208067727e+01
+   32 -1.0993230999577290e+02  3.4110056387297462e+01  1.8478090262857769e+01
+...