diff --git a/doc/src/Commands_pair.rst b/doc/src/Commands_pair.rst index fd1ef692c5..7f7b2d4b7d 100644 --- a/doc/src/Commands_pair.rst +++ b/doc/src/Commands_pair.rst @@ -200,6 +200,7 @@ OPT. * :doc:`mdpd ` * :doc:`mdpd/rhosum ` * :doc:`meam (k) ` + * :doc:`meam/ms (k) ` * :doc:`meam/spline (o) ` * :doc:`meam/sw/spline ` * :doc:`mesocnt ` diff --git a/doc/src/pair_meam.rst b/doc/src/pair_meam.rst index 6a3d52c4d5..57c40aa6ee 100644 --- a/doc/src/pair_meam.rst +++ b/doc/src/pair_meam.rst @@ -1,17 +1,26 @@ .. index:: pair_style meam .. index:: pair_style meam/kk +.. index:: pair_style meam/ms +.. index:: pair_style meam/ms/kk pair_style meam command ========================= Accelerator Variants: *meam/kk* +pair_style meam/ms command +========================== + +Accelerator Variants: *meam/ms/kk* + Syntax """""" .. code-block:: LAMMPS - pair_style meam + pair_style style + +* style = *meam* or *meam/ms* Examples """""""" @@ -22,6 +31,9 @@ Examples pair_coeff * * ../potentials/library.meam Si ../potentials/si.meam Si pair_coeff * * ../potentials/library.meam Ni Al NULL Ni Al Ni Ni + pair_style meam/ms + pair_coeff * * ../potentials/library.msmeam H Ga ../potentials/HGa.meam H Ga + Description """"""""""" @@ -31,16 +43,23 @@ Description as of November 2010; see description below of the mixture_ref_t parameter -Pair style *meam* computes non-bonded interactions for a variety of materials -using the modified embedded-atom method (MEAM) -:ref:`(Baskes) `. Conceptually, it is an extension to the original -:doc:`EAM method ` which adds angular forces. It is -thus suitable for modeling metals and alloys with fcc, bcc, hcp and -diamond cubic structures, as well as materials with covalent interactions -like silicon and carbon. This *meam* pair style is a translation of the -original Fortran version to C++. It is functionally equivalent but more -efficient and has additional features. The Fortran version of the *meam* -pair style has been removed from LAMMPS after the 12 December 2018 release. +Pair style *meam* computes non-bonded interactions for a variety of +materials using the modified embedded-atom method (MEAM) :ref:`(Baskes) +`. Conceptually, it is an extension to the original :doc:`EAM +method ` which adds angular forces. It is thus suitable for +modeling metals and alloys with fcc, bcc, hcp and diamond cubic +structures, as well as materials with covalent interactions like silicon +and carbon. + +The *meam* pair style is a translation of the original Fortran version +to C++. It is functionally equivalent but more efficient and has +additional features. The Fortran version of the *meam* pair style has +been removed from LAMMPS after the 12 December 2018 release. + +Pair style *meam/ms* uses the multi-state MEAM (MS-MEAM) method +according to :ref:`(Baskes2) `, which is an extension to MEAM. +This pair style is mostly equivalent to *meam* and differs only +where noted in the documentation below. In the MEAM formulation, the total energy E of a system of atoms is given by: @@ -351,6 +370,16 @@ Most published MEAM parameter sets use the default values *attrac* = *repulse* = Setting *repuls* = *attrac* = *delta* corresponds to the form used in several recent published MEAM parameter sets, such as :ref:`(Valone) ` +Then using *meam/ms* pair style the multi-state MEAM (MS-MEAM) method is +activated. This requires 6 extra parameters in the MEAM library file, +resulting in 25 parameters ordered that are ordered like this: + +elt, lat, z, ielement, atwt, alpha, b0, b1, b2, b3, b1m, b2m, b3m, alat, esub, asub, +t0, t1, t2, t3, t1m, t2m, t3m, rozero, ibar + +The 6 extra MS-MEAM parameters are *b1m, b2m, b3m, t1m, t2m, t3m*. +In the LAMMPS ``potentials`` folder, compatible files have an ".msmeam" extension. + ---------- .. include:: accel_styles.rst @@ -393,16 +422,15 @@ This pair style can only be used via the *pair* keyword of the Restrictions """""""""""" -The *meam* style is provided in the MEAM package. It is -only enabled if LAMMPS was built with that package. +The *meam* and *meam/ms* pair styles are provided in the MEAM +package. They are only enabled if LAMMPS was built with that package. See the :doc:`Build package ` page for more info. -The maximum number of elements, that can be read from the MEAM -library file, is determined at compile time. The default is 5. -If you need support for more elements, you have to change the -define for the constant 'maxelt' at the beginning of the file -src/MEAM/meam.h and update/recompile LAMMPS. There is no -limit on the number of atoms types. +The maximum number of elements, that can be read from the MEAM library +file, is determined at compile time. The default is 5. If you need +support for more elements, you have to change the the constant 'maxelt' +at the beginning of the file ``src/MEAM/meam.h`` and update/recompile +LAMMPS. There is no limit on the number of atoms types. Related commands """""""""""""""" @@ -421,6 +449,10 @@ none **(Baskes)** Baskes, Phys Rev B, 46, 2727-2742 (1992). +.. _Baskes2: + +**(Baskes2)** Baskes, Phys Rev B, 75, 094113 (2007). + .. _Gullet: **(Gullet)** Gullet, Wagner, Slepoy, SANDIA Report 2003-8782 (2003). DOI:10.2172/918395 diff --git a/doc/src/pair_style.rst b/doc/src/pair_style.rst index facfadeb9b..b3f7276480 100644 --- a/doc/src/pair_style.rst +++ b/doc/src/pair_style.rst @@ -277,7 +277,8 @@ accelerated styles exist. * :doc:`lubricateU/poly ` - hydrodynamic lubrication forces for Fast Lubrication with polydispersity * :doc:`mdpd ` - mDPD particle interactions * :doc:`mdpd/rhosum ` - mDPD particle interactions for mass density -* :doc:`meam ` - modified embedded atom method (MEAM) in C +* :doc:`meam ` - modified embedded atom method (MEAM) +* :doc:`meam/ms ` - multi-state modified embedded atom method (MS-MEAM) * :doc:`meam/spline ` - splined version of MEAM * :doc:`meam/sw/spline ` - splined version of MEAM with a Stillinger-Weber term * :doc:`mesocnt ` - mesoscopic vdW potential for (carbon) nanotubes diff --git a/examples/meam/msmeam/HGa.meam b/examples/meam/msmeam/HGa.meam new file mode 100644 index 0000000000..9f01501c16 --- /dev/null +++ b/examples/meam/msmeam/HGa.meam @@ -0,0 +1,30 @@ +bkgd_dyn = 1 +emb_lin_neg = 1 +augt1=0 +ialloy=1 +rc = 5.9 +#H +attrac(1,1)=0.460 +repuls(1,1)=0.460 +Cmin(1,1,1)=1.3 # PuMS +Cmax(1,1,1)= 2.80 +nn2(1,1)=1 +#Ga +rho0(2) = 0.6 +attrac(2,2)=0.097 +repuls(2,2)=0.097 +nn2(2,2)=1 +#HGa +attrac(1,2)=0.300 +repuls(1,2)=0.300 +lattce(1,2)=l12 +re(1,2)=3.19 +delta(1,2)=-0.48 +alpha(1,2)=6.6 +Cmin(1,1,2)=2.0 +Cmin(2,1,2)= 2.0 +Cmin(1,2,1)=2.0 +Cmin(2,2,1) = 1.4 +Cmin(1,2,2) = 1.4 +Cmin(1,1,2) = 1.4 +nn2(1,2)=1 diff --git a/examples/meam/msmeam/README.md b/examples/meam/msmeam/README.md new file mode 100644 index 0000000000..dbf569d4b3 --- /dev/null +++ b/examples/meam/msmeam/README.md @@ -0,0 +1,9 @@ +To run Baske's test, do + + lmp -in in.msmeam + +Then + + diff dump.msmeam dump.msmeam.bu + + diff --git a/examples/meam/msmeam/data.msmeam.bu b/examples/meam/msmeam/data.msmeam.bu new file mode 100644 index 0000000000..576a3c50de --- /dev/null +++ b/examples/meam/msmeam/data.msmeam.bu @@ -0,0 +1,25 @@ +LAMMPS data file via write_data, version 16 Feb 2016, timestep = 1 + +3 atoms +2 atom types + +-4.0000000000000000e+00 4.0000000000000000e+00 xlo xhi +-4.0000000000000000e+00 4.0000000000000000e+00 ylo yhi +-4.0000000000000000e+00 4.0000000000000000e+00 zlo zhi + +Masses + +1 1.0079 +2 69.723 + +Atoms # atomic + +1 1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0 +2 2 2.2000000000000002e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0 +3 2 2.9999999999999999e-01 2.2999999999999998e+00 0.0000000000000000e+00 0 0 0 + +Velocities + +1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 +2 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 +3 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 diff --git a/examples/meam/msmeam/dump.msmeam.bu b/examples/meam/msmeam/dump.msmeam.bu new file mode 100644 index 0000000000..039f630073 --- /dev/null +++ b/examples/meam/msmeam/dump.msmeam.bu @@ -0,0 +1,24 @@ +ITEM: TIMESTEP +0 +ITEM: NUMBER OF ATOMS +3 +ITEM: BOX BOUNDS pp pp pp +-4 4 +-4 4 +-4 4 +ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 +2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 +3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 +ITEM: TIMESTEP +1 +ITEM: NUMBER OF ATOMS +3 +ITEM: BOX BOUNDS pp pp pp +-4 4 +-4 4 +-4 4 +ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 +2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 +3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 diff --git a/examples/meam/msmeam/in.msmeam b/examples/meam/msmeam/in.msmeam new file mode 100644 index 0000000000..82ffb89a13 --- /dev/null +++ b/examples/meam/msmeam/in.msmeam @@ -0,0 +1,31 @@ +echo both +log log.msmeam +# Test of MEAM potential for HGa + +# ------------------------ INITIALIZATION ---------------------------- +units metal +dimension 3 +boundary p p p +atom_style atomic +variable latparam equal 4.646 +variable ncell equal 3 + +# ----------------------- ATOM DEFINITION ---------------------------- +region box block -4 4 -4 4 -4 4 +create_box 2 box + +# + +include potential.mod +create_atoms 1 single 0 0 0 units box +create_atoms 2 single 2.2 0 0 units box +create_atoms 2 single 0.3 2.3 0 units box +# ---------- Define Settings --------------------- +variable teng equal "c_eatoms" +compute pot_energy all pe/atom +compute stress all stress/atom NULL +dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +run 1 +write_data data.msmeam + +print "All done!" diff --git a/examples/meam/msmeam/library.msmeam b/examples/meam/msmeam/library.msmeam new file mode 100644 index 0000000000..9937eaee08 --- /dev/null +++ b/examples/meam/msmeam/library.msmeam @@ -0,0 +1,14 @@ +# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010 +# ms-meam data format May 2010 +# elt lat z ielement atwt +# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub +# - t0 t1 t2 t3 t1m t2m t3m rozero ibar +# NOTE: leading character cannot be a space + +'H' 'dim' 1.0 1 1.0079 +2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50 +1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0 + +'Ga4' 'fcc' 12.0 31 69.723 +4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97 +1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0 diff --git a/examples/meam/msmeam/log.msmeam.bu b/examples/meam/msmeam/log.msmeam.bu new file mode 100644 index 0000000000..8eac453c1e --- /dev/null +++ b/examples/meam/msmeam/log.msmeam.bu @@ -0,0 +1,107 @@ +# Test of MEAM potential for HGa + +# ------------------------ INITIALIZATION ---------------------------- +units metal +dimension 3 +boundary p p p +atom_style atomic +variable latparam equal 4.646 +variable ncell equal 3 + +# ----------------------- ATOM DEFINITION ---------------------------- +region box block -4 4 -4 4 -4 4 +create_box 2 box +Created orthogonal box = (-4 -4 -4) to (4 4 4) + 1 by 1 by 1 MPI processor grid + +# + +include potential.mod +# NOTE: This script can be modified for different pair styles +# See in.elastic for more info. + +variable Pu string H +print "potential chosen ${Pu}" +potential chosen H +# Choose potential +pair_style MSmeam +print "we just executed" +we just executed + +pair_coeff * * library.MSmeam ${Pu} Ga4 HGaMS.meam ${Pu} Ga4 +pair_coeff * * library.MSmeam H Ga4 HGaMS.meam ${Pu} Ga4 +pair_coeff * * library.MSmeam H Ga4 HGaMS.meam H Ga4 +Reading potential file library.MSmeam with DATE: 2018-09-22 +# Setup neighbor style +neighbor 1.0 nsq +neigh_modify once no every 1 delay 0 check yes + +# Setup minimization style +variable dmax equal 1.0e-2 +min_style cg +min_modify dmax ${dmax} line quadratic +min_modify dmax 0.01 line quadratic +compute eng all pe/atom +compute eatoms all reduce sum c_eng + +# Setup output +thermo 100 +thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms +thermo_modify norm yes +create_atoms 1 single 0 0 0 units box +Created 1 atoms +create_atoms 2 single 2.2 0 0 units box +Created 1 atoms +create_atoms 2 single 0.3 2.3 0 units box +Created 1 atoms +# ---------- Define Settings --------------------- +variable teng equal "c_eatoms" +compute pot_energy all pe/atom +compute stress all stress/atom NULL +dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +run 1 +WARNING: No fixes defined, atoms won't move (../verlet.cpp:55) +Neighbor list info ... + 2 neighbor list requests + update every 1 steps, delay 0 steps, check yes + max neighbors/atom: 2000, page size: 100000 + master list distance cutoff = 6.9 + ghost atom cutoff = 6.9 +Memory usage per processor = 12.9295 Mbytes +Step Temp TotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume eatoms + 0 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079 + 1 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079 +Loop time of 0.000172138 on 1 procs for 1 steps with 3 atoms + +Performance: 501.922 ns/day, 0.048 hours/ns, 5809.285 timesteps/s +81.3% CPU use with 1 MPI tasks x no OpenMP threads + +MPI task timing breakdown: +Section | min time | avg time | max time |%varavg| %total +--------------------------------------------------------------- +Pair | 6.6996e-05 | 6.6996e-05 | 6.6996e-05 | 0.0 | 38.92 +Neigh | 0 | 0 | 0 | 0.0 | 0.00 +Comm | 1.9073e-06 | 1.9073e-06 | 1.9073e-06 | 0.0 | 1.11 +Output | 9.7036e-05 | 9.7036e-05 | 9.7036e-05 | 0.0 | 56.37 +Modify | 0 | 0 | 0 | 0.0 | 0.00 +Other | | 6.199e-06 | | | 3.60 + +Nlocal: 3 ave 3 max 3 min +Histogram: 1 0 0 0 0 0 0 0 0 0 +Nghost: 78 ave 78 max 78 min +Histogram: 1 0 0 0 0 0 0 0 0 0 +Neighs: 7 ave 7 max 7 min +Histogram: 1 0 0 0 0 0 0 0 0 0 +FullNghs: 14 ave 14 max 14 min +Histogram: 1 0 0 0 0 0 0 0 0 0 + +Total # of neighbors = 14 +Ave neighs/atom = 4.66667 +Neighbor list builds = 0 +Dangerous builds = 0 +write_data data.msmeam + +print "All done!" +All done! +Total wall time: 0:00:00 + diff --git a/examples/meam/msmeam/msmeam.dump.bu b/examples/meam/msmeam/msmeam.dump.bu new file mode 100644 index 0000000000..039f630073 --- /dev/null +++ b/examples/meam/msmeam/msmeam.dump.bu @@ -0,0 +1,24 @@ +ITEM: TIMESTEP +0 +ITEM: NUMBER OF ATOMS +3 +ITEM: BOX BOUNDS pp pp pp +-4 4 +-4 4 +-4 4 +ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 +2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 +3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 +ITEM: TIMESTEP +1 +ITEM: NUMBER OF ATOMS +3 +ITEM: BOX BOUNDS pp pp pp +-4 4 +-4 4 +-4 4 +ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6] +1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0 +2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0 +3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0 diff --git a/examples/meam/msmeam/potential.mod b/examples/meam/msmeam/potential.mod new file mode 100644 index 0000000000..760cc93503 --- /dev/null +++ b/examples/meam/msmeam/potential.mod @@ -0,0 +1,25 @@ +# NOTE: This script can be modified for different pair styles +# See in.elastic for more info. + +variable Pu string H +print "potential chosen ${Pu}" +# Choose potential +pair_style meam/ms +print "we just executed" + +pair_coeff * * library.msmeam ${Pu} Ga4 HGa.meam ${Pu} Ga4 +# Setup neighbor style +neighbor 1.0 bin +neigh_modify once no every 1 delay 0 check yes + +# Setup minimization style +variable dmax equal 1.0e-2 +min_style cg +min_modify dmax ${dmax} line quadratic +compute eng all pe/atom +compute eatoms all reduce sum c_eng + +# Setup output +thermo 100 +thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms +thermo_modify norm yes diff --git a/lib/kokkos/kokkos_5538.diff b/lib/kokkos/kokkos_5538.diff deleted file mode 100644 index 6bf2ccf6a4..0000000000 --- a/lib/kokkos/kokkos_5538.diff +++ /dev/null @@ -1,199 +0,0 @@ -diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos -index 22af411f32..530510a0d1 100644 ---- a/lib/kokkos/Makefile.kokkos -+++ b/lib/kokkos/Makefile.kokkos -@@ -20,7 +20,7 @@ KOKKOS_DEVICES ?= "OpenMP" - #KOKKOS_DEVICES ?= "Threads" - # Options: - # Intel: KNC,KNL,SNB,HSW,BDW,SKL,SKX,ICL,ICX,SPR --# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86 -+# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Hopper90 - # ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX - # IBM: BGQ,Power7,Power8,Power9 - # AMD-GPUS: Vega900,Vega906,Vega908,Vega90A -@@ -401,6 +401,7 @@ KOKKOS_INTERNAL_USE_ARCH_VOLTA72 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volt - KOKKOS_INTERNAL_USE_ARCH_TURING75 := $(call kokkos_has_string,$(KOKKOS_ARCH),Turing75) - KOKKOS_INTERNAL_USE_ARCH_AMPERE80 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere80) - KOKKOS_INTERNAL_USE_ARCH_AMPERE86 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere86) -+KOKKOS_INTERNAL_USE_ARCH_HOPPER90 := $(call kokkos_has_string,$(KOKKOS_ARCH),Hopper90) - KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \ - + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \ - + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \ -@@ -414,7 +415,8 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLE - + $(KOKKOS_INTERNAL_USE_ARCH_VOLTA72) \ - + $(KOKKOS_INTERNAL_USE_ARCH_TURING75) \ - + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80) \ -- + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86)) -+ + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86) \ -+ + $(KOKKOS_INTERNAL_USE_ARCH_HOPPER90)) - - #SEK: This seems like a bug to me - ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0) -@@ -1194,6 +1196,11 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA_ARCH), 1) - tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE86") - KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_86 - endif -+ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HOPPER90), 1) -+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER") -+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER90") -+ KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_90 -+ endif - - ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0) - KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG) -diff --git a/lib/kokkos/cmake/KokkosCore_config.h.in b/lib/kokkos/cmake/KokkosCore_config.h.in -index 88ddc48378..b83ced9243 100644 ---- a/lib/kokkos/cmake/KokkosCore_config.h.in -+++ b/lib/kokkos/cmake/KokkosCore_config.h.in -@@ -102,6 +102,7 @@ - #cmakedefine KOKKOS_ARCH_AMPERE - #cmakedefine KOKKOS_ARCH_AMPERE80 - #cmakedefine KOKKOS_ARCH_AMPERE86 -+#cmakedefine KOKKOS_ARCH_HOPPER90 - #cmakedefine KOKKOS_ARCH_AMD_ZEN - #cmakedefine KOKKOS_ARCH_AMD_ZEN2 - #cmakedefine KOKKOS_ARCH_AMD_ZEN3 -diff --git a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc -index f56cef1651..2585a6a64c 100644 ---- a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc -+++ b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc -@@ -74,6 +74,7 @@ int main() { - case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break; - case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break; - case 86: std::cout << "Set -DKokkos_ARCH_AMPERE86=ON ." << std::endl; break; -+ case 90: std::cout << "Set -DKokkos_ARCH_HOPPER90=ON ." << std::endl; break; - default: - std::cout << "Compute capability " << compute_capability - << " is not supported" << std::endl; -diff --git a/lib/kokkos/cmake/kokkos_arch.cmake b/lib/kokkos/cmake/kokkos_arch.cmake -index ef16aad047..c1d76cceeb 100644 ---- a/lib/kokkos/cmake/kokkos_arch.cmake -+++ b/lib/kokkos/cmake/kokkos_arch.cmake -@@ -86,6 +86,7 @@ KOKKOS_ARCH_OPTION(VOLTA72 GPU "NVIDIA Volta generation CC 7.2" "KOKK - KOKKOS_ARCH_OPTION(TURING75 GPU "NVIDIA Turing generation CC 7.5" "KOKKOS_SHOW_CUDA_ARCHS") - KOKKOS_ARCH_OPTION(AMPERE80 GPU "NVIDIA Ampere generation CC 8.0" "KOKKOS_SHOW_CUDA_ARCHS") - KOKKOS_ARCH_OPTION(AMPERE86 GPU "NVIDIA Ampere generation CC 8.6" "KOKKOS_SHOW_CUDA_ARCHS") -+KOKKOS_ARCH_OPTION(HOPPER90 GPU "NVIDIA Hopper generation CC 9.0" "KOKKOS_SHOW_CUDA_ARCHS") - - IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET OR Kokkos_ENABLE_UNSUPPORTED_ARCHS) - SET(KOKKOS_SHOW_HIP_ARCHS ON) -@@ -544,6 +545,7 @@ CHECK_CUDA_ARCH(VOLTA72 sm_72) - CHECK_CUDA_ARCH(TURING75 sm_75) - CHECK_CUDA_ARCH(AMPERE80 sm_80) - CHECK_CUDA_ARCH(AMPERE86 sm_86) -+CHECK_CUDA_ARCH(HOPPER90 sm_90) - - SET(AMDGPU_ARCH_ALREADY_SPECIFIED "") - FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG) -@@ -806,6 +808,10 @@ IF (KOKKOS_ARCH_AMPERE80 OR KOKKOS_ARCH_AMPERE86) - SET(KOKKOS_ARCH_AMPERE ON) - ENDIF() - -+IF (KOKKOS_ARCH_HOPPER90) -+ SET(KOKKOS_ARCH_HOPPER ON) -+ENDIF() -+ - #Regardless of version, make sure we define the general architecture name - IF (KOKKOS_ARCH_VEGA900 OR KOKKOS_ARCH_VEGA906 OR KOKKOS_ARCH_VEGA908 OR KOKKOS_ARCH_VEGA90A) - SET(KOKKOS_ARCH_VEGA ON) -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -index 56f9117844..fcd4773dbc 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -@@ -232,7 +232,8 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) { - case 61: return 96; - case 70: - case 80: -- case 86: return 8; -+ case 86: -+ case 90: return 8; - case 75: return 32; - default: - Kokkos::Impl::throw_runtime_exception( -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp -index 40a263561f..8c40ebd60d 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp -@@ -418,7 +418,7 @@ KOKKOS_INLINE_FUNCTION - #endif // CUDA_VERSION >= 11000 && CUDA_VERSION < 11010 - - #if CUDA_VERSION >= 11010 && \ -- ((defined(KOKKOS_ARCH_AMPERE80) || defined(KOKKOS_ARCH_AMPERE86))) -+ ((defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER))) - KOKKOS_INLINE_FUNCTION - bhalf_t cast_to_bhalf(bhalf_t val) { return val; } - KOKKOS_INLINE_FUNCTION -diff --git a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp -index f9451ecfe6..2ce1efb98c 100644 ---- a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp -+++ b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp -@@ -51,7 +51,7 @@ namespace Kokkos::Experimental::Impl { - - struct OpenACC_Traits { - #if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -- defined(KOKKOS_ARCH_AMPERE) -+ defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER) - static constexpr acc_device_t dev_type = acc_device_nvidia; - static constexpr bool may_fallback_to_host = false; - #else -diff --git a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp -index a9bc085912..27ee1d4232 100644 ---- a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp -+++ b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp -@@ -115,8 +115,9 @@ void OpenMPTargetInternal::impl_initialize() { - - // FIXME_OPENMPTARGET: Only fix the number of teams for NVIDIA architectures - // from Pascal and upwards. --#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) -+#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \ -+ defined(KOKKOS_ARCH_HOPPER) - #if defined(KOKKOS_COMPILER_CLANG) && (KOKKOS_COMPILER_CLANG >= 1300) - omp_set_num_teams(512); - #endif -diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp -index 840db4327c..7e5addbc5b 100644 ---- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp -+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp -@@ -155,7 +155,7 @@ void SYCL::impl_initialize(InitializationSettings const& settings) { - #if !defined(KOKKOS_ARCH_INTEL_GPU) && !defined(KOKKOS_ARCH_KEPLER) && \ - !defined(KOKKOS_ARCH_MAXWELL) && !defined(KOKKOS_ARCH_PASCAL) && \ - !defined(KOKKOS_ARCH_VOLTA) && !defined(KOKKOS_ARCH_TURING75) && \ -- !defined(KOKKOS_ARCH_AMPERE) -+ !defined(KOKKOS_ARCH_AMPERE) && !defined(KOKKOS_ARCH_HOPPER) - if (!settings.has_device_id() && gpu_devices.empty()) { - Impl::SYCLInternal::singleton().initialize(sycl::device()); - return; -diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp -index 5ac7d8af30..ba101f699e 100644 ---- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp -+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp -@@ -335,9 +335,10 @@ class TeamPolicyInternal - return std::min({ - int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize), - // FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs. --#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \ -- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) -+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \ -+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \ -+ defined(KOKKOS_ARCH_HOPPER) - 256, - #endif - max_threads_for_memory -@@ -367,9 +368,10 @@ class TeamPolicyInternal - return std::min({ - int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize), - // FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs. --#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \ -- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) -+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \ -+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \ -+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \ -+ defined(KOKKOS_ARCH_HOPPER) - 256, - #endif - max_threads_for_memory diff --git a/lib/kokkos/kokkos_5706.diff b/lib/kokkos/kokkos_5706.diff deleted file mode 100644 index 2bfbb35b06..0000000000 --- a/lib/kokkos/kokkos_5706.diff +++ /dev/null @@ -1,523 +0,0 @@ -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -index fcd4773dbc..30b6958a67 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -@@ -207,7 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance, - LaunchBounds{}); - } - --// Assuming cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1) - // NOTE these number can be obtained several ways: - // * One option is to download the CUDA Occupancy Calculator spreadsheet, select - // "Compute Capability" first and check what is the smallest "Shared Memory -@@ -242,6 +241,7 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) { - return 0; - }() * 1024; - } -+ - } // namespace Impl - } // namespace Kokkos - -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp -index 5811498e01..e22eb3b842 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp -@@ -569,12 +569,6 @@ Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default - } - #endif - --#ifdef KOKKOS_ENABLE_PRE_CUDA_10_DEPRECATION_API -- cudaThreadSetCacheConfig(cudaFuncCachePreferShared); --#else -- cudaDeviceSetCacheConfig(cudaFuncCachePreferShared); --#endif -- - // Init the array for used for arbitrarily sized atomics - if (stream == nullptr) Impl::initialize_host_cuda_lock_arrays(); - -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -index b7a80ad84f..5c4c3a7d39 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -@@ -93,10 +93,6 @@ namespace Impl { - // __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor) - // function qualifier which could be used to improve performance. - //---------------------------------------------------------------------------- --// Maximize L1 cache and minimize shared memory: --// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 ); --// For 2.0 capability: 48 KB L1 and 16 KB shared --//---------------------------------------------------------------------------- - - template - __global__ static void cuda_parallel_launch_constant_memory() { -@@ -158,63 +154,105 @@ inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) { - } - } - --// This function needs to be template on DriverType and LaunchBounds -+// These functions needs to be template on DriverType and LaunchBounds - // so that the static bool is unique for each type combo - // KernelFuncPtr does not necessarily contain that type information. -+ - template --inline void configure_shmem_preference(KernelFuncPtr const& func, -- bool prefer_shmem) { -+const cudaFuncAttributes& get_cuda_kernel_func_attributes( -+ const KernelFuncPtr& func) { -+ // Only call cudaFuncGetAttributes once for each unique kernel -+ // by leveraging static variable initialization rules -+ auto wrap_get_attributes = [&]() -> cudaFuncAttributes { -+ cudaFuncAttributes attr; -+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncGetAttributes(&attr, func)); -+ return attr; -+ }; -+ static cudaFuncAttributes func_attr = wrap_get_attributes(); -+ return func_attr; -+} -+ -+template -+inline void configure_shmem_preference(const KernelFuncPtr& func, -+ const cudaDeviceProp& device_props, -+ const size_t block_size, int& shmem, -+ const size_t occupancy) { - #ifndef KOKKOS_ARCH_KEPLER -- // On Kepler the L1 has no benefit since it doesn't cache reads -+ -+ const auto& func_attr = -+ get_cuda_kernel_func_attributes(func); -+ -+ // Compute limits for number of blocks due to registers/SM -+ const size_t regs_per_sm = device_props.regsPerMultiprocessor; -+ const size_t regs_per_thread = func_attr.numRegs; -+ // The granularity of register allocation is chunks of 256 registers per warp -+ // -> 8 registers per thread -+ const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8); -+ const size_t max_blocks_regs = -+ regs_per_sm / (allocated_regs_per_thread * block_size); -+ -+ // Compute how many threads per sm we actually want -+ const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor; -+ // only allocate multiples of warp size -+ const size_t num_threads_desired = -+ ((max_threads_per_sm * occupancy / 100 + 31) / 32) * 32; -+ // Get close to the desired occupancy, -+ // don't undershoot by much but also don't allocate a whole new block just -+ // because one is a few threads over otherwise. -+ size_t num_blocks_desired = -+ (num_threads_desired + block_size * 0.8) / block_size; -+ num_blocks_desired = ::std::min(max_blocks_regs, num_blocks_desired); -+ if (num_blocks_desired == 0) num_blocks_desired = 1; -+ -+ // Calculate how much shared memory we need per block -+ size_t shmem_per_block = shmem + func_attr.sharedSizeBytes; -+ -+ // The minimum shared memory allocation we can have in total per SM is 8kB. -+ // If we want to lower occupancy we have to make sure we request at least that -+ // much in aggregate over all blocks, so that shared memory actually becomes a -+ // limiting factor for occupancy -+ constexpr size_t min_shmem_size_per_sm = 8192; -+ if ((occupancy < 100) && -+ (shmem_per_block * num_blocks_desired < min_shmem_size_per_sm)) { -+ shmem_per_block = min_shmem_size_per_sm / num_blocks_desired; -+ // Need to set the caller's shmem variable so that the -+ // kernel launch uses the correct dynamic shared memory request -+ shmem = shmem_per_block - func_attr.sharedSizeBytes; -+ } -+ -+ // Compute the carveout fraction we need based on occupancy -+ // Use multiples of 8kB -+ const size_t max_shmem_per_sm = device_props.sharedMemPerMultiprocessor; -+ size_t carveout = shmem_per_block == 0 -+ ? 0 -+ : 100 * -+ (((num_blocks_desired * shmem_per_block + -+ min_shmem_size_per_sm - 1) / -+ min_shmem_size_per_sm) * -+ min_shmem_size_per_sm) / -+ max_shmem_per_sm; -+ if (carveout > 100) carveout = 100; -+ -+ // Set the carveout, but only call it once per kernel or when it changes - auto set_cache_config = [&] { -- KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetCacheConfig( -- func, -- (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1))); -- return prefer_shmem; -+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetAttribute( -+ func, cudaFuncAttributePreferredSharedMemoryCarveout, carveout)); -+ return carveout; - }; -- static bool cache_config_preference_cached = set_cache_config(); -- if (cache_config_preference_cached != prefer_shmem) { -+ // Store the value in a static variable so we only reset if needed -+ static size_t cache_config_preference_cached = set_cache_config(); -+ if (cache_config_preference_cached != carveout) { - cache_config_preference_cached = set_cache_config(); - } - #else - // Use the parameters so we don't get a warning - (void)func; -- (void)prefer_shmem; -+ (void)device_props; -+ (void)block_size; -+ (void)occupancy; - #endif - } - --template --std::enable_if_t --modify_launch_configuration_if_desired_occupancy_is_specified( -- Policy const& policy, cudaDeviceProp const& properties, -- cudaFuncAttributes const& attributes, dim3 const& block, int& shmem, -- bool& prefer_shmem) { -- int const block_size = block.x * block.y * block.z; -- int const desired_occupancy = policy.impl_get_desired_occupancy().value(); -- -- size_t const shmem_per_sm_prefer_l1 = get_shmem_per_sm_prefer_l1(properties); -- size_t const static_shmem = attributes.sharedSizeBytes; -- -- // round to nearest integer and avoid division by zero -- int active_blocks = std::max( -- 1, static_cast(std::round( -- static_cast(properties.maxThreadsPerMultiProcessor) / -- block_size * desired_occupancy / 100))); -- int const dynamic_shmem = -- shmem_per_sm_prefer_l1 / active_blocks - static_shmem; -- -- if (dynamic_shmem > shmem) { -- shmem = dynamic_shmem; -- prefer_shmem = false; -- } --} -- --template --std::enable_if_t --modify_launch_configuration_if_desired_occupancy_is_specified( -- Policy const&, cudaDeviceProp const&, cudaFuncAttributes const&, -- dim3 const& /*block*/, int& /*shmem*/, bool& /*prefer_shmem*/) {} -- - // end Some helper functions for launch code readability }}}1 - //============================================================================== - -@@ -348,7 +386,7 @@ struct CudaParallelLaunchKernelInvoker< - #ifdef KOKKOS_CUDA_ENABLE_GRAPHS - inline static void create_parallel_launch_graph_node( - DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem, -- CudaInternal const* cuda_instance, bool prefer_shmem) { -+ CudaInternal const* cuda_instance) { - //---------------------------------------- - auto const& graph = Impl::get_cuda_graph_from_kernel(driver); - KOKKOS_EXPECTS(bool(graph)); -@@ -358,8 +396,15 @@ struct CudaParallelLaunchKernelInvoker< - - if (!Impl::is_empty_launch(grid, block)) { - Impl::check_shmem_request(cuda_instance, shmem); -- Impl::configure_shmem_preference( -- base_t::get_kernel_func(), prefer_shmem); -+ if (DriverType::Policy:: -+ experimental_contains_desired_occupancy) { -+ int desired_occupancy = -+ driver.get_policy().impl_get_desired_occupancy().value(); -+ size_t block_size = block.x * block.y * block.z; -+ Impl::configure_shmem_preference( -+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -+ shmem, desired_occupancy); -+ } - - void const* args[] = {&driver}; - -@@ -442,7 +487,7 @@ struct CudaParallelLaunchKernelInvoker< - #ifdef KOKKOS_CUDA_ENABLE_GRAPHS - inline static void create_parallel_launch_graph_node( - DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem, -- CudaInternal const* cuda_instance, bool prefer_shmem) { -+ CudaInternal const* cuda_instance) { - //---------------------------------------- - auto const& graph = Impl::get_cuda_graph_from_kernel(driver); - KOKKOS_EXPECTS(bool(graph)); -@@ -452,8 +497,15 @@ struct CudaParallelLaunchKernelInvoker< - - if (!Impl::is_empty_launch(grid, block)) { - Impl::check_shmem_request(cuda_instance, shmem); -- Impl::configure_shmem_preference( -- base_t::get_kernel_func(), prefer_shmem); -+ if constexpr (DriverType::Policy:: -+ experimental_contains_desired_occupancy) { -+ int desired_occupancy = -+ driver.get_policy().impl_get_desired_occupancy().value(); -+ size_t block_size = block.x * block.y * block.z; -+ Impl::configure_shmem_preference( -+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -+ shmem, desired_occupancy); -+ } - - auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver); - -@@ -566,7 +618,7 @@ struct CudaParallelLaunchKernelInvoker< - #ifdef KOKKOS_CUDA_ENABLE_GRAPHS - inline static void create_parallel_launch_graph_node( - DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem, -- CudaInternal const* cuda_instance, bool prefer_shmem) { -+ CudaInternal const* cuda_instance) { - // Just use global memory; coordinating through events to share constant - // memory with the non-graph interface is not really reasonable since - // events don't work with Graphs directly, and this would anyway require -@@ -580,7 +632,7 @@ struct CudaParallelLaunchKernelInvoker< - DriverType, LaunchBounds, - Experimental::CudaLaunchMechanism::GlobalMemory>; - global_launch_impl_t::create_parallel_launch_graph_node( -- driver, grid, block, shmem, cuda_instance, prefer_shmem); -+ driver, grid, block, shmem, cuda_instance); - } - #endif - }; -@@ -613,8 +665,7 @@ struct CudaParallelLaunchImpl< - - inline static void launch_kernel(const DriverType& driver, const dim3& grid, - const dim3& block, int shmem, -- const CudaInternal* cuda_instance, -- bool prefer_shmem) { -+ const CudaInternal* cuda_instance) { - if (!Impl::is_empty_launch(grid, block)) { - // Prevent multiple threads to simultaneously set the cache configuration - // preference and launch the same kernel -@@ -623,18 +674,17 @@ struct CudaParallelLaunchImpl< - - Impl::check_shmem_request(cuda_instance, shmem); - -- // If a desired occupancy is specified, we compute how much shared memory -- // to ask for to achieve that occupancy, assuming that the cache -- // configuration is `cudaFuncCachePreferL1`. If the amount of dynamic -- // shared memory computed is actually smaller than `shmem` we overwrite -- // `shmem` and set `prefer_shmem` to `false`. -- modify_launch_configuration_if_desired_occupancy_is_specified( -- driver.get_policy(), cuda_instance->m_deviceProp, -- get_cuda_func_attributes(), block, shmem, prefer_shmem); -- -- Impl::configure_shmem_preference< -- DriverType, Kokkos::LaunchBounds>( -- base_t::get_kernel_func(), prefer_shmem); -+ if (DriverType::Policy:: -+ experimental_contains_desired_occupancy) { -+ int desired_occupancy = -+ driver.get_policy().impl_get_desired_occupancy().value(); -+ size_t block_size = block.x * block.y * block.z; -+ Impl::configure_shmem_preference< -+ DriverType, -+ Kokkos::LaunchBounds>( -+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -+ shmem, desired_occupancy); -+ } - - KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE(); - -@@ -650,18 +700,9 @@ struct CudaParallelLaunchImpl< - } - - static cudaFuncAttributes get_cuda_func_attributes() { -- // Race condition inside of cudaFuncGetAttributes if the same address is -- // given requires using a local variable as input instead of a static Rely -- // on static variable initialization to make sure only one thread executes -- // the code and the result is visible. -- auto wrap_get_attributes = []() -> cudaFuncAttributes { -- cudaFuncAttributes attr_tmp; -- KOKKOS_IMPL_CUDA_SAFE_CALL( -- cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())); -- return attr_tmp; -- }; -- static cudaFuncAttributes attr = wrap_get_attributes(); -- return attr; -+ return get_cuda_kernel_func_attributes< -+ DriverType, Kokkos::LaunchBounds>( -+ base_t::get_kernel_func()); - } - }; - -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -index e586bb4cc6..0e348c092a 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -@@ -121,8 +121,7 @@ class ParallelFor, Kokkos::Cuda> { - maxblocks[1]), - 1); - CudaParallelLaunch( -- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance()); - } else if (RP::rank == 3) { - const dim3 block(m_rp.m_tile[0], m_rp.m_tile[1], m_rp.m_tile[2]); - KOKKOS_ASSERT(block.x > 0); -@@ -139,8 +138,7 @@ class ParallelFor, Kokkos::Cuda> { - (m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1) / block.z, - maxblocks[2])); - CudaParallelLaunch( -- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance()); - } else if (RP::rank == 4) { - // id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to - // threadIdx.z -@@ -158,8 +156,7 @@ class ParallelFor, Kokkos::Cuda> { - (m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1) / block.z, - maxblocks[2])); - CudaParallelLaunch( -- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance()); - } else if (RP::rank == 5) { - // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to - // threadIdx.z -@@ -175,8 +172,7 @@ class ParallelFor, Kokkos::Cuda> { - (m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1) / block.z, - maxblocks[2])); - CudaParallelLaunch( -- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance()); - } else if (RP::rank == 6) { - // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to - // threadIdx.z -@@ -191,8 +187,7 @@ class ParallelFor, Kokkos::Cuda> { - std::min(m_rp.m_tile_end[4] * m_rp.m_tile_end[5], - maxblocks[2])); - CudaParallelLaunch( -- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance()); - } else { - Kokkos::abort("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n"); - } -@@ -405,8 +400,8 @@ class ParallelReduce, ReducerType, - - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - - if (!m_result_ptr_device_accessible) { - if (m_result_ptr) { -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp -index ac160f8fe2..d1031751c2 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp -@@ -135,8 +135,7 @@ class ParallelFor, Kokkos::Cuda> { - #endif - - CudaParallelLaunch( -- *this, grid, block, 0, m_policy.space().impl_internal_space_instance(), -- false); -+ *this, grid, block, 0, m_policy.space().impl_internal_space_instance()); - } - - ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy) -@@ -375,8 +374,8 @@ class ParallelReduce, ReducerType, - - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - - if (!m_result_ptr_device_accessible) { - if (m_result_ptr) { -@@ -726,16 +725,16 @@ class ParallelScan, Kokkos::Cuda> { - m_final = false; - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION - } - #endif - m_final = true; - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - } - } - -@@ -1038,16 +1037,16 @@ class ParallelScanWithTotal, - m_final = false; - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION - } - #endif - m_final = true; - CudaParallelLaunch( - *this, grid, block, shmem, -- m_policy.space().impl_internal_space_instance(), -- false); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - - const int size = Analysis::value_size(m_functor); - #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp -index cdd16085b3..ea9430b812 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp -@@ -552,8 +552,8 @@ class ParallelFor, - - CudaParallelLaunch( - *this, grid, block, shmem_size_total, -- m_policy.space().impl_internal_space_instance(), -- true); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - } - - ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy) -@@ -878,8 +878,8 @@ class ParallelReduce, - - CudaParallelLaunch( - *this, grid, block, shmem_size_total, -- m_policy.space().impl_internal_space_instance(), -- true); // copy to device and execute -+ m_policy.space() -+ .impl_internal_space_instance()); // copy to device and execute - - if (!m_result_ptr_device_accessible) { - m_policy.space().fence( -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp -index 34d4bef9fd..178012431c 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp -@@ -428,11 +428,6 @@ struct CudaReductionsFunctor { - // __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor) - // function qualifier which could be used to improve performance. - //---------------------------------------------------------------------------- --// Maximize shared memory and minimize L1 cache: --// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared ); --// For 2.0 capability: 48 KB shared and 16 KB L1 --//---------------------------------------------------------------------------- --//---------------------------------------------------------------------------- - /* - * Algorithmic constraints: - * (a) blockDim.y <= 1024 -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp -index fb3a6b138f..a12378a891 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp -@@ -100,8 +100,7 @@ class ParallelFor, - const int shared = 0; - - Kokkos::Impl::CudaParallelLaunch( -- *this, grid, block, shared, Cuda().impl_internal_space_instance(), -- false); -+ *this, grid, block, shared, Cuda().impl_internal_space_instance()); - } - - inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy) diff --git a/lib/kokkos/kokkos_5731.diff b/lib/kokkos/kokkos_5731.diff deleted file mode 100644 index e95f4a1546..0000000000 --- a/lib/kokkos/kokkos_5731.diff +++ /dev/null @@ -1,46 +0,0 @@ -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -index 30b6958a67..b94f053272 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -@@ -207,41 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance, - LaunchBounds{}); - } - --// NOTE these number can be obtained several ways: --// * One option is to download the CUDA Occupancy Calculator spreadsheet, select --// "Compute Capability" first and check what is the smallest "Shared Memory --// Size Config" that is available. The "Shared Memory Per Multiprocessor" in --// bytes is then to be found below in the summary. --// * Another option would be to look for the information in the "Tuning --// Guide(s)" of the CUDA Toolkit Documentation for each GPU architecture, in --// the "Shared Memory" section (more tedious) --inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) { -- int const compute_capability = properties.major * 10 + properties.minor; -- return [compute_capability]() { -- switch (compute_capability) { -- case 30: -- case 32: -- case 35: return 16; -- case 37: return 80; -- case 50: -- case 53: -- case 60: -- case 62: return 64; -- case 52: -- case 61: return 96; -- case 70: -- case 80: -- case 86: -- case 90: return 8; -- case 75: return 32; -- default: -- Kokkos::Impl::throw_runtime_exception( -- "Unknown device in cuda block size deduction"); -- } -- return 0; -- }() * 1024; --} -- - } // namespace Impl - } // namespace Kokkos - diff --git a/lib/kokkos/kokkos_5739.diff b/lib/kokkos/kokkos_5739.diff deleted file mode 100644 index fe7a1ff551..0000000000 --- a/lib/kokkos/kokkos_5739.diff +++ /dev/null @@ -1,204 +0,0 @@ -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -index b94f053272..252c13c524 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp -@@ -53,17 +53,69 @@ - namespace Kokkos { - namespace Impl { - -+inline int cuda_warp_per_sm_allocation_granularity( -+ cudaDeviceProp const& properties) { -+ // Allocation granularity of warps in each sm -+ switch (properties.major) { -+ case 3: -+ case 5: -+ case 7: -+ case 8: -+ case 9: return 4; -+ case 6: return (properties.minor == 0 ? 2 : 4); -+ default: -+ throw_runtime_exception( -+ "Unknown device in cuda warp per sm allocation granularity"); -+ return 0; -+ } -+} -+ -+inline int cuda_max_warps_per_sm_registers( -+ cudaDeviceProp const& properties, cudaFuncAttributes const& attributes) { -+ // Maximum number of warps per sm as a function of register counts, -+ // subject to the constraint that warps are allocated with a fixed granularity -+ int const max_regs_per_block = properties.regsPerBlock; -+ int const regs_per_warp = attributes.numRegs * properties.warpSize; -+ int const warp_granularity = -+ cuda_warp_per_sm_allocation_granularity(properties); -+ // The granularity of register allocation is chunks of 256 registers per warp, -+ // which implies a need to over-allocate, so we round up -+ int const allocated_regs_per_warp = (regs_per_warp + 256 - 1) / 256; -+ -+ // The maximum number of warps per SM is constrained from above by register -+ // allocation. To satisfy the constraint that warps per SM is allocated at a -+ // finite granularity, we need to round down. -+ int const max_warps_per_sm = -+ warp_granularity * -+ (max_regs_per_block / (allocated_regs_per_warp * warp_granularity)); -+ -+ return max_warps_per_sm; -+} -+ - inline int cuda_max_active_blocks_per_sm(cudaDeviceProp const& properties, - cudaFuncAttributes const& attributes, - int block_size, size_t dynamic_shmem) { -- // Limits due do registers/SM -+ // Limits due to registers/SM - int const regs_per_sm = properties.regsPerMultiprocessor; - int const regs_per_thread = attributes.numRegs; - // The granularity of register allocation is chunks of 256 registers per warp - // -> 8 registers per thread - int const allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8); -- int const max_blocks_regs = -- regs_per_sm / (allocated_regs_per_thread * block_size); -+ int max_blocks_regs = regs_per_sm / (allocated_regs_per_thread * block_size); -+ -+ // Compute the maximum number of warps as a function of the number of -+ // registers -+ int const max_warps_per_sm_registers = -+ cuda_max_warps_per_sm_registers(properties, attributes); -+ -+ // Constrain the number of blocks to respect the maximum number of warps per -+ // SM On face value this should be an equality, but due to the warp -+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the -+ // left-hand-side of this comparison can overshoot what the hardware allows -+ // based on register counts alone -+ while ((max_blocks_regs * block_size / properties.warpSize) > -+ max_warps_per_sm_registers) -+ max_blocks_regs--; - - // Limits due to shared memory/SM - size_t const shmem_per_sm = properties.sharedMemPerMultiprocessor; -@@ -207,6 +259,19 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance, - LaunchBounds{}); - } - -+template -+int cuda_get_opt_block_size_no_shmem(const cudaFuncAttributes& attr, -+ LaunchBounds) { -+ auto const& prop = Kokkos::Cuda().cuda_device_prop(); -+ -+ // Thin version of cuda_get_opt_block_size for cases where there is no shared -+ // memory -+ auto const block_size_to_no_shmem = [&](int /*block_size*/) { return 0; }; -+ -+ return cuda_deduce_block_size(false, prop, attr, block_size_to_no_shmem, -+ LaunchBounds{}); -+} -+ - } // namespace Impl - } // namespace Kokkos - -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -index 5c4c3a7d39..170183ca0a 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -@@ -188,9 +188,23 @@ inline void configure_shmem_preference(const KernelFuncPtr& func, - // The granularity of register allocation is chunks of 256 registers per warp - // -> 8 registers per thread - const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8); -- const size_t max_blocks_regs = -+ size_t max_blocks_regs = - regs_per_sm / (allocated_regs_per_thread * block_size); - -+ // Compute the maximum number of warps as a function of the number of -+ // registers -+ const size_t max_warps_per_sm_registers = -+ cuda_max_warps_per_sm_registers(device_props, func_attr); -+ -+ // Constrain the number of blocks to respect the maximum number of warps per -+ // SM On face value this should be an equality, but due to the warp -+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the -+ // left-hand-side of this comparison can overshoot what the hardware allows -+ // based on register counts alone -+ while ((max_blocks_regs * block_size / device_props.warpSize) > -+ max_warps_per_sm_registers) -+ max_blocks_regs--; -+ - // Compute how many threads per sm we actually want - const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor; - // only allocate multiples of warp size -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -index 0e348c092a..7e4f62f12e 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp -@@ -67,6 +67,34 @@ - namespace Kokkos { - namespace Impl { - -+template -+int max_tile_size_product_helper(const Policy& pol, const LaunchBounds&) { -+ cudaFuncAttributes attr = -+ CudaParallelLaunch::get_cuda_func_attributes(); -+ auto const& prop = pol.space().cuda_device_prop(); -+ -+ // Limits due to registers/SM, MDRange doesn't have -+ // shared memory constraints -+ int const optimal_block_size = -+ Kokkos::Impl::cuda_get_opt_block_size_no_shmem(attr, LaunchBounds{}); -+ -+ // Compute how many blocks of this size we can launch, based on warp -+ // constraints -+ int const max_warps_per_sm_registers = -+ Kokkos::Impl::cuda_max_warps_per_sm_registers(prop, attr); -+ int const max_num_threads_from_warps = -+ max_warps_per_sm_registers * prop.warpSize; -+ int const max_num_blocks = max_num_threads_from_warps / optimal_block_size; -+ -+ // Compute the total number of threads -+ int const max_threads_per_sm = optimal_block_size * max_num_blocks; -+ -+ return std::min( -+ max_threads_per_sm, -+ static_cast(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism)); -+} -+ - template - class ParallelFor, Kokkos::Cuda> { - public: -@@ -85,18 +113,7 @@ class ParallelFor, Kokkos::Cuda> { - public: - template - static int max_tile_size_product(const Policy& pol, const Functor&) { -- cudaFuncAttributes attr = -- CudaParallelLaunch::get_cuda_func_attributes(); -- auto const& prop = pol.space().cuda_device_prop(); -- // Limits due to registers/SM, MDRange doesn't have -- // shared memory constraints -- int const regs_per_sm = prop.regsPerMultiprocessor; -- int const regs_per_thread = attr.numRegs; -- int const max_threads_per_sm = regs_per_sm / regs_per_thread; -- return std::min( -- max_threads_per_sm, -- static_cast(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism)); -+ return max_tile_size_product_helper(pol, LaunchBounds{}); - } - Policy const& get_policy() const { return m_rp; } - inline __device__ void operator()() const { -@@ -258,17 +275,7 @@ class ParallelReduce, ReducerType, - public: - template - static int max_tile_size_product(const Policy& pol, const Functor&) { -- cudaFuncAttributes attr = -- CudaParallelLaunch::get_cuda_func_attributes(); -- auto const& prop = pol.space().cuda_device_prop(); -- // Limits due do registers/SM -- int const regs_per_sm = prop.regsPerMultiprocessor; -- int const regs_per_thread = attr.numRegs; -- int const max_threads_per_sm = regs_per_sm / regs_per_thread; -- return std::min( -- max_threads_per_sm, -- static_cast(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism)); -+ return max_tile_size_product_helper(pol, LaunchBounds{}); - } - Policy const& get_policy() const { return m_policy; } - inline __device__ void exec_range(reference_type update) const { diff --git a/lib/kokkos/kokkos_fix_5706_apply_last.diff b/lib/kokkos/kokkos_fix_5706_apply_last.diff deleted file mode 100644 index 5d298323fd..0000000000 --- a/lib/kokkos/kokkos_fix_5706_apply_last.diff +++ /dev/null @@ -1,63 +0,0 @@ -diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -index 170183ca0a..ba43e362bb 100644 ---- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp -@@ -412,12 +412,16 @@ struct CudaParallelLaunchKernelInvoker< - Impl::check_shmem_request(cuda_instance, shmem); - if (DriverType::Policy:: - experimental_contains_desired_occupancy) { -+ /* - int desired_occupancy = - driver.get_policy().impl_get_desired_occupancy().value(); - size_t block_size = block.x * block.y * block.z; - Impl::configure_shmem_preference( - base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -- shmem, desired_occupancy); -+ shmem, desired_occupancy);*/ -+ Kokkos::Impl::throw_runtime_exception( -+ std::string("Cuda graph node creation FAILED:" -+ " occupancy requests are currently broken.")); - } - - void const* args[] = {&driver}; -@@ -511,14 +515,17 @@ struct CudaParallelLaunchKernelInvoker< - - if (!Impl::is_empty_launch(grid, block)) { - Impl::check_shmem_request(cuda_instance, shmem); -- if constexpr (DriverType::Policy:: -+ if (DriverType::Policy:: - experimental_contains_desired_occupancy) { -- int desired_occupancy = -+ /*int desired_occupancy = - driver.get_policy().impl_get_desired_occupancy().value(); - size_t block_size = block.x * block.y * block.z; - Impl::configure_shmem_preference( - base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -- shmem, desired_occupancy); -+ shmem, desired_occupancy);*/ -+ Kokkos::Impl::throw_runtime_exception( -+ std::string("Cuda graph node creation FAILED:" -+ " occupancy requests are currently broken.")); - } - - auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver); -@@ -690,14 +697,17 @@ struct CudaParallelLaunchImpl< - - if (DriverType::Policy:: - experimental_contains_desired_occupancy) { -- int desired_occupancy = -+ /*int desired_occupancy = - driver.get_policy().impl_get_desired_occupancy().value(); - size_t block_size = block.x * block.y * block.z; - Impl::configure_shmem_preference< - DriverType, - Kokkos::LaunchBounds>( - base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size, -- shmem, desired_occupancy); -+ shmem, desired_occupancy);*/ -+ Kokkos::Impl::throw_runtime_exception( -+ std::string("Cuda graph node creation FAILED:" -+ " occupancy requests are currently broken.")); - } - - KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE(); diff --git a/potentials/HGa.msmeam b/potentials/HGa.msmeam new file mode 100644 index 0000000000..9f01501c16 --- /dev/null +++ b/potentials/HGa.msmeam @@ -0,0 +1,30 @@ +bkgd_dyn = 1 +emb_lin_neg = 1 +augt1=0 +ialloy=1 +rc = 5.9 +#H +attrac(1,1)=0.460 +repuls(1,1)=0.460 +Cmin(1,1,1)=1.3 # PuMS +Cmax(1,1,1)= 2.80 +nn2(1,1)=1 +#Ga +rho0(2) = 0.6 +attrac(2,2)=0.097 +repuls(2,2)=0.097 +nn2(2,2)=1 +#HGa +attrac(1,2)=0.300 +repuls(1,2)=0.300 +lattce(1,2)=l12 +re(1,2)=3.19 +delta(1,2)=-0.48 +alpha(1,2)=6.6 +Cmin(1,1,2)=2.0 +Cmin(2,1,2)= 2.0 +Cmin(1,2,1)=2.0 +Cmin(2,2,1) = 1.4 +Cmin(1,2,2) = 1.4 +Cmin(1,1,2) = 1.4 +nn2(1,2)=1 diff --git a/potentials/library.msmeam b/potentials/library.msmeam new file mode 100644 index 0000000000..9937eaee08 --- /dev/null +++ b/potentials/library.msmeam @@ -0,0 +1,14 @@ +# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010 +# ms-meam data format May 2010 +# elt lat z ielement atwt +# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub +# - t0 t1 t2 t3 t1m t2m t3m rozero ibar +# NOTE: leading character cannot be a space + +'H' 'dim' 1.0 1 1.0079 +2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50 +1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0 + +'Ga4' 'fcc' 12.0 31 69.723 +4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97 +1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0 diff --git a/src/KOKKOS/Install.sh b/src/KOKKOS/Install.sh index 1b58abe3a2..f493b5438a 100755 --- a/src/KOKKOS/Install.sh +++ b/src/KOKKOS/Install.sh @@ -316,6 +316,8 @@ action pair_lj_spica_kokkos.cpp pair_lj_spica.cpp action pair_lj_spica_kokkos.h pair_lj_spica.h action pair_meam_kokkos.cpp pair_meam.cpp action pair_meam_kokkos.h pair_meam.h +action pair_meam_ms_kokkos.cpp pair_meam_ms.cpp +action pair_meam_ms_kokkos.h pair_meam_ms.h action pair_mliap_kokkos.cpp pair_mliap.cpp action pair_mliap_kokkos.h pair_mliap.h action pair_morse_kokkos.cpp diff --git a/src/KOKKOS/meam_dens_final_kokkos.h b/src/KOKKOS/meam_dens_final_kokkos.h index bcc7b558dc..5e7ffdec20 100644 --- a/src/KOKKOS/meam_dens_final_kokkos.h +++ b/src/KOKKOS/meam_dens_final_kokkos.h @@ -61,34 +61,61 @@ void MEAMKokkos::operator()(TagMEAMDensFinal, const int &i, EV_FLOAT if (elti >= 0) { scaleii = d_scale(type[i],type[i]); d_rho1[i] = 0.0; - d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i]; + if (msmeamflag) { + d_rho2[i] = -1.0 / 3.0 * (d_arho2b[i] * d_arho2b[i] + - d_arho2mb[i] * d_arho2mb[i]); + } else{ + d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i]; + } d_rho3[i] = 0.0; for (int m = 0; m < 3; m++) { - d_rho1[i] += d_arho1(i,m) * d_arho1(i,m); - d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m); - } - for (int m = 0; m < 6; m++) - d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m); - for (int m = 0; m < 10; m++) - d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m); - - if (d_rho0[i] > 0.0) { - if (ialloy == 1) { - d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0)); - d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1)); - d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2)); - } else if (ialloy == 2) { - d_t_ave(i,0) = t1_meam[elti]; - d_t_ave(i,1) = t2_meam[elti]; - d_t_ave(i,2) = t3_meam[elti]; - } else { - d_t_ave(i,0) /= d_rho0[i]; - d_t_ave(i,1) /= d_rho0[i]; - d_t_ave(i,2) /= d_rho0[i]; + if (msmeamflag) { + d_rho1[i] = d_rho1[i] + d_arho1(i, m) * d_arho1(i, m) + - d_arho1m(i, m) * d_arho1m(i, m); + d_rho3[i] = d_rho3[i] - 3.0 / 5.0 * (d_arho3b(i, m) * d_arho3b(i, m) + - d_arho3mb(i, m) * d_arho3mb(i, m)); + } else{ + d_rho1[i] += d_arho1(i,m) * d_arho1(i,m); + d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m); } } + for (int m = 0; m < 6; m++){ + if (msmeamflag) { + d_rho2[i] = d_rho2[i] + v2D[m] * (d_arho2(i, m) * d_arho2(i, m) + - d_arho2m(i, m) * d_arho2m(i, m)); + } else{ + d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m); + } + } + for (int m = 0; m < 10; m++) + if (msmeamflag) { + d_rho3[i] = d_rho3[i] + v3D[m] * (d_arho3(i, m) * d_arho3(i, m) + - d_arho3m(i, m) * d_arho3m(i, m)); + } else{ + d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m); + } - d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i]; + if (msmeamflag) { + // with msmeam all t weights are already accounted for in rho + d_gamma[i] = d_rho1[i] + d_rho2[i] + d_rho3[i]; + } else{ + if (d_rho0[i] > 0.0) { + if (ialloy == 1) { + d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0)); + d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1)); + d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2)); + } else if (ialloy == 2) { + d_t_ave(i,0) = t1_meam[elti]; + d_t_ave(i,1) = t2_meam[elti]; + d_t_ave(i,2) = t3_meam[elti]; + } else { + d_t_ave(i,0) /= d_rho0[i]; + d_t_ave(i,1) /= d_rho0[i]; + d_t_ave(i,2) /= d_rho0[i]; + } + } + d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i]; + } if (d_rho0[i] > 0.0) d_gamma[i] /= (d_rho0[i] * d_rho0[i]); diff --git a/src/KOKKOS/meam_dens_init_kokkos.h b/src/KOKKOS/meam_dens_init_kokkos.h index 31ac046dcf..60bb6553d8 100644 --- a/src/KOKKOS/meam_dens_init_kokkos.h +++ b/src/KOKKOS/meam_dens_init_kokkos.h @@ -43,11 +43,23 @@ void MEAMKokkos::operator()(TagMEAMZero, const int &i) const { d_rho0[i] = 0.0; d_arho2b[i] = 0.0; d_arho1(i,0) = d_arho1(i,1) = d_arho1(i,2) = 0.0; - for (int j = 0; j < 6; j++) + if (msmeamflag) { + d_arho2mb[i] = 0.0; + d_arho1m(i,0) = d_arho1m(i,1) = d_arho1m(i,2) = 0.0; + } + for (int j = 0; j < 6; j++) { d_arho2(i,j) = 0.0; - for (int j = 0; j < 10; j++) + if (msmeamflag) + d_arho2m(i,j) = 0.0; + } + for (int j = 0; j < 10; j++) { d_arho3(i,j) = 0.0; + if (msmeamflag) + d_arho3m(i,j) = 0.0; + } d_arho3b(i,0) = d_arho3b(i,1) = d_arho3b(i,2) = 0.0; + if (msmeamflag) + d_arho3mb(i,0) = d_arho3mb(i,1) = d_arho3mb(i,2) = 0.0; d_t_ave(i,0) = d_t_ave(i,1) = d_t_ave(i,2) = 0.0; d_tsq_ave(i,0) = d_tsq_ave(i,1) = d_tsq_ave(i,2) = 0.0; } @@ -80,13 +92,20 @@ MEAMKokkos::meam_dens_setup(int atom_nmax, int nall, int n_neigh) memoryKK->destroy_kokkos(k_arho3b,arho3b); memoryKK->destroy_kokkos(k_t_ave,t_ave); memoryKK->destroy_kokkos(k_tsq_ave,tsq_ave); + // msmeam + memoryKK->destroy_kokkos(k_arho2mb, arho2mb); + memoryKK->destroy_kokkos(k_arho1m, arho1m); + memoryKK->destroy_kokkos(k_arho2m, arho2m); + memoryKK->destroy_kokkos(k_arho3m, arho3m); + memoryKK->destroy_kokkos(k_arho3mb, arho3mb); nmax = atom_nmax; -// memory->create(rho, nmax, "pair:rho"); + + //memory->create(rho, nmax, "pair:rho"); k_rho = DAT::tdual_ffloat_1d("pair:rho",nmax); d_rho = k_rho.template view(); h_rho = k_rho.h_view; - // memory->create(rho0, nmax, "pair:rho0"); + //memory->create(rho0, nmax, "pair:rho0"); k_rho0 = DAT::tdual_ffloat_1d("pair:rho0",nmax); d_rho0 = k_rho0.template view(); h_rho0 = k_rho0.h_view; @@ -150,6 +169,28 @@ MEAMKokkos::meam_dens_setup(int atom_nmax, int nall, int n_neigh) k_tsq_ave = DAT::tdual_ffloat_2d("pair:tsq_ave",nmax, 3); d_tsq_ave = k_tsq_ave.template view(); h_tsq_ave = k_tsq_ave.h_view; + + // msmeam + //memory->create(arho2mb, nmax, "pair:arho2mb"); + k_arho2mb = DAT::tdual_ffloat_1d("pair:arho2mb",nmax); + d_arho2mb = k_arho2mb.template view(); + h_arho2mb = k_arho2mb.h_view; + //memory->create(arho1m, nmax, 3, "pair:arho1m"); + k_arho1m = DAT::tdual_ffloat_2d("pair:arho1m", nmax, 3); + d_arho1m = k_arho1m.template view(); + h_arho1m = k_arho1m.h_view; + //memory->create(arho2m, nmax, 6, "pair:arho2m"); + k_arho2m = DAT::tdual_ffloat_2d("pair:arho2m", nmax, 6); + d_arho2m = k_arho2m.template view(); + h_arho2m = k_arho2m.h_view; + //memory->create(arho3m, nmax, 10, "pair:arho3m"); + k_arho3m = DAT::tdual_ffloat_2d("pair:arho3m", nmax, 10); + d_arho3m = k_arho3m.template view(); + h_arho3m = k_arho3m.h_view; + //memory->create(arho3mb, nmax, 3, "pair:arho3mb"); + k_arho3mb = DAT::tdual_ffloat_2d("pair:arho3mb", nmax, 3); + d_arho3mb = k_arho3mb.template view(); + h_arho3mb = k_arho3mb.h_view; } if (n_neigh > maxneigh) { @@ -206,6 +247,12 @@ MEAMKokkos::meam_dens_init(int inum_half, int ntype, typename AT::t_ dup_arho3b = Kokkos::Experimental::create_scatter_view(d_arho3b); dup_t_ave = Kokkos::Experimental::create_scatter_view(d_t_ave); dup_tsq_ave = Kokkos::Experimental::create_scatter_view(d_tsq_ave); + // msmeam + dup_arho2mb = Kokkos::Experimental::create_scatter_view(d_arho2mb); + dup_arho1m = Kokkos::Experimental::create_scatter_view(d_arho1m); + dup_arho2m = Kokkos::Experimental::create_scatter_view(d_arho2m); + dup_arho3m = Kokkos::Experimental::create_scatter_view(d_arho3m); + dup_arho3mb = Kokkos::Experimental::create_scatter_view(d_arho3mb); } else { ndup_rho0 = Kokkos::Experimental::create_scatter_view(d_rho0); ndup_arho2b = Kokkos::Experimental::create_scatter_view(d_arho2b); @@ -215,6 +262,12 @@ MEAMKokkos::meam_dens_init(int inum_half, int ntype, typename AT::t_ ndup_arho3b = Kokkos::Experimental::create_scatter_view(d_arho3b); ndup_t_ave = Kokkos::Experimental::create_scatter_view(d_t_ave); ndup_tsq_ave = Kokkos::Experimental::create_scatter_view(d_tsq_ave); + // msmeam + ndup_arho2mb = Kokkos::Experimental::create_scatter_view(d_arho2mb); + ndup_arho1m = Kokkos::Experimental::create_scatter_view(d_arho1m); + ndup_arho2m = Kokkos::Experimental::create_scatter_view(d_arho2m); + ndup_arho3m = Kokkos::Experimental::create_scatter_view(d_arho3m); + ndup_arho3mb = Kokkos::Experimental::create_scatter_view(d_arho3mb); } copymode = 1; @@ -233,6 +286,12 @@ MEAMKokkos::meam_dens_init(int inum_half, int ntype, typename AT::t_ Kokkos::Experimental::contribute(d_arho3b, dup_arho3b); Kokkos::Experimental::contribute(d_t_ave, dup_t_ave); Kokkos::Experimental::contribute(d_tsq_ave, dup_tsq_ave); + // msmeam + Kokkos::Experimental::contribute(d_arho2mb, dup_arho2mb); + Kokkos::Experimental::contribute(d_arho1m, dup_arho1m); + Kokkos::Experimental::contribute(d_arho2m, dup_arho2m); + Kokkos::Experimental::contribute(d_arho3m, dup_arho3m); + Kokkos::Experimental::contribute(d_arho3mb, dup_arho3mb); // free duplicated memory dup_rho0 = decltype(dup_rho0)(); @@ -243,6 +302,12 @@ MEAMKokkos::meam_dens_init(int inum_half, int ntype, typename AT::t_ dup_arho3b = decltype(dup_arho3b)(); dup_t_ave = decltype(dup_t_ave)(); dup_tsq_ave = decltype(dup_tsq_ave)(); + // msmeam + dup_arho2mb = decltype(dup_arho2mb)(); + dup_arho1m = decltype(dup_arho1m)(); + dup_arho2m = decltype(dup_arho2m)(); + dup_arho3m = decltype(dup_arho3m)(); + dup_arho3mb = decltype(dup_arho3mb)(); } } @@ -417,7 +482,6 @@ MEAMKokkos::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty int offset) const { // The rho0, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial - auto v_rho0 = ScatterViewHelper,decltype(dup_rho0),decltype(ndup_rho0)>::get(dup_rho0,ndup_rho0); auto a_rho0 = v_rho0.template access>(); auto v_arho2b = ScatterViewHelper,decltype(dup_arho2b),decltype(ndup_arho2b)>::get(dup_arho2b,ndup_arho2b); @@ -434,6 +498,17 @@ MEAMKokkos::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty auto a_t_ave = v_t_ave.template access>(); auto v_tsq_ave = ScatterViewHelper,decltype(dup_tsq_ave),decltype(ndup_tsq_ave)>::get(dup_tsq_ave,ndup_tsq_ave); auto a_tsq_ave = v_tsq_ave.template access>(); + // msmeam + auto v_arho2mb = ScatterViewHelper,decltype(dup_arho2mb),decltype(ndup_arho2mb)>::get(dup_arho2mb,ndup_arho2mb); + auto a_arho2mb = v_arho2mb.template access>(); + auto v_arho1m = ScatterViewHelper,decltype(dup_arho1m),decltype(ndup_arho1m)>::get(dup_arho1m,ndup_arho1m); + auto a_arho1m = v_arho1m.template access>(); + auto v_arho2m = ScatterViewHelper,decltype(dup_arho2m),decltype(ndup_arho2m)>::get(dup_arho2m,ndup_arho2m); + auto a_arho2m = v_arho2m.template access>(); + auto v_arho3m = ScatterViewHelper,decltype(dup_arho3m),decltype(ndup_arho3m)>::get(dup_arho3m,ndup_arho3m); + auto a_arho3m = v_arho3m.template access>(); + auto v_arho3mb = ScatterViewHelper,decltype(dup_arho3mb),decltype(ndup_arho3mb)>::get(dup_arho3mb,ndup_arho3mb); + auto a_arho3mb = v_arho3mb.template access>(); const int elti = d_map[type[i]]; const double xtmp = x(i,0); @@ -463,6 +538,16 @@ MEAMKokkos::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty double rhoa1i = ro0i * MathSpecialKokkos::fm_exp(-beta1_meam[elti] * ai) * sij; double rhoa2i = ro0i * MathSpecialKokkos::fm_exp(-beta2_meam[elti] * ai) * sij; double rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai) * sij; + // msmeam + double rhoa1mj, rhoa2mj, rhoa3mj, rhoa1mi, rhoa2mi, rhoa3mi; + if (msmeamflag) { + rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * sij; + rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * sij; + rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * sij; + rhoa1mi = ro0i * t1m_meam[elti] * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * sij; + rhoa2mi = ro0i * t2m_meam[elti] * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * sij; + rhoa3mi = ro0i * t3m_meam[elti] * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * sij; + } if (ialloy == 1) { rhoa1j *= t1_meam[eltj]; rhoa2j *= t2_meam[eltj]; @@ -499,20 +584,45 @@ MEAMKokkos::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty const double A1i = rhoa1i / rij; const double A2i = rhoa2i / rij2; const double A3i = rhoa3i / (rij2 * rij); + double A1mj, A2mj, A3mj, A1mi, A2mi, A3mi; + if (msmeamflag) { + a_arho2mb[i] += rhoa2mj; + a_arho2mb[j] += rhoa2mi; + A1mj = rhoa1mj / rij; + A2mj = rhoa2mj / rij2; + A3mj = rhoa3mj / (rij2 * rij); + A1mi = rhoa1mi / rij; + A2mi = rhoa2mi / rij2; + A3mi = rhoa3mi / (rij2 * rij); + } int nv2 = 0; int nv3 = 0; for (int m = 0; m < 3; m++) { - a_arho1(i,m) += A1j * delij[m]; + a_arho1(i,m) += A1j * delij[m]; a_arho1(j,m) += -A1i * delij[m]; - a_arho3b(i,m) += rhoa3j * delij[m] / rij; + a_arho3b(i,m) += rhoa3j * delij[m] / rij; a_arho3b(j,m) += -rhoa3i * delij[m] / rij; + if (msmeamflag) { + a_arho1m(i,m) += A1mj * delij[m]; + a_arho1m(j,m) += -A1mi * delij[m]; + a_arho3mb(i,m) += rhoa3mj * delij[m] / rij; + a_arho3mb(j,m) += -rhoa3mi * delij[m] / rij; + } for (int n = m; n < 3; n++) { a_arho2(i,nv2) += A2j * delij[m] * delij[n]; a_arho2(j,nv2) += A2i * delij[m] * delij[n]; + if (msmeamflag) { + a_arho2m(i,nv2) += A2mj * delij[m] * delij[n]; + a_arho2m(j,nv2) += A2mi * delij[m] * delij[n]; + } nv2++; for (int p = n; p < 3; p++) { - a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p]; + a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p]; a_arho3(j,nv3) += -A3i * delij[m] * delij[n] * delij[p]; + if (msmeamflag) { + a_arho3m(i,nv3) += A3mj * delij[m] * delij[n] * delij[p]; + a_arho3m(j,nv3) += -A3mi * delij[m] * delij[n] * delij[p]; + } nv3++; } } diff --git a/src/KOKKOS/meam_force_kokkos.h b/src/KOKKOS/meam_force_kokkos.h index e7e6c64231..5c4244e99b 100644 --- a/src/KOKKOS/meam_force_kokkos.h +++ b/src/KOKKOS/meam_force_kokkos.h @@ -119,6 +119,17 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos::operator()(TagMEAMForce::operator()(TagMEAMForce::operator()(TagMEAMForce::operator()(TagMEAMForce::operator()(TagMEAMForce::operator()(TagMEAMForce::operator()(TagMEAMForcev2D[nv2]; + drho3mdrm1[m] = drho3mdrm1[m] + d_arho3m(i, vind3D[m][n][p]) * arg; + drho3mdrm2[m] = drho3mdrm2[m] + d_arho3m(j, vind3D[m][n][p]) * arg; + nv2 = nv2 + 1; + } + } + drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * d_arho3mb(i, m)) * rhoa3mj; + drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * d_arho3mb(j, m)) * rhoa3mi; + } + } else { + for (m = 0; m < 3; m++) { + drho1mdrm1[m] = 0.0; + drho1mdrm2[m] = 0.0; + drho2mdrm1[m] = 0.0; + drho2mdrm2[m] = 0.0; + drho3mdrm1[m] = 0.0; + drho3mdrm2[m] = 0.0; + } + } + // Compute derivatives of weighting functions t wrt rij - t1i = d_t_ave(i, 0); - t2i = d_t_ave(i, 1); - t3i = d_t_ave(i, 2); - t1j = d_t_ave(j, 0); - t2j = d_t_ave(j, 1); - t3j = d_t_ave(j, 2); - - if (ialloy == 1) { - - a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0)); - a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0)); - a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1)); - a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1)); - a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2)); - a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2)); - - dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj)); - dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi)); - dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj)); - dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi)); - dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj)); - dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi)); - - } else if (ialloy == 2) { + // Weighting functions t set to unity for msmeam + if (msmeamflag) { + t1i = 1.0; + t2i = 1.0; + t3i = 1.0; + t1j = 1.0; + t2j = 1.0; + t3j = 1.0; dt1dr1 = 0.0; dt1dr2 = 0.0; dt2dr1 = 0.0; dt2dr2 = 0.0; dt3dr1 = 0.0; dt3dr2 = 0.0; - } else { - ai = 0.0; - if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i]; - aj = 0.0; - if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j]; + t1i = d_t_ave(i, 0); + t2i = d_t_ave(i, 1); + t3i = d_t_ave(i, 2); + t1j = d_t_ave(j, 0); + t2j = d_t_ave(j, 1); + t3j = d_t_ave(j, 2); - dt1dr1 = ai * (t1mj - t1i); - dt1dr2 = aj * (t1mi - t1j); - dt2dr1 = ai * (t2mj - t2i); - dt2dr2 = aj * (t2mi - t2j); - dt3dr1 = ai * (t3mj - t3i); - dt3dr2 = aj * (t3mi - t3j); + if (ialloy == 1) { + + a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0)); + a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0)); + a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1)); + a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1)); + a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2)); + a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2)); + + dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj)); + dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi)); + dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj)); + dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi)); + dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj)); + dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi)); + + } else if (ialloy == 2) { + + dt1dr1 = 0.0; + dt1dr2 = 0.0; + dt2dr1 = 0.0; + dt2dr2 = 0.0; + dt3dr1 = 0.0; + dt3dr2 = 0.0; + + } else { + + ai = 0.0; + if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i]; + aj = 0.0; + if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j]; + + dt1dr1 = ai * (t1mj - t1i); + dt1dr2 = aj * (t1mi - t1j); + dt2dr1 = ai * (t2mj - t2i); + dt2dr2 = aj * (t2mi - t2j); + dt3dr1 = ai * (t3mj - t3i); + dt3dr2 = aj * (t3mi - t3j); + } } // Compute derivatives of total density wrt rij, sij and rij(3) get_shpfcn(lattce_meam[elti][elti], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpi); get_shpfcn(lattce_meam[eltj][eltj], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpj); - drhodr1 = d_dgamma1[i] * drho0dr1 + - d_dgamma2[i] * - (dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 + - dt3dr1 * d_rho3[i] + t3i * drho3dr1) - - d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); - drhodr2 = d_dgamma1[j] * drho0dr2 + - d_dgamma2[j] * - (dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 + - dt3dr2 * d_rho3[j] + t3j * drho3dr2) - - d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); - for (m = 0; m < 3; m++) { - drhodrm1[m] = 0.0; - drhodrm2[m] = 0.0; - drhodrm1[m] = - d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]); - drhodrm2[m] = - d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]); + + if (msmeamflag) { + drhodr1 = d_dgamma1[i] * drho0dr1 + + d_dgamma2[i] * (dt1dr1 * d_rho1[i] + t1i * (drho1dr1 - drho1mdr1) + + dt2dr1 * d_rho2[i] + t2i * (drho2dr1 - drho2mdr1) + + dt3dr1 * d_rho3[i] + t3i * (drho3dr1 - drho3mdr1)) - + d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); + drhodr2 = d_dgamma1[j] * drho0dr2 + + d_dgamma2[j] * (dt1dr2 * d_rho1[j] + t1j * (drho1dr2 - drho1mdr2) + + dt2dr2 * d_rho2[j] + t2j * (drho2dr2 - drho2mdr2) + + dt3dr2 * d_rho3[j] + t3j * (drho3dr2 - drho3mdr2)) - + d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); + for (m = 0; m < 3; m++) { + drhodrm1[m] = 0.0; + drhodrm2[m] = 0.0; + drhodrm1[m] = d_dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) + + t2i * (drho2drm1[m] - drho2mdrm1[m]) + + t3i * (drho3drm1[m] - drho3mdrm1[m]) ); + drhodrm2[m] = d_dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) + + t2j * (drho2drm2[m] - drho2mdrm2[m]) + + t3j * (drho3drm2[m] - drho3mdrm2[m]) ); + } + } else { + drhodr1 = d_dgamma1[i] * drho0dr1 + + d_dgamma2[i] * + (dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 + + dt3dr1 * d_rho3[i] + t3i * drho3dr1) - + d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); + drhodr2 = d_dgamma1[j] * drho0dr2 + + d_dgamma2[j] * + (dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 + + dt3dr2 * d_rho3[j] + t3j * drho3dr2) - + d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); + for (m = 0; m < 3; m++) { + drhodrm1[m] = 0.0; + drhodrm2[m] = 0.0; + drhodrm1[m] = + d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]); + drhodrm2[m] = + d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]); + } } // Compute derivatives wrt sij, but only if necessary @@ -416,6 +594,24 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos::operator()(TagMEAMForce::operator()(TagMEAMForce::~MEAMKokkos() memoryKK->destroy_kokkos(k_scrfcn,scrfcn); memoryKK->destroy_kokkos(k_dscrfcn,dscrfcn); memoryKK->destroy_kokkos(k_fcpair,fcpair); + + // msmeam + + memoryKK->destroy_kokkos(k_arho2mb, arho2mb); + memoryKK->destroy_kokkos(k_arho1m, arho1m); + memoryKK->destroy_kokkos(k_arho2m, arho2m); + memoryKK->destroy_kokkos(k_arho3m, arho3m); + memoryKK->destroy_kokkos(k_arho3mb, arho3mb); } #include "meam_setup_done_kokkos.h" diff --git a/src/KOKKOS/meam_kokkos.h b/src/KOKKOS/meam_kokkos.h index cc75023810..2203355641 100644 --- a/src/KOKKOS/meam_kokkos.h +++ b/src/KOKKOS/meam_kokkos.h @@ -136,6 +136,13 @@ template class MEAMKokkos : public MEAM { DAT::tdual_ffloat_1d k_scrfcn, k_dscrfcn, k_fcpair; typename ArrayTypes::t_ffloat_1d d_scrfcn, d_dscrfcn, d_fcpair; HAT::t_ffloat_1d h_scrfcn, h_dscrfcn, h_fcpair; + // msmeam + DAT::tdual_ffloat_2d k_arho1m, k_arho2m, k_arho3m, k_arho3mb; + typename ArrayTypes::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb; + HAT::t_ffloat_2d h_arho1m, h_arho2m, h_arho3m, h_arho3mb; + DAT::tdual_ffloat_1d k_arho2mb; + typename ArrayTypes::t_ffloat_1d d_arho2mb; + HAT::t_ffloat_1d h_arho2mb; protected: int need_dup; @@ -195,6 +202,31 @@ template class MEAMKokkos : public MEAM { dup_vatom; NonDupScatterView ndup_vatom; + + // msmeam + + DupScatterView + dup_arho1m; + NonDupScatterView + ndup_arho1m; + DupScatterView + dup_arho2m; + NonDupScatterView + ndup_arho2m; + DupScatterView + dup_arho3m; + NonDupScatterView + ndup_arho3m; + DupScatterView + dup_arho2mb; + NonDupScatterView + ndup_arho2mb; + DupScatterView + dup_arho3mb; + NonDupScatterView + ndup_arho3mb; }; KOKKOS_INLINE_FUNCTION diff --git a/src/KOKKOS/pair_meam_kokkos.cpp b/src/KOKKOS/pair_meam_kokkos.cpp index 90e714cefe..c2b03c2054 100644 --- a/src/KOKKOS/pair_meam_kokkos.cpp +++ b/src/KOKKOS/pair_meam_kokkos.cpp @@ -51,6 +51,7 @@ PairMEAMKokkos::PairMEAMKokkos(LAMMPS *lmp) : PairMEAM(lmp) delete meam_inst; meam_inst_kk = new MEAMKokkos(memory); meam_inst = meam_inst_kk; + myname = "meam/kk"; } /* ---------------------------------------------------------------------- */ @@ -156,7 +157,8 @@ void PairMEAMKokkos::compute(int eflag_in, int vflag_in) int need_dup = lmp->kokkos->need_dup(); - meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup); + meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full, + d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup); meam_inst_kk->k_rho0.template modify(); meam_inst_kk->k_arho2b.template modify(); @@ -166,6 +168,13 @@ void PairMEAMKokkos::compute(int eflag_in, int vflag_in) meam_inst_kk->k_arho3b.template modify(); meam_inst_kk->k_t_ave.template modify(); meam_inst_kk->k_tsq_ave.template modify(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.template modify(); + meam_inst_kk->k_arho1m.template modify(); + meam_inst_kk->k_arho2m.template modify(); + meam_inst_kk->k_arho3m.template modify(); + meam_inst_kk->k_arho3mb.template modify(); + } comm->reverse_comm(this); @@ -177,6 +186,13 @@ void PairMEAMKokkos::compute(int eflag_in, int vflag_in) meam_inst_kk->k_arho3b.template sync(); meam_inst_kk->k_t_ave.template sync(); meam_inst_kk->k_tsq_ave.template sync(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.template sync(); + meam_inst_kk->k_arho1m.template sync(); + meam_inst_kk->k_arho2m.template sync(); + meam_inst_kk->k_arho3m.template sync(); + meam_inst_kk->k_arho3mb.template sync(); + } meam_inst_kk->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom, d_eatom,ntype,type,d_map,d_scale,errorflag,ev); @@ -200,6 +216,13 @@ void PairMEAMKokkos::compute(int eflag_in, int vflag_in) meam_inst_kk->k_arho3b.template modify(); meam_inst_kk->k_t_ave.template modify(); meam_inst_kk->k_tsq_ave.template modify(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.template modify(); + meam_inst_kk->k_arho1m.template modify(); + meam_inst_kk->k_arho2m.template modify(); + meam_inst_kk->k_arho3m.template modify(); + meam_inst_kk->k_arho3mb.template modify(); + } comm->forward_comm(this); @@ -219,6 +242,13 @@ void PairMEAMKokkos::compute(int eflag_in, int vflag_in) meam_inst_kk->k_arho3b.template sync(); meam_inst_kk->k_t_ave.template sync(); meam_inst_kk->k_tsq_ave.template sync(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.template sync(); + meam_inst_kk->k_arho1m.template sync(); + meam_inst_kk->k_arho2m.template sync(); + meam_inst_kk->k_arho3m.template sync(); + meam_inst_kk->k_arho3mb.template sync(); + } meam_inst_kk->meam_force(inum_half,eflag_global,eflag_atom,vflag_global, vflag_atom,d_eatom,ntype,type,d_map,x, @@ -315,7 +345,7 @@ int PairMEAMKokkos::pack_forward_comm_kokkos(int n, DAT::tdual_int_2 iswap = iswap_in; v_buf = buf.view(); Kokkos::parallel_for(Kokkos::RangePolicy(0,n),*this); - return n*38; + return n*comm_forward; } /* ---------------------------------------------------------------------- */ @@ -324,7 +354,7 @@ template KOKKOS_INLINE_FUNCTION void PairMEAMKokkos::operator()(TagPairMEAMPackForwardComm, const int &i) const { int j = d_sendlist(iswap, i); - int m = i*38; + int m = i*comm_forward; v_buf[m++] = d_rho0[j]; v_buf[m++] = d_rho1[j]; v_buf[m++] = d_rho2[j]; @@ -354,6 +384,22 @@ void PairMEAMKokkos::operator()(TagPairMEAMPackForwardComm, const in v_buf[m++] = d_tsq_ave(j,0); v_buf[m++] = d_tsq_ave(j,1); v_buf[m++] = d_tsq_ave(j,2); + if (msmeamflag) { + v_buf[m++] = d_arho2mb[j]; + v_buf[m++] = d_arho1m(j,0); + v_buf[m++] = d_arho1m(j,1); + v_buf[m++] = d_arho1m(j,2); + v_buf[m++] = d_arho2m(j,0); + v_buf[m++] = d_arho2m(j,1); + v_buf[m++] = d_arho2m(j,2); + v_buf[m++] = d_arho2m(j,3); + v_buf[m++] = d_arho2m(j,4); + v_buf[m++] = d_arho2m(j,5); + for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(j,k); + v_buf[m++] = d_arho3mb(j,0); + v_buf[m++] = d_arho3mb(j,1); + v_buf[m++] = d_arho3mb(j,2); + } } /* ---------------------------------------------------------------------- */ @@ -371,7 +417,8 @@ void PairMEAMKokkos::unpack_forward_comm_kokkos(int n, int first_in, template KOKKOS_INLINE_FUNCTION void PairMEAMKokkos::operator()(TagPairMEAMUnpackForwardComm, const int &i) const{ - int m = i*38; + //int m = i*38; + int m = i*comm_forward; d_rho0[i+first] = v_buf[m++]; d_rho1[i+first] = v_buf[m++]; @@ -402,6 +449,22 @@ void PairMEAMKokkos::operator()(TagPairMEAMUnpackForwardComm, const d_tsq_ave(i+first,0) = v_buf[m++]; d_tsq_ave(i+first,1) = v_buf[m++]; d_tsq_ave(i+first,2) = v_buf[m++]; + if (msmeamflag) { + d_arho2mb[i+first] = v_buf[m++]; + d_arho1m(i+first,0) = v_buf[m++]; + d_arho1m(i+first,1) = v_buf[m++]; + d_arho1m(i+first,2) = v_buf[m++]; + d_arho2m(i+first,0) = v_buf[m++]; + d_arho2m(i+first,1) = v_buf[m++]; + d_arho2m(i+first,2) = v_buf[m++]; + d_arho2m(i+first,3) = v_buf[m++]; + d_arho2m(i+first,4) = v_buf[m++]; + d_arho2m(i+first,5) = v_buf[m++]; + for (int k = 0; k < 10; k++) d_arho3m(i+first,k) = v_buf[m++]; + d_arho3mb(i+first,0) = v_buf[m++]; + d_arho3mb(i+first,1) = v_buf[m++]; + d_arho3mb(i+first,2) = v_buf[m++]; + } } /* ---------------------------------------------------------------------- */ @@ -426,6 +489,13 @@ int PairMEAMKokkos::pack_forward_comm(int n, int *list, double *buf, meam_inst_kk->k_arho3b.sync_host(); meam_inst_kk->k_t_ave.sync_host(); meam_inst_kk->k_tsq_ave.sync_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.sync_host(); + meam_inst_kk->k_arho1m.sync_host(); + meam_inst_kk->k_arho2m.sync_host(); + meam_inst_kk->k_arho3m.sync_host(); + meam_inst_kk->k_arho3mb.sync_host(); + } int m = 0; for (int i = 0; i < n; i++) { @@ -459,6 +529,22 @@ int PairMEAMKokkos::pack_forward_comm(int n, int *list, double *buf, buf[m++] = meam_inst_kk->h_tsq_ave(j,0); buf[m++] = meam_inst_kk->h_tsq_ave(j,1); buf[m++] = meam_inst_kk->h_tsq_ave(j,2); + if (msmeamflag) { + buf[m++] = meam_inst_kk->h_arho2mb[j]; + buf[m++] = meam_inst_kk->h_arho1m(j,0); + buf[m++] = meam_inst_kk->h_arho1m(j,1); + buf[m++] = meam_inst_kk->h_arho1m(j,2); + buf[m++] = meam_inst_kk->h_arho2m(j,0); + buf[m++] = meam_inst_kk->h_arho2m(j,1); + buf[m++] = meam_inst_kk->h_arho2m(j,2); + buf[m++] = meam_inst_kk->h_arho2m(j,3); + buf[m++] = meam_inst_kk->h_arho2m(j,4); + buf[m++] = meam_inst_kk->h_arho2m(j,5); + for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(j,k); + buf[m++] = meam_inst_kk->h_arho3mb(j,0); + buf[m++] = meam_inst_kk->h_arho3mb(j,1); + buf[m++] = meam_inst_kk->h_arho3mb(j,2); + } } return m; @@ -485,6 +571,13 @@ void PairMEAMKokkos::unpack_forward_comm(int n, int first, double *b meam_inst_kk->k_arho3b.sync_host(); meam_inst_kk->k_t_ave.sync_host(); meam_inst_kk->k_tsq_ave.sync_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.sync_host(); + meam_inst_kk->k_arho1m.sync_host(); + meam_inst_kk->k_arho2m.sync_host(); + meam_inst_kk->k_arho3m.sync_host(); + meam_inst_kk->k_arho3mb.sync_host(); + } int m = 0; const int last = first + n; @@ -518,6 +611,22 @@ void PairMEAMKokkos::unpack_forward_comm(int n, int first, double *b meam_inst_kk->h_tsq_ave(i,0) = buf[m++]; meam_inst_kk->h_tsq_ave(i,1) = buf[m++]; meam_inst_kk->h_tsq_ave(i,2) = buf[m++]; + if (msmeamflag) { + meam_inst_kk->h_arho2mb[i] = buf[m++]; + meam_inst_kk->h_arho1m(i,0) = buf[m++]; + meam_inst_kk->h_arho1m(i,1) = buf[m++]; + meam_inst_kk->h_arho1m(i,2) = buf[m++]; + meam_inst_kk->h_arho2m(i,0) = buf[m++]; + meam_inst_kk->h_arho2m(i,1) = buf[m++]; + meam_inst_kk->h_arho2m(i,2) = buf[m++]; + meam_inst_kk->h_arho2m(i,3) = buf[m++]; + meam_inst_kk->h_arho2m(i,4) = buf[m++]; + meam_inst_kk->h_arho2m(i,5) = buf[m++]; + for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(i,k) = buf[m++]; + meam_inst_kk->h_arho3mb(i,0) = buf[m++]; + meam_inst_kk->h_arho3mb(i,1) = buf[m++]; + meam_inst_kk->h_arho3mb(i,2) = buf[m++]; + } } meam_inst_kk->k_rho0.modify_host(); @@ -536,6 +645,13 @@ void PairMEAMKokkos::unpack_forward_comm(int n, int first, double *b meam_inst_kk->k_arho3b.modify_host(); meam_inst_kk->k_t_ave.modify_host(); meam_inst_kk->k_tsq_ave.modify_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.modify_host(); + meam_inst_kk->k_arho1m.modify_host(); + meam_inst_kk->k_arho2m.modify_host(); + meam_inst_kk->k_arho3m.modify_host(); + meam_inst_kk->k_arho3mb.modify_host(); + } } /* ---------------------------------------------------------------------- */ @@ -546,7 +662,8 @@ int PairMEAMKokkos::pack_reverse_comm_kokkos(int n, int first_in, DA first = first_in; v_buf = buf.view(); Kokkos::parallel_for(Kokkos::RangePolicy(0,n),*this); - return n*30; + //return n*30; + return n*comm_reverse; } /* ---------------------------------------------------------------------- */ @@ -554,7 +671,8 @@ int PairMEAMKokkos::pack_reverse_comm_kokkos(int n, int first_in, DA template KOKKOS_INLINE_FUNCTION void PairMEAMKokkos::operator()(TagPairMEAMPackReverseComm, const int &i) const { - int m = i*30; + //int m = i*30; + int m = i*comm_reverse; v_buf[m++] = d_rho0[i+first]; v_buf[m++] = d_arho2b[i+first]; @@ -577,6 +695,22 @@ void PairMEAMKokkos::operator()(TagPairMEAMPackReverseComm, const in v_buf[m++] = d_tsq_ave(i+first,0); v_buf[m++] = d_tsq_ave(i+first,1); v_buf[m++] = d_tsq_ave(i+first,2); + if (msmeamflag) { + v_buf[m++] = d_arho2mb[i+first]; + v_buf[m++] = d_arho1m(i+first,0); + v_buf[m++] = d_arho1m(i+first,1); + v_buf[m++] = d_arho1m(i+first,2); + v_buf[m++] = d_arho2m(i+first,0); + v_buf[m++] = d_arho2m(i+first,1); + v_buf[m++] = d_arho2m(i+first,2); + v_buf[m++] = d_arho2m(i+first,3); + v_buf[m++] = d_arho2m(i+first,4); + v_buf[m++] = d_arho2m(i+first,5); + for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(i+first,k); + v_buf[m++] = d_arho3mb(i+first,0); + v_buf[m++] = d_arho3mb(i+first,1); + v_buf[m++] = d_arho3mb(i+first,2); + } } /* ---------------------------------------------------------------------- */ @@ -592,6 +726,13 @@ int PairMEAMKokkos::pack_reverse_comm(int n, int first, double *buf) meam_inst_kk->k_arho3b.sync_host(); meam_inst_kk->k_t_ave.sync_host(); meam_inst_kk->k_tsq_ave.sync_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.sync_host(); + meam_inst_kk->k_arho1m.sync_host(); + meam_inst_kk->k_arho2m.sync_host(); + meam_inst_kk->k_arho3m.sync_host(); + meam_inst_kk->k_arho3mb.sync_host(); + } int m = 0; const int last = first + n; @@ -617,6 +758,22 @@ int PairMEAMKokkos::pack_reverse_comm(int n, int first, double *buf) buf[m++] = meam_inst_kk->h_tsq_ave(i,0); buf[m++] = meam_inst_kk->h_tsq_ave(i,1); buf[m++] = meam_inst_kk->h_tsq_ave(i,2); + if (msmeamflag) { + buf[m++] = meam_inst_kk->h_arho2mb[i]; + buf[m++] = meam_inst_kk->h_arho1m(i,0); + buf[m++] = meam_inst_kk->h_arho1m(i,1); + buf[m++] = meam_inst_kk->h_arho1m(i,2); + buf[m++] = meam_inst_kk->h_arho2m(i,0); + buf[m++] = meam_inst_kk->h_arho2m(i,1); + buf[m++] = meam_inst_kk->h_arho2m(i,2); + buf[m++] = meam_inst_kk->h_arho2m(i,3); + buf[m++] = meam_inst_kk->h_arho2m(i,4); + buf[m++] = meam_inst_kk->h_arho2m(i,5); + for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(i,k); + buf[m++] = meam_inst_kk->h_arho3mb(i,0); + buf[m++] = meam_inst_kk->h_arho3mb(i,1); + buf[m++] = meam_inst_kk->h_arho3mb(i,2); + } } return m; @@ -639,7 +796,8 @@ template KOKKOS_INLINE_FUNCTION void PairMEAMKokkos::operator()(TagPairMEAMUnpackReverseComm, const int &i) const { int j = d_sendlist(iswap, i); - int m = i*30; + //int m = i*30; + int m = i*comm_reverse; d_rho0[j] += v_buf[m++]; d_arho2b[j] += v_buf[m++]; @@ -662,6 +820,22 @@ void PairMEAMKokkos::operator()(TagPairMEAMUnpackReverseComm, const d_tsq_ave(j,0) += v_buf[m++]; d_tsq_ave(j,1) += v_buf[m++]; d_tsq_ave(j,2) += v_buf[m++]; + if (msmeamflag) { + d_arho2mb[j] += v_buf[m++]; + d_arho1m(j,0) += v_buf[m++]; + d_arho1m(j,1) += v_buf[m++]; + d_arho1m(j,2) += v_buf[m++]; + d_arho2m(j,0) += v_buf[m++]; + d_arho2m(j,1) += v_buf[m++]; + d_arho2m(j,2) += v_buf[m++]; + d_arho2m(j,3) += v_buf[m++]; + d_arho2m(j,4) += v_buf[m++]; + d_arho2m(j,5) += v_buf[m++]; + for (int k = 0; k < 10; k++) d_arho3m(j,k) += v_buf[m++]; + d_arho3mb(j,0) += v_buf[m++]; + d_arho3mb(j,1) += v_buf[m++]; + d_arho3mb(j,2) += v_buf[m++]; + } } /* ---------------------------------------------------------------------- */ @@ -677,6 +851,13 @@ void PairMEAMKokkos::unpack_reverse_comm(int n, int *list, double *b meam_inst_kk->k_arho3b.sync_host(); meam_inst_kk->k_t_ave.sync_host(); meam_inst_kk->k_tsq_ave.sync_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.sync_host(); + meam_inst_kk->k_arho1m.sync_host(); + meam_inst_kk->k_arho2m.sync_host(); + meam_inst_kk->k_arho3m.sync_host(); + meam_inst_kk->k_arho3mb.sync_host(); + } int m = 0; for (int i = 0; i < n; i++) { @@ -702,6 +883,22 @@ void PairMEAMKokkos::unpack_reverse_comm(int n, int *list, double *b meam_inst_kk->h_tsq_ave(j,0) += buf[m++]; meam_inst_kk->h_tsq_ave(j,1) += buf[m++]; meam_inst_kk->h_tsq_ave(j,2) += buf[m++]; + if (msmeamflag) { + meam_inst_kk->h_arho2mb[j] += buf[m++]; + meam_inst_kk->h_arho1m(j,0) += buf[m++]; + meam_inst_kk->h_arho1m(j,1) += buf[m++]; + meam_inst_kk->h_arho1m(j,2) += buf[m++]; + meam_inst_kk->h_arho2m(j,0) += buf[m++]; + meam_inst_kk->h_arho2m(j,1) += buf[m++]; + meam_inst_kk->h_arho2m(j,2) += buf[m++]; + meam_inst_kk->h_arho2m(j,3) += buf[m++]; + meam_inst_kk->h_arho2m(j,4) += buf[m++]; + meam_inst_kk->h_arho2m(j,5) += buf[m++]; + for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(j,k) += buf[m++]; + meam_inst_kk->h_arho3mb(j,0) += buf[m++]; + meam_inst_kk->h_arho3mb(j,1) += buf[m++]; + meam_inst_kk->h_arho3mb(j,2) += buf[m++]; + } } meam_inst_kk->k_rho0.modify_host(); @@ -712,6 +909,13 @@ void PairMEAMKokkos::unpack_reverse_comm(int n, int *list, double *b meam_inst_kk->k_arho3b.modify_host(); meam_inst_kk->k_t_ave.modify_host(); meam_inst_kk->k_tsq_ave.modify_host(); + if (msmeamflag) { + meam_inst_kk->k_arho2mb.modify_host(); + meam_inst_kk->k_arho1m.modify_host(); + meam_inst_kk->k_arho2m.modify_host(); + meam_inst_kk->k_arho3m.modify_host(); + meam_inst_kk->k_arho3mb.modify_host(); + } } /* ---------------------------------------------------------------------- @@ -764,6 +968,12 @@ void PairMEAMKokkos::update_meam_views() d_arho3b = meam_inst_kk->d_arho3b; d_t_ave = meam_inst_kk->d_t_ave; d_tsq_ave = meam_inst_kk->d_tsq_ave; + // msmeam + d_arho1m = meam_inst_kk->d_arho1m; + d_arho2m = meam_inst_kk->d_arho2m; + d_arho3m = meam_inst_kk->d_arho3m; + d_arho2mb = meam_inst_kk->d_arho2mb; + d_arho3mb = meam_inst_kk->d_arho3mb; } /* ---------------------------------------------------------------------- */ diff --git a/src/KOKKOS/pair_meam_kokkos.h b/src/KOKKOS/pair_meam_kokkos.h index c5fe82fa79..0d0d7667f3 100644 --- a/src/KOKKOS/pair_meam_kokkos.h +++ b/src/KOKKOS/pair_meam_kokkos.h @@ -13,12 +13,12 @@ #ifdef PAIR_CLASS // clang-format off -PairStyle(meam/c/kk,PairMEAMKokkos) -PairStyle(meam/c/kk/device,PairMEAMKokkos) -PairStyle(meam/c/kk/host,PairMEAMKokkos) -PairStyle(meam/kk,PairMEAMKokkos) -PairStyle(meam/kk/device,PairMEAMKokkos) -PairStyle(meam/kk/host,PairMEAMKokkos) +PairStyle(meam/c/kk,PairMEAMKokkos); +PairStyle(meam/c/kk/device,PairMEAMKokkos); +PairStyle(meam/c/kk/host,PairMEAMKokkos); +PairStyle(meam/kk,PairMEAMKokkos); +PairStyle(meam/kk/device,PairMEAMKokkos); +PairStyle(meam/kk/host,PairMEAMKokkos); // clang-format on #else @@ -117,6 +117,9 @@ class PairMEAMKokkos : public PairMEAM, public KokkosBase { typename ArrayTypes::t_ffloat_1d d_rho, d_rho0, d_rho1, d_rho2, d_rho3, d_frhop; typename ArrayTypes::t_ffloat_1d d_gamma, d_dgamma1, d_dgamma2, d_dgamma3, d_arho2b; typename ArrayTypes::t_ffloat_2d d_arho1, d_arho2, d_arho3, d_arho3b, d_t_ave, d_tsq_ave; + // msmeam params + typename ArrayTypes::t_ffloat_1d d_arho2mb; + typename ArrayTypes::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb; void update_meam_views(); diff --git a/src/KOKKOS/pair_meam_ms_kokkos.cpp b/src/KOKKOS/pair_meam_ms_kokkos.cpp new file mode 100644 index 0000000000..491fc0273c --- /dev/null +++ b/src/KOKKOS/pair_meam_ms_kokkos.cpp @@ -0,0 +1,32 @@ +/* ---------------------------------------------------------------------- + LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator + https://www.lammps.org/ Sandia National Laboratories + LAMMPS development team: developers@lammps.org + + Copyright (2003) Sandia Corporation. Under the terms of Contract + DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains + certain rights in this software. This software is distributed under + the GNU General Public License. + + See the README file in the top-level LAMMPS directory. +------------------------------------------------------------------------- */ + +#include "pair_meam_ms_kokkos.h" +#include "meam.h" + +using namespace LAMMPS_NS; + +/* ---------------------------------------------------------------------- */ +template +PairMEAMMSKokkos::PairMEAMMSKokkos(LAMMPS *lmp) : PairMEAMKokkos(lmp) +{ + this->meam_inst->msmeamflag = this->msmeamflag = 1; + this->myname = "meam/ms/kk"; +} + +namespace LAMMPS_NS { +template class PairMEAMMSKokkos; +#ifdef KOKKOS_ENABLE_CUDA +template class PairMEAMMSKokkos; +#endif +} diff --git a/src/KOKKOS/pair_meam_ms_kokkos.h b/src/KOKKOS/pair_meam_ms_kokkos.h new file mode 100644 index 0000000000..a2cefc2c16 --- /dev/null +++ b/src/KOKKOS/pair_meam_ms_kokkos.h @@ -0,0 +1,36 @@ +/* -*- c++ -*- ---------------------------------------------------------- + LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator + https://www.lammps.org/ Sandia National Laboratories + LAMMPS development team: developers@lammps.org + + Copyright (2003) Sandia Corporation. Under the terms of Contract + DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains + certain rights in this software. This software is distributed under + the GNU General Public License. + + See the README file in the top-level LAMMPS directory. +------------------------------------------------------------------------- */ + +#ifdef PAIR_CLASS +// clang-format off +PairStyle(meam/ms/kk,PairMEAMMSKokkos); +PairStyle(meam/ms/kk/device,PairMEAMMSKokkos); +PairStyle(meam/ms/kk/host,PairMEAMMSKokkos); +// clang-format on +#else + +#ifndef LMP_PAIR_MEAM_MS_KOKKOS_H +#define LMP_PAIR_MEAM_MS_KOKKOS_H + +#include "pair_meam_kokkos.h" + +namespace LAMMPS_NS { + +template +class PairMEAMMSKokkos : public PairMEAMKokkos { + public: + PairMEAMMSKokkos(class LAMMPS *); +}; +} // namespace LAMMPS_NS +#endif +#endif diff --git a/src/MEAM/meam.h b/src/MEAM/meam.h index 9ec7de3426..5a131bdc34 100644 --- a/src/MEAM/meam.h +++ b/src/MEAM/meam.h @@ -17,7 +17,7 @@ #include #include -#define maxelt 5 +constexpr int maxelt = 5; namespace LAMMPS_NS { class Memory; @@ -30,6 +30,7 @@ class MEAM { virtual ~MEAM(); int copymode; + int msmeamflag; protected: Memory *memory; @@ -74,6 +75,12 @@ class MEAM { // vind[23]D = Voight notation index maps for 2 and 3D // v2D,v3D = array of factors to apply for Voight notation + // MS-MEAM parameters + + // msmeamflag = flag to activate MS-MEAM + // betam[1-3]_meam = MS-MEAM electron density constants + // tm[1-3]_meam = MS-MEAM coefficients on densities in Gamma computation + // nr,dr = pair function discretization parameters // nrar,rdrar = spline coeff array parameters @@ -115,12 +122,22 @@ class MEAM { int nr, nrar; double dr, rdrar; + // MS-MEAM parameters + + double t1m_meam[maxelt], t2m_meam[maxelt], t3m_meam[maxelt]; + double beta1m_meam[maxelt], beta2m_meam[maxelt], beta3m_meam[maxelt]; + //int msmeamflag; // made public for pair style settings + public: int nmax; double *rho, *rho0, *rho1, *rho2, *rho3, *frhop; double *gamma, *dgamma1, *dgamma2, *dgamma3, *arho2b; double **arho1, **arho2, **arho3, **arho3b, **t_ave, **tsq_ave; + // MS-MEAM arrays + + double **arho1m, **arho2m, *arho2mb, **arho3m, **arho3mb; + int maxneigh; double *scrfcn, *dscrfcn, *fcpair; @@ -242,7 +259,7 @@ class MEAM { double, double, double, double, double, int, int, lattice_t); void get_sijk(double, int, int, int, double *); void get_densref(double, int, int, double *, double *, double *, double *, double *, double *, - double *, double *); + double *, double *, double *, double *, double *, double *, double *, double *); // last 6 args for msmeam void interpolate_meam(int); public: @@ -282,10 +299,12 @@ class MEAM { } // clang-format on static int get_Zij(const lattice_t latt); + // last 6 args are optional msmeam parameters void meam_setup_global(int nelt, lattice_t *lat, int *ielement, double *atwt, double *alpha, double *b0, double *b1, double *b2, double *b3, double *alat, double *esub, double *asub, double *t0, double *t1, double *t2, double *t3, - double *rozero, int *ibar); + double *rozero, int *ibar, double *b1m, double *b2m, double *b3m, + double *t1m, double *t2m, double *t3m); void meam_setup_param(int which, double value, int nindex, int *index /*index(3)*/, int *errorflag); virtual void meam_setup_done(double *cutmax); diff --git a/src/MEAM/meam_dens_final.cpp b/src/MEAM/meam_dens_final.cpp index cf964a4724..ab0ac8c53f 100644 --- a/src/MEAM/meam_dens_final.cpp +++ b/src/MEAM/meam_dens_final.cpp @@ -27,115 +27,222 @@ MEAM::meam_dens_final(int nlocal, int eflag_either, int eflag_global, int eflag_ // Complete the calculation of density - for (i = 0; i < nlocal; i++) { - elti = fmap[type[i]]; - if (elti >= 0) { - scaleii = scale[type[i]][type[i]]; - rho1[i] = 0.0; - rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i]; - rho3[i] = 0.0; - for (m = 0; m < 3; m++) { - rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m]; - rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m]; - } - for (m = 0; m < 6; m++) { - rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m]; - } - for (m = 0; m < 10; m++) { - rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m]; - } - - if (rho0[i] > 0.0) { - if (this->ialloy == 1) { - t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]); - t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]); - t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]); - } else if (this->ialloy == 2) { - t_ave[i][0] = this->t1_meam[elti]; - t_ave[i][1] = this->t2_meam[elti]; - t_ave[i][2] = this->t3_meam[elti]; - } else { - t_ave[i][0] = t_ave[i][0] / rho0[i]; - t_ave[i][1] = t_ave[i][1] / rho0[i]; - t_ave[i][2] = t_ave[i][2] / rho0[i]; + if (this->msmeamflag) { + for (i = 0; i < nlocal; i++) { + elti = fmap[type[i]]; + if (elti >= 0) { + scaleii = scale[type[i]][type[i]]; + rho1[i] = 0.0; + rho2[i] = -1.0 / 3.0 * (arho2b[i] * arho2b[i] + - arho2mb[i] * arho2mb[i]); + rho3[i] = 0.0; + for (m = 0; m < 3; m++) { + rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m] + - arho1m[i][m] * arho1m[i][m]; + rho3[i] = rho3[i] - 3.0 / 5.0 * (arho3b[i][m] * arho3b[i][m] + - arho3mb[i][m] * arho3mb[i][m]); } - } - - gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i]; - - if (rho0[i] > 0.0) { - gamma[i] = gamma[i] / (rho0[i] * rho0[i]); - } - - Z = get_Zij(this->lattce_meam[elti][elti]); - - G = G_gam(gamma[i], this->ibar_meam[elti], errorflag); - if (errorflag != 0) - return; - - get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp); - - if (this->ibar_meam[elti] <= 0) { - Gbar = 1.0; - dGbar = 0.0; - } else { - if (this->mix_ref_t == 1) { - gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); - } else { - gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) / - (Z * Z); + for (m = 0; m < 6; m++) { + rho2[i] = rho2[i] + this->v2D[m] * (arho2[i][m] * arho2[i][m] + - arho2m[i][m] * arho2m[i][m]); } - Gbar = G_gam(gam, this->ibar_meam[elti], errorflag); - } - rho[i] = rho0[i] * G; - if (this->mix_ref_t == 1) { + for (m = 0; m < 10; m++) { + rho3[i] = rho3[i] + this->v3D[m] * (arho3[i][m] * arho3[i][m] + - arho3m[i][m] * arho3m[i][m]); + } + + // all the t weights are already accounted for with msmeam + gamma[i] = rho1[i] + rho2[i] + rho3[i]; + + if (rho0[i] > 0.0) { + gamma[i] = gamma[i] / (rho0[i] * rho0[i]); + } + + Z = get_Zij(this->lattce_meam[elti][elti]); + + G = G_gam(gamma[i], this->ibar_meam[elti], errorflag); + if (errorflag != 0) + return; + + get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp); + if (this->ibar_meam[elti] <= 0) { Gbar = 1.0; dGbar = 0.0; } else { - gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); - Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar); + if (this->mix_ref_t == 1) { + gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); + } else { + gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) / + (Z * Z); + } + Gbar = G_gam(gam, this->ibar_meam[elti], errorflag); } - rho_bkgd = this->rho0_meam[elti] * Z * Gbar; - } else { - if (this->bkgd_dyn == 1) { - rho_bkgd = this->rho0_meam[elti] * Z; + rho[i] = rho0[i] * G; + + if (this->mix_ref_t == 1) { + if (this->ibar_meam[elti] <= 0) { + Gbar = 1.0; + dGbar = 0.0; + } else { + gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); + Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar); + } + rho_bkgd = this->rho0_meam[elti] * Z * Gbar; } else { - rho_bkgd = this->rho_ref_meam[elti]; + if (this->bkgd_dyn == 1) { + rho_bkgd = this->rho0_meam[elti] * Z; + } else { + rho_bkgd = this->rho_ref_meam[elti]; + } + } + rhob = rho[i] / rho_bkgd; + denom = 1.0 / rho_bkgd; + + G = dG_gam(gamma[i], this->ibar_meam[elti], dG); + + dgamma1[i] = (G - 2 * dG * gamma[i]) * denom; + + if (!iszero(rho0[i])) { + dgamma2[i] = (dG / rho0[i]) * denom; + } else { + dgamma2[i] = 0.0; + } + + // dgamma3 is nonzero only if we are using the "mixed" rule for + // computing t in the reference system (which is not correct, but + // included for backward compatibility + if (this->mix_ref_t == 1) { + dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom; + } else { + dgamma3[i] = 0.0; + } + + Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]); + if (eflag_either != 0) { + Fl *= scaleii; + if (eflag_global != 0) { + *eng_vdwl = *eng_vdwl + Fl; + } + if (eflag_atom != 0) { + eatom[i] = eatom[i] + Fl; + } } } - rhob = rho[i] / rho_bkgd; - denom = 1.0 / rho_bkgd; - - G = dG_gam(gamma[i], this->ibar_meam[elti], dG); - - dgamma1[i] = (G - 2 * dG * gamma[i]) * denom; - - if (!iszero(rho0[i])) { - dgamma2[i] = (dG / rho0[i]) * denom; - } else { - dgamma2[i] = 0.0; - } - - // dgamma3 is nonzero only if we are using the "mixed" rule for - // computing t in the reference system (which is not correct, but - // included for backward compatibility - if (this->mix_ref_t == 1) { - dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom; - } else { - dgamma3[i] = 0.0; - } - - Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]); - - if (eflag_either != 0) { - Fl *= scaleii; - if (eflag_global != 0) { - *eng_vdwl = *eng_vdwl + Fl; + } + } else { + for (i = 0; i < nlocal; i++) { + elti = fmap[type[i]]; + if (elti >= 0) { + scaleii = scale[type[i]][type[i]]; + rho1[i] = 0.0; + rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i]; + rho3[i] = 0.0; + for (m = 0; m < 3; m++) { + rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m]; + rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m]; } - if (eflag_atom != 0) { - eatom[i] = eatom[i] + Fl; + for (m = 0; m < 6; m++) { + rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m]; + } + for (m = 0; m < 10; m++) { + rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m]; + } + + if (rho0[i] > 0.0) { + if (this->ialloy == 1) { + t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]); + t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]); + t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]); + } else if (this->ialloy == 2) { + t_ave[i][0] = this->t1_meam[elti]; + t_ave[i][1] = this->t2_meam[elti]; + t_ave[i][2] = this->t3_meam[elti]; + } else { + t_ave[i][0] = t_ave[i][0] / rho0[i]; + t_ave[i][1] = t_ave[i][1] / rho0[i]; + t_ave[i][2] = t_ave[i][2] / rho0[i]; + } + } + + gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i]; + + if (rho0[i] > 0.0) { + gamma[i] = gamma[i] / (rho0[i] * rho0[i]); + } + + Z = get_Zij(this->lattce_meam[elti][elti]); + + G = G_gam(gamma[i], this->ibar_meam[elti], errorflag); + if (errorflag != 0) + return; + + get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp); + + if (this->ibar_meam[elti] <= 0) { + Gbar = 1.0; + dGbar = 0.0; + } else { + if (this->mix_ref_t == 1) { + gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); + } else { + gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) / + (Z * Z); + } + Gbar = G_gam(gam, this->ibar_meam[elti], errorflag); + } + rho[i] = rho0[i] * G; + + if (this->mix_ref_t == 1) { + if (this->ibar_meam[elti] <= 0) { + Gbar = 1.0; + dGbar = 0.0; + } else { + gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z); + Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar); + } + rho_bkgd = this->rho0_meam[elti] * Z * Gbar; + } else { + if (this->bkgd_dyn == 1) { + rho_bkgd = this->rho0_meam[elti] * Z; + } else { + rho_bkgd = this->rho_ref_meam[elti]; + } + } + rhob = rho[i] / rho_bkgd; + denom = 1.0 / rho_bkgd; + + G = dG_gam(gamma[i], this->ibar_meam[elti], dG); + + dgamma1[i] = (G - 2 * dG * gamma[i]) * denom; + + if (!iszero(rho0[i])) { + dgamma2[i] = (dG / rho0[i]) * denom; + } else { + dgamma2[i] = 0.0; + } + + // dgamma3 is nonzero only if we are using the "mixed" rule for + // computing t in the reference system (which is not correct, but + // included for backward compatibility + if (this->mix_ref_t == 1) { + dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom; + } else { + dgamma3[i] = 0.0; + } + + Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]); + + if (eflag_either != 0) { + Fl *= scaleii; + if (eflag_global != 0) { + *eng_vdwl = *eng_vdwl + Fl; + } + if (eflag_atom != 0) { + eatom[i] = eatom[i] + Fl; + + } } } } diff --git a/src/MEAM/meam_dens_init.cpp b/src/MEAM/meam_dens_init.cpp index b60e1a7a17..00ad276ad7 100644 --- a/src/MEAM/meam_dens_init.cpp +++ b/src/MEAM/meam_dens_init.cpp @@ -45,6 +45,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh) memory->destroy(arho3b); memory->destroy(t_ave); memory->destroy(tsq_ave); + // msmeam params + if (this->msmeamflag) { + memory->destroy(arho1m); + memory->destroy(arho2m); + memory->destroy(arho3m); + memory->destroy(arho2mb); + memory->destroy(arho3mb); + } nmax = atom_nmax; @@ -65,6 +73,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh) memory->create(arho3b, nmax, 3, "pair:arho3b"); memory->create(t_ave, nmax, 3, "pair:t_ave"); memory->create(tsq_ave, nmax, 3, "pair:tsq_ave"); + // msmeam params + if (this->msmeamflag) { + memory->create(arho1m, nmax, 3, "pair:arho1m"); + memory->create(arho2m, nmax, 6, "pair:arho2m"); + memory->create(arho3m, nmax, 10, "pair:arho3m"); + memory->create(arho2mb, nmax, "pair:arho2mb"); + memory->create(arho3mb, nmax, 3, "pair:arho3mb"); + } } if (n_neigh > maxneigh) { @@ -83,14 +99,30 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh) rho0[i] = 0.0; arho2b[i] = 0.0; arho1[i][0] = arho1[i][1] = arho1[i][2] = 0.0; - for (j = 0; j < 6; j++) + if (this->msmeamflag) { + arho2mb[i] = 0.0; + arho1m[i][0] = arho1m[i][1] = arho1m[i][2] = 0.0; + } + for (j = 0; j < 6; j++) { arho2[i][j] = 0.0; - for (j = 0; j < 10; j++) + if (this->msmeamflag) { + arho2m[i][j] = 0.0; + } + } + for (j = 0; j < 10; j++) { arho3[i][j] = 0.0; + if (this->msmeamflag) { + arho3m[i][j] = 0.0; + } + } arho3b[i][0] = arho3b[i][1] = arho3b[i][2] = 0.0; + if (this->msmeamflag) { + arho3mb[i][0] = arho3mb[i][1] = arho3mb[i][2] = 0.0; + } t_ave[i][0] = t_ave[i][1] = t_ave[i][2] = 0.0; tsq_ave[i][0] = tsq_ave[i][1] = tsq_ave[i][2] = 0.0; } + } void @@ -282,6 +314,9 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn // double G,Gbar,gam,shp[3+1]; double ro0i, ro0j; double rhoa0i, rhoa1i, rhoa2i, rhoa3i, A1i, A2i, A3i; + // msmeam params + double rhoa1mj, rhoa2mj, rhoa3mj, A1mj, A2mj, A3mj; + double rhoa1mi, rhoa2mi, rhoa3mi, A1mi, A2mi, A3mi; elti = fmap[type[i]]; xtmp = x[i][0]; @@ -306,10 +341,20 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn rhoa1j = ro0j * MathSpecial::fm_exp(-this->beta1_meam[eltj] * aj) * sij; rhoa2j = ro0j * MathSpecial::fm_exp(-this->beta2_meam[eltj] * aj) * sij; rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj) * sij; + if (this->msmeamflag){ + rhoa1mj = ro0j * this->t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj) * sij; + rhoa2mj = ro0j * this->t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj) * sij; + rhoa3mj = ro0j * this->t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj) * sij; + } rhoa0i = ro0i * MathSpecial::fm_exp(-this->beta0_meam[elti] * ai) * sij; rhoa1i = ro0i * MathSpecial::fm_exp(-this->beta1_meam[elti] * ai) * sij; rhoa2i = ro0i * MathSpecial::fm_exp(-this->beta2_meam[elti] * ai) * sij; rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai) * sij; + if (this->msmeamflag){ + rhoa1mi = ro0i * this->t1m_meam[elti] * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * sij; + rhoa2mi = ro0i * this->t2m_meam[elti] * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * sij; + rhoa3mi = ro0i * this->t3m_meam[elti] * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * sij; + } if (this->ialloy == 1) { rhoa1j = rhoa1j * this->t1_meam[eltj]; rhoa2j = rhoa2j * this->t2_meam[eltj]; @@ -321,6 +366,7 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn rho0[i] = rho0[i] + rhoa0j; rho0[j] = rho0[j] + rhoa0i; // For ialloy = 2, use single-element value (not average) + // For ialloy = 2, use single-element value (not average) if (this->ialloy != 2) { t_ave[i][0] = t_ave[i][0] + this->t1_meam[eltj] * rhoa0j; t_ave[i][1] = t_ave[i][1] + this->t2_meam[eltj] * rhoa0j; @@ -348,18 +394,42 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn A3i = rhoa3i / (rij2 * rij); nv2 = 0; nv3 = 0; + if (this->msmeamflag) { + arho2mb[i] = arho2mb[i] + rhoa2mj; + arho2mb[j] = arho2mb[j] + rhoa2mi; + A1mj = rhoa1mj/rij; + A2mj = rhoa2mj/rij2; + A3mj = rhoa3mj/(rij2*rij); + A1mi = rhoa1mi/rij; + A2mi = rhoa2mi/rij2; + A3mi = rhoa3mi/(rij2*rij); + } for (m = 0; m < 3; m++) { arho1[i][m] = arho1[i][m] + A1j * delij[m]; arho1[j][m] = arho1[j][m] - A1i * delij[m]; arho3b[i][m] = arho3b[i][m] + rhoa3j * delij[m] / rij; arho3b[j][m] = arho3b[j][m] - rhoa3i * delij[m] / rij; + if (this->msmeamflag) { + arho1m[i][m] = arho1m[i][m] + A1mj*delij[m]; + arho1m[j][m] = arho1m[j][m] - A1mi*delij[m]; + arho3mb[i][m] = arho3mb[i][m] + rhoa3mj*delij[m] / rij; + arho3mb[j][m] = arho3mb[j][m] - rhoa3mi*delij[m] / rij; + } for (n = m; n < 3; n++) { arho2[i][nv2] = arho2[i][nv2] + A2j * delij[m] * delij[n]; arho2[j][nv2] = arho2[j][nv2] + A2i * delij[m] * delij[n]; + if (this->msmeamflag) { + arho2m[i][nv2] = arho2m[i][nv2] + A2mj*delij[m] * delij[n]; + arho2m[j][nv2] = arho2m[j][nv2] + A2mi*delij[m] * delij[n]; + } nv2 = nv2 + 1; for (p = n; p < 3; p++) { arho3[i][nv3] = arho3[i][nv3] + A3j * delij[m] * delij[n] * delij[p]; arho3[j][nv3] = arho3[j][nv3] - A3i * delij[m] * delij[n] * delij[p]; + if (this->msmeamflag) { + arho3m[i][nv3] = arho3m[i][nv3] + A3mj*delij[m]*delij[n]*delij[p]; + arho3m[j][nv3] = arho3m[j][nv3] - A3mi*delij[m]*delij[n]*delij[p]; + } nv3 = nv3 + 1; } } diff --git a/src/MEAM/meam_force.cpp b/src/MEAM/meam_force.cpp index acc3d5672a..4bc7380898 100644 --- a/src/MEAM/meam_force.cpp +++ b/src/MEAM/meam_force.cpp @@ -61,6 +61,17 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int double t1i, t2i, t3i, t1j, t2j, t3j; double scaleij; + double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi; + double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi; + double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi; + double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m; + double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2; + double drho1mdrm1[3], drho1mdrm2[3]; + double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2; + double drho2mdrm1[3], drho2mdrm2[3]; + double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2; + double drho3mdrm1[3], drho3mdrm2[3]; + third = 1.0 / 3.0; sixth = 1.0 / 6.0; @@ -74,6 +85,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int zitmp = x[i][2]; // Treat each pair + for (jn = 0; jn < numneigh; jn++) { j = firstneigh[jn]; eltj = fmap[type[j]]; @@ -89,7 +101,6 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int if (rij2 < this->cutforcesq) { rij = sqrt(rij2); recip = 1.0 / rij; - // Compute phi and phip ind = this->eltind[elti][eltj]; pp = rij * this->rdrar; @@ -114,6 +125,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int // write(1,*) "force_meamf: phip: ",phip // Compute pair densities and derivatives + invrei = 1.0 / this->re_meam[elti][elti]; ai = rij * invrei - 1.0; ro0i = this->rho0_meam[elti]; @@ -126,6 +138,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai); drhoa3i = -this->beta3_meam[elti] * invrei * rhoa3i; + if (this->msmeamflag) { + rhoa1mi = ro0i * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * t1m_meam[elti]; + drhoa1mi = -this->beta1m_meam[elti] * invrei * rhoa1mi; + rhoa2mi = ro0i * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * t2m_meam[elti]; + drhoa2mi = -this->beta2m_meam[elti] * invrei * rhoa2mi; + rhoa3mi = ro0i * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * t3m_meam[elti]; + drhoa3mi = -this->beta3m_meam[elti] * invrei * rhoa3mi; + } + if (elti != eltj) { invrej = 1.0 / this->re_meam[eltj][eltj]; aj = rij * invrej - 1.0; @@ -138,6 +159,16 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int drhoa2j = -this->beta2_meam[eltj] * invrej * rhoa2j; rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj); drhoa3j = -this->beta3_meam[eltj] * invrej * rhoa3j; + + if (this->msmeamflag) { + rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj); + drhoa1mj = -this->beta1m_meam[eltj] * invrej * rhoa1mj; + rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj); + drhoa2mj = -this->beta2m_meam[eltj] * invrej * rhoa2mj; + rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj); + drhoa3mj = -this->beta3m_meam[eltj] * invrej * rhoa3mj; + } + } else { rhoa0j = rhoa0i; drhoa0j = drhoa0i; @@ -147,6 +178,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int drhoa2j = drhoa2i; rhoa3j = rhoa3i; drhoa3j = drhoa3i; + + if (this->msmeamflag) { + rhoa1mj = rhoa1mi; + drhoa1mj = drhoa1mi; + rhoa2mj = rhoa2mi; + drhoa2mj = drhoa2mi; + rhoa3mj = rhoa3mi; + drhoa3mj = drhoa3mi; + } } const double t1mi = this->t1_meam[elti]; @@ -156,7 +196,10 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int const double t2mj = this->t2_meam[eltj]; const double t3mj = this->t3_meam[eltj]; - if (this->ialloy == 1) { + // ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t. + // We did this above with rhoa1mj, rhoa2mj, etc. + + if (this->ialloy == 1 || this->msmeamflag) { rhoa1j *= t1mj; rhoa2j *= t2mj; rhoa3j *= t3mj; @@ -200,6 +243,39 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int arg3j3 = arg3j3 - arho3b[j][n] * delij[n]; } + // msmeam arhom args + + nv2 = 0; + nv3 = 0; + arg1i1m = 0.0; + arg1j1m = 0.0; + arg1i2m = 0.0; + arg1j2m = 0.0; + arg1i3m = 0.0; + arg1j3m = 0.0; + arg3i3m = 0.0; + arg3j3m = 0.0; + if (this->msmeamflag) { + for (n = 0; n < 3; n++) { + for (p = n; p < 3; p++) { + for (q = p; q < 3; q++) { + arg = delij[n] * delij[p] * delij[q] * this->v3D[nv3]; + arg1i3m = arg1i3m - arho3m[i][nv3] * arg; + arg1j3m = arg1j3m + arho3m[j][nv3] * arg; + nv3 = nv3 + 1; + } + arg = delij[n] * delij[p] * this->v2D[nv2]; + arg1i2m = arg1i2m + arho2m[i][nv2] * arg; + arg1j2m = arg1j2m + arho2m[j][nv2] * arg; + nv2 = nv2 + 1; + } + arg1i1m = arg1i1m - arho1m[i][n] * delij[n]; + arg1j1m = arg1j1m + arho1m[j][n] * delij[n]; + arg3i3m = arg3i3m - arho3mb[i][n] * delij[n]; + arg3j3m = arg3j3m + arho3mb[j][n] * delij[n]; + } + } + // rho0 terms drho0dr1 = drhoa0j * sij; drho0dr2 = drhoa0i * sij; @@ -254,32 +330,83 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * arho3b[j][m]) * rhoa3i; } - // Compute derivatives of weighting functions t wrt rij - t1i = t_ave[i][0]; - t2i = t_ave[i][1]; - t3i = t_ave[i][2]; - t1j = t_ave[j][0]; - t2j = t_ave[j][1]; - t3j = t_ave[j][2]; + if (this->msmeamflag) { + // rho1m terms + a1 = 2 * sij / rij; + drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m; + drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m; + drho1mdr1 *= -1.0; + drho1mdr2 *= -1.0; + a1 = 2.0 * sij / rij; + for (m = 0; m < 3; m++) { + drho1mdrm1[m] = a1 * rhoa1mj * arho1m[i][m]; + drho1mdrm2[m] = -a1 * rhoa1mi * arho1m[j][m]; + } - if (this->ialloy == 1) { + // rho2m terms + a2 = 2 * sij / rij2; + drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * arho2mb[i] * drhoa2mj * sij; + drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * arho2mb[j] * drhoa2mi * sij; + a2 = 4 * sij / rij2; + for (m = 0; m < 3; m++) { + drho2mdrm1[m] = 0.0; + drho2mdrm2[m] = 0.0; + for (n = 0; n < 3; n++) { + drho2mdrm1[m] += arho2m[i][this->vind2D[m][n]] * delij[n]; + drho2mdrm2[m] -= arho2m[j][this->vind2D[m][n]] * delij[n]; + } + drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m]; + drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m]; + } - a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]); - a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]); - a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]); - a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]); - a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]); - a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]); + // rho3m terms + rij3 = rij * rij2; + a3 = 2 * sij / rij3; + a3a = 6.0 / 5.0 * sij / rij; + drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m; + drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m; + drho3mdr1 *= -1.0; + drho3mdr2 *= -1.0; - dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj)); - dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi)); - dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj)); - dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi)); - dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj)); - dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi)); + a3 = 6 * sij / rij3; + a3a = 6 * sij / (5 * rij); + for (m = 0; m < 3; m++) { + drho3mdrm1[m] = 0.0; + drho3mdrm2[m] = 0.0; + nv2 = 0; + for (n = 0; n < 3; n++) { + for (p = n; p < 3; p++) { + arg = delij[n] * delij[p] * this->v2D[nv2]; + drho3mdrm1[m] += arho3m[i][this->vind3D[m][n][p]] * arg; + drho3mdrm2[m] += arho3m[j][this->vind3D[m][n][p]] * arg; + nv2 = nv2 + 1; + } + } + drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * arho3mb[i][m]) * rhoa3mj; + drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * arho3mb[j][m]) * rhoa3mi; + } + } else { + for (m = 0; m < 3; m++) { + drho1mdrm1[m] = 0.0; + drho1mdrm2[m] = 0.0; + drho2mdrm1[m] = 0.0; + drho2mdrm2[m] = 0.0; + drho3mdrm1[m] = 0.0; + drho3mdrm2[m] = 0.0; + } + } - } else if (this->ialloy == 2) { + // compute derivatives of weighting functions t wrt rij + // weighting functions t set to unity for MS-MEAM + if (this->msmeamflag) { + + t1i = 1.0; + t2i = 1.0; + t3i = 1.0; + t1j = 1.0; + t2j = 1.0; + t3j = 1.0; dt1dr1 = 0.0; dt1dr2 = 0.0; dt2dr1 = 0.0; @@ -289,38 +416,98 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int } else { - ai = 0.0; - if (!iszero(rho0[i])) - ai = drhoa0j * sij / rho0[i]; - aj = 0.0; - if (!iszero(rho0[j])) - aj = drhoa0i * sij / rho0[j]; + t1i = t_ave[i][0]; + t2i = t_ave[i][1]; + t3i = t_ave[i][2]; + t1j = t_ave[j][0]; + t2j = t_ave[j][1]; + t3j = t_ave[j][2]; + + if (this->ialloy == 1) { + + a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]); + a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]); + a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]); + a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]); + a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]); + a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]); + + dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj)); + dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi)); + dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj)); + dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi)); + dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj)); + dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi)); + + } else if (this->ialloy == 2) { + + dt1dr1 = 0.0; + dt1dr2 = 0.0; + dt2dr1 = 0.0; + dt2dr2 = 0.0; + dt3dr1 = 0.0; + dt3dr2 = 0.0; + + } else { + + ai = 0.0; + if (!iszero(rho0[i])) + ai = drhoa0j * sij / rho0[i]; + aj = 0.0; + if (!iszero(rho0[j])) + aj = drhoa0i * sij / rho0[j]; + + dt1dr1 = ai * (t1mj - t1i); + dt1dr2 = aj * (t1mi - t1j); + dt2dr1 = ai * (t2mj - t2i); + dt2dr2 = aj * (t2mi - t2j); + dt3dr1 = ai * (t3mj - t3i); + dt3dr2 = aj * (t3mi - t3j); + } - dt1dr1 = ai * (t1mj - t1i); - dt1dr2 = aj * (t1mi - t1j); - dt2dr1 = ai * (t2mj - t2i); - dt2dr2 = aj * (t2mi - t2j); - dt3dr1 = ai * (t3mj - t3i); - dt3dr2 = aj * (t3mi - t3j); } // Compute derivatives of total density wrt rij, sij and rij(3) get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpi); get_shpfcn(this->lattce_meam[eltj][eltj], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpj); - drhodr1 = dgamma1[i] * drho0dr1 + - dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 + - dt3dr1 * rho3[i] + t3i * drho3dr1) - - dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); - drhodr2 = dgamma1[j] * drho0dr2 + - dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 + - dt3dr2 * rho3[j] + t3j * drho3dr2) - - dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); - for (m = 0; m < 3; m++) { - drhodrm1[m] = 0.0; - drhodrm2[m] = 0.0; - drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]); - drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]); + if (this->msmeamflag) { + drhodr1 = dgamma1[i] * drho0dr1 + + dgamma2[i] * (dt1dr1 * rho1[i] + t1i * (drho1dr1 - drho1mdr1) + + dt2dr1 * rho2[i] + t2i * (drho2dr1 - drho2mdr1) + + dt3dr1 * rho3[i] + t3i * (drho3dr1 - drho3mdr1)) - + dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); + drhodr2 = dgamma1[j] * drho0dr2 + + dgamma2[j] * (dt1dr2 * rho1[j] + t1j * (drho1dr2 - drho1mdr2) + + dt2dr2 * rho2[j] + t2j * (drho2dr2 - drho2mdr2) + + dt3dr2 * rho3[j] + t3j * (drho3dr2 - drho3mdr2)) - + dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); + for (m = 0; m < 3; m++) { + drhodrm1[m] = 0.0; + drhodrm2[m] = 0.0; + drhodrm1[m] = dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) + + t2i * (drho2drm1[m] - drho2mdrm1[m]) + + t3i * (drho3drm1[m] - drho3mdrm1[m]) ); + drhodrm2[m] = dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) + + t2j * (drho2drm2[m] - drho2mdrm2[m]) + + t3j * (drho3drm2[m] - drho3mdrm2[m]) ); + } + } else { + + drhodr1 = dgamma1[i] * drho0dr1 + + dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 + + dt3dr1 * rho3[i] + t3i * drho3dr1) - + dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1); + drhodr2 = dgamma1[j] * drho0dr2 + + dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 + + dt3dr2 * rho3[j] + t3j * drho3dr2) - + dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2); + for (m = 0; m < 3; m++) { + drhodrm1[m] = 0.0; + drhodrm2[m] = 0.0; + drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]); + drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]); + } } // Compute derivatives wrt sij, but only if necessary @@ -328,17 +515,37 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int drho0ds1 = rhoa0j; drho0ds2 = rhoa0i; a1 = 2.0 / rij; - drho1ds1 = a1 * rhoa1j * arg1i1; - drho1ds2 = a1 * rhoa1i * arg1j1; a2 = 2.0 / rij2; - drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j; - drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i; a3 = 2.0 / rij3; a3a = 6.0 / (5.0 * rij); + + drho1ds1 = a1 * rhoa1j * arg1i1; + drho1ds2 = a1 * rhoa1i * arg1j1; + drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j; + drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i; drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3; drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3; + if (this->msmeamflag) { + drho1mds1 = a1 * rhoa1mj * arg1i1m; + drho1mds2 = a1 * rhoa1mi * arg1j1m; + drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * arho2mb[i] * rhoa2mj; + drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * arho2mb[j] * rhoa2mi; + drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m; + drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m; + drho3mds1 *= -1; + drho3mds2 *= -1; + } else { + drho1mds1 = 0.0; + drho1mds2 = 0.0; + drho2mds1 = 0.0; + drho2mds2 = 0.0; + drho3mds1 = 0.0; + drho3mds2 = 0.0; + } + if (this->ialloy == 1) { + a1i = fdiv_zero(rhoa0j, tsq_ave[i][0]); a1j = fdiv_zero(rhoa0i, tsq_ave[j][0]); a2i = fdiv_zero(rhoa0j, tsq_ave[i][1]); @@ -379,19 +586,36 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int dt3ds2 = aj * (t3mi - t3j); } - drhods1 = dgamma1[i] * drho0ds1 + - dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 + - dt3ds1 * rho3[i] + t3i * drho3ds1) - - dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1); - drhods2 = dgamma1[j] * drho0ds2 + - dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 + - dt3ds2 * rho3[j] + t3j * drho3ds2) - - dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2); + if (this->msmeamflag) { + drhods1 = dgamma1[i] * drho0ds1 + + dgamma2[i] * (dt1ds1 * rho1[i] + t1i * (drho1ds1 - drho1mds1) + + dt2ds1 * rho2[i] + t2i * (drho2ds1 - drho2mds1) + + dt3ds1 * rho3[i] + t3i * (drho3ds1 - drho3mds1)) - + dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1); + drhods2 = dgamma1[j] * drho0ds2 + + dgamma2[j] * (dt1ds2 * rho1[j] + t1j * (drho1ds2 - drho1mds2) + + dt2ds2 * rho2[j] + t2j * (drho2ds2 - drho2mds2) + + dt3ds2 * rho3[j] + t3j * (drho3ds2 - drho3mds2)) - + dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2); + } + else { + drhods1 = dgamma1[i] * drho0ds1 + + dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 + + dt3ds1 * rho3[i] + t3i * drho3ds1) - + dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1); + drhods2 = dgamma1[j] * drho0ds2 + + dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 + + dt3ds2 * rho3[j] + t3j * drho3ds2) - + dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2); + } } - // Compute derivatives of energy wrt rij, sij and rij[3] + // Compute derivatives of energy wrt rij, sij and rij[3] + // MS-MEAM affects phip + dUdrij = phip * sij + frhop[i] * drhodr1 + frhop[j] * drhodr2; dUdsij = 0.0; + if (!iszero(dscrfcn[fnoffset + jn])) { dUdsij = phi + frhop[i] * drhods1 + frhop[j] * drhods2; } diff --git a/src/MEAM/meam_impl.cpp b/src/MEAM/meam_impl.cpp index bbfb83e94a..5290647b18 100644 --- a/src/MEAM/meam_impl.cpp +++ b/src/MEAM/meam_impl.cpp @@ -34,6 +34,11 @@ MEAM::MEAM(Memory* mem) gamma = dgamma1 = dgamma2 = dgamma3 = arho2b = nullptr; arho1 = arho2 = arho3 = arho3b = t_ave = tsq_ave = nullptr; + // msmeam arrays + msmeamflag = 0; + arho2mb = nullptr; + arho1m = arho2m = arho3m = arho3mb = nullptr; + maxneigh = 0; scrfcn = dscrfcn = fcpair = nullptr; copymode = 0; @@ -43,7 +48,9 @@ MEAM::MEAM(Memory* mem) A_meam[i] = rho0_meam[i] = beta0_meam[i] = beta1_meam[i]= beta2_meam[i] = beta3_meam[i] = t0_meam[i] = t1_meam[i] = t2_meam[i] = t3_meam[i] = - rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] = 0.0; + rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] = + t1m_meam[i] = t2m_meam[i] = t3m_meam[i] = + beta1m_meam[i] = beta2m_meam[i] = beta3m_meam[i] = 0.0; for (int j = 0; j < maxelt; j++) { lattce_meam[i][j] = FCC; Ec_meam[i][j] = re_meam[i][j] = alpha_meam[i][j] = delta_meam[i][j] = ebound_meam[i][j] = attrac_meam[i][j] = repuls_meam[i][j] = 0.0; @@ -87,4 +94,13 @@ MEAM::~MEAM() memory->destroy(this->scrfcn); memory->destroy(this->dscrfcn); memory->destroy(this->fcpair); + + // msmeam + if (this->msmeamflag){ + memory->destroy(this->arho1m); + memory->destroy(this->arho2m); + memory->destroy(this->arho3m); + memory->destroy(this->arho2mb); + memory->destroy(this->arho3mb); + } } diff --git a/src/MEAM/meam_setup_done.cpp b/src/MEAM/meam_setup_done.cpp index 93f2552465..de1188349c 100644 --- a/src/MEAM/meam_setup_done.cpp +++ b/src/MEAM/meam_setup_done.cpp @@ -220,7 +220,6 @@ void MEAM::compute_pair_meam() // loop over r values and compute for (j = 0; j < this->nr; j++) { r = j * this->dr; - this->phir[nv2][j] = phi_meam(r, a, b); // if using second-nearest neighbor, solve recursive problem @@ -333,9 +332,12 @@ double MEAM::phi_meam(double r, int a, int b) lattice_t latta /*unused:,lattb*/; double rho_bkgd1, rho_bkgd2; double b11s, b22s; + // msmeam + double t1m1av, t2m1av, t3m1av, t1m2av, t2m2av, t3m2av; + double rho1m1, rho2m1, rho3m1; + double rho1m2, rho2m2, rho3m2; double phi_m = 0.0; - // Equation numbers below refer to: // I. Huang et.al., Modelling simul. Mater. Sci. Eng. 3:615 @@ -345,8 +347,16 @@ double MEAM::phi_meam(double r, int a, int b) Z2 = get_Zij(this->lattce_meam[b][b]); Z12 = get_Zij(this->lattce_meam[a][b]); - get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32); - + // this function has extra args for msmeam + if (this->msmeamflag) { + get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32, + &rho1m1, &rho2m1, &rho3m1, + &rho1m2, &rho2m2, &rho3m2); + } else { + get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32, + nullptr, nullptr, nullptr, + nullptr, nullptr, nullptr); + } // if densities are too small, numerical problems may result; just return zero if (rho01 <= 1e-14 && rho02 <= 1e-14) return 0.0; @@ -374,6 +384,12 @@ double MEAM::phi_meam(double r, int a, int b) get_tavref(&t11av, &t21av, &t31av, &t12av, &t22av, &t32av, this->t1_meam[a], this->t2_meam[a], this->t3_meam[a], this->t1_meam[b], this->t2_meam[b], this->t3_meam[b], r, a, b, this->lattce_meam[a][b]); + // with msmeam call twice with different sets of variables + if (this->msmeamflag) { + get_tavref(&t1m1av, &t2m1av, &t3m1av, &t1m2av, &t2m2av, &t3m2av, this->t1m_meam[a], this->t2m_meam[a], + this->t3m_meam[a], this->t1m_meam[b], this->t2m_meam[b], this->t3m_meam[b], r, a, b, + this->lattce_meam[a][b]); + } } // for c11b structure, calculate background electron densities @@ -420,17 +436,33 @@ double MEAM::phi_meam(double r, int a, int b) rho0_1 = this->rho0_meam[a] * Z1 * G1; rho0_2 = this->rho0_meam[b] * Z2 * G2; } - Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31); - if (rho01 < 1.0e-14) - Gam1 = 0.0; - else - Gam1 = Gam1 / (rho01 * rho01); - Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32); - if (rho02 < 1.0e-14) - Gam2 = 0.0; - else - Gam2 = Gam2 / (rho02 * rho02); + if (this->msmeamflag) { + // no additional use of t's here; all included in definitions of rho's for msmeam + Gam1 = rho11 + rho21 + rho31 - (rho1m1 + rho2m1 + rho3m1); + if (rho01 < 1.0e-14) + Gam1 = 0.0; + else + Gam1 = Gam1 / (rho01 * rho01); + Gam2 = rho12 + rho22 + rho32 - (rho1m2 + rho2m2 + rho3m2); + if (rho02 < 1.0e-14) + Gam2 = 0.0; + else + Gam2 = Gam2 / (rho02 * rho02); + + } else { + Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31); + if (rho01 < 1.0e-14) + Gam1 = 0.0; + else + Gam1 = Gam1 / (rho01 * rho01); + + Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32); + if (rho02 < 1.0e-14) + Gam2 = 0.0; + else + Gam2 = Gam2 / (rho02 * rho02); + } G1 = G_gam(Gam1, this->ibar_meam[a], errorflag); G2 = G_gam(Gam2, this->ibar_meam[b], errorflag); @@ -655,7 +687,9 @@ void MEAM::get_sijk(double C, int i, int j, int k, double* sijk) //------------------------------------------------------------------------------c // Calculate density functions, assuming reference configuration void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, double* rho21, double* rho31, - double* rho02, double* rho12, double* rho22, double* rho32) + double* rho02, double* rho12, double* rho22, double* rho32, + double* rho1m1, double* rho2m1, double* rho3m1, + double* rho1m2, double* rho2m2, double* rho3m2) { double a1, a2; double s[3]; @@ -666,18 +700,39 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou double rhoa02, rhoa12, rhoa22, rhoa32; double arat, scrn, denom; double C, s111, s112, s221, S11, S22; + // msmeam + double rhoa1m1, rhoa2m1, rhoa3m1, rhoa1m2, rhoa2m2, rhoa3m2; a1 = r / this->re_meam[a][a] - 1.0; a2 = r / this->re_meam[b][b] - 1.0; rhoa01 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta0_meam[a] * a1); - rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1); - rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1); - rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1); - rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2); - rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2); - rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2); - rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2); + + if (this->msmeamflag) { + // the rho variables are multiplied by t here since ialloy not needed in msmeam + rhoa11 = this->rho0_meam[a] * this->t1_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1); + rhoa21 = this->rho0_meam[a] * this->t2_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1); + rhoa31 = this->rho0_meam[a] * this->t3_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1); + rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2); + rhoa12 = this->rho0_meam[b] * this->t1_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2); + rhoa22 = this->rho0_meam[b] * this->t2_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2); + rhoa32 = this->rho0_meam[b] * this->t3_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2); + // msmeam specific rho vars + rhoa1m1 = this->rho0_meam[a] * this->t1m_meam[a] * MathSpecial::fm_exp(-this->beta1m_meam[a] * a1); + rhoa2m1 = this->rho0_meam[a] * this->t2m_meam[a] * MathSpecial::fm_exp(-this->beta2m_meam[a] * a1); + rhoa3m1 = this->rho0_meam[a] * this->t3m_meam[a] * MathSpecial::fm_exp(-this->beta3m_meam[a] * a1); + rhoa1m2 = this->rho0_meam[b] * this->t1m_meam[b] * MathSpecial::fm_exp(-this->beta1m_meam[b] * a2); + rhoa2m2 = this->rho0_meam[b] * this->t2m_meam[b] * MathSpecial::fm_exp(-this->beta2m_meam[b] * a2); + rhoa3m2 = this->rho0_meam[b] * this->t3m_meam[b] * MathSpecial::fm_exp(-this->beta3m_meam[b] * a2); + } else { + rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1); + rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1); + rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1); + rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2); + rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2); + rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2); + rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2); + } lat = this->lattce_meam[a][b]; @@ -689,7 +744,16 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou *rho12 = 0.0; *rho22 = 0.0; *rho32 = 0.0; + if (this->msmeamflag) { + *rho1m1 = 0.0; + *rho2m1 = 0.0; + *rho3m1 = 0.0; + *rho1m2 = 0.0; + *rho2m2 = 0.0; + *rho3m2 = 0.0; + } + // keep track of density components separately; combine in the calling subroutine switch (lat) { case FCC: *rho01 = 12.0 * rhoa02; @@ -710,12 +774,20 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou *rho02 = 4.0 * rhoa01; *rho31 = 32.0 / 9.0 * rhoa32 * rhoa32; *rho32 = 32.0 / 9.0 * rhoa31 * rhoa31; + if (this->msmeamflag) { + *rho3m1 = 32.0 / 9.0 * rhoa3m2 * rhoa3m2; + *rho3m2 = 32.0 / 9.0 * rhoa3m1 * rhoa3m1; + } break; case HCP: *rho01 = 12 * rhoa02; *rho02 = 12 * rhoa01; *rho31 = 1.0 / 3.0 * rhoa32 * rhoa32; *rho32 = 1.0 / 3.0 * rhoa31 * rhoa31; + if (this->msmeamflag) { + *rho3m1 = 1.0 / 3.0 * rhoa3m2 * rhoa3m2; + *rho3m2 = 1.0 / 3.0 * rhoa3m1 * rhoa3m1; + } break; case DIM: get_shpfcn(DIM, 0, 0, s); @@ -727,6 +799,14 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou *rho22 = s[1] * rhoa21 * rhoa21; *rho31 = s[2] * rhoa32 * rhoa32; *rho32 = s[2] * rhoa31 * rhoa31; + if (this->msmeamflag) { + *rho1m1 = s[0] * rhoa1m2 * rhoa1m2; + *rho1m2 = s[0] * rhoa1m1 * rhoa1m1; + *rho2m1 = s[1] * rhoa2m2 * rhoa2m2; + *rho2m2 = s[1] * rhoa2m1 * rhoa2m1; + *rho3m1 = s[2] * rhoa3m2 * rhoa3m2; + *rho3m2 = s[2] * rhoa3m1 * rhoa3m1; + } break; case C11: *rho01 = rhoa01; @@ -737,17 +817,28 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou *rho22 = rhoa22; *rho31 = rhoa31; *rho32 = rhoa32; + if (this->msmeamflag) { + *rho1m1 = rhoa1m1; + *rho1m2 = rhoa1m2; + *rho2m1 = rhoa2m1; + *rho2m2 = rhoa2m2; + *rho3m1 = rhoa3m1; + *rho3m2 = rhoa3m2; + } break; case L12: *rho01 = 8 * rhoa01 + 4 * rhoa02; *rho02 = 12 * rhoa01; - if (this->ialloy == 1) { + if (this->ialloy ==1){ *rho21 = 8. / 3. * MathSpecial::square(rhoa21 * this->t2_meam[a] - rhoa22 * this->t2_meam[b]); denom = 8 * rhoa01 * MathSpecial::square(this->t2_meam[a]) + 4 * rhoa02 * MathSpecial::square(this->t2_meam[b]); if (denom > 0.) *rho21 = *rho21 / denom * *rho01; } else *rho21 = 8. / 3. * (rhoa21 - rhoa22) * (rhoa21 - rhoa22); + if (this->msmeamflag) { + *rho2m1 = 8. / 3. * (rhoa2m1 - rhoa2m2) * (rhoa2m1 - rhoa2m2); + } break; case B2: *rho01 = 8.0 * rhoa02; @@ -864,6 +955,7 @@ void MEAM::interpolate_meam(int ind) this->rdrar = 1.0 / drar; // phir interp + for (j = 0; j < this->nrar; j++) { this->phirar[ind][j] = this->phir[ind][j]; } diff --git a/src/MEAM/meam_setup_global.cpp b/src/MEAM/meam_setup_global.cpp index 545a2ad3f4..5d35242e7c 100644 --- a/src/MEAM/meam_setup_global.cpp +++ b/src/MEAM/meam_setup_global.cpp @@ -36,7 +36,8 @@ void MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*/, double* alpha, double* b0, double* b1, double* b2, double* b3, double* alat, double* esub, double* asub, double* t0, double* t1, double* t2, double* t3, double* rozero, - int* ibar) + int* ibar, double* b1m, double *b2m, double *b3m, double *t1m, double *t2m, + double *t3m) { int i; @@ -53,6 +54,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt* this->beta1_meam[i] = b1[i]; this->beta2_meam[i] = b2[i]; this->beta3_meam[i] = b3[i]; + if (this->msmeamflag){ + this->beta1m_meam[i] = b1m[i]; + this->beta2m_meam[i] = b2m[i]; + this->beta3m_meam[i] = b3m[i]; + } tmplat[i] = alat[i]; this->Ec_meam[i][i] = esub[i]; this->A_meam[i] = asub[i]; @@ -60,6 +66,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt* this->t1_meam[i] = t1[i]; this->t2_meam[i] = t2[i]; this->t3_meam[i] = t3[i]; + if (this->msmeamflag){ + this->t1m_meam[i] = t1m[i]; + this->t2m_meam[i] = t2m[i]; + this->t3m_meam[i] = t3m[i]; + } this->rho0_meam[i] = rozero[i]; this->ibar_meam[i] = ibar[i]; diff --git a/src/MEAM/pair_meam.cpp b/src/MEAM/pair_meam.cpp index bcfffbe52b..c4a4cfa1d7 100644 --- a/src/MEAM/pair_meam.cpp +++ b/src/MEAM/pair_meam.cpp @@ -58,13 +58,12 @@ PairMEAM::PairMEAM(LAMMPS *lmp) : Pair(lmp) allocated = 0; nlibelements = 0; + meam_inst = new MEAM(memory); + meam_inst->msmeamflag = msmeamflag = 0; + myname = "meam"; + scale = nullptr; - - // set comm size needed by this Pair - - comm_forward = 38; - comm_reverse = 30; } /* ---------------------------------------------------------------------- @@ -93,7 +92,6 @@ void PairMEAM::compute(int eflag, int vflag) int i,ii,n,inum_half,errorflag; int *ilist_half,*numneigh_half,**firstneigh_half; int *numneigh_full,**firstneigh_full; - ev_init(eflag,vflag); // neighbor list info @@ -133,7 +131,6 @@ void PairMEAM::compute(int eflag, int vflag) int offset = 0; errorflag = 0; - for (ii = 0; ii < inum_half; ii++) { i = ilist_half[ii]; meam_inst->meam_dens_init(i,ntype,type,map,x, @@ -142,9 +139,7 @@ void PairMEAM::compute(int eflag, int vflag) offset); offset += numneigh_half[i]; } - comm->reverse_comm(this); - meam_inst->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom, &eng_vdwl,eatom,ntype,type,map,scale,errorflag); if (errorflag) @@ -159,7 +154,6 @@ void PairMEAM::compute(int eflag, int vflag) double **vptr = nullptr; if (vflag_atom) vptr = vatom; - for (ii = 0; ii < inum_half; ii++) { i = ilist_half[ii]; meam_inst->meam_force(i,eflag_global,eflag_atom,vflag_global, @@ -169,7 +163,6 @@ void PairMEAM::compute(int eflag, int vflag) offset,f,vptr,virial); offset += numneigh_half[i]; } - if (vflag_fdotr) virial_fdotr_compute(); } @@ -193,7 +186,17 @@ void PairMEAM::allocate() void PairMEAM::settings(int narg, char ** /*arg*/) { - if (narg != 0) error->all(FLERR,"Illegal pair_style command"); + if (narg != 0) error->all(FLERR,"Illegal pair_style {} command", myname); + + // set comm size needed by this Pair + + if (msmeamflag) { + comm_forward = 38+23; // plus 23 for msmeam + comm_reverse = 30+23; // plus 23 for msmeam + } else { + comm_forward = 38; + comm_reverse = 30; + } } /* ---------------------------------------------------------------------- @@ -206,12 +209,7 @@ void PairMEAM::coeff(int narg, char **arg) if (!allocated) allocate(); - if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients"); - - // ensure I,J args are * * - - if (strcmp(arg[0],"*") != 0 || strcmp(arg[1],"*") != 0) - error->all(FLERR,"Incorrect args for pair coefficients"); + if (narg < 6) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname); // check for presence of first meam file @@ -239,7 +237,7 @@ void PairMEAM::coeff(int narg, char **arg) } if (paridx < 0) error->all(FLERR,"No MEAM parameter file in pair coefficients"); if ((narg - paridx - 1) != atom->ntypes) - error->all(FLERR,"Incorrect args for pair coefficients"); + error->all(FLERR,"Incorrect args for pair style {} coefficients", myname); // MEAM element names between 2 filenames // nlibelements = # of MEAM elements @@ -282,7 +280,7 @@ void PairMEAM::coeff(int narg, char **arg) if (libelements[j] == arg[i]) break; if (j < nlibelements) map[m] = j; else if (strcmp(arg[i],"NULL") == 0) map[m] = -1; - else error->all(FLERR,"Incorrect args for pair coefficients"); + else error->all(FLERR,"Incorrect args for pair style {} coefficients", myname); } // clear setflag since coeff() called once with I,J = * * @@ -307,7 +305,7 @@ void PairMEAM::coeff(int narg, char **arg) } } - if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients"); + if (count == 0) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname); } /* ---------------------------------------------------------------------- @@ -317,7 +315,7 @@ void PairMEAM::coeff(int narg, char **arg) void PairMEAM::init_style() { if (force->newton_pair == 0) - error->all(FLERR,"Pair style MEAM requires newton pair on"); + error->all(FLERR,"Pair style {} requires newton pair on", myname); // need a full and a half neighbor list @@ -360,7 +358,9 @@ void PairMEAM::read_files(const std::string &globalfile, void PairMEAM::read_global_meam_file(const std::string &globalfile) { + // allocate parameter arrays + std::vector lat(nlibelements); std::vector ielement(nlibelements); std::vector ibar(nlibelements); @@ -381,6 +381,15 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile) std::vector rozero(nlibelements); std::vector found(nlibelements, false); + // allocate 6 extra arrays for msmeam + + std::vector b1m(nlibelements); + std::vector b2m(nlibelements); + std::vector b3m(nlibelements); + std::vector t1m(nlibelements); + std::vector t2m(nlibelements); + std::vector t3m(nlibelements); + // open global meamf file on proc 0 if (comm->me == 0) { @@ -416,8 +425,7 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile) std::string lattice_type = values.next_string(); if (!MEAM::str_to_lat(lattice_type, true, lat[index])) - error->one(FLERR,"Unrecognized lattice type in MEAM " - "library file: {}", lattice_type); + error->one(FLERR,"Unrecognized lattice type in MEAM library file: {}", lattice_type); // store parameters @@ -429,6 +437,11 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile) b1[index] = values.next_double(); b2[index] = values.next_double(); b3[index] = values.next_double(); + if (msmeamflag) { + b1m[index] = values.next_double(); + b2m[index] = values.next_double(); + b3m[index] = values.next_double(); + } alat[index] = values.next_double(); esub[index] = values.next_double(); asub[index] = values.next_double(); @@ -436,15 +449,20 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile) t1[index] = values.next_double(); t2[index] = values.next_double(); t3[index] = values.next_double(); + if (msmeamflag) { + t1m[index] = values.next_double(); + t2m[index] = values.next_double(); + t3m[index] = values.next_double(); + } rozero[index] = values.next_double(); ibar[index] = values.next_int(); if (!isone(t0[index])) - error->one(FLERR,"Unsupported parameter in MEAM library file: t0!=1"); + error->one(FLERR,"Unsupported parameter in MEAM library file: t0 != 1"); // z given is ignored: if this is mismatched, we definitely won't do what the user said -> fatal error if (z[index] != MEAM::get_Zij(lat[index])) - error->one(FLERR,"Mismatched parameter in MEAM library file: z!=lat"); + error->one(FLERR,"Mismatched parameter in MEAM library file: z != lat"); nset++; } catch (TokenizerException &e) { @@ -484,13 +502,29 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile) MPI_Bcast(t2.data(), nlibelements, MPI_DOUBLE, 0, world); MPI_Bcast(t3.data(), nlibelements, MPI_DOUBLE, 0, world); MPI_Bcast(rozero.data(), nlibelements, MPI_DOUBLE, 0, world); + // distribute msmeam parameter sets + MPI_Bcast(b1m.data(), nlibelements, MPI_DOUBLE, 0, world); + MPI_Bcast(b2m.data(), nlibelements, MPI_DOUBLE, 0, world); + MPI_Bcast(b3m.data(), nlibelements, MPI_DOUBLE, 0, world); + MPI_Bcast(t1m.data(), nlibelements, MPI_DOUBLE, 0, world); + MPI_Bcast(t2m.data(), nlibelements, MPI_DOUBLE, 0, world); + MPI_Bcast(t3m.data(), nlibelements, MPI_DOUBLE, 0, world); // pass element parameters to MEAM package - meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(), - alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(), - alat.data(), esub.data(), asub.data(), t0.data(), t1.data(), - t2.data(), t3.data(), rozero.data(), ibar.data()); + if (msmeamflag) { + meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(), + alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(), + alat.data(), esub.data(), asub.data(), t0.data(), t1.data(), + t2.data(), t3.data(), rozero.data(), ibar.data(), b1m.data(), + b2m.data(), b3m.data(), t1m.data(), t2m.data(), t3m.data()); + } else { + meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(), + alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(), + alat.data(), esub.data(), asub.data(), t0.data(), t1.data(), + t2.data(), t3.data(), rozero.data(), ibar.data(), nullptr, + nullptr, nullptr, nullptr, nullptr, nullptr); + } // set element masses @@ -613,6 +647,23 @@ int PairMEAM::pack_forward_comm(int n, int *list, double *buf, buf[m++] = meam_inst->tsq_ave[j][0]; buf[m++] = meam_inst->tsq_ave[j][1]; buf[m++] = meam_inst->tsq_ave[j][2]; + if (msmeamflag) { + buf[m++] = meam_inst->arho2mb[j]; + buf[m++] = meam_inst->arho1m[j][0]; + buf[m++] = meam_inst->arho1m[j][1]; + buf[m++] = meam_inst->arho1m[j][2]; + buf[m++] = meam_inst->arho2m[j][0]; + buf[m++] = meam_inst->arho2m[j][1]; + buf[m++] = meam_inst->arho2m[j][2]; + buf[m++] = meam_inst->arho2m[j][3]; + buf[m++] = meam_inst->arho2m[j][4]; + buf[m++] = meam_inst->arho2m[j][5]; + for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[j][k]; + buf[m++] = meam_inst->arho3mb[j][0]; + buf[m++] = meam_inst->arho3mb[j][1]; + buf[m++] = meam_inst->arho3mb[j][2]; + } + } return m; @@ -656,6 +707,22 @@ void PairMEAM::unpack_forward_comm(int n, int first, double *buf) meam_inst->tsq_ave[i][0] = buf[m++]; meam_inst->tsq_ave[i][1] = buf[m++]; meam_inst->tsq_ave[i][2] = buf[m++]; + if (msmeamflag) { + meam_inst->arho2mb[i] = buf[m++]; + meam_inst->arho1m[i][0] = buf[m++]; + meam_inst->arho1m[i][1] = buf[m++]; + meam_inst->arho1m[i][2] = buf[m++]; + meam_inst->arho2m[i][0] = buf[m++]; + meam_inst->arho2m[i][1] = buf[m++]; + meam_inst->arho2m[i][2] = buf[m++]; + meam_inst->arho2m[i][3] = buf[m++]; + meam_inst->arho2m[i][4] = buf[m++]; + meam_inst->arho2m[i][5] = buf[m++]; + for (k = 0; k < 10; k++) meam_inst->arho3m[i][k] = buf[m++]; + meam_inst->arho3mb[i][0] = buf[m++]; + meam_inst->arho3mb[i][1] = buf[m++]; + meam_inst->arho3mb[i][2] = buf[m++]; + } } } @@ -689,6 +756,22 @@ int PairMEAM::pack_reverse_comm(int n, int first, double *buf) buf[m++] = meam_inst->tsq_ave[i][0]; buf[m++] = meam_inst->tsq_ave[i][1]; buf[m++] = meam_inst->tsq_ave[i][2]; + if (msmeamflag) { + buf[m++] = meam_inst->arho2mb[i]; + buf[m++] = meam_inst->arho1m[i][0]; + buf[m++] = meam_inst->arho1m[i][1]; + buf[m++] = meam_inst->arho1m[i][2]; + buf[m++] = meam_inst->arho2m[i][0]; + buf[m++] = meam_inst->arho2m[i][1]; + buf[m++] = meam_inst->arho2m[i][2]; + buf[m++] = meam_inst->arho2m[i][3]; + buf[m++] = meam_inst->arho2m[i][4]; + buf[m++] = meam_inst->arho2m[i][5]; + for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[i][k]; + buf[m++] = meam_inst->arho3mb[i][0]; + buf[m++] = meam_inst->arho3mb[i][1]; + buf[m++] = meam_inst->arho3mb[i][2]; + } } return m; @@ -724,7 +807,25 @@ void PairMEAM::unpack_reverse_comm(int n, int *list, double *buf) meam_inst->tsq_ave[j][0] += buf[m++]; meam_inst->tsq_ave[j][1] += buf[m++]; meam_inst->tsq_ave[j][2] += buf[m++]; + if (msmeamflag) { + meam_inst->arho2mb[j] += buf[m++]; + meam_inst->arho1m[j][0] += buf[m++]; + meam_inst->arho1m[j][1] += buf[m++]; + meam_inst->arho1m[j][2] += buf[m++]; + meam_inst->arho2m[j][0] += buf[m++]; + meam_inst->arho2m[j][1] += buf[m++]; + meam_inst->arho2m[j][2] += buf[m++]; + meam_inst->arho2m[j][3] += buf[m++]; + meam_inst->arho2m[j][4] += buf[m++]; + meam_inst->arho2m[j][5] += buf[m++]; + for (k = 0; k < 10; k++) meam_inst->arho3m[j][k] += buf[m++]; + meam_inst->arho3mb[j][0] += buf[m++]; + meam_inst->arho3mb[j][1] += buf[m++]; + meam_inst->arho3mb[j][2] += buf[m++]; + } } + + } /* ---------------------------------------------------------------------- diff --git a/src/MEAM/pair_meam.h b/src/MEAM/pair_meam.h index 16ba38fcb2..a89714bfa9 100644 --- a/src/MEAM/pair_meam.h +++ b/src/MEAM/pair_meam.h @@ -47,6 +47,8 @@ class PairMEAM : public Pair { class MEAM *meam_inst; double cutmax; // max cutoff for all elements int nlibelements; // # of library elements + int msmeamflag; // 0 (default) for normal MEAM, 1 for MS-MEAM + std::string myname; // name of the pair style std::vector libelements; // names of library elements std::vector mass; // mass of library element diff --git a/src/MEAM/pair_meam_ms.cpp b/src/MEAM/pair_meam_ms.cpp new file mode 100644 index 0000000000..982a54f546 --- /dev/null +++ b/src/MEAM/pair_meam_ms.cpp @@ -0,0 +1,25 @@ +/* ---------------------------------------------------------------------- + LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator + https://www.lammps.org/ Sandia National Laboratories + LAMMPS development team: developers@lammps.org + + Copyright (2003) Sandia Corporation. Under the terms of Contract + DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains + certain rights in this software. This software is distributed under + the GNU General Public License. + + See the README file in the top-level LAMMPS directory. +------------------------------------------------------------------------- */ + +#include "pair_meam_ms.h" +#include "meam.h" + +using namespace LAMMPS_NS; + +/* ---------------------------------------------------------------------- */ + +PairMEAMMS::PairMEAMMS(LAMMPS *lmp) : PairMEAM(lmp) +{ + meam_inst->msmeamflag = msmeamflag = 1; + myname = "meam/ms"; +} diff --git a/src/MEAM/pair_meam_ms.h b/src/MEAM/pair_meam_ms.h new file mode 100644 index 0000000000..25878203ed --- /dev/null +++ b/src/MEAM/pair_meam_ms.h @@ -0,0 +1,33 @@ +/* -*- c++ -*- ---------------------------------------------------------- + LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator + https://www.lammps.org/ Sandia National Laboratories + LAMMPS development team: developers@lammps.org + + Copyright (2003) Sandia Corporation. Under the terms of Contract + DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains + certain rights in this software. This software is distributed under + the GNU General Public License. + + See the README file in the top-level LAMMPS directory. +------------------------------------------------------------------------- */ + +#ifdef PAIR_CLASS +// clang-format off +PairStyle(meam/ms,PairMEAMMS); +// clang-format on +#else + +#ifndef LMP_PAIR_MEAM_MS_H +#define LMP_PAIR_MEAM_MS_H + +#include "pair_meam.h" + +namespace LAMMPS_NS { + +class PairMEAMMS : public PairMEAM { + public: + PairMEAMMS(class LAMMPS *); +}; +} // namespace LAMMPS_NS +#endif +#endif diff --git a/unittest/force-styles/tests/atomic-pair-meam_ms.yaml b/unittest/force-styles/tests/atomic-pair-meam_ms.yaml new file mode 100644 index 0000000000..e479514017 --- /dev/null +++ b/unittest/force-styles/tests/atomic-pair-meam_ms.yaml @@ -0,0 +1,94 @@ +--- +lammps_version: 22 Dec 2022 +tags: slow +date_generated: Thu Jan 26 15:27:03 2023 +epsilon: 2.5e-12 +skip_tests: +prerequisites: ! | + pair meam/ms +pre_commands: ! | + variable newton_pair delete + if "$(is_active(package,gpu)) > 0.0" then "variable newton_pair index off" else "variable newton_pair index on" +post_commands: ! "" +input_file: in.metal +pair_style: meam/ms +pair_coeff: ! | + * * library.msmeam H Ga4 HGa.msmeam H Ga4 +extract: ! | + scale 2 +natoms: 32 +init_vdwl: 785.6030480758675 +init_coul: 0 +init_stress: ! |2- + 3.3502530994900699e+03 3.6405858278699407e+03 3.6349804214165547e+03 -3.1609283411508039e+02 -7.9448207656135153e+01 -1.9854140603340727e+02 +init_forces: ! |2 + 1 1.2872255079741514e+01 -7.5031848810810864e-01 4.5969595156096510e+01 + 2 -3.9028679722038632e+01 -1.5647800180326567e+02 -1.6643992152928173e+00 + 3 -6.1521549955194672e+01 2.6970968316419874e+02 -9.6866430262650326e+01 + 4 3.1462579880342336e+01 4.0240291291218455e+01 1.1654869213327775e+01 + 5 1.4859248182951113e+01 -3.4132880749392825e+01 6.7430378007130244e+01 + 6 6.4609571260694096e+00 -3.8973222482916441e+01 -2.8510000379627442e+01 + 7 7.8114612113500250e+00 -1.0421431668544374e+01 -4.2887607385766536e+01 + 8 -4.8934215863351795e+01 -6.3567347969802590e-01 1.1845972792272754e+02 + 9 9.4089549606898402e+01 -7.4342942103394511e+00 2.5331198575951383e+01 + 10 1.5130369934140692e+01 -5.9245630928969938e+01 -6.7469126603400198e+01 + 11 -2.5176547213746847e+01 1.1577205529172168e+02 -2.2897457133540517e+01 + 12 6.2237686199502349e+01 2.0501996047945163e+01 -2.8805091517252826e+01 + 13 -5.9438589221526925e+01 3.0453092653824072e+01 -1.9919245831196157e+01 + 14 6.9128305482543766e+01 -7.7400771634148342e+01 3.3376079908119145e+01 + 15 -4.9671207786831857e+01 -4.9520814527298228e+01 8.4325181097614305e+01 + 16 -1.1782591146017666e+01 -3.2478963020209051e+01 1.5503663677714293e+01 + 17 9.0881787245915220e+00 6.2377477671714963e+01 -4.0411006180232363e+01 + 18 -4.2285082775720454e+01 2.4883979527636967e+01 -4.4858149086530510e+00 + 19 -8.0259798420493979e+01 9.6356660229207137e+01 6.0543230952477984e+01 + 20 8.0924547938759346e+01 7.1034504027236025e+01 -7.1958482512489610e+01 + 21 1.0833434220705425e+02 -1.5973910256481020e+02 -2.5432700070393153e+01 + 22 -2.3754601906353900e+00 5.2216955012971823e+01 4.7112051341131576e+00 + 23 -2.7227169255996543e+01 8.1968603165764222e+01 4.6535834898716878e+01 + 24 -2.9230758067555616e+01 6.5909555829367733e+01 -2.8250697734131258e+01 + 25 -5.1310041582953993e+01 -3.0895272949222822e+01 -5.4271286813003794e+00 + 26 3.9605941911194620e+01 -5.5919050176828883e+01 -1.0209061328106253e+01 + 27 8.2934427989660890e+01 6.1956200199325636e+01 5.0072108788590960e+01 + 28 -7.8572755094413296e+01 -3.9613391730681300e+01 -2.6183413623428891e+00 + 29 6.9475725072041925e+01 -6.0535433603583563e+01 -1.4566536349135829e+01 + 30 -2.4347184151182930e+01 -1.9359391333689970e+02 -2.6718379302915952e+01 + 31 7.7351971629808688e+01 -7.0102650745312999e+01 -5.4615048867524763e+01 + 32 -1.5060591772899014e+02 8.4489763988097266e+01 2.9799482293372058e+01 +run_vdwl: 682.3107192428497 +run_coul: 0 +run_stress: ! |2- + 3.2247564044913129e+03 3.3749506031067485e+03 3.3223794967215117e+03 -2.8460979167554797e+02 -7.2614457076660575e+00 -3.1510685747732862e+02 +run_forces: ! |2 + 1 -1.2037185973996296e+01 -2.5090364403764944e+01 1.4014184973113366e+01 + 2 -3.7365848425239264e+01 -1.5871199357658887e+02 3.7846333470446991e+00 + 3 -3.2057228694304293e+01 2.5316344962361612e+02 -6.0679585186816752e+01 + 4 2.9086197614116237e+01 4.8267528016068823e+01 4.3387429619749920e+00 + 5 -1.1672554618399744e+01 -2.6840760926124332e+01 4.9694308545223279e+01 + 6 1.1892092913978592e+01 -4.9360840569608243e+01 -2.3083171938147949e+01 + 7 2.1084251901459215e+01 -4.8251731643401072e+00 -3.8474871193885967e+01 + 8 -5.7775944085787714e+01 1.3522956442661442e+01 1.1661345819661486e+02 + 9 7.2926105059437930e+01 4.8686056096860133e+00 2.3817134806042311e+01 + 10 1.7307367990304396e+01 -3.0865570121704572e+01 -1.2314307646704794e+01 + 11 -1.1341297645054201e+01 9.1441145595173211e+01 -2.1806407500802493e+01 + 12 4.0645024127126625e+01 1.2207243511090397e+01 -2.6757649464936929e+01 + 13 -5.2283270287937697e+01 3.4023912643812679e+01 -1.9030352703627774e+01 + 14 8.4403128243303399e+01 -9.3773678297574406e+01 1.6481720093363641e+01 + 15 -4.2790833192154764e+01 -4.3242943642279130e+01 7.1075696811865868e+01 + 16 -1.5041912007490836e+01 -3.3544044565611586e+01 2.4823109532967212e+01 + 17 -9.6413207346836316e-01 4.5826021602656141e+01 -3.9155163702194102e+01 + 18 -2.0337015515785971e+01 7.2815285567550134e+00 -8.2049879725129813e+00 + 19 -6.4105384732081120e+01 1.1564665740933788e+02 2.4163791756721466e+01 + 20 8.5723654185276146e+01 8.3354105531647818e+01 -6.6380939444134356e+01 + 21 7.2614253221132458e+01 -1.0858997173537107e+02 -9.7505297776024449e+00 + 22 -7.0420361713052930e+00 5.3431098224890221e+01 3.3089063930822551e+00 + 23 -2.6591358240682062e+01 5.7408565880721866e+01 2.7437106471305679e+01 + 24 -4.1792038450554799e+01 5.1730557789864775e+01 -4.0814677464080816e+01 + 25 -4.1432062506590214e+01 -2.5839213423062226e+01 4.2240164846210408e+00 + 26 4.7210066329871566e+01 -5.2462761136081880e+01 -7.3222050314410501e+00 + 27 7.1880187551772764e+01 6.4264938765955392e+01 4.3600944370341068e+01 + 28 -8.4540787660053340e+01 -3.5402262816619938e+01 -1.8100280797937039e+01 + 29 6.9538301274653790e+01 -6.3441028093040622e+01 -1.4636386232064458e+01 + 30 -1.0347208112535196e+01 -1.7647584813608077e+02 7.2581082578181517e+00 + 31 5.5139777976761025e+01 -4.2081916983382541e+01 -4.6602437208067727e+01 + 32 -1.0993230999577290e+02 3.4110056387297462e+01 1.8478090262857769e+01 +...