Merge pull request #3608 from rohskopf/msmeam-adr

Multi-state MEAM with Kokkos support
This commit is contained in:
Axel Kohlmeyer
2023-01-26 21:47:46 -05:00
committed by GitHub
41 changed files with 2174 additions and 1378 deletions

View File

@ -200,6 +200,7 @@ OPT.
* :doc:`mdpd <pair_mesodpd>`
* :doc:`mdpd/rhosum <pair_mesodpd>`
* :doc:`meam (k) <pair_meam>`
* :doc:`meam/ms (k) <pair_meam>`
* :doc:`meam/spline (o) <pair_meam_spline>`
* :doc:`meam/sw/spline <pair_meam_sw_spline>`
* :doc:`mesocnt <pair_mesocnt>`

View File

@ -1,17 +1,26 @@
.. index:: pair_style meam
.. index:: pair_style meam/kk
.. index:: pair_style meam/ms
.. index:: pair_style meam/ms/kk
pair_style meam command
=========================
Accelerator Variants: *meam/kk*
pair_style meam/ms command
==========================
Accelerator Variants: *meam/ms/kk*
Syntax
""""""
.. code-block:: LAMMPS
pair_style meam
pair_style style
* style = *meam* or *meam/ms*
Examples
""""""""
@ -22,6 +31,9 @@ Examples
pair_coeff * * ../potentials/library.meam Si ../potentials/si.meam Si
pair_coeff * * ../potentials/library.meam Ni Al NULL Ni Al Ni Ni
pair_style meam/ms
pair_coeff * * ../potentials/library.msmeam H Ga ../potentials/HGa.meam H Ga
Description
"""""""""""
@ -31,16 +43,23 @@ Description
as of November 2010; see description below of the mixture_ref_t
parameter
Pair style *meam* computes non-bonded interactions for a variety of materials
using the modified embedded-atom method (MEAM)
:ref:`(Baskes) <Baskes>`. Conceptually, it is an extension to the original
:doc:`EAM method <pair_eam>` which adds angular forces. It is
thus suitable for modeling metals and alloys with fcc, bcc, hcp and
diamond cubic structures, as well as materials with covalent interactions
like silicon and carbon. This *meam* pair style is a translation of the
original Fortran version to C++. It is functionally equivalent but more
efficient and has additional features. The Fortran version of the *meam*
pair style has been removed from LAMMPS after the 12 December 2018 release.
Pair style *meam* computes non-bonded interactions for a variety of
materials using the modified embedded-atom method (MEAM) :ref:`(Baskes)
<Baskes>`. Conceptually, it is an extension to the original :doc:`EAM
method <pair_eam>` which adds angular forces. It is thus suitable for
modeling metals and alloys with fcc, bcc, hcp and diamond cubic
structures, as well as materials with covalent interactions like silicon
and carbon.
The *meam* pair style is a translation of the original Fortran version
to C++. It is functionally equivalent but more efficient and has
additional features. The Fortran version of the *meam* pair style has
been removed from LAMMPS after the 12 December 2018 release.
Pair style *meam/ms* uses the multi-state MEAM (MS-MEAM) method
according to :ref:`(Baskes2) <Baskes2>`, which is an extension to MEAM.
This pair style is mostly equivalent to *meam* and differs only
where noted in the documentation below.
In the MEAM formulation, the total energy E of a system of atoms is
given by:
@ -351,6 +370,16 @@ Most published MEAM parameter sets use the default values *attrac* = *repulse* =
Setting *repuls* = *attrac* = *delta* corresponds to the form used in several
recent published MEAM parameter sets, such as :ref:`(Valone) <Valone>`
Then using *meam/ms* pair style the multi-state MEAM (MS-MEAM) method is
activated. This requires 6 extra parameters in the MEAM library file,
resulting in 25 parameters ordered that are ordered like this:
elt, lat, z, ielement, atwt, alpha, b0, b1, b2, b3, b1m, b2m, b3m, alat, esub, asub,
t0, t1, t2, t3, t1m, t2m, t3m, rozero, ibar
The 6 extra MS-MEAM parameters are *b1m, b2m, b3m, t1m, t2m, t3m*.
In the LAMMPS ``potentials`` folder, compatible files have an ".msmeam" extension.
----------
.. include:: accel_styles.rst
@ -393,16 +422,15 @@ This pair style can only be used via the *pair* keyword of the
Restrictions
""""""""""""
The *meam* style is provided in the MEAM package. It is
only enabled if LAMMPS was built with that package.
The *meam* and *meam/ms* pair styles are provided in the MEAM
package. They are only enabled if LAMMPS was built with that package.
See the :doc:`Build package <Build_package>` page for more info.
The maximum number of elements, that can be read from the MEAM
library file, is determined at compile time. The default is 5.
If you need support for more elements, you have to change the
define for the constant 'maxelt' at the beginning of the file
src/MEAM/meam.h and update/recompile LAMMPS. There is no
limit on the number of atoms types.
The maximum number of elements, that can be read from the MEAM library
file, is determined at compile time. The default is 5. If you need
support for more elements, you have to change the the constant 'maxelt'
at the beginning of the file ``src/MEAM/meam.h`` and update/recompile
LAMMPS. There is no limit on the number of atoms types.
Related commands
""""""""""""""""
@ -421,6 +449,10 @@ none
**(Baskes)** Baskes, Phys Rev B, 46, 2727-2742 (1992).
.. _Baskes2:
**(Baskes2)** Baskes, Phys Rev B, 75, 094113 (2007).
.. _Gullet:
**(Gullet)** Gullet, Wagner, Slepoy, SANDIA Report 2003-8782 (2003). DOI:10.2172/918395

View File

@ -277,7 +277,8 @@ accelerated styles exist.
* :doc:`lubricateU/poly <pair_lubricateU>` - hydrodynamic lubrication forces for Fast Lubrication with polydispersity
* :doc:`mdpd <pair_mesodpd>` - mDPD particle interactions
* :doc:`mdpd/rhosum <pair_mesodpd>` - mDPD particle interactions for mass density
* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM) in C
* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM)
* :doc:`meam/ms <pair_meam>` - multi-state modified embedded atom method (MS-MEAM)
* :doc:`meam/spline <pair_meam_spline>` - splined version of MEAM
* :doc:`meam/sw/spline <pair_meam_sw_spline>` - splined version of MEAM with a Stillinger-Weber term
* :doc:`mesocnt <pair_mesocnt>` - mesoscopic vdW potential for (carbon) nanotubes

View File

@ -0,0 +1,30 @@
bkgd_dyn = 1
emb_lin_neg = 1
augt1=0
ialloy=1
rc = 5.9
#H
attrac(1,1)=0.460
repuls(1,1)=0.460
Cmin(1,1,1)=1.3 # PuMS
Cmax(1,1,1)= 2.80
nn2(1,1)=1
#Ga
rho0(2) = 0.6
attrac(2,2)=0.097
repuls(2,2)=0.097
nn2(2,2)=1
#HGa
attrac(1,2)=0.300
repuls(1,2)=0.300
lattce(1,2)=l12
re(1,2)=3.19
delta(1,2)=-0.48
alpha(1,2)=6.6
Cmin(1,1,2)=2.0
Cmin(2,1,2)= 2.0
Cmin(1,2,1)=2.0
Cmin(2,2,1) = 1.4
Cmin(1,2,2) = 1.4
Cmin(1,1,2) = 1.4
nn2(1,2)=1

View File

@ -0,0 +1,9 @@
To run Baske's test, do
lmp -in in.msmeam
Then
diff dump.msmeam dump.msmeam.bu

View File

@ -0,0 +1,25 @@
LAMMPS data file via write_data, version 16 Feb 2016, timestep = 1
3 atoms
2 atom types
-4.0000000000000000e+00 4.0000000000000000e+00 xlo xhi
-4.0000000000000000e+00 4.0000000000000000e+00 ylo yhi
-4.0000000000000000e+00 4.0000000000000000e+00 zlo zhi
Masses
1 1.0079
2 69.723
Atoms # atomic
1 1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
2 2 2.2000000000000002e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
3 2 2.9999999999999999e-01 2.2999999999999998e+00 0.0000000000000000e+00 0 0 0
Velocities
1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
2 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
3 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00

View File

@ -0,0 +1,24 @@
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
-4 4
-4 4
-4 4
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
-4 4
-4 4
-4 4
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0

View File

@ -0,0 +1,31 @@
echo both
log log.msmeam
# Test of MEAM potential for HGa
# ------------------------ INITIALIZATION ----------------------------
units metal
dimension 3
boundary p p p
atom_style atomic
variable latparam equal 4.646
variable ncell equal 3
# ----------------------- ATOM DEFINITION ----------------------------
region box block -4 4 -4 4 -4 4
create_box 2 box
#
include potential.mod
create_atoms 1 single 0 0 0 units box
create_atoms 2 single 2.2 0 0 units box
create_atoms 2 single 0.3 2.3 0 units box
# ---------- Define Settings ---------------------
variable teng equal "c_eatoms"
compute pot_energy all pe/atom
compute stress all stress/atom NULL
dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
run 1
write_data data.msmeam
print "All done!"

View File

@ -0,0 +1,14 @@
# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
# ms-meam data format May 2010
# elt lat z ielement atwt
# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub
# - t0 t1 t2 t3 t1m t2m t3m rozero ibar
# NOTE: leading character cannot be a space
'H' 'dim' 1.0 1 1.0079
2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50
1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0
'Ga4' 'fcc' 12.0 31 69.723
4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97
1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0

View File

@ -0,0 +1,107 @@
# Test of MEAM potential for HGa
# ------------------------ INITIALIZATION ----------------------------
units metal
dimension 3
boundary p p p
atom_style atomic
variable latparam equal 4.646
variable ncell equal 3
# ----------------------- ATOM DEFINITION ----------------------------
region box block -4 4 -4 4 -4 4
create_box 2 box
Created orthogonal box = (-4 -4 -4) to (4 4 4)
1 by 1 by 1 MPI processor grid
#
include potential.mod
# NOTE: This script can be modified for different pair styles
# See in.elastic for more info.
variable Pu string H
print "potential chosen ${Pu}"
potential chosen H
# Choose potential
pair_style MSmeam
print "we just executed"
we just executed
pair_coeff * * library.MSmeam ${Pu} Ga4 HGaMS.meam ${Pu} Ga4
pair_coeff * * library.MSmeam H Ga4 HGaMS.meam ${Pu} Ga4
pair_coeff * * library.MSmeam H Ga4 HGaMS.meam H Ga4
Reading potential file library.MSmeam with DATE: 2018-09-22
# Setup neighbor style
neighbor 1.0 nsq
neigh_modify once no every 1 delay 0 check yes
# Setup minimization style
variable dmax equal 1.0e-2
min_style cg
min_modify dmax ${dmax} line quadratic
min_modify dmax 0.01 line quadratic
compute eng all pe/atom
compute eatoms all reduce sum c_eng
# Setup output
thermo 100
thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
thermo_modify norm yes
create_atoms 1 single 0 0 0 units box
Created 1 atoms
create_atoms 2 single 2.2 0 0 units box
Created 1 atoms
create_atoms 2 single 0.3 2.3 0 units box
Created 1 atoms
# ---------- Define Settings ---------------------
variable teng equal "c_eatoms"
compute pot_energy all pe/atom
compute stress all stress/atom NULL
dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
run 1
WARNING: No fixes defined, atoms won't move (../verlet.cpp:55)
Neighbor list info ...
2 neighbor list requests
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6.9
ghost atom cutoff = 6.9
Memory usage per processor = 12.9295 Mbytes
Step Temp TotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume eatoms
0 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079
1 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079
Loop time of 0.000172138 on 1 procs for 1 steps with 3 atoms
Performance: 501.922 ns/day, 0.048 hours/ns, 5809.285 timesteps/s
81.3% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 6.6996e-05 | 6.6996e-05 | 6.6996e-05 | 0.0 | 38.92
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 1.9073e-06 | 1.9073e-06 | 1.9073e-06 | 0.0 | 1.11
Output | 9.7036e-05 | 9.7036e-05 | 9.7036e-05 | 0.0 | 56.37
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 6.199e-06 | | | 3.60
Nlocal: 3 ave 3 max 3 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 78 ave 78 max 78 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 7 ave 7 max 7 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 14 ave 14 max 14 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 14
Ave neighs/atom = 4.66667
Neighbor list builds = 0
Dangerous builds = 0
write_data data.msmeam
print "All done!"
All done!
Total wall time: 0:00:00

View File

@ -0,0 +1,24 @@
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
-4 4
-4 4
-4 4
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
-4 4
-4 4
-4 4
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0

View File

@ -0,0 +1,25 @@
# NOTE: This script can be modified for different pair styles
# See in.elastic for more info.
variable Pu string H
print "potential chosen ${Pu}"
# Choose potential
pair_style meam/ms
print "we just executed"
pair_coeff * * library.msmeam ${Pu} Ga4 HGa.meam ${Pu} Ga4
# Setup neighbor style
neighbor 1.0 bin
neigh_modify once no every 1 delay 0 check yes
# Setup minimization style
variable dmax equal 1.0e-2
min_style cg
min_modify dmax ${dmax} line quadratic
compute eng all pe/atom
compute eatoms all reduce sum c_eng
# Setup output
thermo 100
thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
thermo_modify norm yes

View File

@ -1,199 +0,0 @@
diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
index 22af411f32..530510a0d1 100644
--- a/lib/kokkos/Makefile.kokkos
+++ b/lib/kokkos/Makefile.kokkos
@@ -20,7 +20,7 @@ KOKKOS_DEVICES ?= "OpenMP"
#KOKKOS_DEVICES ?= "Threads"
# Options:
# Intel: KNC,KNL,SNB,HSW,BDW,SKL,SKX,ICL,ICX,SPR
-# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86
+# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Hopper90
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
# IBM: BGQ,Power7,Power8,Power9
# AMD-GPUS: Vega900,Vega906,Vega908,Vega90A
@@ -401,6 +401,7 @@ KOKKOS_INTERNAL_USE_ARCH_VOLTA72 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volt
KOKKOS_INTERNAL_USE_ARCH_TURING75 := $(call kokkos_has_string,$(KOKKOS_ARCH),Turing75)
KOKKOS_INTERNAL_USE_ARCH_AMPERE80 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere80)
KOKKOS_INTERNAL_USE_ARCH_AMPERE86 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere86)
+KOKKOS_INTERNAL_USE_ARCH_HOPPER90 := $(call kokkos_has_string,$(KOKKOS_ARCH),Hopper90)
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
@@ -414,7 +415,8 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLE
+ $(KOKKOS_INTERNAL_USE_ARCH_VOLTA72) \
+ $(KOKKOS_INTERNAL_USE_ARCH_TURING75) \
+ $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80) \
- + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86))
+ + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_HOPPER90))
#SEK: This seems like a bug to me
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
@@ -1194,6 +1196,11 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA_ARCH), 1)
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE86")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_86
endif
+ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HOPPER90), 1)
+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER")
+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER90")
+ KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_90
+ endif
ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
diff --git a/lib/kokkos/cmake/KokkosCore_config.h.in b/lib/kokkos/cmake/KokkosCore_config.h.in
index 88ddc48378..b83ced9243 100644
--- a/lib/kokkos/cmake/KokkosCore_config.h.in
+++ b/lib/kokkos/cmake/KokkosCore_config.h.in
@@ -102,6 +102,7 @@
#cmakedefine KOKKOS_ARCH_AMPERE
#cmakedefine KOKKOS_ARCH_AMPERE80
#cmakedefine KOKKOS_ARCH_AMPERE86
+#cmakedefine KOKKOS_ARCH_HOPPER90
#cmakedefine KOKKOS_ARCH_AMD_ZEN
#cmakedefine KOKKOS_ARCH_AMD_ZEN2
#cmakedefine KOKKOS_ARCH_AMD_ZEN3
diff --git a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
index f56cef1651..2585a6a64c 100644
--- a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
+++ b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
@@ -74,6 +74,7 @@ int main() {
case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break;
case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break;
case 86: std::cout << "Set -DKokkos_ARCH_AMPERE86=ON ." << std::endl; break;
+ case 90: std::cout << "Set -DKokkos_ARCH_HOPPER90=ON ." << std::endl; break;
default:
std::cout << "Compute capability " << compute_capability
<< " is not supported" << std::endl;
diff --git a/lib/kokkos/cmake/kokkos_arch.cmake b/lib/kokkos/cmake/kokkos_arch.cmake
index ef16aad047..c1d76cceeb 100644
--- a/lib/kokkos/cmake/kokkos_arch.cmake
+++ b/lib/kokkos/cmake/kokkos_arch.cmake
@@ -86,6 +86,7 @@ KOKKOS_ARCH_OPTION(VOLTA72 GPU "NVIDIA Volta generation CC 7.2" "KOKK
KOKKOS_ARCH_OPTION(TURING75 GPU "NVIDIA Turing generation CC 7.5" "KOKKOS_SHOW_CUDA_ARCHS")
KOKKOS_ARCH_OPTION(AMPERE80 GPU "NVIDIA Ampere generation CC 8.0" "KOKKOS_SHOW_CUDA_ARCHS")
KOKKOS_ARCH_OPTION(AMPERE86 GPU "NVIDIA Ampere generation CC 8.6" "KOKKOS_SHOW_CUDA_ARCHS")
+KOKKOS_ARCH_OPTION(HOPPER90 GPU "NVIDIA Hopper generation CC 9.0" "KOKKOS_SHOW_CUDA_ARCHS")
IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET OR Kokkos_ENABLE_UNSUPPORTED_ARCHS)
SET(KOKKOS_SHOW_HIP_ARCHS ON)
@@ -544,6 +545,7 @@ CHECK_CUDA_ARCH(VOLTA72 sm_72)
CHECK_CUDA_ARCH(TURING75 sm_75)
CHECK_CUDA_ARCH(AMPERE80 sm_80)
CHECK_CUDA_ARCH(AMPERE86 sm_86)
+CHECK_CUDA_ARCH(HOPPER90 sm_90)
SET(AMDGPU_ARCH_ALREADY_SPECIFIED "")
FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
@@ -806,6 +808,10 @@ IF (KOKKOS_ARCH_AMPERE80 OR KOKKOS_ARCH_AMPERE86)
SET(KOKKOS_ARCH_AMPERE ON)
ENDIF()
+IF (KOKKOS_ARCH_HOPPER90)
+ SET(KOKKOS_ARCH_HOPPER ON)
+ENDIF()
+
#Regardless of version, make sure we define the general architecture name
IF (KOKKOS_ARCH_VEGA900 OR KOKKOS_ARCH_VEGA906 OR KOKKOS_ARCH_VEGA908 OR KOKKOS_ARCH_VEGA90A)
SET(KOKKOS_ARCH_VEGA ON)
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
index 56f9117844..fcd4773dbc 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
@@ -232,7 +232,8 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
case 61: return 96;
case 70:
case 80:
- case 86: return 8;
+ case 86:
+ case 90: return 8;
case 75: return 32;
default:
Kokkos::Impl::throw_runtime_exception(
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
index 40a263561f..8c40ebd60d 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
@@ -418,7 +418,7 @@ KOKKOS_INLINE_FUNCTION
#endif // CUDA_VERSION >= 11000 && CUDA_VERSION < 11010
#if CUDA_VERSION >= 11010 && \
- ((defined(KOKKOS_ARCH_AMPERE80) || defined(KOKKOS_ARCH_AMPERE86)))
+ ((defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)))
KOKKOS_INLINE_FUNCTION
bhalf_t cast_to_bhalf(bhalf_t val) { return val; }
KOKKOS_INLINE_FUNCTION
diff --git a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
index f9451ecfe6..2ce1efb98c 100644
--- a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
+++ b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
@@ -51,7 +51,7 @@ namespace Kokkos::Experimental::Impl {
struct OpenACC_Traits {
#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
- defined(KOKKOS_ARCH_AMPERE)
+ defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)
static constexpr acc_device_t dev_type = acc_device_nvidia;
static constexpr bool may_fallback_to_host = false;
#else
diff --git a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
index a9bc085912..27ee1d4232 100644
--- a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
+++ b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
@@ -115,8 +115,9 @@ void OpenMPTargetInternal::impl_initialize() {
// FIXME_OPENMPTARGET: Only fix the number of teams for NVIDIA architectures
// from Pascal and upwards.
-#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
+#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
+ defined(KOKKOS_ARCH_HOPPER)
#if defined(KOKKOS_COMPILER_CLANG) && (KOKKOS_COMPILER_CLANG >= 1300)
omp_set_num_teams(512);
#endif
diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
index 840db4327c..7e5addbc5b 100644
--- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
@@ -155,7 +155,7 @@ void SYCL::impl_initialize(InitializationSettings const& settings) {
#if !defined(KOKKOS_ARCH_INTEL_GPU) && !defined(KOKKOS_ARCH_KEPLER) && \
!defined(KOKKOS_ARCH_MAXWELL) && !defined(KOKKOS_ARCH_PASCAL) && \
!defined(KOKKOS_ARCH_VOLTA) && !defined(KOKKOS_ARCH_TURING75) && \
- !defined(KOKKOS_ARCH_AMPERE)
+ !defined(KOKKOS_ARCH_AMPERE) && !defined(KOKKOS_ARCH_HOPPER)
if (!settings.has_device_id() && gpu_devices.empty()) {
Impl::SYCLInternal::singleton().initialize(sycl::device());
return;
diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
index 5ac7d8af30..ba101f699e 100644
--- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
@@ -335,9 +335,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
return std::min({
int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
// FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
-#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
+ defined(KOKKOS_ARCH_HOPPER)
256,
#endif
max_threads_for_memory
@@ -367,9 +368,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
return std::min<int>({
int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
// FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
-#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
+ defined(KOKKOS_ARCH_HOPPER)
256,
#endif
max_threads_for_memory

View File

@ -1,523 +0,0 @@
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
index fcd4773dbc..30b6958a67 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
@@ -207,7 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
LaunchBounds{});
}
-// Assuming cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1)
// NOTE these number can be obtained several ways:
// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
// "Compute Capability" first and check what is the smallest "Shared Memory
@@ -242,6 +241,7 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
return 0;
}() * 1024;
}
+
} // namespace Impl
} // namespace Kokkos
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
index 5811498e01..e22eb3b842 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
@@ -569,12 +569,6 @@ Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
}
#endif
-#ifdef KOKKOS_ENABLE_PRE_CUDA_10_DEPRECATION_API
- cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
-#else
- cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
-#endif
-
// Init the array for used for arbitrarily sized atomics
if (stream == nullptr) Impl::initialize_host_cuda_lock_arrays();
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
index b7a80ad84f..5c4c3a7d39 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
@@ -93,10 +93,6 @@ namespace Impl {
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
-// Maximize L1 cache and minimize shared memory:
-// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 );
-// For 2.0 capability: 48 KB L1 and 16 KB shared
-//----------------------------------------------------------------------------
template <class DriverType>
__global__ static void cuda_parallel_launch_constant_memory() {
@@ -158,63 +154,105 @@ inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) {
}
}
-// This function needs to be template on DriverType and LaunchBounds
+// These functions needs to be template on DriverType and LaunchBounds
// so that the static bool is unique for each type combo
// KernelFuncPtr does not necessarily contain that type information.
+
template <class DriverType, class LaunchBounds, class KernelFuncPtr>
-inline void configure_shmem_preference(KernelFuncPtr const& func,
- bool prefer_shmem) {
+const cudaFuncAttributes& get_cuda_kernel_func_attributes(
+ const KernelFuncPtr& func) {
+ // Only call cudaFuncGetAttributes once for each unique kernel
+ // by leveraging static variable initialization rules
+ auto wrap_get_attributes = [&]() -> cudaFuncAttributes {
+ cudaFuncAttributes attr;
+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncGetAttributes(&attr, func));
+ return attr;
+ };
+ static cudaFuncAttributes func_attr = wrap_get_attributes();
+ return func_attr;
+}
+
+template <class DriverType, class LaunchBounds, class KernelFuncPtr>
+inline void configure_shmem_preference(const KernelFuncPtr& func,
+ const cudaDeviceProp& device_props,
+ const size_t block_size, int& shmem,
+ const size_t occupancy) {
#ifndef KOKKOS_ARCH_KEPLER
- // On Kepler the L1 has no benefit since it doesn't cache reads
+
+ const auto& func_attr =
+ get_cuda_kernel_func_attributes<DriverType, LaunchBounds>(func);
+
+ // Compute limits for number of blocks due to registers/SM
+ const size_t regs_per_sm = device_props.regsPerMultiprocessor;
+ const size_t regs_per_thread = func_attr.numRegs;
+ // The granularity of register allocation is chunks of 256 registers per warp
+ // -> 8 registers per thread
+ const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
+ const size_t max_blocks_regs =
+ regs_per_sm / (allocated_regs_per_thread * block_size);
+
+ // Compute how many threads per sm we actually want
+ const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
+ // only allocate multiples of warp size
+ const size_t num_threads_desired =
+ ((max_threads_per_sm * occupancy / 100 + 31) / 32) * 32;
+ // Get close to the desired occupancy,
+ // don't undershoot by much but also don't allocate a whole new block just
+ // because one is a few threads over otherwise.
+ size_t num_blocks_desired =
+ (num_threads_desired + block_size * 0.8) / block_size;
+ num_blocks_desired = ::std::min(max_blocks_regs, num_blocks_desired);
+ if (num_blocks_desired == 0) num_blocks_desired = 1;
+
+ // Calculate how much shared memory we need per block
+ size_t shmem_per_block = shmem + func_attr.sharedSizeBytes;
+
+ // The minimum shared memory allocation we can have in total per SM is 8kB.
+ // If we want to lower occupancy we have to make sure we request at least that
+ // much in aggregate over all blocks, so that shared memory actually becomes a
+ // limiting factor for occupancy
+ constexpr size_t min_shmem_size_per_sm = 8192;
+ if ((occupancy < 100) &&
+ (shmem_per_block * num_blocks_desired < min_shmem_size_per_sm)) {
+ shmem_per_block = min_shmem_size_per_sm / num_blocks_desired;
+ // Need to set the caller's shmem variable so that the
+ // kernel launch uses the correct dynamic shared memory request
+ shmem = shmem_per_block - func_attr.sharedSizeBytes;
+ }
+
+ // Compute the carveout fraction we need based on occupancy
+ // Use multiples of 8kB
+ const size_t max_shmem_per_sm = device_props.sharedMemPerMultiprocessor;
+ size_t carveout = shmem_per_block == 0
+ ? 0
+ : 100 *
+ (((num_blocks_desired * shmem_per_block +
+ min_shmem_size_per_sm - 1) /
+ min_shmem_size_per_sm) *
+ min_shmem_size_per_sm) /
+ max_shmem_per_sm;
+ if (carveout > 100) carveout = 100;
+
+ // Set the carveout, but only call it once per kernel or when it changes
auto set_cache_config = [&] {
- KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
- func,
- (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)));
- return prefer_shmem;
+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetAttribute(
+ func, cudaFuncAttributePreferredSharedMemoryCarveout, carveout));
+ return carveout;
};
- static bool cache_config_preference_cached = set_cache_config();
- if (cache_config_preference_cached != prefer_shmem) {
+ // Store the value in a static variable so we only reset if needed
+ static size_t cache_config_preference_cached = set_cache_config();
+ if (cache_config_preference_cached != carveout) {
cache_config_preference_cached = set_cache_config();
}
#else
// Use the parameters so we don't get a warning
(void)func;
- (void)prefer_shmem;
+ (void)device_props;
+ (void)block_size;
+ (void)occupancy;
#endif
}
-template <class Policy>
-std::enable_if_t<Policy::experimental_contains_desired_occupancy>
-modify_launch_configuration_if_desired_occupancy_is_specified(
- Policy const& policy, cudaDeviceProp const& properties,
- cudaFuncAttributes const& attributes, dim3 const& block, int& shmem,
- bool& prefer_shmem) {
- int const block_size = block.x * block.y * block.z;
- int const desired_occupancy = policy.impl_get_desired_occupancy().value();
-
- size_t const shmem_per_sm_prefer_l1 = get_shmem_per_sm_prefer_l1(properties);
- size_t const static_shmem = attributes.sharedSizeBytes;
-
- // round to nearest integer and avoid division by zero
- int active_blocks = std::max(
- 1, static_cast<int>(std::round(
- static_cast<double>(properties.maxThreadsPerMultiProcessor) /
- block_size * desired_occupancy / 100)));
- int const dynamic_shmem =
- shmem_per_sm_prefer_l1 / active_blocks - static_shmem;
-
- if (dynamic_shmem > shmem) {
- shmem = dynamic_shmem;
- prefer_shmem = false;
- }
-}
-
-template <class Policy>
-std::enable_if_t<!Policy::experimental_contains_desired_occupancy>
-modify_launch_configuration_if_desired_occupancy_is_specified(
- Policy const&, cudaDeviceProp const&, cudaFuncAttributes const&,
- dim3 const& /*block*/, int& /*shmem*/, bool& /*prefer_shmem*/) {}
-
// </editor-fold> end Some helper functions for launch code readability }}}1
//==============================================================================
@@ -348,7 +386,7 @@ struct CudaParallelLaunchKernelInvoker<
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
inline static void create_parallel_launch_graph_node(
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
- CudaInternal const* cuda_instance, bool prefer_shmem) {
+ CudaInternal const* cuda_instance) {
//----------------------------------------
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
KOKKOS_EXPECTS(bool(graph));
@@ -358,8 +396,15 @@ struct CudaParallelLaunchKernelInvoker<
if (!Impl::is_empty_launch(grid, block)) {
Impl::check_shmem_request(cuda_instance, shmem);
- Impl::configure_shmem_preference<DriverType, LaunchBounds>(
- base_t::get_kernel_func(), prefer_shmem);
+ if (DriverType::Policy::
+ experimental_contains_desired_occupancy) {
+ int desired_occupancy =
+ driver.get_policy().impl_get_desired_occupancy().value();
+ size_t block_size = block.x * block.y * block.z;
+ Impl::configure_shmem_preference<DriverType, LaunchBounds>(
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
+ shmem, desired_occupancy);
+ }
void const* args[] = {&driver};
@@ -442,7 +487,7 @@ struct CudaParallelLaunchKernelInvoker<
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
inline static void create_parallel_launch_graph_node(
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
- CudaInternal const* cuda_instance, bool prefer_shmem) {
+ CudaInternal const* cuda_instance) {
//----------------------------------------
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
KOKKOS_EXPECTS(bool(graph));
@@ -452,8 +497,15 @@ struct CudaParallelLaunchKernelInvoker<
if (!Impl::is_empty_launch(grid, block)) {
Impl::check_shmem_request(cuda_instance, shmem);
- Impl::configure_shmem_preference<DriverType, LaunchBounds>(
- base_t::get_kernel_func(), prefer_shmem);
+ if constexpr (DriverType::Policy::
+ experimental_contains_desired_occupancy) {
+ int desired_occupancy =
+ driver.get_policy().impl_get_desired_occupancy().value();
+ size_t block_size = block.x * block.y * block.z;
+ Impl::configure_shmem_preference<DriverType, LaunchBounds>(
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
+ shmem, desired_occupancy);
+ }
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
@@ -566,7 +618,7 @@ struct CudaParallelLaunchKernelInvoker<
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
inline static void create_parallel_launch_graph_node(
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
- CudaInternal const* cuda_instance, bool prefer_shmem) {
+ CudaInternal const* cuda_instance) {
// Just use global memory; coordinating through events to share constant
// memory with the non-graph interface is not really reasonable since
// events don't work with Graphs directly, and this would anyway require
@@ -580,7 +632,7 @@ struct CudaParallelLaunchKernelInvoker<
DriverType, LaunchBounds,
Experimental::CudaLaunchMechanism::GlobalMemory>;
global_launch_impl_t::create_parallel_launch_graph_node(
- driver, grid, block, shmem, cuda_instance, prefer_shmem);
+ driver, grid, block, shmem, cuda_instance);
}
#endif
};
@@ -613,8 +665,7 @@ struct CudaParallelLaunchImpl<
inline static void launch_kernel(const DriverType& driver, const dim3& grid,
const dim3& block, int shmem,
- const CudaInternal* cuda_instance,
- bool prefer_shmem) {
+ const CudaInternal* cuda_instance) {
if (!Impl::is_empty_launch(grid, block)) {
// Prevent multiple threads to simultaneously set the cache configuration
// preference and launch the same kernel
@@ -623,18 +674,17 @@ struct CudaParallelLaunchImpl<
Impl::check_shmem_request(cuda_instance, shmem);
- // If a desired occupancy is specified, we compute how much shared memory
- // to ask for to achieve that occupancy, assuming that the cache
- // configuration is `cudaFuncCachePreferL1`. If the amount of dynamic
- // shared memory computed is actually smaller than `shmem` we overwrite
- // `shmem` and set `prefer_shmem` to `false`.
- modify_launch_configuration_if_desired_occupancy_is_specified(
- driver.get_policy(), cuda_instance->m_deviceProp,
- get_cuda_func_attributes(), block, shmem, prefer_shmem);
-
- Impl::configure_shmem_preference<
- DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
- base_t::get_kernel_func(), prefer_shmem);
+ if (DriverType::Policy::
+ experimental_contains_desired_occupancy) {
+ int desired_occupancy =
+ driver.get_policy().impl_get_desired_occupancy().value();
+ size_t block_size = block.x * block.y * block.z;
+ Impl::configure_shmem_preference<
+ DriverType,
+ Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
+ shmem, desired_occupancy);
+ }
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
@@ -650,18 +700,9 @@ struct CudaParallelLaunchImpl<
}
static cudaFuncAttributes get_cuda_func_attributes() {
- // Race condition inside of cudaFuncGetAttributes if the same address is
- // given requires using a local variable as input instead of a static Rely
- // on static variable initialization to make sure only one thread executes
- // the code and the result is visible.
- auto wrap_get_attributes = []() -> cudaFuncAttributes {
- cudaFuncAttributes attr_tmp;
- KOKKOS_IMPL_CUDA_SAFE_CALL(
- cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()));
- return attr_tmp;
- };
- static cudaFuncAttributes attr = wrap_get_attributes();
- return attr;
+ return get_cuda_kernel_func_attributes<
+ DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
+ base_t::get_kernel_func());
}
};
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
index e586bb4cc6..0e348c092a 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
@@ -121,8 +121,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
maxblocks[1]),
1);
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
} else if (RP::rank == 3) {
const dim3 block(m_rp.m_tile[0], m_rp.m_tile[1], m_rp.m_tile[2]);
KOKKOS_ASSERT(block.x > 0);
@@ -139,8 +138,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
(m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1) / block.z,
maxblocks[2]));
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
} else if (RP::rank == 4) {
// id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to
// threadIdx.z
@@ -158,8 +156,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
(m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1) / block.z,
maxblocks[2]));
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
} else if (RP::rank == 5) {
// id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to
// threadIdx.z
@@ -175,8 +172,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
(m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1) / block.z,
maxblocks[2]));
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
} else if (RP::rank == 6) {
// id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to
// threadIdx.z
@@ -191,8 +187,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
std::min<array_index_type>(m_rp.m_tile_end[4] * m_rp.m_tile_end[5],
maxblocks[2]));
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
} else {
Kokkos::abort("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
}
@@ -405,8 +400,8 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
if (!m_result_ptr_device_accessible) {
if (m_result_ptr) {
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
index ac160f8fe2..d1031751c2 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
@@ -135,8 +135,7 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
#endif
CudaParallelLaunch<ParallelFor, LaunchBounds>(
- *this, grid, block, 0, m_policy.space().impl_internal_space_instance(),
- false);
+ *this, grid, block, 0, m_policy.space().impl_internal_space_instance());
}
ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
@@ -375,8 +374,8 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
if (!m_result_ptr_device_accessible) {
if (m_result_ptr) {
@@ -726,16 +725,16 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
m_final = false;
CudaParallelLaunch<ParallelScan, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
}
#endif
m_final = true;
CudaParallelLaunch<ParallelScan, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
}
}
@@ -1038,16 +1037,16 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
m_final = false;
CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
}
#endif
m_final = true;
CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
*this, grid, block, shmem,
- m_policy.space().impl_internal_space_instance(),
- false); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
const int size = Analysis::value_size(m_functor);
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
index cdd16085b3..ea9430b812 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
@@ -552,8 +552,8 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
CudaParallelLaunch<ParallelFor, LaunchBounds>(
*this, grid, block, shmem_size_total,
- m_policy.space().impl_internal_space_instance(),
- true); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
}
ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
@@ -878,8 +878,8 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
*this, grid, block, shmem_size_total,
- m_policy.space().impl_internal_space_instance(),
- true); // copy to device and execute
+ m_policy.space()
+ .impl_internal_space_instance()); // copy to device and execute
if (!m_result_ptr_device_accessible) {
m_policy.space().fence(
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
index 34d4bef9fd..178012431c 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
@@ -428,11 +428,6 @@ struct CudaReductionsFunctor<FunctorType, false, false> {
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
-// Maximize shared memory and minimize L1 cache:
-// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
-// For 2.0 capability: 48 KB shared and 16 KB L1
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
* (a) blockDim.y <= 1024
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
index fb3a6b138f..a12378a891 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
@@ -100,8 +100,7 @@ class ParallelFor<FunctorType, Kokkos::WorkGraphPolicy<Traits...>,
const int shared = 0;
Kokkos::Impl::CudaParallelLaunch<Self>(
- *this, grid, block, shared, Cuda().impl_internal_space_instance(),
- false);
+ *this, grid, block, shared, Cuda().impl_internal_space_instance());
}
inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)

View File

@ -1,46 +0,0 @@
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
index 30b6958a67..b94f053272 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
@@ -207,41 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
LaunchBounds{});
}
-// NOTE these number can be obtained several ways:
-// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
-// "Compute Capability" first and check what is the smallest "Shared Memory
-// Size Config" that is available. The "Shared Memory Per Multiprocessor" in
-// bytes is then to be found below in the summary.
-// * Another option would be to look for the information in the "Tuning
-// Guide(s)" of the CUDA Toolkit Documentation for each GPU architecture, in
-// the "Shared Memory" section (more tedious)
-inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
- int const compute_capability = properties.major * 10 + properties.minor;
- return [compute_capability]() {
- switch (compute_capability) {
- case 30:
- case 32:
- case 35: return 16;
- case 37: return 80;
- case 50:
- case 53:
- case 60:
- case 62: return 64;
- case 52:
- case 61: return 96;
- case 70:
- case 80:
- case 86:
- case 90: return 8;
- case 75: return 32;
- default:
- Kokkos::Impl::throw_runtime_exception(
- "Unknown device in cuda block size deduction");
- }
- return 0;
- }() * 1024;
-}
-
} // namespace Impl
} // namespace Kokkos

View File

@ -1,204 +0,0 @@
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
index b94f053272..252c13c524 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
@@ -53,17 +53,69 @@
namespace Kokkos {
namespace Impl {
+inline int cuda_warp_per_sm_allocation_granularity(
+ cudaDeviceProp const& properties) {
+ // Allocation granularity of warps in each sm
+ switch (properties.major) {
+ case 3:
+ case 5:
+ case 7:
+ case 8:
+ case 9: return 4;
+ case 6: return (properties.minor == 0 ? 2 : 4);
+ default:
+ throw_runtime_exception(
+ "Unknown device in cuda warp per sm allocation granularity");
+ return 0;
+ }
+}
+
+inline int cuda_max_warps_per_sm_registers(
+ cudaDeviceProp const& properties, cudaFuncAttributes const& attributes) {
+ // Maximum number of warps per sm as a function of register counts,
+ // subject to the constraint that warps are allocated with a fixed granularity
+ int const max_regs_per_block = properties.regsPerBlock;
+ int const regs_per_warp = attributes.numRegs * properties.warpSize;
+ int const warp_granularity =
+ cuda_warp_per_sm_allocation_granularity(properties);
+ // The granularity of register allocation is chunks of 256 registers per warp,
+ // which implies a need to over-allocate, so we round up
+ int const allocated_regs_per_warp = (regs_per_warp + 256 - 1) / 256;
+
+ // The maximum number of warps per SM is constrained from above by register
+ // allocation. To satisfy the constraint that warps per SM is allocated at a
+ // finite granularity, we need to round down.
+ int const max_warps_per_sm =
+ warp_granularity *
+ (max_regs_per_block / (allocated_regs_per_warp * warp_granularity));
+
+ return max_warps_per_sm;
+}
+
inline int cuda_max_active_blocks_per_sm(cudaDeviceProp const& properties,
cudaFuncAttributes const& attributes,
int block_size, size_t dynamic_shmem) {
- // Limits due do registers/SM
+ // Limits due to registers/SM
int const regs_per_sm = properties.regsPerMultiprocessor;
int const regs_per_thread = attributes.numRegs;
// The granularity of register allocation is chunks of 256 registers per warp
// -> 8 registers per thread
int const allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
- int const max_blocks_regs =
- regs_per_sm / (allocated_regs_per_thread * block_size);
+ int max_blocks_regs = regs_per_sm / (allocated_regs_per_thread * block_size);
+
+ // Compute the maximum number of warps as a function of the number of
+ // registers
+ int const max_warps_per_sm_registers =
+ cuda_max_warps_per_sm_registers(properties, attributes);
+
+ // Constrain the number of blocks to respect the maximum number of warps per
+ // SM On face value this should be an equality, but due to the warp
+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
+ // left-hand-side of this comparison can overshoot what the hardware allows
+ // based on register counts alone
+ while ((max_blocks_regs * block_size / properties.warpSize) >
+ max_warps_per_sm_registers)
+ max_blocks_regs--;
// Limits due to shared memory/SM
size_t const shmem_per_sm = properties.sharedMemPerMultiprocessor;
@@ -207,6 +259,19 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
LaunchBounds{});
}
+template <class LaunchBounds>
+int cuda_get_opt_block_size_no_shmem(const cudaFuncAttributes& attr,
+ LaunchBounds) {
+ auto const& prop = Kokkos::Cuda().cuda_device_prop();
+
+ // Thin version of cuda_get_opt_block_size for cases where there is no shared
+ // memory
+ auto const block_size_to_no_shmem = [&](int /*block_size*/) { return 0; };
+
+ return cuda_deduce_block_size(false, prop, attr, block_size_to_no_shmem,
+ LaunchBounds{});
+}
+
} // namespace Impl
} // namespace Kokkos
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
index 5c4c3a7d39..170183ca0a 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
@@ -188,9 +188,23 @@ inline void configure_shmem_preference(const KernelFuncPtr& func,
// The granularity of register allocation is chunks of 256 registers per warp
// -> 8 registers per thread
const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
- const size_t max_blocks_regs =
+ size_t max_blocks_regs =
regs_per_sm / (allocated_regs_per_thread * block_size);
+ // Compute the maximum number of warps as a function of the number of
+ // registers
+ const size_t max_warps_per_sm_registers =
+ cuda_max_warps_per_sm_registers(device_props, func_attr);
+
+ // Constrain the number of blocks to respect the maximum number of warps per
+ // SM On face value this should be an equality, but due to the warp
+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
+ // left-hand-side of this comparison can overshoot what the hardware allows
+ // based on register counts alone
+ while ((max_blocks_regs * block_size / device_props.warpSize) >
+ max_warps_per_sm_registers)
+ max_blocks_regs--;
+
// Compute how many threads per sm we actually want
const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
// only allocate multiples of warp size
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
index 0e348c092a..7e4f62f12e 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
@@ -67,6 +67,34 @@
namespace Kokkos {
namespace Impl {
+template <typename ParallelType, typename Policy, typename LaunchBounds>
+int max_tile_size_product_helper(const Policy& pol, const LaunchBounds&) {
+ cudaFuncAttributes attr =
+ CudaParallelLaunch<ParallelType,
+ LaunchBounds>::get_cuda_func_attributes();
+ auto const& prop = pol.space().cuda_device_prop();
+
+ // Limits due to registers/SM, MDRange doesn't have
+ // shared memory constraints
+ int const optimal_block_size =
+ Kokkos::Impl::cuda_get_opt_block_size_no_shmem(attr, LaunchBounds{});
+
+ // Compute how many blocks of this size we can launch, based on warp
+ // constraints
+ int const max_warps_per_sm_registers =
+ Kokkos::Impl::cuda_max_warps_per_sm_registers(prop, attr);
+ int const max_num_threads_from_warps =
+ max_warps_per_sm_registers * prop.warpSize;
+ int const max_num_blocks = max_num_threads_from_warps / optimal_block_size;
+
+ // Compute the total number of threads
+ int const max_threads_per_sm = optimal_block_size * max_num_blocks;
+
+ return std::min(
+ max_threads_per_sm,
+ static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
+}
+
template <class FunctorType, class... Traits>
class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
public:
@@ -85,18 +113,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
public:
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
- cudaFuncAttributes attr =
- CudaParallelLaunch<ParallelFor,
- LaunchBounds>::get_cuda_func_attributes();
- auto const& prop = pol.space().cuda_device_prop();
- // Limits due to registers/SM, MDRange doesn't have
- // shared memory constraints
- int const regs_per_sm = prop.regsPerMultiprocessor;
- int const regs_per_thread = attr.numRegs;
- int const max_threads_per_sm = regs_per_sm / regs_per_thread;
- return std::min(
- max_threads_per_sm,
- static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
+ return max_tile_size_product_helper<ParallelFor>(pol, LaunchBounds{});
}
Policy const& get_policy() const { return m_rp; }
inline __device__ void operator()() const {
@@ -258,17 +275,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
public:
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
- cudaFuncAttributes attr =
- CudaParallelLaunch<ParallelReduce,
- LaunchBounds>::get_cuda_func_attributes();
- auto const& prop = pol.space().cuda_device_prop();
- // Limits due do registers/SM
- int const regs_per_sm = prop.regsPerMultiprocessor;
- int const regs_per_thread = attr.numRegs;
- int const max_threads_per_sm = regs_per_sm / regs_per_thread;
- return std::min(
- max_threads_per_sm,
- static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
+ return max_tile_size_product_helper<ParallelReduce>(pol, LaunchBounds{});
}
Policy const& get_policy() const { return m_policy; }
inline __device__ void exec_range(reference_type update) const {

View File

@ -1,63 +0,0 @@
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
index 170183ca0a..ba43e362bb 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
@@ -412,12 +412,16 @@ struct CudaParallelLaunchKernelInvoker<
Impl::check_shmem_request(cuda_instance, shmem);
if (DriverType::Policy::
experimental_contains_desired_occupancy) {
+ /*
int desired_occupancy =
driver.get_policy().impl_get_desired_occupancy().value();
size_t block_size = block.x * block.y * block.z;
Impl::configure_shmem_preference<DriverType, LaunchBounds>(
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
- shmem, desired_occupancy);
+ shmem, desired_occupancy);*/
+ Kokkos::Impl::throw_runtime_exception(
+ std::string("Cuda graph node creation FAILED:"
+ " occupancy requests are currently broken."));
}
void const* args[] = {&driver};
@@ -511,14 +515,17 @@ struct CudaParallelLaunchKernelInvoker<
if (!Impl::is_empty_launch(grid, block)) {
Impl::check_shmem_request(cuda_instance, shmem);
- if constexpr (DriverType::Policy::
+ if (DriverType::Policy::
experimental_contains_desired_occupancy) {
- int desired_occupancy =
+ /*int desired_occupancy =
driver.get_policy().impl_get_desired_occupancy().value();
size_t block_size = block.x * block.y * block.z;
Impl::configure_shmem_preference<DriverType, LaunchBounds>(
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
- shmem, desired_occupancy);
+ shmem, desired_occupancy);*/
+ Kokkos::Impl::throw_runtime_exception(
+ std::string("Cuda graph node creation FAILED:"
+ " occupancy requests are currently broken."));
}
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
@@ -690,14 +697,17 @@ struct CudaParallelLaunchImpl<
if (DriverType::Policy::
experimental_contains_desired_occupancy) {
- int desired_occupancy =
+ /*int desired_occupancy =
driver.get_policy().impl_get_desired_occupancy().value();
size_t block_size = block.x * block.y * block.z;
Impl::configure_shmem_preference<
DriverType,
Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
- shmem, desired_occupancy);
+ shmem, desired_occupancy);*/
+ Kokkos::Impl::throw_runtime_exception(
+ std::string("Cuda graph node creation FAILED:"
+ " occupancy requests are currently broken."));
}
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();

30
potentials/HGa.msmeam Normal file
View File

@ -0,0 +1,30 @@
bkgd_dyn = 1
emb_lin_neg = 1
augt1=0
ialloy=1
rc = 5.9
#H
attrac(1,1)=0.460
repuls(1,1)=0.460
Cmin(1,1,1)=1.3 # PuMS
Cmax(1,1,1)= 2.80
nn2(1,1)=1
#Ga
rho0(2) = 0.6
attrac(2,2)=0.097
repuls(2,2)=0.097
nn2(2,2)=1
#HGa
attrac(1,2)=0.300
repuls(1,2)=0.300
lattce(1,2)=l12
re(1,2)=3.19
delta(1,2)=-0.48
alpha(1,2)=6.6
Cmin(1,1,2)=2.0
Cmin(2,1,2)= 2.0
Cmin(1,2,1)=2.0
Cmin(2,2,1) = 1.4
Cmin(1,2,2) = 1.4
Cmin(1,1,2) = 1.4
nn2(1,2)=1

14
potentials/library.msmeam Normal file
View File

@ -0,0 +1,14 @@
# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
# ms-meam data format May 2010
# elt lat z ielement atwt
# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub
# - t0 t1 t2 t3 t1m t2m t3m rozero ibar
# NOTE: leading character cannot be a space
'H' 'dim' 1.0 1 1.0079
2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50
1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0
'Ga4' 'fcc' 12.0 31 69.723
4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97
1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0

View File

@ -316,6 +316,8 @@ action pair_lj_spica_kokkos.cpp pair_lj_spica.cpp
action pair_lj_spica_kokkos.h pair_lj_spica.h
action pair_meam_kokkos.cpp pair_meam.cpp
action pair_meam_kokkos.h pair_meam.h
action pair_meam_ms_kokkos.cpp pair_meam_ms.cpp
action pair_meam_ms_kokkos.h pair_meam_ms.h
action pair_mliap_kokkos.cpp pair_mliap.cpp
action pair_mliap_kokkos.h pair_mliap.h
action pair_morse_kokkos.cpp

View File

@ -61,34 +61,61 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMDensFinal, const int &i, EV_FLOAT
if (elti >= 0) {
scaleii = d_scale(type[i],type[i]);
d_rho1[i] = 0.0;
d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
if (msmeamflag) {
d_rho2[i] = -1.0 / 3.0 * (d_arho2b[i] * d_arho2b[i]
- d_arho2mb[i] * d_arho2mb[i]);
} else{
d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
}
d_rho3[i] = 0.0;
for (int m = 0; m < 3; m++) {
d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
}
for (int m = 0; m < 6; m++)
d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
for (int m = 0; m < 10; m++)
d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
if (d_rho0[i] > 0.0) {
if (ialloy == 1) {
d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
} else if (ialloy == 2) {
d_t_ave(i,0) = t1_meam[elti];
d_t_ave(i,1) = t2_meam[elti];
d_t_ave(i,2) = t3_meam[elti];
} else {
d_t_ave(i,0) /= d_rho0[i];
d_t_ave(i,1) /= d_rho0[i];
d_t_ave(i,2) /= d_rho0[i];
if (msmeamflag) {
d_rho1[i] = d_rho1[i] + d_arho1(i, m) * d_arho1(i, m)
- d_arho1m(i, m) * d_arho1m(i, m);
d_rho3[i] = d_rho3[i] - 3.0 / 5.0 * (d_arho3b(i, m) * d_arho3b(i, m)
- d_arho3mb(i, m) * d_arho3mb(i, m));
} else{
d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
}
}
for (int m = 0; m < 6; m++){
if (msmeamflag) {
d_rho2[i] = d_rho2[i] + v2D[m] * (d_arho2(i, m) * d_arho2(i, m)
- d_arho2m(i, m) * d_arho2m(i, m));
} else{
d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
}
}
for (int m = 0; m < 10; m++)
if (msmeamflag) {
d_rho3[i] = d_rho3[i] + v3D[m] * (d_arho3(i, m) * d_arho3(i, m)
- d_arho3m(i, m) * d_arho3m(i, m));
} else{
d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
}
d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
if (msmeamflag) {
// with msmeam all t weights are already accounted for in rho
d_gamma[i] = d_rho1[i] + d_rho2[i] + d_rho3[i];
} else{
if (d_rho0[i] > 0.0) {
if (ialloy == 1) {
d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
} else if (ialloy == 2) {
d_t_ave(i,0) = t1_meam[elti];
d_t_ave(i,1) = t2_meam[elti];
d_t_ave(i,2) = t3_meam[elti];
} else {
d_t_ave(i,0) /= d_rho0[i];
d_t_ave(i,1) /= d_rho0[i];
d_t_ave(i,2) /= d_rho0[i];
}
}
d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
}
if (d_rho0[i] > 0.0)
d_gamma[i] /= (d_rho0[i] * d_rho0[i]);

View File

@ -43,11 +43,23 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMZero, const int &i) const {
d_rho0[i] = 0.0;
d_arho2b[i] = 0.0;
d_arho1(i,0) = d_arho1(i,1) = d_arho1(i,2) = 0.0;
for (int j = 0; j < 6; j++)
if (msmeamflag) {
d_arho2mb[i] = 0.0;
d_arho1m(i,0) = d_arho1m(i,1) = d_arho1m(i,2) = 0.0;
}
for (int j = 0; j < 6; j++) {
d_arho2(i,j) = 0.0;
for (int j = 0; j < 10; j++)
if (msmeamflag)
d_arho2m(i,j) = 0.0;
}
for (int j = 0; j < 10; j++) {
d_arho3(i,j) = 0.0;
if (msmeamflag)
d_arho3m(i,j) = 0.0;
}
d_arho3b(i,0) = d_arho3b(i,1) = d_arho3b(i,2) = 0.0;
if (msmeamflag)
d_arho3mb(i,0) = d_arho3mb(i,1) = d_arho3mb(i,2) = 0.0;
d_t_ave(i,0) = d_t_ave(i,1) = d_t_ave(i,2) = 0.0;
d_tsq_ave(i,0) = d_tsq_ave(i,1) = d_tsq_ave(i,2) = 0.0;
}
@ -80,13 +92,20 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
memoryKK->destroy_kokkos(k_arho3b,arho3b);
memoryKK->destroy_kokkos(k_t_ave,t_ave);
memoryKK->destroy_kokkos(k_tsq_ave,tsq_ave);
// msmeam
memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
memoryKK->destroy_kokkos(k_arho1m, arho1m);
memoryKK->destroy_kokkos(k_arho2m, arho2m);
memoryKK->destroy_kokkos(k_arho3m, arho3m);
memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
nmax = atom_nmax;
// memory->create(rho, nmax, "pair:rho");
//memory->create(rho, nmax, "pair:rho");
k_rho = DAT::tdual_ffloat_1d("pair:rho",nmax);
d_rho = k_rho.template view<DeviceType>();
h_rho = k_rho.h_view;
// memory->create(rho0, nmax, "pair:rho0");
//memory->create(rho0, nmax, "pair:rho0");
k_rho0 = DAT::tdual_ffloat_1d("pair:rho0",nmax);
d_rho0 = k_rho0.template view<DeviceType>();
h_rho0 = k_rho0.h_view;
@ -150,6 +169,28 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
k_tsq_ave = DAT::tdual_ffloat_2d("pair:tsq_ave",nmax, 3);
d_tsq_ave = k_tsq_ave.template view<DeviceType>();
h_tsq_ave = k_tsq_ave.h_view;
// msmeam
//memory->create(arho2mb, nmax, "pair:arho2mb");
k_arho2mb = DAT::tdual_ffloat_1d("pair:arho2mb",nmax);
d_arho2mb = k_arho2mb.template view<DeviceType>();
h_arho2mb = k_arho2mb.h_view;
//memory->create(arho1m, nmax, 3, "pair:arho1m");
k_arho1m = DAT::tdual_ffloat_2d("pair:arho1m", nmax, 3);
d_arho1m = k_arho1m.template view<DeviceType>();
h_arho1m = k_arho1m.h_view;
//memory->create(arho2m, nmax, 6, "pair:arho2m");
k_arho2m = DAT::tdual_ffloat_2d("pair:arho2m", nmax, 6);
d_arho2m = k_arho2m.template view<DeviceType>();
h_arho2m = k_arho2m.h_view;
//memory->create(arho3m, nmax, 10, "pair:arho3m");
k_arho3m = DAT::tdual_ffloat_2d("pair:arho3m", nmax, 10);
d_arho3m = k_arho3m.template view<DeviceType>();
h_arho3m = k_arho3m.h_view;
//memory->create(arho3mb, nmax, 3, "pair:arho3mb");
k_arho3mb = DAT::tdual_ffloat_2d("pair:arho3mb", nmax, 3);
d_arho3mb = k_arho3mb.template view<DeviceType>();
h_arho3mb = k_arho3mb.h_view;
}
if (n_neigh > maxneigh) {
@ -206,6 +247,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
dup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3b);
dup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_t_ave);
dup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_tsq_ave);
// msmeam
dup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2mb);
dup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho1m);
dup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2m);
dup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3m);
dup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3mb);
} else {
ndup_rho0 = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_rho0);
ndup_arho2b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2b);
@ -215,6 +262,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
ndup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3b);
ndup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_t_ave);
ndup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_tsq_ave);
// msmeam
ndup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2mb);
ndup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho1m);
ndup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2m);
ndup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3m);
ndup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3mb);
}
copymode = 1;
@ -233,6 +286,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
Kokkos::Experimental::contribute(d_arho3b, dup_arho3b);
Kokkos::Experimental::contribute(d_t_ave, dup_t_ave);
Kokkos::Experimental::contribute(d_tsq_ave, dup_tsq_ave);
// msmeam
Kokkos::Experimental::contribute(d_arho2mb, dup_arho2mb);
Kokkos::Experimental::contribute(d_arho1m, dup_arho1m);
Kokkos::Experimental::contribute(d_arho2m, dup_arho2m);
Kokkos::Experimental::contribute(d_arho3m, dup_arho3m);
Kokkos::Experimental::contribute(d_arho3mb, dup_arho3mb);
// free duplicated memory
dup_rho0 = decltype(dup_rho0)();
@ -243,6 +302,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
dup_arho3b = decltype(dup_arho3b)();
dup_t_ave = decltype(dup_t_ave)();
dup_tsq_ave = decltype(dup_tsq_ave)();
// msmeam
dup_arho2mb = decltype(dup_arho2mb)();
dup_arho1m = decltype(dup_arho1m)();
dup_arho2m = decltype(dup_arho2m)();
dup_arho3m = decltype(dup_arho3m)();
dup_arho3mb = decltype(dup_arho3mb)();
}
}
@ -417,7 +482,6 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
int offset) const
{
// The rho0, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
auto v_rho0 = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_rho0),decltype(ndup_rho0)>::get(dup_rho0,ndup_rho0);
auto a_rho0 = v_rho0.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_arho2b = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2b),decltype(ndup_arho2b)>::get(dup_arho2b,ndup_arho2b);
@ -434,6 +498,17 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
auto a_t_ave = v_t_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_tsq_ave = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_tsq_ave),decltype(ndup_tsq_ave)>::get(dup_tsq_ave,ndup_tsq_ave);
auto a_tsq_ave = v_tsq_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
// msmeam
auto v_arho2mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2mb),decltype(ndup_arho2mb)>::get(dup_arho2mb,ndup_arho2mb);
auto a_arho2mb = v_arho2mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_arho1m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho1m),decltype(ndup_arho1m)>::get(dup_arho1m,ndup_arho1m);
auto a_arho1m = v_arho1m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_arho2m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2m),decltype(ndup_arho2m)>::get(dup_arho2m,ndup_arho2m);
auto a_arho2m = v_arho2m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_arho3m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3m),decltype(ndup_arho3m)>::get(dup_arho3m,ndup_arho3m);
auto a_arho3m = v_arho3m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
auto v_arho3mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3mb),decltype(ndup_arho3mb)>::get(dup_arho3mb,ndup_arho3mb);
auto a_arho3mb = v_arho3mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
const int elti = d_map[type[i]];
const double xtmp = x(i,0);
@ -463,6 +538,16 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
double rhoa1i = ro0i * MathSpecialKokkos::fm_exp(-beta1_meam[elti] * ai) * sij;
double rhoa2i = ro0i * MathSpecialKokkos::fm_exp(-beta2_meam[elti] * ai) * sij;
double rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai) * sij;
// msmeam
double rhoa1mj, rhoa2mj, rhoa3mj, rhoa1mi, rhoa2mi, rhoa3mi;
if (msmeamflag) {
rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * sij;
rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * sij;
rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * sij;
rhoa1mi = ro0i * t1m_meam[elti] * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * sij;
rhoa2mi = ro0i * t2m_meam[elti] * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * sij;
rhoa3mi = ro0i * t3m_meam[elti] * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * sij;
}
if (ialloy == 1) {
rhoa1j *= t1_meam[eltj];
rhoa2j *= t2_meam[eltj];
@ -499,20 +584,45 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
const double A1i = rhoa1i / rij;
const double A2i = rhoa2i / rij2;
const double A3i = rhoa3i / (rij2 * rij);
double A1mj, A2mj, A3mj, A1mi, A2mi, A3mi;
if (msmeamflag) {
a_arho2mb[i] += rhoa2mj;
a_arho2mb[j] += rhoa2mi;
A1mj = rhoa1mj / rij;
A2mj = rhoa2mj / rij2;
A3mj = rhoa3mj / (rij2 * rij);
A1mi = rhoa1mi / rij;
A2mi = rhoa2mi / rij2;
A3mi = rhoa3mi / (rij2 * rij);
}
int nv2 = 0;
int nv3 = 0;
for (int m = 0; m < 3; m++) {
a_arho1(i,m) += A1j * delij[m];
a_arho1(i,m) += A1j * delij[m];
a_arho1(j,m) += -A1i * delij[m];
a_arho3b(i,m) += rhoa3j * delij[m] / rij;
a_arho3b(i,m) += rhoa3j * delij[m] / rij;
a_arho3b(j,m) += -rhoa3i * delij[m] / rij;
if (msmeamflag) {
a_arho1m(i,m) += A1mj * delij[m];
a_arho1m(j,m) += -A1mi * delij[m];
a_arho3mb(i,m) += rhoa3mj * delij[m] / rij;
a_arho3mb(j,m) += -rhoa3mi * delij[m] / rij;
}
for (int n = m; n < 3; n++) {
a_arho2(i,nv2) += A2j * delij[m] * delij[n];
a_arho2(j,nv2) += A2i * delij[m] * delij[n];
if (msmeamflag) {
a_arho2m(i,nv2) += A2mj * delij[m] * delij[n];
a_arho2m(j,nv2) += A2mi * delij[m] * delij[n];
}
nv2++;
for (int p = n; p < 3; p++) {
a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p];
a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p];
a_arho3(j,nv3) += -A3i * delij[m] * delij[n] * delij[p];
if (msmeamflag) {
a_arho3m(i,nv3) += A3mj * delij[m] * delij[n] * delij[p];
a_arho3m(j,nv3) += -A3mi * delij[m] * delij[n] * delij[p];
}
nv3++;
}
}

View File

@ -119,6 +119,17 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
double dsij1, dsij2, force1, force2;
double t1i, t2i, t3i, t1j, t2j, t3j;
int fnoffset;
// msmeam
double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
double drho1mdrm1[3], drho1mdrm2[3];
double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
double drho2mdrm1[3], drho2mdrm2[3];
double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
double drho3mdrm1[3], drho3mdrm2[3];
// The f, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
@ -197,6 +208,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
drhoa2i = -beta2_meam[elti] * invrei * rhoa2i;
rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai);
drhoa3i = -beta3_meam[elti] * invrei * rhoa3i;
if (msmeamflag) {
rhoa1mi = ro0i * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * t1m_meam[elti];
drhoa1mi = -beta1m_meam[elti] * invrei * rhoa1mi;
rhoa2mi = ro0i * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * t2m_meam[elti];
drhoa2mi = -beta2m_meam[elti] * invrei * rhoa2mi;
rhoa3mi = ro0i * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * t3m_meam[elti];
drhoa3mi = -beta3m_meam[elti] * invrei * rhoa3mi;
}
if (elti != eltj) {
invrej = 1.0 / re_meam[eltj][eltj];
@ -210,6 +229,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
drhoa2j = -beta2_meam[eltj] * invrej * rhoa2j;
rhoa3j = ro0j * MathSpecialKokkos::fm_exp(-beta3_meam[eltj] * aj);
drhoa3j = -beta3_meam[eltj] * invrej * rhoa3j;
if (msmeamflag) {
rhoa1mj = ro0j * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * t1m_meam[eltj];
drhoa1mj = -beta1m_meam[eltj] * invrej * rhoa1mj;
rhoa2mj = ro0j * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * t2m_meam[eltj];
drhoa2mj = -beta2m_meam[eltj] * invrej * rhoa2mj;
rhoa3mj = ro0j * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * t3m_meam[eltj];
drhoa3mj = -beta3m_meam[eltj] * invrej * rhoa3mj;
}
} else {
rhoa0j = rhoa0i;
drhoa0j = drhoa0i;
@ -219,6 +246,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
drhoa2j = drhoa2i;
rhoa3j = rhoa3i;
drhoa3j = drhoa3i;
if (msmeamflag) {
rhoa1mj = rhoa1mi;
drhoa1mj = drhoa1mi;
rhoa2mj = rhoa2mi;
drhoa2mj = drhoa2mi;
rhoa3mj = rhoa3mi;
drhoa3mj = drhoa3mi;
}
}
const double t1mi = t1_meam[elti];
@ -228,7 +263,10 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
const double t2mj = t2_meam[eltj];
const double t3mj = t3_meam[eltj];
if (ialloy == 1) {
// ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
// We did this above with rhoa1mj, rhoa2mj, etc.
if (ialloy == 1 || msmeamflag) {
rhoa1j *= t1mj;
rhoa2j *= t2mj;
rhoa3j *= t3mj;
@ -272,6 +310,38 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
arg3j3 = arg3j3 - d_arho3b(j, n) * delij[n];
}
// msmeam arhom args
nv2 = 0.0;
nv3 = 0.0;
arg1i1m = 0.0;
arg1j1m = 0.0;
arg1i2m = 0.0;
arg1j2m = 0.0;
arg1i3m = 0.0;
arg1j3m = 0.0;
arg3i3m = 0.0;
arg3j3m = 0.0;
if (msmeamflag) {
for (n = 0; n < 3; n++) {
for (p = n; p < 3; p++) {
for (q = p; q < 3; q++) {
arg = delij[n] * delij[p] * delij[q] * v3D[nv3];
arg1i3m = arg1i3m - d_arho3m(i, nv3) * arg;
arg1j3m = arg1j3m + d_arho3m(j, nv3) * arg;
nv3 = nv3 + 1;
}
arg = delij[n] * delij[p] * v2D[nv2];
arg1i2m = arg1i2m + d_arho2m(i, nv2) * arg;
arg1j2m = arg1j2m + d_arho2m(j, nv2) * arg;
nv2 = nv2 + 1;
}
arg1i1m = arg1i1m - d_arho1m(i, n) * delij[n];
arg1j1m = arg1j1m + d_arho1m(j, n) * delij[n];
arg3i3m = arg3i3m - d_arho3mb(i, n) * delij[n];
arg3j3m = arg3j3m + d_arho3mb(j, n) * delij[n];
}
}
// rho0 terms
drho0dr1 = drhoa0j * sij;
drho0dr2 = drhoa0i * sij;
@ -330,75 +400,183 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * d_arho3b(j, m)) * rhoa3i;
}
if (msmeamflag) {
// rho1m terms
a1 = 2 * sij / rij;
drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
drho1mdr1 *= -1.0;
drho1mdr2 *= -1.0;
a1 = 2.0 * sij / rij;
for (m = 0; m < 3; m++) {
drho1mdrm1[m] = a1 * rhoa1mj * d_arho1m(i, m);
drho1mdrm2[m] = -a1 * rhoa1mi * d_arho1m(j, m);
}
// rho2m terms
a2 = 2 * sij / rij2;
drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * drhoa2mj * sij;
drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * drhoa2mi * sij;
a2 = 4 * sij / rij2;
for (m = 0; m < 3; m++) {
drho2mdrm1[m] = 0.0;
drho2mdrm2[m] = 0.0;
for (n = 0; n < 3; n++) {
drho2mdrm1[m] = drho2mdrm1[m] + d_arho2m(i, vind2D[m][n]) * delij[n];
drho2mdrm2[m] = drho2mdrm2[m] - d_arho2m(j, vind2D[m][n]) * delij[n];
}
drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
}
// rho3m terms
rij3 = rij * rij2;
a3 = 2 * sij / rij3;
a3a = 6.0 / 5.0 * sij / rij;
drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
drho3mdr1 *= -1.0;
drho3mdr2 *= -1.0;
a3 = 6 * sij / rij3;
a3a = 6 * sij / (5 * rij);
for (m = 0; m < 3; m++) {
drho3mdrm1[m] = 0.0;
drho3mdrm2[m] = 0.0;
nv2 = 0;
for (n = 0; n < 3; n++) {
for (p = n; p < 3; p++) {
arg = delij[n] * delij[p] * this->v2D[nv2];
drho3mdrm1[m] = drho3mdrm1[m] + d_arho3m(i, vind3D[m][n][p]) * arg;
drho3mdrm2[m] = drho3mdrm2[m] + d_arho3m(j, vind3D[m][n][p]) * arg;
nv2 = nv2 + 1;
}
}
drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * d_arho3mb(i, m)) * rhoa3mj;
drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * d_arho3mb(j, m)) * rhoa3mi;
}
} else {
for (m = 0; m < 3; m++) {
drho1mdrm1[m] = 0.0;
drho1mdrm2[m] = 0.0;
drho2mdrm1[m] = 0.0;
drho2mdrm2[m] = 0.0;
drho3mdrm1[m] = 0.0;
drho3mdrm2[m] = 0.0;
}
}
// Compute derivatives of weighting functions t wrt rij
t1i = d_t_ave(i, 0);
t2i = d_t_ave(i, 1);
t3i = d_t_ave(i, 2);
t1j = d_t_ave(j, 0);
t2j = d_t_ave(j, 1);
t3j = d_t_ave(j, 2);
if (ialloy == 1) {
a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
} else if (ialloy == 2) {
// Weighting functions t set to unity for msmeam
if (msmeamflag) {
t1i = 1.0;
t2i = 1.0;
t3i = 1.0;
t1j = 1.0;
t2j = 1.0;
t3j = 1.0;
dt1dr1 = 0.0;
dt1dr2 = 0.0;
dt2dr1 = 0.0;
dt2dr2 = 0.0;
dt3dr1 = 0.0;
dt3dr2 = 0.0;
} else {
ai = 0.0;
if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
aj = 0.0;
if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
t1i = d_t_ave(i, 0);
t2i = d_t_ave(i, 1);
t3i = d_t_ave(i, 2);
t1j = d_t_ave(j, 0);
t2j = d_t_ave(j, 1);
t3j = d_t_ave(j, 2);
dt1dr1 = ai * (t1mj - t1i);
dt1dr2 = aj * (t1mi - t1j);
dt2dr1 = ai * (t2mj - t2i);
dt2dr2 = aj * (t2mi - t2j);
dt3dr1 = ai * (t3mj - t3i);
dt3dr2 = aj * (t3mi - t3j);
if (ialloy == 1) {
a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
} else if (ialloy == 2) {
dt1dr1 = 0.0;
dt1dr2 = 0.0;
dt2dr1 = 0.0;
dt2dr2 = 0.0;
dt3dr1 = 0.0;
dt3dr2 = 0.0;
} else {
ai = 0.0;
if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
aj = 0.0;
if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
dt1dr1 = ai * (t1mj - t1i);
dt1dr2 = aj * (t1mi - t1j);
dt2dr1 = ai * (t2mj - t2i);
dt2dr2 = aj * (t2mi - t2j);
dt3dr1 = ai * (t3mj - t3i);
dt3dr2 = aj * (t3mi - t3j);
}
}
// Compute derivatives of total density wrt rij, sij and rij(3)
get_shpfcn(lattce_meam[elti][elti], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpi);
get_shpfcn(lattce_meam[eltj][eltj], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpj);
drhodr1 = d_dgamma1[i] * drho0dr1 +
d_dgamma2[i] *
(dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = d_dgamma1[j] * drho0dr2 +
d_dgamma2[j] *
(dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] =
d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
drhodrm2[m] =
d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
if (msmeamflag) {
drhodr1 = d_dgamma1[i] * drho0dr1 +
d_dgamma2[i] * (dt1dr1 * d_rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
dt2dr1 * d_rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
dt3dr1 * d_rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = d_dgamma1[j] * drho0dr2 +
d_dgamma2[j] * (dt1dr2 * d_rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
dt2dr2 * d_rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
dt3dr2 * d_rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] = d_dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
t2i * (drho2drm1[m] - drho2mdrm1[m]) +
t3i * (drho3drm1[m] - drho3mdrm1[m]) );
drhodrm2[m] = d_dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
t2j * (drho2drm2[m] - drho2mdrm2[m]) +
t3j * (drho3drm2[m] - drho3mdrm2[m]) );
}
} else {
drhodr1 = d_dgamma1[i] * drho0dr1 +
d_dgamma2[i] *
(dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = d_dgamma1[j] * drho0dr2 +
d_dgamma2[j] *
(dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] =
d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
drhodrm2[m] =
d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
}
}
// Compute derivatives wrt sij, but only if necessary
@ -416,6 +594,24 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
if (msmeamflag) {
drho1mds1 = a1 * rhoa1mj * arg1i1m;
drho1mds2 = a1 * rhoa1mi * arg1j1m;
drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * rhoa2mj;
drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * rhoa2mi;
drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
drho3mds1 *= -1;
drho3mds2 *= -1;
} else {
drho1mds1 = 0.0;
drho1mds2 = 0.0;
drho2mds1 = 0.0;
drho2mds2 = 0.0;
drho3mds1 = 0.0;
drho3mds2 = 0.0;
}
if (ialloy == 1) {
a1i = fdiv_zero_kk(rhoa0j, d_tsq_ave(i, 0));
a1j = fdiv_zero_kk(rhoa0i, d_tsq_ave(j, 0));
@ -455,19 +651,33 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
dt3ds2 = aj * (t3mi - t3j);
}
drhods1 = d_dgamma1[i] * drho0ds1 +
d_dgamma2[i] *
(dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
if (msmeamflag) {
drhods1 = d_dgamma1[i] * drho0ds1 +
d_dgamma2[i] * (dt1ds1 * d_rho1[i] + t1i * (drho1ds1 - drho1mds1) +
dt2ds1 * d_rho2[i] + t2i * (drho2ds1 - drho2mds1) +
dt3ds1 * d_rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
drhods2 = d_dgamma1[j] * drho0ds2 +
d_dgamma2[j] *
(dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
drhods2 = d_dgamma1[j] * drho0ds2 +
d_dgamma2[j] * (dt1ds2 * d_rho1[j] + t1j * (drho1ds2 - drho1mds2) +
dt2ds2 * d_rho2[j] + t2j * (drho2ds2 - drho2mds2) +
dt3ds2 * d_rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
} else {
drhods1 = d_dgamma1[i] * drho0ds1 +
d_dgamma2[i] *
(dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
drhods2 = d_dgamma1[j] * drho0ds2 +
d_dgamma2[j] *
(dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
}
}
// Compute derivatives of energy wrt rij, sij and rij[3]
dUdrij = phip * sij + d_frhop[i] * drhodr1 + d_frhop[j] * drhodr2;
dUdsij = 0.0;
if (!iszero_kk(d_dscrfcn[fnoffset + jn])) {

View File

@ -58,6 +58,14 @@ MEAMKokkos<DeviceType>::~MEAMKokkos()
memoryKK->destroy_kokkos(k_scrfcn,scrfcn);
memoryKK->destroy_kokkos(k_dscrfcn,dscrfcn);
memoryKK->destroy_kokkos(k_fcpair,fcpair);
// msmeam
memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
memoryKK->destroy_kokkos(k_arho1m, arho1m);
memoryKK->destroy_kokkos(k_arho2m, arho2m);
memoryKK->destroy_kokkos(k_arho3m, arho3m);
memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
}
#include "meam_setup_done_kokkos.h"

View File

@ -136,6 +136,13 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
DAT::tdual_ffloat_1d k_scrfcn, k_dscrfcn, k_fcpair;
typename ArrayTypes<DeviceType>::t_ffloat_1d d_scrfcn, d_dscrfcn, d_fcpair;
HAT::t_ffloat_1d h_scrfcn, h_dscrfcn, h_fcpair;
// msmeam
DAT::tdual_ffloat_2d k_arho1m, k_arho2m, k_arho3m, k_arho3mb;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
HAT::t_ffloat_2d h_arho1m, h_arho2m, h_arho3m, h_arho3mb;
DAT::tdual_ffloat_1d k_arho2mb;
typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
HAT::t_ffloat_1d h_arho2mb;
protected:
int need_dup;
@ -195,6 +202,31 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
dup_vatom;
NonDupScatterView<typename decltype(d_vatom)::data_type, typename decltype(d_vatom)::array_layout>
ndup_vatom;
// msmeam
DupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
dup_arho1m;
NonDupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
ndup_arho1m;
DupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
dup_arho2m;
NonDupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
ndup_arho2m;
DupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
dup_arho3m;
NonDupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
ndup_arho3m;
DupScatterView<typename decltype(d_arho2mb)::data_type, typename decltype(d_arho2mb)::array_layout>
dup_arho2mb;
NonDupScatterView<typename decltype(d_arho2mb)::data_type,
typename decltype(d_arho2mb)::array_layout>
ndup_arho2mb;
DupScatterView<typename decltype(d_arho3mb)::data_type, typename decltype(d_arho3mb)::array_layout>
dup_arho3mb;
NonDupScatterView<typename decltype(d_arho3mb)::data_type,
typename decltype(d_arho3mb)::array_layout>
ndup_arho3mb;
};
KOKKOS_INLINE_FUNCTION

View File

@ -51,6 +51,7 @@ PairMEAMKokkos<DeviceType>::PairMEAMKokkos(LAMMPS *lmp) : PairMEAM(lmp)
delete meam_inst;
meam_inst_kk = new MEAMKokkos<DeviceType>(memory);
meam_inst = meam_inst_kk;
myname = "meam/kk";
}
/* ---------------------------------------------------------------------- */
@ -156,7 +157,8 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
int need_dup = lmp->kokkos->need_dup<DeviceType>();
meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,
d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
meam_inst_kk->k_rho0.template modify<DeviceType>();
meam_inst_kk->k_arho2b.template modify<DeviceType>();
@ -166,6 +168,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
meam_inst_kk->k_arho3b.template modify<DeviceType>();
meam_inst_kk->k_t_ave.template modify<DeviceType>();
meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.template modify<DeviceType>();
meam_inst_kk->k_arho1m.template modify<DeviceType>();
meam_inst_kk->k_arho2m.template modify<DeviceType>();
meam_inst_kk->k_arho3m.template modify<DeviceType>();
meam_inst_kk->k_arho3mb.template modify<DeviceType>();
}
comm->reverse_comm(this);
@ -177,6 +186,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
meam_inst_kk->k_arho3b.template sync<DeviceType>();
meam_inst_kk->k_t_ave.template sync<DeviceType>();
meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.template sync<DeviceType>();
meam_inst_kk->k_arho1m.template sync<DeviceType>();
meam_inst_kk->k_arho2m.template sync<DeviceType>();
meam_inst_kk->k_arho3m.template sync<DeviceType>();
meam_inst_kk->k_arho3mb.template sync<DeviceType>();
}
meam_inst_kk->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
d_eatom,ntype,type,d_map,d_scale,errorflag,ev);
@ -200,6 +216,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
meam_inst_kk->k_arho3b.template modify<DeviceType>();
meam_inst_kk->k_t_ave.template modify<DeviceType>();
meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.template modify<DeviceType>();
meam_inst_kk->k_arho1m.template modify<DeviceType>();
meam_inst_kk->k_arho2m.template modify<DeviceType>();
meam_inst_kk->k_arho3m.template modify<DeviceType>();
meam_inst_kk->k_arho3mb.template modify<DeviceType>();
}
comm->forward_comm(this);
@ -219,6 +242,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
meam_inst_kk->k_arho3b.template sync<DeviceType>();
meam_inst_kk->k_t_ave.template sync<DeviceType>();
meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.template sync<DeviceType>();
meam_inst_kk->k_arho1m.template sync<DeviceType>();
meam_inst_kk->k_arho2m.template sync<DeviceType>();
meam_inst_kk->k_arho3m.template sync<DeviceType>();
meam_inst_kk->k_arho3mb.template sync<DeviceType>();
}
meam_inst_kk->meam_force(inum_half,eflag_global,eflag_atom,vflag_global,
vflag_atom,d_eatom,ntype,type,d_map,x,
@ -315,7 +345,7 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm_kokkos(int n, DAT::tdual_int_2
iswap = iswap_in;
v_buf = buf.view<DeviceType>();
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackForwardComm>(0,n),*this);
return n*38;
return n*comm_forward;
}
/* ---------------------------------------------------------------------- */
@ -324,7 +354,7 @@ template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const int &i) const {
int j = d_sendlist(iswap, i);
int m = i*38;
int m = i*comm_forward;
v_buf[m++] = d_rho0[j];
v_buf[m++] = d_rho1[j];
v_buf[m++] = d_rho2[j];
@ -354,6 +384,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const in
v_buf[m++] = d_tsq_ave(j,0);
v_buf[m++] = d_tsq_ave(j,1);
v_buf[m++] = d_tsq_ave(j,2);
if (msmeamflag) {
v_buf[m++] = d_arho2mb[j];
v_buf[m++] = d_arho1m(j,0);
v_buf[m++] = d_arho1m(j,1);
v_buf[m++] = d_arho1m(j,2);
v_buf[m++] = d_arho2m(j,0);
v_buf[m++] = d_arho2m(j,1);
v_buf[m++] = d_arho2m(j,2);
v_buf[m++] = d_arho2m(j,3);
v_buf[m++] = d_arho2m(j,4);
v_buf[m++] = d_arho2m(j,5);
for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(j,k);
v_buf[m++] = d_arho3mb(j,0);
v_buf[m++] = d_arho3mb(j,1);
v_buf[m++] = d_arho3mb(j,2);
}
}
/* ---------------------------------------------------------------------- */
@ -371,7 +417,8 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm_kokkos(int n, int first_in,
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const int &i) const{
int m = i*38;
//int m = i*38;
int m = i*comm_forward;
d_rho0[i+first] = v_buf[m++];
d_rho1[i+first] = v_buf[m++];
@ -402,6 +449,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const
d_tsq_ave(i+first,0) = v_buf[m++];
d_tsq_ave(i+first,1) = v_buf[m++];
d_tsq_ave(i+first,2) = v_buf[m++];
if (msmeamflag) {
d_arho2mb[i+first] = v_buf[m++];
d_arho1m(i+first,0) = v_buf[m++];
d_arho1m(i+first,1) = v_buf[m++];
d_arho1m(i+first,2) = v_buf[m++];
d_arho2m(i+first,0) = v_buf[m++];
d_arho2m(i+first,1) = v_buf[m++];
d_arho2m(i+first,2) = v_buf[m++];
d_arho2m(i+first,3) = v_buf[m++];
d_arho2m(i+first,4) = v_buf[m++];
d_arho2m(i+first,5) = v_buf[m++];
for (int k = 0; k < 10; k++) d_arho3m(i+first,k) = v_buf[m++];
d_arho3mb(i+first,0) = v_buf[m++];
d_arho3mb(i+first,1) = v_buf[m++];
d_arho3mb(i+first,2) = v_buf[m++];
}
}
/* ---------------------------------------------------------------------- */
@ -426,6 +489,13 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
meam_inst_kk->k_arho3b.sync_host();
meam_inst_kk->k_t_ave.sync_host();
meam_inst_kk->k_tsq_ave.sync_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.sync_host();
meam_inst_kk->k_arho1m.sync_host();
meam_inst_kk->k_arho2m.sync_host();
meam_inst_kk->k_arho3m.sync_host();
meam_inst_kk->k_arho3mb.sync_host();
}
int m = 0;
for (int i = 0; i < n; i++) {
@ -459,6 +529,22 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
buf[m++] = meam_inst_kk->h_tsq_ave(j,0);
buf[m++] = meam_inst_kk->h_tsq_ave(j,1);
buf[m++] = meam_inst_kk->h_tsq_ave(j,2);
if (msmeamflag) {
buf[m++] = meam_inst_kk->h_arho2mb[j];
buf[m++] = meam_inst_kk->h_arho1m(j,0);
buf[m++] = meam_inst_kk->h_arho1m(j,1);
buf[m++] = meam_inst_kk->h_arho1m(j,2);
buf[m++] = meam_inst_kk->h_arho2m(j,0);
buf[m++] = meam_inst_kk->h_arho2m(j,1);
buf[m++] = meam_inst_kk->h_arho2m(j,2);
buf[m++] = meam_inst_kk->h_arho2m(j,3);
buf[m++] = meam_inst_kk->h_arho2m(j,4);
buf[m++] = meam_inst_kk->h_arho2m(j,5);
for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(j,k);
buf[m++] = meam_inst_kk->h_arho3mb(j,0);
buf[m++] = meam_inst_kk->h_arho3mb(j,1);
buf[m++] = meam_inst_kk->h_arho3mb(j,2);
}
}
return m;
@ -485,6 +571,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
meam_inst_kk->k_arho3b.sync_host();
meam_inst_kk->k_t_ave.sync_host();
meam_inst_kk->k_tsq_ave.sync_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.sync_host();
meam_inst_kk->k_arho1m.sync_host();
meam_inst_kk->k_arho2m.sync_host();
meam_inst_kk->k_arho3m.sync_host();
meam_inst_kk->k_arho3mb.sync_host();
}
int m = 0;
const int last = first + n;
@ -518,6 +611,22 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
meam_inst_kk->h_tsq_ave(i,0) = buf[m++];
meam_inst_kk->h_tsq_ave(i,1) = buf[m++];
meam_inst_kk->h_tsq_ave(i,2) = buf[m++];
if (msmeamflag) {
meam_inst_kk->h_arho2mb[i] = buf[m++];
meam_inst_kk->h_arho1m(i,0) = buf[m++];
meam_inst_kk->h_arho1m(i,1) = buf[m++];
meam_inst_kk->h_arho1m(i,2) = buf[m++];
meam_inst_kk->h_arho2m(i,0) = buf[m++];
meam_inst_kk->h_arho2m(i,1) = buf[m++];
meam_inst_kk->h_arho2m(i,2) = buf[m++];
meam_inst_kk->h_arho2m(i,3) = buf[m++];
meam_inst_kk->h_arho2m(i,4) = buf[m++];
meam_inst_kk->h_arho2m(i,5) = buf[m++];
for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(i,k) = buf[m++];
meam_inst_kk->h_arho3mb(i,0) = buf[m++];
meam_inst_kk->h_arho3mb(i,1) = buf[m++];
meam_inst_kk->h_arho3mb(i,2) = buf[m++];
}
}
meam_inst_kk->k_rho0.modify_host();
@ -536,6 +645,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
meam_inst_kk->k_arho3b.modify_host();
meam_inst_kk->k_t_ave.modify_host();
meam_inst_kk->k_tsq_ave.modify_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.modify_host();
meam_inst_kk->k_arho1m.modify_host();
meam_inst_kk->k_arho2m.modify_host();
meam_inst_kk->k_arho3m.modify_host();
meam_inst_kk->k_arho3mb.modify_host();
}
}
/* ---------------------------------------------------------------------- */
@ -546,7 +662,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
first = first_in;
v_buf = buf.view<DeviceType>();
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackReverseComm>(0,n),*this);
return n*30;
//return n*30;
return n*comm_reverse;
}
/* ---------------------------------------------------------------------- */
@ -554,7 +671,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const int &i) const {
int m = i*30;
//int m = i*30;
int m = i*comm_reverse;
v_buf[m++] = d_rho0[i+first];
v_buf[m++] = d_arho2b[i+first];
@ -577,6 +695,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const in
v_buf[m++] = d_tsq_ave(i+first,0);
v_buf[m++] = d_tsq_ave(i+first,1);
v_buf[m++] = d_tsq_ave(i+first,2);
if (msmeamflag) {
v_buf[m++] = d_arho2mb[i+first];
v_buf[m++] = d_arho1m(i+first,0);
v_buf[m++] = d_arho1m(i+first,1);
v_buf[m++] = d_arho1m(i+first,2);
v_buf[m++] = d_arho2m(i+first,0);
v_buf[m++] = d_arho2m(i+first,1);
v_buf[m++] = d_arho2m(i+first,2);
v_buf[m++] = d_arho2m(i+first,3);
v_buf[m++] = d_arho2m(i+first,4);
v_buf[m++] = d_arho2m(i+first,5);
for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(i+first,k);
v_buf[m++] = d_arho3mb(i+first,0);
v_buf[m++] = d_arho3mb(i+first,1);
v_buf[m++] = d_arho3mb(i+first,2);
}
}
/* ---------------------------------------------------------------------- */
@ -592,6 +726,13 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
meam_inst_kk->k_arho3b.sync_host();
meam_inst_kk->k_t_ave.sync_host();
meam_inst_kk->k_tsq_ave.sync_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.sync_host();
meam_inst_kk->k_arho1m.sync_host();
meam_inst_kk->k_arho2m.sync_host();
meam_inst_kk->k_arho3m.sync_host();
meam_inst_kk->k_arho3mb.sync_host();
}
int m = 0;
const int last = first + n;
@ -617,6 +758,22 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
buf[m++] = meam_inst_kk->h_tsq_ave(i,0);
buf[m++] = meam_inst_kk->h_tsq_ave(i,1);
buf[m++] = meam_inst_kk->h_tsq_ave(i,2);
if (msmeamflag) {
buf[m++] = meam_inst_kk->h_arho2mb[i];
buf[m++] = meam_inst_kk->h_arho1m(i,0);
buf[m++] = meam_inst_kk->h_arho1m(i,1);
buf[m++] = meam_inst_kk->h_arho1m(i,2);
buf[m++] = meam_inst_kk->h_arho2m(i,0);
buf[m++] = meam_inst_kk->h_arho2m(i,1);
buf[m++] = meam_inst_kk->h_arho2m(i,2);
buf[m++] = meam_inst_kk->h_arho2m(i,3);
buf[m++] = meam_inst_kk->h_arho2m(i,4);
buf[m++] = meam_inst_kk->h_arho2m(i,5);
for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(i,k);
buf[m++] = meam_inst_kk->h_arho3mb(i,0);
buf[m++] = meam_inst_kk->h_arho3mb(i,1);
buf[m++] = meam_inst_kk->h_arho3mb(i,2);
}
}
return m;
@ -639,7 +796,8 @@ template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const int &i) const {
int j = d_sendlist(iswap, i);
int m = i*30;
//int m = i*30;
int m = i*comm_reverse;
d_rho0[j] += v_buf[m++];
d_arho2b[j] += v_buf[m++];
@ -662,6 +820,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const
d_tsq_ave(j,0) += v_buf[m++];
d_tsq_ave(j,1) += v_buf[m++];
d_tsq_ave(j,2) += v_buf[m++];
if (msmeamflag) {
d_arho2mb[j] += v_buf[m++];
d_arho1m(j,0) += v_buf[m++];
d_arho1m(j,1) += v_buf[m++];
d_arho1m(j,2) += v_buf[m++];
d_arho2m(j,0) += v_buf[m++];
d_arho2m(j,1) += v_buf[m++];
d_arho2m(j,2) += v_buf[m++];
d_arho2m(j,3) += v_buf[m++];
d_arho2m(j,4) += v_buf[m++];
d_arho2m(j,5) += v_buf[m++];
for (int k = 0; k < 10; k++) d_arho3m(j,k) += v_buf[m++];
d_arho3mb(j,0) += v_buf[m++];
d_arho3mb(j,1) += v_buf[m++];
d_arho3mb(j,2) += v_buf[m++];
}
}
/* ---------------------------------------------------------------------- */
@ -677,6 +851,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
meam_inst_kk->k_arho3b.sync_host();
meam_inst_kk->k_t_ave.sync_host();
meam_inst_kk->k_tsq_ave.sync_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.sync_host();
meam_inst_kk->k_arho1m.sync_host();
meam_inst_kk->k_arho2m.sync_host();
meam_inst_kk->k_arho3m.sync_host();
meam_inst_kk->k_arho3mb.sync_host();
}
int m = 0;
for (int i = 0; i < n; i++) {
@ -702,6 +883,22 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
meam_inst_kk->h_tsq_ave(j,0) += buf[m++];
meam_inst_kk->h_tsq_ave(j,1) += buf[m++];
meam_inst_kk->h_tsq_ave(j,2) += buf[m++];
if (msmeamflag) {
meam_inst_kk->h_arho2mb[j] += buf[m++];
meam_inst_kk->h_arho1m(j,0) += buf[m++];
meam_inst_kk->h_arho1m(j,1) += buf[m++];
meam_inst_kk->h_arho1m(j,2) += buf[m++];
meam_inst_kk->h_arho2m(j,0) += buf[m++];
meam_inst_kk->h_arho2m(j,1) += buf[m++];
meam_inst_kk->h_arho2m(j,2) += buf[m++];
meam_inst_kk->h_arho2m(j,3) += buf[m++];
meam_inst_kk->h_arho2m(j,4) += buf[m++];
meam_inst_kk->h_arho2m(j,5) += buf[m++];
for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(j,k) += buf[m++];
meam_inst_kk->h_arho3mb(j,0) += buf[m++];
meam_inst_kk->h_arho3mb(j,1) += buf[m++];
meam_inst_kk->h_arho3mb(j,2) += buf[m++];
}
}
meam_inst_kk->k_rho0.modify_host();
@ -712,6 +909,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
meam_inst_kk->k_arho3b.modify_host();
meam_inst_kk->k_t_ave.modify_host();
meam_inst_kk->k_tsq_ave.modify_host();
if (msmeamflag) {
meam_inst_kk->k_arho2mb.modify_host();
meam_inst_kk->k_arho1m.modify_host();
meam_inst_kk->k_arho2m.modify_host();
meam_inst_kk->k_arho3m.modify_host();
meam_inst_kk->k_arho3mb.modify_host();
}
}
/* ----------------------------------------------------------------------
@ -764,6 +968,12 @@ void PairMEAMKokkos<DeviceType>::update_meam_views()
d_arho3b = meam_inst_kk->d_arho3b;
d_t_ave = meam_inst_kk->d_t_ave;
d_tsq_ave = meam_inst_kk->d_tsq_ave;
// msmeam
d_arho1m = meam_inst_kk->d_arho1m;
d_arho2m = meam_inst_kk->d_arho2m;
d_arho3m = meam_inst_kk->d_arho3m;
d_arho2mb = meam_inst_kk->d_arho2mb;
d_arho3mb = meam_inst_kk->d_arho3mb;
}
/* ---------------------------------------------------------------------- */

View File

@ -13,12 +13,12 @@
#ifdef PAIR_CLASS
// clang-format off
PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>)
PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>)
PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>)
PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>)
PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>)
PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>)
PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>);
PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>);
PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>);
PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>);
PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>);
PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>);
// clang-format on
#else
@ -117,6 +117,9 @@ class PairMEAMKokkos : public PairMEAM, public KokkosBase {
typename ArrayTypes<DeviceType>::t_ffloat_1d d_rho, d_rho0, d_rho1, d_rho2, d_rho3, d_frhop;
typename ArrayTypes<DeviceType>::t_ffloat_1d d_gamma, d_dgamma1, d_dgamma2, d_dgamma3, d_arho2b;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1, d_arho2, d_arho3, d_arho3b, d_t_ave, d_tsq_ave;
// msmeam params
typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
void update_meam_views();

View File

@ -0,0 +1,32 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
https://www.lammps.org/ Sandia National Laboratories
LAMMPS development team: developers@lammps.org
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "pair_meam_ms_kokkos.h"
#include "meam.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
template<class DeviceType>
PairMEAMMSKokkos<DeviceType>::PairMEAMMSKokkos(LAMMPS *lmp) : PairMEAMKokkos<DeviceType>(lmp)
{
this->meam_inst->msmeamflag = this->msmeamflag = 1;
this->myname = "meam/ms/kk";
}
namespace LAMMPS_NS {
template class PairMEAMMSKokkos<LMPDeviceType>;
#ifdef KOKKOS_ENABLE_CUDA
template class PairMEAMMSKokkos<LMPHostType>;
#endif
}

View File

@ -0,0 +1,36 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
https://www.lammps.org/ Sandia National Laboratories
LAMMPS development team: developers@lammps.org
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
// clang-format off
PairStyle(meam/ms/kk,PairMEAMMSKokkos<LMPDeviceType>);
PairStyle(meam/ms/kk/device,PairMEAMMSKokkos<LMPDeviceType>);
PairStyle(meam/ms/kk/host,PairMEAMMSKokkos<LMPHostType>);
// clang-format on
#else
#ifndef LMP_PAIR_MEAM_MS_KOKKOS_H
#define LMP_PAIR_MEAM_MS_KOKKOS_H
#include "pair_meam_kokkos.h"
namespace LAMMPS_NS {
template <class DeviceType>
class PairMEAMMSKokkos : public PairMEAMKokkos<DeviceType> {
public:
PairMEAMMSKokkos(class LAMMPS *);
};
} // namespace LAMMPS_NS
#endif
#endif

View File

@ -17,7 +17,7 @@
#include <cmath>
#include <string>
#define maxelt 5
constexpr int maxelt = 5;
namespace LAMMPS_NS {
class Memory;
@ -30,6 +30,7 @@ class MEAM {
virtual ~MEAM();
int copymode;
int msmeamflag;
protected:
Memory *memory;
@ -74,6 +75,12 @@ class MEAM {
// vind[23]D = Voight notation index maps for 2 and 3D
// v2D,v3D = array of factors to apply for Voight notation
// MS-MEAM parameters
// msmeamflag = flag to activate MS-MEAM
// betam[1-3]_meam = MS-MEAM electron density constants
// tm[1-3]_meam = MS-MEAM coefficients on densities in Gamma computation
// nr,dr = pair function discretization parameters
// nrar,rdrar = spline coeff array parameters
@ -115,12 +122,22 @@ class MEAM {
int nr, nrar;
double dr, rdrar;
// MS-MEAM parameters
double t1m_meam[maxelt], t2m_meam[maxelt], t3m_meam[maxelt];
double beta1m_meam[maxelt], beta2m_meam[maxelt], beta3m_meam[maxelt];
//int msmeamflag; // made public for pair style settings
public:
int nmax;
double *rho, *rho0, *rho1, *rho2, *rho3, *frhop;
double *gamma, *dgamma1, *dgamma2, *dgamma3, *arho2b;
double **arho1, **arho2, **arho3, **arho3b, **t_ave, **tsq_ave;
// MS-MEAM arrays
double **arho1m, **arho2m, *arho2mb, **arho3m, **arho3mb;
int maxneigh;
double *scrfcn, *dscrfcn, *fcpair;
@ -242,7 +259,7 @@ class MEAM {
double, double, double, double, double, int, int, lattice_t);
void get_sijk(double, int, int, int, double *);
void get_densref(double, int, int, double *, double *, double *, double *, double *, double *,
double *, double *);
double *, double *, double *, double *, double *, double *, double *, double *); // last 6 args for msmeam
void interpolate_meam(int);
public:
@ -282,10 +299,12 @@ class MEAM {
}
// clang-format on
static int get_Zij(const lattice_t latt);
// last 6 args are optional msmeam parameters
void meam_setup_global(int nelt, lattice_t *lat, int *ielement, double *atwt, double *alpha,
double *b0, double *b1, double *b2, double *b3, double *alat, double *esub,
double *asub, double *t0, double *t1, double *t2, double *t3,
double *rozero, int *ibar);
double *rozero, int *ibar, double *b1m, double *b2m, double *b3m,
double *t1m, double *t2m, double *t3m);
void meam_setup_param(int which, double value, int nindex, int *index /*index(3)*/,
int *errorflag);
virtual void meam_setup_done(double *cutmax);

View File

@ -27,115 +27,222 @@ MEAM::meam_dens_final(int nlocal, int eflag_either, int eflag_global, int eflag_
// Complete the calculation of density
for (i = 0; i < nlocal; i++) {
elti = fmap[type[i]];
if (elti >= 0) {
scaleii = scale[type[i]][type[i]];
rho1[i] = 0.0;
rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
rho3[i] = 0.0;
for (m = 0; m < 3; m++) {
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
}
for (m = 0; m < 6; m++) {
rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
}
for (m = 0; m < 10; m++) {
rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
}
if (rho0[i] > 0.0) {
if (this->ialloy == 1) {
t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
} else if (this->ialloy == 2) {
t_ave[i][0] = this->t1_meam[elti];
t_ave[i][1] = this->t2_meam[elti];
t_ave[i][2] = this->t3_meam[elti];
} else {
t_ave[i][0] = t_ave[i][0] / rho0[i];
t_ave[i][1] = t_ave[i][1] / rho0[i];
t_ave[i][2] = t_ave[i][2] / rho0[i];
if (this->msmeamflag) {
for (i = 0; i < nlocal; i++) {
elti = fmap[type[i]];
if (elti >= 0) {
scaleii = scale[type[i]][type[i]];
rho1[i] = 0.0;
rho2[i] = -1.0 / 3.0 * (arho2b[i] * arho2b[i]
- arho2mb[i] * arho2mb[i]);
rho3[i] = 0.0;
for (m = 0; m < 3; m++) {
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m]
- arho1m[i][m] * arho1m[i][m];
rho3[i] = rho3[i] - 3.0 / 5.0 * (arho3b[i][m] * arho3b[i][m]
- arho3mb[i][m] * arho3mb[i][m]);
}
}
gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
if (rho0[i] > 0.0) {
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
}
Z = get_Zij(this->lattce_meam[elti][elti]);
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
if (errorflag != 0)
return;
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
if (this->ibar_meam[elti] <= 0) {
Gbar = 1.0;
dGbar = 0.0;
} else {
if (this->mix_ref_t == 1) {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
} else {
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
(Z * Z);
for (m = 0; m < 6; m++) {
rho2[i] = rho2[i] + this->v2D[m] * (arho2[i][m] * arho2[i][m]
- arho2m[i][m] * arho2m[i][m]);
}
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
}
rho[i] = rho0[i] * G;
if (this->mix_ref_t == 1) {
for (m = 0; m < 10; m++) {
rho3[i] = rho3[i] + this->v3D[m] * (arho3[i][m] * arho3[i][m]
- arho3m[i][m] * arho3m[i][m]);
}
// all the t weights are already accounted for with msmeam
gamma[i] = rho1[i] + rho2[i] + rho3[i];
if (rho0[i] > 0.0) {
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
}
Z = get_Zij(this->lattce_meam[elti][elti]);
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
if (errorflag != 0)
return;
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
if (this->ibar_meam[elti] <= 0) {
Gbar = 1.0;
dGbar = 0.0;
} else {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
if (this->mix_ref_t == 1) {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
} else {
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
(Z * Z);
}
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
}
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
} else {
if (this->bkgd_dyn == 1) {
rho_bkgd = this->rho0_meam[elti] * Z;
rho[i] = rho0[i] * G;
if (this->mix_ref_t == 1) {
if (this->ibar_meam[elti] <= 0) {
Gbar = 1.0;
dGbar = 0.0;
} else {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
}
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
} else {
rho_bkgd = this->rho_ref_meam[elti];
if (this->bkgd_dyn == 1) {
rho_bkgd = this->rho0_meam[elti] * Z;
} else {
rho_bkgd = this->rho_ref_meam[elti];
}
}
rhob = rho[i] / rho_bkgd;
denom = 1.0 / rho_bkgd;
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
if (!iszero(rho0[i])) {
dgamma2[i] = (dG / rho0[i]) * denom;
} else {
dgamma2[i] = 0.0;
}
// dgamma3 is nonzero only if we are using the "mixed" rule for
// computing t in the reference system (which is not correct, but
// included for backward compatibility
if (this->mix_ref_t == 1) {
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
} else {
dgamma3[i] = 0.0;
}
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
if (eflag_either != 0) {
Fl *= scaleii;
if (eflag_global != 0) {
*eng_vdwl = *eng_vdwl + Fl;
}
if (eflag_atom != 0) {
eatom[i] = eatom[i] + Fl;
}
}
}
rhob = rho[i] / rho_bkgd;
denom = 1.0 / rho_bkgd;
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
if (!iszero(rho0[i])) {
dgamma2[i] = (dG / rho0[i]) * denom;
} else {
dgamma2[i] = 0.0;
}
// dgamma3 is nonzero only if we are using the "mixed" rule for
// computing t in the reference system (which is not correct, but
// included for backward compatibility
if (this->mix_ref_t == 1) {
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
} else {
dgamma3[i] = 0.0;
}
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
if (eflag_either != 0) {
Fl *= scaleii;
if (eflag_global != 0) {
*eng_vdwl = *eng_vdwl + Fl;
}
} else {
for (i = 0; i < nlocal; i++) {
elti = fmap[type[i]];
if (elti >= 0) {
scaleii = scale[type[i]][type[i]];
rho1[i] = 0.0;
rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
rho3[i] = 0.0;
for (m = 0; m < 3; m++) {
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
}
if (eflag_atom != 0) {
eatom[i] = eatom[i] + Fl;
for (m = 0; m < 6; m++) {
rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
}
for (m = 0; m < 10; m++) {
rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
}
if (rho0[i] > 0.0) {
if (this->ialloy == 1) {
t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
} else if (this->ialloy == 2) {
t_ave[i][0] = this->t1_meam[elti];
t_ave[i][1] = this->t2_meam[elti];
t_ave[i][2] = this->t3_meam[elti];
} else {
t_ave[i][0] = t_ave[i][0] / rho0[i];
t_ave[i][1] = t_ave[i][1] / rho0[i];
t_ave[i][2] = t_ave[i][2] / rho0[i];
}
}
gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
if (rho0[i] > 0.0) {
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
}
Z = get_Zij(this->lattce_meam[elti][elti]);
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
if (errorflag != 0)
return;
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
if (this->ibar_meam[elti] <= 0) {
Gbar = 1.0;
dGbar = 0.0;
} else {
if (this->mix_ref_t == 1) {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
} else {
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
(Z * Z);
}
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
}
rho[i] = rho0[i] * G;
if (this->mix_ref_t == 1) {
if (this->ibar_meam[elti] <= 0) {
Gbar = 1.0;
dGbar = 0.0;
} else {
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
}
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
} else {
if (this->bkgd_dyn == 1) {
rho_bkgd = this->rho0_meam[elti] * Z;
} else {
rho_bkgd = this->rho_ref_meam[elti];
}
}
rhob = rho[i] / rho_bkgd;
denom = 1.0 / rho_bkgd;
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
if (!iszero(rho0[i])) {
dgamma2[i] = (dG / rho0[i]) * denom;
} else {
dgamma2[i] = 0.0;
}
// dgamma3 is nonzero only if we are using the "mixed" rule for
// computing t in the reference system (which is not correct, but
// included for backward compatibility
if (this->mix_ref_t == 1) {
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
} else {
dgamma3[i] = 0.0;
}
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
if (eflag_either != 0) {
Fl *= scaleii;
if (eflag_global != 0) {
*eng_vdwl = *eng_vdwl + Fl;
}
if (eflag_atom != 0) {
eatom[i] = eatom[i] + Fl;
}
}
}
}

View File

@ -45,6 +45,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
memory->destroy(arho3b);
memory->destroy(t_ave);
memory->destroy(tsq_ave);
// msmeam params
if (this->msmeamflag) {
memory->destroy(arho1m);
memory->destroy(arho2m);
memory->destroy(arho3m);
memory->destroy(arho2mb);
memory->destroy(arho3mb);
}
nmax = atom_nmax;
@ -65,6 +73,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
memory->create(arho3b, nmax, 3, "pair:arho3b");
memory->create(t_ave, nmax, 3, "pair:t_ave");
memory->create(tsq_ave, nmax, 3, "pair:tsq_ave");
// msmeam params
if (this->msmeamflag) {
memory->create(arho1m, nmax, 3, "pair:arho1m");
memory->create(arho2m, nmax, 6, "pair:arho2m");
memory->create(arho3m, nmax, 10, "pair:arho3m");
memory->create(arho2mb, nmax, "pair:arho2mb");
memory->create(arho3mb, nmax, 3, "pair:arho3mb");
}
}
if (n_neigh > maxneigh) {
@ -83,14 +99,30 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
rho0[i] = 0.0;
arho2b[i] = 0.0;
arho1[i][0] = arho1[i][1] = arho1[i][2] = 0.0;
for (j = 0; j < 6; j++)
if (this->msmeamflag) {
arho2mb[i] = 0.0;
arho1m[i][0] = arho1m[i][1] = arho1m[i][2] = 0.0;
}
for (j = 0; j < 6; j++) {
arho2[i][j] = 0.0;
for (j = 0; j < 10; j++)
if (this->msmeamflag) {
arho2m[i][j] = 0.0;
}
}
for (j = 0; j < 10; j++) {
arho3[i][j] = 0.0;
if (this->msmeamflag) {
arho3m[i][j] = 0.0;
}
}
arho3b[i][0] = arho3b[i][1] = arho3b[i][2] = 0.0;
if (this->msmeamflag) {
arho3mb[i][0] = arho3mb[i][1] = arho3mb[i][2] = 0.0;
}
t_ave[i][0] = t_ave[i][1] = t_ave[i][2] = 0.0;
tsq_ave[i][0] = tsq_ave[i][1] = tsq_ave[i][2] = 0.0;
}
}
void
@ -282,6 +314,9 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
// double G,Gbar,gam,shp[3+1];
double ro0i, ro0j;
double rhoa0i, rhoa1i, rhoa2i, rhoa3i, A1i, A2i, A3i;
// msmeam params
double rhoa1mj, rhoa2mj, rhoa3mj, A1mj, A2mj, A3mj;
double rhoa1mi, rhoa2mi, rhoa3mi, A1mi, A2mi, A3mi;
elti = fmap[type[i]];
xtmp = x[i][0];
@ -306,10 +341,20 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
rhoa1j = ro0j * MathSpecial::fm_exp(-this->beta1_meam[eltj] * aj) * sij;
rhoa2j = ro0j * MathSpecial::fm_exp(-this->beta2_meam[eltj] * aj) * sij;
rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj) * sij;
if (this->msmeamflag){
rhoa1mj = ro0j * this->t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj) * sij;
rhoa2mj = ro0j * this->t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj) * sij;
rhoa3mj = ro0j * this->t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj) * sij;
}
rhoa0i = ro0i * MathSpecial::fm_exp(-this->beta0_meam[elti] * ai) * sij;
rhoa1i = ro0i * MathSpecial::fm_exp(-this->beta1_meam[elti] * ai) * sij;
rhoa2i = ro0i * MathSpecial::fm_exp(-this->beta2_meam[elti] * ai) * sij;
rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai) * sij;
if (this->msmeamflag){
rhoa1mi = ro0i * this->t1m_meam[elti] * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * sij;
rhoa2mi = ro0i * this->t2m_meam[elti] * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * sij;
rhoa3mi = ro0i * this->t3m_meam[elti] * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * sij;
}
if (this->ialloy == 1) {
rhoa1j = rhoa1j * this->t1_meam[eltj];
rhoa2j = rhoa2j * this->t2_meam[eltj];
@ -321,6 +366,7 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
rho0[i] = rho0[i] + rhoa0j;
rho0[j] = rho0[j] + rhoa0i;
// For ialloy = 2, use single-element value (not average)
// For ialloy = 2, use single-element value (not average)
if (this->ialloy != 2) {
t_ave[i][0] = t_ave[i][0] + this->t1_meam[eltj] * rhoa0j;
t_ave[i][1] = t_ave[i][1] + this->t2_meam[eltj] * rhoa0j;
@ -348,18 +394,42 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
A3i = rhoa3i / (rij2 * rij);
nv2 = 0;
nv3 = 0;
if (this->msmeamflag) {
arho2mb[i] = arho2mb[i] + rhoa2mj;
arho2mb[j] = arho2mb[j] + rhoa2mi;
A1mj = rhoa1mj/rij;
A2mj = rhoa2mj/rij2;
A3mj = rhoa3mj/(rij2*rij);
A1mi = rhoa1mi/rij;
A2mi = rhoa2mi/rij2;
A3mi = rhoa3mi/(rij2*rij);
}
for (m = 0; m < 3; m++) {
arho1[i][m] = arho1[i][m] + A1j * delij[m];
arho1[j][m] = arho1[j][m] - A1i * delij[m];
arho3b[i][m] = arho3b[i][m] + rhoa3j * delij[m] / rij;
arho3b[j][m] = arho3b[j][m] - rhoa3i * delij[m] / rij;
if (this->msmeamflag) {
arho1m[i][m] = arho1m[i][m] + A1mj*delij[m];
arho1m[j][m] = arho1m[j][m] - A1mi*delij[m];
arho3mb[i][m] = arho3mb[i][m] + rhoa3mj*delij[m] / rij;
arho3mb[j][m] = arho3mb[j][m] - rhoa3mi*delij[m] / rij;
}
for (n = m; n < 3; n++) {
arho2[i][nv2] = arho2[i][nv2] + A2j * delij[m] * delij[n];
arho2[j][nv2] = arho2[j][nv2] + A2i * delij[m] * delij[n];
if (this->msmeamflag) {
arho2m[i][nv2] = arho2m[i][nv2] + A2mj*delij[m] * delij[n];
arho2m[j][nv2] = arho2m[j][nv2] + A2mi*delij[m] * delij[n];
}
nv2 = nv2 + 1;
for (p = n; p < 3; p++) {
arho3[i][nv3] = arho3[i][nv3] + A3j * delij[m] * delij[n] * delij[p];
arho3[j][nv3] = arho3[j][nv3] - A3i * delij[m] * delij[n] * delij[p];
if (this->msmeamflag) {
arho3m[i][nv3] = arho3m[i][nv3] + A3mj*delij[m]*delij[n]*delij[p];
arho3m[j][nv3] = arho3m[j][nv3] - A3mi*delij[m]*delij[n]*delij[p];
}
nv3 = nv3 + 1;
}
}

View File

@ -61,6 +61,17 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
double t1i, t2i, t3i, t1j, t2j, t3j;
double scaleij;
double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
double drho1mdrm1[3], drho1mdrm2[3];
double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
double drho2mdrm1[3], drho2mdrm2[3];
double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
double drho3mdrm1[3], drho3mdrm2[3];
third = 1.0 / 3.0;
sixth = 1.0 / 6.0;
@ -74,6 +85,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
zitmp = x[i][2];
// Treat each pair
for (jn = 0; jn < numneigh; jn++) {
j = firstneigh[jn];
eltj = fmap[type[j]];
@ -89,7 +101,6 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
if (rij2 < this->cutforcesq) {
rij = sqrt(rij2);
recip = 1.0 / rij;
// Compute phi and phip
ind = this->eltind[elti][eltj];
pp = rij * this->rdrar;
@ -114,6 +125,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
// write(1,*) "force_meamf: phip: ",phip
// Compute pair densities and derivatives
invrei = 1.0 / this->re_meam[elti][elti];
ai = rij * invrei - 1.0;
ro0i = this->rho0_meam[elti];
@ -126,6 +138,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai);
drhoa3i = -this->beta3_meam[elti] * invrei * rhoa3i;
if (this->msmeamflag) {
rhoa1mi = ro0i * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * t1m_meam[elti];
drhoa1mi = -this->beta1m_meam[elti] * invrei * rhoa1mi;
rhoa2mi = ro0i * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * t2m_meam[elti];
drhoa2mi = -this->beta2m_meam[elti] * invrei * rhoa2mi;
rhoa3mi = ro0i * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * t3m_meam[elti];
drhoa3mi = -this->beta3m_meam[elti] * invrei * rhoa3mi;
}
if (elti != eltj) {
invrej = 1.0 / this->re_meam[eltj][eltj];
aj = rij * invrej - 1.0;
@ -138,6 +159,16 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
drhoa2j = -this->beta2_meam[eltj] * invrej * rhoa2j;
rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj);
drhoa3j = -this->beta3_meam[eltj] * invrej * rhoa3j;
if (this->msmeamflag) {
rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj);
drhoa1mj = -this->beta1m_meam[eltj] * invrej * rhoa1mj;
rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj);
drhoa2mj = -this->beta2m_meam[eltj] * invrej * rhoa2mj;
rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj);
drhoa3mj = -this->beta3m_meam[eltj] * invrej * rhoa3mj;
}
} else {
rhoa0j = rhoa0i;
drhoa0j = drhoa0i;
@ -147,6 +178,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
drhoa2j = drhoa2i;
rhoa3j = rhoa3i;
drhoa3j = drhoa3i;
if (this->msmeamflag) {
rhoa1mj = rhoa1mi;
drhoa1mj = drhoa1mi;
rhoa2mj = rhoa2mi;
drhoa2mj = drhoa2mi;
rhoa3mj = rhoa3mi;
drhoa3mj = drhoa3mi;
}
}
const double t1mi = this->t1_meam[elti];
@ -156,7 +196,10 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
const double t2mj = this->t2_meam[eltj];
const double t3mj = this->t3_meam[eltj];
if (this->ialloy == 1) {
// ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
// We did this above with rhoa1mj, rhoa2mj, etc.
if (this->ialloy == 1 || this->msmeamflag) {
rhoa1j *= t1mj;
rhoa2j *= t2mj;
rhoa3j *= t3mj;
@ -200,6 +243,39 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
arg3j3 = arg3j3 - arho3b[j][n] * delij[n];
}
// msmeam arhom args
nv2 = 0;
nv3 = 0;
arg1i1m = 0.0;
arg1j1m = 0.0;
arg1i2m = 0.0;
arg1j2m = 0.0;
arg1i3m = 0.0;
arg1j3m = 0.0;
arg3i3m = 0.0;
arg3j3m = 0.0;
if (this->msmeamflag) {
for (n = 0; n < 3; n++) {
for (p = n; p < 3; p++) {
for (q = p; q < 3; q++) {
arg = delij[n] * delij[p] * delij[q] * this->v3D[nv3];
arg1i3m = arg1i3m - arho3m[i][nv3] * arg;
arg1j3m = arg1j3m + arho3m[j][nv3] * arg;
nv3 = nv3 + 1;
}
arg = delij[n] * delij[p] * this->v2D[nv2];
arg1i2m = arg1i2m + arho2m[i][nv2] * arg;
arg1j2m = arg1j2m + arho2m[j][nv2] * arg;
nv2 = nv2 + 1;
}
arg1i1m = arg1i1m - arho1m[i][n] * delij[n];
arg1j1m = arg1j1m + arho1m[j][n] * delij[n];
arg3i3m = arg3i3m - arho3mb[i][n] * delij[n];
arg3j3m = arg3j3m + arho3mb[j][n] * delij[n];
}
}
// rho0 terms
drho0dr1 = drhoa0j * sij;
drho0dr2 = drhoa0i * sij;
@ -254,32 +330,83 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * arho3b[j][m]) * rhoa3i;
}
// Compute derivatives of weighting functions t wrt rij
t1i = t_ave[i][0];
t2i = t_ave[i][1];
t3i = t_ave[i][2];
t1j = t_ave[j][0];
t2j = t_ave[j][1];
t3j = t_ave[j][2];
if (this->msmeamflag) {
// rho1m terms
a1 = 2 * sij / rij;
drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
drho1mdr1 *= -1.0;
drho1mdr2 *= -1.0;
a1 = 2.0 * sij / rij;
for (m = 0; m < 3; m++) {
drho1mdrm1[m] = a1 * rhoa1mj * arho1m[i][m];
drho1mdrm2[m] = -a1 * rhoa1mi * arho1m[j][m];
}
if (this->ialloy == 1) {
// rho2m terms
a2 = 2 * sij / rij2;
drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * arho2mb[i] * drhoa2mj * sij;
drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * arho2mb[j] * drhoa2mi * sij;
a2 = 4 * sij / rij2;
for (m = 0; m < 3; m++) {
drho2mdrm1[m] = 0.0;
drho2mdrm2[m] = 0.0;
for (n = 0; n < 3; n++) {
drho2mdrm1[m] += arho2m[i][this->vind2D[m][n]] * delij[n];
drho2mdrm2[m] -= arho2m[j][this->vind2D[m][n]] * delij[n];
}
drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
}
a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
// rho3m terms
rij3 = rij * rij2;
a3 = 2 * sij / rij3;
a3a = 6.0 / 5.0 * sij / rij;
drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
drho3mdr1 *= -1.0;
drho3mdr2 *= -1.0;
dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
a3 = 6 * sij / rij3;
a3a = 6 * sij / (5 * rij);
for (m = 0; m < 3; m++) {
drho3mdrm1[m] = 0.0;
drho3mdrm2[m] = 0.0;
nv2 = 0;
for (n = 0; n < 3; n++) {
for (p = n; p < 3; p++) {
arg = delij[n] * delij[p] * this->v2D[nv2];
drho3mdrm1[m] += arho3m[i][this->vind3D[m][n][p]] * arg;
drho3mdrm2[m] += arho3m[j][this->vind3D[m][n][p]] * arg;
nv2 = nv2 + 1;
}
}
drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * arho3mb[i][m]) * rhoa3mj;
drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * arho3mb[j][m]) * rhoa3mi;
}
} else {
for (m = 0; m < 3; m++) {
drho1mdrm1[m] = 0.0;
drho1mdrm2[m] = 0.0;
drho2mdrm1[m] = 0.0;
drho2mdrm2[m] = 0.0;
drho3mdrm1[m] = 0.0;
drho3mdrm2[m] = 0.0;
}
}
} else if (this->ialloy == 2) {
// compute derivatives of weighting functions t wrt rij
// weighting functions t set to unity for MS-MEAM
if (this->msmeamflag) {
t1i = 1.0;
t2i = 1.0;
t3i = 1.0;
t1j = 1.0;
t2j = 1.0;
t3j = 1.0;
dt1dr1 = 0.0;
dt1dr2 = 0.0;
dt2dr1 = 0.0;
@ -289,38 +416,98 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
} else {
ai = 0.0;
if (!iszero(rho0[i]))
ai = drhoa0j * sij / rho0[i];
aj = 0.0;
if (!iszero(rho0[j]))
aj = drhoa0i * sij / rho0[j];
t1i = t_ave[i][0];
t2i = t_ave[i][1];
t3i = t_ave[i][2];
t1j = t_ave[j][0];
t2j = t_ave[j][1];
t3j = t_ave[j][2];
if (this->ialloy == 1) {
a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
} else if (this->ialloy == 2) {
dt1dr1 = 0.0;
dt1dr2 = 0.0;
dt2dr1 = 0.0;
dt2dr2 = 0.0;
dt3dr1 = 0.0;
dt3dr2 = 0.0;
} else {
ai = 0.0;
if (!iszero(rho0[i]))
ai = drhoa0j * sij / rho0[i];
aj = 0.0;
if (!iszero(rho0[j]))
aj = drhoa0i * sij / rho0[j];
dt1dr1 = ai * (t1mj - t1i);
dt1dr2 = aj * (t1mi - t1j);
dt2dr1 = ai * (t2mj - t2i);
dt2dr2 = aj * (t2mi - t2j);
dt3dr1 = ai * (t3mj - t3i);
dt3dr2 = aj * (t3mi - t3j);
}
dt1dr1 = ai * (t1mj - t1i);
dt1dr2 = aj * (t1mi - t1j);
dt2dr1 = ai * (t2mj - t2i);
dt2dr2 = aj * (t2mi - t2j);
dt3dr1 = ai * (t3mj - t3i);
dt3dr2 = aj * (t3mi - t3j);
}
// Compute derivatives of total density wrt rij, sij and rij(3)
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpi);
get_shpfcn(this->lattce_meam[eltj][eltj], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpj);
drhodr1 = dgamma1[i] * drho0dr1 +
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
dt3dr1 * rho3[i] + t3i * drho3dr1) -
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = dgamma1[j] * drho0dr2 +
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
dt3dr2 * rho3[j] + t3j * drho3dr2) -
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
if (this->msmeamflag) {
drhodr1 = dgamma1[i] * drho0dr1 +
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
dt2dr1 * rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
dt3dr1 * rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = dgamma1[j] * drho0dr2 +
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
dt2dr2 * rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
dt3dr2 * rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] = dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
t2i * (drho2drm1[m] - drho2mdrm1[m]) +
t3i * (drho3drm1[m] - drho3mdrm1[m]) );
drhodrm2[m] = dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
t2j * (drho2drm2[m] - drho2mdrm2[m]) +
t3j * (drho3drm2[m] - drho3mdrm2[m]) );
}
} else {
drhodr1 = dgamma1[i] * drho0dr1 +
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
dt3dr1 * rho3[i] + t3i * drho3dr1) -
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
drhodr2 = dgamma1[j] * drho0dr2 +
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
dt3dr2 * rho3[j] + t3j * drho3dr2) -
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
for (m = 0; m < 3; m++) {
drhodrm1[m] = 0.0;
drhodrm2[m] = 0.0;
drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
}
}
// Compute derivatives wrt sij, but only if necessary
@ -328,17 +515,37 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
drho0ds1 = rhoa0j;
drho0ds2 = rhoa0i;
a1 = 2.0 / rij;
drho1ds1 = a1 * rhoa1j * arg1i1;
drho1ds2 = a1 * rhoa1i * arg1j1;
a2 = 2.0 / rij2;
drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
a3 = 2.0 / rij3;
a3a = 6.0 / (5.0 * rij);
drho1ds1 = a1 * rhoa1j * arg1i1;
drho1ds2 = a1 * rhoa1i * arg1j1;
drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
if (this->msmeamflag) {
drho1mds1 = a1 * rhoa1mj * arg1i1m;
drho1mds2 = a1 * rhoa1mi * arg1j1m;
drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * arho2mb[i] * rhoa2mj;
drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * arho2mb[j] * rhoa2mi;
drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
drho3mds1 *= -1;
drho3mds2 *= -1;
} else {
drho1mds1 = 0.0;
drho1mds2 = 0.0;
drho2mds1 = 0.0;
drho2mds2 = 0.0;
drho3mds1 = 0.0;
drho3mds2 = 0.0;
}
if (this->ialloy == 1) {
a1i = fdiv_zero(rhoa0j, tsq_ave[i][0]);
a1j = fdiv_zero(rhoa0i, tsq_ave[j][0]);
a2i = fdiv_zero(rhoa0j, tsq_ave[i][1]);
@ -379,19 +586,36 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
dt3ds2 = aj * (t3mi - t3j);
}
drhods1 = dgamma1[i] * drho0ds1 +
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
dt3ds1 * rho3[i] + t3i * drho3ds1) -
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
drhods2 = dgamma1[j] * drho0ds2 +
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
dt3ds2 * rho3[j] + t3j * drho3ds2) -
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
if (this->msmeamflag) {
drhods1 = dgamma1[i] * drho0ds1 +
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * (drho1ds1 - drho1mds1) +
dt2ds1 * rho2[i] + t2i * (drho2ds1 - drho2mds1) +
dt3ds1 * rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
drhods2 = dgamma1[j] * drho0ds2 +
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * (drho1ds2 - drho1mds2) +
dt2ds2 * rho2[j] + t2j * (drho2ds2 - drho2mds2) +
dt3ds2 * rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
}
else {
drhods1 = dgamma1[i] * drho0ds1 +
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
dt3ds1 * rho3[i] + t3i * drho3ds1) -
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
drhods2 = dgamma1[j] * drho0ds2 +
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
dt3ds2 * rho3[j] + t3j * drho3ds2) -
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
}
}
// Compute derivatives of energy wrt rij, sij and rij[3]
// Compute derivatives of energy wrt rij, sij and rij[3]
// MS-MEAM affects phip
dUdrij = phip * sij + frhop[i] * drhodr1 + frhop[j] * drhodr2;
dUdsij = 0.0;
if (!iszero(dscrfcn[fnoffset + jn])) {
dUdsij = phi + frhop[i] * drhods1 + frhop[j] * drhods2;
}

View File

@ -34,6 +34,11 @@ MEAM::MEAM(Memory* mem)
gamma = dgamma1 = dgamma2 = dgamma3 = arho2b = nullptr;
arho1 = arho2 = arho3 = arho3b = t_ave = tsq_ave = nullptr;
// msmeam arrays
msmeamflag = 0;
arho2mb = nullptr;
arho1m = arho2m = arho3m = arho3mb = nullptr;
maxneigh = 0;
scrfcn = dscrfcn = fcpair = nullptr;
copymode = 0;
@ -43,7 +48,9 @@ MEAM::MEAM(Memory* mem)
A_meam[i] = rho0_meam[i] = beta0_meam[i] =
beta1_meam[i]= beta2_meam[i] = beta3_meam[i] =
t0_meam[i] = t1_meam[i] = t2_meam[i] = t3_meam[i] =
rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] = 0.0;
rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] =
t1m_meam[i] = t2m_meam[i] = t3m_meam[i] =
beta1m_meam[i] = beta2m_meam[i] = beta3m_meam[i] = 0.0;
for (int j = 0; j < maxelt; j++) {
lattce_meam[i][j] = FCC;
Ec_meam[i][j] = re_meam[i][j] = alpha_meam[i][j] = delta_meam[i][j] = ebound_meam[i][j] = attrac_meam[i][j] = repuls_meam[i][j] = 0.0;
@ -87,4 +94,13 @@ MEAM::~MEAM()
memory->destroy(this->scrfcn);
memory->destroy(this->dscrfcn);
memory->destroy(this->fcpair);
// msmeam
if (this->msmeamflag){
memory->destroy(this->arho1m);
memory->destroy(this->arho2m);
memory->destroy(this->arho3m);
memory->destroy(this->arho2mb);
memory->destroy(this->arho3mb);
}
}

View File

@ -220,7 +220,6 @@ void MEAM::compute_pair_meam()
// loop over r values and compute
for (j = 0; j < this->nr; j++) {
r = j * this->dr;
this->phir[nv2][j] = phi_meam(r, a, b);
// if using second-nearest neighbor, solve recursive problem
@ -333,9 +332,12 @@ double MEAM::phi_meam(double r, int a, int b)
lattice_t latta /*unused:,lattb*/;
double rho_bkgd1, rho_bkgd2;
double b11s, b22s;
// msmeam
double t1m1av, t2m1av, t3m1av, t1m2av, t2m2av, t3m2av;
double rho1m1, rho2m1, rho3m1;
double rho1m2, rho2m2, rho3m2;
double phi_m = 0.0;
// Equation numbers below refer to:
// I. Huang et.al., Modelling simul. Mater. Sci. Eng. 3:615
@ -345,8 +347,16 @@ double MEAM::phi_meam(double r, int a, int b)
Z2 = get_Zij(this->lattce_meam[b][b]);
Z12 = get_Zij(this->lattce_meam[a][b]);
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32);
// this function has extra args for msmeam
if (this->msmeamflag) {
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
&rho1m1, &rho2m1, &rho3m1,
&rho1m2, &rho2m2, &rho3m2);
} else {
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr);
}
// if densities are too small, numerical problems may result; just return zero
if (rho01 <= 1e-14 && rho02 <= 1e-14)
return 0.0;
@ -374,6 +384,12 @@ double MEAM::phi_meam(double r, int a, int b)
get_tavref(&t11av, &t21av, &t31av, &t12av, &t22av, &t32av, this->t1_meam[a], this->t2_meam[a],
this->t3_meam[a], this->t1_meam[b], this->t2_meam[b], this->t3_meam[b], r, a, b,
this->lattce_meam[a][b]);
// with msmeam call twice with different sets of variables
if (this->msmeamflag) {
get_tavref(&t1m1av, &t2m1av, &t3m1av, &t1m2av, &t2m2av, &t3m2av, this->t1m_meam[a], this->t2m_meam[a],
this->t3m_meam[a], this->t1m_meam[b], this->t2m_meam[b], this->t3m_meam[b], r, a, b,
this->lattce_meam[a][b]);
}
}
// for c11b structure, calculate background electron densities
@ -420,17 +436,33 @@ double MEAM::phi_meam(double r, int a, int b)
rho0_1 = this->rho0_meam[a] * Z1 * G1;
rho0_2 = this->rho0_meam[b] * Z2 * G2;
}
Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
if (rho01 < 1.0e-14)
Gam1 = 0.0;
else
Gam1 = Gam1 / (rho01 * rho01);
Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
if (rho02 < 1.0e-14)
Gam2 = 0.0;
else
Gam2 = Gam2 / (rho02 * rho02);
if (this->msmeamflag) {
// no additional use of t's here; all included in definitions of rho's for msmeam
Gam1 = rho11 + rho21 + rho31 - (rho1m1 + rho2m1 + rho3m1);
if (rho01 < 1.0e-14)
Gam1 = 0.0;
else
Gam1 = Gam1 / (rho01 * rho01);
Gam2 = rho12 + rho22 + rho32 - (rho1m2 + rho2m2 + rho3m2);
if (rho02 < 1.0e-14)
Gam2 = 0.0;
else
Gam2 = Gam2 / (rho02 * rho02);
} else {
Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
if (rho01 < 1.0e-14)
Gam1 = 0.0;
else
Gam1 = Gam1 / (rho01 * rho01);
Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
if (rho02 < 1.0e-14)
Gam2 = 0.0;
else
Gam2 = Gam2 / (rho02 * rho02);
}
G1 = G_gam(Gam1, this->ibar_meam[a], errorflag);
G2 = G_gam(Gam2, this->ibar_meam[b], errorflag);
@ -655,7 +687,9 @@ void MEAM::get_sijk(double C, int i, int j, int k, double* sijk)
//------------------------------------------------------------------------------c
// Calculate density functions, assuming reference configuration
void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, double* rho21, double* rho31,
double* rho02, double* rho12, double* rho22, double* rho32)
double* rho02, double* rho12, double* rho22, double* rho32,
double* rho1m1, double* rho2m1, double* rho3m1,
double* rho1m2, double* rho2m2, double* rho3m2)
{
double a1, a2;
double s[3];
@ -666,18 +700,39 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
double rhoa02, rhoa12, rhoa22, rhoa32;
double arat, scrn, denom;
double C, s111, s112, s221, S11, S22;
// msmeam
double rhoa1m1, rhoa2m1, rhoa3m1, rhoa1m2, rhoa2m2, rhoa3m2;
a1 = r / this->re_meam[a][a] - 1.0;
a2 = r / this->re_meam[b][b] - 1.0;
rhoa01 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta0_meam[a] * a1);
rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
if (this->msmeamflag) {
// the rho variables are multiplied by t here since ialloy not needed in msmeam
rhoa11 = this->rho0_meam[a] * this->t1_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
rhoa21 = this->rho0_meam[a] * this->t2_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
rhoa31 = this->rho0_meam[a] * this->t3_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
rhoa12 = this->rho0_meam[b] * this->t1_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
rhoa22 = this->rho0_meam[b] * this->t2_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
rhoa32 = this->rho0_meam[b] * this->t3_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
// msmeam specific rho vars
rhoa1m1 = this->rho0_meam[a] * this->t1m_meam[a] * MathSpecial::fm_exp(-this->beta1m_meam[a] * a1);
rhoa2m1 = this->rho0_meam[a] * this->t2m_meam[a] * MathSpecial::fm_exp(-this->beta2m_meam[a] * a1);
rhoa3m1 = this->rho0_meam[a] * this->t3m_meam[a] * MathSpecial::fm_exp(-this->beta3m_meam[a] * a1);
rhoa1m2 = this->rho0_meam[b] * this->t1m_meam[b] * MathSpecial::fm_exp(-this->beta1m_meam[b] * a2);
rhoa2m2 = this->rho0_meam[b] * this->t2m_meam[b] * MathSpecial::fm_exp(-this->beta2m_meam[b] * a2);
rhoa3m2 = this->rho0_meam[b] * this->t3m_meam[b] * MathSpecial::fm_exp(-this->beta3m_meam[b] * a2);
} else {
rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
}
lat = this->lattce_meam[a][b];
@ -689,7 +744,16 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
*rho12 = 0.0;
*rho22 = 0.0;
*rho32 = 0.0;
if (this->msmeamflag) {
*rho1m1 = 0.0;
*rho2m1 = 0.0;
*rho3m1 = 0.0;
*rho1m2 = 0.0;
*rho2m2 = 0.0;
*rho3m2 = 0.0;
}
// keep track of density components separately; combine in the calling subroutine
switch (lat) {
case FCC:
*rho01 = 12.0 * rhoa02;
@ -710,12 +774,20 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
*rho02 = 4.0 * rhoa01;
*rho31 = 32.0 / 9.0 * rhoa32 * rhoa32;
*rho32 = 32.0 / 9.0 * rhoa31 * rhoa31;
if (this->msmeamflag) {
*rho3m1 = 32.0 / 9.0 * rhoa3m2 * rhoa3m2;
*rho3m2 = 32.0 / 9.0 * rhoa3m1 * rhoa3m1;
}
break;
case HCP:
*rho01 = 12 * rhoa02;
*rho02 = 12 * rhoa01;
*rho31 = 1.0 / 3.0 * rhoa32 * rhoa32;
*rho32 = 1.0 / 3.0 * rhoa31 * rhoa31;
if (this->msmeamflag) {
*rho3m1 = 1.0 / 3.0 * rhoa3m2 * rhoa3m2;
*rho3m2 = 1.0 / 3.0 * rhoa3m1 * rhoa3m1;
}
break;
case DIM:
get_shpfcn(DIM, 0, 0, s);
@ -727,6 +799,14 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
*rho22 = s[1] * rhoa21 * rhoa21;
*rho31 = s[2] * rhoa32 * rhoa32;
*rho32 = s[2] * rhoa31 * rhoa31;
if (this->msmeamflag) {
*rho1m1 = s[0] * rhoa1m2 * rhoa1m2;
*rho1m2 = s[0] * rhoa1m1 * rhoa1m1;
*rho2m1 = s[1] * rhoa2m2 * rhoa2m2;
*rho2m2 = s[1] * rhoa2m1 * rhoa2m1;
*rho3m1 = s[2] * rhoa3m2 * rhoa3m2;
*rho3m2 = s[2] * rhoa3m1 * rhoa3m1;
}
break;
case C11:
*rho01 = rhoa01;
@ -737,17 +817,28 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
*rho22 = rhoa22;
*rho31 = rhoa31;
*rho32 = rhoa32;
if (this->msmeamflag) {
*rho1m1 = rhoa1m1;
*rho1m2 = rhoa1m2;
*rho2m1 = rhoa2m1;
*rho2m2 = rhoa2m2;
*rho3m1 = rhoa3m1;
*rho3m2 = rhoa3m2;
}
break;
case L12:
*rho01 = 8 * rhoa01 + 4 * rhoa02;
*rho02 = 12 * rhoa01;
if (this->ialloy == 1) {
if (this->ialloy ==1){
*rho21 = 8. / 3. * MathSpecial::square(rhoa21 * this->t2_meam[a] - rhoa22 * this->t2_meam[b]);
denom = 8 * rhoa01 * MathSpecial::square(this->t2_meam[a]) + 4 * rhoa02 * MathSpecial::square(this->t2_meam[b]);
if (denom > 0.)
*rho21 = *rho21 / denom * *rho01;
} else
*rho21 = 8. / 3. * (rhoa21 - rhoa22) * (rhoa21 - rhoa22);
if (this->msmeamflag) {
*rho2m1 = 8. / 3. * (rhoa2m1 - rhoa2m2) * (rhoa2m1 - rhoa2m2);
}
break;
case B2:
*rho01 = 8.0 * rhoa02;
@ -864,6 +955,7 @@ void MEAM::interpolate_meam(int ind)
this->rdrar = 1.0 / drar;
// phir interp
for (j = 0; j < this->nrar; j++) {
this->phirar[ind][j] = this->phir[ind][j];
}

View File

@ -36,7 +36,8 @@ void
MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*/, double* alpha,
double* b0, double* b1, double* b2, double* b3, double* alat, double* esub,
double* asub, double* t0, double* t1, double* t2, double* t3, double* rozero,
int* ibar)
int* ibar, double* b1m, double *b2m, double *b3m, double *t1m, double *t2m,
double *t3m)
{
int i;
@ -53,6 +54,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
this->beta1_meam[i] = b1[i];
this->beta2_meam[i] = b2[i];
this->beta3_meam[i] = b3[i];
if (this->msmeamflag){
this->beta1m_meam[i] = b1m[i];
this->beta2m_meam[i] = b2m[i];
this->beta3m_meam[i] = b3m[i];
}
tmplat[i] = alat[i];
this->Ec_meam[i][i] = esub[i];
this->A_meam[i] = asub[i];
@ -60,6 +66,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
this->t1_meam[i] = t1[i];
this->t2_meam[i] = t2[i];
this->t3_meam[i] = t3[i];
if (this->msmeamflag){
this->t1m_meam[i] = t1m[i];
this->t2m_meam[i] = t2m[i];
this->t3m_meam[i] = t3m[i];
}
this->rho0_meam[i] = rozero[i];
this->ibar_meam[i] = ibar[i];

View File

@ -58,13 +58,12 @@ PairMEAM::PairMEAM(LAMMPS *lmp) : Pair(lmp)
allocated = 0;
nlibelements = 0;
meam_inst = new MEAM(memory);
meam_inst->msmeamflag = msmeamflag = 0;
myname = "meam";
scale = nullptr;
// set comm size needed by this Pair
comm_forward = 38;
comm_reverse = 30;
}
/* ----------------------------------------------------------------------
@ -93,7 +92,6 @@ void PairMEAM::compute(int eflag, int vflag)
int i,ii,n,inum_half,errorflag;
int *ilist_half,*numneigh_half,**firstneigh_half;
int *numneigh_full,**firstneigh_full;
ev_init(eflag,vflag);
// neighbor list info
@ -133,7 +131,6 @@ void PairMEAM::compute(int eflag, int vflag)
int offset = 0;
errorflag = 0;
for (ii = 0; ii < inum_half; ii++) {
i = ilist_half[ii];
meam_inst->meam_dens_init(i,ntype,type,map,x,
@ -142,9 +139,7 @@ void PairMEAM::compute(int eflag, int vflag)
offset);
offset += numneigh_half[i];
}
comm->reverse_comm(this);
meam_inst->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
&eng_vdwl,eatom,ntype,type,map,scale,errorflag);
if (errorflag)
@ -159,7 +154,6 @@ void PairMEAM::compute(int eflag, int vflag)
double **vptr = nullptr;
if (vflag_atom) vptr = vatom;
for (ii = 0; ii < inum_half; ii++) {
i = ilist_half[ii];
meam_inst->meam_force(i,eflag_global,eflag_atom,vflag_global,
@ -169,7 +163,6 @@ void PairMEAM::compute(int eflag, int vflag)
offset,f,vptr,virial);
offset += numneigh_half[i];
}
if (vflag_fdotr) virial_fdotr_compute();
}
@ -193,7 +186,17 @@ void PairMEAM::allocate()
void PairMEAM::settings(int narg, char ** /*arg*/)
{
if (narg != 0) error->all(FLERR,"Illegal pair_style command");
if (narg != 0) error->all(FLERR,"Illegal pair_style {} command", myname);
// set comm size needed by this Pair
if (msmeamflag) {
comm_forward = 38+23; // plus 23 for msmeam
comm_reverse = 30+23; // plus 23 for msmeam
} else {
comm_forward = 38;
comm_reverse = 30;
}
}
/* ----------------------------------------------------------------------
@ -206,12 +209,7 @@ void PairMEAM::coeff(int narg, char **arg)
if (!allocated) allocate();
if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients");
// ensure I,J args are * *
if (strcmp(arg[0],"*") != 0 || strcmp(arg[1],"*") != 0)
error->all(FLERR,"Incorrect args for pair coefficients");
if (narg < 6) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
// check for presence of first meam file
@ -239,7 +237,7 @@ void PairMEAM::coeff(int narg, char **arg)
}
if (paridx < 0) error->all(FLERR,"No MEAM parameter file in pair coefficients");
if ((narg - paridx - 1) != atom->ntypes)
error->all(FLERR,"Incorrect args for pair coefficients");
error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
// MEAM element names between 2 filenames
// nlibelements = # of MEAM elements
@ -282,7 +280,7 @@ void PairMEAM::coeff(int narg, char **arg)
if (libelements[j] == arg[i]) break;
if (j < nlibelements) map[m] = j;
else if (strcmp(arg[i],"NULL") == 0) map[m] = -1;
else error->all(FLERR,"Incorrect args for pair coefficients");
else error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
}
// clear setflag since coeff() called once with I,J = * *
@ -307,7 +305,7 @@ void PairMEAM::coeff(int narg, char **arg)
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
if (count == 0) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
}
/* ----------------------------------------------------------------------
@ -317,7 +315,7 @@ void PairMEAM::coeff(int narg, char **arg)
void PairMEAM::init_style()
{
if (force->newton_pair == 0)
error->all(FLERR,"Pair style MEAM requires newton pair on");
error->all(FLERR,"Pair style {} requires newton pair on", myname);
// need a full and a half neighbor list
@ -360,7 +358,9 @@ void PairMEAM::read_files(const std::string &globalfile,
void PairMEAM::read_global_meam_file(const std::string &globalfile)
{
// allocate parameter arrays
std::vector<lattice_t> lat(nlibelements);
std::vector<int> ielement(nlibelements);
std::vector<int> ibar(nlibelements);
@ -381,6 +381,15 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
std::vector<double> rozero(nlibelements);
std::vector<bool> found(nlibelements, false);
// allocate 6 extra arrays for msmeam
std::vector<double> b1m(nlibelements);
std::vector<double> b2m(nlibelements);
std::vector<double> b3m(nlibelements);
std::vector<double> t1m(nlibelements);
std::vector<double> t2m(nlibelements);
std::vector<double> t3m(nlibelements);
// open global meamf file on proc 0
if (comm->me == 0) {
@ -416,8 +425,7 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
std::string lattice_type = values.next_string();
if (!MEAM::str_to_lat(lattice_type, true, lat[index]))
error->one(FLERR,"Unrecognized lattice type in MEAM "
"library file: {}", lattice_type);
error->one(FLERR,"Unrecognized lattice type in MEAM library file: {}", lattice_type);
// store parameters
@ -429,6 +437,11 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
b1[index] = values.next_double();
b2[index] = values.next_double();
b3[index] = values.next_double();
if (msmeamflag) {
b1m[index] = values.next_double();
b2m[index] = values.next_double();
b3m[index] = values.next_double();
}
alat[index] = values.next_double();
esub[index] = values.next_double();
asub[index] = values.next_double();
@ -436,15 +449,20 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
t1[index] = values.next_double();
t2[index] = values.next_double();
t3[index] = values.next_double();
if (msmeamflag) {
t1m[index] = values.next_double();
t2m[index] = values.next_double();
t3m[index] = values.next_double();
}
rozero[index] = values.next_double();
ibar[index] = values.next_int();
if (!isone(t0[index]))
error->one(FLERR,"Unsupported parameter in MEAM library file: t0!=1");
error->one(FLERR,"Unsupported parameter in MEAM library file: t0 != 1");
// z given is ignored: if this is mismatched, we definitely won't do what the user said -> fatal error
if (z[index] != MEAM::get_Zij(lat[index]))
error->one(FLERR,"Mismatched parameter in MEAM library file: z!=lat");
error->one(FLERR,"Mismatched parameter in MEAM library file: z != lat");
nset++;
} catch (TokenizerException &e) {
@ -484,13 +502,29 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
MPI_Bcast(t2.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(t3.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(rozero.data(), nlibelements, MPI_DOUBLE, 0, world);
// distribute msmeam parameter sets
MPI_Bcast(b1m.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(b2m.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(b3m.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(t1m.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(t2m.data(), nlibelements, MPI_DOUBLE, 0, world);
MPI_Bcast(t3m.data(), nlibelements, MPI_DOUBLE, 0, world);
// pass element parameters to MEAM package
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
t2.data(), t3.data(), rozero.data(), ibar.data());
if (msmeamflag) {
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
t2.data(), t3.data(), rozero.data(), ibar.data(), b1m.data(),
b2m.data(), b3m.data(), t1m.data(), t2m.data(), t3m.data());
} else {
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
t2.data(), t3.data(), rozero.data(), ibar.data(), nullptr,
nullptr, nullptr, nullptr, nullptr, nullptr);
}
// set element masses
@ -613,6 +647,23 @@ int PairMEAM::pack_forward_comm(int n, int *list, double *buf,
buf[m++] = meam_inst->tsq_ave[j][0];
buf[m++] = meam_inst->tsq_ave[j][1];
buf[m++] = meam_inst->tsq_ave[j][2];
if (msmeamflag) {
buf[m++] = meam_inst->arho2mb[j];
buf[m++] = meam_inst->arho1m[j][0];
buf[m++] = meam_inst->arho1m[j][1];
buf[m++] = meam_inst->arho1m[j][2];
buf[m++] = meam_inst->arho2m[j][0];
buf[m++] = meam_inst->arho2m[j][1];
buf[m++] = meam_inst->arho2m[j][2];
buf[m++] = meam_inst->arho2m[j][3];
buf[m++] = meam_inst->arho2m[j][4];
buf[m++] = meam_inst->arho2m[j][5];
for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[j][k];
buf[m++] = meam_inst->arho3mb[j][0];
buf[m++] = meam_inst->arho3mb[j][1];
buf[m++] = meam_inst->arho3mb[j][2];
}
}
return m;
@ -656,6 +707,22 @@ void PairMEAM::unpack_forward_comm(int n, int first, double *buf)
meam_inst->tsq_ave[i][0] = buf[m++];
meam_inst->tsq_ave[i][1] = buf[m++];
meam_inst->tsq_ave[i][2] = buf[m++];
if (msmeamflag) {
meam_inst->arho2mb[i] = buf[m++];
meam_inst->arho1m[i][0] = buf[m++];
meam_inst->arho1m[i][1] = buf[m++];
meam_inst->arho1m[i][2] = buf[m++];
meam_inst->arho2m[i][0] = buf[m++];
meam_inst->arho2m[i][1] = buf[m++];
meam_inst->arho2m[i][2] = buf[m++];
meam_inst->arho2m[i][3] = buf[m++];
meam_inst->arho2m[i][4] = buf[m++];
meam_inst->arho2m[i][5] = buf[m++];
for (k = 0; k < 10; k++) meam_inst->arho3m[i][k] = buf[m++];
meam_inst->arho3mb[i][0] = buf[m++];
meam_inst->arho3mb[i][1] = buf[m++];
meam_inst->arho3mb[i][2] = buf[m++];
}
}
}
@ -689,6 +756,22 @@ int PairMEAM::pack_reverse_comm(int n, int first, double *buf)
buf[m++] = meam_inst->tsq_ave[i][0];
buf[m++] = meam_inst->tsq_ave[i][1];
buf[m++] = meam_inst->tsq_ave[i][2];
if (msmeamflag) {
buf[m++] = meam_inst->arho2mb[i];
buf[m++] = meam_inst->arho1m[i][0];
buf[m++] = meam_inst->arho1m[i][1];
buf[m++] = meam_inst->arho1m[i][2];
buf[m++] = meam_inst->arho2m[i][0];
buf[m++] = meam_inst->arho2m[i][1];
buf[m++] = meam_inst->arho2m[i][2];
buf[m++] = meam_inst->arho2m[i][3];
buf[m++] = meam_inst->arho2m[i][4];
buf[m++] = meam_inst->arho2m[i][5];
for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[i][k];
buf[m++] = meam_inst->arho3mb[i][0];
buf[m++] = meam_inst->arho3mb[i][1];
buf[m++] = meam_inst->arho3mb[i][2];
}
}
return m;
@ -724,7 +807,25 @@ void PairMEAM::unpack_reverse_comm(int n, int *list, double *buf)
meam_inst->tsq_ave[j][0] += buf[m++];
meam_inst->tsq_ave[j][1] += buf[m++];
meam_inst->tsq_ave[j][2] += buf[m++];
if (msmeamflag) {
meam_inst->arho2mb[j] += buf[m++];
meam_inst->arho1m[j][0] += buf[m++];
meam_inst->arho1m[j][1] += buf[m++];
meam_inst->arho1m[j][2] += buf[m++];
meam_inst->arho2m[j][0] += buf[m++];
meam_inst->arho2m[j][1] += buf[m++];
meam_inst->arho2m[j][2] += buf[m++];
meam_inst->arho2m[j][3] += buf[m++];
meam_inst->arho2m[j][4] += buf[m++];
meam_inst->arho2m[j][5] += buf[m++];
for (k = 0; k < 10; k++) meam_inst->arho3m[j][k] += buf[m++];
meam_inst->arho3mb[j][0] += buf[m++];
meam_inst->arho3mb[j][1] += buf[m++];
meam_inst->arho3mb[j][2] += buf[m++];
}
}
}
/* ----------------------------------------------------------------------

View File

@ -47,6 +47,8 @@ class PairMEAM : public Pair {
class MEAM *meam_inst;
double cutmax; // max cutoff for all elements
int nlibelements; // # of library elements
int msmeamflag; // 0 (default) for normal MEAM, 1 for MS-MEAM
std::string myname; // name of the pair style
std::vector<std::string> libelements; // names of library elements
std::vector<double> mass; // mass of library element

25
src/MEAM/pair_meam_ms.cpp Normal file
View File

@ -0,0 +1,25 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
https://www.lammps.org/ Sandia National Laboratories
LAMMPS development team: developers@lammps.org
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "pair_meam_ms.h"
#include "meam.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
PairMEAMMS::PairMEAMMS(LAMMPS *lmp) : PairMEAM(lmp)
{
meam_inst->msmeamflag = msmeamflag = 1;
myname = "meam/ms";
}

33
src/MEAM/pair_meam_ms.h Normal file
View File

@ -0,0 +1,33 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
https://www.lammps.org/ Sandia National Laboratories
LAMMPS development team: developers@lammps.org
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
// clang-format off
PairStyle(meam/ms,PairMEAMMS);
// clang-format on
#else
#ifndef LMP_PAIR_MEAM_MS_H
#define LMP_PAIR_MEAM_MS_H
#include "pair_meam.h"
namespace LAMMPS_NS {
class PairMEAMMS : public PairMEAM {
public:
PairMEAMMS(class LAMMPS *);
};
} // namespace LAMMPS_NS
#endif
#endif

View File

@ -0,0 +1,94 @@
---
lammps_version: 22 Dec 2022
tags: slow
date_generated: Thu Jan 26 15:27:03 2023
epsilon: 2.5e-12
skip_tests:
prerequisites: ! |
pair meam/ms
pre_commands: ! |
variable newton_pair delete
if "$(is_active(package,gpu)) > 0.0" then "variable newton_pair index off" else "variable newton_pair index on"
post_commands: ! ""
input_file: in.metal
pair_style: meam/ms
pair_coeff: ! |
* * library.msmeam H Ga4 HGa.msmeam H Ga4
extract: ! |
scale 2
natoms: 32
init_vdwl: 785.6030480758675
init_coul: 0
init_stress: ! |2-
3.3502530994900699e+03 3.6405858278699407e+03 3.6349804214165547e+03 -3.1609283411508039e+02 -7.9448207656135153e+01 -1.9854140603340727e+02
init_forces: ! |2
1 1.2872255079741514e+01 -7.5031848810810864e-01 4.5969595156096510e+01
2 -3.9028679722038632e+01 -1.5647800180326567e+02 -1.6643992152928173e+00
3 -6.1521549955194672e+01 2.6970968316419874e+02 -9.6866430262650326e+01
4 3.1462579880342336e+01 4.0240291291218455e+01 1.1654869213327775e+01
5 1.4859248182951113e+01 -3.4132880749392825e+01 6.7430378007130244e+01
6 6.4609571260694096e+00 -3.8973222482916441e+01 -2.8510000379627442e+01
7 7.8114612113500250e+00 -1.0421431668544374e+01 -4.2887607385766536e+01
8 -4.8934215863351795e+01 -6.3567347969802590e-01 1.1845972792272754e+02
9 9.4089549606898402e+01 -7.4342942103394511e+00 2.5331198575951383e+01
10 1.5130369934140692e+01 -5.9245630928969938e+01 -6.7469126603400198e+01
11 -2.5176547213746847e+01 1.1577205529172168e+02 -2.2897457133540517e+01
12 6.2237686199502349e+01 2.0501996047945163e+01 -2.8805091517252826e+01
13 -5.9438589221526925e+01 3.0453092653824072e+01 -1.9919245831196157e+01
14 6.9128305482543766e+01 -7.7400771634148342e+01 3.3376079908119145e+01
15 -4.9671207786831857e+01 -4.9520814527298228e+01 8.4325181097614305e+01
16 -1.1782591146017666e+01 -3.2478963020209051e+01 1.5503663677714293e+01
17 9.0881787245915220e+00 6.2377477671714963e+01 -4.0411006180232363e+01
18 -4.2285082775720454e+01 2.4883979527636967e+01 -4.4858149086530510e+00
19 -8.0259798420493979e+01 9.6356660229207137e+01 6.0543230952477984e+01
20 8.0924547938759346e+01 7.1034504027236025e+01 -7.1958482512489610e+01
21 1.0833434220705425e+02 -1.5973910256481020e+02 -2.5432700070393153e+01
22 -2.3754601906353900e+00 5.2216955012971823e+01 4.7112051341131576e+00
23 -2.7227169255996543e+01 8.1968603165764222e+01 4.6535834898716878e+01
24 -2.9230758067555616e+01 6.5909555829367733e+01 -2.8250697734131258e+01
25 -5.1310041582953993e+01 -3.0895272949222822e+01 -5.4271286813003794e+00
26 3.9605941911194620e+01 -5.5919050176828883e+01 -1.0209061328106253e+01
27 8.2934427989660890e+01 6.1956200199325636e+01 5.0072108788590960e+01
28 -7.8572755094413296e+01 -3.9613391730681300e+01 -2.6183413623428891e+00
29 6.9475725072041925e+01 -6.0535433603583563e+01 -1.4566536349135829e+01
30 -2.4347184151182930e+01 -1.9359391333689970e+02 -2.6718379302915952e+01
31 7.7351971629808688e+01 -7.0102650745312999e+01 -5.4615048867524763e+01
32 -1.5060591772899014e+02 8.4489763988097266e+01 2.9799482293372058e+01
run_vdwl: 682.3107192428497
run_coul: 0
run_stress: ! |2-
3.2247564044913129e+03 3.3749506031067485e+03 3.3223794967215117e+03 -2.8460979167554797e+02 -7.2614457076660575e+00 -3.1510685747732862e+02
run_forces: ! |2
1 -1.2037185973996296e+01 -2.5090364403764944e+01 1.4014184973113366e+01
2 -3.7365848425239264e+01 -1.5871199357658887e+02 3.7846333470446991e+00
3 -3.2057228694304293e+01 2.5316344962361612e+02 -6.0679585186816752e+01
4 2.9086197614116237e+01 4.8267528016068823e+01 4.3387429619749920e+00
5 -1.1672554618399744e+01 -2.6840760926124332e+01 4.9694308545223279e+01
6 1.1892092913978592e+01 -4.9360840569608243e+01 -2.3083171938147949e+01
7 2.1084251901459215e+01 -4.8251731643401072e+00 -3.8474871193885967e+01
8 -5.7775944085787714e+01 1.3522956442661442e+01 1.1661345819661486e+02
9 7.2926105059437930e+01 4.8686056096860133e+00 2.3817134806042311e+01
10 1.7307367990304396e+01 -3.0865570121704572e+01 -1.2314307646704794e+01
11 -1.1341297645054201e+01 9.1441145595173211e+01 -2.1806407500802493e+01
12 4.0645024127126625e+01 1.2207243511090397e+01 -2.6757649464936929e+01
13 -5.2283270287937697e+01 3.4023912643812679e+01 -1.9030352703627774e+01
14 8.4403128243303399e+01 -9.3773678297574406e+01 1.6481720093363641e+01
15 -4.2790833192154764e+01 -4.3242943642279130e+01 7.1075696811865868e+01
16 -1.5041912007490836e+01 -3.3544044565611586e+01 2.4823109532967212e+01
17 -9.6413207346836316e-01 4.5826021602656141e+01 -3.9155163702194102e+01
18 -2.0337015515785971e+01 7.2815285567550134e+00 -8.2049879725129813e+00
19 -6.4105384732081120e+01 1.1564665740933788e+02 2.4163791756721466e+01
20 8.5723654185276146e+01 8.3354105531647818e+01 -6.6380939444134356e+01
21 7.2614253221132458e+01 -1.0858997173537107e+02 -9.7505297776024449e+00
22 -7.0420361713052930e+00 5.3431098224890221e+01 3.3089063930822551e+00
23 -2.6591358240682062e+01 5.7408565880721866e+01 2.7437106471305679e+01
24 -4.1792038450554799e+01 5.1730557789864775e+01 -4.0814677464080816e+01
25 -4.1432062506590214e+01 -2.5839213423062226e+01 4.2240164846210408e+00
26 4.7210066329871566e+01 -5.2462761136081880e+01 -7.3222050314410501e+00
27 7.1880187551772764e+01 6.4264938765955392e+01 4.3600944370341068e+01
28 -8.4540787660053340e+01 -3.5402262816619938e+01 -1.8100280797937039e+01
29 6.9538301274653790e+01 -6.3441028093040622e+01 -1.4636386232064458e+01
30 -1.0347208112535196e+01 -1.7647584813608077e+02 7.2581082578181517e+00
31 5.5139777976761025e+01 -4.2081916983382541e+01 -4.6602437208067727e+01
32 -1.0993230999577290e+02 3.4110056387297462e+01 1.8478090262857769e+01
...