Merge pull request #3608 from rohskopf/msmeam-adr
Multi-state MEAM with Kokkos support
This commit is contained in:
@ -200,6 +200,7 @@ OPT.
|
||||
* :doc:`mdpd <pair_mesodpd>`
|
||||
* :doc:`mdpd/rhosum <pair_mesodpd>`
|
||||
* :doc:`meam (k) <pair_meam>`
|
||||
* :doc:`meam/ms (k) <pair_meam>`
|
||||
* :doc:`meam/spline (o) <pair_meam_spline>`
|
||||
* :doc:`meam/sw/spline <pair_meam_sw_spline>`
|
||||
* :doc:`mesocnt <pair_mesocnt>`
|
||||
|
||||
@ -1,17 +1,26 @@
|
||||
.. index:: pair_style meam
|
||||
.. index:: pair_style meam/kk
|
||||
.. index:: pair_style meam/ms
|
||||
.. index:: pair_style meam/ms/kk
|
||||
|
||||
pair_style meam command
|
||||
=========================
|
||||
|
||||
Accelerator Variants: *meam/kk*
|
||||
|
||||
pair_style meam/ms command
|
||||
==========================
|
||||
|
||||
Accelerator Variants: *meam/ms/kk*
|
||||
|
||||
Syntax
|
||||
""""""
|
||||
|
||||
.. code-block:: LAMMPS
|
||||
|
||||
pair_style meam
|
||||
pair_style style
|
||||
|
||||
* style = *meam* or *meam/ms*
|
||||
|
||||
Examples
|
||||
""""""""
|
||||
@ -22,6 +31,9 @@ Examples
|
||||
pair_coeff * * ../potentials/library.meam Si ../potentials/si.meam Si
|
||||
pair_coeff * * ../potentials/library.meam Ni Al NULL Ni Al Ni Ni
|
||||
|
||||
pair_style meam/ms
|
||||
pair_coeff * * ../potentials/library.msmeam H Ga ../potentials/HGa.meam H Ga
|
||||
|
||||
Description
|
||||
"""""""""""
|
||||
|
||||
@ -31,16 +43,23 @@ Description
|
||||
as of November 2010; see description below of the mixture_ref_t
|
||||
parameter
|
||||
|
||||
Pair style *meam* computes non-bonded interactions for a variety of materials
|
||||
using the modified embedded-atom method (MEAM)
|
||||
:ref:`(Baskes) <Baskes>`. Conceptually, it is an extension to the original
|
||||
:doc:`EAM method <pair_eam>` which adds angular forces. It is
|
||||
thus suitable for modeling metals and alloys with fcc, bcc, hcp and
|
||||
diamond cubic structures, as well as materials with covalent interactions
|
||||
like silicon and carbon. This *meam* pair style is a translation of the
|
||||
original Fortran version to C++. It is functionally equivalent but more
|
||||
efficient and has additional features. The Fortran version of the *meam*
|
||||
pair style has been removed from LAMMPS after the 12 December 2018 release.
|
||||
Pair style *meam* computes non-bonded interactions for a variety of
|
||||
materials using the modified embedded-atom method (MEAM) :ref:`(Baskes)
|
||||
<Baskes>`. Conceptually, it is an extension to the original :doc:`EAM
|
||||
method <pair_eam>` which adds angular forces. It is thus suitable for
|
||||
modeling metals and alloys with fcc, bcc, hcp and diamond cubic
|
||||
structures, as well as materials with covalent interactions like silicon
|
||||
and carbon.
|
||||
|
||||
The *meam* pair style is a translation of the original Fortran version
|
||||
to C++. It is functionally equivalent but more efficient and has
|
||||
additional features. The Fortran version of the *meam* pair style has
|
||||
been removed from LAMMPS after the 12 December 2018 release.
|
||||
|
||||
Pair style *meam/ms* uses the multi-state MEAM (MS-MEAM) method
|
||||
according to :ref:`(Baskes2) <Baskes2>`, which is an extension to MEAM.
|
||||
This pair style is mostly equivalent to *meam* and differs only
|
||||
where noted in the documentation below.
|
||||
|
||||
In the MEAM formulation, the total energy E of a system of atoms is
|
||||
given by:
|
||||
@ -351,6 +370,16 @@ Most published MEAM parameter sets use the default values *attrac* = *repulse* =
|
||||
Setting *repuls* = *attrac* = *delta* corresponds to the form used in several
|
||||
recent published MEAM parameter sets, such as :ref:`(Valone) <Valone>`
|
||||
|
||||
Then using *meam/ms* pair style the multi-state MEAM (MS-MEAM) method is
|
||||
activated. This requires 6 extra parameters in the MEAM library file,
|
||||
resulting in 25 parameters ordered that are ordered like this:
|
||||
|
||||
elt, lat, z, ielement, atwt, alpha, b0, b1, b2, b3, b1m, b2m, b3m, alat, esub, asub,
|
||||
t0, t1, t2, t3, t1m, t2m, t3m, rozero, ibar
|
||||
|
||||
The 6 extra MS-MEAM parameters are *b1m, b2m, b3m, t1m, t2m, t3m*.
|
||||
In the LAMMPS ``potentials`` folder, compatible files have an ".msmeam" extension.
|
||||
|
||||
----------
|
||||
|
||||
.. include:: accel_styles.rst
|
||||
@ -393,16 +422,15 @@ This pair style can only be used via the *pair* keyword of the
|
||||
Restrictions
|
||||
""""""""""""
|
||||
|
||||
The *meam* style is provided in the MEAM package. It is
|
||||
only enabled if LAMMPS was built with that package.
|
||||
The *meam* and *meam/ms* pair styles are provided in the MEAM
|
||||
package. They are only enabled if LAMMPS was built with that package.
|
||||
See the :doc:`Build package <Build_package>` page for more info.
|
||||
|
||||
The maximum number of elements, that can be read from the MEAM
|
||||
library file, is determined at compile time. The default is 5.
|
||||
If you need support for more elements, you have to change the
|
||||
define for the constant 'maxelt' at the beginning of the file
|
||||
src/MEAM/meam.h and update/recompile LAMMPS. There is no
|
||||
limit on the number of atoms types.
|
||||
The maximum number of elements, that can be read from the MEAM library
|
||||
file, is determined at compile time. The default is 5. If you need
|
||||
support for more elements, you have to change the the constant 'maxelt'
|
||||
at the beginning of the file ``src/MEAM/meam.h`` and update/recompile
|
||||
LAMMPS. There is no limit on the number of atoms types.
|
||||
|
||||
Related commands
|
||||
""""""""""""""""
|
||||
@ -421,6 +449,10 @@ none
|
||||
|
||||
**(Baskes)** Baskes, Phys Rev B, 46, 2727-2742 (1992).
|
||||
|
||||
.. _Baskes2:
|
||||
|
||||
**(Baskes2)** Baskes, Phys Rev B, 75, 094113 (2007).
|
||||
|
||||
.. _Gullet:
|
||||
|
||||
**(Gullet)** Gullet, Wagner, Slepoy, SANDIA Report 2003-8782 (2003). DOI:10.2172/918395
|
||||
|
||||
@ -277,7 +277,8 @@ accelerated styles exist.
|
||||
* :doc:`lubricateU/poly <pair_lubricateU>` - hydrodynamic lubrication forces for Fast Lubrication with polydispersity
|
||||
* :doc:`mdpd <pair_mesodpd>` - mDPD particle interactions
|
||||
* :doc:`mdpd/rhosum <pair_mesodpd>` - mDPD particle interactions for mass density
|
||||
* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM) in C
|
||||
* :doc:`meam <pair_meam>` - modified embedded atom method (MEAM)
|
||||
* :doc:`meam/ms <pair_meam>` - multi-state modified embedded atom method (MS-MEAM)
|
||||
* :doc:`meam/spline <pair_meam_spline>` - splined version of MEAM
|
||||
* :doc:`meam/sw/spline <pair_meam_sw_spline>` - splined version of MEAM with a Stillinger-Weber term
|
||||
* :doc:`mesocnt <pair_mesocnt>` - mesoscopic vdW potential for (carbon) nanotubes
|
||||
|
||||
30
examples/meam/msmeam/HGa.meam
Normal file
30
examples/meam/msmeam/HGa.meam
Normal file
@ -0,0 +1,30 @@
|
||||
bkgd_dyn = 1
|
||||
emb_lin_neg = 1
|
||||
augt1=0
|
||||
ialloy=1
|
||||
rc = 5.9
|
||||
#H
|
||||
attrac(1,1)=0.460
|
||||
repuls(1,1)=0.460
|
||||
Cmin(1,1,1)=1.3 # PuMS
|
||||
Cmax(1,1,1)= 2.80
|
||||
nn2(1,1)=1
|
||||
#Ga
|
||||
rho0(2) = 0.6
|
||||
attrac(2,2)=0.097
|
||||
repuls(2,2)=0.097
|
||||
nn2(2,2)=1
|
||||
#HGa
|
||||
attrac(1,2)=0.300
|
||||
repuls(1,2)=0.300
|
||||
lattce(1,2)=l12
|
||||
re(1,2)=3.19
|
||||
delta(1,2)=-0.48
|
||||
alpha(1,2)=6.6
|
||||
Cmin(1,1,2)=2.0
|
||||
Cmin(2,1,2)= 2.0
|
||||
Cmin(1,2,1)=2.0
|
||||
Cmin(2,2,1) = 1.4
|
||||
Cmin(1,2,2) = 1.4
|
||||
Cmin(1,1,2) = 1.4
|
||||
nn2(1,2)=1
|
||||
9
examples/meam/msmeam/README.md
Normal file
9
examples/meam/msmeam/README.md
Normal file
@ -0,0 +1,9 @@
|
||||
To run Baske's test, do
|
||||
|
||||
lmp -in in.msmeam
|
||||
|
||||
Then
|
||||
|
||||
diff dump.msmeam dump.msmeam.bu
|
||||
|
||||
|
||||
25
examples/meam/msmeam/data.msmeam.bu
Normal file
25
examples/meam/msmeam/data.msmeam.bu
Normal file
@ -0,0 +1,25 @@
|
||||
LAMMPS data file via write_data, version 16 Feb 2016, timestep = 1
|
||||
|
||||
3 atoms
|
||||
2 atom types
|
||||
|
||||
-4.0000000000000000e+00 4.0000000000000000e+00 xlo xhi
|
||||
-4.0000000000000000e+00 4.0000000000000000e+00 ylo yhi
|
||||
-4.0000000000000000e+00 4.0000000000000000e+00 zlo zhi
|
||||
|
||||
Masses
|
||||
|
||||
1 1.0079
|
||||
2 69.723
|
||||
|
||||
Atoms # atomic
|
||||
|
||||
1 1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
|
||||
2 2 2.2000000000000002e+00 0.0000000000000000e+00 0.0000000000000000e+00 0 0 0
|
||||
3 2 2.9999999999999999e-01 2.2999999999999998e+00 0.0000000000000000e+00 0 0 0
|
||||
|
||||
Velocities
|
||||
|
||||
1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
|
||||
2 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
|
||||
3 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
|
||||
24
examples/meam/msmeam/dump.msmeam.bu
Normal file
24
examples/meam/msmeam/dump.msmeam.bu
Normal file
@ -0,0 +1,24 @@
|
||||
ITEM: TIMESTEP
|
||||
0
|
||||
ITEM: NUMBER OF ATOMS
|
||||
3
|
||||
ITEM: BOX BOUNDS pp pp pp
|
||||
-4 4
|
||||
-4 4
|
||||
-4 4
|
||||
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
|
||||
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
|
||||
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
|
||||
ITEM: TIMESTEP
|
||||
1
|
||||
ITEM: NUMBER OF ATOMS
|
||||
3
|
||||
ITEM: BOX BOUNDS pp pp pp
|
||||
-4 4
|
||||
-4 4
|
||||
-4 4
|
||||
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
|
||||
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
|
||||
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
|
||||
31
examples/meam/msmeam/in.msmeam
Normal file
31
examples/meam/msmeam/in.msmeam
Normal file
@ -0,0 +1,31 @@
|
||||
echo both
|
||||
log log.msmeam
|
||||
# Test of MEAM potential for HGa
|
||||
|
||||
# ------------------------ INITIALIZATION ----------------------------
|
||||
units metal
|
||||
dimension 3
|
||||
boundary p p p
|
||||
atom_style atomic
|
||||
variable latparam equal 4.646
|
||||
variable ncell equal 3
|
||||
|
||||
# ----------------------- ATOM DEFINITION ----------------------------
|
||||
region box block -4 4 -4 4 -4 4
|
||||
create_box 2 box
|
||||
|
||||
#
|
||||
|
||||
include potential.mod
|
||||
create_atoms 1 single 0 0 0 units box
|
||||
create_atoms 2 single 2.2 0 0 units box
|
||||
create_atoms 2 single 0.3 2.3 0 units box
|
||||
# ---------- Define Settings ---------------------
|
||||
variable teng equal "c_eatoms"
|
||||
compute pot_energy all pe/atom
|
||||
compute stress all stress/atom NULL
|
||||
dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
run 1
|
||||
write_data data.msmeam
|
||||
|
||||
print "All done!"
|
||||
14
examples/meam/msmeam/library.msmeam
Normal file
14
examples/meam/msmeam/library.msmeam
Normal file
@ -0,0 +1,14 @@
|
||||
# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
|
||||
# ms-meam data format May 2010
|
||||
# elt lat z ielement atwt
|
||||
# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub
|
||||
# - t0 t1 t2 t3 t1m t2m t3m rozero ibar
|
||||
# NOTE: leading character cannot be a space
|
||||
|
||||
'H' 'dim' 1.0 1 1.0079
|
||||
2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50
|
||||
1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0
|
||||
|
||||
'Ga4' 'fcc' 12.0 31 69.723
|
||||
4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97
|
||||
1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0
|
||||
107
examples/meam/msmeam/log.msmeam.bu
Normal file
107
examples/meam/msmeam/log.msmeam.bu
Normal file
@ -0,0 +1,107 @@
|
||||
# Test of MEAM potential for HGa
|
||||
|
||||
# ------------------------ INITIALIZATION ----------------------------
|
||||
units metal
|
||||
dimension 3
|
||||
boundary p p p
|
||||
atom_style atomic
|
||||
variable latparam equal 4.646
|
||||
variable ncell equal 3
|
||||
|
||||
# ----------------------- ATOM DEFINITION ----------------------------
|
||||
region box block -4 4 -4 4 -4 4
|
||||
create_box 2 box
|
||||
Created orthogonal box = (-4 -4 -4) to (4 4 4)
|
||||
1 by 1 by 1 MPI processor grid
|
||||
|
||||
#
|
||||
|
||||
include potential.mod
|
||||
# NOTE: This script can be modified for different pair styles
|
||||
# See in.elastic for more info.
|
||||
|
||||
variable Pu string H
|
||||
print "potential chosen ${Pu}"
|
||||
potential chosen H
|
||||
# Choose potential
|
||||
pair_style MSmeam
|
||||
print "we just executed"
|
||||
we just executed
|
||||
|
||||
pair_coeff * * library.MSmeam ${Pu} Ga4 HGaMS.meam ${Pu} Ga4
|
||||
pair_coeff * * library.MSmeam H Ga4 HGaMS.meam ${Pu} Ga4
|
||||
pair_coeff * * library.MSmeam H Ga4 HGaMS.meam H Ga4
|
||||
Reading potential file library.MSmeam with DATE: 2018-09-22
|
||||
# Setup neighbor style
|
||||
neighbor 1.0 nsq
|
||||
neigh_modify once no every 1 delay 0 check yes
|
||||
|
||||
# Setup minimization style
|
||||
variable dmax equal 1.0e-2
|
||||
min_style cg
|
||||
min_modify dmax ${dmax} line quadratic
|
||||
min_modify dmax 0.01 line quadratic
|
||||
compute eng all pe/atom
|
||||
compute eatoms all reduce sum c_eng
|
||||
|
||||
# Setup output
|
||||
thermo 100
|
||||
thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
|
||||
thermo_modify norm yes
|
||||
create_atoms 1 single 0 0 0 units box
|
||||
Created 1 atoms
|
||||
create_atoms 2 single 2.2 0 0 units box
|
||||
Created 1 atoms
|
||||
create_atoms 2 single 0.3 2.3 0 units box
|
||||
Created 1 atoms
|
||||
# ---------- Define Settings ---------------------
|
||||
variable teng equal "c_eatoms"
|
||||
compute pot_energy all pe/atom
|
||||
compute stress all stress/atom NULL
|
||||
dump 1 all custom 1 dump.msmeam id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
run 1
|
||||
WARNING: No fixes defined, atoms won't move (../verlet.cpp:55)
|
||||
Neighbor list info ...
|
||||
2 neighbor list requests
|
||||
update every 1 steps, delay 0 steps, check yes
|
||||
max neighbors/atom: 2000, page size: 100000
|
||||
master list distance cutoff = 6.9
|
||||
ghost atom cutoff = 6.9
|
||||
Memory usage per processor = 12.9295 Mbytes
|
||||
Step Temp TotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume eatoms
|
||||
0 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079
|
||||
1 0 15.433079 491354.68 838670.91 635393.13 0 80195.793 0 0 8 8 8 512 15.433079
|
||||
Loop time of 0.000172138 on 1 procs for 1 steps with 3 atoms
|
||||
|
||||
Performance: 501.922 ns/day, 0.048 hours/ns, 5809.285 timesteps/s
|
||||
81.3% CPU use with 1 MPI tasks x no OpenMP threads
|
||||
|
||||
MPI task timing breakdown:
|
||||
Section | min time | avg time | max time |%varavg| %total
|
||||
---------------------------------------------------------------
|
||||
Pair | 6.6996e-05 | 6.6996e-05 | 6.6996e-05 | 0.0 | 38.92
|
||||
Neigh | 0 | 0 | 0 | 0.0 | 0.00
|
||||
Comm | 1.9073e-06 | 1.9073e-06 | 1.9073e-06 | 0.0 | 1.11
|
||||
Output | 9.7036e-05 | 9.7036e-05 | 9.7036e-05 | 0.0 | 56.37
|
||||
Modify | 0 | 0 | 0 | 0.0 | 0.00
|
||||
Other | | 6.199e-06 | | | 3.60
|
||||
|
||||
Nlocal: 3 ave 3 max 3 min
|
||||
Histogram: 1 0 0 0 0 0 0 0 0 0
|
||||
Nghost: 78 ave 78 max 78 min
|
||||
Histogram: 1 0 0 0 0 0 0 0 0 0
|
||||
Neighs: 7 ave 7 max 7 min
|
||||
Histogram: 1 0 0 0 0 0 0 0 0 0
|
||||
FullNghs: 14 ave 14 max 14 min
|
||||
Histogram: 1 0 0 0 0 0 0 0 0 0
|
||||
|
||||
Total # of neighbors = 14
|
||||
Ave neighs/atom = 4.66667
|
||||
Neighbor list builds = 0
|
||||
Dangerous builds = 0
|
||||
write_data data.msmeam
|
||||
|
||||
print "All done!"
|
||||
All done!
|
||||
Total wall time: 0:00:00
|
||||
|
||||
24
examples/meam/msmeam/msmeam.dump.bu
Normal file
24
examples/meam/msmeam/msmeam.dump.bu
Normal file
@ -0,0 +1,24 @@
|
||||
ITEM: TIMESTEP
|
||||
0
|
||||
ITEM: NUMBER OF ATOMS
|
||||
3
|
||||
ITEM: BOX BOUNDS pp pp pp
|
||||
-4 4
|
||||
-4 4
|
||||
-4 4
|
||||
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
|
||||
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
|
||||
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
|
||||
ITEM: TIMESTEP
|
||||
1
|
||||
ITEM: NUMBER OF ATOMS
|
||||
3
|
||||
ITEM: BOX BOUNDS pp pp pp
|
||||
-4 4
|
||||
-4 4
|
||||
-4 4
|
||||
ITEM: ATOMS id x y z fx fy fz c_pot_energy c_stress[1] c_stress[2] c_stress[3] c_stress[4] c_stress[5] c_stress[6]
|
||||
1 0 0 0 -131.925 -88.3005 0 22.9153 -2.147e+08 -1.62661e+08 -0 -2.05301e+07 -0 -0
|
||||
2 2.2 0 0 120.809 -0.482171 0 14.7692 -2.12028e+08 -0 -0 403352 -0 -0
|
||||
3 0.3 2.3 0 11.1159 88.7827 0 8.61478 -2.67145e+06 -1.62661e+08 -0 -2.09335e+07 -0 -0
|
||||
25
examples/meam/msmeam/potential.mod
Normal file
25
examples/meam/msmeam/potential.mod
Normal file
@ -0,0 +1,25 @@
|
||||
# NOTE: This script can be modified for different pair styles
|
||||
# See in.elastic for more info.
|
||||
|
||||
variable Pu string H
|
||||
print "potential chosen ${Pu}"
|
||||
# Choose potential
|
||||
pair_style meam/ms
|
||||
print "we just executed"
|
||||
|
||||
pair_coeff * * library.msmeam ${Pu} Ga4 HGa.meam ${Pu} Ga4
|
||||
# Setup neighbor style
|
||||
neighbor 1.0 bin
|
||||
neigh_modify once no every 1 delay 0 check yes
|
||||
|
||||
# Setup minimization style
|
||||
variable dmax equal 1.0e-2
|
||||
min_style cg
|
||||
min_modify dmax ${dmax} line quadratic
|
||||
compute eng all pe/atom
|
||||
compute eatoms all reduce sum c_eng
|
||||
|
||||
# Setup output
|
||||
thermo 100
|
||||
thermo_style custom step temp etotal press pxx pyy pzz pxy pxz pyz lx ly lz vol c_eatoms
|
||||
thermo_modify norm yes
|
||||
@ -1,199 +0,0 @@
|
||||
diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
|
||||
index 22af411f32..530510a0d1 100644
|
||||
--- a/lib/kokkos/Makefile.kokkos
|
||||
+++ b/lib/kokkos/Makefile.kokkos
|
||||
@@ -20,7 +20,7 @@ KOKKOS_DEVICES ?= "OpenMP"
|
||||
#KOKKOS_DEVICES ?= "Threads"
|
||||
# Options:
|
||||
# Intel: KNC,KNL,SNB,HSW,BDW,SKL,SKX,ICL,ICX,SPR
|
||||
-# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86
|
||||
+# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Hopper90
|
||||
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
|
||||
# IBM: BGQ,Power7,Power8,Power9
|
||||
# AMD-GPUS: Vega900,Vega906,Vega908,Vega90A
|
||||
@@ -401,6 +401,7 @@ KOKKOS_INTERNAL_USE_ARCH_VOLTA72 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volt
|
||||
KOKKOS_INTERNAL_USE_ARCH_TURING75 := $(call kokkos_has_string,$(KOKKOS_ARCH),Turing75)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMPERE80 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere80)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMPERE86 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere86)
|
||||
+KOKKOS_INTERNAL_USE_ARCH_HOPPER90 := $(call kokkos_has_string,$(KOKKOS_ARCH),Hopper90)
|
||||
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
|
||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
|
||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
|
||||
@@ -414,7 +415,8 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLE
|
||||
+ $(KOKKOS_INTERNAL_USE_ARCH_VOLTA72) \
|
||||
+ $(KOKKOS_INTERNAL_USE_ARCH_TURING75) \
|
||||
+ $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80) \
|
||||
- + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86))
|
||||
+ + $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86) \
|
||||
+ + $(KOKKOS_INTERNAL_USE_ARCH_HOPPER90))
|
||||
|
||||
#SEK: This seems like a bug to me
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
|
||||
@@ -1194,6 +1196,11 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA_ARCH), 1)
|
||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE86")
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_86
|
||||
endif
|
||||
+ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HOPPER90), 1)
|
||||
+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER")
|
||||
+ tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HOPPER90")
|
||||
+ KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_90
|
||||
+ endif
|
||||
|
||||
ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
|
||||
diff --git a/lib/kokkos/cmake/KokkosCore_config.h.in b/lib/kokkos/cmake/KokkosCore_config.h.in
|
||||
index 88ddc48378..b83ced9243 100644
|
||||
--- a/lib/kokkos/cmake/KokkosCore_config.h.in
|
||||
+++ b/lib/kokkos/cmake/KokkosCore_config.h.in
|
||||
@@ -102,6 +102,7 @@
|
||||
#cmakedefine KOKKOS_ARCH_AMPERE
|
||||
#cmakedefine KOKKOS_ARCH_AMPERE80
|
||||
#cmakedefine KOKKOS_ARCH_AMPERE86
|
||||
+#cmakedefine KOKKOS_ARCH_HOPPER90
|
||||
#cmakedefine KOKKOS_ARCH_AMD_ZEN
|
||||
#cmakedefine KOKKOS_ARCH_AMD_ZEN2
|
||||
#cmakedefine KOKKOS_ARCH_AMD_ZEN3
|
||||
diff --git a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
|
||||
index f56cef1651..2585a6a64c 100644
|
||||
--- a/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
|
||||
+++ b/lib/kokkos/cmake/compile_tests/cuda_compute_capability.cc
|
||||
@@ -74,6 +74,7 @@ int main() {
|
||||
case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break;
|
||||
case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break;
|
||||
case 86: std::cout << "Set -DKokkos_ARCH_AMPERE86=ON ." << std::endl; break;
|
||||
+ case 90: std::cout << "Set -DKokkos_ARCH_HOPPER90=ON ." << std::endl; break;
|
||||
default:
|
||||
std::cout << "Compute capability " << compute_capability
|
||||
<< " is not supported" << std::endl;
|
||||
diff --git a/lib/kokkos/cmake/kokkos_arch.cmake b/lib/kokkos/cmake/kokkos_arch.cmake
|
||||
index ef16aad047..c1d76cceeb 100644
|
||||
--- a/lib/kokkos/cmake/kokkos_arch.cmake
|
||||
+++ b/lib/kokkos/cmake/kokkos_arch.cmake
|
||||
@@ -86,6 +86,7 @@ KOKKOS_ARCH_OPTION(VOLTA72 GPU "NVIDIA Volta generation CC 7.2" "KOKK
|
||||
KOKKOS_ARCH_OPTION(TURING75 GPU "NVIDIA Turing generation CC 7.5" "KOKKOS_SHOW_CUDA_ARCHS")
|
||||
KOKKOS_ARCH_OPTION(AMPERE80 GPU "NVIDIA Ampere generation CC 8.0" "KOKKOS_SHOW_CUDA_ARCHS")
|
||||
KOKKOS_ARCH_OPTION(AMPERE86 GPU "NVIDIA Ampere generation CC 8.6" "KOKKOS_SHOW_CUDA_ARCHS")
|
||||
+KOKKOS_ARCH_OPTION(HOPPER90 GPU "NVIDIA Hopper generation CC 9.0" "KOKKOS_SHOW_CUDA_ARCHS")
|
||||
|
||||
IF(Kokkos_ENABLE_HIP OR Kokkos_ENABLE_OPENMPTARGET OR Kokkos_ENABLE_UNSUPPORTED_ARCHS)
|
||||
SET(KOKKOS_SHOW_HIP_ARCHS ON)
|
||||
@@ -544,6 +545,7 @@ CHECK_CUDA_ARCH(VOLTA72 sm_72)
|
||||
CHECK_CUDA_ARCH(TURING75 sm_75)
|
||||
CHECK_CUDA_ARCH(AMPERE80 sm_80)
|
||||
CHECK_CUDA_ARCH(AMPERE86 sm_86)
|
||||
+CHECK_CUDA_ARCH(HOPPER90 sm_90)
|
||||
|
||||
SET(AMDGPU_ARCH_ALREADY_SPECIFIED "")
|
||||
FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
|
||||
@@ -806,6 +808,10 @@ IF (KOKKOS_ARCH_AMPERE80 OR KOKKOS_ARCH_AMPERE86)
|
||||
SET(KOKKOS_ARCH_AMPERE ON)
|
||||
ENDIF()
|
||||
|
||||
+IF (KOKKOS_ARCH_HOPPER90)
|
||||
+ SET(KOKKOS_ARCH_HOPPER ON)
|
||||
+ENDIF()
|
||||
+
|
||||
#Regardless of version, make sure we define the general architecture name
|
||||
IF (KOKKOS_ARCH_VEGA900 OR KOKKOS_ARCH_VEGA906 OR KOKKOS_ARCH_VEGA908 OR KOKKOS_ARCH_VEGA90A)
|
||||
SET(KOKKOS_ARCH_VEGA ON)
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
index 56f9117844..fcd4773dbc 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
@@ -232,7 +232,8 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
|
||||
case 61: return 96;
|
||||
case 70:
|
||||
case 80:
|
||||
- case 86: return 8;
|
||||
+ case 86:
|
||||
+ case 90: return 8;
|
||||
case 75: return 32;
|
||||
default:
|
||||
Kokkos::Impl::throw_runtime_exception(
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
|
||||
index 40a263561f..8c40ebd60d 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half_Conversion.hpp
|
||||
@@ -418,7 +418,7 @@ KOKKOS_INLINE_FUNCTION
|
||||
#endif // CUDA_VERSION >= 11000 && CUDA_VERSION < 11010
|
||||
|
||||
#if CUDA_VERSION >= 11010 && \
|
||||
- ((defined(KOKKOS_ARCH_AMPERE80) || defined(KOKKOS_ARCH_AMPERE86)))
|
||||
+ ((defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)))
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bhalf_t cast_to_bhalf(bhalf_t val) { return val; }
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
diff --git a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
|
||||
index f9451ecfe6..2ce1efb98c 100644
|
||||
--- a/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
|
||||
+++ b/lib/kokkos/core/src/OpenACC/Kokkos_OpenACC_Traits.hpp
|
||||
@@ -51,7 +51,7 @@ namespace Kokkos::Experimental::Impl {
|
||||
|
||||
struct OpenACC_Traits {
|
||||
#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
- defined(KOKKOS_ARCH_AMPERE)
|
||||
+ defined(KOKKOS_ARCH_AMPERE) || defined(KOKKOS_ARCH_HOPPER)
|
||||
static constexpr acc_device_t dev_type = acc_device_nvidia;
|
||||
static constexpr bool may_fallback_to_host = false;
|
||||
#else
|
||||
diff --git a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
|
||||
index a9bc085912..27ee1d4232 100644
|
||||
--- a/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
|
||||
+++ b/lib/kokkos/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp
|
||||
@@ -115,8 +115,9 @@ void OpenMPTargetInternal::impl_initialize() {
|
||||
|
||||
// FIXME_OPENMPTARGET: Only fix the number of teams for NVIDIA architectures
|
||||
// from Pascal and upwards.
|
||||
-#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
|
||||
+#if defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
|
||||
+ defined(KOKKOS_ARCH_HOPPER)
|
||||
#if defined(KOKKOS_COMPILER_CLANG) && (KOKKOS_COMPILER_CLANG >= 1300)
|
||||
omp_set_num_teams(512);
|
||||
#endif
|
||||
diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
|
||||
index 840db4327c..7e5addbc5b 100644
|
||||
--- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
|
||||
+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL.cpp
|
||||
@@ -155,7 +155,7 @@ void SYCL::impl_initialize(InitializationSettings const& settings) {
|
||||
#if !defined(KOKKOS_ARCH_INTEL_GPU) && !defined(KOKKOS_ARCH_KEPLER) && \
|
||||
!defined(KOKKOS_ARCH_MAXWELL) && !defined(KOKKOS_ARCH_PASCAL) && \
|
||||
!defined(KOKKOS_ARCH_VOLTA) && !defined(KOKKOS_ARCH_TURING75) && \
|
||||
- !defined(KOKKOS_ARCH_AMPERE)
|
||||
+ !defined(KOKKOS_ARCH_AMPERE) && !defined(KOKKOS_ARCH_HOPPER)
|
||||
if (!settings.has_device_id() && gpu_devices.empty()) {
|
||||
Impl::SYCLInternal::singleton().initialize(sycl::device());
|
||||
return;
|
||||
diff --git a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
|
||||
index 5ac7d8af30..ba101f699e 100644
|
||||
--- a/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
|
||||
+++ b/lib/kokkos/core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp
|
||||
@@ -335,9 +335,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
|
||||
return std::min({
|
||||
int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
|
||||
// FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
|
||||
-#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
|
||||
- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
|
||||
+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
|
||||
+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
|
||||
+ defined(KOKKOS_ARCH_HOPPER)
|
||||
256,
|
||||
#endif
|
||||
max_threads_for_memory
|
||||
@@ -367,9 +368,10 @@ class TeamPolicyInternal<Kokkos::Experimental::SYCL, Properties...>
|
||||
return std::min<int>({
|
||||
int(m_space.impl_internal_space_instance()->m_maxWorkgroupSize),
|
||||
// FIXME_SYCL Avoid requesting to many registers on NVIDIA GPUs.
|
||||
-#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
|
||||
- defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
- defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE)
|
||||
+#if defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL) || \
|
||||
+ defined(KOKKOS_ARCH_PASCAL) || defined(KOKKOS_ARCH_VOLTA) || \
|
||||
+ defined(KOKKOS_ARCH_TURING75) || defined(KOKKOS_ARCH_AMPERE) || \
|
||||
+ defined(KOKKOS_ARCH_HOPPER)
|
||||
256,
|
||||
#endif
|
||||
max_threads_for_memory
|
||||
@ -1,523 +0,0 @@
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
index fcd4773dbc..30b6958a67 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
@@ -207,7 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
|
||||
LaunchBounds{});
|
||||
}
|
||||
|
||||
-// Assuming cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1)
|
||||
// NOTE these number can be obtained several ways:
|
||||
// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
|
||||
// "Compute Capability" first and check what is the smallest "Shared Memory
|
||||
@@ -242,6 +241,7 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
|
||||
return 0;
|
||||
}() * 1024;
|
||||
}
|
||||
+
|
||||
} // namespace Impl
|
||||
} // namespace Kokkos
|
||||
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
|
||||
index 5811498e01..e22eb3b842 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
|
||||
@@ -569,12 +569,6 @@ Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
|
||||
}
|
||||
#endif
|
||||
|
||||
-#ifdef KOKKOS_ENABLE_PRE_CUDA_10_DEPRECATION_API
|
||||
- cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
|
||||
-#else
|
||||
- cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
|
||||
-#endif
|
||||
-
|
||||
// Init the array for used for arbitrarily sized atomics
|
||||
if (stream == nullptr) Impl::initialize_host_cuda_lock_arrays();
|
||||
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
index b7a80ad84f..5c4c3a7d39 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
@@ -93,10 +93,6 @@ namespace Impl {
|
||||
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
|
||||
// function qualifier which could be used to improve performance.
|
||||
//----------------------------------------------------------------------------
|
||||
-// Maximize L1 cache and minimize shared memory:
|
||||
-// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 );
|
||||
-// For 2.0 capability: 48 KB L1 and 16 KB shared
|
||||
-//----------------------------------------------------------------------------
|
||||
|
||||
template <class DriverType>
|
||||
__global__ static void cuda_parallel_launch_constant_memory() {
|
||||
@@ -158,63 +154,105 @@ inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) {
|
||||
}
|
||||
}
|
||||
|
||||
-// This function needs to be template on DriverType and LaunchBounds
|
||||
+// These functions needs to be template on DriverType and LaunchBounds
|
||||
// so that the static bool is unique for each type combo
|
||||
// KernelFuncPtr does not necessarily contain that type information.
|
||||
+
|
||||
template <class DriverType, class LaunchBounds, class KernelFuncPtr>
|
||||
-inline void configure_shmem_preference(KernelFuncPtr const& func,
|
||||
- bool prefer_shmem) {
|
||||
+const cudaFuncAttributes& get_cuda_kernel_func_attributes(
|
||||
+ const KernelFuncPtr& func) {
|
||||
+ // Only call cudaFuncGetAttributes once for each unique kernel
|
||||
+ // by leveraging static variable initialization rules
|
||||
+ auto wrap_get_attributes = [&]() -> cudaFuncAttributes {
|
||||
+ cudaFuncAttributes attr;
|
||||
+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncGetAttributes(&attr, func));
|
||||
+ return attr;
|
||||
+ };
|
||||
+ static cudaFuncAttributes func_attr = wrap_get_attributes();
|
||||
+ return func_attr;
|
||||
+}
|
||||
+
|
||||
+template <class DriverType, class LaunchBounds, class KernelFuncPtr>
|
||||
+inline void configure_shmem_preference(const KernelFuncPtr& func,
|
||||
+ const cudaDeviceProp& device_props,
|
||||
+ const size_t block_size, int& shmem,
|
||||
+ const size_t occupancy) {
|
||||
#ifndef KOKKOS_ARCH_KEPLER
|
||||
- // On Kepler the L1 has no benefit since it doesn't cache reads
|
||||
+
|
||||
+ const auto& func_attr =
|
||||
+ get_cuda_kernel_func_attributes<DriverType, LaunchBounds>(func);
|
||||
+
|
||||
+ // Compute limits for number of blocks due to registers/SM
|
||||
+ const size_t regs_per_sm = device_props.regsPerMultiprocessor;
|
||||
+ const size_t regs_per_thread = func_attr.numRegs;
|
||||
+ // The granularity of register allocation is chunks of 256 registers per warp
|
||||
+ // -> 8 registers per thread
|
||||
+ const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
|
||||
+ const size_t max_blocks_regs =
|
||||
+ regs_per_sm / (allocated_regs_per_thread * block_size);
|
||||
+
|
||||
+ // Compute how many threads per sm we actually want
|
||||
+ const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
|
||||
+ // only allocate multiples of warp size
|
||||
+ const size_t num_threads_desired =
|
||||
+ ((max_threads_per_sm * occupancy / 100 + 31) / 32) * 32;
|
||||
+ // Get close to the desired occupancy,
|
||||
+ // don't undershoot by much but also don't allocate a whole new block just
|
||||
+ // because one is a few threads over otherwise.
|
||||
+ size_t num_blocks_desired =
|
||||
+ (num_threads_desired + block_size * 0.8) / block_size;
|
||||
+ num_blocks_desired = ::std::min(max_blocks_regs, num_blocks_desired);
|
||||
+ if (num_blocks_desired == 0) num_blocks_desired = 1;
|
||||
+
|
||||
+ // Calculate how much shared memory we need per block
|
||||
+ size_t shmem_per_block = shmem + func_attr.sharedSizeBytes;
|
||||
+
|
||||
+ // The minimum shared memory allocation we can have in total per SM is 8kB.
|
||||
+ // If we want to lower occupancy we have to make sure we request at least that
|
||||
+ // much in aggregate over all blocks, so that shared memory actually becomes a
|
||||
+ // limiting factor for occupancy
|
||||
+ constexpr size_t min_shmem_size_per_sm = 8192;
|
||||
+ if ((occupancy < 100) &&
|
||||
+ (shmem_per_block * num_blocks_desired < min_shmem_size_per_sm)) {
|
||||
+ shmem_per_block = min_shmem_size_per_sm / num_blocks_desired;
|
||||
+ // Need to set the caller's shmem variable so that the
|
||||
+ // kernel launch uses the correct dynamic shared memory request
|
||||
+ shmem = shmem_per_block - func_attr.sharedSizeBytes;
|
||||
+ }
|
||||
+
|
||||
+ // Compute the carveout fraction we need based on occupancy
|
||||
+ // Use multiples of 8kB
|
||||
+ const size_t max_shmem_per_sm = device_props.sharedMemPerMultiprocessor;
|
||||
+ size_t carveout = shmem_per_block == 0
|
||||
+ ? 0
|
||||
+ : 100 *
|
||||
+ (((num_blocks_desired * shmem_per_block +
|
||||
+ min_shmem_size_per_sm - 1) /
|
||||
+ min_shmem_size_per_sm) *
|
||||
+ min_shmem_size_per_sm) /
|
||||
+ max_shmem_per_sm;
|
||||
+ if (carveout > 100) carveout = 100;
|
||||
+
|
||||
+ // Set the carveout, but only call it once per kernel or when it changes
|
||||
auto set_cache_config = [&] {
|
||||
- KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
||||
- func,
|
||||
- (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)));
|
||||
- return prefer_shmem;
|
||||
+ KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFuncSetAttribute(
|
||||
+ func, cudaFuncAttributePreferredSharedMemoryCarveout, carveout));
|
||||
+ return carveout;
|
||||
};
|
||||
- static bool cache_config_preference_cached = set_cache_config();
|
||||
- if (cache_config_preference_cached != prefer_shmem) {
|
||||
+ // Store the value in a static variable so we only reset if needed
|
||||
+ static size_t cache_config_preference_cached = set_cache_config();
|
||||
+ if (cache_config_preference_cached != carveout) {
|
||||
cache_config_preference_cached = set_cache_config();
|
||||
}
|
||||
#else
|
||||
// Use the parameters so we don't get a warning
|
||||
(void)func;
|
||||
- (void)prefer_shmem;
|
||||
+ (void)device_props;
|
||||
+ (void)block_size;
|
||||
+ (void)occupancy;
|
||||
#endif
|
||||
}
|
||||
|
||||
-template <class Policy>
|
||||
-std::enable_if_t<Policy::experimental_contains_desired_occupancy>
|
||||
-modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||
- Policy const& policy, cudaDeviceProp const& properties,
|
||||
- cudaFuncAttributes const& attributes, dim3 const& block, int& shmem,
|
||||
- bool& prefer_shmem) {
|
||||
- int const block_size = block.x * block.y * block.z;
|
||||
- int const desired_occupancy = policy.impl_get_desired_occupancy().value();
|
||||
-
|
||||
- size_t const shmem_per_sm_prefer_l1 = get_shmem_per_sm_prefer_l1(properties);
|
||||
- size_t const static_shmem = attributes.sharedSizeBytes;
|
||||
-
|
||||
- // round to nearest integer and avoid division by zero
|
||||
- int active_blocks = std::max(
|
||||
- 1, static_cast<int>(std::round(
|
||||
- static_cast<double>(properties.maxThreadsPerMultiProcessor) /
|
||||
- block_size * desired_occupancy / 100)));
|
||||
- int const dynamic_shmem =
|
||||
- shmem_per_sm_prefer_l1 / active_blocks - static_shmem;
|
||||
-
|
||||
- if (dynamic_shmem > shmem) {
|
||||
- shmem = dynamic_shmem;
|
||||
- prefer_shmem = false;
|
||||
- }
|
||||
-}
|
||||
-
|
||||
-template <class Policy>
|
||||
-std::enable_if_t<!Policy::experimental_contains_desired_occupancy>
|
||||
-modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||
- Policy const&, cudaDeviceProp const&, cudaFuncAttributes const&,
|
||||
- dim3 const& /*block*/, int& /*shmem*/, bool& /*prefer_shmem*/) {}
|
||||
-
|
||||
// </editor-fold> end Some helper functions for launch code readability }}}1
|
||||
//==============================================================================
|
||||
|
||||
@@ -348,7 +386,7 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||
inline static void create_parallel_launch_graph_node(
|
||||
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||
- CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||
+ CudaInternal const* cuda_instance) {
|
||||
//----------------------------------------
|
||||
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
|
||||
KOKKOS_EXPECTS(bool(graph));
|
||||
@@ -358,8 +396,15 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
|
||||
if (!Impl::is_empty_launch(grid, block)) {
|
||||
Impl::check_shmem_request(cuda_instance, shmem);
|
||||
- Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
- base_t::get_kernel_func(), prefer_shmem);
|
||||
+ if (DriverType::Policy::
|
||||
+ experimental_contains_desired_occupancy) {
|
||||
+ int desired_occupancy =
|
||||
+ driver.get_policy().impl_get_desired_occupancy().value();
|
||||
+ size_t block_size = block.x * block.y * block.z;
|
||||
+ Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
+ shmem, desired_occupancy);
|
||||
+ }
|
||||
|
||||
void const* args[] = {&driver};
|
||||
|
||||
@@ -442,7 +487,7 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||
inline static void create_parallel_launch_graph_node(
|
||||
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||
- CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||
+ CudaInternal const* cuda_instance) {
|
||||
//----------------------------------------
|
||||
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
|
||||
KOKKOS_EXPECTS(bool(graph));
|
||||
@@ -452,8 +497,15 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
|
||||
if (!Impl::is_empty_launch(grid, block)) {
|
||||
Impl::check_shmem_request(cuda_instance, shmem);
|
||||
- Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
- base_t::get_kernel_func(), prefer_shmem);
|
||||
+ if constexpr (DriverType::Policy::
|
||||
+ experimental_contains_desired_occupancy) {
|
||||
+ int desired_occupancy =
|
||||
+ driver.get_policy().impl_get_desired_occupancy().value();
|
||||
+ size_t block_size = block.x * block.y * block.z;
|
||||
+ Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
+ shmem, desired_occupancy);
|
||||
+ }
|
||||
|
||||
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
|
||||
|
||||
@@ -566,7 +618,7 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||
inline static void create_parallel_launch_graph_node(
|
||||
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||
- CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||
+ CudaInternal const* cuda_instance) {
|
||||
// Just use global memory; coordinating through events to share constant
|
||||
// memory with the non-graph interface is not really reasonable since
|
||||
// events don't work with Graphs directly, and this would anyway require
|
||||
@@ -580,7 +632,7 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
DriverType, LaunchBounds,
|
||||
Experimental::CudaLaunchMechanism::GlobalMemory>;
|
||||
global_launch_impl_t::create_parallel_launch_graph_node(
|
||||
- driver, grid, block, shmem, cuda_instance, prefer_shmem);
|
||||
+ driver, grid, block, shmem, cuda_instance);
|
||||
}
|
||||
#endif
|
||||
};
|
||||
@@ -613,8 +665,7 @@ struct CudaParallelLaunchImpl<
|
||||
|
||||
inline static void launch_kernel(const DriverType& driver, const dim3& grid,
|
||||
const dim3& block, int shmem,
|
||||
- const CudaInternal* cuda_instance,
|
||||
- bool prefer_shmem) {
|
||||
+ const CudaInternal* cuda_instance) {
|
||||
if (!Impl::is_empty_launch(grid, block)) {
|
||||
// Prevent multiple threads to simultaneously set the cache configuration
|
||||
// preference and launch the same kernel
|
||||
@@ -623,18 +674,17 @@ struct CudaParallelLaunchImpl<
|
||||
|
||||
Impl::check_shmem_request(cuda_instance, shmem);
|
||||
|
||||
- // If a desired occupancy is specified, we compute how much shared memory
|
||||
- // to ask for to achieve that occupancy, assuming that the cache
|
||||
- // configuration is `cudaFuncCachePreferL1`. If the amount of dynamic
|
||||
- // shared memory computed is actually smaller than `shmem` we overwrite
|
||||
- // `shmem` and set `prefer_shmem` to `false`.
|
||||
- modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||
- driver.get_policy(), cuda_instance->m_deviceProp,
|
||||
- get_cuda_func_attributes(), block, shmem, prefer_shmem);
|
||||
-
|
||||
- Impl::configure_shmem_preference<
|
||||
- DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
|
||||
- base_t::get_kernel_func(), prefer_shmem);
|
||||
+ if (DriverType::Policy::
|
||||
+ experimental_contains_desired_occupancy) {
|
||||
+ int desired_occupancy =
|
||||
+ driver.get_policy().impl_get_desired_occupancy().value();
|
||||
+ size_t block_size = block.x * block.y * block.z;
|
||||
+ Impl::configure_shmem_preference<
|
||||
+ DriverType,
|
||||
+ Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
|
||||
+ base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
+ shmem, desired_occupancy);
|
||||
+ }
|
||||
|
||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
||||
|
||||
@@ -650,18 +700,9 @@ struct CudaParallelLaunchImpl<
|
||||
}
|
||||
|
||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
||||
- // Race condition inside of cudaFuncGetAttributes if the same address is
|
||||
- // given requires using a local variable as input instead of a static Rely
|
||||
- // on static variable initialization to make sure only one thread executes
|
||||
- // the code and the result is visible.
|
||||
- auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
||||
- cudaFuncAttributes attr_tmp;
|
||||
- KOKKOS_IMPL_CUDA_SAFE_CALL(
|
||||
- cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()));
|
||||
- return attr_tmp;
|
||||
- };
|
||||
- static cudaFuncAttributes attr = wrap_get_attributes();
|
||||
- return attr;
|
||||
+ return get_cuda_kernel_func_attributes<
|
||||
+ DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
|
||||
+ base_t::get_kernel_func());
|
||||
}
|
||||
};
|
||||
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
index e586bb4cc6..0e348c092a 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
@@ -121,8 +121,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
maxblocks[1]),
|
||||
1);
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
|
||||
} else if (RP::rank == 3) {
|
||||
const dim3 block(m_rp.m_tile[0], m_rp.m_tile[1], m_rp.m_tile[2]);
|
||||
KOKKOS_ASSERT(block.x > 0);
|
||||
@@ -139,8 +138,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
(m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1) / block.z,
|
||||
maxblocks[2]));
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
|
||||
} else if (RP::rank == 4) {
|
||||
// id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to
|
||||
// threadIdx.z
|
||||
@@ -158,8 +156,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
(m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1) / block.z,
|
||||
maxblocks[2]));
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
|
||||
} else if (RP::rank == 5) {
|
||||
// id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to
|
||||
// threadIdx.z
|
||||
@@ -175,8 +172,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
(m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1) / block.z,
|
||||
maxblocks[2]));
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
|
||||
} else if (RP::rank == 6) {
|
||||
// id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to
|
||||
// threadIdx.z
|
||||
@@ -191,8 +187,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
std::min<array_index_type>(m_rp.m_tile_end[4] * m_rp.m_tile_end[5],
|
||||
maxblocks[2]));
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_rp.space().impl_internal_space_instance());
|
||||
} else {
|
||||
Kokkos::abort("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
|
||||
}
|
||||
@@ -405,8 +400,8 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
||||
|
||||
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
|
||||
if (!m_result_ptr_device_accessible) {
|
||||
if (m_result_ptr) {
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
|
||||
index ac160f8fe2..d1031751c2 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
|
||||
@@ -135,8 +135,7 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
#endif
|
||||
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
- *this, grid, block, 0, m_policy.space().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, 0, m_policy.space().impl_internal_space_instance());
|
||||
}
|
||||
|
||||
ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
|
||||
@@ -375,8 +374,8 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
||||
|
||||
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
|
||||
if (!m_result_ptr_device_accessible) {
|
||||
if (m_result_ptr) {
|
||||
@@ -726,16 +725,16 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
m_final = false;
|
||||
CudaParallelLaunch<ParallelScan, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
|
||||
}
|
||||
#endif
|
||||
m_final = true;
|
||||
CudaParallelLaunch<ParallelScan, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1038,16 +1037,16 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
|
||||
m_final = false;
|
||||
CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
|
||||
}
|
||||
#endif
|
||||
m_final = true;
|
||||
CudaParallelLaunch<ParallelScanWithTotal, LaunchBounds>(
|
||||
*this, grid, block, shmem,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- false); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
|
||||
const int size = Analysis::value_size(m_functor);
|
||||
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
|
||||
index cdd16085b3..ea9430b812 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp
|
||||
@@ -552,8 +552,8 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
||||
|
||||
CudaParallelLaunch<ParallelFor, LaunchBounds>(
|
||||
*this, grid, block, shmem_size_total,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- true); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
}
|
||||
|
||||
ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
|
||||
@@ -878,8 +878,8 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
||||
|
||||
CudaParallelLaunch<ParallelReduce, LaunchBounds>(
|
||||
*this, grid, block, shmem_size_total,
|
||||
- m_policy.space().impl_internal_space_instance(),
|
||||
- true); // copy to device and execute
|
||||
+ m_policy.space()
|
||||
+ .impl_internal_space_instance()); // copy to device and execute
|
||||
|
||||
if (!m_result_ptr_device_accessible) {
|
||||
m_policy.space().fence(
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
|
||||
index 34d4bef9fd..178012431c 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
|
||||
@@ -428,11 +428,6 @@ struct CudaReductionsFunctor<FunctorType, false, false> {
|
||||
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
|
||||
// function qualifier which could be used to improve performance.
|
||||
//----------------------------------------------------------------------------
|
||||
-// Maximize shared memory and minimize L1 cache:
|
||||
-// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
|
||||
-// For 2.0 capability: 48 KB shared and 16 KB L1
|
||||
-//----------------------------------------------------------------------------
|
||||
-//----------------------------------------------------------------------------
|
||||
/*
|
||||
* Algorithmic constraints:
|
||||
* (a) blockDim.y <= 1024
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
|
||||
index fb3a6b138f..a12378a891 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_WorkGraphPolicy.hpp
|
||||
@@ -100,8 +100,7 @@ class ParallelFor<FunctorType, Kokkos::WorkGraphPolicy<Traits...>,
|
||||
const int shared = 0;
|
||||
|
||||
Kokkos::Impl::CudaParallelLaunch<Self>(
|
||||
- *this, grid, block, shared, Cuda().impl_internal_space_instance(),
|
||||
- false);
|
||||
+ *this, grid, block, shared, Cuda().impl_internal_space_instance());
|
||||
}
|
||||
|
||||
inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
|
||||
@ -1,46 +0,0 @@
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
index 30b6958a67..b94f053272 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
@@ -207,41 +207,6 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
|
||||
LaunchBounds{});
|
||||
}
|
||||
|
||||
-// NOTE these number can be obtained several ways:
|
||||
-// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
|
||||
-// "Compute Capability" first and check what is the smallest "Shared Memory
|
||||
-// Size Config" that is available. The "Shared Memory Per Multiprocessor" in
|
||||
-// bytes is then to be found below in the summary.
|
||||
-// * Another option would be to look for the information in the "Tuning
|
||||
-// Guide(s)" of the CUDA Toolkit Documentation for each GPU architecture, in
|
||||
-// the "Shared Memory" section (more tedious)
|
||||
-inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
|
||||
- int const compute_capability = properties.major * 10 + properties.minor;
|
||||
- return [compute_capability]() {
|
||||
- switch (compute_capability) {
|
||||
- case 30:
|
||||
- case 32:
|
||||
- case 35: return 16;
|
||||
- case 37: return 80;
|
||||
- case 50:
|
||||
- case 53:
|
||||
- case 60:
|
||||
- case 62: return 64;
|
||||
- case 52:
|
||||
- case 61: return 96;
|
||||
- case 70:
|
||||
- case 80:
|
||||
- case 86:
|
||||
- case 90: return 8;
|
||||
- case 75: return 32;
|
||||
- default:
|
||||
- Kokkos::Impl::throw_runtime_exception(
|
||||
- "Unknown device in cuda block size deduction");
|
||||
- }
|
||||
- return 0;
|
||||
- }() * 1024;
|
||||
-}
|
||||
-
|
||||
} // namespace Impl
|
||||
} // namespace Kokkos
|
||||
|
||||
@ -1,204 +0,0 @@
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
index b94f053272..252c13c524 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp
|
||||
@@ -53,17 +53,69 @@
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
|
||||
+inline int cuda_warp_per_sm_allocation_granularity(
|
||||
+ cudaDeviceProp const& properties) {
|
||||
+ // Allocation granularity of warps in each sm
|
||||
+ switch (properties.major) {
|
||||
+ case 3:
|
||||
+ case 5:
|
||||
+ case 7:
|
||||
+ case 8:
|
||||
+ case 9: return 4;
|
||||
+ case 6: return (properties.minor == 0 ? 2 : 4);
|
||||
+ default:
|
||||
+ throw_runtime_exception(
|
||||
+ "Unknown device in cuda warp per sm allocation granularity");
|
||||
+ return 0;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+inline int cuda_max_warps_per_sm_registers(
|
||||
+ cudaDeviceProp const& properties, cudaFuncAttributes const& attributes) {
|
||||
+ // Maximum number of warps per sm as a function of register counts,
|
||||
+ // subject to the constraint that warps are allocated with a fixed granularity
|
||||
+ int const max_regs_per_block = properties.regsPerBlock;
|
||||
+ int const regs_per_warp = attributes.numRegs * properties.warpSize;
|
||||
+ int const warp_granularity =
|
||||
+ cuda_warp_per_sm_allocation_granularity(properties);
|
||||
+ // The granularity of register allocation is chunks of 256 registers per warp,
|
||||
+ // which implies a need to over-allocate, so we round up
|
||||
+ int const allocated_regs_per_warp = (regs_per_warp + 256 - 1) / 256;
|
||||
+
|
||||
+ // The maximum number of warps per SM is constrained from above by register
|
||||
+ // allocation. To satisfy the constraint that warps per SM is allocated at a
|
||||
+ // finite granularity, we need to round down.
|
||||
+ int const max_warps_per_sm =
|
||||
+ warp_granularity *
|
||||
+ (max_regs_per_block / (allocated_regs_per_warp * warp_granularity));
|
||||
+
|
||||
+ return max_warps_per_sm;
|
||||
+}
|
||||
+
|
||||
inline int cuda_max_active_blocks_per_sm(cudaDeviceProp const& properties,
|
||||
cudaFuncAttributes const& attributes,
|
||||
int block_size, size_t dynamic_shmem) {
|
||||
- // Limits due do registers/SM
|
||||
+ // Limits due to registers/SM
|
||||
int const regs_per_sm = properties.regsPerMultiprocessor;
|
||||
int const regs_per_thread = attributes.numRegs;
|
||||
// The granularity of register allocation is chunks of 256 registers per warp
|
||||
// -> 8 registers per thread
|
||||
int const allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
|
||||
- int const max_blocks_regs =
|
||||
- regs_per_sm / (allocated_regs_per_thread * block_size);
|
||||
+ int max_blocks_regs = regs_per_sm / (allocated_regs_per_thread * block_size);
|
||||
+
|
||||
+ // Compute the maximum number of warps as a function of the number of
|
||||
+ // registers
|
||||
+ int const max_warps_per_sm_registers =
|
||||
+ cuda_max_warps_per_sm_registers(properties, attributes);
|
||||
+
|
||||
+ // Constrain the number of blocks to respect the maximum number of warps per
|
||||
+ // SM On face value this should be an equality, but due to the warp
|
||||
+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
|
||||
+ // left-hand-side of this comparison can overshoot what the hardware allows
|
||||
+ // based on register counts alone
|
||||
+ while ((max_blocks_regs * block_size / properties.warpSize) >
|
||||
+ max_warps_per_sm_registers)
|
||||
+ max_blocks_regs--;
|
||||
|
||||
// Limits due to shared memory/SM
|
||||
size_t const shmem_per_sm = properties.sharedMemPerMultiprocessor;
|
||||
@@ -207,6 +259,19 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
|
||||
LaunchBounds{});
|
||||
}
|
||||
|
||||
+template <class LaunchBounds>
|
||||
+int cuda_get_opt_block_size_no_shmem(const cudaFuncAttributes& attr,
|
||||
+ LaunchBounds) {
|
||||
+ auto const& prop = Kokkos::Cuda().cuda_device_prop();
|
||||
+
|
||||
+ // Thin version of cuda_get_opt_block_size for cases where there is no shared
|
||||
+ // memory
|
||||
+ auto const block_size_to_no_shmem = [&](int /*block_size*/) { return 0; };
|
||||
+
|
||||
+ return cuda_deduce_block_size(false, prop, attr, block_size_to_no_shmem,
|
||||
+ LaunchBounds{});
|
||||
+}
|
||||
+
|
||||
} // namespace Impl
|
||||
} // namespace Kokkos
|
||||
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
index 5c4c3a7d39..170183ca0a 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
@@ -188,9 +188,23 @@ inline void configure_shmem_preference(const KernelFuncPtr& func,
|
||||
// The granularity of register allocation is chunks of 256 registers per warp
|
||||
// -> 8 registers per thread
|
||||
const size_t allocated_regs_per_thread = 8 * ((regs_per_thread + 8 - 1) / 8);
|
||||
- const size_t max_blocks_regs =
|
||||
+ size_t max_blocks_regs =
|
||||
regs_per_sm / (allocated_regs_per_thread * block_size);
|
||||
|
||||
+ // Compute the maximum number of warps as a function of the number of
|
||||
+ // registers
|
||||
+ const size_t max_warps_per_sm_registers =
|
||||
+ cuda_max_warps_per_sm_registers(device_props, func_attr);
|
||||
+
|
||||
+ // Constrain the number of blocks to respect the maximum number of warps per
|
||||
+ // SM On face value this should be an equality, but due to the warp
|
||||
+ // granularity constraints noted in `cuda_max_warps_per_sm_registers` the
|
||||
+ // left-hand-side of this comparison can overshoot what the hardware allows
|
||||
+ // based on register counts alone
|
||||
+ while ((max_blocks_regs * block_size / device_props.warpSize) >
|
||||
+ max_warps_per_sm_registers)
|
||||
+ max_blocks_regs--;
|
||||
+
|
||||
// Compute how many threads per sm we actually want
|
||||
const size_t max_threads_per_sm = device_props.maxThreadsPerMultiProcessor;
|
||||
// only allocate multiples of warp size
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
index 0e348c092a..7e4f62f12e 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp
|
||||
@@ -67,6 +67,34 @@
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
|
||||
+template <typename ParallelType, typename Policy, typename LaunchBounds>
|
||||
+int max_tile_size_product_helper(const Policy& pol, const LaunchBounds&) {
|
||||
+ cudaFuncAttributes attr =
|
||||
+ CudaParallelLaunch<ParallelType,
|
||||
+ LaunchBounds>::get_cuda_func_attributes();
|
||||
+ auto const& prop = pol.space().cuda_device_prop();
|
||||
+
|
||||
+ // Limits due to registers/SM, MDRange doesn't have
|
||||
+ // shared memory constraints
|
||||
+ int const optimal_block_size =
|
||||
+ Kokkos::Impl::cuda_get_opt_block_size_no_shmem(attr, LaunchBounds{});
|
||||
+
|
||||
+ // Compute how many blocks of this size we can launch, based on warp
|
||||
+ // constraints
|
||||
+ int const max_warps_per_sm_registers =
|
||||
+ Kokkos::Impl::cuda_max_warps_per_sm_registers(prop, attr);
|
||||
+ int const max_num_threads_from_warps =
|
||||
+ max_warps_per_sm_registers * prop.warpSize;
|
||||
+ int const max_num_blocks = max_num_threads_from_warps / optimal_block_size;
|
||||
+
|
||||
+ // Compute the total number of threads
|
||||
+ int const max_threads_per_sm = optimal_block_size * max_num_blocks;
|
||||
+
|
||||
+ return std::min(
|
||||
+ max_threads_per_sm,
|
||||
+ static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
|
||||
+}
|
||||
+
|
||||
template <class FunctorType, class... Traits>
|
||||
class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
public:
|
||||
@@ -85,18 +113,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||
public:
|
||||
template <typename Policy, typename Functor>
|
||||
static int max_tile_size_product(const Policy& pol, const Functor&) {
|
||||
- cudaFuncAttributes attr =
|
||||
- CudaParallelLaunch<ParallelFor,
|
||||
- LaunchBounds>::get_cuda_func_attributes();
|
||||
- auto const& prop = pol.space().cuda_device_prop();
|
||||
- // Limits due to registers/SM, MDRange doesn't have
|
||||
- // shared memory constraints
|
||||
- int const regs_per_sm = prop.regsPerMultiprocessor;
|
||||
- int const regs_per_thread = attr.numRegs;
|
||||
- int const max_threads_per_sm = regs_per_sm / regs_per_thread;
|
||||
- return std::min(
|
||||
- max_threads_per_sm,
|
||||
- static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
|
||||
+ return max_tile_size_product_helper<ParallelFor>(pol, LaunchBounds{});
|
||||
}
|
||||
Policy const& get_policy() const { return m_rp; }
|
||||
inline __device__ void operator()() const {
|
||||
@@ -258,17 +275,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
||||
public:
|
||||
template <typename Policy, typename Functor>
|
||||
static int max_tile_size_product(const Policy& pol, const Functor&) {
|
||||
- cudaFuncAttributes attr =
|
||||
- CudaParallelLaunch<ParallelReduce,
|
||||
- LaunchBounds>::get_cuda_func_attributes();
|
||||
- auto const& prop = pol.space().cuda_device_prop();
|
||||
- // Limits due do registers/SM
|
||||
- int const regs_per_sm = prop.regsPerMultiprocessor;
|
||||
- int const regs_per_thread = attr.numRegs;
|
||||
- int const max_threads_per_sm = regs_per_sm / regs_per_thread;
|
||||
- return std::min(
|
||||
- max_threads_per_sm,
|
||||
- static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
|
||||
+ return max_tile_size_product_helper<ParallelReduce>(pol, LaunchBounds{});
|
||||
}
|
||||
Policy const& get_policy() const { return m_policy; }
|
||||
inline __device__ void exec_range(reference_type update) const {
|
||||
@ -1,63 +0,0 @@
|
||||
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
index 170183ca0a..ba43e362bb 100644
|
||||
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp
|
||||
@@ -412,12 +412,16 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
Impl::check_shmem_request(cuda_instance, shmem);
|
||||
if (DriverType::Policy::
|
||||
experimental_contains_desired_occupancy) {
|
||||
+ /*
|
||||
int desired_occupancy =
|
||||
driver.get_policy().impl_get_desired_occupancy().value();
|
||||
size_t block_size = block.x * block.y * block.z;
|
||||
Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
- shmem, desired_occupancy);
|
||||
+ shmem, desired_occupancy);*/
|
||||
+ Kokkos::Impl::throw_runtime_exception(
|
||||
+ std::string("Cuda graph node creation FAILED:"
|
||||
+ " occupancy requests are currently broken."));
|
||||
}
|
||||
|
||||
void const* args[] = {&driver};
|
||||
@@ -511,14 +515,17 @@ struct CudaParallelLaunchKernelInvoker<
|
||||
|
||||
if (!Impl::is_empty_launch(grid, block)) {
|
||||
Impl::check_shmem_request(cuda_instance, shmem);
|
||||
- if constexpr (DriverType::Policy::
|
||||
+ if (DriverType::Policy::
|
||||
experimental_contains_desired_occupancy) {
|
||||
- int desired_occupancy =
|
||||
+ /*int desired_occupancy =
|
||||
driver.get_policy().impl_get_desired_occupancy().value();
|
||||
size_t block_size = block.x * block.y * block.z;
|
||||
Impl::configure_shmem_preference<DriverType, LaunchBounds>(
|
||||
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
- shmem, desired_occupancy);
|
||||
+ shmem, desired_occupancy);*/
|
||||
+ Kokkos::Impl::throw_runtime_exception(
|
||||
+ std::string("Cuda graph node creation FAILED:"
|
||||
+ " occupancy requests are currently broken."));
|
||||
}
|
||||
|
||||
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
|
||||
@@ -690,14 +697,17 @@ struct CudaParallelLaunchImpl<
|
||||
|
||||
if (DriverType::Policy::
|
||||
experimental_contains_desired_occupancy) {
|
||||
- int desired_occupancy =
|
||||
+ /*int desired_occupancy =
|
||||
driver.get_policy().impl_get_desired_occupancy().value();
|
||||
size_t block_size = block.x * block.y * block.z;
|
||||
Impl::configure_shmem_preference<
|
||||
DriverType,
|
||||
Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
|
||||
base_t::get_kernel_func(), cuda_instance->m_deviceProp, block_size,
|
||||
- shmem, desired_occupancy);
|
||||
+ shmem, desired_occupancy);*/
|
||||
+ Kokkos::Impl::throw_runtime_exception(
|
||||
+ std::string("Cuda graph node creation FAILED:"
|
||||
+ " occupancy requests are currently broken."));
|
||||
}
|
||||
|
||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
||||
30
potentials/HGa.msmeam
Normal file
30
potentials/HGa.msmeam
Normal file
@ -0,0 +1,30 @@
|
||||
bkgd_dyn = 1
|
||||
emb_lin_neg = 1
|
||||
augt1=0
|
||||
ialloy=1
|
||||
rc = 5.9
|
||||
#H
|
||||
attrac(1,1)=0.460
|
||||
repuls(1,1)=0.460
|
||||
Cmin(1,1,1)=1.3 # PuMS
|
||||
Cmax(1,1,1)= 2.80
|
||||
nn2(1,1)=1
|
||||
#Ga
|
||||
rho0(2) = 0.6
|
||||
attrac(2,2)=0.097
|
||||
repuls(2,2)=0.097
|
||||
nn2(2,2)=1
|
||||
#HGa
|
||||
attrac(1,2)=0.300
|
||||
repuls(1,2)=0.300
|
||||
lattce(1,2)=l12
|
||||
re(1,2)=3.19
|
||||
delta(1,2)=-0.48
|
||||
alpha(1,2)=6.6
|
||||
Cmin(1,1,2)=2.0
|
||||
Cmin(2,1,2)= 2.0
|
||||
Cmin(1,2,1)=2.0
|
||||
Cmin(2,2,1) = 1.4
|
||||
Cmin(1,2,2) = 1.4
|
||||
Cmin(1,1,2) = 1.4
|
||||
nn2(1,2)=1
|
||||
14
potentials/library.msmeam
Normal file
14
potentials/library.msmeam
Normal file
@ -0,0 +1,14 @@
|
||||
# DATE: 2018-09-22 UNITS: metal CONTRIBUTOR: Steve Valone, smv@lanl.gov CITATION: Baskes, PRB 1992; smv, sr, mib, JNM 2010
|
||||
# ms-meam data format May 2010
|
||||
# elt lat z ielement atwt
|
||||
# alpha b0 b1 b2 b3 b1m b2m b3m alat esub asub
|
||||
# - t0 t1 t2 t3 t1m t2m t3m rozero ibar
|
||||
# NOTE: leading character cannot be a space
|
||||
|
||||
'H' 'dim' 1.0 1 1.0079
|
||||
2.960 2.960 3.0 1.0 1.0 1.0 3.0 1.0 0.741 2.235 2.50
|
||||
1.0 0.44721 0.0 0.00 0.0 0.31623 0 6.70 0
|
||||
|
||||
'Ga4' 'fcc' 12.0 31 69.723
|
||||
4.42 4.80 3.10 6.00 0.00 0.0 0.0 0.5 4.247 2.897 0.97
|
||||
1.0 1.649 1.435 0.00 0.0 0.0 2.0 0.70 0
|
||||
@ -316,6 +316,8 @@ action pair_lj_spica_kokkos.cpp pair_lj_spica.cpp
|
||||
action pair_lj_spica_kokkos.h pair_lj_spica.h
|
||||
action pair_meam_kokkos.cpp pair_meam.cpp
|
||||
action pair_meam_kokkos.h pair_meam.h
|
||||
action pair_meam_ms_kokkos.cpp pair_meam_ms.cpp
|
||||
action pair_meam_ms_kokkos.h pair_meam_ms.h
|
||||
action pair_mliap_kokkos.cpp pair_mliap.cpp
|
||||
action pair_mliap_kokkos.h pair_mliap.h
|
||||
action pair_morse_kokkos.cpp
|
||||
|
||||
@ -61,34 +61,61 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMDensFinal, const int &i, EV_FLOAT
|
||||
if (elti >= 0) {
|
||||
scaleii = d_scale(type[i],type[i]);
|
||||
d_rho1[i] = 0.0;
|
||||
d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
|
||||
if (msmeamflag) {
|
||||
d_rho2[i] = -1.0 / 3.0 * (d_arho2b[i] * d_arho2b[i]
|
||||
- d_arho2mb[i] * d_arho2mb[i]);
|
||||
} else{
|
||||
d_rho2[i] = -1.0 / 3.0 * d_arho2b[i] * d_arho2b[i];
|
||||
}
|
||||
d_rho3[i] = 0.0;
|
||||
for (int m = 0; m < 3; m++) {
|
||||
d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
|
||||
d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
|
||||
}
|
||||
for (int m = 0; m < 6; m++)
|
||||
d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
|
||||
for (int m = 0; m < 10; m++)
|
||||
d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
|
||||
|
||||
if (d_rho0[i] > 0.0) {
|
||||
if (ialloy == 1) {
|
||||
d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
|
||||
d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
|
||||
d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
|
||||
} else if (ialloy == 2) {
|
||||
d_t_ave(i,0) = t1_meam[elti];
|
||||
d_t_ave(i,1) = t2_meam[elti];
|
||||
d_t_ave(i,2) = t3_meam[elti];
|
||||
} else {
|
||||
d_t_ave(i,0) /= d_rho0[i];
|
||||
d_t_ave(i,1) /= d_rho0[i];
|
||||
d_t_ave(i,2) /= d_rho0[i];
|
||||
if (msmeamflag) {
|
||||
d_rho1[i] = d_rho1[i] + d_arho1(i, m) * d_arho1(i, m)
|
||||
- d_arho1m(i, m) * d_arho1m(i, m);
|
||||
d_rho3[i] = d_rho3[i] - 3.0 / 5.0 * (d_arho3b(i, m) * d_arho3b(i, m)
|
||||
- d_arho3mb(i, m) * d_arho3mb(i, m));
|
||||
} else{
|
||||
d_rho1[i] += d_arho1(i,m) * d_arho1(i,m);
|
||||
d_rho3[i] -= 3.0 / 5.0 * d_arho3b(i,m) * d_arho3b(i,m);
|
||||
}
|
||||
}
|
||||
for (int m = 0; m < 6; m++){
|
||||
if (msmeamflag) {
|
||||
d_rho2[i] = d_rho2[i] + v2D[m] * (d_arho2(i, m) * d_arho2(i, m)
|
||||
- d_arho2m(i, m) * d_arho2m(i, m));
|
||||
} else{
|
||||
d_rho2[i] += v2D[m] * d_arho2(i,m) * d_arho2(i,m);
|
||||
}
|
||||
}
|
||||
for (int m = 0; m < 10; m++)
|
||||
if (msmeamflag) {
|
||||
d_rho3[i] = d_rho3[i] + v3D[m] * (d_arho3(i, m) * d_arho3(i, m)
|
||||
- d_arho3m(i, m) * d_arho3m(i, m));
|
||||
} else{
|
||||
d_rho3[i] += v3D[m] * d_arho3(i,m) * d_arho3(i,m);
|
||||
}
|
||||
|
||||
d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
|
||||
if (msmeamflag) {
|
||||
// with msmeam all t weights are already accounted for in rho
|
||||
d_gamma[i] = d_rho1[i] + d_rho2[i] + d_rho3[i];
|
||||
} else{
|
||||
if (d_rho0[i] > 0.0) {
|
||||
if (ialloy == 1) {
|
||||
d_t_ave(i,0) = fdiv_zero_kk(d_t_ave(i,0), d_tsq_ave(i,0));
|
||||
d_t_ave(i,1) = fdiv_zero_kk(d_t_ave(i,1), d_tsq_ave(i,1));
|
||||
d_t_ave(i,2) = fdiv_zero_kk(d_t_ave(i,2), d_tsq_ave(i,2));
|
||||
} else if (ialloy == 2) {
|
||||
d_t_ave(i,0) = t1_meam[elti];
|
||||
d_t_ave(i,1) = t2_meam[elti];
|
||||
d_t_ave(i,2) = t3_meam[elti];
|
||||
} else {
|
||||
d_t_ave(i,0) /= d_rho0[i];
|
||||
d_t_ave(i,1) /= d_rho0[i];
|
||||
d_t_ave(i,2) /= d_rho0[i];
|
||||
}
|
||||
}
|
||||
d_gamma[i] = d_t_ave(i,0) * d_rho1[i] + d_t_ave(i,1) * d_rho2[i] + d_t_ave(i,2) * d_rho3[i];
|
||||
}
|
||||
|
||||
if (d_rho0[i] > 0.0)
|
||||
d_gamma[i] /= (d_rho0[i] * d_rho0[i]);
|
||||
|
||||
@ -43,11 +43,23 @@ void MEAMKokkos<DeviceType>::operator()(TagMEAMZero, const int &i) const {
|
||||
d_rho0[i] = 0.0;
|
||||
d_arho2b[i] = 0.0;
|
||||
d_arho1(i,0) = d_arho1(i,1) = d_arho1(i,2) = 0.0;
|
||||
for (int j = 0; j < 6; j++)
|
||||
if (msmeamflag) {
|
||||
d_arho2mb[i] = 0.0;
|
||||
d_arho1m(i,0) = d_arho1m(i,1) = d_arho1m(i,2) = 0.0;
|
||||
}
|
||||
for (int j = 0; j < 6; j++) {
|
||||
d_arho2(i,j) = 0.0;
|
||||
for (int j = 0; j < 10; j++)
|
||||
if (msmeamflag)
|
||||
d_arho2m(i,j) = 0.0;
|
||||
}
|
||||
for (int j = 0; j < 10; j++) {
|
||||
d_arho3(i,j) = 0.0;
|
||||
if (msmeamflag)
|
||||
d_arho3m(i,j) = 0.0;
|
||||
}
|
||||
d_arho3b(i,0) = d_arho3b(i,1) = d_arho3b(i,2) = 0.0;
|
||||
if (msmeamflag)
|
||||
d_arho3mb(i,0) = d_arho3mb(i,1) = d_arho3mb(i,2) = 0.0;
|
||||
d_t_ave(i,0) = d_t_ave(i,1) = d_t_ave(i,2) = 0.0;
|
||||
d_tsq_ave(i,0) = d_tsq_ave(i,1) = d_tsq_ave(i,2) = 0.0;
|
||||
}
|
||||
@ -80,13 +92,20 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
|
||||
memoryKK->destroy_kokkos(k_arho3b,arho3b);
|
||||
memoryKK->destroy_kokkos(k_t_ave,t_ave);
|
||||
memoryKK->destroy_kokkos(k_tsq_ave,tsq_ave);
|
||||
// msmeam
|
||||
memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
|
||||
memoryKK->destroy_kokkos(k_arho1m, arho1m);
|
||||
memoryKK->destroy_kokkos(k_arho2m, arho2m);
|
||||
memoryKK->destroy_kokkos(k_arho3m, arho3m);
|
||||
memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
|
||||
|
||||
nmax = atom_nmax;
|
||||
// memory->create(rho, nmax, "pair:rho");
|
||||
|
||||
//memory->create(rho, nmax, "pair:rho");
|
||||
k_rho = DAT::tdual_ffloat_1d("pair:rho",nmax);
|
||||
d_rho = k_rho.template view<DeviceType>();
|
||||
h_rho = k_rho.h_view;
|
||||
// memory->create(rho0, nmax, "pair:rho0");
|
||||
//memory->create(rho0, nmax, "pair:rho0");
|
||||
k_rho0 = DAT::tdual_ffloat_1d("pair:rho0",nmax);
|
||||
d_rho0 = k_rho0.template view<DeviceType>();
|
||||
h_rho0 = k_rho0.h_view;
|
||||
@ -150,6 +169,28 @@ MEAMKokkos<DeviceType>::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
|
||||
k_tsq_ave = DAT::tdual_ffloat_2d("pair:tsq_ave",nmax, 3);
|
||||
d_tsq_ave = k_tsq_ave.template view<DeviceType>();
|
||||
h_tsq_ave = k_tsq_ave.h_view;
|
||||
|
||||
// msmeam
|
||||
//memory->create(arho2mb, nmax, "pair:arho2mb");
|
||||
k_arho2mb = DAT::tdual_ffloat_1d("pair:arho2mb",nmax);
|
||||
d_arho2mb = k_arho2mb.template view<DeviceType>();
|
||||
h_arho2mb = k_arho2mb.h_view;
|
||||
//memory->create(arho1m, nmax, 3, "pair:arho1m");
|
||||
k_arho1m = DAT::tdual_ffloat_2d("pair:arho1m", nmax, 3);
|
||||
d_arho1m = k_arho1m.template view<DeviceType>();
|
||||
h_arho1m = k_arho1m.h_view;
|
||||
//memory->create(arho2m, nmax, 6, "pair:arho2m");
|
||||
k_arho2m = DAT::tdual_ffloat_2d("pair:arho2m", nmax, 6);
|
||||
d_arho2m = k_arho2m.template view<DeviceType>();
|
||||
h_arho2m = k_arho2m.h_view;
|
||||
//memory->create(arho3m, nmax, 10, "pair:arho3m");
|
||||
k_arho3m = DAT::tdual_ffloat_2d("pair:arho3m", nmax, 10);
|
||||
d_arho3m = k_arho3m.template view<DeviceType>();
|
||||
h_arho3m = k_arho3m.h_view;
|
||||
//memory->create(arho3mb, nmax, 3, "pair:arho3mb");
|
||||
k_arho3mb = DAT::tdual_ffloat_2d("pair:arho3mb", nmax, 3);
|
||||
d_arho3mb = k_arho3mb.template view<DeviceType>();
|
||||
h_arho3mb = k_arho3mb.h_view;
|
||||
}
|
||||
|
||||
if (n_neigh > maxneigh) {
|
||||
@ -206,6 +247,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
|
||||
dup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3b);
|
||||
dup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_t_ave);
|
||||
dup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_tsq_ave);
|
||||
// msmeam
|
||||
dup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2mb);
|
||||
dup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho1m);
|
||||
dup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho2m);
|
||||
dup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3m);
|
||||
dup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterDuplicated>(d_arho3mb);
|
||||
} else {
|
||||
ndup_rho0 = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_rho0);
|
||||
ndup_arho2b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2b);
|
||||
@ -215,6 +262,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
|
||||
ndup_arho3b = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3b);
|
||||
ndup_t_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_t_ave);
|
||||
ndup_tsq_ave = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_tsq_ave);
|
||||
// msmeam
|
||||
ndup_arho2mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2mb);
|
||||
ndup_arho1m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho1m);
|
||||
ndup_arho2m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho2m);
|
||||
ndup_arho3m = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3m);
|
||||
ndup_arho3mb = Kokkos::Experimental::create_scatter_view<Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonDuplicated>(d_arho3mb);
|
||||
}
|
||||
|
||||
copymode = 1;
|
||||
@ -233,6 +286,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
|
||||
Kokkos::Experimental::contribute(d_arho3b, dup_arho3b);
|
||||
Kokkos::Experimental::contribute(d_t_ave, dup_t_ave);
|
||||
Kokkos::Experimental::contribute(d_tsq_ave, dup_tsq_ave);
|
||||
// msmeam
|
||||
Kokkos::Experimental::contribute(d_arho2mb, dup_arho2mb);
|
||||
Kokkos::Experimental::contribute(d_arho1m, dup_arho1m);
|
||||
Kokkos::Experimental::contribute(d_arho2m, dup_arho2m);
|
||||
Kokkos::Experimental::contribute(d_arho3m, dup_arho3m);
|
||||
Kokkos::Experimental::contribute(d_arho3mb, dup_arho3mb);
|
||||
|
||||
// free duplicated memory
|
||||
dup_rho0 = decltype(dup_rho0)();
|
||||
@ -243,6 +302,12 @@ MEAMKokkos<DeviceType>::meam_dens_init(int inum_half, int ntype, typename AT::t_
|
||||
dup_arho3b = decltype(dup_arho3b)();
|
||||
dup_t_ave = decltype(dup_t_ave)();
|
||||
dup_tsq_ave = decltype(dup_tsq_ave)();
|
||||
// msmeam
|
||||
dup_arho2mb = decltype(dup_arho2mb)();
|
||||
dup_arho1m = decltype(dup_arho1m)();
|
||||
dup_arho2m = decltype(dup_arho2m)();
|
||||
dup_arho3m = decltype(dup_arho3m)();
|
||||
dup_arho3mb = decltype(dup_arho3mb)();
|
||||
}
|
||||
}
|
||||
|
||||
@ -417,7 +482,6 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
|
||||
int offset) const
|
||||
{
|
||||
// The rho0, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
|
||||
|
||||
auto v_rho0 = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_rho0),decltype(ndup_rho0)>::get(dup_rho0,ndup_rho0);
|
||||
auto a_rho0 = v_rho0.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_arho2b = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2b),decltype(ndup_arho2b)>::get(dup_arho2b,ndup_arho2b);
|
||||
@ -434,6 +498,17 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
|
||||
auto a_t_ave = v_t_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_tsq_ave = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_tsq_ave),decltype(ndup_tsq_ave)>::get(dup_tsq_ave,ndup_tsq_ave);
|
||||
auto a_tsq_ave = v_tsq_ave.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
// msmeam
|
||||
auto v_arho2mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2mb),decltype(ndup_arho2mb)>::get(dup_arho2mb,ndup_arho2mb);
|
||||
auto a_arho2mb = v_arho2mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_arho1m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho1m),decltype(ndup_arho1m)>::get(dup_arho1m,ndup_arho1m);
|
||||
auto a_arho1m = v_arho1m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_arho2m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho2m),decltype(ndup_arho2m)>::get(dup_arho2m,ndup_arho2m);
|
||||
auto a_arho2m = v_arho2m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_arho3m = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3m),decltype(ndup_arho3m)>::get(dup_arho3m,ndup_arho3m);
|
||||
auto a_arho3m = v_arho3m.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
auto v_arho3mb = ScatterViewHelper<NeedDup_v<NEIGHFLAG,DeviceType>,decltype(dup_arho3mb),decltype(ndup_arho3mb)>::get(dup_arho3mb,ndup_arho3mb);
|
||||
auto a_arho3mb = v_arho3mb.template access<AtomicDup_v<NEIGHFLAG,DeviceType>>();
|
||||
|
||||
const int elti = d_map[type[i]];
|
||||
const double xtmp = x(i,0);
|
||||
@ -463,6 +538,16 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
|
||||
double rhoa1i = ro0i * MathSpecialKokkos::fm_exp(-beta1_meam[elti] * ai) * sij;
|
||||
double rhoa2i = ro0i * MathSpecialKokkos::fm_exp(-beta2_meam[elti] * ai) * sij;
|
||||
double rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai) * sij;
|
||||
// msmeam
|
||||
double rhoa1mj, rhoa2mj, rhoa3mj, rhoa1mi, rhoa2mi, rhoa3mi;
|
||||
if (msmeamflag) {
|
||||
rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * sij;
|
||||
rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * sij;
|
||||
rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * sij;
|
||||
rhoa1mi = ro0i * t1m_meam[elti] * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * sij;
|
||||
rhoa2mi = ro0i * t2m_meam[elti] * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * sij;
|
||||
rhoa3mi = ro0i * t3m_meam[elti] * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * sij;
|
||||
}
|
||||
if (ialloy == 1) {
|
||||
rhoa1j *= t1_meam[eltj];
|
||||
rhoa2j *= t2_meam[eltj];
|
||||
@ -499,20 +584,45 @@ MEAMKokkos<DeviceType>::calc_rho1(int i, int /*ntype*/, typename AT::t_int_1d ty
|
||||
const double A1i = rhoa1i / rij;
|
||||
const double A2i = rhoa2i / rij2;
|
||||
const double A3i = rhoa3i / (rij2 * rij);
|
||||
double A1mj, A2mj, A3mj, A1mi, A2mi, A3mi;
|
||||
if (msmeamflag) {
|
||||
a_arho2mb[i] += rhoa2mj;
|
||||
a_arho2mb[j] += rhoa2mi;
|
||||
A1mj = rhoa1mj / rij;
|
||||
A2mj = rhoa2mj / rij2;
|
||||
A3mj = rhoa3mj / (rij2 * rij);
|
||||
A1mi = rhoa1mi / rij;
|
||||
A2mi = rhoa2mi / rij2;
|
||||
A3mi = rhoa3mi / (rij2 * rij);
|
||||
}
|
||||
int nv2 = 0;
|
||||
int nv3 = 0;
|
||||
for (int m = 0; m < 3; m++) {
|
||||
a_arho1(i,m) += A1j * delij[m];
|
||||
a_arho1(i,m) += A1j * delij[m];
|
||||
a_arho1(j,m) += -A1i * delij[m];
|
||||
a_arho3b(i,m) += rhoa3j * delij[m] / rij;
|
||||
a_arho3b(i,m) += rhoa3j * delij[m] / rij;
|
||||
a_arho3b(j,m) += -rhoa3i * delij[m] / rij;
|
||||
if (msmeamflag) {
|
||||
a_arho1m(i,m) += A1mj * delij[m];
|
||||
a_arho1m(j,m) += -A1mi * delij[m];
|
||||
a_arho3mb(i,m) += rhoa3mj * delij[m] / rij;
|
||||
a_arho3mb(j,m) += -rhoa3mi * delij[m] / rij;
|
||||
}
|
||||
for (int n = m; n < 3; n++) {
|
||||
a_arho2(i,nv2) += A2j * delij[m] * delij[n];
|
||||
a_arho2(j,nv2) += A2i * delij[m] * delij[n];
|
||||
if (msmeamflag) {
|
||||
a_arho2m(i,nv2) += A2mj * delij[m] * delij[n];
|
||||
a_arho2m(j,nv2) += A2mi * delij[m] * delij[n];
|
||||
}
|
||||
nv2++;
|
||||
for (int p = n; p < 3; p++) {
|
||||
a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p];
|
||||
a_arho3(i,nv3) += A3j * delij[m] * delij[n] * delij[p];
|
||||
a_arho3(j,nv3) += -A3i * delij[m] * delij[n] * delij[p];
|
||||
if (msmeamflag) {
|
||||
a_arho3m(i,nv3) += A3mj * delij[m] * delij[n] * delij[p];
|
||||
a_arho3m(j,nv3) += -A3mi * delij[m] * delij[n] * delij[p];
|
||||
}
|
||||
nv3++;
|
||||
}
|
||||
}
|
||||
|
||||
@ -119,6 +119,17 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
double dsij1, dsij2, force1, force2;
|
||||
double t1i, t2i, t3i, t1j, t2j, t3j;
|
||||
int fnoffset;
|
||||
// msmeam
|
||||
double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
|
||||
double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
|
||||
double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
|
||||
double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
|
||||
double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
|
||||
double drho1mdrm1[3], drho1mdrm2[3];
|
||||
double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
|
||||
double drho2mdrm1[3], drho2mdrm2[3];
|
||||
double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
|
||||
double drho3mdrm1[3], drho3mdrm2[3];
|
||||
|
||||
// The f, etc. arrays are duplicated for OpenMP, atomic for CUDA, and neither for Serial
|
||||
|
||||
@ -197,6 +208,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
drhoa2i = -beta2_meam[elti] * invrei * rhoa2i;
|
||||
rhoa3i = ro0i * MathSpecialKokkos::fm_exp(-beta3_meam[elti] * ai);
|
||||
drhoa3i = -beta3_meam[elti] * invrei * rhoa3i;
|
||||
if (msmeamflag) {
|
||||
rhoa1mi = ro0i * MathSpecialKokkos::fm_exp(-beta1m_meam[elti] * ai) * t1m_meam[elti];
|
||||
drhoa1mi = -beta1m_meam[elti] * invrei * rhoa1mi;
|
||||
rhoa2mi = ro0i * MathSpecialKokkos::fm_exp(-beta2m_meam[elti] * ai) * t2m_meam[elti];
|
||||
drhoa2mi = -beta2m_meam[elti] * invrei * rhoa2mi;
|
||||
rhoa3mi = ro0i * MathSpecialKokkos::fm_exp(-beta3m_meam[elti] * ai) * t3m_meam[elti];
|
||||
drhoa3mi = -beta3m_meam[elti] * invrei * rhoa3mi;
|
||||
}
|
||||
|
||||
if (elti != eltj) {
|
||||
invrej = 1.0 / re_meam[eltj][eltj];
|
||||
@ -210,6 +229,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
drhoa2j = -beta2_meam[eltj] * invrej * rhoa2j;
|
||||
rhoa3j = ro0j * MathSpecialKokkos::fm_exp(-beta3_meam[eltj] * aj);
|
||||
drhoa3j = -beta3_meam[eltj] * invrej * rhoa3j;
|
||||
if (msmeamflag) {
|
||||
rhoa1mj = ro0j * MathSpecialKokkos::fm_exp(-beta1m_meam[eltj] * aj) * t1m_meam[eltj];
|
||||
drhoa1mj = -beta1m_meam[eltj] * invrej * rhoa1mj;
|
||||
rhoa2mj = ro0j * MathSpecialKokkos::fm_exp(-beta2m_meam[eltj] * aj) * t2m_meam[eltj];
|
||||
drhoa2mj = -beta2m_meam[eltj] * invrej * rhoa2mj;
|
||||
rhoa3mj = ro0j * MathSpecialKokkos::fm_exp(-beta3m_meam[eltj] * aj) * t3m_meam[eltj];
|
||||
drhoa3mj = -beta3m_meam[eltj] * invrej * rhoa3mj;
|
||||
}
|
||||
} else {
|
||||
rhoa0j = rhoa0i;
|
||||
drhoa0j = drhoa0i;
|
||||
@ -219,6 +246,14 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
drhoa2j = drhoa2i;
|
||||
rhoa3j = rhoa3i;
|
||||
drhoa3j = drhoa3i;
|
||||
if (msmeamflag) {
|
||||
rhoa1mj = rhoa1mi;
|
||||
drhoa1mj = drhoa1mi;
|
||||
rhoa2mj = rhoa2mi;
|
||||
drhoa2mj = drhoa2mi;
|
||||
rhoa3mj = rhoa3mi;
|
||||
drhoa3mj = drhoa3mi;
|
||||
}
|
||||
}
|
||||
|
||||
const double t1mi = t1_meam[elti];
|
||||
@ -228,7 +263,10 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
const double t2mj = t2_meam[eltj];
|
||||
const double t3mj = t3_meam[eltj];
|
||||
|
||||
if (ialloy == 1) {
|
||||
// ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
|
||||
// We did this above with rhoa1mj, rhoa2mj, etc.
|
||||
|
||||
if (ialloy == 1 || msmeamflag) {
|
||||
rhoa1j *= t1mj;
|
||||
rhoa2j *= t2mj;
|
||||
rhoa3j *= t3mj;
|
||||
@ -272,6 +310,38 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
arg3j3 = arg3j3 - d_arho3b(j, n) * delij[n];
|
||||
}
|
||||
|
||||
// msmeam arhom args
|
||||
nv2 = 0.0;
|
||||
nv3 = 0.0;
|
||||
arg1i1m = 0.0;
|
||||
arg1j1m = 0.0;
|
||||
arg1i2m = 0.0;
|
||||
arg1j2m = 0.0;
|
||||
arg1i3m = 0.0;
|
||||
arg1j3m = 0.0;
|
||||
arg3i3m = 0.0;
|
||||
arg3j3m = 0.0;
|
||||
if (msmeamflag) {
|
||||
for (n = 0; n < 3; n++) {
|
||||
for (p = n; p < 3; p++) {
|
||||
for (q = p; q < 3; q++) {
|
||||
arg = delij[n] * delij[p] * delij[q] * v3D[nv3];
|
||||
arg1i3m = arg1i3m - d_arho3m(i, nv3) * arg;
|
||||
arg1j3m = arg1j3m + d_arho3m(j, nv3) * arg;
|
||||
nv3 = nv3 + 1;
|
||||
}
|
||||
arg = delij[n] * delij[p] * v2D[nv2];
|
||||
arg1i2m = arg1i2m + d_arho2m(i, nv2) * arg;
|
||||
arg1j2m = arg1j2m + d_arho2m(j, nv2) * arg;
|
||||
nv2 = nv2 + 1;
|
||||
}
|
||||
arg1i1m = arg1i1m - d_arho1m(i, n) * delij[n];
|
||||
arg1j1m = arg1j1m + d_arho1m(j, n) * delij[n];
|
||||
arg3i3m = arg3i3m - d_arho3mb(i, n) * delij[n];
|
||||
arg3j3m = arg3j3m + d_arho3mb(j, n) * delij[n];
|
||||
}
|
||||
}
|
||||
|
||||
// rho0 terms
|
||||
drho0dr1 = drhoa0j * sij;
|
||||
drho0dr2 = drhoa0i * sij;
|
||||
@ -330,75 +400,183 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * d_arho3b(j, m)) * rhoa3i;
|
||||
}
|
||||
|
||||
if (msmeamflag) {
|
||||
// rho1m terms
|
||||
a1 = 2 * sij / rij;
|
||||
drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
|
||||
drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
|
||||
drho1mdr1 *= -1.0;
|
||||
drho1mdr2 *= -1.0;
|
||||
a1 = 2.0 * sij / rij;
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho1mdrm1[m] = a1 * rhoa1mj * d_arho1m(i, m);
|
||||
drho1mdrm2[m] = -a1 * rhoa1mi * d_arho1m(j, m);
|
||||
}
|
||||
|
||||
// rho2m terms
|
||||
a2 = 2 * sij / rij2;
|
||||
drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * drhoa2mj * sij;
|
||||
drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * drhoa2mi * sij;
|
||||
a2 = 4 * sij / rij2;
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho2mdrm1[m] = 0.0;
|
||||
drho2mdrm2[m] = 0.0;
|
||||
for (n = 0; n < 3; n++) {
|
||||
drho2mdrm1[m] = drho2mdrm1[m] + d_arho2m(i, vind2D[m][n]) * delij[n];
|
||||
drho2mdrm2[m] = drho2mdrm2[m] - d_arho2m(j, vind2D[m][n]) * delij[n];
|
||||
}
|
||||
drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
|
||||
drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
|
||||
}
|
||||
|
||||
// rho3m terms
|
||||
rij3 = rij * rij2;
|
||||
a3 = 2 * sij / rij3;
|
||||
a3a = 6.0 / 5.0 * sij / rij;
|
||||
drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
|
||||
drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
|
||||
drho3mdr1 *= -1.0;
|
||||
drho3mdr2 *= -1.0;
|
||||
|
||||
a3 = 6 * sij / rij3;
|
||||
a3a = 6 * sij / (5 * rij);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho3mdrm1[m] = 0.0;
|
||||
drho3mdrm2[m] = 0.0;
|
||||
nv2 = 0;
|
||||
for (n = 0; n < 3; n++) {
|
||||
for (p = n; p < 3; p++) {
|
||||
arg = delij[n] * delij[p] * this->v2D[nv2];
|
||||
drho3mdrm1[m] = drho3mdrm1[m] + d_arho3m(i, vind3D[m][n][p]) * arg;
|
||||
drho3mdrm2[m] = drho3mdrm2[m] + d_arho3m(j, vind3D[m][n][p]) * arg;
|
||||
nv2 = nv2 + 1;
|
||||
}
|
||||
}
|
||||
drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * d_arho3mb(i, m)) * rhoa3mj;
|
||||
drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * d_arho3mb(j, m)) * rhoa3mi;
|
||||
}
|
||||
} else {
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho1mdrm1[m] = 0.0;
|
||||
drho1mdrm2[m] = 0.0;
|
||||
drho2mdrm1[m] = 0.0;
|
||||
drho2mdrm2[m] = 0.0;
|
||||
drho3mdrm1[m] = 0.0;
|
||||
drho3mdrm2[m] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives of weighting functions t wrt rij
|
||||
t1i = d_t_ave(i, 0);
|
||||
t2i = d_t_ave(i, 1);
|
||||
t3i = d_t_ave(i, 2);
|
||||
t1j = d_t_ave(j, 0);
|
||||
t2j = d_t_ave(j, 1);
|
||||
t3j = d_t_ave(j, 2);
|
||||
|
||||
if (ialloy == 1) {
|
||||
|
||||
a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
|
||||
a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
|
||||
a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
|
||||
a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
|
||||
a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
|
||||
a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
|
||||
|
||||
dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
|
||||
dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
|
||||
dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
|
||||
dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
|
||||
dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
|
||||
dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
|
||||
|
||||
} else if (ialloy == 2) {
|
||||
// Weighting functions t set to unity for msmeam
|
||||
|
||||
if (msmeamflag) {
|
||||
t1i = 1.0;
|
||||
t2i = 1.0;
|
||||
t3i = 1.0;
|
||||
t1j = 1.0;
|
||||
t2j = 1.0;
|
||||
t3j = 1.0;
|
||||
dt1dr1 = 0.0;
|
||||
dt1dr2 = 0.0;
|
||||
dt2dr1 = 0.0;
|
||||
dt2dr2 = 0.0;
|
||||
dt3dr1 = 0.0;
|
||||
dt3dr2 = 0.0;
|
||||
|
||||
} else {
|
||||
|
||||
ai = 0.0;
|
||||
if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
|
||||
aj = 0.0;
|
||||
if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
|
||||
t1i = d_t_ave(i, 0);
|
||||
t2i = d_t_ave(i, 1);
|
||||
t3i = d_t_ave(i, 2);
|
||||
t1j = d_t_ave(j, 0);
|
||||
t2j = d_t_ave(j, 1);
|
||||
t3j = d_t_ave(j, 2);
|
||||
|
||||
dt1dr1 = ai * (t1mj - t1i);
|
||||
dt1dr2 = aj * (t1mi - t1j);
|
||||
dt2dr1 = ai * (t2mj - t2i);
|
||||
dt2dr2 = aj * (t2mi - t2j);
|
||||
dt3dr1 = ai * (t3mj - t3i);
|
||||
dt3dr2 = aj * (t3mi - t3j);
|
||||
if (ialloy == 1) {
|
||||
|
||||
a1i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 0));
|
||||
a1j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 0));
|
||||
a2i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 1));
|
||||
a2j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 1));
|
||||
a3i = fdiv_zero_kk(drhoa0j * sij, d_tsq_ave(i, 2));
|
||||
a3j = fdiv_zero_kk(drhoa0i * sij, d_tsq_ave(j, 2));
|
||||
|
||||
dt1dr1 = a1i * (t1mj - t1i * MathSpecialKokkos::square(t1mj));
|
||||
dt1dr2 = a1j * (t1mi - t1j * MathSpecialKokkos::square(t1mi));
|
||||
dt2dr1 = a2i * (t2mj - t2i * MathSpecialKokkos::square(t2mj));
|
||||
dt2dr2 = a2j * (t2mi - t2j * MathSpecialKokkos::square(t2mi));
|
||||
dt3dr1 = a3i * (t3mj - t3i * MathSpecialKokkos::square(t3mj));
|
||||
dt3dr2 = a3j * (t3mi - t3j * MathSpecialKokkos::square(t3mi));
|
||||
|
||||
} else if (ialloy == 2) {
|
||||
|
||||
dt1dr1 = 0.0;
|
||||
dt1dr2 = 0.0;
|
||||
dt2dr1 = 0.0;
|
||||
dt2dr2 = 0.0;
|
||||
dt3dr1 = 0.0;
|
||||
dt3dr2 = 0.0;
|
||||
|
||||
} else {
|
||||
|
||||
ai = 0.0;
|
||||
if (!iszero_kk(d_rho0[i])) ai = drhoa0j * sij / d_rho0[i];
|
||||
aj = 0.0;
|
||||
if (!iszero_kk(d_rho0[j])) aj = drhoa0i * sij / d_rho0[j];
|
||||
|
||||
dt1dr1 = ai * (t1mj - t1i);
|
||||
dt1dr2 = aj * (t1mi - t1j);
|
||||
dt2dr1 = ai * (t2mj - t2i);
|
||||
dt2dr2 = aj * (t2mi - t2j);
|
||||
dt3dr1 = ai * (t3mj - t3i);
|
||||
dt3dr2 = aj * (t3mi - t3j);
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives of total density wrt rij, sij and rij(3)
|
||||
get_shpfcn(lattce_meam[elti][elti], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpi);
|
||||
get_shpfcn(lattce_meam[eltj][eltj], stheta_meam[elti][elti], ctheta_meam[elti][elti], shpj);
|
||||
|
||||
drhodr1 = d_dgamma1[i] * drho0dr1 +
|
||||
d_dgamma2[i] *
|
||||
(dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
|
||||
dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
|
||||
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = d_dgamma1[j] * drho0dr2 +
|
||||
d_dgamma2[j] *
|
||||
(dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
|
||||
dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
|
||||
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] =
|
||||
d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
|
||||
drhodrm2[m] =
|
||||
d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
|
||||
|
||||
if (msmeamflag) {
|
||||
drhodr1 = d_dgamma1[i] * drho0dr1 +
|
||||
d_dgamma2[i] * (dt1dr1 * d_rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
|
||||
dt2dr1 * d_rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
|
||||
dt3dr1 * d_rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
|
||||
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = d_dgamma1[j] * drho0dr2 +
|
||||
d_dgamma2[j] * (dt1dr2 * d_rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
|
||||
dt2dr2 * d_rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
|
||||
dt3dr2 * d_rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
|
||||
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] = d_dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
|
||||
t2i * (drho2drm1[m] - drho2mdrm1[m]) +
|
||||
t3i * (drho3drm1[m] - drho3mdrm1[m]) );
|
||||
drhodrm2[m] = d_dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
|
||||
t2j * (drho2drm2[m] - drho2mdrm2[m]) +
|
||||
t3j * (drho3drm2[m] - drho3mdrm2[m]) );
|
||||
}
|
||||
} else {
|
||||
drhodr1 = d_dgamma1[i] * drho0dr1 +
|
||||
d_dgamma2[i] *
|
||||
(dt1dr1 * d_rho1[i] + t1i * drho1dr1 + dt2dr1 * d_rho2[i] + t2i * drho2dr1 +
|
||||
dt3dr1 * d_rho3[i] + t3i * drho3dr1) -
|
||||
d_dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = d_dgamma1[j] * drho0dr2 +
|
||||
d_dgamma2[j] *
|
||||
(dt1dr2 * d_rho1[j] + t1j * drho1dr2 + dt2dr2 * d_rho2[j] + t2j * drho2dr2 +
|
||||
dt3dr2 * d_rho3[j] + t3j * drho3dr2) -
|
||||
d_dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] =
|
||||
d_dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
|
||||
drhodrm2[m] =
|
||||
d_dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives wrt sij, but only if necessary
|
||||
@ -416,6 +594,24 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
|
||||
drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
|
||||
|
||||
if (msmeamflag) {
|
||||
drho1mds1 = a1 * rhoa1mj * arg1i1m;
|
||||
drho1mds2 = a1 * rhoa1mi * arg1j1m;
|
||||
drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * d_arho2mb[i] * rhoa2mj;
|
||||
drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * d_arho2mb[j] * rhoa2mi;
|
||||
drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
|
||||
drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
|
||||
drho3mds1 *= -1;
|
||||
drho3mds2 *= -1;
|
||||
} else {
|
||||
drho1mds1 = 0.0;
|
||||
drho1mds2 = 0.0;
|
||||
drho2mds1 = 0.0;
|
||||
drho2mds2 = 0.0;
|
||||
drho3mds1 = 0.0;
|
||||
drho3mds2 = 0.0;
|
||||
}
|
||||
|
||||
if (ialloy == 1) {
|
||||
a1i = fdiv_zero_kk(rhoa0j, d_tsq_ave(i, 0));
|
||||
a1j = fdiv_zero_kk(rhoa0i, d_tsq_ave(j, 0));
|
||||
@ -455,19 +651,33 @@ KOKKOS_INLINE_FUNCTION void MEAMKokkos<DeviceType>::operator()(TagMEAMForce<NEIG
|
||||
dt3ds2 = aj * (t3mi - t3j);
|
||||
}
|
||||
|
||||
drhods1 = d_dgamma1[i] * drho0ds1 +
|
||||
d_dgamma2[i] *
|
||||
(dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
|
||||
dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
|
||||
if (msmeamflag) {
|
||||
drhods1 = d_dgamma1[i] * drho0ds1 +
|
||||
d_dgamma2[i] * (dt1ds1 * d_rho1[i] + t1i * (drho1ds1 - drho1mds1) +
|
||||
dt2ds1 * d_rho2[i] + t2i * (drho2ds1 - drho2mds1) +
|
||||
dt3ds1 * d_rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
|
||||
d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
|
||||
drhods2 = d_dgamma1[j] * drho0ds2 +
|
||||
d_dgamma2[j] *
|
||||
(dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
|
||||
dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
|
||||
drhods2 = d_dgamma1[j] * drho0ds2 +
|
||||
d_dgamma2[j] * (dt1ds2 * d_rho1[j] + t1j * (drho1ds2 - drho1mds2) +
|
||||
dt2ds2 * d_rho2[j] + t2j * (drho2ds2 - drho2mds2) +
|
||||
dt3ds2 * d_rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
|
||||
d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
|
||||
} else {
|
||||
drhods1 = d_dgamma1[i] * drho0ds1 +
|
||||
d_dgamma2[i] *
|
||||
(dt1ds1 * d_rho1[i] + t1i * drho1ds1 + dt2ds1 * d_rho2[i] + t2i * drho2ds1 +
|
||||
dt3ds1 * d_rho3[i] + t3i * drho3ds1) -
|
||||
d_dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
|
||||
drhods2 = d_dgamma1[j] * drho0ds2 +
|
||||
d_dgamma2[j] *
|
||||
(dt1ds2 * d_rho1[j] + t1j * drho1ds2 + dt2ds2 * d_rho2[j] + t2j * drho2ds2 +
|
||||
dt3ds2 * d_rho3[j] + t3j * drho3ds2) -
|
||||
d_dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives of energy wrt rij, sij and rij[3]
|
||||
|
||||
dUdrij = phip * sij + d_frhop[i] * drhodr1 + d_frhop[j] * drhodr2;
|
||||
dUdsij = 0.0;
|
||||
if (!iszero_kk(d_dscrfcn[fnoffset + jn])) {
|
||||
|
||||
@ -58,6 +58,14 @@ MEAMKokkos<DeviceType>::~MEAMKokkos()
|
||||
memoryKK->destroy_kokkos(k_scrfcn,scrfcn);
|
||||
memoryKK->destroy_kokkos(k_dscrfcn,dscrfcn);
|
||||
memoryKK->destroy_kokkos(k_fcpair,fcpair);
|
||||
|
||||
// msmeam
|
||||
|
||||
memoryKK->destroy_kokkos(k_arho2mb, arho2mb);
|
||||
memoryKK->destroy_kokkos(k_arho1m, arho1m);
|
||||
memoryKK->destroy_kokkos(k_arho2m, arho2m);
|
||||
memoryKK->destroy_kokkos(k_arho3m, arho3m);
|
||||
memoryKK->destroy_kokkos(k_arho3mb, arho3mb);
|
||||
}
|
||||
|
||||
#include "meam_setup_done_kokkos.h"
|
||||
|
||||
@ -136,6 +136,13 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
|
||||
DAT::tdual_ffloat_1d k_scrfcn, k_dscrfcn, k_fcpair;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_1d d_scrfcn, d_dscrfcn, d_fcpair;
|
||||
HAT::t_ffloat_1d h_scrfcn, h_dscrfcn, h_fcpair;
|
||||
// msmeam
|
||||
DAT::tdual_ffloat_2d k_arho1m, k_arho2m, k_arho3m, k_arho3mb;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
|
||||
HAT::t_ffloat_2d h_arho1m, h_arho2m, h_arho3m, h_arho3mb;
|
||||
DAT::tdual_ffloat_1d k_arho2mb;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
|
||||
HAT::t_ffloat_1d h_arho2mb;
|
||||
|
||||
protected:
|
||||
int need_dup;
|
||||
@ -195,6 +202,31 @@ template <class DeviceType> class MEAMKokkos : public MEAM {
|
||||
dup_vatom;
|
||||
NonDupScatterView<typename decltype(d_vatom)::data_type, typename decltype(d_vatom)::array_layout>
|
||||
ndup_vatom;
|
||||
|
||||
// msmeam
|
||||
|
||||
DupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
|
||||
dup_arho1m;
|
||||
NonDupScatterView<typename decltype(d_arho1m)::data_type, typename decltype(d_arho1m)::array_layout>
|
||||
ndup_arho1m;
|
||||
DupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
|
||||
dup_arho2m;
|
||||
NonDupScatterView<typename decltype(d_arho2m)::data_type, typename decltype(d_arho2m)::array_layout>
|
||||
ndup_arho2m;
|
||||
DupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
|
||||
dup_arho3m;
|
||||
NonDupScatterView<typename decltype(d_arho3m)::data_type, typename decltype(d_arho3m)::array_layout>
|
||||
ndup_arho3m;
|
||||
DupScatterView<typename decltype(d_arho2mb)::data_type, typename decltype(d_arho2mb)::array_layout>
|
||||
dup_arho2mb;
|
||||
NonDupScatterView<typename decltype(d_arho2mb)::data_type,
|
||||
typename decltype(d_arho2mb)::array_layout>
|
||||
ndup_arho2mb;
|
||||
DupScatterView<typename decltype(d_arho3mb)::data_type, typename decltype(d_arho3mb)::array_layout>
|
||||
dup_arho3mb;
|
||||
NonDupScatterView<typename decltype(d_arho3mb)::data_type,
|
||||
typename decltype(d_arho3mb)::array_layout>
|
||||
ndup_arho3mb;
|
||||
};
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
||||
@ -51,6 +51,7 @@ PairMEAMKokkos<DeviceType>::PairMEAMKokkos(LAMMPS *lmp) : PairMEAM(lmp)
|
||||
delete meam_inst;
|
||||
meam_inst_kk = new MEAMKokkos<DeviceType>(memory);
|
||||
meam_inst = meam_inst_kk;
|
||||
myname = "meam/kk";
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -156,7 +157,8 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
|
||||
|
||||
int need_dup = lmp->kokkos->need_dup<DeviceType>();
|
||||
|
||||
meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
|
||||
meam_inst_kk->meam_dens_init(inum_half,ntype,type,d_map,x,d_numneigh_half,d_numneigh_full,
|
||||
d_ilist_half,d_neighbors_half, d_neighbors_full, d_offset, neighflag, need_dup);
|
||||
|
||||
meam_inst_kk->k_rho0.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho2b.template modify<DeviceType>();
|
||||
@ -166,6 +168,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
|
||||
meam_inst_kk->k_arho3b.template modify<DeviceType>();
|
||||
meam_inst_kk->k_t_ave.template modify<DeviceType>();
|
||||
meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho1m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho2m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho3m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho3mb.template modify<DeviceType>();
|
||||
}
|
||||
|
||||
comm->reverse_comm(this);
|
||||
|
||||
@ -177,6 +186,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
|
||||
meam_inst_kk->k_arho3b.template sync<DeviceType>();
|
||||
meam_inst_kk->k_t_ave.template sync<DeviceType>();
|
||||
meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho1m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho2m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho3m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho3mb.template sync<DeviceType>();
|
||||
}
|
||||
|
||||
meam_inst_kk->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
|
||||
d_eatom,ntype,type,d_map,d_scale,errorflag,ev);
|
||||
@ -200,6 +216,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
|
||||
meam_inst_kk->k_arho3b.template modify<DeviceType>();
|
||||
meam_inst_kk->k_t_ave.template modify<DeviceType>();
|
||||
meam_inst_kk->k_tsq_ave.template modify<DeviceType>();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho1m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho2m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho3m.template modify<DeviceType>();
|
||||
meam_inst_kk->k_arho3mb.template modify<DeviceType>();
|
||||
}
|
||||
|
||||
comm->forward_comm(this);
|
||||
|
||||
@ -219,6 +242,13 @@ void PairMEAMKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
|
||||
meam_inst_kk->k_arho3b.template sync<DeviceType>();
|
||||
meam_inst_kk->k_t_ave.template sync<DeviceType>();
|
||||
meam_inst_kk->k_tsq_ave.template sync<DeviceType>();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho1m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho2m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho3m.template sync<DeviceType>();
|
||||
meam_inst_kk->k_arho3mb.template sync<DeviceType>();
|
||||
}
|
||||
|
||||
meam_inst_kk->meam_force(inum_half,eflag_global,eflag_atom,vflag_global,
|
||||
vflag_atom,d_eatom,ntype,type,d_map,x,
|
||||
@ -315,7 +345,7 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm_kokkos(int n, DAT::tdual_int_2
|
||||
iswap = iswap_in;
|
||||
v_buf = buf.view<DeviceType>();
|
||||
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackForwardComm>(0,n),*this);
|
||||
return n*38;
|
||||
return n*comm_forward;
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -324,7 +354,7 @@ template<class DeviceType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const int &i) const {
|
||||
int j = d_sendlist(iswap, i);
|
||||
int m = i*38;
|
||||
int m = i*comm_forward;
|
||||
v_buf[m++] = d_rho0[j];
|
||||
v_buf[m++] = d_rho1[j];
|
||||
v_buf[m++] = d_rho2[j];
|
||||
@ -354,6 +384,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackForwardComm, const in
|
||||
v_buf[m++] = d_tsq_ave(j,0);
|
||||
v_buf[m++] = d_tsq_ave(j,1);
|
||||
v_buf[m++] = d_tsq_ave(j,2);
|
||||
if (msmeamflag) {
|
||||
v_buf[m++] = d_arho2mb[j];
|
||||
v_buf[m++] = d_arho1m(j,0);
|
||||
v_buf[m++] = d_arho1m(j,1);
|
||||
v_buf[m++] = d_arho1m(j,2);
|
||||
v_buf[m++] = d_arho2m(j,0);
|
||||
v_buf[m++] = d_arho2m(j,1);
|
||||
v_buf[m++] = d_arho2m(j,2);
|
||||
v_buf[m++] = d_arho2m(j,3);
|
||||
v_buf[m++] = d_arho2m(j,4);
|
||||
v_buf[m++] = d_arho2m(j,5);
|
||||
for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(j,k);
|
||||
v_buf[m++] = d_arho3mb(j,0);
|
||||
v_buf[m++] = d_arho3mb(j,1);
|
||||
v_buf[m++] = d_arho3mb(j,2);
|
||||
}
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -371,7 +417,8 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm_kokkos(int n, int first_in,
|
||||
template<class DeviceType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const int &i) const{
|
||||
int m = i*38;
|
||||
//int m = i*38;
|
||||
int m = i*comm_forward;
|
||||
|
||||
d_rho0[i+first] = v_buf[m++];
|
||||
d_rho1[i+first] = v_buf[m++];
|
||||
@ -402,6 +449,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackForwardComm, const
|
||||
d_tsq_ave(i+first,0) = v_buf[m++];
|
||||
d_tsq_ave(i+first,1) = v_buf[m++];
|
||||
d_tsq_ave(i+first,2) = v_buf[m++];
|
||||
if (msmeamflag) {
|
||||
d_arho2mb[i+first] = v_buf[m++];
|
||||
d_arho1m(i+first,0) = v_buf[m++];
|
||||
d_arho1m(i+first,1) = v_buf[m++];
|
||||
d_arho1m(i+first,2) = v_buf[m++];
|
||||
d_arho2m(i+first,0) = v_buf[m++];
|
||||
d_arho2m(i+first,1) = v_buf[m++];
|
||||
d_arho2m(i+first,2) = v_buf[m++];
|
||||
d_arho2m(i+first,3) = v_buf[m++];
|
||||
d_arho2m(i+first,4) = v_buf[m++];
|
||||
d_arho2m(i+first,5) = v_buf[m++];
|
||||
for (int k = 0; k < 10; k++) d_arho3m(i+first,k) = v_buf[m++];
|
||||
d_arho3mb(i+first,0) = v_buf[m++];
|
||||
d_arho3mb(i+first,1) = v_buf[m++];
|
||||
d_arho3mb(i+first,2) = v_buf[m++];
|
||||
}
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -426,6 +489,13 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
|
||||
meam_inst_kk->k_arho3b.sync_host();
|
||||
meam_inst_kk->k_t_ave.sync_host();
|
||||
meam_inst_kk->k_tsq_ave.sync_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.sync_host();
|
||||
meam_inst_kk->k_arho1m.sync_host();
|
||||
meam_inst_kk->k_arho2m.sync_host();
|
||||
meam_inst_kk->k_arho3m.sync_host();
|
||||
meam_inst_kk->k_arho3mb.sync_host();
|
||||
}
|
||||
|
||||
int m = 0;
|
||||
for (int i = 0; i < n; i++) {
|
||||
@ -459,6 +529,22 @@ int PairMEAMKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(j,0);
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(j,1);
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(j,2);
|
||||
if (msmeamflag) {
|
||||
buf[m++] = meam_inst_kk->h_arho2mb[j];
|
||||
buf[m++] = meam_inst_kk->h_arho1m(j,0);
|
||||
buf[m++] = meam_inst_kk->h_arho1m(j,1);
|
||||
buf[m++] = meam_inst_kk->h_arho1m(j,2);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,0);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,1);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,2);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,3);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,4);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(j,5);
|
||||
for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(j,k);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(j,0);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(j,1);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(j,2);
|
||||
}
|
||||
}
|
||||
|
||||
return m;
|
||||
@ -485,6 +571,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
|
||||
meam_inst_kk->k_arho3b.sync_host();
|
||||
meam_inst_kk->k_t_ave.sync_host();
|
||||
meam_inst_kk->k_tsq_ave.sync_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.sync_host();
|
||||
meam_inst_kk->k_arho1m.sync_host();
|
||||
meam_inst_kk->k_arho2m.sync_host();
|
||||
meam_inst_kk->k_arho3m.sync_host();
|
||||
meam_inst_kk->k_arho3mb.sync_host();
|
||||
}
|
||||
|
||||
int m = 0;
|
||||
const int last = first + n;
|
||||
@ -518,6 +611,22 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
|
||||
meam_inst_kk->h_tsq_ave(i,0) = buf[m++];
|
||||
meam_inst_kk->h_tsq_ave(i,1) = buf[m++];
|
||||
meam_inst_kk->h_tsq_ave(i,2) = buf[m++];
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->h_arho2mb[i] = buf[m++];
|
||||
meam_inst_kk->h_arho1m(i,0) = buf[m++];
|
||||
meam_inst_kk->h_arho1m(i,1) = buf[m++];
|
||||
meam_inst_kk->h_arho1m(i,2) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,0) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,1) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,2) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,3) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,4) = buf[m++];
|
||||
meam_inst_kk->h_arho2m(i,5) = buf[m++];
|
||||
for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(i,k) = buf[m++];
|
||||
meam_inst_kk->h_arho3mb(i,0) = buf[m++];
|
||||
meam_inst_kk->h_arho3mb(i,1) = buf[m++];
|
||||
meam_inst_kk->h_arho3mb(i,2) = buf[m++];
|
||||
}
|
||||
}
|
||||
|
||||
meam_inst_kk->k_rho0.modify_host();
|
||||
@ -536,6 +645,13 @@ void PairMEAMKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *b
|
||||
meam_inst_kk->k_arho3b.modify_host();
|
||||
meam_inst_kk->k_t_ave.modify_host();
|
||||
meam_inst_kk->k_tsq_ave.modify_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.modify_host();
|
||||
meam_inst_kk->k_arho1m.modify_host();
|
||||
meam_inst_kk->k_arho2m.modify_host();
|
||||
meam_inst_kk->k_arho3m.modify_host();
|
||||
meam_inst_kk->k_arho3mb.modify_host();
|
||||
}
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -546,7 +662,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
|
||||
first = first_in;
|
||||
v_buf = buf.view<DeviceType>();
|
||||
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, TagPairMEAMPackReverseComm>(0,n),*this);
|
||||
return n*30;
|
||||
//return n*30;
|
||||
return n*comm_reverse;
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -554,7 +671,8 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm_kokkos(int n, int first_in, DA
|
||||
template<class DeviceType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const int &i) const {
|
||||
int m = i*30;
|
||||
//int m = i*30;
|
||||
int m = i*comm_reverse;
|
||||
|
||||
v_buf[m++] = d_rho0[i+first];
|
||||
v_buf[m++] = d_arho2b[i+first];
|
||||
@ -577,6 +695,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMPackReverseComm, const in
|
||||
v_buf[m++] = d_tsq_ave(i+first,0);
|
||||
v_buf[m++] = d_tsq_ave(i+first,1);
|
||||
v_buf[m++] = d_tsq_ave(i+first,2);
|
||||
if (msmeamflag) {
|
||||
v_buf[m++] = d_arho2mb[i+first];
|
||||
v_buf[m++] = d_arho1m(i+first,0);
|
||||
v_buf[m++] = d_arho1m(i+first,1);
|
||||
v_buf[m++] = d_arho1m(i+first,2);
|
||||
v_buf[m++] = d_arho2m(i+first,0);
|
||||
v_buf[m++] = d_arho2m(i+first,1);
|
||||
v_buf[m++] = d_arho2m(i+first,2);
|
||||
v_buf[m++] = d_arho2m(i+first,3);
|
||||
v_buf[m++] = d_arho2m(i+first,4);
|
||||
v_buf[m++] = d_arho2m(i+first,5);
|
||||
for (int k = 0; k < 10; k++) v_buf[m++] = d_arho3m(i+first,k);
|
||||
v_buf[m++] = d_arho3mb(i+first,0);
|
||||
v_buf[m++] = d_arho3mb(i+first,1);
|
||||
v_buf[m++] = d_arho3mb(i+first,2);
|
||||
}
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -592,6 +726,13 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
|
||||
meam_inst_kk->k_arho3b.sync_host();
|
||||
meam_inst_kk->k_t_ave.sync_host();
|
||||
meam_inst_kk->k_tsq_ave.sync_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.sync_host();
|
||||
meam_inst_kk->k_arho1m.sync_host();
|
||||
meam_inst_kk->k_arho2m.sync_host();
|
||||
meam_inst_kk->k_arho3m.sync_host();
|
||||
meam_inst_kk->k_arho3mb.sync_host();
|
||||
}
|
||||
|
||||
int m = 0;
|
||||
const int last = first + n;
|
||||
@ -617,6 +758,22 @@ int PairMEAMKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(i,0);
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(i,1);
|
||||
buf[m++] = meam_inst_kk->h_tsq_ave(i,2);
|
||||
if (msmeamflag) {
|
||||
buf[m++] = meam_inst_kk->h_arho2mb[i];
|
||||
buf[m++] = meam_inst_kk->h_arho1m(i,0);
|
||||
buf[m++] = meam_inst_kk->h_arho1m(i,1);
|
||||
buf[m++] = meam_inst_kk->h_arho1m(i,2);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,0);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,1);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,2);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,3);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,4);
|
||||
buf[m++] = meam_inst_kk->h_arho2m(i,5);
|
||||
for (int k = 0; k < 10; k++) buf[m++] = meam_inst_kk->h_arho3m(i,k);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(i,0);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(i,1);
|
||||
buf[m++] = meam_inst_kk->h_arho3mb(i,2);
|
||||
}
|
||||
}
|
||||
|
||||
return m;
|
||||
@ -639,7 +796,8 @@ template<class DeviceType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const int &i) const {
|
||||
int j = d_sendlist(iswap, i);
|
||||
int m = i*30;
|
||||
//int m = i*30;
|
||||
int m = i*comm_reverse;
|
||||
|
||||
d_rho0[j] += v_buf[m++];
|
||||
d_arho2b[j] += v_buf[m++];
|
||||
@ -662,6 +820,22 @@ void PairMEAMKokkos<DeviceType>::operator()(TagPairMEAMUnpackReverseComm, const
|
||||
d_tsq_ave(j,0) += v_buf[m++];
|
||||
d_tsq_ave(j,1) += v_buf[m++];
|
||||
d_tsq_ave(j,2) += v_buf[m++];
|
||||
if (msmeamflag) {
|
||||
d_arho2mb[j] += v_buf[m++];
|
||||
d_arho1m(j,0) += v_buf[m++];
|
||||
d_arho1m(j,1) += v_buf[m++];
|
||||
d_arho1m(j,2) += v_buf[m++];
|
||||
d_arho2m(j,0) += v_buf[m++];
|
||||
d_arho2m(j,1) += v_buf[m++];
|
||||
d_arho2m(j,2) += v_buf[m++];
|
||||
d_arho2m(j,3) += v_buf[m++];
|
||||
d_arho2m(j,4) += v_buf[m++];
|
||||
d_arho2m(j,5) += v_buf[m++];
|
||||
for (int k = 0; k < 10; k++) d_arho3m(j,k) += v_buf[m++];
|
||||
d_arho3mb(j,0) += v_buf[m++];
|
||||
d_arho3mb(j,1) += v_buf[m++];
|
||||
d_arho3mb(j,2) += v_buf[m++];
|
||||
}
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -677,6 +851,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
|
||||
meam_inst_kk->k_arho3b.sync_host();
|
||||
meam_inst_kk->k_t_ave.sync_host();
|
||||
meam_inst_kk->k_tsq_ave.sync_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.sync_host();
|
||||
meam_inst_kk->k_arho1m.sync_host();
|
||||
meam_inst_kk->k_arho2m.sync_host();
|
||||
meam_inst_kk->k_arho3m.sync_host();
|
||||
meam_inst_kk->k_arho3mb.sync_host();
|
||||
}
|
||||
|
||||
int m = 0;
|
||||
for (int i = 0; i < n; i++) {
|
||||
@ -702,6 +883,22 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
|
||||
meam_inst_kk->h_tsq_ave(j,0) += buf[m++];
|
||||
meam_inst_kk->h_tsq_ave(j,1) += buf[m++];
|
||||
meam_inst_kk->h_tsq_ave(j,2) += buf[m++];
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->h_arho2mb[j] += buf[m++];
|
||||
meam_inst_kk->h_arho1m(j,0) += buf[m++];
|
||||
meam_inst_kk->h_arho1m(j,1) += buf[m++];
|
||||
meam_inst_kk->h_arho1m(j,2) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,0) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,1) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,2) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,3) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,4) += buf[m++];
|
||||
meam_inst_kk->h_arho2m(j,5) += buf[m++];
|
||||
for (int k = 0; k < 10; k++) meam_inst_kk->h_arho3m(j,k) += buf[m++];
|
||||
meam_inst_kk->h_arho3mb(j,0) += buf[m++];
|
||||
meam_inst_kk->h_arho3mb(j,1) += buf[m++];
|
||||
meam_inst_kk->h_arho3mb(j,2) += buf[m++];
|
||||
}
|
||||
}
|
||||
|
||||
meam_inst_kk->k_rho0.modify_host();
|
||||
@ -712,6 +909,13 @@ void PairMEAMKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *b
|
||||
meam_inst_kk->k_arho3b.modify_host();
|
||||
meam_inst_kk->k_t_ave.modify_host();
|
||||
meam_inst_kk->k_tsq_ave.modify_host();
|
||||
if (msmeamflag) {
|
||||
meam_inst_kk->k_arho2mb.modify_host();
|
||||
meam_inst_kk->k_arho1m.modify_host();
|
||||
meam_inst_kk->k_arho2m.modify_host();
|
||||
meam_inst_kk->k_arho3m.modify_host();
|
||||
meam_inst_kk->k_arho3mb.modify_host();
|
||||
}
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
@ -764,6 +968,12 @@ void PairMEAMKokkos<DeviceType>::update_meam_views()
|
||||
d_arho3b = meam_inst_kk->d_arho3b;
|
||||
d_t_ave = meam_inst_kk->d_t_ave;
|
||||
d_tsq_ave = meam_inst_kk->d_tsq_ave;
|
||||
// msmeam
|
||||
d_arho1m = meam_inst_kk->d_arho1m;
|
||||
d_arho2m = meam_inst_kk->d_arho2m;
|
||||
d_arho3m = meam_inst_kk->d_arho3m;
|
||||
d_arho2mb = meam_inst_kk->d_arho2mb;
|
||||
d_arho3mb = meam_inst_kk->d_arho3mb;
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
|
||||
@ -13,12 +13,12 @@
|
||||
|
||||
#ifdef PAIR_CLASS
|
||||
// clang-format off
|
||||
PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>)
|
||||
PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>)
|
||||
PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>)
|
||||
PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>)
|
||||
PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>)
|
||||
PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>)
|
||||
PairStyle(meam/c/kk,PairMEAMKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/c/kk/device,PairMEAMKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/c/kk/host,PairMEAMKokkos<LMPHostType>);
|
||||
PairStyle(meam/kk,PairMEAMKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/kk/device,PairMEAMKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/kk/host,PairMEAMKokkos<LMPHostType>);
|
||||
// clang-format on
|
||||
#else
|
||||
|
||||
@ -117,6 +117,9 @@ class PairMEAMKokkos : public PairMEAM, public KokkosBase {
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_1d d_rho, d_rho0, d_rho1, d_rho2, d_rho3, d_frhop;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_1d d_gamma, d_dgamma1, d_dgamma2, d_dgamma3, d_arho2b;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1, d_arho2, d_arho3, d_arho3b, d_t_ave, d_tsq_ave;
|
||||
// msmeam params
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_1d d_arho2mb;
|
||||
typename ArrayTypes<DeviceType>::t_ffloat_2d d_arho1m, d_arho2m, d_arho3m, d_arho3mb;
|
||||
|
||||
void update_meam_views();
|
||||
|
||||
|
||||
32
src/KOKKOS/pair_meam_ms_kokkos.cpp
Normal file
32
src/KOKKOS/pair_meam_ms_kokkos.cpp
Normal file
@ -0,0 +1,32 @@
|
||||
/* ----------------------------------------------------------------------
|
||||
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
|
||||
https://www.lammps.org/ Sandia National Laboratories
|
||||
LAMMPS development team: developers@lammps.org
|
||||
|
||||
Copyright (2003) Sandia Corporation. Under the terms of Contract
|
||||
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
|
||||
certain rights in this software. This software is distributed under
|
||||
the GNU General Public License.
|
||||
|
||||
See the README file in the top-level LAMMPS directory.
|
||||
------------------------------------------------------------------------- */
|
||||
|
||||
#include "pair_meam_ms_kokkos.h"
|
||||
#include "meam.h"
|
||||
|
||||
using namespace LAMMPS_NS;
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
template<class DeviceType>
|
||||
PairMEAMMSKokkos<DeviceType>::PairMEAMMSKokkos(LAMMPS *lmp) : PairMEAMKokkos<DeviceType>(lmp)
|
||||
{
|
||||
this->meam_inst->msmeamflag = this->msmeamflag = 1;
|
||||
this->myname = "meam/ms/kk";
|
||||
}
|
||||
|
||||
namespace LAMMPS_NS {
|
||||
template class PairMEAMMSKokkos<LMPDeviceType>;
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
template class PairMEAMMSKokkos<LMPHostType>;
|
||||
#endif
|
||||
}
|
||||
36
src/KOKKOS/pair_meam_ms_kokkos.h
Normal file
36
src/KOKKOS/pair_meam_ms_kokkos.h
Normal file
@ -0,0 +1,36 @@
|
||||
/* -*- c++ -*- ----------------------------------------------------------
|
||||
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
|
||||
https://www.lammps.org/ Sandia National Laboratories
|
||||
LAMMPS development team: developers@lammps.org
|
||||
|
||||
Copyright (2003) Sandia Corporation. Under the terms of Contract
|
||||
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
|
||||
certain rights in this software. This software is distributed under
|
||||
the GNU General Public License.
|
||||
|
||||
See the README file in the top-level LAMMPS directory.
|
||||
------------------------------------------------------------------------- */
|
||||
|
||||
#ifdef PAIR_CLASS
|
||||
// clang-format off
|
||||
PairStyle(meam/ms/kk,PairMEAMMSKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/ms/kk/device,PairMEAMMSKokkos<LMPDeviceType>);
|
||||
PairStyle(meam/ms/kk/host,PairMEAMMSKokkos<LMPHostType>);
|
||||
// clang-format on
|
||||
#else
|
||||
|
||||
#ifndef LMP_PAIR_MEAM_MS_KOKKOS_H
|
||||
#define LMP_PAIR_MEAM_MS_KOKKOS_H
|
||||
|
||||
#include "pair_meam_kokkos.h"
|
||||
|
||||
namespace LAMMPS_NS {
|
||||
|
||||
template <class DeviceType>
|
||||
class PairMEAMMSKokkos : public PairMEAMKokkos<DeviceType> {
|
||||
public:
|
||||
PairMEAMMSKokkos(class LAMMPS *);
|
||||
};
|
||||
} // namespace LAMMPS_NS
|
||||
#endif
|
||||
#endif
|
||||
@ -17,7 +17,7 @@
|
||||
#include <cmath>
|
||||
#include <string>
|
||||
|
||||
#define maxelt 5
|
||||
constexpr int maxelt = 5;
|
||||
|
||||
namespace LAMMPS_NS {
|
||||
class Memory;
|
||||
@ -30,6 +30,7 @@ class MEAM {
|
||||
virtual ~MEAM();
|
||||
|
||||
int copymode;
|
||||
int msmeamflag;
|
||||
|
||||
protected:
|
||||
Memory *memory;
|
||||
@ -74,6 +75,12 @@ class MEAM {
|
||||
// vind[23]D = Voight notation index maps for 2 and 3D
|
||||
// v2D,v3D = array of factors to apply for Voight notation
|
||||
|
||||
// MS-MEAM parameters
|
||||
|
||||
// msmeamflag = flag to activate MS-MEAM
|
||||
// betam[1-3]_meam = MS-MEAM electron density constants
|
||||
// tm[1-3]_meam = MS-MEAM coefficients on densities in Gamma computation
|
||||
|
||||
// nr,dr = pair function discretization parameters
|
||||
// nrar,rdrar = spline coeff array parameters
|
||||
|
||||
@ -115,12 +122,22 @@ class MEAM {
|
||||
int nr, nrar;
|
||||
double dr, rdrar;
|
||||
|
||||
// MS-MEAM parameters
|
||||
|
||||
double t1m_meam[maxelt], t2m_meam[maxelt], t3m_meam[maxelt];
|
||||
double beta1m_meam[maxelt], beta2m_meam[maxelt], beta3m_meam[maxelt];
|
||||
//int msmeamflag; // made public for pair style settings
|
||||
|
||||
public:
|
||||
int nmax;
|
||||
double *rho, *rho0, *rho1, *rho2, *rho3, *frhop;
|
||||
double *gamma, *dgamma1, *dgamma2, *dgamma3, *arho2b;
|
||||
double **arho1, **arho2, **arho3, **arho3b, **t_ave, **tsq_ave;
|
||||
|
||||
// MS-MEAM arrays
|
||||
|
||||
double **arho1m, **arho2m, *arho2mb, **arho3m, **arho3mb;
|
||||
|
||||
int maxneigh;
|
||||
double *scrfcn, *dscrfcn, *fcpair;
|
||||
|
||||
@ -242,7 +259,7 @@ class MEAM {
|
||||
double, double, double, double, double, int, int, lattice_t);
|
||||
void get_sijk(double, int, int, int, double *);
|
||||
void get_densref(double, int, int, double *, double *, double *, double *, double *, double *,
|
||||
double *, double *);
|
||||
double *, double *, double *, double *, double *, double *, double *, double *); // last 6 args for msmeam
|
||||
void interpolate_meam(int);
|
||||
|
||||
public:
|
||||
@ -282,10 +299,12 @@ class MEAM {
|
||||
}
|
||||
// clang-format on
|
||||
static int get_Zij(const lattice_t latt);
|
||||
// last 6 args are optional msmeam parameters
|
||||
void meam_setup_global(int nelt, lattice_t *lat, int *ielement, double *atwt, double *alpha,
|
||||
double *b0, double *b1, double *b2, double *b3, double *alat, double *esub,
|
||||
double *asub, double *t0, double *t1, double *t2, double *t3,
|
||||
double *rozero, int *ibar);
|
||||
double *rozero, int *ibar, double *b1m, double *b2m, double *b3m,
|
||||
double *t1m, double *t2m, double *t3m);
|
||||
void meam_setup_param(int which, double value, int nindex, int *index /*index(3)*/,
|
||||
int *errorflag);
|
||||
virtual void meam_setup_done(double *cutmax);
|
||||
|
||||
@ -27,115 +27,222 @@ MEAM::meam_dens_final(int nlocal, int eflag_either, int eflag_global, int eflag_
|
||||
|
||||
// Complete the calculation of density
|
||||
|
||||
for (i = 0; i < nlocal; i++) {
|
||||
elti = fmap[type[i]];
|
||||
if (elti >= 0) {
|
||||
scaleii = scale[type[i]][type[i]];
|
||||
rho1[i] = 0.0;
|
||||
rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
|
||||
rho3[i] = 0.0;
|
||||
for (m = 0; m < 3; m++) {
|
||||
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
|
||||
rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
|
||||
}
|
||||
for (m = 0; m < 6; m++) {
|
||||
rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
|
||||
}
|
||||
for (m = 0; m < 10; m++) {
|
||||
rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
|
||||
}
|
||||
|
||||
if (rho0[i] > 0.0) {
|
||||
if (this->ialloy == 1) {
|
||||
t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
|
||||
t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
|
||||
t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
|
||||
} else if (this->ialloy == 2) {
|
||||
t_ave[i][0] = this->t1_meam[elti];
|
||||
t_ave[i][1] = this->t2_meam[elti];
|
||||
t_ave[i][2] = this->t3_meam[elti];
|
||||
} else {
|
||||
t_ave[i][0] = t_ave[i][0] / rho0[i];
|
||||
t_ave[i][1] = t_ave[i][1] / rho0[i];
|
||||
t_ave[i][2] = t_ave[i][2] / rho0[i];
|
||||
if (this->msmeamflag) {
|
||||
for (i = 0; i < nlocal; i++) {
|
||||
elti = fmap[type[i]];
|
||||
if (elti >= 0) {
|
||||
scaleii = scale[type[i]][type[i]];
|
||||
rho1[i] = 0.0;
|
||||
rho2[i] = -1.0 / 3.0 * (arho2b[i] * arho2b[i]
|
||||
- arho2mb[i] * arho2mb[i]);
|
||||
rho3[i] = 0.0;
|
||||
for (m = 0; m < 3; m++) {
|
||||
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m]
|
||||
- arho1m[i][m] * arho1m[i][m];
|
||||
rho3[i] = rho3[i] - 3.0 / 5.0 * (arho3b[i][m] * arho3b[i][m]
|
||||
- arho3mb[i][m] * arho3mb[i][m]);
|
||||
}
|
||||
}
|
||||
|
||||
gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
|
||||
|
||||
if (rho0[i] > 0.0) {
|
||||
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
|
||||
}
|
||||
|
||||
Z = get_Zij(this->lattce_meam[elti][elti]);
|
||||
|
||||
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
|
||||
if (errorflag != 0)
|
||||
return;
|
||||
|
||||
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
|
||||
|
||||
if (this->ibar_meam[elti] <= 0) {
|
||||
Gbar = 1.0;
|
||||
dGbar = 0.0;
|
||||
} else {
|
||||
if (this->mix_ref_t == 1) {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
} else {
|
||||
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
|
||||
(Z * Z);
|
||||
for (m = 0; m < 6; m++) {
|
||||
rho2[i] = rho2[i] + this->v2D[m] * (arho2[i][m] * arho2[i][m]
|
||||
- arho2m[i][m] * arho2m[i][m]);
|
||||
}
|
||||
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
|
||||
}
|
||||
rho[i] = rho0[i] * G;
|
||||
|
||||
if (this->mix_ref_t == 1) {
|
||||
for (m = 0; m < 10; m++) {
|
||||
rho3[i] = rho3[i] + this->v3D[m] * (arho3[i][m] * arho3[i][m]
|
||||
- arho3m[i][m] * arho3m[i][m]);
|
||||
}
|
||||
|
||||
// all the t weights are already accounted for with msmeam
|
||||
gamma[i] = rho1[i] + rho2[i] + rho3[i];
|
||||
|
||||
if (rho0[i] > 0.0) {
|
||||
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
|
||||
}
|
||||
|
||||
Z = get_Zij(this->lattce_meam[elti][elti]);
|
||||
|
||||
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
|
||||
if (errorflag != 0)
|
||||
return;
|
||||
|
||||
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
|
||||
|
||||
if (this->ibar_meam[elti] <= 0) {
|
||||
Gbar = 1.0;
|
||||
dGbar = 0.0;
|
||||
} else {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
|
||||
if (this->mix_ref_t == 1) {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
} else {
|
||||
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
|
||||
(Z * Z);
|
||||
}
|
||||
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
|
||||
}
|
||||
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
|
||||
} else {
|
||||
if (this->bkgd_dyn == 1) {
|
||||
rho_bkgd = this->rho0_meam[elti] * Z;
|
||||
rho[i] = rho0[i] * G;
|
||||
|
||||
if (this->mix_ref_t == 1) {
|
||||
if (this->ibar_meam[elti] <= 0) {
|
||||
Gbar = 1.0;
|
||||
dGbar = 0.0;
|
||||
} else {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
|
||||
}
|
||||
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
|
||||
} else {
|
||||
rho_bkgd = this->rho_ref_meam[elti];
|
||||
if (this->bkgd_dyn == 1) {
|
||||
rho_bkgd = this->rho0_meam[elti] * Z;
|
||||
} else {
|
||||
rho_bkgd = this->rho_ref_meam[elti];
|
||||
}
|
||||
}
|
||||
rhob = rho[i] / rho_bkgd;
|
||||
denom = 1.0 / rho_bkgd;
|
||||
|
||||
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
|
||||
|
||||
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
|
||||
|
||||
if (!iszero(rho0[i])) {
|
||||
dgamma2[i] = (dG / rho0[i]) * denom;
|
||||
} else {
|
||||
dgamma2[i] = 0.0;
|
||||
}
|
||||
|
||||
// dgamma3 is nonzero only if we are using the "mixed" rule for
|
||||
// computing t in the reference system (which is not correct, but
|
||||
// included for backward compatibility
|
||||
if (this->mix_ref_t == 1) {
|
||||
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
|
||||
} else {
|
||||
dgamma3[i] = 0.0;
|
||||
}
|
||||
|
||||
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
|
||||
if (eflag_either != 0) {
|
||||
Fl *= scaleii;
|
||||
if (eflag_global != 0) {
|
||||
*eng_vdwl = *eng_vdwl + Fl;
|
||||
}
|
||||
if (eflag_atom != 0) {
|
||||
eatom[i] = eatom[i] + Fl;
|
||||
}
|
||||
}
|
||||
}
|
||||
rhob = rho[i] / rho_bkgd;
|
||||
denom = 1.0 / rho_bkgd;
|
||||
|
||||
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
|
||||
|
||||
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
|
||||
|
||||
if (!iszero(rho0[i])) {
|
||||
dgamma2[i] = (dG / rho0[i]) * denom;
|
||||
} else {
|
||||
dgamma2[i] = 0.0;
|
||||
}
|
||||
|
||||
// dgamma3 is nonzero only if we are using the "mixed" rule for
|
||||
// computing t in the reference system (which is not correct, but
|
||||
// included for backward compatibility
|
||||
if (this->mix_ref_t == 1) {
|
||||
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
|
||||
} else {
|
||||
dgamma3[i] = 0.0;
|
||||
}
|
||||
|
||||
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
|
||||
|
||||
if (eflag_either != 0) {
|
||||
Fl *= scaleii;
|
||||
if (eflag_global != 0) {
|
||||
*eng_vdwl = *eng_vdwl + Fl;
|
||||
}
|
||||
} else {
|
||||
for (i = 0; i < nlocal; i++) {
|
||||
elti = fmap[type[i]];
|
||||
if (elti >= 0) {
|
||||
scaleii = scale[type[i]][type[i]];
|
||||
rho1[i] = 0.0;
|
||||
rho2[i] = -1.0 / 3.0 * arho2b[i] * arho2b[i];
|
||||
rho3[i] = 0.0;
|
||||
for (m = 0; m < 3; m++) {
|
||||
rho1[i] = rho1[i] + arho1[i][m] * arho1[i][m];
|
||||
rho3[i] = rho3[i] - 3.0 / 5.0 * arho3b[i][m] * arho3b[i][m];
|
||||
}
|
||||
if (eflag_atom != 0) {
|
||||
eatom[i] = eatom[i] + Fl;
|
||||
for (m = 0; m < 6; m++) {
|
||||
rho2[i] = rho2[i] + this->v2D[m] * arho2[i][m] * arho2[i][m];
|
||||
}
|
||||
for (m = 0; m < 10; m++) {
|
||||
rho3[i] = rho3[i] + this->v3D[m] * arho3[i][m] * arho3[i][m];
|
||||
}
|
||||
|
||||
if (rho0[i] > 0.0) {
|
||||
if (this->ialloy == 1) {
|
||||
t_ave[i][0] = fdiv_zero(t_ave[i][0], tsq_ave[i][0]);
|
||||
t_ave[i][1] = fdiv_zero(t_ave[i][1], tsq_ave[i][1]);
|
||||
t_ave[i][2] = fdiv_zero(t_ave[i][2], tsq_ave[i][2]);
|
||||
} else if (this->ialloy == 2) {
|
||||
t_ave[i][0] = this->t1_meam[elti];
|
||||
t_ave[i][1] = this->t2_meam[elti];
|
||||
t_ave[i][2] = this->t3_meam[elti];
|
||||
} else {
|
||||
t_ave[i][0] = t_ave[i][0] / rho0[i];
|
||||
t_ave[i][1] = t_ave[i][1] / rho0[i];
|
||||
t_ave[i][2] = t_ave[i][2] / rho0[i];
|
||||
}
|
||||
}
|
||||
|
||||
gamma[i] = t_ave[i][0] * rho1[i] + t_ave[i][1] * rho2[i] + t_ave[i][2] * rho3[i];
|
||||
|
||||
if (rho0[i] > 0.0) {
|
||||
gamma[i] = gamma[i] / (rho0[i] * rho0[i]);
|
||||
}
|
||||
|
||||
Z = get_Zij(this->lattce_meam[elti][elti]);
|
||||
|
||||
G = G_gam(gamma[i], this->ibar_meam[elti], errorflag);
|
||||
if (errorflag != 0)
|
||||
return;
|
||||
|
||||
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shp);
|
||||
|
||||
if (this->ibar_meam[elti] <= 0) {
|
||||
Gbar = 1.0;
|
||||
dGbar = 0.0;
|
||||
} else {
|
||||
if (this->mix_ref_t == 1) {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
} else {
|
||||
gam = (this->t1_meam[elti] * shp[0] + this->t2_meam[elti] * shp[1] + this->t3_meam[elti] * shp[2]) /
|
||||
(Z * Z);
|
||||
}
|
||||
Gbar = G_gam(gam, this->ibar_meam[elti], errorflag);
|
||||
}
|
||||
rho[i] = rho0[i] * G;
|
||||
|
||||
if (this->mix_ref_t == 1) {
|
||||
if (this->ibar_meam[elti] <= 0) {
|
||||
Gbar = 1.0;
|
||||
dGbar = 0.0;
|
||||
} else {
|
||||
gam = (t_ave[i][0] * shp[0] + t_ave[i][1] * shp[1] + t_ave[i][2] * shp[2]) / (Z * Z);
|
||||
Gbar = dG_gam(gam, this->ibar_meam[elti], dGbar);
|
||||
}
|
||||
rho_bkgd = this->rho0_meam[elti] * Z * Gbar;
|
||||
} else {
|
||||
if (this->bkgd_dyn == 1) {
|
||||
rho_bkgd = this->rho0_meam[elti] * Z;
|
||||
} else {
|
||||
rho_bkgd = this->rho_ref_meam[elti];
|
||||
}
|
||||
}
|
||||
rhob = rho[i] / rho_bkgd;
|
||||
denom = 1.0 / rho_bkgd;
|
||||
|
||||
G = dG_gam(gamma[i], this->ibar_meam[elti], dG);
|
||||
|
||||
dgamma1[i] = (G - 2 * dG * gamma[i]) * denom;
|
||||
|
||||
if (!iszero(rho0[i])) {
|
||||
dgamma2[i] = (dG / rho0[i]) * denom;
|
||||
} else {
|
||||
dgamma2[i] = 0.0;
|
||||
}
|
||||
|
||||
// dgamma3 is nonzero only if we are using the "mixed" rule for
|
||||
// computing t in the reference system (which is not correct, but
|
||||
// included for backward compatibility
|
||||
if (this->mix_ref_t == 1) {
|
||||
dgamma3[i] = rho0[i] * G * dGbar / (Gbar * Z * Z) * denom;
|
||||
} else {
|
||||
dgamma3[i] = 0.0;
|
||||
}
|
||||
|
||||
Fl = embedding(this->A_meam[elti], this->Ec_meam[elti][elti], rhob, frhop[i]);
|
||||
|
||||
if (eflag_either != 0) {
|
||||
Fl *= scaleii;
|
||||
if (eflag_global != 0) {
|
||||
*eng_vdwl = *eng_vdwl + Fl;
|
||||
}
|
||||
if (eflag_atom != 0) {
|
||||
eatom[i] = eatom[i] + Fl;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -45,6 +45,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
|
||||
memory->destroy(arho3b);
|
||||
memory->destroy(t_ave);
|
||||
memory->destroy(tsq_ave);
|
||||
// msmeam params
|
||||
if (this->msmeamflag) {
|
||||
memory->destroy(arho1m);
|
||||
memory->destroy(arho2m);
|
||||
memory->destroy(arho3m);
|
||||
memory->destroy(arho2mb);
|
||||
memory->destroy(arho3mb);
|
||||
}
|
||||
|
||||
nmax = atom_nmax;
|
||||
|
||||
@ -65,6 +73,14 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
|
||||
memory->create(arho3b, nmax, 3, "pair:arho3b");
|
||||
memory->create(t_ave, nmax, 3, "pair:t_ave");
|
||||
memory->create(tsq_ave, nmax, 3, "pair:tsq_ave");
|
||||
// msmeam params
|
||||
if (this->msmeamflag) {
|
||||
memory->create(arho1m, nmax, 3, "pair:arho1m");
|
||||
memory->create(arho2m, nmax, 6, "pair:arho2m");
|
||||
memory->create(arho3m, nmax, 10, "pair:arho3m");
|
||||
memory->create(arho2mb, nmax, "pair:arho2mb");
|
||||
memory->create(arho3mb, nmax, 3, "pair:arho3mb");
|
||||
}
|
||||
}
|
||||
|
||||
if (n_neigh > maxneigh) {
|
||||
@ -83,14 +99,30 @@ MEAM::meam_dens_setup(int atom_nmax, int nall, int n_neigh)
|
||||
rho0[i] = 0.0;
|
||||
arho2b[i] = 0.0;
|
||||
arho1[i][0] = arho1[i][1] = arho1[i][2] = 0.0;
|
||||
for (j = 0; j < 6; j++)
|
||||
if (this->msmeamflag) {
|
||||
arho2mb[i] = 0.0;
|
||||
arho1m[i][0] = arho1m[i][1] = arho1m[i][2] = 0.0;
|
||||
}
|
||||
for (j = 0; j < 6; j++) {
|
||||
arho2[i][j] = 0.0;
|
||||
for (j = 0; j < 10; j++)
|
||||
if (this->msmeamflag) {
|
||||
arho2m[i][j] = 0.0;
|
||||
}
|
||||
}
|
||||
for (j = 0; j < 10; j++) {
|
||||
arho3[i][j] = 0.0;
|
||||
if (this->msmeamflag) {
|
||||
arho3m[i][j] = 0.0;
|
||||
}
|
||||
}
|
||||
arho3b[i][0] = arho3b[i][1] = arho3b[i][2] = 0.0;
|
||||
if (this->msmeamflag) {
|
||||
arho3mb[i][0] = arho3mb[i][1] = arho3mb[i][2] = 0.0;
|
||||
}
|
||||
t_ave[i][0] = t_ave[i][1] = t_ave[i][2] = 0.0;
|
||||
tsq_ave[i][0] = tsq_ave[i][1] = tsq_ave[i][2] = 0.0;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
void
|
||||
@ -282,6 +314,9 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
|
||||
// double G,Gbar,gam,shp[3+1];
|
||||
double ro0i, ro0j;
|
||||
double rhoa0i, rhoa1i, rhoa2i, rhoa3i, A1i, A2i, A3i;
|
||||
// msmeam params
|
||||
double rhoa1mj, rhoa2mj, rhoa3mj, A1mj, A2mj, A3mj;
|
||||
double rhoa1mi, rhoa2mi, rhoa3mi, A1mi, A2mi, A3mi;
|
||||
|
||||
elti = fmap[type[i]];
|
||||
xtmp = x[i][0];
|
||||
@ -306,10 +341,20 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
|
||||
rhoa1j = ro0j * MathSpecial::fm_exp(-this->beta1_meam[eltj] * aj) * sij;
|
||||
rhoa2j = ro0j * MathSpecial::fm_exp(-this->beta2_meam[eltj] * aj) * sij;
|
||||
rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj) * sij;
|
||||
if (this->msmeamflag){
|
||||
rhoa1mj = ro0j * this->t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj) * sij;
|
||||
rhoa2mj = ro0j * this->t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj) * sij;
|
||||
rhoa3mj = ro0j * this->t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj) * sij;
|
||||
}
|
||||
rhoa0i = ro0i * MathSpecial::fm_exp(-this->beta0_meam[elti] * ai) * sij;
|
||||
rhoa1i = ro0i * MathSpecial::fm_exp(-this->beta1_meam[elti] * ai) * sij;
|
||||
rhoa2i = ro0i * MathSpecial::fm_exp(-this->beta2_meam[elti] * ai) * sij;
|
||||
rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai) * sij;
|
||||
if (this->msmeamflag){
|
||||
rhoa1mi = ro0i * this->t1m_meam[elti] * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * sij;
|
||||
rhoa2mi = ro0i * this->t2m_meam[elti] * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * sij;
|
||||
rhoa3mi = ro0i * this->t3m_meam[elti] * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * sij;
|
||||
}
|
||||
if (this->ialloy == 1) {
|
||||
rhoa1j = rhoa1j * this->t1_meam[eltj];
|
||||
rhoa2j = rhoa2j * this->t2_meam[eltj];
|
||||
@ -321,6 +366,7 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
|
||||
rho0[i] = rho0[i] + rhoa0j;
|
||||
rho0[j] = rho0[j] + rhoa0i;
|
||||
// For ialloy = 2, use single-element value (not average)
|
||||
// For ialloy = 2, use single-element value (not average)
|
||||
if (this->ialloy != 2) {
|
||||
t_ave[i][0] = t_ave[i][0] + this->t1_meam[eltj] * rhoa0j;
|
||||
t_ave[i][1] = t_ave[i][1] + this->t2_meam[eltj] * rhoa0j;
|
||||
@ -348,18 +394,42 @@ MEAM::calc_rho1(int i, int /*ntype*/, int* type, int* fmap, double** x, int numn
|
||||
A3i = rhoa3i / (rij2 * rij);
|
||||
nv2 = 0;
|
||||
nv3 = 0;
|
||||
if (this->msmeamflag) {
|
||||
arho2mb[i] = arho2mb[i] + rhoa2mj;
|
||||
arho2mb[j] = arho2mb[j] + rhoa2mi;
|
||||
A1mj = rhoa1mj/rij;
|
||||
A2mj = rhoa2mj/rij2;
|
||||
A3mj = rhoa3mj/(rij2*rij);
|
||||
A1mi = rhoa1mi/rij;
|
||||
A2mi = rhoa2mi/rij2;
|
||||
A3mi = rhoa3mi/(rij2*rij);
|
||||
}
|
||||
for (m = 0; m < 3; m++) {
|
||||
arho1[i][m] = arho1[i][m] + A1j * delij[m];
|
||||
arho1[j][m] = arho1[j][m] - A1i * delij[m];
|
||||
arho3b[i][m] = arho3b[i][m] + rhoa3j * delij[m] / rij;
|
||||
arho3b[j][m] = arho3b[j][m] - rhoa3i * delij[m] / rij;
|
||||
if (this->msmeamflag) {
|
||||
arho1m[i][m] = arho1m[i][m] + A1mj*delij[m];
|
||||
arho1m[j][m] = arho1m[j][m] - A1mi*delij[m];
|
||||
arho3mb[i][m] = arho3mb[i][m] + rhoa3mj*delij[m] / rij;
|
||||
arho3mb[j][m] = arho3mb[j][m] - rhoa3mi*delij[m] / rij;
|
||||
}
|
||||
for (n = m; n < 3; n++) {
|
||||
arho2[i][nv2] = arho2[i][nv2] + A2j * delij[m] * delij[n];
|
||||
arho2[j][nv2] = arho2[j][nv2] + A2i * delij[m] * delij[n];
|
||||
if (this->msmeamflag) {
|
||||
arho2m[i][nv2] = arho2m[i][nv2] + A2mj*delij[m] * delij[n];
|
||||
arho2m[j][nv2] = arho2m[j][nv2] + A2mi*delij[m] * delij[n];
|
||||
}
|
||||
nv2 = nv2 + 1;
|
||||
for (p = n; p < 3; p++) {
|
||||
arho3[i][nv3] = arho3[i][nv3] + A3j * delij[m] * delij[n] * delij[p];
|
||||
arho3[j][nv3] = arho3[j][nv3] - A3i * delij[m] * delij[n] * delij[p];
|
||||
if (this->msmeamflag) {
|
||||
arho3m[i][nv3] = arho3m[i][nv3] + A3mj*delij[m]*delij[n]*delij[p];
|
||||
arho3m[j][nv3] = arho3m[j][nv3] - A3mi*delij[m]*delij[n]*delij[p];
|
||||
}
|
||||
nv3 = nv3 + 1;
|
||||
}
|
||||
}
|
||||
|
||||
@ -61,6 +61,17 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
double t1i, t2i, t3i, t1j, t2j, t3j;
|
||||
double scaleij;
|
||||
|
||||
double rhoa1mj,drhoa1mj,rhoa1mi,drhoa1mi;
|
||||
double rhoa2mj,drhoa2mj,rhoa2mi,drhoa2mi;
|
||||
double rhoa3mj, drhoa3mj, rhoa3mi, drhoa3mi;
|
||||
double arg1i1m, arg1j1m, arg1i2m, arg1j2m, arg1i3m, arg1j3m, arg3i3m, arg3j3m;
|
||||
double drho1mdr1, drho1mdr2, drho1mds1, drho1mds2;
|
||||
double drho1mdrm1[3], drho1mdrm2[3];
|
||||
double drho2mdr1, drho2mdr2, drho2mds1, drho2mds2;
|
||||
double drho2mdrm1[3], drho2mdrm2[3];
|
||||
double drho3mdr1, drho3mdr2, drho3mds1, drho3mds2;
|
||||
double drho3mdrm1[3], drho3mdrm2[3];
|
||||
|
||||
third = 1.0 / 3.0;
|
||||
sixth = 1.0 / 6.0;
|
||||
|
||||
@ -74,6 +85,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
zitmp = x[i][2];
|
||||
|
||||
// Treat each pair
|
||||
|
||||
for (jn = 0; jn < numneigh; jn++) {
|
||||
j = firstneigh[jn];
|
||||
eltj = fmap[type[j]];
|
||||
@ -89,7 +101,6 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
if (rij2 < this->cutforcesq) {
|
||||
rij = sqrt(rij2);
|
||||
recip = 1.0 / rij;
|
||||
|
||||
// Compute phi and phip
|
||||
ind = this->eltind[elti][eltj];
|
||||
pp = rij * this->rdrar;
|
||||
@ -114,6 +125,7 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
// write(1,*) "force_meamf: phip: ",phip
|
||||
|
||||
// Compute pair densities and derivatives
|
||||
|
||||
invrei = 1.0 / this->re_meam[elti][elti];
|
||||
ai = rij * invrei - 1.0;
|
||||
ro0i = this->rho0_meam[elti];
|
||||
@ -126,6 +138,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
rhoa3i = ro0i * MathSpecial::fm_exp(-this->beta3_meam[elti] * ai);
|
||||
drhoa3i = -this->beta3_meam[elti] * invrei * rhoa3i;
|
||||
|
||||
if (this->msmeamflag) {
|
||||
rhoa1mi = ro0i * MathSpecial::fm_exp(-this->beta1m_meam[elti] * ai) * t1m_meam[elti];
|
||||
drhoa1mi = -this->beta1m_meam[elti] * invrei * rhoa1mi;
|
||||
rhoa2mi = ro0i * MathSpecial::fm_exp(-this->beta2m_meam[elti] * ai) * t2m_meam[elti];
|
||||
drhoa2mi = -this->beta2m_meam[elti] * invrei * rhoa2mi;
|
||||
rhoa3mi = ro0i * MathSpecial::fm_exp(-this->beta3m_meam[elti] * ai) * t3m_meam[elti];
|
||||
drhoa3mi = -this->beta3m_meam[elti] * invrei * rhoa3mi;
|
||||
}
|
||||
|
||||
if (elti != eltj) {
|
||||
invrej = 1.0 / this->re_meam[eltj][eltj];
|
||||
aj = rij * invrej - 1.0;
|
||||
@ -138,6 +159,16 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
drhoa2j = -this->beta2_meam[eltj] * invrej * rhoa2j;
|
||||
rhoa3j = ro0j * MathSpecial::fm_exp(-this->beta3_meam[eltj] * aj);
|
||||
drhoa3j = -this->beta3_meam[eltj] * invrej * rhoa3j;
|
||||
|
||||
if (this->msmeamflag) {
|
||||
rhoa1mj = ro0j * t1m_meam[eltj] * MathSpecial::fm_exp(-this->beta1m_meam[eltj] * aj);
|
||||
drhoa1mj = -this->beta1m_meam[eltj] * invrej * rhoa1mj;
|
||||
rhoa2mj = ro0j * t2m_meam[eltj] * MathSpecial::fm_exp(-this->beta2m_meam[eltj] * aj);
|
||||
drhoa2mj = -this->beta2m_meam[eltj] * invrej * rhoa2mj;
|
||||
rhoa3mj = ro0j * t3m_meam[eltj] * MathSpecial::fm_exp(-this->beta3m_meam[eltj] * aj);
|
||||
drhoa3mj = -this->beta3m_meam[eltj] * invrej * rhoa3mj;
|
||||
}
|
||||
|
||||
} else {
|
||||
rhoa0j = rhoa0i;
|
||||
drhoa0j = drhoa0i;
|
||||
@ -147,6 +178,15 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
drhoa2j = drhoa2i;
|
||||
rhoa3j = rhoa3i;
|
||||
drhoa3j = drhoa3i;
|
||||
|
||||
if (this->msmeamflag) {
|
||||
rhoa1mj = rhoa1mi;
|
||||
drhoa1mj = drhoa1mi;
|
||||
rhoa2mj = rhoa2mi;
|
||||
drhoa2mj = drhoa2mi;
|
||||
rhoa3mj = rhoa3mi;
|
||||
drhoa3mj = drhoa3mi;
|
||||
}
|
||||
}
|
||||
|
||||
const double t1mi = this->t1_meam[elti];
|
||||
@ -156,7 +196,10 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
const double t2mj = this->t2_meam[eltj];
|
||||
const double t3mj = this->t3_meam[eltj];
|
||||
|
||||
if (this->ialloy == 1) {
|
||||
// ialloy mod not needed in MS-MEAM, but similarity here is that we multply rhos by t.
|
||||
// We did this above with rhoa1mj, rhoa2mj, etc.
|
||||
|
||||
if (this->ialloy == 1 || this->msmeamflag) {
|
||||
rhoa1j *= t1mj;
|
||||
rhoa2j *= t2mj;
|
||||
rhoa3j *= t3mj;
|
||||
@ -200,6 +243,39 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
arg3j3 = arg3j3 - arho3b[j][n] * delij[n];
|
||||
}
|
||||
|
||||
// msmeam arhom args
|
||||
|
||||
nv2 = 0;
|
||||
nv3 = 0;
|
||||
arg1i1m = 0.0;
|
||||
arg1j1m = 0.0;
|
||||
arg1i2m = 0.0;
|
||||
arg1j2m = 0.0;
|
||||
arg1i3m = 0.0;
|
||||
arg1j3m = 0.0;
|
||||
arg3i3m = 0.0;
|
||||
arg3j3m = 0.0;
|
||||
if (this->msmeamflag) {
|
||||
for (n = 0; n < 3; n++) {
|
||||
for (p = n; p < 3; p++) {
|
||||
for (q = p; q < 3; q++) {
|
||||
arg = delij[n] * delij[p] * delij[q] * this->v3D[nv3];
|
||||
arg1i3m = arg1i3m - arho3m[i][nv3] * arg;
|
||||
arg1j3m = arg1j3m + arho3m[j][nv3] * arg;
|
||||
nv3 = nv3 + 1;
|
||||
}
|
||||
arg = delij[n] * delij[p] * this->v2D[nv2];
|
||||
arg1i2m = arg1i2m + arho2m[i][nv2] * arg;
|
||||
arg1j2m = arg1j2m + arho2m[j][nv2] * arg;
|
||||
nv2 = nv2 + 1;
|
||||
}
|
||||
arg1i1m = arg1i1m - arho1m[i][n] * delij[n];
|
||||
arg1j1m = arg1j1m + arho1m[j][n] * delij[n];
|
||||
arg3i3m = arg3i3m - arho3mb[i][n] * delij[n];
|
||||
arg3j3m = arg3j3m + arho3mb[j][n] * delij[n];
|
||||
}
|
||||
}
|
||||
|
||||
// rho0 terms
|
||||
drho0dr1 = drhoa0j * sij;
|
||||
drho0dr2 = drhoa0i * sij;
|
||||
@ -254,32 +330,83 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
drho3drm2[m] = (-a3 * drho3drm2[m] + a3a * arho3b[j][m]) * rhoa3i;
|
||||
}
|
||||
|
||||
// Compute derivatives of weighting functions t wrt rij
|
||||
t1i = t_ave[i][0];
|
||||
t2i = t_ave[i][1];
|
||||
t3i = t_ave[i][2];
|
||||
t1j = t_ave[j][0];
|
||||
t2j = t_ave[j][1];
|
||||
t3j = t_ave[j][2];
|
||||
if (this->msmeamflag) {
|
||||
// rho1m terms
|
||||
a1 = 2 * sij / rij;
|
||||
drho1mdr1 = a1 * (drhoa1mj - rhoa1mj / rij) * arg1i1m;
|
||||
drho1mdr2 = a1 * (drhoa1mi - rhoa1mi / rij) * arg1j1m;
|
||||
drho1mdr1 *= -1.0;
|
||||
drho1mdr2 *= -1.0;
|
||||
a1 = 2.0 * sij / rij;
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho1mdrm1[m] = a1 * rhoa1mj * arho1m[i][m];
|
||||
drho1mdrm2[m] = -a1 * rhoa1mi * arho1m[j][m];
|
||||
}
|
||||
|
||||
if (this->ialloy == 1) {
|
||||
// rho2m terms
|
||||
a2 = 2 * sij / rij2;
|
||||
drho2mdr1 = a2 * (drhoa2mj - 2 * rhoa2mj / rij) * arg1i2m - 2.0 / 3.0 * arho2mb[i] * drhoa2mj * sij;
|
||||
drho2mdr2 = a2 * (drhoa2mi - 2 * rhoa2mi / rij) * arg1j2m - 2.0 / 3.0 * arho2mb[j] * drhoa2mi * sij;
|
||||
a2 = 4 * sij / rij2;
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho2mdrm1[m] = 0.0;
|
||||
drho2mdrm2[m] = 0.0;
|
||||
for (n = 0; n < 3; n++) {
|
||||
drho2mdrm1[m] += arho2m[i][this->vind2D[m][n]] * delij[n];
|
||||
drho2mdrm2[m] -= arho2m[j][this->vind2D[m][n]] * delij[n];
|
||||
}
|
||||
drho2mdrm1[m] = a2 * rhoa2mj * drho2mdrm1[m];
|
||||
drho2mdrm2[m] = -a2 * rhoa2mi * drho2mdrm2[m];
|
||||
}
|
||||
|
||||
a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
|
||||
a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
|
||||
a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
|
||||
a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
|
||||
a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
|
||||
a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
|
||||
// rho3m terms
|
||||
rij3 = rij * rij2;
|
||||
a3 = 2 * sij / rij3;
|
||||
a3a = 6.0 / 5.0 * sij / rij;
|
||||
drho3mdr1 = a3 * (drhoa3mj - 3 * rhoa3mj / rij) * arg1i3m - a3a * (drhoa3mj - rhoa3mj / rij) * arg3i3m;
|
||||
drho3mdr2 = a3 * (drhoa3mi - 3 * rhoa3mi / rij) * arg1j3m - a3a * (drhoa3mi - rhoa3mi / rij) * arg3j3m;
|
||||
drho3mdr1 *= -1.0;
|
||||
drho3mdr2 *= -1.0;
|
||||
|
||||
dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
|
||||
dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
|
||||
dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
|
||||
dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
|
||||
dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
|
||||
dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
|
||||
a3 = 6 * sij / rij3;
|
||||
a3a = 6 * sij / (5 * rij);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho3mdrm1[m] = 0.0;
|
||||
drho3mdrm2[m] = 0.0;
|
||||
nv2 = 0;
|
||||
for (n = 0; n < 3; n++) {
|
||||
for (p = n; p < 3; p++) {
|
||||
arg = delij[n] * delij[p] * this->v2D[nv2];
|
||||
drho3mdrm1[m] += arho3m[i][this->vind3D[m][n][p]] * arg;
|
||||
drho3mdrm2[m] += arho3m[j][this->vind3D[m][n][p]] * arg;
|
||||
nv2 = nv2 + 1;
|
||||
}
|
||||
}
|
||||
drho3mdrm1[m] = (a3 * drho3mdrm1[m] - a3a * arho3mb[i][m]) * rhoa3mj;
|
||||
drho3mdrm2[m] = (-a3 * drho3mdrm2[m] + a3a * arho3mb[j][m]) * rhoa3mi;
|
||||
}
|
||||
} else {
|
||||
for (m = 0; m < 3; m++) {
|
||||
drho1mdrm1[m] = 0.0;
|
||||
drho1mdrm2[m] = 0.0;
|
||||
drho2mdrm1[m] = 0.0;
|
||||
drho2mdrm2[m] = 0.0;
|
||||
drho3mdrm1[m] = 0.0;
|
||||
drho3mdrm2[m] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
} else if (this->ialloy == 2) {
|
||||
// compute derivatives of weighting functions t wrt rij
|
||||
// weighting functions t set to unity for MS-MEAM
|
||||
|
||||
if (this->msmeamflag) {
|
||||
|
||||
t1i = 1.0;
|
||||
t2i = 1.0;
|
||||
t3i = 1.0;
|
||||
t1j = 1.0;
|
||||
t2j = 1.0;
|
||||
t3j = 1.0;
|
||||
dt1dr1 = 0.0;
|
||||
dt1dr2 = 0.0;
|
||||
dt2dr1 = 0.0;
|
||||
@ -289,38 +416,98 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
|
||||
} else {
|
||||
|
||||
ai = 0.0;
|
||||
if (!iszero(rho0[i]))
|
||||
ai = drhoa0j * sij / rho0[i];
|
||||
aj = 0.0;
|
||||
if (!iszero(rho0[j]))
|
||||
aj = drhoa0i * sij / rho0[j];
|
||||
t1i = t_ave[i][0];
|
||||
t2i = t_ave[i][1];
|
||||
t3i = t_ave[i][2];
|
||||
t1j = t_ave[j][0];
|
||||
t2j = t_ave[j][1];
|
||||
t3j = t_ave[j][2];
|
||||
|
||||
if (this->ialloy == 1) {
|
||||
|
||||
a1i = fdiv_zero(drhoa0j * sij, tsq_ave[i][0]);
|
||||
a1j = fdiv_zero(drhoa0i * sij, tsq_ave[j][0]);
|
||||
a2i = fdiv_zero(drhoa0j * sij, tsq_ave[i][1]);
|
||||
a2j = fdiv_zero(drhoa0i * sij, tsq_ave[j][1]);
|
||||
a3i = fdiv_zero(drhoa0j * sij, tsq_ave[i][2]);
|
||||
a3j = fdiv_zero(drhoa0i * sij, tsq_ave[j][2]);
|
||||
|
||||
dt1dr1 = a1i * (t1mj - t1i * MathSpecial::square(t1mj));
|
||||
dt1dr2 = a1j * (t1mi - t1j * MathSpecial::square(t1mi));
|
||||
dt2dr1 = a2i * (t2mj - t2i * MathSpecial::square(t2mj));
|
||||
dt2dr2 = a2j * (t2mi - t2j * MathSpecial::square(t2mi));
|
||||
dt3dr1 = a3i * (t3mj - t3i * MathSpecial::square(t3mj));
|
||||
dt3dr2 = a3j * (t3mi - t3j * MathSpecial::square(t3mi));
|
||||
|
||||
} else if (this->ialloy == 2) {
|
||||
|
||||
dt1dr1 = 0.0;
|
||||
dt1dr2 = 0.0;
|
||||
dt2dr1 = 0.0;
|
||||
dt2dr2 = 0.0;
|
||||
dt3dr1 = 0.0;
|
||||
dt3dr2 = 0.0;
|
||||
|
||||
} else {
|
||||
|
||||
ai = 0.0;
|
||||
if (!iszero(rho0[i]))
|
||||
ai = drhoa0j * sij / rho0[i];
|
||||
aj = 0.0;
|
||||
if (!iszero(rho0[j]))
|
||||
aj = drhoa0i * sij / rho0[j];
|
||||
|
||||
dt1dr1 = ai * (t1mj - t1i);
|
||||
dt1dr2 = aj * (t1mi - t1j);
|
||||
dt2dr1 = ai * (t2mj - t2i);
|
||||
dt2dr2 = aj * (t2mi - t2j);
|
||||
dt3dr1 = ai * (t3mj - t3i);
|
||||
dt3dr2 = aj * (t3mi - t3j);
|
||||
}
|
||||
|
||||
dt1dr1 = ai * (t1mj - t1i);
|
||||
dt1dr2 = aj * (t1mi - t1j);
|
||||
dt2dr1 = ai * (t2mj - t2i);
|
||||
dt2dr2 = aj * (t2mi - t2j);
|
||||
dt3dr1 = ai * (t3mj - t3i);
|
||||
dt3dr2 = aj * (t3mi - t3j);
|
||||
}
|
||||
|
||||
// Compute derivatives of total density wrt rij, sij and rij(3)
|
||||
get_shpfcn(this->lattce_meam[elti][elti], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpi);
|
||||
get_shpfcn(this->lattce_meam[eltj][eltj], this->stheta_meam[elti][elti], this->ctheta_meam[elti][elti], shpj);
|
||||
|
||||
drhodr1 = dgamma1[i] * drho0dr1 +
|
||||
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
|
||||
dt3dr1 * rho3[i] + t3i * drho3dr1) -
|
||||
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = dgamma1[j] * drho0dr2 +
|
||||
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
|
||||
dt3dr2 * rho3[j] + t3j * drho3dr2) -
|
||||
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
|
||||
drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
|
||||
if (this->msmeamflag) {
|
||||
drhodr1 = dgamma1[i] * drho0dr1 +
|
||||
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * (drho1dr1 - drho1mdr1) +
|
||||
dt2dr1 * rho2[i] + t2i * (drho2dr1 - drho2mdr1) +
|
||||
dt3dr1 * rho3[i] + t3i * (drho3dr1 - drho3mdr1)) -
|
||||
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = dgamma1[j] * drho0dr2 +
|
||||
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * (drho1dr2 - drho1mdr2) +
|
||||
dt2dr2 * rho2[j] + t2j * (drho2dr2 - drho2mdr2) +
|
||||
dt3dr2 * rho3[j] + t3j * (drho3dr2 - drho3mdr2)) -
|
||||
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] = dgamma2[i] * (t1i * (drho1drm1[m] - drho1mdrm1[m]) +
|
||||
t2i * (drho2drm1[m] - drho2mdrm1[m]) +
|
||||
t3i * (drho3drm1[m] - drho3mdrm1[m]) );
|
||||
drhodrm2[m] = dgamma2[j] * (t1j * (drho1drm2[m] - drho1mdrm2[m]) +
|
||||
t2j * (drho2drm2[m] - drho2mdrm2[m]) +
|
||||
t3j * (drho3drm2[m] - drho3mdrm2[m]) );
|
||||
}
|
||||
} else {
|
||||
|
||||
drhodr1 = dgamma1[i] * drho0dr1 +
|
||||
dgamma2[i] * (dt1dr1 * rho1[i] + t1i * drho1dr1 + dt2dr1 * rho2[i] + t2i * drho2dr1 +
|
||||
dt3dr1 * rho3[i] + t3i * drho3dr1) -
|
||||
dgamma3[i] * (shpi[0] * dt1dr1 + shpi[1] * dt2dr1 + shpi[2] * dt3dr1);
|
||||
drhodr2 = dgamma1[j] * drho0dr2 +
|
||||
dgamma2[j] * (dt1dr2 * rho1[j] + t1j * drho1dr2 + dt2dr2 * rho2[j] + t2j * drho2dr2 +
|
||||
dt3dr2 * rho3[j] + t3j * drho3dr2) -
|
||||
dgamma3[j] * (shpj[0] * dt1dr2 + shpj[1] * dt2dr2 + shpj[2] * dt3dr2);
|
||||
for (m = 0; m < 3; m++) {
|
||||
drhodrm1[m] = 0.0;
|
||||
drhodrm2[m] = 0.0;
|
||||
drhodrm1[m] = dgamma2[i] * (t1i * drho1drm1[m] + t2i * drho2drm1[m] + t3i * drho3drm1[m]);
|
||||
drhodrm2[m] = dgamma2[j] * (t1j * drho1drm2[m] + t2j * drho2drm2[m] + t3j * drho3drm2[m]);
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives wrt sij, but only if necessary
|
||||
@ -328,17 +515,37 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
drho0ds1 = rhoa0j;
|
||||
drho0ds2 = rhoa0i;
|
||||
a1 = 2.0 / rij;
|
||||
drho1ds1 = a1 * rhoa1j * arg1i1;
|
||||
drho1ds2 = a1 * rhoa1i * arg1j1;
|
||||
a2 = 2.0 / rij2;
|
||||
drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
|
||||
drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
|
||||
a3 = 2.0 / rij3;
|
||||
a3a = 6.0 / (5.0 * rij);
|
||||
|
||||
drho1ds1 = a1 * rhoa1j * arg1i1;
|
||||
drho1ds2 = a1 * rhoa1i * arg1j1;
|
||||
drho2ds1 = a2 * rhoa2j * arg1i2 - 2.0 / 3.0 * arho2b[i] * rhoa2j;
|
||||
drho2ds2 = a2 * rhoa2i * arg1j2 - 2.0 / 3.0 * arho2b[j] * rhoa2i;
|
||||
drho3ds1 = a3 * rhoa3j * arg1i3 - a3a * rhoa3j * arg3i3;
|
||||
drho3ds2 = a3 * rhoa3i * arg1j3 - a3a * rhoa3i * arg3j3;
|
||||
|
||||
if (this->msmeamflag) {
|
||||
drho1mds1 = a1 * rhoa1mj * arg1i1m;
|
||||
drho1mds2 = a1 * rhoa1mi * arg1j1m;
|
||||
drho2mds1 = a2 * rhoa2mj * arg1i2m - 2.0 / 3.0 * arho2mb[i] * rhoa2mj;
|
||||
drho2mds2 = a2 * rhoa2mi * arg1j2m - 2.0 / 3.0 * arho2mb[j] * rhoa2mi;
|
||||
drho3mds1 = a3 * rhoa3mj * arg1i3m - a3a * rhoa3mj * arg3i3m;
|
||||
drho3mds2 = a3 * rhoa3mi * arg1j3m - a3a * rhoa3mi * arg3j3m;
|
||||
drho3mds1 *= -1;
|
||||
drho3mds2 *= -1;
|
||||
} else {
|
||||
drho1mds1 = 0.0;
|
||||
drho1mds2 = 0.0;
|
||||
drho2mds1 = 0.0;
|
||||
drho2mds2 = 0.0;
|
||||
drho3mds1 = 0.0;
|
||||
drho3mds2 = 0.0;
|
||||
}
|
||||
|
||||
if (this->ialloy == 1) {
|
||||
|
||||
a1i = fdiv_zero(rhoa0j, tsq_ave[i][0]);
|
||||
a1j = fdiv_zero(rhoa0i, tsq_ave[j][0]);
|
||||
a2i = fdiv_zero(rhoa0j, tsq_ave[i][1]);
|
||||
@ -379,19 +586,36 @@ MEAM::meam_force(int i, int eflag_global, int eflag_atom, int vflag_global, int
|
||||
dt3ds2 = aj * (t3mi - t3j);
|
||||
}
|
||||
|
||||
drhods1 = dgamma1[i] * drho0ds1 +
|
||||
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
|
||||
dt3ds1 * rho3[i] + t3i * drho3ds1) -
|
||||
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
|
||||
drhods2 = dgamma1[j] * drho0ds2 +
|
||||
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
|
||||
dt3ds2 * rho3[j] + t3j * drho3ds2) -
|
||||
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
|
||||
if (this->msmeamflag) {
|
||||
drhods1 = dgamma1[i] * drho0ds1 +
|
||||
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * (drho1ds1 - drho1mds1) +
|
||||
dt2ds1 * rho2[i] + t2i * (drho2ds1 - drho2mds1) +
|
||||
dt3ds1 * rho3[i] + t3i * (drho3ds1 - drho3mds1)) -
|
||||
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
|
||||
drhods2 = dgamma1[j] * drho0ds2 +
|
||||
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * (drho1ds2 - drho1mds2) +
|
||||
dt2ds2 * rho2[j] + t2j * (drho2ds2 - drho2mds2) +
|
||||
dt3ds2 * rho3[j] + t3j * (drho3ds2 - drho3mds2)) -
|
||||
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
|
||||
}
|
||||
else {
|
||||
drhods1 = dgamma1[i] * drho0ds1 +
|
||||
dgamma2[i] * (dt1ds1 * rho1[i] + t1i * drho1ds1 + dt2ds1 * rho2[i] + t2i * drho2ds1 +
|
||||
dt3ds1 * rho3[i] + t3i * drho3ds1) -
|
||||
dgamma3[i] * (shpi[0] * dt1ds1 + shpi[1] * dt2ds1 + shpi[2] * dt3ds1);
|
||||
drhods2 = dgamma1[j] * drho0ds2 +
|
||||
dgamma2[j] * (dt1ds2 * rho1[j] + t1j * drho1ds2 + dt2ds2 * rho2[j] + t2j * drho2ds2 +
|
||||
dt3ds2 * rho3[j] + t3j * drho3ds2) -
|
||||
dgamma3[j] * (shpj[0] * dt1ds2 + shpj[1] * dt2ds2 + shpj[2] * dt3ds2);
|
||||
}
|
||||
}
|
||||
|
||||
// Compute derivatives of energy wrt rij, sij and rij[3]
|
||||
// Compute derivatives of energy wrt rij, sij and rij[3]
|
||||
// MS-MEAM affects phip
|
||||
|
||||
dUdrij = phip * sij + frhop[i] * drhodr1 + frhop[j] * drhodr2;
|
||||
dUdsij = 0.0;
|
||||
|
||||
if (!iszero(dscrfcn[fnoffset + jn])) {
|
||||
dUdsij = phi + frhop[i] * drhods1 + frhop[j] * drhods2;
|
||||
}
|
||||
|
||||
@ -34,6 +34,11 @@ MEAM::MEAM(Memory* mem)
|
||||
gamma = dgamma1 = dgamma2 = dgamma3 = arho2b = nullptr;
|
||||
arho1 = arho2 = arho3 = arho3b = t_ave = tsq_ave = nullptr;
|
||||
|
||||
// msmeam arrays
|
||||
msmeamflag = 0;
|
||||
arho2mb = nullptr;
|
||||
arho1m = arho2m = arho3m = arho3mb = nullptr;
|
||||
|
||||
maxneigh = 0;
|
||||
scrfcn = dscrfcn = fcpair = nullptr;
|
||||
copymode = 0;
|
||||
@ -43,7 +48,9 @@ MEAM::MEAM(Memory* mem)
|
||||
A_meam[i] = rho0_meam[i] = beta0_meam[i] =
|
||||
beta1_meam[i]= beta2_meam[i] = beta3_meam[i] =
|
||||
t0_meam[i] = t1_meam[i] = t2_meam[i] = t3_meam[i] =
|
||||
rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] = 0.0;
|
||||
rho_ref_meam[i] = ibar_meam[i] = ielt_meam[i] =
|
||||
t1m_meam[i] = t2m_meam[i] = t3m_meam[i] =
|
||||
beta1m_meam[i] = beta2m_meam[i] = beta3m_meam[i] = 0.0;
|
||||
for (int j = 0; j < maxelt; j++) {
|
||||
lattce_meam[i][j] = FCC;
|
||||
Ec_meam[i][j] = re_meam[i][j] = alpha_meam[i][j] = delta_meam[i][j] = ebound_meam[i][j] = attrac_meam[i][j] = repuls_meam[i][j] = 0.0;
|
||||
@ -87,4 +94,13 @@ MEAM::~MEAM()
|
||||
memory->destroy(this->scrfcn);
|
||||
memory->destroy(this->dscrfcn);
|
||||
memory->destroy(this->fcpair);
|
||||
|
||||
// msmeam
|
||||
if (this->msmeamflag){
|
||||
memory->destroy(this->arho1m);
|
||||
memory->destroy(this->arho2m);
|
||||
memory->destroy(this->arho3m);
|
||||
memory->destroy(this->arho2mb);
|
||||
memory->destroy(this->arho3mb);
|
||||
}
|
||||
}
|
||||
|
||||
@ -220,7 +220,6 @@ void MEAM::compute_pair_meam()
|
||||
// loop over r values and compute
|
||||
for (j = 0; j < this->nr; j++) {
|
||||
r = j * this->dr;
|
||||
|
||||
this->phir[nv2][j] = phi_meam(r, a, b);
|
||||
|
||||
// if using second-nearest neighbor, solve recursive problem
|
||||
@ -333,9 +332,12 @@ double MEAM::phi_meam(double r, int a, int b)
|
||||
lattice_t latta /*unused:,lattb*/;
|
||||
double rho_bkgd1, rho_bkgd2;
|
||||
double b11s, b22s;
|
||||
// msmeam
|
||||
double t1m1av, t2m1av, t3m1av, t1m2av, t2m2av, t3m2av;
|
||||
double rho1m1, rho2m1, rho3m1;
|
||||
double rho1m2, rho2m2, rho3m2;
|
||||
|
||||
double phi_m = 0.0;
|
||||
|
||||
// Equation numbers below refer to:
|
||||
// I. Huang et.al., Modelling simul. Mater. Sci. Eng. 3:615
|
||||
|
||||
@ -345,8 +347,16 @@ double MEAM::phi_meam(double r, int a, int b)
|
||||
Z2 = get_Zij(this->lattce_meam[b][b]);
|
||||
Z12 = get_Zij(this->lattce_meam[a][b]);
|
||||
|
||||
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32);
|
||||
|
||||
// this function has extra args for msmeam
|
||||
if (this->msmeamflag) {
|
||||
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
|
||||
&rho1m1, &rho2m1, &rho3m1,
|
||||
&rho1m2, &rho2m2, &rho3m2);
|
||||
} else {
|
||||
get_densref(r, a, b, &rho01, &rho11, &rho21, &rho31, &rho02, &rho12, &rho22, &rho32,
|
||||
nullptr, nullptr, nullptr,
|
||||
nullptr, nullptr, nullptr);
|
||||
}
|
||||
// if densities are too small, numerical problems may result; just return zero
|
||||
if (rho01 <= 1e-14 && rho02 <= 1e-14)
|
||||
return 0.0;
|
||||
@ -374,6 +384,12 @@ double MEAM::phi_meam(double r, int a, int b)
|
||||
get_tavref(&t11av, &t21av, &t31av, &t12av, &t22av, &t32av, this->t1_meam[a], this->t2_meam[a],
|
||||
this->t3_meam[a], this->t1_meam[b], this->t2_meam[b], this->t3_meam[b], r, a, b,
|
||||
this->lattce_meam[a][b]);
|
||||
// with msmeam call twice with different sets of variables
|
||||
if (this->msmeamflag) {
|
||||
get_tavref(&t1m1av, &t2m1av, &t3m1av, &t1m2av, &t2m2av, &t3m2av, this->t1m_meam[a], this->t2m_meam[a],
|
||||
this->t3m_meam[a], this->t1m_meam[b], this->t2m_meam[b], this->t3m_meam[b], r, a, b,
|
||||
this->lattce_meam[a][b]);
|
||||
}
|
||||
}
|
||||
|
||||
// for c11b structure, calculate background electron densities
|
||||
@ -420,17 +436,33 @@ double MEAM::phi_meam(double r, int a, int b)
|
||||
rho0_1 = this->rho0_meam[a] * Z1 * G1;
|
||||
rho0_2 = this->rho0_meam[b] * Z2 * G2;
|
||||
}
|
||||
Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
|
||||
if (rho01 < 1.0e-14)
|
||||
Gam1 = 0.0;
|
||||
else
|
||||
Gam1 = Gam1 / (rho01 * rho01);
|
||||
|
||||
Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
|
||||
if (rho02 < 1.0e-14)
|
||||
Gam2 = 0.0;
|
||||
else
|
||||
Gam2 = Gam2 / (rho02 * rho02);
|
||||
if (this->msmeamflag) {
|
||||
// no additional use of t's here; all included in definitions of rho's for msmeam
|
||||
Gam1 = rho11 + rho21 + rho31 - (rho1m1 + rho2m1 + rho3m1);
|
||||
if (rho01 < 1.0e-14)
|
||||
Gam1 = 0.0;
|
||||
else
|
||||
Gam1 = Gam1 / (rho01 * rho01);
|
||||
Gam2 = rho12 + rho22 + rho32 - (rho1m2 + rho2m2 + rho3m2);
|
||||
if (rho02 < 1.0e-14)
|
||||
Gam2 = 0.0;
|
||||
else
|
||||
Gam2 = Gam2 / (rho02 * rho02);
|
||||
|
||||
} else {
|
||||
Gam1 = (t11av * rho11 + t21av * rho21 + t31av * rho31);
|
||||
if (rho01 < 1.0e-14)
|
||||
Gam1 = 0.0;
|
||||
else
|
||||
Gam1 = Gam1 / (rho01 * rho01);
|
||||
|
||||
Gam2 = (t12av * rho12 + t22av * rho22 + t32av * rho32);
|
||||
if (rho02 < 1.0e-14)
|
||||
Gam2 = 0.0;
|
||||
else
|
||||
Gam2 = Gam2 / (rho02 * rho02);
|
||||
}
|
||||
|
||||
G1 = G_gam(Gam1, this->ibar_meam[a], errorflag);
|
||||
G2 = G_gam(Gam2, this->ibar_meam[b], errorflag);
|
||||
@ -655,7 +687,9 @@ void MEAM::get_sijk(double C, int i, int j, int k, double* sijk)
|
||||
//------------------------------------------------------------------------------c
|
||||
// Calculate density functions, assuming reference configuration
|
||||
void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, double* rho21, double* rho31,
|
||||
double* rho02, double* rho12, double* rho22, double* rho32)
|
||||
double* rho02, double* rho12, double* rho22, double* rho32,
|
||||
double* rho1m1, double* rho2m1, double* rho3m1,
|
||||
double* rho1m2, double* rho2m2, double* rho3m2)
|
||||
{
|
||||
double a1, a2;
|
||||
double s[3];
|
||||
@ -666,18 +700,39 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
|
||||
double rhoa02, rhoa12, rhoa22, rhoa32;
|
||||
double arat, scrn, denom;
|
||||
double C, s111, s112, s221, S11, S22;
|
||||
// msmeam
|
||||
double rhoa1m1, rhoa2m1, rhoa3m1, rhoa1m2, rhoa2m2, rhoa3m2;
|
||||
|
||||
a1 = r / this->re_meam[a][a] - 1.0;
|
||||
a2 = r / this->re_meam[b][b] - 1.0;
|
||||
|
||||
rhoa01 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta0_meam[a] * a1);
|
||||
rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
|
||||
rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
|
||||
rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
|
||||
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
|
||||
rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
|
||||
rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
|
||||
rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
|
||||
|
||||
if (this->msmeamflag) {
|
||||
// the rho variables are multiplied by t here since ialloy not needed in msmeam
|
||||
rhoa11 = this->rho0_meam[a] * this->t1_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
|
||||
rhoa21 = this->rho0_meam[a] * this->t2_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
|
||||
rhoa31 = this->rho0_meam[a] * this->t3_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
|
||||
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
|
||||
rhoa12 = this->rho0_meam[b] * this->t1_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
|
||||
rhoa22 = this->rho0_meam[b] * this->t2_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
|
||||
rhoa32 = this->rho0_meam[b] * this->t3_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
|
||||
// msmeam specific rho vars
|
||||
rhoa1m1 = this->rho0_meam[a] * this->t1m_meam[a] * MathSpecial::fm_exp(-this->beta1m_meam[a] * a1);
|
||||
rhoa2m1 = this->rho0_meam[a] * this->t2m_meam[a] * MathSpecial::fm_exp(-this->beta2m_meam[a] * a1);
|
||||
rhoa3m1 = this->rho0_meam[a] * this->t3m_meam[a] * MathSpecial::fm_exp(-this->beta3m_meam[a] * a1);
|
||||
rhoa1m2 = this->rho0_meam[b] * this->t1m_meam[b] * MathSpecial::fm_exp(-this->beta1m_meam[b] * a2);
|
||||
rhoa2m2 = this->rho0_meam[b] * this->t2m_meam[b] * MathSpecial::fm_exp(-this->beta2m_meam[b] * a2);
|
||||
rhoa3m2 = this->rho0_meam[b] * this->t3m_meam[b] * MathSpecial::fm_exp(-this->beta3m_meam[b] * a2);
|
||||
} else {
|
||||
rhoa11 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta1_meam[a] * a1);
|
||||
rhoa21 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta2_meam[a] * a1);
|
||||
rhoa31 = this->rho0_meam[a] * MathSpecial::fm_exp(-this->beta3_meam[a] * a1);
|
||||
rhoa02 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta0_meam[b] * a2);
|
||||
rhoa12 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta1_meam[b] * a2);
|
||||
rhoa22 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta2_meam[b] * a2);
|
||||
rhoa32 = this->rho0_meam[b] * MathSpecial::fm_exp(-this->beta3_meam[b] * a2);
|
||||
}
|
||||
|
||||
lat = this->lattce_meam[a][b];
|
||||
|
||||
@ -689,7 +744,16 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
|
||||
*rho12 = 0.0;
|
||||
*rho22 = 0.0;
|
||||
*rho32 = 0.0;
|
||||
if (this->msmeamflag) {
|
||||
*rho1m1 = 0.0;
|
||||
*rho2m1 = 0.0;
|
||||
*rho3m1 = 0.0;
|
||||
*rho1m2 = 0.0;
|
||||
*rho2m2 = 0.0;
|
||||
*rho3m2 = 0.0;
|
||||
}
|
||||
|
||||
// keep track of density components separately; combine in the calling subroutine
|
||||
switch (lat) {
|
||||
case FCC:
|
||||
*rho01 = 12.0 * rhoa02;
|
||||
@ -710,12 +774,20 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
|
||||
*rho02 = 4.0 * rhoa01;
|
||||
*rho31 = 32.0 / 9.0 * rhoa32 * rhoa32;
|
||||
*rho32 = 32.0 / 9.0 * rhoa31 * rhoa31;
|
||||
if (this->msmeamflag) {
|
||||
*rho3m1 = 32.0 / 9.0 * rhoa3m2 * rhoa3m2;
|
||||
*rho3m2 = 32.0 / 9.0 * rhoa3m1 * rhoa3m1;
|
||||
}
|
||||
break;
|
||||
case HCP:
|
||||
*rho01 = 12 * rhoa02;
|
||||
*rho02 = 12 * rhoa01;
|
||||
*rho31 = 1.0 / 3.0 * rhoa32 * rhoa32;
|
||||
*rho32 = 1.0 / 3.0 * rhoa31 * rhoa31;
|
||||
if (this->msmeamflag) {
|
||||
*rho3m1 = 1.0 / 3.0 * rhoa3m2 * rhoa3m2;
|
||||
*rho3m2 = 1.0 / 3.0 * rhoa3m1 * rhoa3m1;
|
||||
}
|
||||
break;
|
||||
case DIM:
|
||||
get_shpfcn(DIM, 0, 0, s);
|
||||
@ -727,6 +799,14 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
|
||||
*rho22 = s[1] * rhoa21 * rhoa21;
|
||||
*rho31 = s[2] * rhoa32 * rhoa32;
|
||||
*rho32 = s[2] * rhoa31 * rhoa31;
|
||||
if (this->msmeamflag) {
|
||||
*rho1m1 = s[0] * rhoa1m2 * rhoa1m2;
|
||||
*rho1m2 = s[0] * rhoa1m1 * rhoa1m1;
|
||||
*rho2m1 = s[1] * rhoa2m2 * rhoa2m2;
|
||||
*rho2m2 = s[1] * rhoa2m1 * rhoa2m1;
|
||||
*rho3m1 = s[2] * rhoa3m2 * rhoa3m2;
|
||||
*rho3m2 = s[2] * rhoa3m1 * rhoa3m1;
|
||||
}
|
||||
break;
|
||||
case C11:
|
||||
*rho01 = rhoa01;
|
||||
@ -737,17 +817,28 @@ void MEAM::get_densref(double r, int a, int b, double* rho01, double* rho11, dou
|
||||
*rho22 = rhoa22;
|
||||
*rho31 = rhoa31;
|
||||
*rho32 = rhoa32;
|
||||
if (this->msmeamflag) {
|
||||
*rho1m1 = rhoa1m1;
|
||||
*rho1m2 = rhoa1m2;
|
||||
*rho2m1 = rhoa2m1;
|
||||
*rho2m2 = rhoa2m2;
|
||||
*rho3m1 = rhoa3m1;
|
||||
*rho3m2 = rhoa3m2;
|
||||
}
|
||||
break;
|
||||
case L12:
|
||||
*rho01 = 8 * rhoa01 + 4 * rhoa02;
|
||||
*rho02 = 12 * rhoa01;
|
||||
if (this->ialloy == 1) {
|
||||
if (this->ialloy ==1){
|
||||
*rho21 = 8. / 3. * MathSpecial::square(rhoa21 * this->t2_meam[a] - rhoa22 * this->t2_meam[b]);
|
||||
denom = 8 * rhoa01 * MathSpecial::square(this->t2_meam[a]) + 4 * rhoa02 * MathSpecial::square(this->t2_meam[b]);
|
||||
if (denom > 0.)
|
||||
*rho21 = *rho21 / denom * *rho01;
|
||||
} else
|
||||
*rho21 = 8. / 3. * (rhoa21 - rhoa22) * (rhoa21 - rhoa22);
|
||||
if (this->msmeamflag) {
|
||||
*rho2m1 = 8. / 3. * (rhoa2m1 - rhoa2m2) * (rhoa2m1 - rhoa2m2);
|
||||
}
|
||||
break;
|
||||
case B2:
|
||||
*rho01 = 8.0 * rhoa02;
|
||||
@ -864,6 +955,7 @@ void MEAM::interpolate_meam(int ind)
|
||||
this->rdrar = 1.0 / drar;
|
||||
|
||||
// phir interp
|
||||
|
||||
for (j = 0; j < this->nrar; j++) {
|
||||
this->phirar[ind][j] = this->phir[ind][j];
|
||||
}
|
||||
|
||||
@ -36,7 +36,8 @@ void
|
||||
MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*/, double* alpha,
|
||||
double* b0, double* b1, double* b2, double* b3, double* alat, double* esub,
|
||||
double* asub, double* t0, double* t1, double* t2, double* t3, double* rozero,
|
||||
int* ibar)
|
||||
int* ibar, double* b1m, double *b2m, double *b3m, double *t1m, double *t2m,
|
||||
double *t3m)
|
||||
{
|
||||
|
||||
int i;
|
||||
@ -53,6 +54,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
|
||||
this->beta1_meam[i] = b1[i];
|
||||
this->beta2_meam[i] = b2[i];
|
||||
this->beta3_meam[i] = b3[i];
|
||||
if (this->msmeamflag){
|
||||
this->beta1m_meam[i] = b1m[i];
|
||||
this->beta2m_meam[i] = b2m[i];
|
||||
this->beta3m_meam[i] = b3m[i];
|
||||
}
|
||||
tmplat[i] = alat[i];
|
||||
this->Ec_meam[i][i] = esub[i];
|
||||
this->A_meam[i] = asub[i];
|
||||
@ -60,6 +66,11 @@ MEAM::meam_setup_global(int nelt, lattice_t* lat, int* ielement, double* /*atwt*
|
||||
this->t1_meam[i] = t1[i];
|
||||
this->t2_meam[i] = t2[i];
|
||||
this->t3_meam[i] = t3[i];
|
||||
if (this->msmeamflag){
|
||||
this->t1m_meam[i] = t1m[i];
|
||||
this->t2m_meam[i] = t2m[i];
|
||||
this->t3m_meam[i] = t3m[i];
|
||||
}
|
||||
this->rho0_meam[i] = rozero[i];
|
||||
this->ibar_meam[i] = ibar[i];
|
||||
|
||||
|
||||
@ -58,13 +58,12 @@ PairMEAM::PairMEAM(LAMMPS *lmp) : Pair(lmp)
|
||||
allocated = 0;
|
||||
|
||||
nlibelements = 0;
|
||||
|
||||
meam_inst = new MEAM(memory);
|
||||
meam_inst->msmeamflag = msmeamflag = 0;
|
||||
myname = "meam";
|
||||
|
||||
scale = nullptr;
|
||||
|
||||
// set comm size needed by this Pair
|
||||
|
||||
comm_forward = 38;
|
||||
comm_reverse = 30;
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
@ -93,7 +92,6 @@ void PairMEAM::compute(int eflag, int vflag)
|
||||
int i,ii,n,inum_half,errorflag;
|
||||
int *ilist_half,*numneigh_half,**firstneigh_half;
|
||||
int *numneigh_full,**firstneigh_full;
|
||||
|
||||
ev_init(eflag,vflag);
|
||||
|
||||
// neighbor list info
|
||||
@ -133,7 +131,6 @@ void PairMEAM::compute(int eflag, int vflag)
|
||||
|
||||
int offset = 0;
|
||||
errorflag = 0;
|
||||
|
||||
for (ii = 0; ii < inum_half; ii++) {
|
||||
i = ilist_half[ii];
|
||||
meam_inst->meam_dens_init(i,ntype,type,map,x,
|
||||
@ -142,9 +139,7 @@ void PairMEAM::compute(int eflag, int vflag)
|
||||
offset);
|
||||
offset += numneigh_half[i];
|
||||
}
|
||||
|
||||
comm->reverse_comm(this);
|
||||
|
||||
meam_inst->meam_dens_final(nlocal,eflag_either,eflag_global,eflag_atom,
|
||||
&eng_vdwl,eatom,ntype,type,map,scale,errorflag);
|
||||
if (errorflag)
|
||||
@ -159,7 +154,6 @@ void PairMEAM::compute(int eflag, int vflag)
|
||||
|
||||
double **vptr = nullptr;
|
||||
if (vflag_atom) vptr = vatom;
|
||||
|
||||
for (ii = 0; ii < inum_half; ii++) {
|
||||
i = ilist_half[ii];
|
||||
meam_inst->meam_force(i,eflag_global,eflag_atom,vflag_global,
|
||||
@ -169,7 +163,6 @@ void PairMEAM::compute(int eflag, int vflag)
|
||||
offset,f,vptr,virial);
|
||||
offset += numneigh_half[i];
|
||||
}
|
||||
|
||||
if (vflag_fdotr) virial_fdotr_compute();
|
||||
}
|
||||
|
||||
@ -193,7 +186,17 @@ void PairMEAM::allocate()
|
||||
|
||||
void PairMEAM::settings(int narg, char ** /*arg*/)
|
||||
{
|
||||
if (narg != 0) error->all(FLERR,"Illegal pair_style command");
|
||||
if (narg != 0) error->all(FLERR,"Illegal pair_style {} command", myname);
|
||||
|
||||
// set comm size needed by this Pair
|
||||
|
||||
if (msmeamflag) {
|
||||
comm_forward = 38+23; // plus 23 for msmeam
|
||||
comm_reverse = 30+23; // plus 23 for msmeam
|
||||
} else {
|
||||
comm_forward = 38;
|
||||
comm_reverse = 30;
|
||||
}
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
@ -206,12 +209,7 @@ void PairMEAM::coeff(int narg, char **arg)
|
||||
|
||||
if (!allocated) allocate();
|
||||
|
||||
if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients");
|
||||
|
||||
// ensure I,J args are * *
|
||||
|
||||
if (strcmp(arg[0],"*") != 0 || strcmp(arg[1],"*") != 0)
|
||||
error->all(FLERR,"Incorrect args for pair coefficients");
|
||||
if (narg < 6) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
|
||||
|
||||
// check for presence of first meam file
|
||||
|
||||
@ -239,7 +237,7 @@ void PairMEAM::coeff(int narg, char **arg)
|
||||
}
|
||||
if (paridx < 0) error->all(FLERR,"No MEAM parameter file in pair coefficients");
|
||||
if ((narg - paridx - 1) != atom->ntypes)
|
||||
error->all(FLERR,"Incorrect args for pair coefficients");
|
||||
error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
|
||||
|
||||
// MEAM element names between 2 filenames
|
||||
// nlibelements = # of MEAM elements
|
||||
@ -282,7 +280,7 @@ void PairMEAM::coeff(int narg, char **arg)
|
||||
if (libelements[j] == arg[i]) break;
|
||||
if (j < nlibelements) map[m] = j;
|
||||
else if (strcmp(arg[i],"NULL") == 0) map[m] = -1;
|
||||
else error->all(FLERR,"Incorrect args for pair coefficients");
|
||||
else error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
|
||||
}
|
||||
|
||||
// clear setflag since coeff() called once with I,J = * *
|
||||
@ -307,7 +305,7 @@ void PairMEAM::coeff(int narg, char **arg)
|
||||
}
|
||||
}
|
||||
|
||||
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
|
||||
if (count == 0) error->all(FLERR,"Incorrect args for pair style {} coefficients", myname);
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
@ -317,7 +315,7 @@ void PairMEAM::coeff(int narg, char **arg)
|
||||
void PairMEAM::init_style()
|
||||
{
|
||||
if (force->newton_pair == 0)
|
||||
error->all(FLERR,"Pair style MEAM requires newton pair on");
|
||||
error->all(FLERR,"Pair style {} requires newton pair on", myname);
|
||||
|
||||
// need a full and a half neighbor list
|
||||
|
||||
@ -360,7 +358,9 @@ void PairMEAM::read_files(const std::string &globalfile,
|
||||
|
||||
void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
{
|
||||
|
||||
// allocate parameter arrays
|
||||
|
||||
std::vector<lattice_t> lat(nlibelements);
|
||||
std::vector<int> ielement(nlibelements);
|
||||
std::vector<int> ibar(nlibelements);
|
||||
@ -381,6 +381,15 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
std::vector<double> rozero(nlibelements);
|
||||
std::vector<bool> found(nlibelements, false);
|
||||
|
||||
// allocate 6 extra arrays for msmeam
|
||||
|
||||
std::vector<double> b1m(nlibelements);
|
||||
std::vector<double> b2m(nlibelements);
|
||||
std::vector<double> b3m(nlibelements);
|
||||
std::vector<double> t1m(nlibelements);
|
||||
std::vector<double> t2m(nlibelements);
|
||||
std::vector<double> t3m(nlibelements);
|
||||
|
||||
// open global meamf file on proc 0
|
||||
|
||||
if (comm->me == 0) {
|
||||
@ -416,8 +425,7 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
std::string lattice_type = values.next_string();
|
||||
|
||||
if (!MEAM::str_to_lat(lattice_type, true, lat[index]))
|
||||
error->one(FLERR,"Unrecognized lattice type in MEAM "
|
||||
"library file: {}", lattice_type);
|
||||
error->one(FLERR,"Unrecognized lattice type in MEAM library file: {}", lattice_type);
|
||||
|
||||
// store parameters
|
||||
|
||||
@ -429,6 +437,11 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
b1[index] = values.next_double();
|
||||
b2[index] = values.next_double();
|
||||
b3[index] = values.next_double();
|
||||
if (msmeamflag) {
|
||||
b1m[index] = values.next_double();
|
||||
b2m[index] = values.next_double();
|
||||
b3m[index] = values.next_double();
|
||||
}
|
||||
alat[index] = values.next_double();
|
||||
esub[index] = values.next_double();
|
||||
asub[index] = values.next_double();
|
||||
@ -436,15 +449,20 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
t1[index] = values.next_double();
|
||||
t2[index] = values.next_double();
|
||||
t3[index] = values.next_double();
|
||||
if (msmeamflag) {
|
||||
t1m[index] = values.next_double();
|
||||
t2m[index] = values.next_double();
|
||||
t3m[index] = values.next_double();
|
||||
}
|
||||
rozero[index] = values.next_double();
|
||||
ibar[index] = values.next_int();
|
||||
|
||||
if (!isone(t0[index]))
|
||||
error->one(FLERR,"Unsupported parameter in MEAM library file: t0!=1");
|
||||
error->one(FLERR,"Unsupported parameter in MEAM library file: t0 != 1");
|
||||
|
||||
// z given is ignored: if this is mismatched, we definitely won't do what the user said -> fatal error
|
||||
if (z[index] != MEAM::get_Zij(lat[index]))
|
||||
error->one(FLERR,"Mismatched parameter in MEAM library file: z!=lat");
|
||||
error->one(FLERR,"Mismatched parameter in MEAM library file: z != lat");
|
||||
|
||||
nset++;
|
||||
} catch (TokenizerException &e) {
|
||||
@ -484,13 +502,29 @@ void PairMEAM::read_global_meam_file(const std::string &globalfile)
|
||||
MPI_Bcast(t2.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(t3.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(rozero.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
// distribute msmeam parameter sets
|
||||
MPI_Bcast(b1m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(b2m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(b3m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(t1m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(t2m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
MPI_Bcast(t3m.data(), nlibelements, MPI_DOUBLE, 0, world);
|
||||
|
||||
// pass element parameters to MEAM package
|
||||
|
||||
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
|
||||
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
|
||||
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
|
||||
t2.data(), t3.data(), rozero.data(), ibar.data());
|
||||
if (msmeamflag) {
|
||||
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
|
||||
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
|
||||
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
|
||||
t2.data(), t3.data(), rozero.data(), ibar.data(), b1m.data(),
|
||||
b2m.data(), b3m.data(), t1m.data(), t2m.data(), t3m.data());
|
||||
} else {
|
||||
meam_inst->meam_setup_global(nlibelements, lat.data(), ielement.data(), atwt.data(),
|
||||
alpha.data(), b0.data(), b1.data(), b2.data(), b3.data(),
|
||||
alat.data(), esub.data(), asub.data(), t0.data(), t1.data(),
|
||||
t2.data(), t3.data(), rozero.data(), ibar.data(), nullptr,
|
||||
nullptr, nullptr, nullptr, nullptr, nullptr);
|
||||
}
|
||||
|
||||
// set element masses
|
||||
|
||||
@ -613,6 +647,23 @@ int PairMEAM::pack_forward_comm(int n, int *list, double *buf,
|
||||
buf[m++] = meam_inst->tsq_ave[j][0];
|
||||
buf[m++] = meam_inst->tsq_ave[j][1];
|
||||
buf[m++] = meam_inst->tsq_ave[j][2];
|
||||
if (msmeamflag) {
|
||||
buf[m++] = meam_inst->arho2mb[j];
|
||||
buf[m++] = meam_inst->arho1m[j][0];
|
||||
buf[m++] = meam_inst->arho1m[j][1];
|
||||
buf[m++] = meam_inst->arho1m[j][2];
|
||||
buf[m++] = meam_inst->arho2m[j][0];
|
||||
buf[m++] = meam_inst->arho2m[j][1];
|
||||
buf[m++] = meam_inst->arho2m[j][2];
|
||||
buf[m++] = meam_inst->arho2m[j][3];
|
||||
buf[m++] = meam_inst->arho2m[j][4];
|
||||
buf[m++] = meam_inst->arho2m[j][5];
|
||||
for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[j][k];
|
||||
buf[m++] = meam_inst->arho3mb[j][0];
|
||||
buf[m++] = meam_inst->arho3mb[j][1];
|
||||
buf[m++] = meam_inst->arho3mb[j][2];
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
return m;
|
||||
@ -656,6 +707,22 @@ void PairMEAM::unpack_forward_comm(int n, int first, double *buf)
|
||||
meam_inst->tsq_ave[i][0] = buf[m++];
|
||||
meam_inst->tsq_ave[i][1] = buf[m++];
|
||||
meam_inst->tsq_ave[i][2] = buf[m++];
|
||||
if (msmeamflag) {
|
||||
meam_inst->arho2mb[i] = buf[m++];
|
||||
meam_inst->arho1m[i][0] = buf[m++];
|
||||
meam_inst->arho1m[i][1] = buf[m++];
|
||||
meam_inst->arho1m[i][2] = buf[m++];
|
||||
meam_inst->arho2m[i][0] = buf[m++];
|
||||
meam_inst->arho2m[i][1] = buf[m++];
|
||||
meam_inst->arho2m[i][2] = buf[m++];
|
||||
meam_inst->arho2m[i][3] = buf[m++];
|
||||
meam_inst->arho2m[i][4] = buf[m++];
|
||||
meam_inst->arho2m[i][5] = buf[m++];
|
||||
for (k = 0; k < 10; k++) meam_inst->arho3m[i][k] = buf[m++];
|
||||
meam_inst->arho3mb[i][0] = buf[m++];
|
||||
meam_inst->arho3mb[i][1] = buf[m++];
|
||||
meam_inst->arho3mb[i][2] = buf[m++];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -689,6 +756,22 @@ int PairMEAM::pack_reverse_comm(int n, int first, double *buf)
|
||||
buf[m++] = meam_inst->tsq_ave[i][0];
|
||||
buf[m++] = meam_inst->tsq_ave[i][1];
|
||||
buf[m++] = meam_inst->tsq_ave[i][2];
|
||||
if (msmeamflag) {
|
||||
buf[m++] = meam_inst->arho2mb[i];
|
||||
buf[m++] = meam_inst->arho1m[i][0];
|
||||
buf[m++] = meam_inst->arho1m[i][1];
|
||||
buf[m++] = meam_inst->arho1m[i][2];
|
||||
buf[m++] = meam_inst->arho2m[i][0];
|
||||
buf[m++] = meam_inst->arho2m[i][1];
|
||||
buf[m++] = meam_inst->arho2m[i][2];
|
||||
buf[m++] = meam_inst->arho2m[i][3];
|
||||
buf[m++] = meam_inst->arho2m[i][4];
|
||||
buf[m++] = meam_inst->arho2m[i][5];
|
||||
for (k = 0; k < 10; k++) buf[m++] = meam_inst->arho3m[i][k];
|
||||
buf[m++] = meam_inst->arho3mb[i][0];
|
||||
buf[m++] = meam_inst->arho3mb[i][1];
|
||||
buf[m++] = meam_inst->arho3mb[i][2];
|
||||
}
|
||||
}
|
||||
|
||||
return m;
|
||||
@ -724,7 +807,25 @@ void PairMEAM::unpack_reverse_comm(int n, int *list, double *buf)
|
||||
meam_inst->tsq_ave[j][0] += buf[m++];
|
||||
meam_inst->tsq_ave[j][1] += buf[m++];
|
||||
meam_inst->tsq_ave[j][2] += buf[m++];
|
||||
if (msmeamflag) {
|
||||
meam_inst->arho2mb[j] += buf[m++];
|
||||
meam_inst->arho1m[j][0] += buf[m++];
|
||||
meam_inst->arho1m[j][1] += buf[m++];
|
||||
meam_inst->arho1m[j][2] += buf[m++];
|
||||
meam_inst->arho2m[j][0] += buf[m++];
|
||||
meam_inst->arho2m[j][1] += buf[m++];
|
||||
meam_inst->arho2m[j][2] += buf[m++];
|
||||
meam_inst->arho2m[j][3] += buf[m++];
|
||||
meam_inst->arho2m[j][4] += buf[m++];
|
||||
meam_inst->arho2m[j][5] += buf[m++];
|
||||
for (k = 0; k < 10; k++) meam_inst->arho3m[j][k] += buf[m++];
|
||||
meam_inst->arho3mb[j][0] += buf[m++];
|
||||
meam_inst->arho3mb[j][1] += buf[m++];
|
||||
meam_inst->arho3mb[j][2] += buf[m++];
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
|
||||
/* ----------------------------------------------------------------------
|
||||
|
||||
@ -47,6 +47,8 @@ class PairMEAM : public Pair {
|
||||
class MEAM *meam_inst;
|
||||
double cutmax; // max cutoff for all elements
|
||||
int nlibelements; // # of library elements
|
||||
int msmeamflag; // 0 (default) for normal MEAM, 1 for MS-MEAM
|
||||
std::string myname; // name of the pair style
|
||||
std::vector<std::string> libelements; // names of library elements
|
||||
std::vector<double> mass; // mass of library element
|
||||
|
||||
|
||||
25
src/MEAM/pair_meam_ms.cpp
Normal file
25
src/MEAM/pair_meam_ms.cpp
Normal file
@ -0,0 +1,25 @@
|
||||
/* ----------------------------------------------------------------------
|
||||
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
|
||||
https://www.lammps.org/ Sandia National Laboratories
|
||||
LAMMPS development team: developers@lammps.org
|
||||
|
||||
Copyright (2003) Sandia Corporation. Under the terms of Contract
|
||||
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
|
||||
certain rights in this software. This software is distributed under
|
||||
the GNU General Public License.
|
||||
|
||||
See the README file in the top-level LAMMPS directory.
|
||||
------------------------------------------------------------------------- */
|
||||
|
||||
#include "pair_meam_ms.h"
|
||||
#include "meam.h"
|
||||
|
||||
using namespace LAMMPS_NS;
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
|
||||
PairMEAMMS::PairMEAMMS(LAMMPS *lmp) : PairMEAM(lmp)
|
||||
{
|
||||
meam_inst->msmeamflag = msmeamflag = 1;
|
||||
myname = "meam/ms";
|
||||
}
|
||||
33
src/MEAM/pair_meam_ms.h
Normal file
33
src/MEAM/pair_meam_ms.h
Normal file
@ -0,0 +1,33 @@
|
||||
/* -*- c++ -*- ----------------------------------------------------------
|
||||
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
|
||||
https://www.lammps.org/ Sandia National Laboratories
|
||||
LAMMPS development team: developers@lammps.org
|
||||
|
||||
Copyright (2003) Sandia Corporation. Under the terms of Contract
|
||||
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
|
||||
certain rights in this software. This software is distributed under
|
||||
the GNU General Public License.
|
||||
|
||||
See the README file in the top-level LAMMPS directory.
|
||||
------------------------------------------------------------------------- */
|
||||
|
||||
#ifdef PAIR_CLASS
|
||||
// clang-format off
|
||||
PairStyle(meam/ms,PairMEAMMS);
|
||||
// clang-format on
|
||||
#else
|
||||
|
||||
#ifndef LMP_PAIR_MEAM_MS_H
|
||||
#define LMP_PAIR_MEAM_MS_H
|
||||
|
||||
#include "pair_meam.h"
|
||||
|
||||
namespace LAMMPS_NS {
|
||||
|
||||
class PairMEAMMS : public PairMEAM {
|
||||
public:
|
||||
PairMEAMMS(class LAMMPS *);
|
||||
};
|
||||
} // namespace LAMMPS_NS
|
||||
#endif
|
||||
#endif
|
||||
94
unittest/force-styles/tests/atomic-pair-meam_ms.yaml
Normal file
94
unittest/force-styles/tests/atomic-pair-meam_ms.yaml
Normal file
@ -0,0 +1,94 @@
|
||||
---
|
||||
lammps_version: 22 Dec 2022
|
||||
tags: slow
|
||||
date_generated: Thu Jan 26 15:27:03 2023
|
||||
epsilon: 2.5e-12
|
||||
skip_tests:
|
||||
prerequisites: ! |
|
||||
pair meam/ms
|
||||
pre_commands: ! |
|
||||
variable newton_pair delete
|
||||
if "$(is_active(package,gpu)) > 0.0" then "variable newton_pair index off" else "variable newton_pair index on"
|
||||
post_commands: ! ""
|
||||
input_file: in.metal
|
||||
pair_style: meam/ms
|
||||
pair_coeff: ! |
|
||||
* * library.msmeam H Ga4 HGa.msmeam H Ga4
|
||||
extract: ! |
|
||||
scale 2
|
||||
natoms: 32
|
||||
init_vdwl: 785.6030480758675
|
||||
init_coul: 0
|
||||
init_stress: ! |2-
|
||||
3.3502530994900699e+03 3.6405858278699407e+03 3.6349804214165547e+03 -3.1609283411508039e+02 -7.9448207656135153e+01 -1.9854140603340727e+02
|
||||
init_forces: ! |2
|
||||
1 1.2872255079741514e+01 -7.5031848810810864e-01 4.5969595156096510e+01
|
||||
2 -3.9028679722038632e+01 -1.5647800180326567e+02 -1.6643992152928173e+00
|
||||
3 -6.1521549955194672e+01 2.6970968316419874e+02 -9.6866430262650326e+01
|
||||
4 3.1462579880342336e+01 4.0240291291218455e+01 1.1654869213327775e+01
|
||||
5 1.4859248182951113e+01 -3.4132880749392825e+01 6.7430378007130244e+01
|
||||
6 6.4609571260694096e+00 -3.8973222482916441e+01 -2.8510000379627442e+01
|
||||
7 7.8114612113500250e+00 -1.0421431668544374e+01 -4.2887607385766536e+01
|
||||
8 -4.8934215863351795e+01 -6.3567347969802590e-01 1.1845972792272754e+02
|
||||
9 9.4089549606898402e+01 -7.4342942103394511e+00 2.5331198575951383e+01
|
||||
10 1.5130369934140692e+01 -5.9245630928969938e+01 -6.7469126603400198e+01
|
||||
11 -2.5176547213746847e+01 1.1577205529172168e+02 -2.2897457133540517e+01
|
||||
12 6.2237686199502349e+01 2.0501996047945163e+01 -2.8805091517252826e+01
|
||||
13 -5.9438589221526925e+01 3.0453092653824072e+01 -1.9919245831196157e+01
|
||||
14 6.9128305482543766e+01 -7.7400771634148342e+01 3.3376079908119145e+01
|
||||
15 -4.9671207786831857e+01 -4.9520814527298228e+01 8.4325181097614305e+01
|
||||
16 -1.1782591146017666e+01 -3.2478963020209051e+01 1.5503663677714293e+01
|
||||
17 9.0881787245915220e+00 6.2377477671714963e+01 -4.0411006180232363e+01
|
||||
18 -4.2285082775720454e+01 2.4883979527636967e+01 -4.4858149086530510e+00
|
||||
19 -8.0259798420493979e+01 9.6356660229207137e+01 6.0543230952477984e+01
|
||||
20 8.0924547938759346e+01 7.1034504027236025e+01 -7.1958482512489610e+01
|
||||
21 1.0833434220705425e+02 -1.5973910256481020e+02 -2.5432700070393153e+01
|
||||
22 -2.3754601906353900e+00 5.2216955012971823e+01 4.7112051341131576e+00
|
||||
23 -2.7227169255996543e+01 8.1968603165764222e+01 4.6535834898716878e+01
|
||||
24 -2.9230758067555616e+01 6.5909555829367733e+01 -2.8250697734131258e+01
|
||||
25 -5.1310041582953993e+01 -3.0895272949222822e+01 -5.4271286813003794e+00
|
||||
26 3.9605941911194620e+01 -5.5919050176828883e+01 -1.0209061328106253e+01
|
||||
27 8.2934427989660890e+01 6.1956200199325636e+01 5.0072108788590960e+01
|
||||
28 -7.8572755094413296e+01 -3.9613391730681300e+01 -2.6183413623428891e+00
|
||||
29 6.9475725072041925e+01 -6.0535433603583563e+01 -1.4566536349135829e+01
|
||||
30 -2.4347184151182930e+01 -1.9359391333689970e+02 -2.6718379302915952e+01
|
||||
31 7.7351971629808688e+01 -7.0102650745312999e+01 -5.4615048867524763e+01
|
||||
32 -1.5060591772899014e+02 8.4489763988097266e+01 2.9799482293372058e+01
|
||||
run_vdwl: 682.3107192428497
|
||||
run_coul: 0
|
||||
run_stress: ! |2-
|
||||
3.2247564044913129e+03 3.3749506031067485e+03 3.3223794967215117e+03 -2.8460979167554797e+02 -7.2614457076660575e+00 -3.1510685747732862e+02
|
||||
run_forces: ! |2
|
||||
1 -1.2037185973996296e+01 -2.5090364403764944e+01 1.4014184973113366e+01
|
||||
2 -3.7365848425239264e+01 -1.5871199357658887e+02 3.7846333470446991e+00
|
||||
3 -3.2057228694304293e+01 2.5316344962361612e+02 -6.0679585186816752e+01
|
||||
4 2.9086197614116237e+01 4.8267528016068823e+01 4.3387429619749920e+00
|
||||
5 -1.1672554618399744e+01 -2.6840760926124332e+01 4.9694308545223279e+01
|
||||
6 1.1892092913978592e+01 -4.9360840569608243e+01 -2.3083171938147949e+01
|
||||
7 2.1084251901459215e+01 -4.8251731643401072e+00 -3.8474871193885967e+01
|
||||
8 -5.7775944085787714e+01 1.3522956442661442e+01 1.1661345819661486e+02
|
||||
9 7.2926105059437930e+01 4.8686056096860133e+00 2.3817134806042311e+01
|
||||
10 1.7307367990304396e+01 -3.0865570121704572e+01 -1.2314307646704794e+01
|
||||
11 -1.1341297645054201e+01 9.1441145595173211e+01 -2.1806407500802493e+01
|
||||
12 4.0645024127126625e+01 1.2207243511090397e+01 -2.6757649464936929e+01
|
||||
13 -5.2283270287937697e+01 3.4023912643812679e+01 -1.9030352703627774e+01
|
||||
14 8.4403128243303399e+01 -9.3773678297574406e+01 1.6481720093363641e+01
|
||||
15 -4.2790833192154764e+01 -4.3242943642279130e+01 7.1075696811865868e+01
|
||||
16 -1.5041912007490836e+01 -3.3544044565611586e+01 2.4823109532967212e+01
|
||||
17 -9.6413207346836316e-01 4.5826021602656141e+01 -3.9155163702194102e+01
|
||||
18 -2.0337015515785971e+01 7.2815285567550134e+00 -8.2049879725129813e+00
|
||||
19 -6.4105384732081120e+01 1.1564665740933788e+02 2.4163791756721466e+01
|
||||
20 8.5723654185276146e+01 8.3354105531647818e+01 -6.6380939444134356e+01
|
||||
21 7.2614253221132458e+01 -1.0858997173537107e+02 -9.7505297776024449e+00
|
||||
22 -7.0420361713052930e+00 5.3431098224890221e+01 3.3089063930822551e+00
|
||||
23 -2.6591358240682062e+01 5.7408565880721866e+01 2.7437106471305679e+01
|
||||
24 -4.1792038450554799e+01 5.1730557789864775e+01 -4.0814677464080816e+01
|
||||
25 -4.1432062506590214e+01 -2.5839213423062226e+01 4.2240164846210408e+00
|
||||
26 4.7210066329871566e+01 -5.2462761136081880e+01 -7.3222050314410501e+00
|
||||
27 7.1880187551772764e+01 6.4264938765955392e+01 4.3600944370341068e+01
|
||||
28 -8.4540787660053340e+01 -3.5402262816619938e+01 -1.8100280797937039e+01
|
||||
29 6.9538301274653790e+01 -6.3441028093040622e+01 -1.4636386232064458e+01
|
||||
30 -1.0347208112535196e+01 -1.7647584813608077e+02 7.2581082578181517e+00
|
||||
31 5.5139777976761025e+01 -4.2081916983382541e+01 -4.6602437208067727e+01
|
||||
32 -1.0993230999577290e+02 3.4110056387297462e+01 1.8478090262857769e+01
|
||||
...
|
||||
Reference in New Issue
Block a user