cd3a00c2c4
Added timing breakdown for fphi_uind
2022-09-14 15:28:44 -05:00
17e54c9390
Updated the GPU API in the gpu pair style
2022-09-11 19:00:40 -05:00
363b6c51d0
Used local arrays and re-arranged for coalesced global memory writes
2022-09-10 02:31:39 -05:00
c58343b2e2
Cleaned up debugging code; needs more refactoring and still has to be added to hippo
2022-09-09 13:50:41 -05:00
b72b71837e
Moved first_induce_iteration in induce() to the right place
2022-09-09 13:34:57 -05:00
4b8caac727
Made some progress with fphi_uind in the gpu pair style
2022-09-09 12:14:36 -05:00
21b7fb2fcf
Exposing fphi_uind to the gpu pair style, while still keeping the part that is not ready
2022-09-02 14:55:20 -05:00
b2d6df5bfb
Re-arranged some for loops in umutual1 for more cache-friendly memory access; made a placeholder for grid_uind in the GPU lib, though the FFT may not be heavy enough to justify putting it on the device.
2022-08-25 23:18:13 -05:00
28dabb9687
Cleaned up unused variables in the amoeba kernels, made room for convolution gpu
2022-08-16 15:37:49 -05:00
f1112ab6b6
Working on the gpu kspace induce term: dipole spreading and/or fft calls
2022-08-15 14:28:46 -05:00
e980838ae2
Added timings for real-space and k-space portions for the terms
2022-08-02 16:45:06 -05:00
93784f35e3
Added ucl_erfc to the opencl, cuda and hip backends; reverted to using erfc instead of approximation to ensure double-precision matches
2022-07-25 15:34:44 -05:00
0c44bd1086
Rearranged the order of real-space and kspace part of ufield0c(), delayed device-host transfer from umutual2b() to overlap with kspace part
2022-07-08 14:45:31 -05:00
78d6df5ba9
Removed temporary arrays in hippo/gpu induce, flipped sign of the virial terms in torque2force in hippo/gpu
2022-07-06 11:17:08 -05:00
ee5afdc146
Updated all the gpu ready terms
2022-07-04 23:24:31 -05:00
5dab809522
Flipped force sign in polar_real, made sure that multipole_real is true for precompute() to be invoked; udirect2b() segfaults and needs work
2022-07-04 01:38:22 -05:00
f4900d131a
Working on the multipole term on the gpu side, incorrect virials
2022-07-01 16:26:25 -05:00
a14f0cfd6c
Merge branch 'amoeba' into amoeba-gpu, update the gpu pair styles with the base class
2022-06-28 12:54:27 -05:00
79fbbd4f33
Cleaned up the API of amoeba and hippo to remove unnecessary arguments
2021-10-04 14:40:58 -05:00
e0f91b96fe
Cleaned up and added necessary comments
2021-09-29 13:07:20 -05:00
b874feb127
Removed trailing spaces
2021-09-28 17:28:33 -05:00
d77d5b7f0a
Added classes for hippo/gpu, refactored BaseAmoeba and made room for the dispersion real-space term in hippo
2021-09-21 15:40:06 -05:00
a2fd784034
Added the dispersion real space term, which is for HIPPO.
2021-09-21 10:55:38 -05:00
0228867d8e
Added the dispersion real space kernel and transfer special coeffs to the device
2021-09-19 23:40:43 -05:00
1166845fcf
Prepared data structure for the dispersion real-space term
2021-09-18 10:22:22 -05:00
78045d8f76
Cleaned up debugging code and unused variables
2021-09-17 23:13:51 -05:00
f5713a52b3
Added another kernel to accumulate forces, energies and virial on the device (similar to the tersoff kernels), since multiple kernels all add to those quantities; answers are now copied back to the host only in the last kernel of a time step; cleaned up debugging messages
2021-09-17 16:39:57 -05:00
2e6df83b9b
Fixed bugs in the multipole real-space part on the GPU; separately, multipole real and polar real work correctly (along with udirect2b and umutual2b), but together they conflict due to the use of ans to copy forces back from device to host. The other 2 kernels (induce part) do not touch forces and energies.
2021-09-17 15:24:36 -05:00
d926705950
Short neighbor list for multipole real-space should be built with off2_mpole
2021-09-17 01:32:00 -05:00
003bebd31e
Working on the multipole real-space term, not ready yet
2021-09-17 01:19:33 -05:00
6293da7661
Cleaned up a bit
2021-09-16 17:30:56 -05:00
98c1a0178c
Refactored the API so that different off2 values are used for different kernels
2021-09-16 17:14:36 -05:00
76794bef58
Removed some of the debugging code
2021-09-13 01:16:42 -05:00
bc665999d5
Fixed bugs with the umutual2b kernel; field and fieldp now seem correct
2021-09-13 01:11:03 -05:00
edd76733a1
Working on umutual2b; tdipdip values are correct, but results for field and fieldp are incorrect
2021-09-12 00:51:48 -05:00
94d6f7219c
Attempted to reduce the memory footprint of the per-atom arrays
2021-09-11 11:22:17 -05:00
7f5a82dc54
Switched to the short neighbor list implementation in the pre-10Feb21 version (the recent version enforces tpa = 1 for short nbor)
2021-09-11 00:34:43 -05:00
4ebe5833d3
Working on short nbor list for the amoeba kernels (based on what has been done with tersoff and ellipsoid; nbor dev_packed needs to be allocated properly)
2021-09-10 16:51:16 -05:00
a22923aee2
Added the API for the umutual kernel, needs work for storing the tdipdip array
2021-09-09 17:22:09 -05:00
b654f293ee
Working on the umutual2b kernel; the tdipdip values are computed on the fly for now, though a separate neigh list as in the CPU version may be more efficient
2021-09-09 16:52:27 -05:00
efe0bf593f
Adding the umutual2b kernel, need to create another array for tdipdip on the GPU
2021-09-09 15:19:43 -05:00
4a75a9bdd2
Removed dfield0c from amoeba/gpu (no need to override this one)
2021-09-09 14:47:29 -05:00
6f6fd0999c
Both udirect2b and polar_real are working correctly on the GPU
2021-09-09 00:57:21 -05:00
8c5a116d30
Made dfield0c work to compute uind and uinp correctly; need to make sure they are correct for polar_real()
2021-09-08 16:43:33 -05:00
1c5d235f12
Working on the field and fieldp values from GPU back to the host for dfield0c
2021-09-07 16:15:08 -05:00
4e346c2de6
Refactored neighbor list builds and per-atom reallocation parts
2021-09-07 13:05:57 -05:00
8f5f65e68d
Declared the relevant functions in PairAmoeba as virtual, added the overridden versions in PairAmoebaGPU
2021-09-03 16:42:58 -05:00
7d69a870a4
Reverted the binsize function call from the GPU package in Atom; instead added atom_modify sort with a binsize to ensure matching virial values. Enabled the udirect2b kernel; more work is needed to override dfield0c and induce() to bypass reverse_comm() for field and fieldp (line amoeba_induce.cpp:111-112)
2021-09-03 13:43:22 -05:00
7e0c77f1cb
Added fallback flags to indicate which terms are ready from the GPU lib
2021-09-01 14:51:36 -05:00
785a794d39
Added and renamed API functions to make room for additional kernels (udirect2b only computes field and fieldp, without accumulating forces, energies, or virials)
2021-09-01 14:37:11 -05:00