37f22c8627
Misc Improvements to GPU Package
- Optimizations for molecular systems
- Improved kernel performance and greater CPU overlap
- Reduced GPU to CPU communications for discrete devices
- Switch classic Intel makefiles to use LLVM-based compilers
- Prefetch optimizations supported for OpenCL
- Optimized data repack for quaternions
2023-03-05 21:03:12 -08:00
adf43d7fee
Fixed issues with some OpenCL implementations to avoid errors from casts that change pointer address spaces
2023-01-25 00:02:25 -06:00
8e79e2efa5
More cleanup, fixed bugs with hippo fphi kernels for mixed precision
2023-01-23 00:18:42 -06:00
658328dd9d
Added a note in the amoeba doc page on the not-yet-resolved issue with integrated GPUs; removed commented-out and debugging code in the AM/HP kernels
2023-01-22 17:24:15 -06:00
973b46a907
Attempted to resolve the memory access runtime errors when acquiring single and mixed precision arrays from the GPU lib
2023-01-16 10:12:42 -06:00
2f1f7ee0fa
Cleaned up code
2022-11-03 23:45:40 -05:00
e6d2582642
Updated fphi_mpole, renamed precompute_induce to precompute_kspace
2022-09-28 15:08:18 -05:00
166701f13a
Fixed missing commas in the argument list of the macros in amoeba and hippo cu files, added amoeba_convolution_gpu.cpp and .h to the source file list in GPU.cmake
2022-09-23 11:53:09 -05:00
785131932c
Added fphi_mpole in amoeba/gpu, fixed a bug in the kernel when indexing grid
2022-09-20 13:58:17 -05:00
356c46c913
Replaced memory allocation/deallocation inside moduli() with member variables that are resized if needed
2022-09-18 16:28:30 -05:00
62ecf98cda
Enabled fphi_uind in hippo/gpu; hippo and amoeba in the GPU lib really need refactoring to remove kernel duplicates
2022-09-16 14:47:16 -05:00
880f20c285
Cleaned up kernels
2022-09-15 15:29:14 -05:00
9c4d3db558
Cleaned up and converted arrays to ucl_vector of numtyp4
2022-09-13 16:48:39 -05:00
7f4efa380a
Re-arranged memory allocation for cgrid_brick; some issues still need to be fixed
2022-09-11 18:58:34 -05:00
5e59c95be4
Moved temp variables inside loops
2022-09-10 02:45:06 -05:00
363b6c51d0
Used local arrays and re-arranged for coalesced global memory writes
2022-09-10 02:31:39 -05:00
c58343b2e2
Cleaned up debugging code; needs more refactoring and still has to be added to hippo
2022-09-09 13:50:41 -05:00
4b8caac727
Made some progress with fphi_uind in the gpu pair style
2022-09-09 12:14:36 -05:00
a0af9627e5
Fixed memory bugs with device array allocations
2022-09-06 16:19:17 -05:00
cad7e1b364
Moved fphi_uind up to BaseAmoeba
2022-09-02 10:18:59 -05:00
aac264f2e2
Working on the fphi_uind kernel and array allocations
2022-08-30 23:40:04 -05:00
c5c3c697df
Adding fphi_uind kernel, working on the array allocation
2022-08-29 00:13:30 -05:00
28dabb9687
Cleaned up unused variables in the amoeba kernels, made room for convolution gpu
2022-08-16 15:37:49 -05:00
aad4e417f9
Moved temp variables inside neighbor loops
2022-08-03 12:33:48 -05:00
a54f0b684d
Moved temp variables inside the loop over neighbors
2022-08-03 10:56:52 -05:00
93784f35e3
Added ucl_erfc to the opencl, cuda and hip backends; reverted to using erfc instead of the approximation so that double-precision results match
2022-07-25 15:34:44 -05:00
675c2d38a3
Flipped sign of forces and virial terms in the hippo kernels
2022-07-05 14:37:26 -05:00
5dab809522
Flipped force sign in polar_real; made sure that multipole_real is true so that precompute() is invoked; ubdirect2b() segfaults and needs work
2022-07-04 01:38:22 -05:00
f4900d131a
Working on the multipole term on the gpu side; virials are still incorrect
2022-07-01 16:26:25 -05:00
0f0f6a51de
Renamed sp_polar to sp_amoeba, and replaced special_wscale with special_hal for amoeba
2021-10-02 16:02:44 -05:00
3328ac0df2
Attempted to remove some redundancy in data transfers in the amoeba kernels; keeping HIPPO independent of AMOEBA for now
2021-10-01 09:58:21 -05:00
b874feb127
Removed trailing spaces
2021-09-28 17:28:33 -05:00
e80eea56ba
Added udirect2b and umutual2b for hippo
2021-09-28 14:59:39 -05:00
bebef18495
Cleaned up and minor changes
2021-09-21 23:46:21 -05:00
a2fd784034
Added the dispersion real space term, which is for HIPPO.
2021-09-21 10:55:38 -05:00
42034bd1c9
Fixed bugs for undefined tagint and ucl_powr ambiguity in kernels for OpenCL builds
2021-09-20 12:48:29 -05:00
4e88cd158e
Fixed bugs with _tep and _fieldp to allow mixed-precision builds, being defensive with acctyp for these variables
2021-09-20 11:38:50 -05:00
0228867d8e
Added the dispersion real space kernel and transfer special coeffs to the device
2021-09-19 23:40:43 -05:00
1166845fcf
Prepared data structure for the dispersion real-space term
2021-09-18 10:22:22 -05:00
78045d8f76
Cleaned up debugging code and unused variables
2021-09-17 23:13:51 -05:00
f5713a52b3
Added another kernel to accumulate forces, energies and virial on the device (similar to the tersoff kernels), since multiple kernels all add to those quantities; answers are now copied back to the host only in the last kernel of a time step; cleaned up debugging messages
2021-09-17 16:39:57 -05:00
2e6df83b9b
Fixed bugs in the multipole real-space part on the GPU; separately multipole real and polar real work correctly (along with udirect2b and umutual2b), but together they are conflicting due to the use of ans to copy forces back from device to host. The other 2 kernels (induce part) do not touch forces and energies.
2021-09-17 15:24:36 -05:00
d926705950
Short neighbor list for multipole real-space should be built with off2_mpole
2021-09-17 01:32:00 -05:00
003bebd31e
Working on the multipole real-space term, not ready yet
2021-09-17 01:19:33 -05:00
bc665999d5
Fixed bugs with the umutual2b kernel; field and fieldp now seem correct
2021-09-13 01:11:03 -05:00
edd76733a1
Working on umutual2b; tdipdip is correct, but field and fieldp give incorrect results
2021-09-12 00:51:48 -05:00
c765861851
Cleaned up and re-arranged the functions to reflect the order of calling in a time step
2021-09-11 01:00:58 -05:00
7f5a82dc54
Switched to the short neighbor list implementation in the pre-10Feb21 version (the recent version enforces tpa = 1 for short nbor)
2021-09-11 00:34:43 -05:00
4ebe5833d3
Working on short nbor list for the amoeba kernels (based on what has been done with tersoff and ellipsoid; nbor dev_packed needs to be allocated properly)
2021-09-10 16:51:16 -05:00
a22923aee2
Added the API for the umutual kernel; storing the tdiptdip array still needs work
2021-09-09 17:22:09 -05:00