Commit Graph

78 Commits

Author SHA1 Message Date
6442e05988 even more define to static constexpr conversions 2024-01-25 02:17:28 -05:00
a6d178194e use consistent names and capitalization in comments 2023-08-03 10:59:31 -04:00
d6412dc97b Attempted to resolve issues with switching from acctyp4 to acctyp3 in tep, fieldp since the changes in PR #3675, noting some changes with Intel OCL PR #3663 2023-07-08 00:50:19 -05:00
d2faf86214 Merge branch 'develop' into bond-harmonic-restrain 2023-03-14 00:41:28 -04:00
17f39d9d2c rename fix STORE/PERATOM to STORE/ATOM 2023-03-13 22:33:47 -04:00
37f22c8627 Misc Improvements to GPU Package
- Optimizations for molecular systems
-   Improved kernel performance and greater CPU overlap
- Reduced GPU to CPU communications for discrete devices
- Switch classic Intel makefiles to use LLVM-based compilers
- Prefetch optimizations supported for OpenCL
- Optimized data repack for quaternions
2023-03-05 21:03:12 -08:00
722e583b59 use available introspection API to get accumulator data type. update name of flag. 2023-01-25 05:22:49 -05:00
e068b14969 make consistent and simplify 2023-01-25 02:56:05 -05:00
8786819993 use FFT_SCALAR more consistently to perhaps support single precision FFT some time
also, use "override" instead of virtual and add a forgotten virtual
2023-01-24 22:32:40 -05:00
617d70dd1c Replaced MPI_Wtime() with platform::walltime(), put the low-level timing breakdown inside #if DEBUG_AMOEBA 2023-01-20 14:19:16 -06:00
b59ee8d16c silence compiler warnings 2023-01-17 03:54:49 -05:00
28fbc2631b Fixed another bug with ic_kspace being nullptr 2023-01-16 22:33:21 -06:00
b3e45c29ca Removed whitespaces 2023-01-16 10:30:03 -06:00
973b46a907 Attempted to resolve the memory access runtime errors when acquiring single and mixed precision arrays from the GPU lib 2023-01-16 10:12:42 -06:00
c9ae41246d Ran the four make commands in the src folder: make fix-whitespace; make fix-homepage; make fix-errordocs; make fix-permissions 2023-01-15 16:05:36 -06:00
c21f2faa1f Cleaned up debug statements and unused sections in the amoeba and hippo gpu styles 2023-01-14 20:02:36 -06:00
2f1f7ee0fa Cleaned up code 2022-11-03 23:45:40 -05:00
6b9e83fe20 Added timing for the induced dipole spreading part, computed the block size to ensure all the CUs are occupied by the fphi_uind and fphi_mpole kernels 2022-10-06 15:03:58 -05:00
2ef6a59c0a Merge branch 'develop' into amoeba-gpu 2022-10-01 00:38:24 -05:00
1d75ca3b20 Moved precompute() out of the terms in amoeba and hippo, to be involed in the first term in a time step: multipole for amoeba and repulsion for hippo 2022-09-30 16:31:13 -05:00
e6d2582642 Updated fphi_mpole, renamed precompute_induce to precompute_kspace 2022-09-28 15:08:18 -05:00
785131932c Added fphi_mpole in amoeba/gpu, fixed a bug in the kernel when indexing grid 2022-09-20 13:58:17 -05:00
caa66d904e Cleaned up GPU lib functions 2022-09-18 15:54:12 -05:00
f9f777b099 Refactored precompute_induce to overlap data transfers with kernel launches 2022-09-18 15:09:26 -05:00
62ecf98cda Enabled fphi_uind in hippo/gpu, really need to refactor hippo and amoeba in the GPU lib to remove kernel duplicates 2022-09-16 14:47:16 -05:00
880f20c285 Cleaned up kernels 2022-09-15 15:29:14 -05:00
cd3a00c2c4 Added timing breakdown for fphi_uind 2022-09-14 15:28:44 -05:00
17e54c9390 Updated the GPU API in the gpu pair style 2022-09-11 19:00:40 -05:00
363b6c51d0 Used local arrays and re-arranged for coalesced global memory writes 2022-09-10 02:31:39 -05:00
c58343b2e2 Cleaned up debugging stuffs, need more refactoring and add to hippo 2022-09-09 13:50:41 -05:00
b72b71837e Moved first_induce_iteration in induce() to the right place 2022-09-09 13:34:57 -05:00
4b8caac727 Made some progress with fphi_uind in the gpu pair style 2022-09-09 12:14:36 -05:00
21b7fb2fcf Exposing fphi_uind to the gpu pair style, still keeping the part not ready though 2022-09-02 14:55:20 -05:00
b2d6df5bfb Re-arranged some for loops in umutual1 to improve cache-friendly memory access; made placeholder for grid_uind on the GPU lib, maybe FFT is not that heavy to be put on the device. 2022-08-25 23:18:13 -05:00
28dabb9687 Cleaned up unused variables in the amoeba kernels, made room for convolution gpu 2022-08-16 15:37:49 -05:00
f1112ab6b6 Working on the gpu kspace induce term: dipole spreading and/or fft calls 2022-08-15 14:28:46 -05:00
e980838ae2 Added timings for real-space and k-space portions for the terms 2022-08-02 16:45:06 -05:00
93784f35e3 Added ucl_erfc to the opencl, cuda and hip backends; reverted to using erfc instead of approximation to ensure double-precision matches 2022-07-25 15:34:44 -05:00
0c44bd1086 Rearranged the order of real-space and kspace part of ufield0c(), delayed device-host transfer from umutual2b() to overlap with kspace part 2022-07-08 14:45:31 -05:00
78d6df5ba9 Removed temporary arrays in hippo/gpu induce, flipped sign of the viriral terms in torque2force in hippo/gpu 2022-07-06 11:17:08 -05:00
ee5afdc146 Updated all the gpu ready terms 2022-07-04 23:24:31 -05:00
5dab809522 Flipped force sign in polar_real, made sure that multipole_real is true for precompute() to be invoked, ubdirect2b() is segfault and needs work 2022-07-04 01:38:22 -05:00
f4900d131a Working on the multipole term on the gpu side, incorrect virials 2022-07-01 16:26:25 -05:00
a14f0cfd6c Merge branch 'amoeba' into amoeba-gpu, update the gpu pair styles with the base class 2022-06-28 12:54:27 -05:00
79fbbd4f33 Cleaned up the API of amoeba and hippo to remove unncessary arguments 2021-10-04 14:40:58 -05:00
e0f91b96fe Cleaned up and added necessary comments 2021-09-29 13:07:20 -05:00
b874feb127 Removed trailing spaces 2021-09-28 17:28:33 -05:00
d77d5b7f0a Added classes for hippo/gpu, refactored BaseAmoeba and made room for the dispersion real-space term in hippo 2021-09-21 15:40:06 -05:00
a2fd784034 Added the dispersion real space term, which is for HIPPO. 2021-09-21 10:55:38 -05:00
0228867d8e Added the dispersion real space kernel and transfer special coeffs to the device 2021-09-19 23:40:43 -05:00