Commit Graph

73 Commits

Author SHA1 Message Date
d6412dc97b Attempted to resolve issues with switching from acctyp4 to acctyp3 in tep, fieldp since the changes in PR #3675, noting some changes with Intel OCL PR #3663 2023-07-08 00:50:19 -05:00
37f22c8627 Misc Improvements to GPU Package
- Optimizations for molecular systems
-   Improved kernel performance and greater CPU overlap
- Reduced GPU to CPU communications for discrete devices
- Switch classic Intel makefiles to use LLVM-based compilers
- Prefetch optimizations supported for OpenCL
- Optimized data repack for quaternions
2023-03-05 21:03:12 -08:00
adf43d7fee Fixed the issues with some OpenCL implementation to avoid errors casting changing the pointer address spaces 2023-01-25 00:02:25 -06:00
aaa918cbe7 Fixed bugs with access mode on the host side of thetai[1-3] 2023-01-24 17:05:48 -06:00
5014e04341 Removed commented out code, ensured that ic_kspace is not nullptr when call precompute_kspace for hippo/gpu 2023-01-24 08:40:08 -06:00
8eb722a32a Enforced synchronous host-device transfers for cgrid_brick and fdip arrays 2023-01-19 13:22:27 -06:00
03ab42fd52 correct calling sequence for matching argument types 2023-01-19 08:57:24 -05:00
4244d2e6cd silence compiler warnings about unused parameters and variables 2023-01-19 08:56:54 -05:00
c9ae41246d Ran the four make commands in the src folder: make fix-whitespace; make fix-homepage; make fix-errordocs; make fix-permissions 2023-01-15 16:05:36 -06:00
959b9c220f Cleaned up unused member functions and hd_balancer calls 2022-11-07 15:49:37 -06:00
a3cc0e8432 Reverted the block size tuning, which caused bugs for low atom counts (will revisit later) 2022-11-04 13:45:59 -05:00
00f46120c7 Removed max_cus() from Device, used device->gpu->cus() instead 2022-10-07 15:50:30 -05:00
6b9e83fe20 Added timing for the induced dipole spreading part, computed the block size to ensure all the CUs are occupied by the fphi_uind and fphi_mpole kernels 2022-10-06 15:03:58 -05:00
2ef6a59c0a Merge branch 'develop' into amoeba-gpu 2022-10-01 00:38:24 -05:00
9a1f23a079 Cosmetic changes and cleanup 2022-09-30 17:32:25 -05:00
1d75ca3b20 Moved precompute() out of the terms in amoeba and hippo, to be involed in the first term in a time step: multipole for amoeba and repulsion for hippo 2022-09-30 16:31:13 -05:00
e6d2582642 Updated fphi_mpole, renamed precompute_induce to precompute_kspace 2022-09-28 15:08:18 -05:00
785131932c Added fphi_mpole in amoeba/gpu, fixed a bug in the kernel when indexing grid 2022-09-20 13:58:17 -05:00
caa66d904e Cleaned up GPU lib functions 2022-09-18 15:54:12 -05:00
f9f777b099 Refactored precompute_induce to overlap data transfers with kernel launches 2022-09-18 15:09:26 -05:00
62ecf98cda Enabled fphi_uind in hippo/gpu, really need to refactor hippo and amoeba in the GPU lib to remove kernel duplicates 2022-09-16 14:47:16 -05:00
880f20c285 Cleaned up kernels 2022-09-15 15:29:14 -05:00
9c4d3db558 Cleaned up and converted arrays to ucl_vector of numtyp4 2022-09-13 16:48:39 -05:00
31047b4a31 Removed mem alloc in precompute_induce, used buffer for packing, and switched to using ucl_vector 2022-09-13 12:53:48 -05:00
7f4efa380a Re-arranged memory allocation for cgrid_brick, some issues need to be fixed 2022-09-11 18:58:34 -05:00
c58343b2e2 Cleaned up debugging stuffs, need more refactoring and add to hippo 2022-09-09 13:50:41 -05:00
b72b71837e Moved first_induce_iteration in induce() to the right place 2022-09-09 13:34:57 -05:00
4b8caac727 Made some progress with fphi_uind in the gpu pair style 2022-09-09 12:14:36 -05:00
a0af9627e5 Fixed memory bugs with device array allocations 2022-09-06 16:19:17 -05:00
21b7fb2fcf Exposing fphi_uind to the gpu pair style, still keeping the part not ready though 2022-09-02 14:55:20 -05:00
cad7e1b364 Moved fphi_uind up to BaseAmoeba 2022-09-02 10:18:59 -05:00
aac264f2e2 Working on the fphi_uind kernel and array allocations 2022-08-30 23:40:04 -05:00
c5c3c697df Adding fphi_uind kernel, working on the arrays allocation 2022-08-29 00:13:30 -05:00
9e7bbad4d4 Working on fphi_uind in the GPU lib 2022-08-27 13:19:52 -05:00
b160460dcc Added preprocessors to comment out cufft entirely for now 2022-08-26 12:55:46 -05:00
f4a90c62c0 First attempt to port the forward FFT in the k-space induce term to the GPU, not working yet 2022-08-23 15:42:05 -05:00
28dabb9687 Cleaned up unused variables in the amoeba kernels, made room for convolution gpu 2022-08-16 15:37:49 -05:00
46b8b00a4f Working on fft on the device 2022-08-15 15:51:43 -05:00
538aa13693 Only transfer data that is needed for umutual2b; allowed convolution and kspace term umutual1 to be overridden by the gpu counterparts 2022-08-10 16:21:30 -05:00
66ee2bf989 Cleaned up 2022-07-14 11:01:30 -05:00
0c44bd1086 Rearranged the order of real-space and kspace part of ufield0c(), delayed device-host transfer from umutual2b() to overlap with kspace part 2022-07-08 14:45:31 -05:00
79fbbd4f33 Cleaned up the API of amoeba and hippo to remove unncessary arguments 2021-10-04 14:40:58 -05:00
5a6426bf96 Only transfer data arrays that are needed in each kernel 2021-10-02 00:56:15 -05:00
f4d3d3a2b5 Gradually cleaned up and removed redundancy in amoeba and hippo 2021-10-02 00:09:53 -05:00
f126f785a4 Removed duplicates in the amoeba kernels 2021-10-01 10:19:17 -05:00
3328ac0df2 Attempted to remove some redundancy in data transfers in the amoeba kernels; keeping HIPPO independent of AMOEBA for now 2021-10-01 09:58:21 -05:00
01381b7f54 Fixed bugs in the repulsion kernel, now working correctly with the double precision mode 2021-09-29 11:57:25 -05:00
6286a119b3 Removed precompute() in hippo 2021-09-28 23:12:07 -05:00
98a2b67292 Changed to the API of BaseAmoeba to reduce duplicates in hippo 2021-09-28 17:39:55 -05:00
b874feb127 Removed trailing spaces 2021-09-28 17:28:33 -05:00