a3cc0e8432
Reverted the block size tuning, which caused bugs for low atom counts (will revisit later)
2022-11-04 13:45:59 -05:00
2f1f7ee0fa
Cleaned up code
2022-11-03 23:45:40 -05:00
00f46120c7
Removed max_cus() from Device, used device->gpu->cus() instead
2022-10-07 15:50:30 -05:00
6b9e83fe20
Added timing for the induced dipole spreading part, computed the block size to ensure all the CUs are occupied by the fphi_uind and fphi_mpole kernels
2022-10-06 15:03:58 -05:00
2ef6a59c0a
Merge branch 'develop' into amoeba-gpu
2022-10-01 00:38:24 -05:00
9a1f23a079
Cosmetic changes and cleanup
2022-09-30 17:32:25 -05:00
1d75ca3b20
Moved precompute() out of the terms in amoeba and hippo, to be invoked in the first term of a time step: multipole for amoeba and repulsion for hippo
2022-09-30 16:31:13 -05:00
fb675028b9
whitespace
2022-09-29 02:42:11 -04:00
71464d8314
GPU Package: Fixing logic in OpenCL backend that could result in unnecessary device allocations.
2022-09-28 22:30:09 -07:00
6e34d21b24
GPU Package: Switching back to timer disabling with multiple MPI tasks per GPU. Logic added to prevent mem leak.
2022-09-28 21:02:16 -07:00
e6d2582642
Updated fphi_mpole, renamed precompute_induce to precompute_kspace
2022-09-28 15:08:18 -05:00
166701f13a
Fixed missing commas in the argument list of the macros in amoeba and hippo cu files, added amoeba_convolution_gpu.cpp and .h to the source file list in GPU.cmake
2022-09-23 11:53:09 -05:00
785131932c
Added fphi_mpole in amoeba/gpu, fixed a bug in the kernel when indexing grid
2022-09-20 13:58:17 -05:00
356c46c913
Replaced memory allocation/deallocation inside moduli() with member variables that are resized if needed
2022-09-18 16:28:30 -05:00
caa66d904e
Cleaned up GPU lib functions
2022-09-18 15:54:12 -05:00
f9f777b099
Refactored precompute_induce to overlap data transfers with kernel launches
2022-09-18 15:09:26 -05:00
62ecf98cda
Enabled fphi_uind in hippo/gpu, really need to refactor hippo and amoeba in the GPU lib to remove kernel duplicates
2022-09-16 14:47:16 -05:00
880f20c285
Cleaned up kernels
2022-09-15 15:29:14 -05:00
cd3a00c2c4
Added timing breakdown for fphi_uind
2022-09-14 15:28:44 -05:00
9c4d3db558
Cleaned up and converted arrays to ucl_vector of numtyp4
2022-09-13 16:48:39 -05:00
31047b4a31
Removed mem alloc in precompute_induce, used buffer for packing, and switched to using ucl_vector
2022-09-13 12:53:48 -05:00
7f4efa380a
Re-arranged memory allocation for cgrid_brick, some issues need to be fixed
2022-09-11 18:58:34 -05:00
5e59c95be4
Moved temp variables inside loops
2022-09-10 02:45:06 -05:00
363b6c51d0
Used local arrays and re-arranged for coalesced global memory writes
2022-09-10 02:31:39 -05:00
c58343b2e2
Cleaned up debugging code; needs more refactoring and still has to be added to hippo
2022-09-09 13:50:41 -05:00
b72b71837e
Moved first_induce_iteration in induce() to the right place
2022-09-09 13:34:57 -05:00
4b8caac727
Made some progress with fphi_uind in the gpu pair style
2022-09-09 12:14:36 -05:00
167abe9ce0
add preprocessor flags to select between the changed and the old code variant
2022-09-09 12:41:24 -04:00
0d2db984eb
Merge branch 'develop' into benmenadue/develop
2022-09-06 19:25:21 -04:00
a0af9627e5
Fixed memory bugs with device array allocations
2022-09-06 16:19:17 -05:00
294a1c2168
Use primary context in CUDA GPU code.
Since LAMMPS uses the low-level driver API of CUDA, it needs to ensure
that it is in the correct context when invoking such functions. At the
moment it creates and switches to its own context inside `UCL_Device::set`
but then assumes that the driver is still in that context for subsequent
calls into CUDA; if another part of the program uses a different context
(such as the CUDA runtime using the "primary" context) this will cause
failures inside LAMMPS.
This patch changes the context creation to instead use the primary
context for the requested device. While it's not perfect, in that it
still doesn't ensure that it's in the correct context before making
driver API calls, it at least allows it to work with libraries that use
the runtime API.
2022-09-06 09:28:51 +10:00
21b7fb2fcf
Exposing fphi_uind to the gpu pair style, still keeping the part not ready though
2022-09-02 14:55:20 -05:00
51a4819bfc
Fixed an illegal preprocessor issue.
2022-09-02 11:42:30 -04:00
cad7e1b364
Moved fphi_uind up to BaseAmoeba
2022-09-02 10:18:59 -05:00
aac264f2e2
Working on the fphi_uind kernel and array allocations
2022-08-30 23:40:04 -05:00
c5c3c697df
Adding fphi_uind kernel, working on the arrays allocation
2022-08-29 00:13:30 -05:00
9e7bbad4d4
Working on fphi_uind in the GPU lib
2022-08-27 13:19:52 -05:00
b160460dcc
Added preprocessor directives to disable cufft entirely for now
2022-08-26 12:55:46 -05:00
b2d6df5bfb
Re-arranged some for loops in umutual1 for more cache-friendly memory access; added a placeholder for grid_uind in the GPU lib, although the FFT may not be heavy enough to justify moving it to the device.
2022-08-25 23:18:13 -05:00
8d77c1daee
Merge remote-tracking branch 'origin/develop' into tip4p_cornercase
2022-08-25 17:58:17 +03:00
f4a90c62c0
First attempt to port the forward FFT in the k-space induce term to the GPU, not working yet
2022-08-23 15:42:05 -05:00
921796a15f
Cleaned up unused variables in the hippo kernels
2022-08-16 16:29:38 -05:00
28dabb9687
Cleaned up unused variables in the amoeba kernels, made room for convolution gpu
2022-08-16 15:37:49 -05:00
46b8b00a4f
Working on fft on the device
2022-08-15 15:51:43 -05:00
538aa13693
Only transfer data that is needed for umutual2b; allowed convolution and kspace term umutual1 to be overridden by the gpu counterparts
2022-08-10 16:21:30 -05:00
baf3e614fb
Add comments for tip4p GPU kernels
2022-08-07 22:26:11 +03:00
aad4e417f9
Moved temp variables inside neighbor loops
2022-08-03 12:33:48 -05:00
a54f0b684d
Moved temp variables inside the loop over neighbors
2022-08-03 10:56:52 -05:00
5fee276348
add some GNU Make magic(tm) to Makefile.hip to adapt itself to OpenMPI and MPICH
2022-07-28 07:03:58 -04:00
e7ffa7fae3
Add Makefile support for CHIP-SPV
2022-07-27 08:34:35 +00:00