NOTE: pair evaluation order changes, causing numerical differences!
Atom pair processing order is fully planned out in npair_half_bin_newton_ssa.
This makes the SSA neighbor list structure very different; no other code should use it!
Each local atom appears in ilist, numneigh, and firstneigh four times instead of once.
Changes LAMMPS core code that had been previously changed for USER-DPD/SSA:
Removes ssaAIR[] from class Atom as it is now unused.
Removes ndxAIR_ssa[] from class NeighList as it is now unused.
Increases length of ilist[], numneigh[], and firstneigh[] if SSA flag set.
NOTE: pair evaluation order changes, causing numerical differences!
This enables processing neighbors in subphase groups that enforce
a geometrical separation of pairs, allowing greater parallelism
once fix_shardlow (SSA) is converted to Kokkos.
This removes the distinction between pure and impure locals.
Pure and impure locals messed up the directionality of half neighbor lists,
which turns out to be crucial to the approach for SSA with Kokkos.
Since we compute dvdw as d vdw / d rij, we have to also compute
dslw as d slw / d rij. Currently, we compute -1/r d slw/d rij,
which leads to incorrect results when the two are later combined.
Alternatively, one could also modify dvdw to be -1/r d vdw/d rij,
which would be a more standard way to do LJ calculations, but this
way seems more consistent.
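
A minimal sketch of why the two derivatives must share one convention
before they are combined. The functional forms below (a 12-6 LJ term for
vdw and a cubic switch for slw) and all function names are assumptions
chosen for illustration; they are not the forms used in the pair style.

```cpp
#include <cassert>
#include <cmath>

// Illustrative forms only (assumptions, not the pair style's functions).
// vdw(r): 12-6 Lennard-Jones with epsilon = sigma = 1.
double vdw(double r) {
  const double s6 = 1.0 / std::pow(r, 6);
  return 4.0 * (s6 * s6 - s6);
}

// dvdw = d vdw / d r, the plain derivative convention.
double dvdw(double r) {
  const double s6 = 1.0 / std::pow(r, 6);
  return 4.0 * (-12.0 * s6 * s6 + 6.0 * s6) / r;
}

// slw(r): hypothetical cubic switch, 1 at R_IN and 0 at R_OUT.
// Only meaningful for R_IN <= r <= R_OUT.
const double R_IN = 2.0, R_OUT = 3.0;
double slw(double r) {
  const double t = (r - R_IN) / (R_OUT - R_IN);
  return 1.0 - t * t * (3.0 - 2.0 * t);
}

// dslw = d slw / d r, same convention as dvdw.
double dslw(double r) {
  const double t = (r - R_IN) / (R_OUT - R_IN);
  return -6.0 * t * (1.0 - t) / (R_OUT - R_IN);
}

double energy(double r) { return vdw(r) * slw(r); }

// Product rule with both factors in the d/dr convention.
double denergy_consistent(double r) {
  return dvdw(r) * slw(r) + vdw(r) * dslw(r);
}

// Product rule with dslw stored as -1/r d slw/d r: conventions are mixed.
double denergy_mixed(double r) {
  const double dslw_over_r = -dslw(r) / r;
  return dvdw(r) * slw(r) + vdw(r) * dslw_over_r;
}

// Central finite difference as an independent reference.
double denergy_numeric(double r) {
  const double h = 1.0e-5;
  return (energy(r + h) - energy(r - h)) / (2.0 * h);
}
```

Inside the switching region the consistent product rule agrees with the
finite difference, while the mixed version does not.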
- Switched from using lambda functions to operator()'s with type tags
in FixRxKokkos. The lambdas were giving big problems in CUDA with
the memory objects. This required that all referenced views be members
of the FixRxKokkos class.
- Add copymode controls to solve_reactions() to avoid the destructor
freeing pointers carried forward from the copy constructor. Added
the same to FixRX since it is called, too.
- Updated the function prototypes to include the necessary KOKKOS
macros for __host__ and __device__ functions and inlined functions.
- Changed several View definitions to match the disjoint memory spaces
that only come up with Cuda builds.
- Finished porting all scratch arrays to using the StridedArrayType
template.
- Created a single, large Kokkos device array and use it for all
scratch data passed into the StridedArrayType objects.
- Created an Array class that provides stride access for operator[]
w/o needing Kokkos views. This was designed to avoid the performance
issues encountered with Views and sub-views throughout the RHS and
ODE solver functions.
- Added the diagnostics performance analysis routine to FixRxKokkos
using Kokkos views.
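
The tagged operator() and copymode items above can be sketched without
Kokkos as follows. This is only an illustration of the pattern:
serial_for, FixSketch, and the tag names are hypothetical stand-ins.
The one property borrowed from Kokkos is that the dispatch takes the
functor by value, so the class is copied and the copy's destructor runs.

```cpp
#include <cassert>

// Empty tag types select which operator() overload a dispatch runs.
struct TagZeroScratch {};
struct TagAccumulate {};

// Hypothetical stand-in for a tagged parallel dispatch. The functor is
// taken BY VALUE, so it is copied here and destroyed on return.
template <class Tag, class Functor>
void serial_for(int n, Functor f) {
  for (int i = 0; i < n; ++i) f(Tag(), i);
}

class FixSketch {
 public:
  explicit FixSketch(int n) : copymode(0), n_(n), data_(new double[n]) {
    assert(n > 0);
  }

  // Copies made while copymode is set share data_ and must not free it.
  // Copying with copymode == 0 would double-free; this sketch never does.
  ~FixSketch() {
    if (copymode) return;
    delete[] data_;
  }

  // One overload per tag replaces one lambda per loop body.
  void operator()(TagZeroScratch, int i) const { data_[i] = 0.0; }
  void operator()(TagAccumulate, int i) const { data_[i] += i; }

  double solve() {
    copymode = 1;  // functor copies made below will skip the delete
    serial_for<TagZeroScratch>(n_, *this);
    serial_for<TagAccumulate>(n_, *this);
    copymode = 0;  // the original object owns the memory again
    double sum = 0.0;
    for (int i = 0; i < n_; ++i) sum += data_[i];
    return sum;
  }

  int copymode;

 private:
  int n_;
  double* data_;
};
```

Because every loop body is a member function, everything it touches has
to be a member of the class, which matches the requirement noted above
for the referenced views.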
TODO:
- Switch to using Kokkos data for the per-iteration scratch data.
How to allocate only enough for each work-unit and not all
iterations? Can the shared-memory scratch memory work for this,
even for large sizes?
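
A rough sketch of the strided scratch idea: one large buffer, with each
work unit seeing its own slice through a small class whose operator[]
applies the stride. StridedArray and fill_scratch are hypothetical names,
and the interleaved layout (element k of unit w at buffer[k * nUnits + w])
is an assumption, not necessarily the layout used in FixRxKokkos.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for StridedArrayType: a non-owning view that
// maps index k to base[k * stride], with no Kokkos View involved.
template <typename T>
class StridedArray {
 public:
  StridedArray(T* base, std::size_t stride) : base_(base), stride_(stride) {
    assert(base != nullptr && stride > 0);
  }
  T& operator[](std::size_t k) const { return base_[k * stride_]; }

 private:
  T* base_;
  std::size_t stride_;
};

// Allocates ONE buffer for all work units and lets each unit write its
// own scratch values through a strided view starting at offset w.
std::vector<double> fill_scratch(std::size_t nUnits, std::size_t nPerUnit) {
  std::vector<double> buffer(nUnits * nPerUnit, 0.0);
  for (std::size_t w = 0; w < nUnits; ++w) {
    StridedArray<double> y(buffer.data() + w, nUnits);
    for (std::size_t k = 0; k < nPerUnit; ++k) {
      y[k] = 10.0 * static_cast<double>(w) + static_cast<double>(k);
    }
  }
  return buffer;
}
```

With this layout the solver code indexes y[k] as if the scratch array
were contiguous, while neighboring work units sit next to each other in
memory.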
usage and calls to computeLocalTemperature.
- Created request for kokkos neighbor list for fix and switched to
that neighbor list datatype in computeLocalTemperature.
- Reconfigured pre_force and setup_pre_force to call a common
solve_reactions() method to avoid duplicate code.
TODO:
- Clean-up
- Provide per-problem scratch data within kokkos framework (instead
of C++ new/delete data).
- Added a kokkos version of setup_pre_force that only sets dvector
and then communicates that.
- Converted all for loops to parallel_for's in computeLocalTemperature()
and setup_pre_force.
- Added pack/unpack forward/reverse methods with Kokkos host views.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
Added Kokkos dual-view datatypes used in computeLocalTemperature and
pre_force (e.g., dpdThetaLocal), but the pack/unpack operations still
use the original host pointers.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
- Add another template parameter for HALFTHREAD and create (automatic)
atomic view of dpdThetaLocal and sumWeights.
- Add modify/sync comments and replace the host-only pointers in the
pack/unpack methods.
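
The HALFTHREAD item above could look roughly like the following sketch.
With a half neighbor list each pair is visited once and updates both
atoms, so two threads may write the same entry and the update has to be
atomic. Here std::atomic stands in for an atomic view, the loop runs
serially, and accumulate and sum_pair_weights are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Adds v to target. The template flag picks a compare-and-swap loop
// (safe under concurrent writers) or a plain read-modify-write.
template <bool HALFTHREAD>
inline void accumulate(std::atomic<double>& target, double v) {
  if (HALFTHREAD) {
    double old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + v,
                                         std::memory_order_relaxed)) {
      // 'old' was refreshed by the failed exchange; retry.
    }
  } else {
    target.store(target.load(std::memory_order_relaxed) + v,
                 std::memory_order_relaxed);
  }
}

// Half-list style loop: each (i, j) pair appears once and contributes
// the same weight to both atoms.
template <bool HALFTHREAD>
std::vector<double> sum_pair_weights(
    std::size_t natoms,
    const std::vector<std::pair<std::size_t, std::size_t> >& pairs,
    double weight) {
  std::vector<std::atomic<double> > sums(natoms);
  for (std::size_t a = 0; a < natoms; ++a) sums[a].store(0.0);
  for (std::size_t p = 0; p < pairs.size(); ++p) {
    assert(pairs[p].first < natoms && pairs[p].second < natoms);
    accumulate<HALFTHREAD>(sums[pairs[p].first], weight);
    accumulate<HALFTHREAD>(sums[pairs[p].second], weight);
  }
  std::vector<double> result(natoms);
  for (std::size_t a = 0; a < natoms; ++a) result[a] = sums[a].load();
  return result;
}
```

Run serially, both instantiations give the same sums; the atomic path
only matters once the pair loop is executed by multiple threads.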
- Added templated computeLocalTemp<>() to FixRxKokkos but still
using the original host data pointers.
- Updated the copy-back to dvector operation to be the same with
RK4 and RKF45 per discussion with J. Larentzos.
TODO:
- Add kokkos data for computeLocalTemp and parallel_for loop.