Note: "newton on" still required if using non-kokkos pair styles or fixes.
Non-kokkos pairs/fixes don't expect their half lists with newton off,
which happens if newton is turned off globally by kokkos via commandline.
Note2: Regardless, fix_shardlow* will still use half lists and newton on.
- Switched from using lambda functions to operator()'s with type tags
in FixRxKokkos. The lambda's were giving big problems in Cuda with
the memory objects. This required that all referenced views be members
of the FixRXKokkos class.
- Add copymode controls to solve_reactions() to avoid the destructor
freeing pointers carried forward from the copy constructor. Added
the same to FixRX since its called, too.
- Updated the function prototypes to include the necessary KOKKOS
macros for __host__ and __device__ functions and inlined functions.
- Changed several View definitions to match the disjoint memory spaces
that only come up with Cuda builds.
- Finished porting all scratch arrays to using the StridedArrayType
template.
- Created a single, large Kokkos device array and using that for all
scratch data passed into the StridedArrayType objects.
- Created an Array class that provides stride access for operator[]
w/o needing Kokkos views. This was designed to avoid the performance
issues encountered with Views and sub-views throughout the RHS and
ODE solver functions.
- Added the diagnostics performance analysis routine to FixRxKokkos
using Kokkos views.
TODO:
- Switch to using Kokkos data for the per-iteration scratch data.
How to allocate only enouch for each work-unit and not all
iterations? Can the shared-memory scratch memory work for this,
even for large sizes?
usage and calls to computeLocalTemperature.
- Created request for kokkos neighbor list for fix and switched to
that neighbor list datatype in computeLocalTemperature.
- Reconfigured pre_force and setup_pre_force to call a common
solve_reactions() method to avoid duplicate code.
TODO:
- Clean-up
- Provide per-problem scratch data within kokkos framework (instead
of C++ new/delete data).
- Added a kokkos version of setup_pre_force that only sets dvector
and then communicates that.
- Converted all for loops to parallel_for's in computeLocalTemperator()
and setup_pre_force.
- Added pack/unpack forward/reverse methods with Kokkos host views.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
Added kokkos dual-view datatypes used in computeLocalTemperature and
pre_force (e.g., dpdThetaLocal) but still using the original host
pointers for the pack/unpack operations.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
- Add another template parameter for HALFTHREAD and create (automatic)
atomic view of dpdThetaLocal and sumWeights.
- Add modify/sync comments and replace the host-only pointers in the
pack/unpack methods.
- Added templated computeLocalTemp<>() to FixRxKokkos but still
using the original host data pointers.
- Updated the copy-back to dvector operation to be the same with
RK4 and RKF45 per discussion with J. Larentzos.
TODO:
- Add kokkos data for computeLocalTemp and parallel_for loop.
- Added kokkos-managed parameter data for the kinetics equations.
- Removed dependencies in rhs() on atom and domain objects.
TODO:
1. Switch to using KOKKOS data for dvector.
2. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).
Initial port of USER-DPD/fix_rx.cpp to KOKKOS/fix_rx_kokkos.cpp.
Using parallel_reduce(...) but still using host-only data.
TODO:
1. Switch to KOKKOS datatypes for sparse-kinetics data; dense
is finished.
2. Switch to using KOKKOS data for dvector.
3. Remove dependencies in rhs(...) on atom. Store those consts
in UserData{} or as member constants.
4. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).