- Updated the KOKKOS installer to include fix_rx_kokkos.[cpp,h].
- Updated the USER-DPD version of fix_rx.[cpp,h] to sync with the Kokkos
version. This resolves child->parent class dependencies.
- Added Kokkos-managed parameter data for the kinetics equations.
- Removed dependencies in rhs() on atom and domain objects.
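A minimal sketch of what removing rhs()'s dependencies on atom and domain looks like in practice: every constant the right-hand side needs is carried in a parameter struct, so the function depends only on its arguments. The struct name RxParams, its fields, and the simplified forward-only mass-action kinetics are all hypothetical illustrations; the log mentions UserData{} but this is not its actual layout or the actual fix_rx kinetics model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle: everything rhs() needs is stored here,
// so rhs() never reads global atom or domain objects.
struct RxParams {
  int nspecies;
  int nreactions;
  std::vector<double> kFor;    // forward rate constant, one per reaction
  std::vector<double> stoich;  // nreactions x nspecies net stoichiometry (row-major)
  std::vector<double> order;   // nreactions x nspecies reaction orders (row-major)
};

// Dense, forward-only mass-action kinetics (simplified for illustration):
//   rate_r = kFor_r * prod_k y_k^order_rk
//   dydt_k = sum_r stoich_rk * rate_r
int rhs(double /*t*/, const double *y, double *dydt, const RxParams &p) {
  assert(p.kFor.size() == static_cast<std::size_t>(p.nreactions));
  for (int k = 0; k < p.nspecies; ++k) dydt[k] = 0.0;
  for (int r = 0; r < p.nreactions; ++r) {
    double rate = p.kFor[r];
    for (int k = 0; k < p.nspecies; ++k) {
      const double ord = p.order[r * p.nspecies + k];
      if (ord != 0.0) rate *= std::pow(y[k], ord);  // skip species not in the rate law
    }
    for (int k = 0; k < p.nspecies; ++k)
      dydt[k] += p.stoich[r * p.nspecies + k] * rate;
  }
  return 0;
}
```

Because this rhs() touches no global state, it can be evaluated independently per particle, which is the kind of property a device-side kernel needs.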
TODO:
1. Switch to using KOKKOS data for dvector.
2. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).
Initial port of USER-DPD/fix_rx.cpp to KOKKOS/fix_rx_kokkos.cpp.
Uses parallel_reduce(...) but still operates on host-only data.
TODO:
1. Switch to KOKKOS datatypes for sparse-kinetics data; dense
is finished.
2. Switch to using KOKKOS data for dvector.
3. Remove dependencies in rhs(...) on atom. Store those consts
in UserData{} or as member constants.
4. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).
Overall improvements range from 2% to 18% on our benchmarks:
1) The newton setting must be on for SSA, so remove the conditionals that check it
2) Rework the math in ssa_update() to eliminate many ops and temporaries
3) Split ssa_update() into two versions, based on DPD vs. DPDE
4) Reorder code in ssa_update_*() to reduce register pressure
The code tries to distinguish between the real distance (r23) and the fictitious one (rij), but does not do so consistently.
It is better to give those two variables the same value everywhere and apply the correction where necessary.
The current use of these values is incorrect.
Remove the calculations that are effectively derivatives w.r.t. |rij| (the fictitious distance): |rij| is constant, so the chained derivative d|rij|/dRij is always zero.
Apply the corrections due to drij/dRij in the sum-omega term.
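The reason the |rij| derivatives can be dropped fits in a two-line derivation, using the relation rij = alpha * Rij / |Rij| stated in the bondorderLJ notes of this log (the LaTeX notation is ours):

```latex
r_{ij} = \alpha \frac{R_{ij}}{\lvert R_{ij} \rvert}
\quad\Longrightarrow\quad
\lvert r_{ij} \rvert = \alpha \ \text{for every } R_{ij} \neq 0,
\qquad
\frac{\partial \lvert r_{ij} \rvert}{\partial R_{ij}} = 0 .

\text{Hence, for any term } f(\lvert r_{ij} \rvert):\quad
\frac{\partial f}{\partial R_{ij}}
  = f'(\lvert r_{ij} \rvert)\,\frac{\partial \lvert r_{ij} \rvert}{\partial R_{ij}} = 0 .
```

Only terms that depend on the direction of rij survive, and those must be chained through drij/dRij, which is the correction applied in the sum-omega term.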
The bondorderLJ function operates on a fictitious distance |rij|, i.e. everything is calculated "as if" atoms i and j were a given distance alpha apart.
Mathematically, bondorderLJ is a function of the vector rij, which is related to the real distance vector Rij by rij = alpha * Rij / |Rij|.
When we calculate the forces in bondorderLJ, we must chain in this derivative whenever we take derivatives w.r.t. rij.
The right correction, as it turns out, is Fij = alpha / |Rij| * (Identity(3,3) - Rij * Rij^T / |Rij|^2) * fij.
This commit only fixes this for the p_ij^sigma pi terms, which were modified to separate out the d/drij derivative in the cosine calculation.
Now, derivatives are taken w.r.t. the connecting edges instead of the edge points.
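A minimal C++ sketch of the correction formula, independent of the actual bondorderLJ code: fictitious_rij and chain_to_real are hypothetical helper names, and Vec3 is a plain std::array. It maps a derivative fij taken w.r.t. the fictitious vector back to the real vector Rij. The Jacobian alpha / |Rij| * (I - Rij Rij^T / |Rij|^2) is symmetric, so no transpose is needed, and the projector removes the component of fij along Rij, so the result is always perpendicular to Rij.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Fictitious bond vector used by bondorderLJ: rij = alpha * Rij / |Rij|.
Vec3 fictitious_rij(const Vec3 &Rij, double alpha) {
  const double n = std::sqrt(Rij[0] * Rij[0] + Rij[1] * Rij[1] + Rij[2] * Rij[2]);
  assert(n > 0.0);  // the direction of a zero vector is undefined
  return {alpha * Rij[0] / n, alpha * Rij[1] / n, alpha * Rij[2] / n};
}

// Chain fij = dE/drij back to the real vector:
//   Fij = alpha / |Rij| * (I - Rij Rij^T / |Rij|^2) * fij
Vec3 chain_to_real(const Vec3 &Rij, const Vec3 &fij, double alpha) {
  const double n2 = Rij[0] * Rij[0] + Rij[1] * Rij[1] + Rij[2] * Rij[2];
  assert(n2 > 0.0);
  const double n = std::sqrt(n2);
  const double dot = Rij[0] * fij[0] + Rij[1] * fij[1] + Rij[2] * fij[2];
  Vec3 F;
  for (int d = 0; d < 3; ++d)
    F[d] = alpha / n * (fij[d] - Rij[d] * dot / n2);  // project out the radial part
  return F;
}
```

Comparing chain_to_real against a central finite difference of E(Rij) = fij · rij(Rij) is a quick way to validate a correction like this.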
Since Etmp (representing sum_kijl omega_kijl * w_ik * w_jl) is not reset between the forward and reverse passes, the value used by later calculations is twice the expected value.
One could instead reset Etmp between these passes, but there really is no reason to calculate it twice.
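The double counting can be reproduced with a toy accumulator. accumulate_omega, etmp_buggy and etmp_fixed are hypothetical names, and the vector of precomputed terms stands in for the omega_kijl * w_ik * w_jl products; the sketch only shows that accumulating twice into a variable that is never reset doubles the result.

```cpp
#include <vector>

// Hypothetical stand-in for one pass over the (k, i, j, l) terms:
// adds sum_kijl omega_kijl * w_ik * w_jl into Etmp.
void accumulate_omega(const std::vector<double> &terms, double &Etmp) {
  for (double t : terms) Etmp += t;
}

// Buggy pattern: the forward and reverse passes both accumulate into the
// same Etmp without a reset, so later code sees twice the intended sum.
double etmp_buggy(const std::vector<double> &terms) {
  double Etmp = 0.0;
  accumulate_omega(terms, Etmp);  // forward pass
  accumulate_omega(terms, Etmp);  // reverse pass adds the same sum again
  return Etmp;
}

// Fixed pattern: compute the sum once and reuse it in both passes.
double etmp_fixed(const std::vector<double> &terms) {
  double Etmp = 0.0;
  accumulate_omega(terms, Etmp);
  return Etmp;
}
```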
Because the verlet_kokkos system has a "clever" optimization that alters the datamasks before calling sync/modify, the datamask framework must be strictly obeyed for GPU correctness.
(The optimization is to compute forces concurrently on the host and GPU and add them up at the end of an iteration; calling your own sync will overwrite the partial GPU forces with the partial host forces.)
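The failure mode can be modelled in plain C++ without Kokkos. This is not the verlet_kokkos code or the Kokkos sync API: merge_partials and rogue_sync_host_to_device are hypothetical names, and each std::vector stands for one side's partial force array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model (not Kokkos) of the concurrent host/GPU force scheme:
// each side holds a partial force array, and the two are summed at the end.
using Forces = std::vector<double>;

Forces merge_partials(const Forces &f_host, const Forces &f_device) {
  assert(f_host.size() == f_device.size());
  Forces total(f_host.size());
  for (std::size_t i = 0; i < total.size(); ++i)
    total[i] = f_host[i] + f_device[i];
  return total;
}

// What an unsanctioned host->device "sync" amounts to in this model:
// the device copy is overwritten with the host's partial forces.
void rogue_sync_host_to_device(const Forces &f_host, Forces &f_device) {
  f_device = f_host;  // the GPU partial forces are lost
}
```

After the overwrite, the merged result contains the host contribution twice and the device contribution not at all, which is why the datamasks must be respected instead of bypassed with a manual sync.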