diff --git a/doc/src/Speed_kokkos.rst b/doc/src/Speed_kokkos.rst
index 569a24f1c2..dd417d7c79 100644
--- a/doc/src/Speed_kokkos.rst
+++ b/doc/src/Speed_kokkos.rst
@@ -285,6 +285,16 @@ one or more nodes, each with two GPUs:
 settings. Experimenting with its options can provide a speed-up for
 specific calculations. For example:
 
+.. note::
+
+   The default binsize for :doc:`atom sorting ` on GPUs
+   is equal to the default CPU neighbor binsize (i.e. 2x smaller than the
+   default neighbor binsize on GPUs). When running simple pair-wise
+   potentials like Lennard Jones on GPUs, using a 2x larger binsize for
+   atom sorting (equal to the default binsize for building the neighbor
+   list on GPUs) and a more frequent sorting than default (e.g. sorting
+   every 100 time steps instead of 1000) may improve performance.
+
 .. code-block:: bash
 
    mpirun -np 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -pk kokkos newton on neigh half binsize 2.8 -in in.lj # Newton on, half neighbor list, set binsize = neighbor ghost cutoff
diff --git a/doc/src/atom_modify.rst b/doc/src/atom_modify.rst
index 9049a24fde..1e5a3d49ff 100644
--- a/doc/src/atom_modify.rst
+++ b/doc/src/atom_modify.rst
@@ -153,6 +153,13 @@ cache locality will be undermined.
 order of atoms in a :doc:`dump ` file will also typically change if
 sorting is enabled.
 
+.. note::
+
+   When running simple pair-wise potentials like Lennard Jones on GPUs
+   with the KOKKOS package, using a larger binsize (e.g. 2x larger than
+   default) and a more frequent reordering than default (e.g. every 100
+   time steps) may improve performance.
+
 Restrictions
 """"""""""""
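
The settings suggested in the two added notes could be written in an input script roughly as follows. This is a minimal sketch, not part of the patch. It assumes a Lennard-Jones input in reduced units with a pair cutoff of 2.5 and a neighbor skin of 0.3, so that the neighbor cutoff is 2.8, which matches the ``binsize 2.8`` used in the ``mpirun`` example. Under that assumption the default sorting binsize would be half the neighbor cutoff (1.4), and doubling it gives 2.8. The best values for other systems may differ and are worth benchmarking.

.. code-block:: LAMMPS

   # Assumed setup: units lj, pair cutoff 2.5, skin 0.3 => neighbor cutoff 2.8
   # atom_modify sort takes the sorting interval (steps) and the sort binsize.
   # Sort every 100 steps instead of the default 1000, with a binsize
   # 2x larger than the default sort binsize (2 * 1.4 = 2.8).
   atom_modify sort 100 2.8

Here the sort binsize equals the neighbor-list binsize passed to ``-pk kokkos``, which is the combination the note describes for simple pair-wise potentials on GPUs.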