updated the doc page for the new parameters

2025-06-15 15:20:56 -05:00
parent b3423e9b7a
commit b5809aff67
1 changed file with 21 additions and 3 deletions
--- a/doc/src/package.rst
+++ b/doc/src/package.rst
@@ -123,6 +123,10 @@ Syntax
          Nteamsize = # of threads per block used for the pair compute kernel
        *nbin/atoms/per/bin = Natomsperbin
          Natomsperbin = # of atoms per bin used for neighbor list builds
+        *nbor/block/size = blocksize
+          blocksize = # of GPU threads per block for the flat neighbor build method
+        *bond/block/size = blocksize
+          blocksize = # of GPU threads per block for the bond force computation
    *omp* args = Nthreads keyword value ...
      Nthreads = # of OpenMP threads to associate with each MPI process
      zero or more keyword/value pairs may be appended
@@ -592,14 +596,14 @@ keyword above.
 The *gpu/aware* keyword chooses whether GPU-aware MPI will be used. When
 this keyword is set to *on*, buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces overhead of first copying
-the data to the host CPU. However GPU-aware MPI is not supported on all
+the data to the host CPU.  However GPU-aware MPI is not supported on all
 systems, which can lead to segmentation faults and would require using a
 value of *off*\ . If LAMMPS can safely detect that GPU-aware MPI is not
 available (currently only possible with OpenMPI v2.0.0 or later), then
 the *gpu/aware* keyword is automatically set to *off* by default. When
 the *gpu/aware* keyword is set to *off* while any of the *comm*
 keywords are set to *device*, the value for these *comm* keywords will
-be automatically changed to *no*\ . This setting has no effect if not
+be automatically changed to *no*\ .  This setting has no effect if not
 running on GPUs or if using only one MPI rank. GPU-aware MPI is available
 for OpenMPI 1.8 (or later versions), Mvapich2 1.9 (or later) when the
 "MV2_USE_CUDA" environment variable is set to "1", CrayMPI, and IBM
@@ -614,6 +618,10 @@ other force computations on the host CPU.  The *comm* flags, along with the
 This can result in better performance for certain configurations and
 system sizes.

+The following parameters allow users to tune performance for the
+simulated system.  If a parameter is not explicitly specified, its
+value is set internally by the KOKKOS package.
+
 The *threads/per/atom* keyword sets the number of GPU vector lanes per atom
 used to perform force calculations.  This keyword is only applicable
 when *neigh/thread* is set to *on*.   For large cutoffs or with a small number
@@ -629,7 +637,17 @@ when *neigh/thread* is set to *on*.  The default value of this parameter
 is determined based on the GPU architecture at runtime.

 The *nbin/atoms/per/bin* keyword sets the number of atoms per bin
-used for the neighbor list builds on the GPU.  The default value is 16.
+used for neighbor list builds on the GPU, which in turn determines
+the number of GPU threads per bin.  The default value of this parameter is 16.
+
+The *nbor/block/size* keyword sets the number of GPU threads per block
+for neighbor list builds on the GPU with the flat method (i.e., each
+thread finds the neighbors of one atom).  If this keyword is not
+specified, GPU threads are instead assigned to bins.
+
+The *bond/block/size* keyword sets the number of GPU threads per block
+used for launching the bond force kernel on the GPU.  The default value
+of this parameter is determined based on the GPU architecture at runtime.

 ----------
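The flat neighbor build tuned by *nbor/block/size* maps one GPU thread to one atom and groups those threads into blocks of a fixed size. The Python sketch below emulates that mapping on the CPU to show how the block size sets the grid size (number of blocks = ceil(natoms / blocksize)) and why surplus threads in the last block sit idle. It is a hypothetical illustration, not the KOKKOS implementation: the names `flat_neighbor_build` and `block_size` are invented here, and it assumes a brute-force O(N^2) distance check with no bins, no periodic images, and a full (both-directions) neighbor list.

```python
import math


def flat_neighbor_build(positions, cutoff, block_size):
    """Hypothetical CPU emulation of a 'flat' GPU neighbor build.

    Each simulated thread handles exactly one atom i and scans every other
    atom for neighbors closer than `cutoff` (brute force, no bins, no
    periodic images). Threads are grouped into blocks of `block_size`,
    mirroring the role of a threads-per-block launch parameter.
    Returns (number_of_blocks, full_neighbor_lists).
    """
    natoms = len(positions)
    num_blocks = math.ceil(natoms / block_size)  # grid size for the launch
    cutsq = cutoff * cutoff
    neighbors = [[] for _ in range(natoms)]
    for block_idx in range(num_blocks):
        for thread_idx in range(block_size):
            i = block_idx * block_size + thread_idx  # global thread id = atom index
            if i >= natoms:
                continue  # surplus threads in the last block do no work
            xi, yi, zi = positions[i]
            for j, (xj, yj, zj) in enumerate(positions):
                if j == i:
                    continue
                dx, dy, dz = xi - xj, yi - yj, zi - zj
                if dx * dx + dy * dy + dz * dz < cutsq:
                    neighbors[i].append(j)
    return num_blocks, neighbors


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    num_blocks, nbrs = flat_neighbor_build(positions, cutoff=2.0, block_size=2)
    print(num_blocks)  # 2
    print(nbrs)        # [[1], [0], []]
```

With 3 atoms and a block size of 2, two blocks are launched and one thread in the second block is idle; a larger block size reduces the number of blocks but can leave more threads idle when the atom count is not a multiple of the block size, which is one reason the best value depends on the simulated system.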