Change defaults for GPU-direct to use comm host

2018-08-08 16:46:36 -06:00
parent d8aa6d534b
commit 1c550d8f39
3 changed files with 23 additions and 12 deletions
--- a/doc/src/Speed_kokkos.txt
+++ b/doc/src/Speed_kokkos.txt
@@ -102,8 +102,8 @@ the case, especially when using pre-compiled MPI libraries provided by
 a Linux distribution. This is not a problem when using only a single
 GPU and a single MPI rank on a desktop. When running with multiple
 MPI ranks, you may see segmentation faults without GPU-direct support.
-These can be avoided by adding the flags '-pk kokkos comm no gpu/direct no'
-to the LAMMPS command line or using "package kokkos comm no gpu/direct no"_package.html
+These can be avoided by adding the flags '-pk kokkos gpu/direct no'
+to the LAMMPS command line or using "package kokkos gpu/direct no"_package.html
 in the input file.

 Use a C++11 compatible compiler and set KOKKOS_ARCH variable in
@@ -273,8 +273,7 @@ to the same GPU with the KOKKOS package, but this is usually only
 faster if significant portions of the input script have not been
 ported to use Kokkos. Using CUDA MPS is recommended in this
 scenario. Using a CUDA-aware MPI library with support for GPU-direct
-is highly recommended and for some KOKKOS-enabled styles even required.
-Most GPU-direct use can be avoided by using "-pk kokkos comm no".
+is highly recommended. GPU-direct use can be avoided by using "-pk kokkos gpu/direct no".
 As above for multi-core CPUs (and no GPU), if N is the number of
 physical cores/node, then the number of MPI tasks/node should not
 exceed N.
--- a/doc/src/package.txt
+++ b/doc/src/package.txt
@@ -489,10 +489,9 @@ packing/unpacking operation.

 The optimal choice for these keywords depends on the input script and
 the hardware used.  The {no} value is useful for verifying that the
-Kokkos-based {host} and {device} values are working correctly.  The {no}
-value should also be used, in case of using an MPI library that does
-not support GPU-direct. It may also be the fastest choice when using
-Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
+Kokkos-based {host} and {device} values are working correctly. 
+It may also be the fastest choice when using Kokkos styles in
+MPI-only mode (i.e. with a thread count of 1).

 When running on CPUs or Xeon Phi, the {host} and {device} values work
 identically.  When using GPUs, the {device} value will typically be
@@ -513,7 +512,9 @@ this keyword is set to {on}, buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces overhead of first copying 
 the data to the host CPU. However GPU-direct is not supported on all 
 systems, which can lead to segmentation faults and would require
-using a value of {off}. 
+using a value of {off}. When the {gpu/direct} keyword is set to {off}
+while any of the {comm} keywords are set to {device}, the value for the
+{comm} keywords will be automatically changed to {host}.

 :line

--- a/src/KOKKOS/kokkos.cpp
+++ b/src/KOKKOS/kokkos.cpp
@@ -156,11 +156,11 @@ KokkosLMP::KokkosLMP(LAMMPS *lmp, int narg, char **arg) : Pointers(lmp)
    } else if (-1 == have_gpu_direct() ) {
      error->warning(FLERR,"Kokkos with CUDA assumes GPU-direct is available,"
                     " but cannot determine if this is the case\n         try"
-                     " '-pk kokkos comm no gpu/direct no' when getting segmentation faults");
+                     " '-pk kokkos gpu/direct no' when getting segmentation faults");
    } else if ( 0 == have_gpu_direct() ) {
      error->warning(FLERR,"GPU-direct is NOT available, but some parts of "
                     "Kokkos with CUDA require it\n         try"
-                     " '-pk kokkos comm no gpu/direct no' when getting segmentation faults");
+                     " '-pk kokkos gpu/direct no' when getting segmentation faults");
    } else {
      ; // should never get here
    }
@@ -186,7 +186,7 @@ KokkosLMP::KokkosLMP(LAMMPS *lmp, int narg, char **arg) : Pointers(lmp)
  exchange_comm_on_host = 0;
  forward_comm_on_host = 0;
  reverse_comm_on_host = 0;
-  gpu_direct = 0;
+  gpu_direct = 1;

 #ifdef KILL_KOKKOS_ON_SIGSEGV
  signal(SIGSEGV, my_signal_handler);
@@ -310,6 +310,17 @@ void KokkosLMP::accelerator(int narg, char **arg)
    } else error->all(FLERR,"Illegal package kokkos command");
  }

+  // if "gpu/direct no" and "comm device", change to "comm host"
+
+  if (!gpu_direct) {
+   if (exchange_comm_classic == 0 && exchange_comm_on_host == 0)
+     exchange_comm_on_host = 1;
+   if (forward_comm_classic == 0 && forward_comm_on_host == 0)
+     forward_comm_on_host = 1;
+   if (reverse_comm_classic == 0 && reverse_comm_on_host == 0)
+     reverse_comm_on_host = 1;
+  }
+
  // set newton flags
  // set neighbor binsize, same as neigh_modify command