Change defaults for GPU-direct to use comm host
@@ -102,8 +102,8 @@ the case, especially when using pre-compiled MPI libraries provided by
 a Linux distribution. This is not a problem when using only a single
 GPU and a single MPI rank on a desktop. When running with multiple
 MPI ranks, you may see segmentation faults without GPU-direct support.
-These can be avoided by adding the flags '-pk kokkos comm no gpu/direct no'
-to the LAMMPS command line or using "package kokkos comm no gpu/direct no"_package.html
+These can be avoided by adding the flags '-pk kokkos gpu/direct no'
+to the LAMMPS command line or using "package kokkos gpu/direct no"_package.html
 in the input file.
 
 Use a C++11 compatible compiler and set KOKKOS_ARCH variable in
@@ -273,8 +273,7 @@ to the same GPU with the KOKKOS package, but this is usually only
 faster if significant portions of the input script have not been
 ported to use Kokkos. Using CUDA MPS is recommended in this
 scenario. Using a CUDA-aware MPI library with support for GPU-direct
-is highly recommended and for some KOKKOS-enabled styles even required.
-Most GPU-direct use can be avoided by using "-pk kokkos comm no".
+is highly recommended. GPU-direct use can be avoided by using "-pk kokkos gpu/direct no".
 As above for multi-core CPUs (and no GPU), if N is the number of
 physical cores/node, then the number of MPI tasks/node should not
 exceed N.
@@ -489,10 +489,9 @@ packing/unpacking operation.
 
 The optimal choice for these keywords depends on the input script and
 the hardware used. The {no} value is useful for verifying that the
-Kokkos-based {host} and {device} values are working correctly. The {no}
-value should also be used, in case of using an MPI library that does
-not support GPU-direct. It may also be the fastest choice when using
-Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
+Kokkos-based {host} and {device} values are working correctly.
+It may also be the fastest choice when using Kokkos styles in
+MPI-only mode (i.e. with a thread count of 1).
 
 When running on CPUs or Xeon Phi, the {host} and {device} values work
 identically. When using GPUs, the {device} value will typically be
@@ -513,7 +512,9 @@ this keyword is set to {on}, buffers in GPU memory are passed directly
 through MPI send/receive calls. This reduces overhead of first copying
 the data to the host CPU. However GPU-direct is not supported on all
 systems, which can lead to segmentation faults and would require
-using a value of {off}.
+using a value of {off}. When the {gpu/direct} keyword is set to {off}
+while any of the {comm} keywords are set to {device}, the value for the
+{comm} keywords will be automatically changed to {host}.
 
 :line
 
@@ -156,11 +156,11 @@ KokkosLMP::KokkosLMP(LAMMPS *lmp, int narg, char **arg) : Pointers(lmp)
     } else if (-1 == have_gpu_direct() ) {
       error->warning(FLERR,"Kokkos with CUDA assumes GPU-direct is available,"
                      " but cannot determine if this is the case\n try"
-                     " '-pk kokkos comm no gpu/direct no' when getting segmentation faults");
+                     " '-pk kokkos gpu/direct no' when getting segmentation faults");
     } else if ( 0 == have_gpu_direct() ) {
       error->warning(FLERR,"GPU-direct is NOT available, but some parts of "
                      "Kokkos with CUDA require it\n try"
-                     " '-pk kokkos comm no gpu/direct no' when getting segmentation faults");
+                     " '-pk kokkos gpu/direct no' when getting segmentation faults");
     } else {
       ; // should never get here
     }
@@ -186,7 +186,7 @@ KokkosLMP::KokkosLMP(LAMMPS *lmp, int narg, char **arg) : Pointers(lmp)
   exchange_comm_on_host = 0;
   forward_comm_on_host = 0;
   reverse_comm_on_host = 0;
-  gpu_direct = 0;
+  gpu_direct = 1;
 
 #ifdef KILL_KOKKOS_ON_SIGSEGV
   signal(SIGSEGV, my_signal_handler);
@@ -310,6 +310,17 @@ void KokkosLMP::accelerator(int narg, char **arg)
     } else error->all(FLERR,"Illegal package kokkos command");
   }
 
+  // if "gpu/direct no" and "comm device", change to "comm host"
+
+  if (!gpu_direct) {
+    if (exchange_comm_classic == 0 && exchange_comm_on_host == 0)
+      exchange_comm_on_host = 1;
+    if (forward_comm_classic == 0 && forward_comm_on_host == 0)
+      forward_comm_on_host = 1;
+    if (reverse_comm_classic == 0 && reverse_comm_on_host == 0)
+      reverse_comm_on_host = 1;
+  }
+
   // set newton flags
   // set neighbor binsize, same as neigh_modify command
 
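Note on the last hunk: the fallback it adds can be exercised in isolation. The following is only a sketch, not code from the commit. The struct CommSettings and the function apply_gpu_direct_fallback are hypothetical names introduced here for illustration; the field names and the rule itself are taken from the diff. It assumes, following the comment in the hunk, that a comm keyword with both its "classic" and "on_host" flags at 0 corresponds to {device}.

```cpp
// Hypothetical stand-in for the KokkosLMP members touched by the diff.
// Only the field names come from the commit; the struct is illustrative.
struct CommSettings {
  int exchange_comm_classic = 0;
  int forward_comm_classic = 0;
  int reverse_comm_classic = 0;
  int exchange_comm_on_host = 0;
  int forward_comm_on_host = 0;
  int reverse_comm_on_host = 0;
  int gpu_direct = 1;  // default after this commit (was 0 before)
};

// Same rule as the added block in KokkosLMP::accelerator():
// with gpu/direct off, a comm keyword that is neither classic nor
// on_host (assumed to mean "device") is switched to "host".
void apply_gpu_direct_fallback(CommSettings &s) {
  if (s.gpu_direct) return;
  if (s.exchange_comm_classic == 0 && s.exchange_comm_on_host == 0)
    s.exchange_comm_on_host = 1;
  if (s.forward_comm_classic == 0 && s.forward_comm_on_host == 0)
    s.forward_comm_on_host = 1;
  if (s.reverse_comm_classic == 0 && s.reverse_comm_on_host == 0)
    s.reverse_comm_on_host = 1;
}
```

With the defaults above and gpu_direct set to 0, all three on_host flags end up as 1. A keyword whose classic flag is already set is left untouched, and nothing changes while gpu_direct stays at 1.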