From bd6d7b91365704780061ba48969c9cd63b73d71d Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer
Date: Wed, 13 Apr 2022 14:22:32 -0400
Subject: [PATCH] clarify CUDA versus OpenCL build and runtime restrictions

---
 doc/src/Build_extras.rst | 9 +++++----
 lib/gpu/README           | 6 ++++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/doc/src/Build_extras.rst b/doc/src/Build_extras.rst
index ca55038d7b..d2d12b48db 100644
--- a/doc/src/Build_extras.rst
+++ b/doc/src/Build_extras.rst
@@ -148,7 +148,6 @@ CMake build
 * sm_70 for Volta (supported since CUDA 9)
 * sm_75 for Turing (supported since CUDA 10)
 * sm_80 for Ampere (supported since CUDA 11)
-.. * sm_90 for Hopper (supported since CUDA 12)
 
 A more detailed list can be found, for example,
 at `Wikipedia's CUDA article `_
@@ -159,9 +158,11 @@ Thus the GPU_ARCH setting is merely an optimization, to have code for
 the preferred GPU architecture directly included rather than having to wait
 for the JIT compiler of the CUDA driver to translate it.
 
-Version 8.0 or later of the CUDA toolkit is required and a GPU architecture
-of Kepler or laters, which must *also* be supported by the CUDA toolkit in use
-**and** the CUDA driver in use.
+When compiling for CUDA or HIP with CUDA, version 8.0 or later of the CUDA toolkit
+is required and a GPU architecture of Kepler or later, which must *also* be
+supported by the CUDA toolkit in use **and** the CUDA driver in use.
+When compiling for OpenCL, OpenCL version 1.2 or later is required and the
+GPU must be supported by the GPU driver and OpenCL runtime bundled with the driver.
 
 When building with CMake, you **must NOT** build the GPU library in ``lib/gpu``
 using the traditional build procedure. CMake will detect files generated by that
diff --git a/lib/gpu/README b/lib/gpu/README
index b8866cf79e..100179feca 100644
--- a/lib/gpu/README
+++ b/lib/gpu/README
@@ -176,7 +176,8 @@ Makefile.linux_multi after adjusting the settings for the CUDA toolkit in use.
 
 Only CUDA toolkit version 8.0 and later and only GPU architecture 3.0
 (aka Kepler) and later are supported by this version of LAMMPS. If you want
-to use older hard- or software you have to use an older version of LAMMPS.
+to use older hard- or software you have to compile for OpenCL or use an older
+version of LAMMPS.
 
 If you do not want to use a fat binary, that supports multiple CUDA
 architectures, the CUDA_ARCH must be set to match the GPU architecture. This
@@ -230,7 +231,8 @@ If GERYON_NUMA_FISSION is defined at build time, LAMMPS will consider
 separate NUMA nodes on GPUs or accelerators as separate devices. For example,
 a 2-socket CPU would appear as two separate devices for OpenCL (and LAMMPS
 would require two MPI processes to use both sockets with the GPU library - each with its
-own device ID as output by ocl_get_devices).
+own device ID as output by ocl_get_devices). OpenCL version 1.2 or later is
+required.
 
 For a debug build, use "-DUCL_DEBUG -DGERYON_KERNEL_DUMP" and remove
 "-DUCL_NO_EXIT" and "-DMPI_GERYON" from the build options.