From 806fd8cf11bb7fc18c1507749d98222b35af39c8 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer
Date: Fri, 10 Apr 2020 23:32:31 -0400
Subject: [PATCH] rework KOKKOS package compilation docs for Kokkos 3.0+ and recent changes in CMake support

---
 doc/src/Build_extras.rst | 158 +++++++++++++++++++++++++--------------
 1 file changed, 101 insertions(+), 57 deletions(-)

diff --git a/doc/src/Build_extras.rst b/doc/src/Build_extras.rst
index 2cb4acf2db..00165db81d 100644
--- a/doc/src/Build_extras.rst
+++ b/doc/src/Build_extras.rst
@@ -279,28 +279,47 @@ using. For example:
 .. _kokkos:
 
 KOKKOS package
----------------------------
+--------------
 
-To build with this package, you must choose which hardware you want to
-build for, either CPUs (multi-threading via OpenMP) or KNLs (OpenMP)
-or GPUs (NVIDIA Cuda).
+Using the KOKKOS package requires choosing several settings. You have
+to select whether you want to compile with parallelization on the host
+and whether you want to include offloading of calculations to a device
+(e.g. a GPU). The default setting is to have no host parallelization
+and no device offloading. In addition, you can select the hardware
+architecture to select the instruction set. Since most hardware is
+backward compatible, you may choose settings for an older architecture
+to have an executable that will run on this and newer architectures.
 
-For a CMake or make build, these are the possible choices for the
-``Kokkos_ARCH_`` settings described below. Note that for CMake,
-these are really Kokkos variables, not LAMMPS variables. Hence you
-must use case-sensitive variable names, i.e. ``Kokkos_ARCH_BDW``
-and **NOT** ``KOKKOS_ARCH_BDW`` or ``Kokkos_arch_bdw`` and so on.
+.. note::
+
+   NVIDIA GPUs with CC 5.0 (Maxwell) and newer are not compatible with
+   CC 3.x (Kepler). If you run Kokkos on a newer architecture than what
+   LAMMPS was compiled with, there will be a significant delay during
+   device initialization since the just-in-time compiler has to
+   recompile the GPU kernel code for the new hardware.
+
+The settings discussed below have been tested with LAMMPS and are
+confirmed to work. Kokkos is an active project with ongoing improvements
+and projects working on including support for additional architectures.
+More information on Kokkos can be found on the
+`Kokkos GitHub project `_.
+
+Available Architecture settings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These are the possible choices for the Kokkos architecture ID. They must
+be specified in uppercase.
 
 .. list-table::
-   :header-rows: 1
+   :header-rows: 0
    :widths: auto
 
-   * - Arch
-     - Description
+   * - **Arch-ID**
+     - **Description**
    * - AMDAVX
-     - AMD 64-bit x86 CPUs
+     - AMD 64-bit x86 CPU (AVX 1)
    * - EPYC
-     - AMD EPYC Zen class CPUs
+     - AMD EPYC Zen class CPU (AVX 2)
    * - ARMV80
      - ARMv8.0 Compatible CPU
    * - ARMV81
@@ -310,74 +329,78 @@ and **NOT** ``KOKKOS_ARCH_BDW`` or ``Kokkos_arch_bdw`` and so on.
    * - ARMV8_THUNDERX2
      - ARMv8 Cavium ThunderX2 CPU
    * - WSM
-     - Intel Westmere CPUs
+     - Intel Westmere CPU (SSE 4.2)
    * - SNB
-     - Intel Sandy/Ivy Bridge CPUs
+     - Intel Sandy/Ivy Bridge CPU (AVX 1)
    * - HSW
-     - Intel Haswell CPUs
+     - Intel Haswell CPU (AVX 2)
    * - BDW
-     - Intel Broadwell Xeon E-class CPUs
+     - Intel Broadwell Xeon E-class CPU (AVX 2 + transactional mem)
    * - SKX
-     - Intel Sky Lake Xeon E-class HPC CPUs (AVX512)
+     - Intel Sky Lake Xeon E-class HPC CPU (AVX512 + transactional mem)
    * - KNC
      - Intel Knights Corner Xeon Phi
    * - KNL
      - Intel Knights Landing Xeon Phi
    * - BGQ
-     - IBM Blue Gene/Q CPUs
+     - IBM Blue Gene/Q CPU
    * - POWER7
-     - IBM POWER8 CPUs
+     - IBM POWER8 CPU
    * - POWER8
-     - IBM POWER8 CPUs
+     - IBM POWER8 CPU
    * - POWER9
-     - IBM POWER9 CPUs
+     - IBM POWER9 CPU
    * - KEPLER30
-     - NVIDIA Kepler generation CC 3.0
+     - NVIDIA Kepler generation CC 3.0 GPU
    * - KEPLER32
-     - NVIDIA Kepler generation CC 3.2
+     - NVIDIA Kepler generation CC 3.2 GPU
    * - KEPLER35
-     - NVIDIA Kepler generation CC 3.5
+     - NVIDIA Kepler generation CC 3.5 GPU
    * - KEPLER37
-     - NVIDIA Kepler generation CC 3.7
+     - NVIDIA Kepler generation CC 3.7 GPU
    * - MAXWELL50
-     - NVIDIA Maxwell generation CC 5.0
+     - NVIDIA Maxwell generation CC 5.0 GPU
    * - MAXWELL52
-     - NVIDIA Maxwell generation CC 5.2
+     - NVIDIA Maxwell generation CC 5.2 GPU
    * - MAXWELL53
-     - NVIDIA Maxwell generation CC 5.3
+     - NVIDIA Maxwell generation CC 5.3 GPU
    * - PASCAL60
-     - NVIDIA Pascal generation CC 6.0
+     - NVIDIA Pascal generation CC 6.0 GPU
    * - PASCAL61
-     - NVIDIA Pascal generation CC 6.1
+     - NVIDIA Pascal generation CC 6.1 GPU
    * - VOLTA70
-     - NVIDIA Volta generation CC 7.0
+     - NVIDIA Volta generation CC 7.0 GPU
    * - VOLTA72
-     - NVIDIA Volta generation CC 7.2
+     - NVIDIA Volta generation CC 7.2 GPU
    * - TURING75
-     - NVIDIA Turing generation CC 7.5
-
-**CMake build**\ :
+     - NVIDIA Turing generation CC 7.5 GPU
+CMake build settings:
+^^^^^^^^^^^^^^^^^^^^^
 
 For multicore CPUs using OpenMP, set these 2 variables.
 
 .. code-block:: bash
 
-   -D Kokkos_ARCH_=yes # = CPU from list above
+   -D Kokkos_ARCH_CPUARCH=yes # CPUARCH = CPU from list above
    -D Kokkos_ENABLE_OPENMP=yes
+   -D BUILD_OMP=yes
 
-For Intel KNLs using OpenMP, set these 2 variables:
+Please note that enabling OpenMP for KOKKOS requires that OpenMP is
+also :ref:`enabled for the rest of LAMMPS `.
+
+For Intel KNLs using OpenMP, set these variables:
 
 .. code-block:: bash
 
    -D Kokkos_ARCH_KNL=yes
    -D Kokkos_ENABLE_OPENMP=yes
 
-For NVIDIA GPUs using CUDA, set these 4 variables:
+For NVIDIA GPUs using CUDA, set these variables:
 
 .. code-block:: bash
 
-   -D Kokkos_ARCH_=yes # = CPU from list above that is hosting the GPU
-   -D Kokkos_ARCH_=yes # = GPU from list above
+   -D Kokkos_ARCH_CPUARCH=yes # CPUARCH = CPU from list above
+   -D Kokkos_ARCH_GPUARCH=yes # GPUARCH = GPU from list above
    -D Kokkos_ENABLE_CUDA=yes
    -D Kokkos_ENABLE_OPENMP=yes
    -D CMAKE_CXX_COMPILER=wrapper # wrapper = full path to Cuda nvcc wrapper
@@ -388,20 +411,35 @@ include the full path name to the wrapper, e.g.
 
 .. code-block:: bash
 
-   -D CMAKE_CXX_COMPILER=/home/username/lammps/lib/kokkos/bin/nvcc_wrapper
+   -D CMAKE_CXX_COMPILER=${HOME}/lammps/lib/kokkos/bin/nvcc_wrapper
 
-**Traditional make**\ :
+To simplify the compilation, three preset files are included in the
+``cmake/presets`` folder, ``kokkos-serial.cmake``, ``kokkos-openmp.cmake``,
+and ``kokkos-cuda.cmake``. They will enable the KOKKOS package and
+enable some hardware choice. So to compile with OpenMP host parallelization,
+CUDA device parallelization (for GPUs with CC 5.0 and up) with some
+common packages enabled, you can do the following:
+
+.. code-block:: bash
+
+   mkdir build-kokkos-cuda
+   cd build-kokkos-cuda
+   cmake -C ../cmake/presets/minimal.cmake -C ../cmake/presets/kokkos-cuda.cmake ../cmake
+   cmake --build .
+
+Traditional make settings:
+^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Choose which hardware to support in ``Makefile.machine`` via
 ``KOKKOS_DEVICES`` and ``KOKKOS_ARCH`` settings. See the
-``src/MAKE/OPTIONS/Makefile.kokkos\*`` files for examples.
+``src/MAKE/OPTIONS/Makefile.kokkos*`` files for examples.
 
 For multicore CPUs using OpenMP:
 
 .. code-block:: make
 
    KOKKOS_DEVICES = OpenMP
-   KOKKOS_ARCH = archCPU # archCPU = CPU from list above
+   KOKKOS_ARCH = CPUARCH # CPUARCH = CPU from list above
 
 For Intel KNLs using OpenMP:
 
@@ -415,22 +453,28 @@ For NVIDIA GPUs using CUDA:
 .. code-block:: make
 
    KOKKOS_DEVICES = Cuda
-   KOKKOS_ARCH = archCPU,archGPU # archCPU = CPU from list above that is hosting the GPU
-                                 # archGPU = GPU from list above
-   FFT_INC = -DFFT_CUFFT # enable use of cuFFT (optional)
-   FFT_LIB = -lcufft # link to cuFFT library
+   KOKKOS_ARCH = CPUARCH,GPUARCH # CPUARCH = CPU from list above that is hosting the GPU
+                                 # GPUARCH = GPU from list above
+   FFT_INC = -DFFT_CUFFT         # enable use of cuFFT (optional)
+   FFT_LIB = -lcufft             # link to cuFFT library
 
-For GPUs, you also need the following 2 lines in your Makefile.machine
-before the CC line is defined, in this case for use with OpenMPI mpicxx.
-The 2 lines define a nvcc wrapper compiler, which will use nvcc for
-compiling CUDA files and use a C++ compiler for non-Kokkos, non-CUDA
-files.
+For GPUs, you also need the following lines in your ``Makefile.machine``
+before the CC line is defined. They tell ``mpicxx`` to use an ``nvcc``
+compiler wrapper, which will use ``nvcc`` for compiling CUDA files and a
+C++ compiler for non-Kokkos, non-CUDA files.
 
 .. code-block:: make
 
+   # For OpenMPI
    KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
    export OMPI_CXX = $(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
-   CC = mpicxx
+   CC = mpicxx
+
+.. code-block:: make
+
+   # For MPICH and derivatives
+   KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
+   CC = mpicxx -cxx=$(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
 
 ----------
 
@@ -469,7 +513,7 @@ args:
    $ make lib-latte args="-b" # download and build in lib/latte/LATTE-master
    $ make lib-latte args="-p $HOME/latte" # use existing LATTE installation in $HOME/latte
    $ make lib-latte args="-b -m gfortran" # download and build in lib/latte and
-   # copy Makefile.lammps.gfortran to Makefile.lammps
+   # copy Makefile.lammps.gfortran to Makefile.lammps
 
 Note that 3 symbolic (soft) links, "includelink" and "liblink" and
 "filelink.o", are created in lib/latte to point into the LATTE home