use FFT_FFTW_THREADS consistently, add CUFFT to CMake, minor improvements

Axel Kohlmeyer
2020-01-12 21:32:07 -05:00
parent 7cb36d1705
commit 3784df23f0
4 changed files with 36 additions and 21 deletions

@ -49,13 +49,19 @@ through the CMAKE\_CXX\_FLAGS variable. Example for CentOS 7:
-D CMAKE_CXX_FLAGS="-O3 -g -fopenmp -DNDEBUG -std=c++11"
**Makefile.machine setting**\ :
**Makefile.machine setting**\ to bypass the C++11 test and compile in C++98 mode:
.. parsed-literal::
LMP_INC = -DLAMMPS_CXX98
**Makefile.machine setting**\ to enable C++11 with an older (but not too old) GNU C++ compiler (e.g. on CentOS 7):
.. parsed-literal::
CCFLAGS = -g -O3 -std=c++11
----------
@ -86,14 +92,19 @@ LAMMPS can use them if they are available on your system.
an exception to the rule that all CMake variables can be specified
with lower-case values.
Usually these settings are all that is needed. If CMake cannot find
the FFT library, you can set these variables:
Usually these settings are all that is needed. If FFTW3 is selected,
then CMake will try to detect whether threaded FFTW libraries are
available and enable them by default. This setting is independent of
whether OpenMP threads are enabled and of whether a package like KOKKOS
or USER-OMP is used. If CMake cannot detect the FFT library, you can set
these variables to assist:
.. parsed-literal::
-D FFTW3_INCLUDE_DIRS=path # path to FFTW3 include files
-D FFTW3_LIBRARIES=path # path to FFTW3 libraries
-D FFT_FFTW_THREADS=on # enable using threaded FFTW3 libraries
-D MKL_INCLUDE_DIRS=path # ditto for Intel MKL library
-D MKL_LIBRARIES=path
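As a minimal sketch of how the threading switch might be used on the command line (assuming a build directory next to the LAMMPS ``cmake`` folder, the KSPACE package, and the ``-D FFT=FFTW3`` selector, none of which appear in this hunk; ``-D FFT_FFTW_THREADS=off`` is assumed to be the way to opt out of the auto-detected threaded libraries):

.. parsed-literal::

   cmake -D PKG_KSPACE=on -D FFT=FFTW3 -D FFT_FFTW_THREADS=on ../cmake    # request threaded FFTW3 explicitly
   cmake -D PKG_KSPACE=on -D FFT=FFTW3 -D FFT_FFTW_THREADS=off ../cmake   # keep FFTW3, skip the threaded libraries

Because the setting is independent of OpenMP, either form should be usable with or without KOKKOS or USER-OMP enabled.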
@ -105,6 +116,7 @@ the FFT library, you can set these variables:
FFT_INC = -DFFT_FFTW3 # -DFFT_FFTW3, -DFFT_FFTW (same as -DFFT_FFTW3), -DFFT_MKL, or -DFFT_KISS
# default is KISS if not specified
FFT_INC = -DFFT_SINGLE # do not specify for double precision
FFT_INC = -DFFT_FFTW_THREADS # enable using threaded FFTW3 libraries
FFT_INC = -DFFT_PACK_ARRAY # or -DFFT_PACK_POINTER or -DFFT_PACK_MEMCPY
# default is FFT\_PACK\_ARRAY if not specified
@ -115,6 +127,7 @@ the FFT library, you can set these variables:
FFT_INC = -I/usr/local/include
FFT_PATH = -L/usr/local/lib
FFT_LIB = -lfftw3 # FFTW3 double precision
FFT_LIB = -lfftw3 -lfftw3_omp # FFTW3 double precision with threads (needs -DFFT_FFTW_THREADS)
FFT_LIB = -lfftw3 -lfftw3f # FFTW3 single precision
FFT_LIB = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core # MKL with Intel compiler
FFT_LIB = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core # MKL with GNU compiler
@ -126,16 +139,19 @@ FFT\_LIB with the appropriate FFT libraries to include in the link.
**CMake and make info**\ :
The `KISS FFT library <http://kissfft.sf.net>`_ is included in the LAMMPS
distribution. It is portable across all platforms. Depending on the
size of the FFTs and the number of processors used, the other
libraries listed here can be faster.
distribution. It is portable across all platforms. Depending on the size
of the FFTs and the number of processors used, the other libraries listed
here can be faster.
However, note that long-range Coulombics are only a portion of the
per-timestep CPU cost, FFTs are only a portion of long-range
Coulombics, and 1d FFTs are only a portion of the FFT cost (parallel
communication can be costly). A breakdown of these timings is printed
to the screen at the end of a run using the :doc:`kspace\_style pppm <kspace_style>` command. The :doc:`Run output <Run_output>`
doc page gives more details.
to the screen at the end of a run when using the
:doc:`kspace_style pppm <kspace_style>` command. The :doc:`Run output <Run_output>`
doc page gives more details. A more detailed (and time-consuming)
report of the FFT performance is generated with the
:doc:`kspace_modify fftbench yes <kspace_modify>` command.
FFTW is a fast, portable FFT library that should also work on any
platform and can be faster than the KISS FFT library. You can
@ -166,7 +182,7 @@ When using -DFFT\_SINGLE with FFTW3 you may need to build the FFTW
library a second time with support for single-precision.
For FFTW3, do the following, which should produce the additional
library libfftw3f.a
library libfftw3f.a or libfftw3f.so.
.. parsed-literal::