lammps/src/INTEL/README


                     --------------------------------
                          LAMMPS Intel(R) Package
                     --------------------------------

             W. Michael Brown (Intel) michael.w.brown at intel.com
                  Markus Hohnerbach (RWTH Aachen University)
                   William McDoniel (RWTH Aachen University)
                   Rodrigo Canales (RWTH Aachen University)
                           Stan Moore (Sandia)
                   Ahmed E. Ismail (RWTH Aachen University)
                   Paolo Bientinesi (RWTH Aachen University)
                          Anupama Kurpad (Intel)
                          Biswajit Mishra (Shell)

-----------------------------------------------------------------------------

This package provides LAMMPS styles that:

   1. include support for single and mixed precision in addition to double.
   2. include modifications to support vectorization for key routines
   3. include modifications for data layouts to improve cache efficiency
   3. include modifications to support offload to Intel(R) Xeon Phi(TM)
      coprocessors

-----------------------------------------------------------------------------

As of 2019/03/28 none of the styles provided in this package support
tallying per-atom stresses. Any attempt to compute/access it will
cause an error termination.

-----------------------------------------------------------------------------

For Intel server processors codenamed "Skylake", the following flags should
be added or changed in the Makefile depending on the version:

2017 update 2         - No changes needed
2017 updates 3 or 4   - Use -xCOMMON-AVX512 and not -xHost or -xCORE-AVX512
2018 initial release  - Use -xCOMMON-AVX512 and not -xHost or -xCORE-AVX512
2018u1 or newer       - Use -xHost or -xCORE-AVX512 and -qopt-zmm-usage=high

-----------------------------------------------------------------------------

When using the suffix command with "intel", intel styles will be used if they
exist. If the suffix command is used with "hybrid intel omp" and the OPENMP
is installed, OPENMP styles will be used whenever INTEL styles are not
available. This allow for running most styles in LAMMPS with threading.

-----------------------------------------------------------------------------

The Long-Range Thread mode (LRT) in the Intel package is enabled through the
-DLMP_INTEL_USELRT define at compile time. All intel optimized makefiles
include this define. This feature will use pthreads by default.
Alternatively, "-DLMP_INTEL_LRT11" can be used to build with compilers that
support threads intrinsically using the C++11 standard. When using
LRT mode, you might need to disable OpenMP affinity settings (e.g.
export KMP_AFFINITY=none). LAMMPS will generate a warning if the settings
need to be changed.

-----------------------------------------------------------------------------

Unless Intel Math Kernel Library (MKL) is unavailable, -DLMP_USE_MKL_RNG
should be added to the compile flags. This will enable using the MKL Mersenne
Twister random number generator (RNG) for Dissipative Particle Dynamics
(DPD). This RNG can allow significantly faster performance and it also has a
significantly longer period than the standard RNG for DPD.

-----------------------------------------------------------------------------

In order to use offload to Intel(R) Xeon Phi(TM) coprocessors, the flag
-DLMP_INTEL_OFFLOAD should be set in the Makefile. Offload requires the use of
Intel compilers.

-----------------------------------------------------------------------------

For portability reasons, vectorization directives are currently only enabled
for Intel compilers. Using other compilers may result in significantly
lower performance. This behavior can be changed by defining
LMP_SIMD_COMPILER for the preprocessor (see intel_preprocess.h).

-----------------------------------------------------------------------------

By default, when running with offload to Intel(R) coprocessors, affinity
for host MPI tasks and OpenMP threads is set automatically within the code.
This currently requires the use of system calls. To disable at build time,
compile with -DINTEL_OFFLOAD_NOAFFINITY.

-----------------------------------------------------------------------------

Vector intrinsics are temporarily being used for the Stillinger-Weber
potential to allow for advanced features in the AVX512 instruction set to
be exploited on early hardware. We hope to see compiler improvements for
AVX512 that will eliminate this requirement, so it is not recommended to
develop code based on the intrinsics implementation. Please e-mail the
authors for more details.