From b2f3ef52e4e64bbf259710f40671e76102c3709b Mon Sep 17 00:00:00 2001
From: sjplimp
The USER-INTEL package was developed by Mike Brown at Intel
Corporation. It provides a capability to accelerate simulations by
offloading neighbor list and non-bonded force calculations to Intel
-coprocessors. Additionally, it supports running simulations in
-single, mixed, or double precision with vectorization, even if a
-coprocessor is not present. The same C++ code is used for both cases.
-When offloading to a coprocessor, the routine is run twice, once with
-an offload flag.
+coprocessors (Xeon Phi). Additionally, it supports running
+simulations in single, mixed, or double precision with vectorization,
+even if a coprocessor is not present, i.e. on an Intel CPU. The same
+C++ code is used for both cases. When offloading to a coprocessor,
+the routine is run twice, once with an offload flag.
-The USER-INTEL package will work with the USER-OMP package. Specifying
-use of the Intel package implicitly includes the OMP package allowing
-it to be used for angle, bond, dihedral, and long-range
-electrostatics. Using the suffix intel command will use
-styles from the Intel package if available; otherwise it will use
-styles from the OMP package if available.
+ The USER-INTEL package can be used in tandem with the USER-OMP
+package. This is useful when a USER-INTEL pair style is used, so that
+other styles not supported by the USER-INTEL package, e.g. for bond,
+angle, dihedral, improper, and long-range electrostatics, can be run
+with the USER-OMP package versions. If you have built LAMMPS with
+both the USER-INTEL and USER-OMP packages, then this mode of operation
+is made easier, because the "-suffix intel" command-line
+switch and the suffix
+intel command will both set a second-choice suffix to
+"omp" so that styles from the USER-OMP package will be used if
+available.
Building LAMMPS with the USER-INTEL package:
The procedure for building LAMMPS with the USER-INTEL package is
simple. You have to edit your machine specific makefile to add the
flags to enable OpenMP support (-openmp) to both the CCFLAGS and
-LINKFLAGS variables. You also need to add -restrict to CCFLAGS. If
-you are compiling on the same architecture that will be used for the
-runs, adding the flag -xHost will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
+-restrict to CCFLAGS.
+ If you are compiling on the same architecture that will be used for
+the runs, adding the flag -xHost will enable vectorization with the
+Intel compiler. In order to build with support for an Intel
coprocessor, the flag -offload should be added to the LINKFLAGS line
and the flag -DLMP_INTEL_OFFLOAD should be added to the CCFLAGS
line.
The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
-are provided with options that perform well with the Intel
-compiler. The latter Makefile has support for offload to coprocessors
-and the former does not.
+are included in the src/MAKE directory with options that perform well
+with the Intel compiler. The latter Makefile has support for offload
+to coprocessors and the former does not.
It is recommended that Intel Compiler 2013 SP1 update 1 be used for
compiling. Newer versions have some performance issues that are being
addressed. If using Intel MPI, version 5 or higher is recommended.
The rest of the compilation is the same as for any other package that
-has no additional library dependencies:
+has no additional library dependencies, e.g.
Running an input script:
@@ -1032,94 +1039,97 @@ commands, and is independent of the Intel package.
Input script requirements to run using pair styles with a intel
suffix are as follows:
-To invoke specific styles from the Intel package, either append
+ To invoke specific styles from the USER-INTEL package, either append
"intel" to the style name (e.g. pair_style lj/cut/intel), or use the
-suffix command-line switch, or use the
suffix command in the input script.
Unless the -suffix intel command-line
-switch is used, the package
+switch is used, a package
intel command must be used near the beginning of the
-script. The default precision mode for the Intel package is mixed,
-meaning that accumulation is performed in double precision and other
-calculations are performed in single precision. In order to use all
-single or all double precision, the "package intel" line must be used
-in the input script with a "single" or "double" keyword specified.
+input script. The default precision mode for the USER-INTEL package
+is mixed, meaning that accumulation is performed in double precision
+and other calculations are performed in single precision. In order to
+use all single or all double precision, the package
+intel command must be used in the input script with a
+"single" or "double" keyword specified.
Running with an Intel coprocessor:
-The Intel package supports offload of a fraction of the work to Intel
-coprocessors. This is accomplished by setting a balance fraction on
-the package intel line. A balance of 0 runs all
-calculations on the CPU. A balance of 1 runs all calculations on the
-coprocessor. A balance of 0.5 runs half of the calculations on the
-coprocessor. Setting the balance to -1 will enable dynamic load
-balancing that continously adjusts the fraction of offloaded work
-throughout the simulation. This option is typically within 5 to 10
-percent of the optimal fixed balance. By default, using the suffix
-command or command-line switch will use offload to a coprocessor with
-the balance set to -1. If LAMMPS is built without offload support,
-this setting is ignored.
+ The USER-INTEL package supports offload of a fraction of the work to
+Intel coprocessors (Xeon Phi). This is accomplished by setting a
+balance fraction on the package intel command. A
+balance of 0 runs all calculations on the CPU. A balance of 1 runs
+all calculations on the coprocessor. A balance of 0.5 runs half of
+the calculations on the coprocessor. Setting the balance to -1 will
+enable dynamic load balancing that continuously adjusts the fraction of
+offloaded work throughout the simulation. This option typically
+produces results within 5 to 10 percent of the optimal fixed balance.
+By default, using the suffix command or -suffix
+command-line switch will use offload to a
+coprocessor with the balance set to -1. If LAMMPS is built without
+offload support, this setting is ignored.
If one is running short benchmark runs with dynamic load balancing,
adding a short warm-up run (10-20 steps) will allow the load-balancer
-to find a setting that will be carried over to additional runs.
+to find a setting that will carry over to additional runs.
The default for the package intel command is to have
-all of the MPI tasks on a given compute node use a single
-coprocessor. In general, running with a large number of MPI tasks on
-each node will perform best with offload. Each MPI task will
+all the MPI tasks on a given compute node use a single coprocessor
+(Xeon Phi). In general, running with a large number of MPI tasks on
+each node will perform best with offload. Each MPI task will
automatically get affinity to a subset of the hardware threads
-available on the coprocessor. For example, if your card has 61 cores,
-with 60 cores available for offload and 4 hardware threads per core,
-running with 24 MPI tasks per node will cause each MPI task to use a
-subset of 10 threads on the coprocessor. Fine tuning of the number of
-threads to use per MPI task or the number of threads to use per core
-can be accomplished with keywords to the package intel
-command.
+available on the coprocessor. For example, if your card has 61 cores,
+with 60 cores available for offload and 4 hardware threads per core
+(240 total threads), running with 24 MPI tasks per node will cause
+each MPI task to use a subset of 10 threads on the coprocessor. Fine
+tuning of the number of threads to use per MPI task or the number of
+threads to use per core can be accomplished with keywords to the
+package intel command.
-If LAMMPS is using offload to a coprocessor, a diagnostic line during
-the setup for a run is printed to the screen (not to log files)
-indicating that offload is being used and the number of coprocessor
-threads per MPI task. Additionally, an offload timing summary is
-printed at the end of each run. When using offload, the
-sort frequency for atom data is changed to 1 such
-that the data is sorted every neighbor build.
+ If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+line during the setup for a run is printed to the screen (not to log
+files) indicating that offload is being used and the number of
+coprocessor threads per MPI task. Additionally, an offload timing
+summary is printed at the end of each run. When using offload, the
+sort frequency for atom data is changed to 1 so
+that the per-atom data is sorted every neighbor build.
-In order to use multiple coprocessors on each compute node, the
+ To use multiple coprocessors (Xeon Phis) on each compute node, the
offload_cards keyword can be specified with the package
intel command to specify the number of coprocessors to
use.
-For simulations involving long-range electrostatics or angle, bond,
-and dihedral calculations, computation and data transfer to the
+ For simulations with long-range electrostatics or bond, angle,
+dihedral, or improper calculations, computation and data transfer to the
coprocessor will run concurrently with computations and MPI
-communications for these routines on the host. The Intel package has
-two modes for deciding which atoms will be handled by the coprocessor.
-The setting is controlled with the "offload_ghost" option. When set to
-0, ghost atoms (atoms at the borders between MPI tasks) are not
-offloaded to the card. This allows for overlap of MPI communication of
-forces with computation on the coprocessor when the
-newton setting is "on". The default is dependent on the
-style being used, however, better performance might be achieving by
+communications for these routines on the host. The USER-INTEL package
+has two modes for deciding which atoms will be handled by the
+coprocessor. The setting is controlled with the "offload_ghost"
+option. When set to 0, ghost atoms (atoms at the borders between MPI
+tasks) are not offloaded to the card. This allows for overlap of MPI
+communication of forces with computation on the coprocessor when the
+newton setting is "on". The default is dependent on the
+style being used, however, better performance might be achieved by
setting this explictly.
In order to control the number of OpenMP threads used on the host, the
OMP_NUM_THREADS environment variable should be set. This variable will
not influence the number of threads used on the coprocessor. Only the
-"package intel" command can be used to control thread counts on the
-coprocessor.
+package intel command can be used to control thread
+counts on the coprocessor.
Restrictions:
When using offload, hybrid styles that require skip
lists for neighbor builds cannot be offloaded to the coprocessor.
-Using hybrid/overlay is allowed. Only one intel
-accelerated style may be used with hybrid styles. Exclusion lists are
+Using hybrid/overlay is allowed. Only one intel
+accelerated style may be used with hybrid styles. Exclusion lists are
not currently supported with offload, however, the same effect can
-often be accomplished by setting cutoffs for excluded atom types to
-0. None of the pair styles in the USER-OMP package support the
-"inner", "middle", "outer" options for r-RESPA integration.
+often be accomplished by setting cutoffs for excluded atom types to 0.
+None of the pair styles in the USER-OMP package currently support the
+"inner", "middle", "outer" options for rRESPA integration via the
+run_style respa command.
-make yes-user-omp yes-user-intel
+make yes-user-intel yes-user-omp
make machine
diff --git a/doc/Section_accelerate.txt b/doc/Section_accelerate.txt
index bfeba1dfd6..160d67c108 100644
--- a/doc/Section_accelerate.txt
+++ b/doc/Section_accelerate.txt
@@ -974,45 +974,52 @@ LAMMPS.
The USER-INTEL package was developed by Mike Brown at Intel
Corporation. It provides a capability to accelerate simulations by
offloading neighbor list and non-bonded force calculations to Intel
-coprocessors. Additionally, it supports running simulations in
-single, mixed, or double precision with vectorization, even if a
-coprocessor is not present. The same C++ code is used for both cases.
-When offloading to a coprocessor, the routine is run twice, once with
-an offload flag.
+coprocessors (Xeon Phi). Additionally, it supports running
+simulations in single, mixed, or double precision with vectorization,
+even if a coprocessor is not present, i.e. on an Intel CPU. The same
+C++ code is used for both cases. When offloading to a coprocessor,
+the routine is run twice, once with an offload flag.
-The USER-INTEL package will work with the USER-OMP package. Specifying
-use of the Intel package implicitly includes the OMP package allowing
-it to be used for angle, bond, dihedral, and long-range
-electrostatics. Using the "suffix intel"_suffix.html command will use
-styles from the Intel package if available; otherwise it will use
-styles from the OMP package if available.
+The USER-INTEL package can be used in tandem with the USER-OMP
+package. This is useful when a USER-INTEL pair style is used, so that
+other styles not supported by the USER-INTEL package, e.g. for bond,
+angle, dihedral, improper, and long-range electrostatics, can be run
+with the USER-OMP package versions. If you have built LAMMPS with
+both the USER-INTEL and USER-OMP packages, then this mode of operation
+is made easier, because the "-suffix intel" "command-line
+switch"_Section_start.html#start_7 and the "suffix
+intel"_suffix.html command will both set a second-choice suffix to
+"omp" so that styles from the USER-OMP package will be used if
+available.
[Building LAMMPS with the USER-INTEL package:]
The procedure for building LAMMPS with the USER-INTEL package is
simple. You have to edit your machine specific makefile to add the
flags to enable OpenMP support ({-openmp}) to both the CCFLAGS and
-LINKFLAGS variables. You also need to add -restrict to CCFLAGS. If
-you are compiling on the same architecture that will be used for the
-runs, adding the flag {-xHost} will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
+-restrict to CCFLAGS.
+
+If you are compiling on the same architecture that will be used for
+the runs, adding the flag {-xHost} will enable vectorization with the
+Intel compiler. In order to build with support for an Intel
coprocessor, the flag {-offload} should be added to the LINKFLAGS line
and the flag {-DLMP_INTEL_OFFLOAD} should be added to the CCFLAGS
line.
The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
-are provided with options that perform well with the Intel
-compiler. The latter Makefile has support for offload to coprocessors
-and the former does not.
+are included in the src/MAKE directory with options that perform well
+with the Intel compiler. The latter Makefile has support for offload
+to coprocessors and the former does not.
It is recommended that Intel Compiler 2013 SP1 update 1 be used for
compiling. Newer versions have some performance issues that are being
addressed. If using Intel MPI, version 5 or higher is recommended.
The rest of the compilation is the same as for any other package that
-has no additional library dependencies:
+has no additional library dependencies, e.g.
-make yes-user-omp yes-user-intel
+make yes-user-intel yes-user-omp
make machine :pre
[Running an input script:]
@@ -1028,94 +1035,97 @@ commands, and is independent of the Intel package.
Input script requirements to run using pair styles with a {intel}
suffix are as follows:
-To invoke specific styles from the Intel package, either append
+To invoke specific styles from the USER-INTEL package, either append
"intel" to the style name (e.g. pair_style lj/cut/intel), or use the
"-suffix command-line switch"_Section_start.html#start_7, or use the
"suffix"_suffix.html command in the input script.
Unless the "-suffix intel command-line
-switch"_Section_start.html#start_7 is used, the "package
+switch"_Section_start.html#start_7 is used, a "package
intel"_package.html command must be used near the beginning of the
-script. The default precision mode for the Intel package is {mixed},
-meaning that accumulation is performed in double precision and other
-calculations are performed in single precision. In order to use all
-single or all double precision, the "package intel" line must be used
-in the input script with a "single" or "double" keyword specified.
+input script. The default precision mode for the USER-INTEL package
+is {mixed}, meaning that accumulation is performed in double precision
+and other calculations are performed in single precision. In order to
+use all single or all double precision, the "package
+intel"_package.html command must be used in the input script with a
+"single" or "double" keyword specified.
[Running with an Intel coprocessor:]
-The Intel package supports offload of a fraction of the work to Intel
-coprocessors. This is accomplished by setting a balance fraction on
-the "package intel"_package.html line. A balance of 0 runs all
-calculations on the CPU. A balance of 1 runs all calculations on the
-coprocessor. A balance of 0.5 runs half of the calculations on the
-coprocessor. Setting the balance to -1 will enable dynamic load
-balancing that continously adjusts the fraction of offloaded work
-throughout the simulation. This option is typically within 5 to 10
-percent of the optimal fixed balance. By default, using the suffix
-command or command-line switch will use offload to a coprocessor with
-the balance set to -1. If LAMMPS is built without offload support,
-this setting is ignored.
+The USER-INTEL package supports offload of a fraction of the work to
+Intel coprocessors (Xeon Phi). This is accomplished by setting a
+balance fraction on the "package intel"_package.html command. A
+balance of 0 runs all calculations on the CPU. A balance of 1 runs
+all calculations on the coprocessor. A balance of 0.5 runs half of
+the calculations on the coprocessor. Setting the balance to -1 will
+enable dynamic load balancing that continuously adjusts the fraction of
+offloaded work throughout the simulation. This option typically
+produces results within 5 to 10 percent of the optimal fixed balance.
+By default, using the "suffix"_suffix.html command or "-suffix
+command-line switch"_Section_start.html#start_7 will use offload to a
+coprocessor with the balance set to -1. If LAMMPS is built without
+offload support, this setting is ignored.
If one is running short benchmark runs with dynamic load balancing,
adding a short warm-up run (10-20 steps) will allow the load-balancer
-to find a setting that will be carried over to additional runs.
+to find a setting that will carry over to additional runs.
The default for the "package intel"_package.html command is to have
-all of the MPI tasks on a given compute node use a single
-coprocessor. In general, running with a large number of MPI tasks on
-each node will perform best with offload. Each MPI task will
+all the MPI tasks on a given compute node use a single coprocessor
+(Xeon Phi). In general, running with a large number of MPI tasks on
+each node will perform best with offload. Each MPI task will
automatically get affinity to a subset of the hardware threads
-available on the coprocessor. For example, if your card has 61 cores,
-with 60 cores available for offload and 4 hardware threads per core,
-running with 24 MPI tasks per node will cause each MPI task to use a
-subset of 10 threads on the coprocessor. Fine tuning of the number of
-threads to use per MPI task or the number of threads to use per core
-can be accomplished with keywords to the "package intel"_package.html
-command.
+available on the coprocessor. For example, if your card has 61 cores,
+with 60 cores available for offload and 4 hardware threads per core
+(240 total threads), running with 24 MPI tasks per node will cause
+each MPI task to use a subset of 10 threads on the coprocessor. Fine
+tuning of the number of threads to use per MPI task or the number of
+threads to use per core can be accomplished with keywords to the
+"package intel"_package.html command.
-If LAMMPS is using offload to a coprocessor, a diagnostic line during
-the setup for a run is printed to the screen (not to log files)
-indicating that offload is being used and the number of coprocessor
-threads per MPI task. Additionally, an offload timing summary is
-printed at the end of each run. When using offload, the
-"sort"_atom_modify.html frequency for atom data is changed to 1 such
-that the data is sorted every neighbor build.
+If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+line during the setup for a run is printed to the screen (not to log
+files) indicating that offload is being used and the number of
+coprocessor threads per MPI task. Additionally, an offload timing
+summary is printed at the end of each run. When using offload, the
+"sort"_atom_modify.html frequency for atom data is changed to 1 so
+that the per-atom data is sorted every neighbor build.
-In order to use multiple coprocessors on each compute node, the
+To use multiple coprocessors (Xeon Phis) on each compute node, the
{offload_cards} keyword can be specified with the "package
intel"_package.html command to specify the number of coprocessors to
use.
-For simulations involving long-range electrostatics or angle, bond,
-and dihedral calculations, computation and data transfer to the
+For simulations with long-range electrostatics or bond, angle,
+dihedral, or improper calculations, computation and data transfer to the
coprocessor will run concurrently with computations and MPI
-communications for these routines on the host. The Intel package has
-two modes for deciding which atoms will be handled by the coprocessor.
-The setting is controlled with the "offload_ghost" option. When set to
-0, ghost atoms (atoms at the borders between MPI tasks) are not
-offloaded to the card. This allows for overlap of MPI communication of
-forces with computation on the coprocessor when the
-"newton"_newton.html setting is "on". The default is dependent on the
-style being used, however, better performance might be achieving by
+communications for these routines on the host. The USER-INTEL package
+has two modes for deciding which atoms will be handled by the
+coprocessor. The setting is controlled with the "offload_ghost"
+option. When set to 0, ghost atoms (atoms at the borders between MPI
+tasks) are not offloaded to the card. This allows for overlap of MPI
+communication of forces with computation on the coprocessor when the
+"newton"_newton.html setting is "on". The default is dependent on the
+style being used, however, better performance might be achieved by
setting this explictly.
In order to control the number of OpenMP threads used on the host, the
OMP_NUM_THREADS environment variable should be set. This variable will
not influence the number of threads used on the coprocessor. Only the
-"package intel" command can be used to control thread counts on the
-coprocessor.
+"package intel"_package.html command can be used to control thread
+counts on the coprocessor.
[Restrictions:]
When using offload, "hybrid"_pair_hybrid.html styles that require skip
lists for neighbor builds cannot be offloaded to the coprocessor.
-Using "hybrid/overlay"_pair_hybrid.html is allowed. Only one intel
-accelerated style may be used with hybrid styles. Exclusion lists are
+Using "hybrid/overlay"_pair_hybrid.html is allowed. Only one intel
+accelerated style may be used with hybrid styles. Exclusion lists are
not currently supported with offload, however, the same effect can
-often be accomplished by setting cutoffs for excluded atom types to
-0. None of the pair styles in the USER-OMP package support the
-"inner", "middle", "outer" options for r-RESPA integration.
+often be accomplished by setting cutoffs for excluded atom types to 0.
+None of the pair styles in the USER-OMP package currently support the
+"inner", "middle", "outer" options for rRESPA integration via the
+"run_style respa"_run_style.html command.
:line
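The thread-count example in the hunk above (60 cores available for offload, 4 hardware threads per core, 24 MPI tasks per node) can be checked with a few lines of arithmetic. The following is only a sketch: the function name is hypothetical, it assumes the hardware threads are divided evenly among the MPI tasks as in the documented example, and it does not reproduce how LAMMPS actually assigns affinity.

```python
def coprocessor_threads_per_task(offload_cores, threads_per_core, mpi_tasks_per_node):
    """Estimate coprocessor hardware threads available to each MPI task.

    Hypothetical helper for illustration only; assumes an even split of
    all offload threads across the MPI tasks on one node.
    """
    total_threads = offload_cores * threads_per_core  # e.g. 60 * 4 = 240
    return total_threads // mpi_tasks_per_node


# Values taken from the documentation text: 60 cores, 4 threads each, 24 tasks.
print(coprocessor_threads_per_task(60, 4, 24))  # prints 10
```

With fewer MPI tasks per node the same arithmetic gives each task a larger subset of threads, which is the trade-off the "package intel" keywords are described as tuning.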
diff --git a/doc/Section_start.html b/doc/Section_start.html
index 566f5f6e00..67da141f95 100644
--- a/doc/Section_start.html
+++ b/doc/Section_start.html
@@ -1497,8 +1497,9 @@ if desired.
default Intel settings, as if the command "package intel * mixed
balance -1" were used at the top of your input script. These settings
can be changed by using the package intel command in
-your script if desired. The intel suffix will attempt to use styles
-from the OMP package if they are not present in the Intel package.
+your script if desired. If the USER-OMP package is installed, the
+intel suffix will make the omp suffix a second choice, if a requested
+style is not available in the USER-INTEL package.
For the KOKKOS package, using this command-line switch also invokes the
default KOKKOS settings, as if the command "package kokkos neigh
@@ -1511,9 +1512,9 @@ default OMP settings, as if the command "package omp *" were used at
the top of your input script. These settings can be changed by using
the package omp command in your script if desired.
-The suffix command can also be used set a suffix and it
-can also turn off or back on any suffix setting made via the command
-line.
+The suffix command can also be used to set a suffix and
+it can also turn off or back on any suffix setting made via the
+command line.
 -var name value1 value2 ...
diff --git a/doc/Section_start.txt b/doc/Section_start.txt
index 30baf84667..c7714905a4 100644
--- a/doc/Section_start.txt
+++ b/doc/Section_start.txt
@@ -1491,8 +1491,9 @@ For the Intel package, using this command-line switch also invokes the
default Intel settings, as if the command "package intel * mixed
balance -1" were used at the top of your input script. These settings
can be changed by using the "package intel"_package.html command in
-your script if desired. The intel suffix will attempt to use styles
-from the OMP package if they are not present in the Intel package.
+your script if desired. If the USER-OMP package is installed, the
+intel suffix will make the omp suffix a second choice, if a requested
+style is not available in the USER-INTEL package.
For the KOKKOS package, using this command-line switch also invokes the
default KOKKOS settings, as if the command "package kokkos neigh
@@ -1505,9 +1506,9 @@ default OMP settings, as if the command "package omp *" were used at
the top of your input script. These settings can be changed by using
the "package omp"_package.html command in your script if desired.
-The "suffix"_suffix.html command can also be used set a suffix and it
-can also turn off or back on any suffix setting made via the command
-line.
+The "suffix"_suffix.html command can also be used to set a suffix and
+it can also turn off or back on any suffix setting made via the
+command line.
 -var name value1 value2 ... :pre
diff --git a/doc/suffix.html b/doc/suffix.html
index 26b1f176cd..479a9bcd29 100644
--- a/doc/suffix.html
+++ b/doc/suffix.html
@@ -80,9 +80,9 @@ If the variant version does not exist, the standard version is
created.
When using the intel suffix, LAMMPS will first attempt to use a style
-with the intel suffix. If this does not exist, a style with the omp
-suffix is attempted. If this also does not exist, the style without
-any suffix is used.
+with the intel suffix. If the USER-OMP package is installed, the
+omp suffix will be tried as a second choice, if a requested style is
+not available in the USER-INTEL package.
If the specified style is off, then any previously specified suffix
is temporarily disabled, whether it was specified by a command-line
diff --git a/doc/suffix.txt b/doc/suffix.txt
index 01e9aae3a8..6309b5fa16 100644
--- a/doc/suffix.txt
+++ b/doc/suffix.txt
@@ -77,9 +77,9 @@ If the variant version does not exist, the standard version is
created.
When using the intel suffix, LAMMPS will first attempt to use a style
-with the intel suffix. If this does not exist, a style with the omp
-suffix is attempted. If this also does not exist, the style without
-any suffix is used.
+with the intel suffix. If the USER-OMP package is installed, the
+omp suffix will be tried as a second choice, if a requested style is
+not available in the USER-INTEL package.
If the specified style is {off}, then any previously specified suffix
is temporarily disabled, whether it was specified by a command-line