git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12850 f3b2605a-c512-4ea7-a41b-209d697bcdaa

This commit is contained in:
sjplimp
2014-12-22 22:12:21 +00:00
parent 691602dba2
commit 6429e05b78
4 changed files with 76 additions and 22 deletions

View File

@@ -62,8 +62,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
steps:
</P>
<UL><LI>add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
<LI>add the flag -offload to LINKFLAGS in your Makefile.machine
<LI>specify how many coprocessor threads per MPI task to use
<LI>add the flag -offload to LINKFLAGS in your Makefile.machine
</UL>
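<P>As a minimal sketch (the base compiler flags shown here are assumed
and will differ between machines), the two flags just listed would
appear in a Makefile.machine as:
</P>
<PRE>CCFLAGS =   -g -O3 -openmp -DLMP_INTEL_OFFLOAD
LINKFLAGS = -g -O3 -openmp -offload
</PRE>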
<P>The latter two steps in the first case and the last step in the
coprocessor case can be done using the "-pk intel" and "-sf intel"
@@ -75,7 +74,7 @@ commands respectively to your input script.
<P><B>Required hardware/software:</B>
</P>
<P>To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors.
Phi(TM) coprocessors and use an Intel(R) C++ compiler.
</P>
<P>Optimizations for vectorization have only been tested with the
Intel(R) compiler. Use of other compilers may not result in
@@ -85,10 +84,18 @@ vectorization or give poor performance.
g++ will not recognize some of the settings, so they cannot be used).
The compiler must support the OpenMP interface.
</P>
<P>The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.
</P>
<P><B>Building LAMMPS with the USER-INTEL package:</B>
</P>
<P>You must choose at build time whether to build for CPU acceleration or
to use the Xeon Phi in offload mode.
<P>You can choose to build with or without support for offload to an
Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
coprocessor, the same binary can be used on nodes with and without
coprocessors installed. However, if you do not have coprocessors
on your system, building without offload support will produce a
smaller binary.
</P>
<P>You can do either in one line, using the src/Make.py script, described
in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual. Type
@@ -119,7 +126,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
</P>
<P>If you are compiling on the same architecture that will be used for
the runs, adding the flag <I>-xHost</I> to CCFLAGS will enable
vectorization with the Intel(R) compiler.
vectorization with the Intel(R) compiler. Otherwise, you must
provide the correct compute node architecture to the -x option
(e.g. -xAVX).
</P>
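<P>For example (a sketch only; the other flags are assumed), a CCFLAGS
line targeting AVX-capable compute nodes from a different build host
could use:
</P>
<PRE>CCFLAGS =   -g -O3 -openmp -xAVX
</PRE>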
<P>In order to build with support for an Intel(R) Xeon Phi(TM)
coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
@@ -130,10 +139,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
well with the Intel(R) compiler. The latter file has support for
offload to coprocessors; the former does not.
</P>
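<P>As a hedged example (assuming the package is installed in the usual
way and that these makefile names are reachable from the src
directory), a build with one of the provided files might look like:
</P>
<PRE>cd src
make yes-user-intel
make intel_cpu      # CPU-only binary
make intel_phi      # binary with coprocessor offload support
</PRE>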
<P>If using an Intel compiler, it is recommended that Intel(R) Compiler
2013 SP1 update 1 be used. Newer versions have some performance
issues that are being addressed. If using Intel(R) MPI, version 5 or
higher is recommended.
<P><B>Notes on CPU and core affinity:</B>
</P>
<P>Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host will be set by default when using
offload to a coprocessor. In this case, it is unnecessary
to use other methods to control affinity (e.g. taskset, numactl,
I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
with the <I>no_affinity</I> option to the <A HREF = "package.html">package intel</A>
command or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running
on a machine with hyperthreading disabled.
</P>
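<P>For illustration (other flags assumed as in the earlier sketches),
disabling the automatic affinity setting at build time amounts to:
</P>
<PRE>CCFLAGS =   -g -O3 -openmp -DLMP_INTEL_OFFLOAD -DINTEL_OFFLOAD_NOAFFINITY
</PRE>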
<P><B>Running with the USER-INTEL package from the command line:</B>
</P>

View File

@@ -59,8 +59,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
steps:
add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
add the flag -offload to LINKFLAGS in your Makefile.machine
specify how many coprocessor threads per MPI task to use :ul
add the flag -offload to LINKFLAGS in your Makefile.machine :ul
The latter two steps in the first case and the last step in the
coprocessor case can be done using the "-pk intel" and "-sf intel"
@@ -72,7 +71,7 @@ commands respectively to your input script.
[Required hardware/software:]
To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors.
Phi(TM) coprocessors and use an Intel(R) C++ compiler.
Optimizations for vectorization have only been tested with the
Intel(R) compiler. Use of other compilers may not result in
@@ -82,10 +81,18 @@ Use of an Intel C++ compiler is recommended, but not required (though
g++ will not recognize some of the settings, so they cannot be used).
The compiler must support the OpenMP interface.
The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.
[Building LAMMPS with the USER-INTEL package:]
You must choose at build time whether to build for CPU acceleration or
to use the Xeon Phi in offload mode.
You can choose to build with or without support for offload to an
Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
coprocessor, the same binary can be used on nodes with and without
coprocessors installed. However, if you do not have coprocessors
on your system, building without offload support will produce a
smaller binary.
You can do either in one line, using the src/Make.py script, described
in "Section 2.4"_Section_start.html#start_4 of the manual. Type
@@ -116,7 +123,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
If you are compiling on the same architecture that will be used for
the runs, adding the flag {-xHost} to CCFLAGS will enable
vectorization with the Intel(R) compiler.
vectorization with the Intel(R) compiler. Otherwise, you must
provide the correct compute node architecture to the -x option
(e.g. -xAVX).
In order to build with support for an Intel(R) Xeon Phi(TM)
coprocessor, the flag {-offload} should be added to the LINKFLAGS line
@@ -127,10 +136,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
well with the Intel(R) compiler. The latter file has support for
offload to coprocessors; the former does not.
If using an Intel compiler, it is recommended that Intel(R) Compiler
2013 SP1 update 1 be used. Newer versions have some performance
issues that are being addressed. If using Intel(R) MPI, version 5 or
higher is recommended.
[Notes on CPU and core affinity:]
Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host will be set by default when using
offload to a coprocessor. In this case, it is unnecessary
to use other methods to control affinity (e.g. taskset, numactl,
I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
with the {no_affinity} option to the "package intel"_package.html
command or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running
on a machine with hyperthreading disabled.
[Running with the USER-INTEL package from the command line:]

View File

@@ -59,7 +59,7 @@
<I>intel</I> args = NPhi keyword value ...
Nphi = # of coprocessors per node
zero or more keyword/value pairs may be appended
keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I>
keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I> or <I>no_affinity</I>
<I>omp</I> value = Nthreads
Nthreads = number of OpenMP threads to use on CPU (default = 0)
<I>mode</I> value = <I>single</I> or <I>mixed</I> or <I>double</I>
@@ -75,6 +75,7 @@
Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
<I>tptask</I> value = Ntptask
Ntptask = max number of coprocessor threads per MPI task (default = 240)
<I>no_affinity</I> values = none
<I>kokkos</I> args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
@@ -427,6 +428,13 @@ with 16 threads, for a total of 128.
<P>Note that the default settings for <I>tpc</I> and <I>tptask</I> are fine for
most problems, regardless of how many MPI tasks you assign to a Phi.
</P>
<P>The <I>no_affinity</I> keyword will turn off automatic setting of core
affinity for MPI tasks and OpenMP threads on the host when using
offload to a coprocessor. Affinity settings are used when possible
to prevent MPI tasks and OpenMP threads from being on separate NUMA
domains and to prevent offload threads from interfering with other
processes/threads used for LAMMPS.
</P>
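<P>For example (a sketch; the coprocessor and thread counts here are
arbitrary), the automatic affinity setting could be turned off with
</P>
<PRE>package intel 1 omp 2 no_affinity
</PRE>
<P>in an input script, or with the equivalent "-pk intel 1 omp 2
no_affinity" command-line switch.
</P>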
<HR>
<P>The <I>kokkos</I> style invokes settings associated with the use of the

View File

@@ -54,7 +54,7 @@ args = arguments specific to the style :l
{intel} args = NPhi keyword value ...
Nphi = # of coprocessors per node
zero or more keyword/value pairs may be appended
keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask}
keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask} or {no_affinity}
{omp} value = Nthreads
Nthreads = number of OpenMP threads to use on CPU (default = 0)
{mode} value = {single} or {mixed} or {double}
@@ -70,6 +70,7 @@ args = arguments specific to the style :l
Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
{tptask} value = Ntptask
Ntptask = max number of coprocessor threads per MPI task (default = 240)
{no_affinity} values = none
{kokkos} args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = {neigh} or {newton} or {binsize} or {comm} or {comm/exchange} or {comm/forward}
@@ -421,6 +422,13 @@ with 16 threads, for a total of 128.
Note that the default settings for {tpc} and {tptask} are fine for
most problems, regardless of how many MPI tasks you assign to a Phi.
The {no_affinity} keyword will turn off automatic setting of core
affinity for MPI tasks and OpenMP threads on the host when using
offload to a coprocessor. Affinity settings are used when possible
to prevent MPI tasks and OpenMP threads from being on separate NUMA
domains and to prevent offload threads from interfering with other
processes/threads used for LAMMPS.
:line
The {kokkos} style invokes settings associated with the use of the