git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12850 f3b2605a-c512-4ea7-a41b-209d697bcdaa
@ -62,8 +62,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
steps:
</P>
<UL><LI>add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
<LI>add the flag -offload to LINKFLAGS in your Makefile.machine
</UL>
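The two extra offload build steps listed above amount to appending one flag to CCFLAGS and one to LINKFLAGS. The sketch below is a hypothetical Python helper (the name `add_offload_flags` and the plain-string representation are assumptions, not part of LAMMPS or its Make.py script) that applies those two edits to flag strings; it does not read or write a real Makefile.machine.

```python
from __future__ import annotations


def add_offload_flags(ccflags: str, linkflags: str) -> tuple[str, str]:
    """Return (CCFLAGS, LINKFLAGS) with the two offload build flags appended.

    Hypothetical helper: it works on plain strings only and does not touch a
    real src/MAKE Makefile.machine. Flags already present are kept once, so
    calling it twice is harmless.
    """
    cc = ccflags.split()
    link = linkflags.split()
    # First offload step: compile-time macro for offload support.
    if "-DLMP_INTEL_OFFLOAD" not in cc:
        cc.append("-DLMP_INTEL_OFFLOAD")
    # Second offload step: link-time flag for offload support.
    if "-offload" not in link:
        link.append("-offload")
    return " ".join(cc), " ".join(link)


if __name__ == "__main__":
    # The starting flag values are made up for illustration.
    cc, link = add_offload_flags("-O2 -xHost", "-O2")
    print("CCFLAGS =", cc)
    print("LINKFLAGS =", link)
```

Keeping the helper idempotent means it can be rerun on an already-edited flag set without producing duplicate flags.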
<P>The latter two steps in the first case and the last step in the
coprocessor case can be done using the "-pk intel" and "-sf intel"
@ -75,7 +74,7 @@ commands respectively to your input script.
<P><B>Required hardware/software:</B>
</P>
<P>To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors and use an Intel(R) C++ compiler.
</P>
<P>Optimizations for vectorization have only been tested with the
Intel(R) compiler. Use of other compilers may not result in
@ -85,10 +84,18 @@ vectorization or give poor performance.
g++ will not recognize some of the settings, so they cannot be used).
The compiler must support the OpenMP interface.
</P>
<P>The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.
</P>
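The version guidance above can be expressed as a small check. This is a minimal sketch assuming versions are dotted integers such as 14.0.1.106; the function names and the labels "recommended", "supported" and "unlisted" are illustrative, and any compiler version the text does not cover is reported as "unlisted" rather than judged.

```python
from __future__ import annotations

# Version numbers taken from the text above.
RECOMMENDED_ICC = (14, 0, 1, 106)
MIN_SUPPORTED_NEWER_ICC = (15, 0, 1, 133)
MIN_RECOMMENDED_IMPI = (15, 0, 2, 44)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '15.0.2.044' into (15, 0, 2, 44)."""
    return tuple(int(part) for part in text.split("."))


def classify_intel_compiler(version: str) -> str:
    """Hypothetical classifier for an Intel(R) compiler version string."""
    v = parse_version(version)
    if v == RECOMMENDED_ICC:
        return "recommended"
    if v >= MIN_SUPPORTED_NEWER_ICC:
        return "supported"
    # Versions the text does not mention are not judged either way.
    return "unlisted"


def intel_mpi_is_recommended(version: str) -> bool:
    """True if an Intel(R) MPI version meets the recommended minimum."""
    return parse_version(version) >= MIN_RECOMMENDED_IMPI
```

Tuples compare element by element, so "15.0.2.044" and "15.0.2.44" are treated as the same version.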
<P><B>Building LAMMPS with the USER-INTEL package:</B>
</P>
<P>You can choose to build with or without support for offload to an
Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
coprocessor, the same binary can be used on nodes with and without
coprocessors installed. However, if you do not have coprocessors
on your system, building without offload support will produce a
smaller binary.
</P>
<P>You can do either in one line, using the src/Make.py script, described
in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual. Type
@ -119,7 +126,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
</P>
<P>If you are compiling on the same architecture that will be used for
the runs, adding the flag <I>-xHost</I> to CCFLAGS will enable
vectorization with the Intel(R) compiler. Otherwise, you must
provide the correct compute node architecture to the -x option
(e.g. -xAVX).
</P>
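The choice between -xHost and an explicit -x target can be sketched as follows. The function `vectorization_flag` and its architecture labels are hypothetical; the only compiler flags assumed are the -xHost and -x&lt;target&gt; forms (e.g. -xAVX) named in the text, and any label passed in must be a suffix the Intel(R) compiler actually accepts after -x.

```python
from __future__ import annotations


def vectorization_flag(build_arch: str, run_arch: str | None = None) -> str:
    """Return the -x option to add to CCFLAGS (hypothetical helper).

    build_arch: label for the machine doing the compiling.
    run_arch:   label for the compute nodes, or None if they match the
                build machine. Labels such as "AVX" are assumed to be
                valid suffixes for the Intel compiler's -x option.
    """
    if run_arch is None or run_arch == build_arch:
        # Same architecture for build and runs: target the build host.
        return "-xHost"
    # Different architecture: name the compute-node target explicitly.
    return "-x" + run_arch


if __name__ == "__main__":
    print(vectorization_flag("AVX"))            # build and run on the same kind of node
    print(vectorization_flag("SSE4.2", "AVX"))  # build for AVX compute nodes elsewhere
```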
<P>In order to build with support for an Intel(R) Xeon Phi(TM)
coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
@ -130,10 +139,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
well with the Intel(R) compiler. The latter file has support for
offload to coprocessors; the former does not.
</P>
<P><B>Notes on CPU and core affinity:</B>
</P>
<P>Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host will be set by default when using offload
to a coprocessor. In this case, it is unnecessary to use other
methods to control affinity (e.g. taskset, numactl,
I_MPI_PIN_DOMAIN, etc.). Automatic affinity can be disabled in an input script
with the <I>no_affinity</I> option to the <A HREF = "package.html">package intel</A>
command or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running
on a machine with hyperthreading disabled.
</P>
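The affinity rule described above reduces to a small boolean decision. The sketch below restates it in Python; the `IntelRunSettings` type and its field names are hypothetical, and the function only mirrors the documented behavior rather than reproducing how LAMMPS implements it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IntelRunSettings:
    """Hypothetical summary of one USER-INTEL build and run."""

    using_offload: bool        # the run offloads work to a Xeon Phi coprocessor
    noaffinity_macro: bool     # built with -DINTEL_OFFLOAD_NOAFFINITY in CCFLAGS
    no_affinity_keyword: bool  # input script uses "package intel ... no_affinity"


def host_affinity_set_automatically(s: IntelRunSettings) -> bool:
    """Restate the documented rule; this is not LAMMPS source code.

    Host affinity is set automatically only when offloading, and either the
    build-time macro or the run-time keyword switches it off.
    """
    disabled = s.noaffinity_macro or s.no_affinity_keyword
    return s.using_offload and not disabled


if __name__ == "__main__":
    print(host_affinity_set_automatically(IntelRunSettings(True, False, False)))
    print(host_affinity_set_automatically(IntelRunSettings(True, False, True)))
```

When the function returns True, the text indicates that external pinning via taskset, numactl or I_MPI_PIN_DOMAIN is unnecessary.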
<P><B>Running with the USER-INTEL package from the command line:</B>
</P>
@ -59,8 +59,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
steps:

add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
add the flag -offload to LINKFLAGS in your Makefile.machine :ul

The latter two steps in the first case and the last step in the
coprocessor case can be done using the "-pk intel" and "-sf intel"
@ -72,7 +71,7 @@ commands respectively to your input script.
[Required hardware/software:]

To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors and use an Intel(R) C++ compiler.

Optimizations for vectorization have only been tested with the
Intel(R) compiler. Use of other compilers may not result in
@ -82,10 +81,18 @@ Use of an Intel C++ compiler is recommended, but not required (though
g++ will not recognize some of the settings, so they cannot be used).
The compiler must support the OpenMP interface.

The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.

[Building LAMMPS with the USER-INTEL package:]

You can choose to build with or without support for offload to an
Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
coprocessor, the same binary can be used on nodes with and without
coprocessors installed. However, if you do not have coprocessors
on your system, building without offload support will produce a
smaller binary.

You can do either in one line, using the src/Make.py script, described
in "Section 2.4"_Section_start.html#start_4 of the manual. Type
@ -116,7 +123,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add

If you are compiling on the same architecture that will be used for
the runs, adding the flag {-xHost} to CCFLAGS will enable
vectorization with the Intel(R) compiler. Otherwise, you must
provide the correct compute node architecture to the -x option
(e.g. -xAVX).

In order to build with support for an Intel(R) Xeon Phi(TM)
coprocessor, the flag {-offload} should be added to the LINKFLAGS line
@ -127,10 +136,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
well with the Intel(R) compiler. The latter file has support for
offload to coprocessors; the former does not.

[Notes on CPU and core affinity:]

Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host will be set by default when using offload
to a coprocessor. In this case, it is unnecessary to use other
methods to control affinity (e.g. taskset, numactl,
I_MPI_PIN_DOMAIN, etc.). Automatic affinity can be disabled in an input script
with the {no_affinity} option to the "package intel"_package.html
command or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running
on a machine with hyperthreading disabled.

[Running with the USER-INTEL package from the command line:]
@ -59,7 +59,7 @@
<I>intel</I> args = Nphi keyword value ...
Nphi = # of coprocessors per node
zero or more keyword/value pairs may be appended
keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I> or <I>no_affinity</I>
<I>omp</I> value = Nthreads
Nthreads = number of OpenMP threads to use on CPU (default = 0)
<I>mode</I> value = <I>single</I> or <I>mixed</I> or <I>double</I>
@ -75,6 +75,7 @@
Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
<I>tptask</I> value = Ntptask
Ntptask = max number of coprocessor threads per MPI task (default = 240)
<I>no_affinity</I> values = none
<I>kokkos</I> args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
@ -427,6 +428,13 @@ with 16 threads, for a total of 128.
<P>Note that the default settings for <I>tpc</I> and <I>tptask</I> are fine for
most problems, regardless of how many MPI tasks you assign to a Phi.
</P>
<P>The <I>no_affinity</I> keyword will turn off automatic setting of core
affinity for MPI tasks and OpenMP threads on the host when using
offload to a coprocessor. Affinity settings are used when possible
to prevent MPI tasks and OpenMP threads from being on separate NUMA
domains and to prevent offload threads from interfering with other
processes/threads used for LAMMPS.
</P>
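To make the package intel syntax concrete, the sketch below parses a `package intel Nphi keyword value ...` line into a settings dictionary, applying the documented defaults (omp = 0, tpc = 4, tptask = 240) and treating no_affinity as a keyword that takes no value. This is a hypothetical illustration, not the parser LAMMPS uses: the function name, the error messages, and the choice to keep balance and ghost values as unvalidated strings (their allowed values are not shown in this excerpt) are assumptions.

```python
from __future__ import annotations

# Documented defaults; no_affinity is off unless the keyword appears.
DEFAULTS = {"omp": 0, "tpc": 4, "tptask": 240, "no_affinity": False}
INT_KEYWORDS = {"omp", "tpc", "tptask"}
# Allowed balance/ghost values are not listed here, so they stay raw strings.
STR_KEYWORDS = {"balance", "ghost"}
MODES = {"single", "mixed", "double"}


def parse_package_intel(line: str) -> dict[str, object]:
    """Parse 'package intel Nphi [keyword value ...]' (hypothetical sketch)."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[:2] != ["package", "intel"]:
        raise ValueError("expected: package intel Nphi [keyword value ...]")
    settings: dict[str, object] = dict(DEFAULTS)
    settings["Nphi"] = int(tokens[2])  # number of coprocessors per node
    i = 3
    while i < len(tokens):
        key = tokens[i]
        if key == "no_affinity":  # flag keyword, consumes no value
            settings["no_affinity"] = True
            i += 1
            continue
        if i + 1 >= len(tokens):
            raise ValueError(f"keyword {key!r} needs a value")
        value = tokens[i + 1]
        if key in INT_KEYWORDS:
            settings[key] = int(value)
        elif key == "mode":
            if value not in MODES:
                raise ValueError(f"mode must be one of {sorted(MODES)}")
            settings[key] = value
        elif key in STR_KEYWORDS:
            settings[key] = value
        else:
            raise ValueError(f"unknown keyword {key!r}")
        i += 2
    return settings


if __name__ == "__main__":
    print(parse_package_intel("package intel 1 omp 4 mode mixed no_affinity"))
```

Keywords left out of the line keep their defaults, which matches the note that the tpc and tptask defaults suit most problems.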
<HR>

<P>The <I>kokkos</I> style invokes settings associated with the use of the
@ -54,7 +54,7 @@ args = arguments specific to the style :l
{intel} args = Nphi keyword value ...
Nphi = # of coprocessors per node
zero or more keyword/value pairs may be appended
keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask} or {no_affinity}
{omp} value = Nthreads
Nthreads = number of OpenMP threads to use on CPU (default = 0)
{mode} value = {single} or {mixed} or {double}
@ -70,6 +70,7 @@ args = arguments specific to the style :l
Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
{tptask} value = Ntptask
Ntptask = max number of coprocessor threads per MPI task (default = 240)
{no_affinity} values = none
{kokkos} args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = {neigh} or {newton} or {binsize} or {comm} or {comm/exchange} or {comm/forward}
@ -421,6 +422,13 @@ with 16 threads, for a total of 128.
Note that the default settings for {tpc} and {tptask} are fine for
most problems, regardless of how many MPI tasks you assign to a Phi.

The {no_affinity} keyword will turn off automatic setting of core
affinity for MPI tasks and OpenMP threads on the host when using
offload to a coprocessor. Affinity settings are used when possible
to prevent MPI tasks and OpenMP threads from being on separate NUMA
domains and to prevent offload threads from interfering with other
processes/threads used for LAMMPS.

:line

The {kokkos} style invokes settings associated with the use of the