git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12850 f3b2605a-c512-4ea7-a41b-209d697bcdaa

Author: sjplimp
Date: 2014-12-22 22:12:21 +00:00
parent 691602dba2
commit 6429e05b78
4 changed files with 76 additions and 22 deletions

View File

@@ -62,8 +62,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
 </P>
 <UL><LI>add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
 <LI>add the flag -offload to LINKFLAGS in your Makefile.machine
-<LI>specify how many coprocessor threads per MPI task to use
 </UL>
 <P>The latter two steps in the first case and the last step in the
 coprocessor case can be done using the "-pk intel" and "-sf intel"
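
For example, assuming a LAMMPS binary named lmp_machine and an input script named in.script (placeholder names), the "-pk intel" and "-sf intel" switches mentioned above might be used as

    mpirun -np 8 lmp_machine -sf intel -pk intel 1 -in in.script

where the 1 is the number of coprocessors per node. The same effect can be obtained by adding these lines near the top of the input script:

    package intel 1
    suffix intel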
@@ -75,7 +74,7 @@ commands respectively to your input script.
 <P><B>Required hardware/software:</B>
 </P>
 <P>To use the offload option, you must have one or more Intel(R) Xeon
-Phi(TM) coprocessors.
+Phi(TM) coprocessors and use an Intel(R) C++ compiler.
 </P>
 <P>Optimizations for vectorization have only been tested with the
 Intel(R) compiler. Use of other compilers may not result in
@@ -85,10 +84,18 @@ vectorization or give poor performance.
 g++ will not recognize some of the settings, so they cannot be used).
 The compiler must support the OpenMP interface.
 </P>
+<P>The recommended version of the Intel(R) compiler is 14.0.1.106.
+Versions 15.0.1.133 and later are also supported. If using Intel(R)
+MPI, versions 15.0.2.044 and later are recommended.
+</P>
 <P><B>Building LAMMPS with the USER-INTEL package:</B>
 </P>
-<P>You must choose at build time whether to build for CPU acceleration or
-to use the Xeon Phi in offload mode.
+<P>You can choose to build with or without support for offload to an
+Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
+coprocessor, the same binary can be used on nodes with and without
+coprocessors installed. However, if you do not have coprocessors
+on your system, building without offload support will produce a
+smaller binary.
 </P>
 <P>You can do either in one line, using the src/Make.py script, described
 in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual. Type
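
As a sketch of the conventional (non-Make.py) build path, the steps might look like the following, assuming the bundled machine makefiles in src/MAKE/OPTIONS are named Makefile.intel_cpu and Makefile.intel_phi (check that directory for the exact names in your version):

    cd lammps/src
    make yes-user-intel      # install the USER-INTEL package source files
    make intel_cpu           # CPU-only vectorized build
    make intel_phi           # build with coprocessor offload support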
@@ -119,7 +126,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
 </P>
 <P>If you are compiling on the same architecture that will be used for
 the runs, adding the flag <I>-xHost</I> to CCFLAGS will enable
-vectorization with the Intel(R) compiler.
+vectorization with the Intel(R) compiler. Otherwise, you must
+provide the correct compute node architecture to the -x option
+(e.g. -xAVX).
 </P>
 <P>In order to build with support for an Intel(R) Xeon Phi(TM)
 coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
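
For reference, a minimal Makefile.machine fragment combining the flags discussed above might look like this (a sketch only, assuming Intel(R) MPI's mpiicpc wrapper; the bundled makefiles in src/MAKE/OPTIONS carry additional tuning flags):

    CC =        mpiicpc
    CCFLAGS =   -O3 -openmp -xHost -DLMP_INTEL_OFFLOAD
    LINK =      mpiicpc
    LINKFLAGS = -O3 -openmp -offload

For a CPU-only build, omit -DLMP_INTEL_OFFLOAD and -offload; when cross-compiling, replace -xHost with the target architecture flag (e.g. -xAVX).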
@@ -130,10 +139,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
 well with the Intel(R) compiler. The latter file has support for
 offload to coprocessors; the former does not.
 </P>
-<P>If using an Intel compiler, it is recommended that Intel(R) Compiler
-2013 SP1 update 1 be used. Newer versions have some performance
-issues that are being addressed. If using Intel(R) MPI, version 5 or
-higher is recommended.
+<P><B>Notes on CPU and core affinity:</B>
+</P>
+<P>Setting core affinity is often used to pin MPI tasks and OpenMP
+threads to a core or group of cores so that memory access can be
+uniform. Unless disabled at build time, affinity for MPI tasks and
+OpenMP threads on the host will be set by default when using
+offload to a coprocessor. In this case, it is unnecessary
+to use other methods to control affinity (e.g. taskset, numactl,
+I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
+with the <I>no_affinity</I> option to the <A HREF = "package.html">package intel</A>
+command or by disabling the option at build time (by adding
+-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
+Disabling this option is not recommended, especially when running
+on a machine with hyperthreading disabled.
 </P>
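
For instance, the automatic affinity setting can be disabled from the input script with

    package intel 1 no_affinity

(the 1 being the number of coprocessors per node), or at build time by appending -DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of Makefile.machine.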
 <P><B>Running with the USER-INTEL package from the command line:</B>
 </P>

View File

@@ -59,8 +59,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
 add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
-add the flag -offload to LINKFLAGS in your Makefile.machine
-specify how many coprocessor threads per MPI task to use :ul
+add the flag -offload to LINKFLAGS in your Makefile.machine :ul
 The latter two steps in the first case and the last step in the
 coprocessor case can be done using the "-pk intel" and "-sf intel"
@@ -72,7 +71,7 @@ commands respectively to your input script.
 [Required hardware/software:]
 To use the offload option, you must have one or more Intel(R) Xeon
-Phi(TM) coprocessors.
+Phi(TM) coprocessors and use an Intel(R) C++ compiler.
 Optimizations for vectorization have only been tested with the
 Intel(R) compiler. Use of other compilers may not result in
@@ -82,10 +81,18 @@ Use of an Intel C++ compiler is recommended, but not required (though
 g++ will not recognize some of the settings, so they cannot be used).
 The compiler must support the OpenMP interface.
+The recommended version of the Intel(R) compiler is 14.0.1.106.
+Versions 15.0.1.133 and later are also supported. If using Intel(R)
+MPI, versions 15.0.2.044 and later are recommended.
 [Building LAMMPS with the USER-INTEL package:]
-You must choose at build time whether to build for CPU acceleration or
-to use the Xeon Phi in offload mode.
+You can choose to build with or without support for offload to an
+Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
+coprocessor, the same binary can be used on nodes with and without
+coprocessors installed. However, if you do not have coprocessors
+on your system, building without offload support will produce a
+smaller binary.
 You can do either in one line, using the src/Make.py script, described
 in "Section 2.4"_Section_start.html#start_4 of the manual. Type
@@ -116,7 +123,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
 If you are compiling on the same architecture that will be used for
 the runs, adding the flag {-xHost} to CCFLAGS will enable
-vectorization with the Intel(R) compiler.
+vectorization with the Intel(R) compiler. Otherwise, you must
+provide the correct compute node architecture to the -x option
+(e.g. -xAVX).
 In order to build with support for an Intel(R) Xeon Phi(TM)
 coprocessor, the flag {-offload} should be added to the LINKFLAGS line
@@ -127,10 +136,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
 well with the Intel(R) compiler. The latter file has support for
 offload to coprocessors; the former does not.
-If using an Intel compiler, it is recommended that Intel(R) Compiler
-2013 SP1 update 1 be used. Newer versions have some performance
-issues that are being addressed. If using Intel(R) MPI, version 5 or
-higher is recommended.
+[Notes on CPU and core affinity:]
+Setting core affinity is often used to pin MPI tasks and OpenMP
+threads to a core or group of cores so that memory access can be
+uniform. Unless disabled at build time, affinity for MPI tasks and
+OpenMP threads on the host will be set by default when using
+offload to a coprocessor. In this case, it is unnecessary
+to use other methods to control affinity (e.g. taskset, numactl,
+I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
+with the {no_affinity} option to the "package intel"_package.html
+command or by disabling the option at build time (by adding
+-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
+Disabling this option is not recommended, especially when running
+on a machine with hyperthreading disabled.
 [Running with the USER-INTEL package from the command line:]

View File

@@ -59,7 +59,7 @@
 <I>intel</I> args = NPhi keyword value ...
 Nphi = # of coprocessors per node
 zero or more keyword/value pairs may be appended
-keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I>
+keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I> or <I>no_affinity</I>
 <I>omp</I> value = Nthreads
 Nthreads = number of OpenMP threads to use on CPU (default = 0)
 <I>mode</I> value = <I>single</I> or <I>mixed</I> or <I>double</I>
@@ -75,6 +75,7 @@
 Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
 <I>tptask</I> value = Ntptask
 Ntptask = max number of coprocessor threads per MPI task (default = 240)
+<I>no_affinity</I> values = none
 <I>kokkos</I> args = keyword value ...
 zero or more keyword/value pairs may be appended
 keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
@@ -427,6 +428,13 @@ with 16 threads, for a total of 128.
 <P>Note that the default settings for <I>tpc</I> and <I>tptask</I> are fine for
 most problems, regardless of how many MPI tasks you assign to a Phi.
 </P>
+<P>The <I>no_affinity</I> keyword will turn off automatic setting of core
+affinity for MPI tasks and OpenMP threads on the host when using
+offload to a coprocessor. Affinity settings are used when possible
+to prevent MPI tasks and OpenMP threads from being on separate NUMA
+domains and to prevent offload threads from interfering with other
+processes/threads used for LAMMPS.
+</P>
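
To make the keyword syntax concrete, a package command combining several of the keywords above might look like (values are illustrative only):

    package intel 1 omp 16 tptask 120 no_affinity

i.e. 1 coprocessor per node, 16 OpenMP threads on the CPU, at most 120 coprocessor threads per MPI task, and no automatic host affinity.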
 <HR>
 <P>The <I>kokkos</I> style invokes settings associated with the use of the

View File

@@ -54,7 +54,7 @@ args = arguments specific to the style :l
 {intel} args = NPhi keyword value ...
 Nphi = # of coprocessors per node
 zero or more keyword/value pairs may be appended
-keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask}
+keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask} or {no_affinity}
 {omp} value = Nthreads
 Nthreads = number of OpenMP threads to use on CPU (default = 0)
 {mode} value = {single} or {mixed} or {double}
@@ -70,6 +70,7 @@ args = arguments specific to the style :l
 Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
 {tptask} value = Ntptask
 Ntptask = max number of coprocessor threads per MPI task (default = 240)
+{no_affinity} values = none
 {kokkos} args = keyword value ...
 zero or more keyword/value pairs may be appended
 keywords = {neigh} or {newton} or {binsize} or {comm} or {comm/exchange} or {comm/forward}
@@ -421,6 +422,13 @@ with 16 threads, for a total of 128.
 Note that the default settings for {tpc} and {tptask} are fine for
 most problems, regardless of how many MPI tasks you assign to a Phi.
+The {no_affinity} keyword will turn off automatic setting of core
+affinity for MPI tasks and OpenMP threads on the host when using
+offload to a coprocessor. Affinity settings are used when possible
+to prevent MPI tasks and OpenMP threads from being on separate NUMA
+domains and to prevent offload threads from interfering with other
+processes/threads used for LAMMPS.
 :line
 The {kokkos} style invokes settings associated with the use of the