git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12850 f3b2605a-c512-4ea7-a41b-209d697bcdaa

Author: sjplimp
Date: 2014-12-22 22:12:21 +00:00
parent 691602dba2
commit 6429e05b78
4 changed files with 76 additions and 22 deletions

View File

@@ -62,8 +62,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
 </P>
 <UL><LI>add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
 <LI>add the flag -offload to LINKFLAGS in your Makefile.machine
-<LI>specify how many coprocessor threads per MPI task to use
 </UL>
 <P>The latter two steps in the first case and the last step in the
 coprocessor case can be done using the "-pk intel" and "-sf intel"
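
For example, assuming a LAMMPS binary named lmp_machine and an input script named in.script (placeholder names), the "-pk intel" and "-sf intel" switches mentioned above might be used as

    mpirun -np 8 lmp_machine -sf intel -pk intel 1 -in in.script

where the 1 is the number of coprocessors per node. The same effect can be obtained by adding these lines near the top of the input script:

    package intel 1
    suffix intel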
@@ -75,7 +74,7 @@ commands respectively to your input script.
 <P><B>Required hardware/software:</B>
 </P>
 <P>To use the offload option, you must have one or more Intel(R) Xeon
-Phi(TM) coprocessors.
+Phi(TM) coprocessors and use an Intel(R) C++ compiler.
 </P>
 <P>Optimizations for vectorization have only been tested with the
 Intel(R) compiler. Use of other compilers may not result in
@@ -85,10 +84,18 @@ vectorization or give poor performance.
 g++ will not recognize some of the settings, so they cannot be used).
 The compiler must support the OpenMP interface.
 </P>
+<P>The recommended version of the Intel(R) compiler is 14.0.1.106.
+Versions 15.0.1.133 and later are also supported. If using Intel(R)
+MPI, versions 15.0.2.044 and later are recommended.
+</P>
 <P><B>Building LAMMPS with the USER-INTEL package:</B>
 </P>
-<P>You must choose at build time whether to build for CPU acceleration or
-to use the Xeon Phi in offload mode.
+<P>You can choose to build with or without support for offload to an
+Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
+coprocessor, the same binary can be used on nodes with and without
+coprocessors installed. However, if you do not have coprocessors
+on your system, building without offload support will produce a
+smaller binary.
 </P>
 <P>You can do either in one line, using the src/Make.py script, described
 in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual. Type
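
As a sketch of the conventional (non-Make.py) build path, the steps might look like the following, assuming the bundled machine makefiles in src/MAKE/OPTIONS are named Makefile.intel_cpu and Makefile.intel_phi (check that directory for the exact names in your version):

    cd lammps/src
    make yes-user-intel      # install the USER-INTEL package source files
    make intel_cpu           # CPU-only vectorized build
    make intel_phi           # build with coprocessor offload support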
@@ -119,7 +126,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
 </P>
 <P>If you are compiling on the same architecture that will be used for
 the runs, adding the flag <I>-xHost</I> to CCFLAGS will enable
-vectorization with the Intel(R) compiler.
+vectorization with the Intel(R) compiler. Otherwise, you must
+provide the correct compute node architecture to the -x option
+(e.g. -xAVX).
 </P>
 <P>In order to build with support for an Intel(R) Xeon Phi(TM)
 coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
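
For reference, a minimal Makefile.machine fragment combining the flags discussed above might look like this (a sketch only, assuming Intel(R) MPI's mpiicpc wrapper; the bundled makefiles in src/MAKE/OPTIONS carry additional tuning flags):

    CC =        mpiicpc
    CCFLAGS =   -O3 -openmp -xHost -DLMP_INTEL_OFFLOAD
    LINK =      mpiicpc
    LINKFLAGS = -O3 -openmp -offload

For a CPU-only build, omit -DLMP_INTEL_OFFLOAD and -offload; when cross-compiling, replace -xHost with the target architecture flag (e.g. -xAVX).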
@@ -130,10 +139,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
 well with the Intel(R) compiler. The latter file has support for
 offload to coprocessors; the former does not.
 </P>
-<P>If using an Intel compiler, it is recommended that Intel(R) Compiler
-2013 SP1 update 1 be used. Newer versions have some performance
-issues that are being addressed. If using Intel(R) MPI, version 5 or
-higher is recommended.
+<P><B>Notes on CPU and core affinity:</B>
+</P>
+<P>Setting core affinity is often used to pin MPI tasks and OpenMP
+threads to a core or group of cores so that memory access can be
+uniform. Unless disabled at build time, affinity for MPI tasks and
+OpenMP threads on the host will be set by default when using
+offload to a coprocessor. In this case, it is unnecessary
+to use other methods to control affinity (e.g. taskset, numactl,
+I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
+with the <I>no_affinity</I> option to the <A HREF = "package.html">package intel</A>
+command or by disabling the option at build time (by adding
+-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
+Disabling this option is not recommended, especially when running
+on a machine with hyperthreading disabled.
 </P>
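
For instance, the automatic affinity setting can be disabled from the input script with

    package intel 1 no_affinity

(the 1 being the number of coprocessors per node), or at build time by appending -DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of Makefile.machine.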
 <P><B>Running with the USER-INTEL package from the command line:</B>
 </P>

View File

@@ -59,8 +59,7 @@ Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
 add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine
-add the flag -offload to LINKFLAGS in your Makefile.machine
-specify how many coprocessor threads per MPI task to use :ul
+add the flag -offload to LINKFLAGS in your Makefile.machine :ul
 The latter two steps in the first case and the last step in the
 coprocessor case can be done using the "-pk intel" and "-sf intel"
@@ -72,7 +71,7 @@ commands respectively to your input script.
 [Required hardware/software:]
 To use the offload option, you must have one or more Intel(R) Xeon
-Phi(TM) coprocessors.
+Phi(TM) coprocessors and use an Intel(R) C++ compiler.
 Optimizations for vectorization have only been tested with the
 Intel(R) compiler. Use of other compilers may not result in
@@ -82,10 +81,18 @@ Use of an Intel C++ compiler is recommended, but not required (though
 g++ will not recognize some of the settings, so they cannot be used).
 The compiler must support the OpenMP interface.
+The recommended version of the Intel(R) compiler is 14.0.1.106.
+Versions 15.0.1.133 and later are also supported. If using Intel(R)
+MPI, versions 15.0.2.044 and later are recommended.
 [Building LAMMPS with the USER-INTEL package:]
-You must choose at build time whether to build for CPU acceleration or
-to use the Xeon Phi in offload mode.
+You can choose to build with or without support for offload to an
+Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
+coprocessor, the same binary can be used on nodes with and without
+coprocessors installed. However, if you do not have coprocessors
+on your system, building without offload support will produce a
+smaller binary.
 You can do either in one line, using the src/Make.py script, described
 in "Section 2.4"_Section_start.html#start_4 of the manual. Type
@@ -116,7 +123,9 @@ both the CCFLAGS and LINKFLAGS variables. You also need to add
 If you are compiling on the same architecture that will be used for
 the runs, adding the flag {-xHost} to CCFLAGS will enable
-vectorization with the Intel(R) compiler.
+vectorization with the Intel(R) compiler. Otherwise, you must
+provide the correct compute node architecture to the -x option
+(e.g. -xAVX).
 In order to build with support for an Intel(R) Xeon Phi(TM)
 coprocessor, the flag {-offload} should be added to the LINKFLAGS line
@@ -127,10 +136,20 @@ included in the src/MAKE/OPTIONS directory with settings that perform
 well with the Intel(R) compiler. The latter file has support for
 offload to coprocessors; the former does not.
-If using an Intel compiler, it is recommended that Intel(R) Compiler
-2013 SP1 update 1 be used. Newer versions have some performance
-issues that are being addressed. If using Intel(R) MPI, version 5 or
-higher is recommended.
+[Notes on CPU and core affinity:]
+Setting core affinity is often used to pin MPI tasks and OpenMP
+threads to a core or group of cores so that memory access can be
+uniform. Unless disabled at build time, affinity for MPI tasks and
+OpenMP threads on the host will be set by default when using
+offload to a coprocessor. In this case, it is unnecessary
+to use other methods to control affinity (e.g. taskset, numactl,
+I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script
+with the {no_affinity} option to the "package intel"_package.html
+command or by disabling the option at build time (by adding
+-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
+Disabling this option is not recommended, especially when running
+on a machine with hyperthreading disabled.
 [Running with the USER-INTEL package from the command line:]

View File

@@ -59,7 +59,7 @@
 <I>intel</I> args = NPhi keyword value ...
 Nphi = # of coprocessors per node
 zero or more keyword/value pairs may be appended
-keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I>
+keywords = <I>omp</I> or <I>mode</I> or <I>balance</I> or <I>ghost</I> or <I>tpc</I> or <I>tptask</I> or <I>no_affinity</I>
 <I>omp</I> value = Nthreads
 Nthreads = number of OpenMP threads to use on CPU (default = 0)
 <I>mode</I> value = <I>single</I> or <I>mixed</I> or <I>double</I>
@@ -75,6 +75,7 @@
 Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
 <I>tptask</I> value = Ntptask
 Ntptask = max number of coprocessor threads per MPI task (default = 240)
+<I>no_affinity</I> values = none
 <I>kokkos</I> args = keyword value ...
 zero or more keyword/value pairs may be appended
 keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
@@ -427,6 +428,13 @@ with 16 threads, for a total of 128.
 <P>Note that the default settings for <I>tpc</I> and <I>tptask</I> are fine for
 most problems, regardless of how many MPI tasks you assign to a Phi.
 </P>
+<P>The <I>no_affinity</I> keyword will turn off automatic setting of core
+affinity for MPI tasks and OpenMP threads on the host when using
+offload to a coprocessor. Affinity settings are used when possible
+to prevent MPI tasks and OpenMP threads from being on separate NUMA
+domains and to prevent offload threads from interfering with other
+processes/threads used for LAMMPS.
+</P>
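
To make the keyword syntax concrete, a package command combining several of the keywords above might look like (values are illustrative only):

    package intel 1 omp 16 tptask 120 no_affinity

i.e. 1 coprocessor per node, 16 OpenMP threads on the CPU, at most 120 coprocessor threads per MPI task, and no automatic host affinity.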
 <HR>
 <P>The <I>kokkos</I> style invokes settings associated with the use of the

View File

@@ -54,7 +54,7 @@ args = arguments specific to the style :l
 {intel} args = NPhi keyword value ...
 Nphi = # of coprocessors per node
 zero or more keyword/value pairs may be appended
-keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask}
+keywords = {omp} or {mode} or {balance} or {ghost} or {tpc} or {tptask} or {no_affinity}
 {omp} value = Nthreads
 Nthreads = number of OpenMP threads to use on CPU (default = 0)
 {mode} value = {single} or {mixed} or {double}
@@ -70,6 +70,7 @@ args = arguments specific to the style :l
 Ntpc = max number of coprocessor threads per coprocessor core (default = 4)
 {tptask} value = Ntptask
 Ntptask = max number of coprocessor threads per MPI task (default = 240)
+{no_affinity} values = none
 {kokkos} args = keyword value ...
 zero or more keyword/value pairs may be appended
 keywords = {neigh} or {newton} or {binsize} or {comm} or {comm/exchange} or {comm/forward}
@@ -421,6 +422,13 @@ with 16 threads, for a total of 128.
 Note that the default settings for {tpc} and {tptask} are fine for
 most problems, regardless of how many MPI tasks you assign to a Phi.
+The {no_affinity} keyword will turn off automatic setting of core
+affinity for MPI tasks and OpenMP threads on the host when using
+offload to a coprocessor. Affinity settings are used when possible
+to prevent MPI tasks and OpenMP threads from being on separate NUMA
+domains and to prevent offload threads from interfering with other
+processes/threads used for LAMMPS.
 :line
 The {kokkos} style invokes settings associated with the use of the