diff --git a/doc/Section_accelerate.html b/doc/Section_accelerate.html index 296219f985..88d03984b3 100644 --- a/doc/Section_accelerate.html +++ b/doc/Section_accelerate.html @@ -176,8 +176,8 @@ discussed below. package. These styles support vectorized single and mixed precision calculations, in addition to full double precision. In extreme cases, this can provide speedups over 3.5x on CPUs. The package also -supports acceleration with offload to Intel corprocessors (Xeon -Phi). This can result in additional speedup over 2x depending on the +supports acceleration with offload to Intel(R) Xeon Phi(TM) coprocessors. +This can result in additional speedup over 2x depending on the hardware configuration.

Styles with a "kk" suffix are part of the KOKKOS package, and can be @@ -977,10 +977,10 @@ LAMMPS.

The USER-INTEL package was developed by Mike Brown at Intel Corporation. It provides a capability to accelerate simulations by -offloading neighbor list and non-bonded force calculations to Intel -coprocessors (Xeon Phi). Additionally, it supports running +offloading neighbor list and non-bonded force calculations to Intel(R) +Xeon Phi(TM) coprocessors. Additionally, it supports running simulations in single, mixed, or double precision with vectorization, -even if a coprocessor is not present, i.e. on an Intel CPU. The same +even if a coprocessor is not present, i.e. on an Intel(R) CPU. The same C++ code is used for both cases. When offloading to a coprocessor, the routine is run twice, once with an offload flag.

@@ -1004,21 +1004,25 @@ flags to enable OpenMP support (-openmp) to both the CCFLAGS and LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and -restrict to CCFLAGS.

+

Note that currently you must use the Intel C++ compiler (icc/icpc) to +build the package. In the future, using other compilers (e.g. g++) +may be possible. +

If you are compiling on the same architecture that will be used for the runs, adding the flag -xHost will enable vectorization with the -Intel compiler. In order to build with support for an Intel +Intel(R) compiler. In order to build with support for an Intel(R) coprocessor, the flag -offload should be added to the LINKFLAGS line and the flag -DLMP_INTEL_OFFLOAD should be added to the CCFLAGS line.

The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload are included in the src/MAKE directory with options that perform well -with the Intel compiler. The latter Makefile has support for offload +with the Intel(R) compiler. The latter Makefile has support for offload to coprocessors and the former does not.

-

It is recommended that Intel Compiler 2013 SP1 update 1 be used for +

It is recommended that Intel(R) Compiler 2013 SP1 update 1 be used for compiling. Newer versions have some performance issues that are being -addressed. If using Intel MPI, version 5 or higher is recommended. +addressed. If using Intel(R) MPI, version 5 or higher is recommended.

The rest of the compilation is the same as for any other package that has no additional library dependencies, e.g. @@ -1034,7 +1038,7 @@ them.

The total number of MPI tasks used by LAMMPS (one or multiple per compute node) is set in the usual manner via the mpirun or mpiexec -commands, and is independent of the Intel package. +commands, and is independent of the USER-INTEL package.

Input script requirements to run using pair styles with a intel suffix are as follows: @@ -1054,10 +1058,10 @@ use all single or all double precision, the package intel command must be used in the input script with a "single" or "double" keyword specified.

-

Running with an Intel coprocessor: +

Running with an Intel(R) coprocessor:

The USER-INTEL package supports offload of a fraction of the work to -Intel coprocessors (Xeon Phi). This is accomplished by setting a +Intel(R) Xeon Phi(TM) coprocessors. This is accomplished by setting a balance fraction on the package intel command. A balance of 0 runs all calculations on the CPU. A balance of 1 runs all calculations on the coprocessor. A balance of 0.5 runs half of @@ -1075,8 +1079,8 @@ adding a short warm-up run (10-20 steps) will allow the load-balancer to find a setting that will carry over to additional runs.
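The balance fraction described above can be pictured with a small stand-alone sketch. This is only an illustration of the stated semantics (0 keeps everything on the CPU, 1 sends everything to the coprocessor, 0.5 splits the work evenly). The function `split_work` and the idea of counting discrete "work items" are assumptions made for this example and are not taken from the USER-INTEL source. The sketch also does not model the default setting of -1.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of LAMMPS): split n work items between the
// host CPU and a coprocessor according to a balance fraction in [0, 1].
// Returns {items kept on the CPU, items sent to the coprocessor}.
std::pair<std::size_t, std::size_t> split_work(std::size_t n, double balance) {
    // Clamp to [0, 1]; special values such as the default -1 are not modelled.
    balance = std::min(1.0, std::max(0.0, balance));
    // Round to the nearest whole item and never offload more than n items.
    std::size_t offloaded =
        static_cast<std::size_t>(balance * static_cast<double>(n) + 0.5);
    offloaded = std::min(offloaded, n);
    return {n - offloaded, offloaded};
}
```

With 100 items, a balance of 0.5 yields 50 items on each side under these assumptions.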

The default for the package intel command is to have -all the MPI tasks on a given compute node use a single coprocessor -(Xeon Phi). In general, running with a large number of MPI tasks on +all the MPI tasks on a given compute node use a single Xeon Phi(TM) coprocessor. +In general, running with a large number of MPI tasks on each node will perform best with offload. Each MPI task will automatically get affinity to a subset of the hardware threads available on the coprocessor. For example, if your card has 61 cores, @@ -1087,7 +1091,7 @@ tuning of the number of threads to use per MPI task or the number of threads to use per core can be accomplished with keywords to the package intel command.

-

If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic +

If LAMMPS is using offload to an Intel(R) Xeon Phi(TM) coprocessor, a diagnostic line during the setup for a run is printed to the screen (not to log files) indicating that offload is being used and the number of coprocessor threads per MPI task. Additionally, an offload timing @@ -1095,7 +1099,7 @@ summary is printed at the end of each run. When using offload, the sort frequency for atom data is changed to 1 so that the per-atom data is sorted every neighbor build.

-

To use multiple coprocessors (Xeon Phis) on each compute node, the +

To use multiple coprocessors on each compute node, the offload_cards keyword can be specified with the package intel command to specify the number of coprocessors to use. diff --git a/doc/Section_accelerate.txt b/doc/Section_accelerate.txt index 6618ed05af..4e4d54513c 100644 --- a/doc/Section_accelerate.txt +++ b/doc/Section_accelerate.txt @@ -172,8 +172,8 @@ Styles with an "intel" suffix are part of the USER-INTEL package. These styles support vectorized single and mixed precision calculations, in addition to full double precision. In extreme cases, this can provide speedups over 3.5x on CPUs. The package also -supports acceleration with offload to Intel corprocessors (Xeon -Phi). This can result in additional speedup over 2x depending on the +supports acceleration with offload to Intel(R) Xeon Phi(TM) coprocessors. +This can result in additional speedup over 2x depending on the hardware configuration. Styles with a "kk" suffix are part of the KOKKOS package, and can be @@ -976,10 +976,10 @@ LAMMPS. The USER-INTEL package was developed by Mike Brown at Intel Corporation. It provides a capability to accelerate simulations by -offloading neighbor list and non-bonded force calculations to Intel -coprocessors (Xeon Phi). Additionally, it supports running +offloading neighbor list and non-bonded force calculations to Intel(R) +Xeon Phi(TM) coprocessors. Additionally, it supports running simulations in single, mixed, or double precision with vectorization, -even if a coprocessor is not present, i.e. on an Intel CPU. The same +even if a coprocessor is not present, i.e. on an Intel(R) CPU. The same C++ code is used for both cases. When offloading to a coprocessor, the routine is run twice, once with an offload flag. @@ -1003,21 +1003,25 @@ flags to enable OpenMP support ({-openmp}) to both the CCFLAGS and LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and -restrict to CCFLAGS. 
+Note that currently you must use the Intel C++ compiler (icc/icpc) to +build the package. In the future, using other compilers (e.g. g++) +may be possible. + If you are compiling on the same architecture that will be used for the runs, adding the flag {-xHost} will enable vectorization with the -Intel compiler. In order to build with support for an Intel +Intel(R) compiler. In order to build with support for an Intel(R) coprocessor, the flag {-offload} should be added to the LINKFLAGS line and the flag {-DLMP_INTEL_OFFLOAD} should be added to the CCFLAGS line. The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload are included in the src/MAKE directory with options that perform well -with the Intel compiler. The latter Makefile has support for offload +with the Intel(R) compiler. The latter Makefile has support for offload to coprocessors and the former does not. -It is recommended that Intel Compiler 2013 SP1 update 1 be used for +It is recommended that Intel(R) Compiler 2013 SP1 update 1 be used for compiling. Newer versions have some performance issues that are being -addressed. If using Intel MPI, version 5 or higher is recommended. +addressed. If using Intel(R) MPI, version 5 or higher is recommended. The rest of the compilation is the same as for any other package that has no additional library dependencies, e.g. @@ -1033,7 +1037,7 @@ them. The total number of MPI tasks used by LAMMPS (one or multiple per compute node) is set in the usual manner via the mpirun or mpiexec -commands, and is independent of the Intel package. +commands, and is independent of the USER-INTEL package. Input script requirements to run using pair styles with a {intel} suffix are as follows: @@ -1053,10 +1057,10 @@ use all single or all double precision, the "package intel"_package.html command must be used in the input script with a "single" or "double" keyword specified. 
-[Running with an Intel coprocessor:] +[Running with an Intel(R) coprocessor:] The USER-INTEL package supports offload of a fraction of the work to -Intel coprocessors (Xeon Phi). This is accomplished by setting a +Intel(R) Xeon Phi(TM) coprocessors. This is accomplished by setting a balance fraction on the "package intel"_package.html command. A balance of 0 runs all calculations on the CPU. A balance of 1 runs all calculations on the coprocessor. A balance of 0.5 runs half of @@ -1074,8 +1078,8 @@ adding a short warm-up run (10-20 steps) will allow the load-balancer to find a setting that will carry over to additional runs. The default for the "package intel"_package.html command is to have -all the MPI tasks on a given compute node use a single coprocessor -(Xeon Phi). In general, running with a large number of MPI tasks on +all the MPI tasks on a given compute node use a single Xeon Phi(TM) coprocessor. +In general, running with a large number of MPI tasks on each node will perform best with offload. Each MPI task will automatically get affinity to a subset of the hardware threads available on the coprocessor. For example, if your card has 61 cores, @@ -1086,7 +1090,7 @@ tuning of the number of threads to use per MPI task or the number of threads to use per core can be accomplished with keywords to the "package intel"_package.html command. -If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic +If LAMMPS is using offload to an Intel(R) Xeon Phi(TM) coprocessor, a diagnostic line during the setup for a run is printed to the screen (not to log files) indicating that offload is being used and the number of coprocessor threads per MPI task. Additionally, an offload timing @@ -1094,7 +1098,7 @@ summary is printed at the end of each run. When using offload, the "sort"_atom_modify.html frequency for atom data is changed to 1 so that the per-atom data is sorted every neighbor build. 
-To use multiple coprocessors (Xeon Phis) on each compute node, the +To use multiple coprocessors on each compute node, the {offload_cards} keyword can be specified with the "package intel"_package.html command to specify the number of coprocessors to use. diff --git a/doc/Section_example.html b/doc/Section_example.html index dfc356a3d4..7683b56578 100644 --- a/doc/Section_example.html +++ b/doc/Section_example.html @@ -59,7 +59,7 @@ section of the LAMMPS WWW Site. gpu use of the GPU package for GPU acceleration hugoniostat Hugoniostat shock dynamics indent spherical indenter into a 2d solid -intel use of the USER-INTEL package for CPU or Xeon Phi acceleration +intel use of the USER-INTEL package for CPU or Intel(R) Xeon Phi(TM) coprocessor kim use of potentials in Knowledge Base for Interatomic Models (KIM) line line segment particles in 2d rigid bodies meam MEAM test for SiC and shear (same as shear examples) diff --git a/doc/Section_example.txt b/doc/Section_example.txt index 8dbc4d1f6a..2b879cbb0b 100644 --- a/doc/Section_example.txt +++ b/doc/Section_example.txt @@ -55,7 +55,7 @@ friction: frictional contact of spherical asperities between 2d surfaces gpu: use of the GPU package for GPU acceleration hugoniostat: Hugoniostat shock dynamics indent: spherical indenter into a 2d solid -intel: use of the USER-INTEL package for CPU or Xeon Phi acceleration +intel: use of the USER-INTEL package for CPU or Intel(R) Xeon Phi(TM) coprocessor kim: use of potentials in Knowledge Base for Interatomic Models (KIM) line: line segment particles in 2d rigid bodies meam: MEAM test for SiC and shear (same as shear examples) diff --git a/doc/Section_intro.html b/doc/Section_intro.html index aaeaf0ab5c..173180c3a2 100644 --- a/doc/Section_intro.html +++ b/doc/Section_intro.html @@ -109,7 +109,7 @@ it to LAMMPS.

  • open-source distribution
  • highly portable C++
  • optional libraries used: MPI and single-processor FFT -
  • GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features +
  • GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features
  • easy to extend with new features and functionality
  • runs from an input script
  • syntax for defining and using variables and formulas diff --git a/doc/Section_intro.txt b/doc/Section_intro.txt index ef99317ea5..bde594d7d6 100644 --- a/doc/Section_intro.txt +++ b/doc/Section_intro.txt @@ -105,7 +105,7 @@ General features :h4 open-source distribution highly portable C++ optional libraries used: MPI and single-processor FFT - GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features + GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features easy to extend with new features and functionality runs from an input script syntax for defining and using variables and formulas diff --git a/doc/Section_packages.html b/doc/Section_packages.html index 7ac4e3cac6..e151cc6053 100644 --- a/doc/Section_packages.html +++ b/doc/Section_packages.html @@ -125,7 +125,7 @@ on how to build LAMMPS with both kinds of auxiliary libraries. USER-CUDA NVIDIA GPU styles Christian Trott (U Tech Ilmenau) Section accelerate USER/cuda - lib/cuda USER-EFF electron force field Andres Jaramillo-Botero (Caltech) pair_style eff/cut USER/eff eff - USER-FEP free energy perturbation Agilio Padua (U Blaise Pascal Clermont-Ferrand) fix adapt/fep USER/fep - - -USER-INTEL Vectorized CPU and Intel coprocessor styles W. Michael Brown (Intel) Section accelerate examples/intel - - +USER-INTEL Vectorized CPU and Intel(R) coprocessor styles W. Michael Brown (Intel) Section accelerate examples/intel - - USER-LB Lattice Boltzmann fluid Colin Denniston (U Western Ontario) fix lb/fluid USER/lb - - USER-MISC single-file contributions USER-MISC/README USER-MISC/README - - - USER-MOLFILE VMD molfile plug-ins Axel Kohlmeyer (Temple U) dump molfile - - VMD-MOLFILE @@ -390,6 +390,22 @@ Contact him directly if you have questions.


    +

    USER-INTEL package +

    +

    This package provides options for performing neighbor list and +non-bonded force calculations in single, mixed, or double precision +and also a capability for accelerating calculations with an +Intel(R) Xeon Phi(TM) coprocessor. +

    +

    See this section of the manual to get started: +

    +

    Section_accelerate +

    +

    The person who created this package is W. Michael Brown at Intel +(michael.w.brown at intel.com). Contact him directly if you have questions. +

    +
    +

    USER-LB package

    This package contains a LAMMPS implementation of a background diff --git a/doc/Section_packages.txt b/doc/Section_packages.txt index ea0810e923..1936eea35b 100644 --- a/doc/Section_packages.txt +++ b/doc/Section_packages.txt @@ -117,7 +117,7 @@ USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (3), "fix colvars USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_7, USER/cuda, -, lib/cuda USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, - USER-FEP, free energy perturbation, Agilio Padua (U Blaise Pascal Clermont-Ferrand), "fix adapt/fep"_fix_adapt.html, USER/fep, -, - -USER-INTEL, Vectorized CPU and Intel coprocessor styles, W. Michael Brown (Intel), "Section accelerate"_Section_accelerate.html#acc_9, examples/intel, -, - +USER-INTEL, Vectorized CPU and Intel(R) coprocessor styles, W. Michael Brown (Intel), "Section accelerate"_Section_accelerate.html#acc_9, examples/intel, -, - USER-LB, Lattice Boltzmann fluid, Colin Denniston (U Western Ontario), "fix lb/fluid"_fix_lb_fluid.html, USER/lb, -, - USER-MISC, single-file contributions, USER-MISC/README, USER-MISC/README, -, -, - USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, VMD-MOLFILE @@ -377,6 +377,22 @@ Contact him directly if you have questions. :line +USER-INTEL package :h4 + +This package provides options for performing neighbor list and +non-bonded force calculations in single, mixed, or double precision +and also a capability for accelerating calculations with an +Intel(R) Xeon Phi(TM) coprocessor. + +See this section of the manual to get started: + +"Section_accelerate"_Section_accelerate.html#acc_9 + +The person who created this package is W. Michael Brown at Intel +(michael.w.brown at intel.com). Contact him directly if you have questions. 
+ +:line + USER-LB package :h4 This package contains a LAMMPS implementation of a background diff --git a/doc/Section_start.html b/doc/Section_start.html index 67da141f95..6448a72a76 100644 --- a/doc/Section_start.html +++ b/doc/Section_start.html @@ -1493,8 +1493,8 @@ default GPU settings, as if the command "package gpu force/neigh 0 0 changed by using the package gpu command in your script if desired.

    -

    For the Intel package, using this command-line switch also invokes the -default Intel settings, as if the command "package intel * mixed +

    For the USER-INTEL package, using this command-line switch also invokes the +default USER-INTEL settings, as if the command "package intel * mixed balance -1" were used at the top of your input script. These settings can be changed by using the package intel command in your script if desired. If the USER-OMP package is installed, the diff --git a/doc/Section_start.txt b/doc/Section_start.txt index 3f6a52180e..89270d4846 100644 --- a/doc/Section_start.txt +++ b/doc/Section_start.txt @@ -1487,8 +1487,8 @@ default GPU settings, as if the command "package gpu force/neigh 0 0 changed by using the "package gpu"_package.html command in your script if desired. -For the Intel package, using this command-line switch also invokes the -default Intel settings, as if the command "package intel * mixed +For the USER-INTEL package, using this command-line switch also invokes the +default USER-INTEL settings, as if the command "package intel * mixed balance -1" were used at the top of your input script. These settings can be changed by using the "package intel"_package.html command in your script if desired. If the USER-OMP package is installed, the diff --git a/doc/fix_langevin.html b/doc/fix_langevin.html index 36ffc14e3c..bc35b2181f 100644 --- a/doc/fix_langevin.html +++ b/doc/fix_langevin.html @@ -239,20 +239,29 @@ group. As a result, the center-of-mass of a system with zero initial momentum will not drift over time.

    The keyword gjf can be used to run the Gronbech-Jensen/Farago - time-discretization of the Langevin model. The -effective random force is composed of the average of two random forces -representing half-contributions from the previous and current time -intervals. This discretization has been shown to be consistent with -the underlying physical model of Langevin dynamics and produces the -correct Boltzmann distribution of positions for large timesteps, -up to the numerical stability limit. In common with all -methods based on Verlet integration, the discretized velocities -generated by the time integration scheme are not exactly conjugate -to the positions. As a result the temperature computed from the -discretized velocities will be systematically lower than the -target temperature, by an amount that grows with the timestep. -Nonetheless, the distribution of positions will be consistent -with the target temperature. + time-discretization of the Langevin model. As +described in the papers cited below, the purpose of this method is to +enable longer timesteps to be used (up to the numerical stability +limit of the integrator), while still producing the correct Boltzmann +distribution of atom positions. It is implemented within LAMMPS by +changing how the random force is applied so that it is composed of +the average of two random forces representing half-contributions from +the previous and current time intervals. In common with all methods +based on Verlet integration, the discretized velocities generated by +this method in conjunction with velocity-Verlet time integration are +not exactly conjugate to the positions. As a result the temperature +(computed from the discretized velocities) will be systematically +lower than the target temperature, by a small amount which grows with +the timestep. Nonetheless, the distribution of atom positions will +still be consistent with the target temperature. 
For molecules containing +C-H bonds, configurational properties generated with dt = 2.5 fs and +tdamp = 100 fs are indistinguishable from those generated with dt = 0.5 fs. +Because the temperature computed from the velocities systematically decreases with increasing +timestep, the method should not be used to +generate properties that depend on the velocity distribution, such as +the velocity autocorrelation function (VACF). In the above example, the +velocity distribution at dt = 2.5 fs gives an average temperature of 220 K, +instead of 300 K.
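The averaging of random forces described in the added gjf paragraph can be sketched in isolation. The fragment below is a minimal illustration under stated assumptions, not the LAMMPS implementation: the function name `averaged_random_force` is hypothetical, the per-step random draws are passed in as plain numbers rather than generated, and the draw preceding the first step is assumed to be zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a LAMMPS API): given per-step random forces
// r[0], r[1], ..., return the effective force applied at each step as the
// average of the previous and the current draw.
std::vector<double> averaged_random_force(const std::vector<double>& r) {
    std::vector<double> eff(r.size());
    double prev = 0.0;  // assumption: no contribution before the first step
    for (std::size_t i = 0; i < r.size(); ++i) {
        eff[i] = 0.5 * (prev + r[i]);  // half from the previous interval, half from the current one
        prev = r[i];
    }
    return eff;
}
```

Each draw therefore contributes half of its value to two consecutive steps, which is the "half-contributions from the previous and current time intervals" wording of the documentation text.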


    diff --git a/doc/fix_langevin.txt b/doc/fix_langevin.txt index cfbdc2ad7d..6e129c358e 100644 --- a/doc/fix_langevin.txt +++ b/doc/fix_langevin.txt @@ -227,20 +227,29 @@ group. As a result, the center-of-mass of a system with zero initial momentum will not drift over time. The keyword {gjf} can be used to run the "Gronbech-Jensen/Farago -"_#Gronbech-Jensen time-discretization of the Langevin model. The -effective random force is composed of the average of two random forces -representing half-contributions from the previous and current time -intervals. This discretization has been shown to be consistent with -the underlying physical model of Langevin dynamics and produces the -correct Boltzmann distribution of positions for large timesteps, -up to the numerical stability limit. In common with all -methods based on Verlet integration, the discretized velocities -generated by the time integration scheme are not exactly conjugate -to the positions. As a result the temperature computed from the -discretized velocities will be systematically lower than the -target temperature, by an amount that grows with the timestep. -Nonetheless, the distribution of positions will be consistent -with the target temperature. +"_#Gronbech-Jensen time-discretization of the Langevin model. As +described in the papers cited below, the purpose of this method is to +enable longer timesteps to be used (up to the numerical stability +limit of the integrator), while still producing the correct Boltzmann +distribution of atom positions. It is implemented within LAMMPS by +changing how the random force is applied so that it is composed of +the average of two random forces representing half-contributions from +the previous and current time intervals. In common with all methods +based on Verlet integration, the discretized velocities generated by +this method in conjunction with velocity-Verlet time integration are +not exactly conjugate to the positions. 
As a result the temperature +(computed from the discretized velocities) will be systematically +lower than the target temperature, by a small amount which grows with +the timestep. Nonetheless, the distribution of atom positions will +still be consistent with the target temperature. For molecules containing +C-H bonds, configurational properties generated with dt = 2.5 fs and +tdamp = 100 fs are indistinguishable from those generated with dt = 0.5 fs. +Because the temperature computed from the velocities systematically decreases with increasing +timestep, the method should not be used to +generate properties that depend on the velocity distribution, such as +the velocity autocorrelation function (VACF). In the above example, the +velocity distribution at dt = 2.5 fs gives an average temperature of 220 K, +instead of 300 K. :line diff --git a/doc/package.html b/doc/package.html index 8e7f8dd231..6a1f0ec39a 100644 --- a/doc/package.html +++ b/doc/package.html @@ -249,7 +249,7 @@ terms and single precision for everything else), or double (intel styles use double precision for all calculations).

    Additional keyword-value pairs are available that are used to -determine how work is offloaded to an Intel coprocessor. If LAMMPS is +determine how work is offloaded to an Intel(R) coprocessor. If LAMMPS is built without offload support, these values are ignored. The additional settings are as follows:

    diff --git a/doc/package.txt b/doc/package.txt index 7640d335c0..be263abb16 100644 --- a/doc/package.txt +++ b/doc/package.txt @@ -244,7 +244,7 @@ terms and single precision for everything else), or {double} (intel styles use double precision for all calculations). Additional keyword-value pairs are available that are used to -determine how work is offloaded to an Intel coprocessor. If LAMMPS is +determine how work is offloaded to an Intel(R) coprocessor. If LAMMPS is built without offload support, these values are ignored. The additional settings are as follows: diff --git a/doc/suffix.html b/doc/suffix.html index 479a9bcd29..c1aae192de 100644 --- a/doc/suffix.html +++ b/doc/suffix.html @@ -51,7 +51,7 @@ run on one or more GPUs or multicore CPU/GPU nodes
  • USER-INTEL = a collection of pair styles and neighbor routines optimized to run in single, mixed, or double precision on CPUs and -Intel coprocessors (Xeon Phi). +Intel(R) Xeon Phi(TM) coprocessors.
  • KOKKOS = a collection of atom, pair, and fix styles optimized to run using the Kokkos library on various kinds of hardware, including GPUs diff --git a/doc/suffix.txt b/doc/suffix.txt index 6309b5fa16..d2e0780dda 100644 --- a/doc/suffix.txt +++ b/doc/suffix.txt @@ -48,7 +48,7 @@ run on one or more GPUs or multicore CPU/GPU nodes :l USER-INTEL = a collection of pair styles and neighbor routines optimized to run in single, mixed, or double precision on CPUs and -Intel coprocessors (Xeon Phi). :l +Intel(R) Xeon Phi(TM) coprocessors. :l KOKKOS = a collection of atom, pair, and fix styles optimized to run using the Kokkos library on various kinds of hardware, including GPUs diff --git a/src/KSPACE/msm.cpp b/src/KSPACE/msm.cpp index bdbe80c67f..9ba0c58191 100644 --- a/src/KSPACE/msm.cpp +++ b/src/KSPACE/msm.cpp @@ -1429,7 +1429,7 @@ void MSM::particle_map() int flag = 0; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); for (int i = 0; i < nlocal; i++) { diff --git a/src/KSPACE/msm_cg.cpp b/src/KSPACE/msm_cg.cpp index 07177324dc..5e351cfcf2 100644 --- a/src/KSPACE/msm_cg.cpp +++ b/src/KSPACE/msm_cg.cpp @@ -310,7 +310,7 @@ void MSMCG::particle_map() int i; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); for (int j = 0; j < num_charged; j++) { i = is_charged[j]; diff --git a/src/KSPACE/pppm.cpp b/src/KSPACE/pppm.cpp index de24b54755..a2672ca7b7 100644 --- a/src/KSPACE/pppm.cpp +++ b/src/KSPACE/pppm.cpp @@ -1877,7 +1877,7 @@ void PPPM::particle_map() int flag = 0; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. 
Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); for (int i = 0; i < nlocal; i++) { @@ -1897,9 +1897,8 @@ void PPPM::particle_map() if (nx+nlower < nxlo_out || nx+nupper > nxhi_out || ny+nlower < nylo_out || ny+nupper > nyhi_out || - nz+nlower < nzlo_out || nz+nupper > nzhi_out) { + nz+nlower < nzlo_out || nz+nupper > nzhi_out) flag = 1; - } } if (flag) error->one(FLERR,"Out of range atoms - cannot compute PPPM"); diff --git a/src/KSPACE/pppm_cg.cpp b/src/KSPACE/pppm_cg.cpp index 9651ac7aa6..be84a5fc0b 100644 --- a/src/KSPACE/pppm_cg.cpp +++ b/src/KSPACE/pppm_cg.cpp @@ -283,7 +283,7 @@ void PPPMCG::particle_map() double **x = atom->x; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); int flag = 0; for (int j = 0; j < num_charged; j++) { diff --git a/src/KSPACE/pppm_disp.cpp b/src/KSPACE/pppm_disp.cpp index d565ce00cf..37fa0b46f6 100644 --- a/src/KSPACE/pppm_disp.cpp +++ b/src/KSPACE/pppm_disp.cpp @@ -4210,7 +4210,7 @@ void PPPMDisp::particle_map(double delx, double dely, double delz, int nlocal = atom->nlocal; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); int flag = 0; for (int i = 0; i < nlocal; i++) { diff --git a/src/KSPACE/pppm_disp_tip4p.cpp b/src/KSPACE/pppm_disp_tip4p.cpp index aa4f3607c6..c021e3dcc0 100644 --- a/src/KSPACE/pppm_disp_tip4p.cpp +++ b/src/KSPACE/pppm_disp_tip4p.cpp @@ -79,7 +79,7 @@ void PPPMDispTIP4P::particle_map_c(double delx, double dely, double delz, int nlocal = atom->nlocal; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. 
Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); int flag = 0; for (int i = 0; i < nlocal; i++) { diff --git a/src/KSPACE/pppm_stagger.cpp b/src/KSPACE/pppm_stagger.cpp index 62db9d6441..f0ee7e10dc 100644 --- a/src/KSPACE/pppm_stagger.cpp +++ b/src/KSPACE/pppm_stagger.cpp @@ -680,7 +680,7 @@ void PPPMStagger::particle_map() int nlocal = atom->nlocal; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); int flag = 0; for (int i = 0; i < nlocal; i++) { diff --git a/src/KSPACE/pppm_tip4p.cpp b/src/KSPACE/pppm_tip4p.cpp index 4345786339..f3c6d3c9a4 100644 --- a/src/KSPACE/pppm_tip4p.cpp +++ b/src/KSPACE/pppm_tip4p.cpp @@ -74,7 +74,7 @@ void PPPMTIP4P::particle_map() int nlocal = atom->nlocal; if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2])) - error->one(FLERR,"Non-numeric box dimensions. 
Simulation unstable."); + error->one(FLERR,"Non-numeric box dimensions - simulation unstable"); int flag = 0; for (int i = 0; i < nlocal; i++) { diff --git a/src/MPIIO/dump_atom_mpiio.cpp b/src/MPIIO/dump_atom_mpiio.cpp index 2767847a6a..d4a2f29f8a 100644 --- a/src/MPIIO/dump_atom_mpiio.cpp +++ b/src/MPIIO/dump_atom_mpiio.cpp @@ -98,7 +98,7 @@ void DumpAtomMPIIO::openfile() } else { // replace open - int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); + int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); if (err != MPI_SUCCESS) { char str[128]; sprintf(str,"Cannot open dump file %s",filecurrent); diff --git a/src/MPIIO/dump_custom_mpiio.cpp b/src/MPIIO/dump_custom_mpiio.cpp index 2922ad30c0..7aac8e665a 100644 --- a/src/MPIIO/dump_custom_mpiio.cpp +++ b/src/MPIIO/dump_custom_mpiio.cpp @@ -119,7 +119,7 @@ void DumpCustomMPIIO::openfile() } else { // replace open - int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); + int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); if (err != MPI_SUCCESS) { char str[128]; sprintf(str,"Cannot open dump file %s",filecurrent); diff --git a/src/MPIIO/dump_xyz_mpiio.cpp b/src/MPIIO/dump_xyz_mpiio.cpp index 108ab8dfee..af1f96c94a 100644 --- a/src/MPIIO/dump_xyz_mpiio.cpp +++ b/src/MPIIO/dump_xyz_mpiio.cpp @@ -118,7 +118,7 @@ void DumpXYZMPIIO::openfile() } else { // replace open - int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); + int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh); if (err != MPI_SUCCESS) { char str[128]; sprintf(str,"Cannot open dump file %s",filecurrent); diff --git a/src/MPIIO/restart_mpiio.cpp b/src/MPIIO/restart_mpiio.cpp index 
203d490aa8..f52fd69db6 100644 --- a/src/MPIIO/restart_mpiio.cpp +++ b/src/MPIIO/restart_mpiio.cpp @@ -59,7 +59,7 @@ void RestartMPIIO::openForRead(char *filename) void RestartMPIIO::openForWrite(char *filename) { - int err = MPI_File_open(world, filename, MPI_MODE_APPEND | MPI_MODE_WRONLY, + int err = MPI_File_open(world, filename, MPI_MODE_WRONLY, MPI_INFO_NULL, &mpifh); if (err != MPI_SUCCESS) { char str[MPI_MAX_ERROR_STRING+128]; diff --git a/src/USER-INTEL/README b/src/USER-INTEL/README index 0b38928b2e..27c60d237a 100644 --- a/src/USER-INTEL/README +++ b/src/USER-INTEL/README @@ -1,6 +1,6 @@ -------------------------------- - LAMMPS Intel Package + LAMMPS Intel(R) Package -------------------------------- W. Michael Brown (Intel) @@ -12,14 +12,15 @@ This package is based on the USER-OMP package and provides LAMMPS styles that: 1. include support for single and mixed precision in addition to double. 2. include modifications to support vectorization for key routines - 3. include modifications to support offload to Xeon Phi coprocessors + 3. include modifications to support offload to Intel(R) Xeon Phi(TM) + coprocessors ----------------------------------------------------------------------------- When using the suffix command with "intel", intel styles will be used if they -exist; if they do not, and an omp version exists, that style will be used. -This is accomplished through the files *ompinto_intel.h that are created -in the src directory when the intel package is installed. For example, +exist; if they do not, and the USER-OMP package is installed and an omp version +exists, that style will be used. For example, in the case the USER-OMP package +is installed, kspace_style pppm/intel 1e-4 @@ -31,5 +32,14 @@ because no pppm style has been implemented for the Intel package. ----------------------------------------------------------------------------- -In order to use offload to Xeon Phi, the flag -DLMP_INTEL_OFFLOAD should be -set in the Makefile. 
Offload requires the use of Intel compilers. +In order to use offload to Intel(R) Xeon Phi(TM) coprocessors, the flag +-DLMP_INTEL_OFFLOAD should be set in the Makefile. Offload requires the use of +Intel compilers. + +----------------------------------------------------------------------------- + +The files in this package must be compiled with the Intel C++ +compiler, i.e. icc/icpc. + + +