Merge remote-tracking branch 'lammps-ro/master' into lammps-icms
Resolved Conflicts:
	src/KSPACE/msm.cpp
	src/KSPACE/msm_cg.cpp
	src/KSPACE/pppm.cpp
	src/KSPACE/pppm_cg.cpp
	src/KSPACE/pppm_disp.cpp
	src/KSPACE/pppm_disp_tip4p.cpp
	src/KSPACE/pppm_stagger.cpp
	src/KSPACE/pppm_tip4p.cpp
@@ -176,8 +176,8 @@ discussed below.
 package. These styles support vectorized single and mixed precision
 calculations, in addition to full double precision. In extreme cases,
 this can provide speedups over 3.5x on CPUs. The package also
-supports acceleration with offload to Intel corprocessors (Xeon
-Phi). This can result in additional speedup over 2x depending on the
+supports acceleration with offload to Intel(R) Xeon Phi(TM) coprocessors.
+This can result in additional speedup over 2x depending on the
 hardware configuration.
 </P>
 <P>Styles with a "kk" suffix are part of the KOKKOS package, and can be
@@ -977,10 +977,10 @@ LAMMPS.
 </H4>
 <P>The USER-INTEL package was developed by Mike Brown at Intel
 Corporation. It provides a capability to accelerate simulations by
-offloading neighbor list and non-bonded force calculations to Intel
-coprocessors (Xeon Phi). Additionally, it supports running
+offloading neighbor list and non-bonded force calculations to Intel(R)
+Xeon Phi(TM) coprocessors. Additionally, it supports running
 simulations in single, mixed, or double precision with vectorization,
-even if a coprocessor is not present, i.e. on an Intel CPU. The same
+even if a coprocessor is not present, i.e. on an Intel(R) CPU. The same
 C++ code is used for both cases. When offloading to a coprocessor,
 the routine is run twice, once with an offload flag.
 </P>
@@ -1004,21 +1004,25 @@ flags to enable OpenMP support (<I>-openmp</I>) to both the CCFLAGS and
 LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
 -restrict to CCFLAGS.
 </P>
+<P>Note that currently you must use the Intel C++ compiler (icc/icpc) to
+build the package. In the future, using other compilers (e.g. g++)
+may be possible.
+</P>
 <P>If you are compiling on the same architecture that will be used for
 the runs, adding the flag <I>-xHost</I> will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+Intel(R) compiler. In order to build with support for an Intel(R)
 coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
 and the flag <I>-DLMP_INTEL_OFFLOAD</I> should be added to the CCFLAGS
 line.
 </P>
 <P>The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
 are included in the src/MAKE directory with options that perform well
-with the Intel compiler. The latter Makefile has support for offload
+with the Intel(R) compiler. The latter Makefile has support for offload
 to coprocessors and the former does not.
 </P>
-<P>It is recommended that Intel Compiler 2013 SP1 update 1 be used for
+<P>It is recommended that Intel(R) Compiler 2013 SP1 update 1 be used for
 compiling. Newer versions have some performance issues that are being
-addressed. If using Intel MPI, version 5 or higher is recommended.
+addressed. If using Intel(R) MPI, version 5 or higher is recommended.
 </P>
 <P>The rest of the compilation is the same as for any other package that
 has no additional library dependencies, e.g.
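The build notes above ask for -DLAMMPS_MEMALIGN=64 on the CCFLAGS line. The minimal C++ sketch below shows what a 64-byte alignment request amounts to. It assumes the macro value is used as the alignment argument of POSIX `posix_memalign()`. `aligned_malloc` and `is_aligned` are hypothetical helpers and do not reproduce LAMMPS's own memory routines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

#ifndef LAMMPS_MEMALIGN
#define LAMMPS_MEMALIGN 64  // assumption: same value as -DLAMMPS_MEMALIGN=64
#endif

// Hypothetical allocator: returns a block whose address is a multiple of
// LAMMPS_MEMALIGN bytes (or nullptr on failure), via POSIX posix_memalign().
void *aligned_malloc(std::size_t nbytes) {
  if (nbytes == 0) return nullptr;
  void *ptr = nullptr;
  if (posix_memalign(&ptr, LAMMPS_MEMALIGN, nbytes) != 0) return nullptr;
  return ptr;
}

// True if p sits on an `alignment`-byte boundary.
bool is_aligned(const void *p, std::size_t alignment) {
  assert(alignment > 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Release blocks returned by `aligned_malloc()` with `std::free()`. Aligned start addresses let a vectorizing compiler use aligned loads and stores on arrays allocated this way.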
@@ -1034,7 +1038,7 @@ them.
 </P>
 <P>The total number of MPI tasks used by LAMMPS (one or multiple per
 compute node) is set in the usual manner via the mpirun or mpiexec
-commands, and is independent of the Intel package.
+commands, and is independent of the USER-INTEL package.
 </P>
 <P>Input script requirements to run using pair styles with a <I>intel</I>
 suffix are as follows:
@@ -1054,10 +1058,10 @@ use all single or all double precision, the <A HREF = "package.html">package
 intel</A> command must be used in the input script with a
 "single" or "double" keyword specified.
 </P>
-<P><B>Running with an Intel coprocessor:</B>
+<P><B>Running with an Intel(R) coprocessor:</B>
 </P>
 <P>The USER-INTEL package supports offload of a fraction of the work to
-Intel coprocessors (Xeon Phi). This is accomplished by setting a
+Intel(R) Xeon Phi(TM) coprocessors. This is accomplished by setting a
 balance fraction on the <A HREF = "package.html">package intel</A> command. A
 balance of 0 runs all calculations on the CPU. A balance of 1 runs
 all calculations on the coprocessor. A balance of 0.5 runs half of
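The balance fraction described above decides how much work stays on the CPU and how much goes to the coprocessor. The C++ sketch below shows only the arithmetic of a static split for a fixed fraction in [0, 1]. `split_work` and `Split` are hypothetical names. What the work items are (atoms, neighbor-list rows, ...) and how USER-INTEL actually schedules them are not modeled. Negative settings such as `balance -1` are not handled.

```cpp
#include <cassert>

// Hypothetical result of a static host/coprocessor split.
struct Split {
  long coprocessor;  // items offloaded to the coprocessor
  long host;         // items kept on the host CPU
};

// Sketch only: assign round(balance * n) of n work items to the coprocessor.
// balance = 0 keeps everything on the CPU, balance = 1 offloads everything.
Split split_work(long n, double balance) {
  assert(n >= 0 && balance >= 0.0 && balance <= 1.0);
  long ncop = static_cast<long>(balance * static_cast<double>(n) + 0.5);
  if (ncop > n) ncop = n;  // guard against rounding past n
  return {ncop, n - ncop};
}
```

With 1000 items, a balance of 0.5 gives 500 offloaded items and 500 host items, matching the "half" case in the text.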
@@ -1075,8 +1079,8 @@ adding a short warm-up run (10-20 steps) will allow the load-balancer
 to find a setting that will carry over to additional runs.
 </P>
 <P>The default for the <A HREF = "package.html">package intel</A> command is to have
-all the MPI tasks on a given compute node use a single coprocessor
-(Xeon Phi). In general, running with a large number of MPI tasks on
+all the MPI tasks on a given compute node use a single Xeon Phi(TM) coprocessor
+In general, running with a large number of MPI tasks on
 each node will perform best with offload. Each MPI task will
 automatically get affinity to a subset of the hardware threads
 available on the coprocessor. For example, if your card has 61 cores,
@@ -1087,7 +1091,7 @@ tuning of the number of threads to use per MPI task or the number of
 threads to use per core can be accomplished with keywords to the
 <A HREF = "package.html">package intel</A> command.
 </P>
-<P>If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+<P>If LAMMPS is using offload to a Intel(R) Xeon Phi(TM) coprocessor, a diagnostic
 line during the setup for a run is printed to the screen (not to log
 files) indicating that offload is being used and the number of
 coprocessor threads per MPI task. Additionally, an offload timing
@@ -1095,7 +1099,7 @@ summary is printed at the end of each run. When using offload, the
 <A HREF = "atom_modify.html">sort</A> frequency for atom data is changed to 1 so
 that the per-atom data is sorted every neighbor build.
 </P>
-<P>To use multiple coprocessors (Xeon Phis) on each compute node, the
+<P>To use multiple coprocessors on each compute node, the
 <I>offload_cards</I> keyword can be specified with the <A HREF = "package.html">package
 intel</A> command to specify the number of coprocessors to
 use.
@@ -172,8 +172,8 @@ Styles with an "intel" suffix are part of the USER-INTEL
 package. These styles support vectorized single and mixed precision
 calculations, in addition to full double precision. In extreme cases,
 this can provide speedups over 3.5x on CPUs. The package also
-supports acceleration with offload to Intel corprocessors (Xeon
-Phi). This can result in additional speedup over 2x depending on the
+supports acceleration with offload to Intel(R) Xeon Phi(TM) coprocessors.
+This can result in additional speedup over 2x depending on the
 hardware configuration.

 Styles with a "kk" suffix are part of the KOKKOS package, and can be
@@ -976,10 +976,10 @@ LAMMPS.

 The USER-INTEL package was developed by Mike Brown at Intel
 Corporation. It provides a capability to accelerate simulations by
-offloading neighbor list and non-bonded force calculations to Intel
-coprocessors (Xeon Phi). Additionally, it supports running
+offloading neighbor list and non-bonded force calculations to Intel(R)
+Xeon Phi(TM) coprocessors. Additionally, it supports running
 simulations in single, mixed, or double precision with vectorization,
-even if a coprocessor is not present, i.e. on an Intel CPU. The same
+even if a coprocessor is not present, i.e. on an Intel(R) CPU. The same
 C++ code is used for both cases. When offloading to a coprocessor,
 the routine is run twice, once with an offload flag.

@@ -1003,21 +1003,25 @@ flags to enable OpenMP support ({-openmp}) to both the CCFLAGS and
 LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
 -restrict to CCFLAGS.

+Note that currently you must use the Intel C++ compiler (icc/icpc) to
+build the package. In the future, using other compilers (e.g. g++)
+may be possible.
+
 If you are compiling on the same architecture that will be used for
 the runs, adding the flag {-xHost} will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+Intel(R) compiler. In order to build with support for an Intel(R)
 coprocessor, the flag {-offload} should be added to the LINKFLAGS line
 and the flag {-DLMP_INTEL_OFFLOAD} should be added to the CCFLAGS
 line.

 The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
 are included in the src/MAKE directory with options that perform well
-with the Intel compiler. The latter Makefile has support for offload
+with the Intel(R) compiler. The latter Makefile has support for offload
 to coprocessors and the former does not.

-It is recommended that Intel Compiler 2013 SP1 update 1 be used for
+It is recommended that Intel(R) Compiler 2013 SP1 update 1 be used for
 compiling. Newer versions have some performance issues that are being
-addressed. If using Intel MPI, version 5 or higher is recommended.
+addressed. If using Intel(R) MPI, version 5 or higher is recommended.

 The rest of the compilation is the same as for any other package that
 has no additional library dependencies, e.g.
@@ -1033,7 +1037,7 @@ them.

 The total number of MPI tasks used by LAMMPS (one or multiple per
 compute node) is set in the usual manner via the mpirun or mpiexec
-commands, and is independent of the Intel package.
+commands, and is independent of the USER-INTEL package.

 Input script requirements to run using pair styles with a {intel}
 suffix are as follows:
@@ -1053,10 +1057,10 @@ use all single or all double precision, the "package
 intel"_package.html command must be used in the input script with a
 "single" or "double" keyword specified.

-[Running with an Intel coprocessor:]
+[Running with an Intel(R) coprocessor:]

 The USER-INTEL package supports offload of a fraction of the work to
-Intel coprocessors (Xeon Phi). This is accomplished by setting a
+Intel(R) Xeon Phi(TM) coprocessors. This is accomplished by setting a
 balance fraction on the "package intel"_package.html command. A
 balance of 0 runs all calculations on the CPU. A balance of 1 runs
 all calculations on the coprocessor. A balance of 0.5 runs half of
@@ -1074,8 +1078,8 @@ adding a short warm-up run (10-20 steps) will allow the load-balancer
 to find a setting that will carry over to additional runs.

 The default for the "package intel"_package.html command is to have
-all the MPI tasks on a given compute node use a single coprocessor
-(Xeon Phi). In general, running with a large number of MPI tasks on
+all the MPI tasks on a given compute node use a single Xeon Phi(TM) coprocessor
+In general, running with a large number of MPI tasks on
 each node will perform best with offload. Each MPI task will
 automatically get affinity to a subset of the hardware threads
 available on the coprocessor. For example, if your card has 61 cores,
@@ -1086,7 +1090,7 @@ tuning of the number of threads to use per MPI task or the number of
 threads to use per core can be accomplished with keywords to the
 "package intel"_package.html command.

-If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+If LAMMPS is using offload to a Intel(R) Xeon Phi(TM) coprocessor, a diagnostic
 line during the setup for a run is printed to the screen (not to log
 files) indicating that offload is being used and the number of
 coprocessor threads per MPI task. Additionally, an offload timing
@@ -1094,7 +1098,7 @@ summary is printed at the end of each run. When using offload, the
 "sort"_atom_modify.html frequency for atom data is changed to 1 so
 that the per-atom data is sorted every neighbor build.

-To use multiple coprocessors (Xeon Phis) on each compute node, the
+To use multiple coprocessors on each compute node, the
 {offload_cards} keyword can be specified with the "package
 intel"_package.html command to specify the number of coprocessors to
 use.
@@ -59,7 +59,7 @@ section of the <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A>.
 <TR><TD >gpu</TD><TD > use of the GPU package for GPU acceleration</TD></TR>
 <TR><TD >hugoniostat</TD><TD > Hugoniostat shock dynamics</TD></TR>
 <TR><TD >indent</TD><TD > spherical indenter into a 2d solid</TD></TR>
-<TR><TD >intel</TD><TD > use of the USER-INTEL package for CPU or Xeon Phi acceleration</TD></TR>
+<TR><TD >intel</TD><TD > use of the USER-INTEL package for CPU or Intel(R) Xeon Phi(TM) coprocessor</TD></TR>
 <TR><TD >kim</TD><TD > use of potentials in Knowledge Base for Interatomic Models (KIM)</TD></TR>
 <TR><TD >line</TD><TD > line segment particles in 2d rigid bodies</TD></TR>
 <TR><TD >meam</TD><TD > MEAM test for SiC and shear (same as shear examples)</TD></TR>
@@ -55,7 +55,7 @@ friction: frictional contact of spherical asperities between 2d surfaces
 gpu: use of the GPU package for GPU acceleration
 hugoniostat: Hugoniostat shock dynamics
 indent: spherical indenter into a 2d solid
-intel: use of the USER-INTEL package for CPU or Xeon Phi acceleration
+intel: use of the USER-INTEL package for CPU or Intel(R) Xeon Phi(TM) coprocessor
 kim: use of potentials in Knowledge Base for Interatomic Models (KIM)
 line: line segment particles in 2d rigid bodies
 meam: MEAM test for SiC and shear (same as shear examples)
@@ -109,7 +109,7 @@ it to LAMMPS.
 <LI> open-source distribution
 <LI> highly portable C++
 <LI> optional libraries used: MPI and single-processor FFT
-<LI> GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
+<LI> GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features
 <LI> easy to extend with new features and functionality
 <LI> runs from an input script
 <LI> syntax for defining and using variables and formulas
@@ -105,7 +105,7 @@ General features :h4
 open-source distribution
 highly portable C++
 optional libraries used: MPI and single-processor FFT
-GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
+GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features
 easy to extend with new features and functionality
 runs from an input script
 syntax for defining and using variables and formulas
@@ -125,7 +125,7 @@ on how to build LAMMPS with both kinds of auxiliary libraries.
 <TR ALIGN="center"><TD >USER-CUDA</TD><TD > NVIDIA GPU styles</TD><TD > Christian Trott (U Tech Ilmenau)</TD><TD > <A HREF = "Section_accelerate.html#acc_7">Section accelerate</A></TD><TD > USER/cuda</TD><TD > -</TD><TD > lib/cuda</TD></TR>
 <TR ALIGN="center"><TD >USER-EFF</TD><TD > electron force field</TD><TD > Andres Jaramillo-Botero (Caltech)</TD><TD > <A HREF = "pair_eff.html">pair_style eff/cut</A></TD><TD > USER/eff</TD><TD > <A HREF = "http://lammps.sandia.gov/movies.html#eff">eff</A></TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-FEP</TD><TD > free energy perturbation</TD><TD > Agilio Padua (U Blaise Pascal Clermont-Ferrand)</TD><TD > <A HREF = "fix_adapt.html">fix adapt/fep</A></TD><TD > USER/fep</TD><TD > -</TD><TD > -</TD></TR>
-<TR ALIGN="center"><TD >USER-INTEL</TD><TD > Vectorized CPU and Intel coprocessor styles</TD><TD > W. Michael Brown (Intel)</TD><TD > <A HREF = "Section_accelerate.html#acc_9">Section accelerate</A></TD><TD > examples/intel</TD><TD > -</TD><TD > -</TD></TR>
+<TR ALIGN="center"><TD >USER-INTEL</TD><TD > Vectorized CPU and Intel(R) coprocessor styles</TD><TD > W. Michael Brown (Intel)</TD><TD > <A HREF = "Section_accelerate.html#acc_9">Section accelerate</A></TD><TD > examples/intel</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-LB</TD><TD > Lattice Boltzmann fluid</TD><TD > Colin Denniston (U Western Ontario)</TD><TD > <A HREF = "fix_lb_fluid.html">fix lb/fluid</A></TD><TD > USER/lb</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-MISC</TD><TD > single-file contributions</TD><TD > USER-MISC/README</TD><TD > USER-MISC/README</TD><TD > -</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-MOLFILE</TD><TD > <A HREF = "http://www.ks.uiuc.edu/Research/vmd">VMD</A> molfile plug-ins</TD><TD > Axel Kohlmeyer (Temple U)</TD><TD > <A HREF = "dump_molfile.html">dump molfile</A></TD><TD > -</TD><TD > -</TD><TD > VMD-MOLFILE</TD></TR>
@@ -390,6 +390,22 @@ Contact him directly if you have questions.
 </P>
 <HR>

+<H4>USER-INTEL package
+</H4>
+<P>This package provides options for performing neighbor list and
+non-bonded force calculations in single, mixed, or double precision
+and also a capability for accelerating calculations with an
+Intel(R) Xeon Phi(TM) coprocessor.
+</P>
+<P>See this section of the manual to get started:
+</P>
+<P><A HREF = "Section_accelerate.html#acc_9">Section_accelerate</A>
+</P>
+<P>The person who created this package is W. Michael Brown at Intel
+(michael.w.brown at intel.com). Contact him directly if you have questions.
+</P>
+<HR>
+
 <H4>USER-LB package
 </H4>
 <P>This package contains a LAMMPS implementation of a background
@@ -117,7 +117,7 @@ USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (3), "fix colvars
 USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_7, USER/cuda, -, lib/cuda
 USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, -
 USER-FEP, free energy perturbation, Agilio Padua (U Blaise Pascal Clermont-Ferrand), "fix adapt/fep"_fix_adapt.html, USER/fep, -, -
-USER-INTEL, Vectorized CPU and Intel coprocessor styles, W. Michael Brown (Intel), "Section accelerate"_Section_accelerate.html#acc_9, examples/intel, -, -
+USER-INTEL, Vectorized CPU and Intel(R) coprocessor styles, W. Michael Brown (Intel), "Section accelerate"_Section_accelerate.html#acc_9, examples/intel, -, -
 USER-LB, Lattice Boltzmann fluid, Colin Denniston (U Western Ontario), "fix lb/fluid"_fix_lb_fluid.html, USER/lb, -, -
 USER-MISC, single-file contributions, USER-MISC/README, USER-MISC/README, -, -, -
 USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, VMD-MOLFILE
@@ -377,6 +377,22 @@ Contact him directly if you have questions.

 :line

+USER-INTEL package :h4
+
+This package provides options for performing neighbor list and
+non-bonded force calculations in single, mixed, or double precision
+and also a capability for accelerating calculations with an
+Intel(R) Xeon Phi(TM) coprocessor.
+
+See this section of the manual to get started:
+
+"Section_accelerate"_Section_accelerate.html#acc_9
+
+The person who created this package is W. Michael Brown at Intel
+(michael.w.brown at intel.com). Contact him directly if you have questions.
+
+:line
+
 USER-LB package :h4

 This package contains a LAMMPS implementation of a background
@@ -1493,8 +1493,8 @@ default GPU settings, as if the command "package gpu force/neigh 0 0
 changed by using the <A HREF = "package.html">package gpu</A> command in your script
 if desired.
 </P>
-<P>For the Intel package, using this command-line switch also invokes the
-default Intel settings, as if the command "package intel * mixed
+<P>For the USER-INTEL package, using this command-line switch also invokes the
+default USER-INTEL settings, as if the command "package intel * mixed
 balance -1" were used at the top of your input script. These settings
 can be changed by using the <A HREF = "package.html">package intel</A> command in
 your script if desired. If the USER-OMP package is installed, the
@@ -1487,8 +1487,8 @@ default GPU settings, as if the command "package gpu force/neigh 0 0
 changed by using the "package gpu"_package.html command in your script
 if desired.

-For the Intel package, using this command-line switch also invokes the
-default Intel settings, as if the command "package intel * mixed
+For the USER-INTEL package, using this command-line switch also invokes the
+default USER-INTEL settings, as if the command "package intel * mixed
 balance -1" were used at the top of your input script. These settings
 can be changed by using the "package intel"_package.html command in
 your script if desired. If the USER-OMP package is installed, the
@@ -239,20 +239,29 @@ group. As a result, the center-of-mass of a system with zero initial
 momentum will not drift over time.
 </P>
 <P>The keyword <I>gjf</I> can be used to run the <A HREF = "#Gronbech-Jensen">Gronbech-Jensen/Farago
-</A> time-discretization of the Langevin model. The
-effective random force is composed of the average of two random forces
-representing half-contributions from the previous and current time
-intervals. This discretization has been shown to be consistent with
-the underlying physical model of Langevin dynamics and produces the
-correct Boltzmann distribution of positions for large timesteps,
-up to the numerical stability limit. In common with all
-methods based on Verlet integration, the discretized velocities
-generated by the time integration scheme are not exactly conjugate
-to the positions. As a result the temperature computed from the
-discretized velocities will be systematically lower than the
-target temperature, by an amount that grows with the timestep.
-Nonetheless, the distribution of positions will be consistent
-with the target temperature.
+</A> time-discretization of the Langevin model. As
+described in the papers cited below, the purpose of this method is to
+enable longer timesteps to be used (up to the numerical stability
+limit of the integrator), while still producing the correct Boltzmann
+distribution of atom positions. It is implemented within LAMMPS, by
+changing how the the random force is applied so that it is composed of
+the average of two random forces representing half-contributions from
+the previous and current time intervals. In common with all methods
+based on Verlet integration, the discretized velocities generated by
+this method in conjunction with velocity-Verlet time integration are
+not exactly conjugate to the positions. As a result the temperature
+(computed from the discretized velocities) will be systematically
+lower than the target temperature, by a small amount which grows with
+the timestep. Nonetheless, the distribution of atom positions will
+still be consistent with the target temperature. For molecules containing
+C-H bonds, configurational properties generated with dt = 2.5 fs and
+tdamp = 100 fs are indistinguishable from dt = 0.5 fs.
+Because the velocity distribution systematically decreases with increasing
+timestep, the method should not be used to
+generate properties that depend on the velocity distribution, such as
+the velocity autocorrelation function (VACF). In the above example, the
+velocity distribution at dt = 2.5fs generates an average temperature of 220 K,
+instead of 300 K.
 </P>
 <HR>

@@ -227,20 +227,29 @@ group. As a result, the center-of-mass of a system with zero initial
 momentum will not drift over time.

 The keyword {gjf} can be used to run the "Gronbech-Jensen/Farago
-"_#Gronbech-Jensen time-discretization of the Langevin model. The
-effective random force is composed of the average of two random forces
-representing half-contributions from the previous and current time
-intervals. This discretization has been shown to be consistent with
-the underlying physical model of Langevin dynamics and produces the
-correct Boltzmann distribution of positions for large timesteps,
-up to the numerical stability limit. In common with all
-methods based on Verlet integration, the discretized velocities
-generated by the time integration scheme are not exactly conjugate
-to the positions. As a result the temperature computed from the
-discretized velocities will be systematically lower than the
-target temperature, by an amount that grows with the timestep.
-Nonetheless, the distribution of positions will be consistent
-with the target temperature.
+"_#Gronbech-Jensen time-discretization of the Langevin model. As
+described in the papers cited below, the purpose of this method is to
+enable longer timesteps to be used (up to the numerical stability
+limit of the integrator), while still producing the correct Boltzmann
+distribution of atom positions. It is implemented within LAMMPS, by
+changing how the the random force is applied so that it is composed of
+the average of two random forces representing half-contributions from
+the previous and current time intervals. In common with all methods
+based on Verlet integration, the discretized velocities generated by
+this method in conjunction with velocity-Verlet time integration are
+not exactly conjugate to the positions. As a result the temperature
+(computed from the discretized velocities) will be systematically
+lower than the target temperature, by a small amount which grows with
+the timestep. Nonetheless, the distribution of atom positions will
+still be consistent with the target temperature. For molecules containing
+C-H bonds, configurational properties generated with dt = 2.5 fs and
+tdamp = 100 fs are indistinguishable from dt = 0.5 fs.
+Because the velocity distribution systematically decreases with increasing
+timestep, the method should not be used to
+generate properties that depend on the velocity distribution, such as
+the velocity autocorrelation function (VACF). In the above example, the
+velocity distribution at dt = 2.5fs generates an average temperature of 220 K,
+instead of 300 K.

 :line

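The gjf text above claims two things: positions are sampled correctly at large timesteps, and the kinetic temperature falls below the target. A minimal C++ sketch can show both effects on a 1D harmonic oscillator with m = 1 and spring constant 1. It follows our reading of the published GJF two-step update:

- `gamma` is the friction coefficient alpha.
- One Gaussian kick `beta` with variance 2 alpha kT dt is drawn per step and used in both the position and the velocity update.

This is a standalone illustration, not the fix langevin source. `gjf_harmonic` and `Moments` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical container for the two sampled second moments.
struct Moments {
  double x2;  // <x^2>, configurational
  double v2;  // <v^2>, kinetic
};

// Sketch of the GJF Langevin update for a 1D harmonic oscillator with
// m = 1 and spring constant 1, so the exact targets are <x^2> = <v^2> = kT.
Moments gjf_harmonic(double dt, double gamma, double kT, long nsteps,
                     unsigned seed = 12345u) {
  assert(dt > 0.0 && gamma > 0.0 && kT > 0.0 && nsteps > 10);
  const double m = 1.0, kspring = 1.0;
  const double g = gamma * dt / (2.0 * m);
  const double b = 1.0 / (1.0 + g);        // GJF coefficient b
  const double a = (1.0 - g) / (1.0 + g);  // GJF coefficient a
  std::mt19937_64 rng(seed);
  std::normal_distribution<double> kick(0.0, std::sqrt(2.0 * gamma * kT * dt));

  double x = 0.0, v = 0.0, f = -kspring * x;
  double sum_x2 = 0.0, sum_v2 = 0.0;
  const long nequil = nsteps / 10;  // discard the first 10% as equilibration
  for (long n = 0; n < nsteps; ++n) {
    const double beta = kick(rng);  // beta^{n+1}, shared by both updates
    const double xnew = x + b * dt * v + b * dt * dt / (2.0 * m) * f
                        + b * dt / (2.0 * m) * beta;
    const double fnew = -kspring * xnew;
    v = a * v + dt / (2.0 * m) * (a * f + fnew) + (b / m) * beta;
    x = xnew;
    f = fnew;
    if (n >= nequil) {
      sum_x2 += x * x;
      sum_v2 += v * v;
    }
  }
  const double count = static_cast<double>(nsteps - nequil);
  return {sum_x2 / count, sum_v2 / count};
}
```

For example, `gjf_harmonic(0.5, 1.0, 1.0, 2000000)` should give `x2` close to kT/k = 1. For these parameters, the stationary velocity variance of the scheme works out to 1 - dt^2/4 = 0.9375 rather than kT/m = 1. That is the split between correct positions and a low kinetic temperature that the documentation describes.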
@@ -249,7 +249,7 @@ terms and single precision for everything else), or <I>double</I> (intel
 styles use double precision for all calculations).
 </P>
 <P>Additional keyword-value pairs are available that are used to
-determine how work is offloaded to an Intel coprocessor. If LAMMPS is
+determine how work is offloaded to an Intel(R) coprocessor. If LAMMPS is
 built without offload support, these values are ignored. The
 additional settings are as follows:
 </P>
@@ -244,7 +244,7 @@ terms and single precision for everything else), or {double} (intel
 styles use double precision for all calculations).

 Additional keyword-value pairs are available that are used to
-determine how work is offloaded to an Intel coprocessor. If LAMMPS is
+determine how work is offloaded to an Intel(R) coprocessor. If LAMMPS is
 built without offload support, these values are ignored. The
 additional settings are as follows:

@@ -51,7 +51,7 @@ run on one or more GPUs or multicore CPU/GPU nodes

 <LI>USER-INTEL = a collection of pair styles and neighbor routines
 optimized to run in single, mixed, or double precision on CPUs and
-Intel coprocessors (Xeon Phi).
+Intel(R) Xeon Phi(TM) coprocessors.

 <LI>KOKKOS = a collection of atom, pair, and fix styles optimized to run
 using the Kokkos library on various kinds of hardware, including GPUs
@@ -48,7 +48,7 @@ run on one or more GPUs or multicore CPU/GPU nodes :l

 USER-INTEL = a collection of pair styles and neighbor routines
 optimized to run in single, mixed, or double precision on CPUs and
-Intel coprocessors (Xeon Phi). :l
+Intel(R) Xeon Phi(TM) coprocessors. :l

 KOKKOS = a collection of atom, pair, and fix styles optimized to run
 using the Kokkos library on various kinds of hardware, including GPUs
@@ -1429,7 +1429,7 @@ void MSM::particle_map()
 int flag = 0;

 if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
-error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
+error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

 for (int i = 0; i < nlocal; i++) {

@@ -310,7 +310,7 @@ void MSMCG::particle_map()
 int i;

 if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
-error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
+error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

 for (int j = 0; j < num_charged; j++) {
 i = is_charged[j];
@@ -1877,7 +1877,7 @@ void PPPM::particle_map()
 int flag = 0;

 if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
-error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
+error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

 for (int i = 0; i < nlocal; i++) {

@@ -1897,9 +1897,8 @@ void PPPM::particle_map()

 if (nx+nlower < nxlo_out || nx+nupper > nxhi_out ||
 ny+nlower < nylo_out || ny+nupper > nyhi_out ||
-nz+nlower < nzlo_out || nz+nupper > nzhi_out) {
+nz+nlower < nzlo_out || nz+nupper > nzhi_out)
 flag = 1;
-}
 }

 if (flag) error->one(FLERR,"Out of range atoms - cannot compute PPPM");
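The PPPM hunk above only drops a redundant pair of braces; the range test itself is unchanged. The standalone C++ sketch below shows what that test checks. It assumes, as we understand PPPM's setup, that each atom spreads charge over stencil offsets from nlower = -(order-1)/2 to nupper = order/2 around its grid index. `GridBounds`, `stencil_in_range` and `any_out_of_range` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical bounds of one processor's grid, ghost cells included.
struct GridBounds {
  int nxlo_out, nxhi_out;
  int nylo_out, nyhi_out;
  int nzlo_out, nzhi_out;
};

// True if the stencil [n+nlower, n+nupper] around grid point (nx,ny,nz)
// lies inside the local grid in all three dimensions.
bool stencil_in_range(int nx, int ny, int nz, int nlower, int nupper,
                      const GridBounds &g) {
  return !(nx + nlower < g.nxlo_out || nx + nupper > g.nxhi_out ||
           ny + nlower < g.nylo_out || ny + nupper > g.nyhi_out ||
           nz + nlower < g.nzlo_out || nz + nupper > g.nzhi_out);
}

// Mirrors the flag loop: true if any of the n atoms (given by their grid
// indices) would spread charge outside the local grid.
bool any_out_of_range(const int (*idx)[3], int n, int nlower, int nupper,
                      const GridBounds &g) {
  assert(n >= 0 && nlower <= 0 && nupper >= 0);
  int flag = 0;
  for (int i = 0; i < n; i++)
    if (!stencil_in_range(idx[i][0], idx[i][1], idx[i][2], nlower, nupper, g))
      flag = 1;
  return flag != 0;
}
```

With order 5, the assumed offsets are nlower = -2 and nupper = 2. An atom mapped to index 8 on a local grid that ends at 9 would therefore set the flag.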
@@ -283,7 +283,7 @@ void PPPMCG::particle_map()
 double **x = atom->x;

 if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
-error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
+error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

 int flag = 0;
 for (int j = 0; j < num_charged; j++) {
@ -4210,7 +4210,7 @@ void PPPMDisp::particle_map(double delx, double dely, double delz,
int nlocal = atom->nlocal;

if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

int flag = 0;
for (int i = 0; i < nlocal; i++) {
@ -79,7 +79,7 @@ void PPPMDispTIP4P::particle_map_c(double delx, double dely, double delz,
int nlocal = atom->nlocal;

if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

int flag = 0;
for (int i = 0; i < nlocal; i++) {
@ -680,7 +680,7 @@ void PPPMStagger::particle_map()
int nlocal = atom->nlocal;

if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

int flag = 0;
for (int i = 0; i < nlocal; i++) {
@ -74,7 +74,7 @@ void PPPMTIP4P::particle_map()
int nlocal = atom->nlocal;

if (!isfinite(boxlo[0]) || !isfinite(boxlo[1]) || !isfinite(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions. Simulation unstable.");
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");

int flag = 0;
for (int i = 0; i < nlocal; i++) {
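Each particle_map() variant in the hunks above starts with the same guard: if any lower box bound is NaN or infinite, the run stops with a "simulation unstable" error. A minimal standalone sketch of the condition follows; box_is_finite is a hypothetical name, and in the LAMMPS code the failure path is error->one(FLERR, ...) rather than a return value.

```cpp
#include <cmath>

// Hypothetical helper (not LAMMPS API): true only if all three lower box
// bounds are finite, i.e. none is NaN or +/-infinity. The hunks above call
// error->one(FLERR, ...) when this condition fails.
bool box_is_finite(const double boxlo[3]) {
  return std::isfinite(boxlo[0]) && std::isfinite(boxlo[1]) &&
         std::isfinite(boxlo[2]);
}
```

The sketch uses the std:: qualified form declared in <cmath>; the LAMMPS source above calls the unqualified isfinite.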
@ -98,7 +98,7 @@ void DumpAtomMPIIO::openfile()
}
else { // replace open

int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
if (err != MPI_SUCCESS) {
char str[128];
sprintf(str,"Cannot open dump file %s",filecurrent);
@ -119,7 +119,7 @@ void DumpCustomMPIIO::openfile()
}
else { // replace open

int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
if (err != MPI_SUCCESS) {
char str[128];
sprintf(str,"Cannot open dump file %s",filecurrent);
@ -118,7 +118,7 @@ void DumpXYZMPIIO::openfile()
}
else { // replace open

int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_APPEND | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
int err = MPI_File_open( world, filecurrent, MPI_MODE_CREATE | MPI_MODE_WRONLY , MPI_INFO_NULL, &mpifh);
if (err != MPI_SUCCESS) {
char str[128];
sprintf(str,"Cannot open dump file %s",filecurrent);
@ -59,7 +59,7 @@ void RestartMPIIO::openForRead(char *filename)
void RestartMPIIO::openForWrite(char *filename)
{
int err = MPI_File_open(world, filename, MPI_MODE_APPEND | MPI_MODE_WRONLY,
int err = MPI_File_open(world, filename, MPI_MODE_WRONLY,
MPI_INFO_NULL, &mpifh);
if (err != MPI_SUCCESS) {
char str[MPI_MAX_ERROR_STRING+128];
@ -1,6 +1,6 @@
--------------------------------
LAMMPS Intel Package
LAMMPS Intel(R) Package
--------------------------------

W. Michael Brown (Intel)
@ -12,14 +12,15 @@ This package is based on the USER-OMP package and provides LAMMPS styles that:
1. include support for single and mixed precision in addition to double.
2. include modifications to support vectorization for key routines
3. include modifications to support offload to Xeon Phi coprocessors
3. include modifications to support offload to Intel(R) Xeon Phi(TM)
coprocessors

-----------------------------------------------------------------------------

When using the suffix command with "intel", intel styles will be used if they
exist; if they do not, and an omp version exists, that style will be used.
This is accomplished through the files *ompinto_intel.h that are created
in the src directory when the intel package is installed. For example,
exist; if they do not, and the USER-OMP package is installed and an omp version
exists, that style will be used. For example, in the case the USER-OMP package
is installed,

kspace_style pppm/intel 1e-4
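The lookup order described in the new README text (an intel style if one exists, otherwise the omp style when the USER-OMP package is installed) can be sketched as below. resolve_style and the set of available names are hypothetical stand-ins for the LAMMPS style registry, and falling back to the unsuffixed name when neither variant exists is an assumption of this sketch.

```cpp
#include <set>
#include <string>

// Hypothetical sketch (not LAMMPS API) of suffix "intel" resolution.
// 'available' stands in for the set of compiled-in style names.
std::string resolve_style(const std::string& base,
                          const std::set<std::string>& available,
                          bool user_omp_installed) {
  if (available.count(base + "/intel"))
    return base + "/intel";
  if (user_omp_installed && available.count(base + "/omp"))
    return base + "/omp";
  return base;  // assumption: fall back to the unsuffixed style
}
```

Mirroring the README's pppm example: with no pppm/intel style and USER-OMP installed, resolve_style("pppm", ...) yields "pppm/omp".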
@ -31,5 +32,14 @@ because no pppm style has been implemented for the Intel package.
-----------------------------------------------------------------------------

In order to use offload to Xeon Phi, the flag -DLMP_INTEL_OFFLOAD should be
set in the Makefile. Offload requires the use of Intel compilers.
In order to use offload to Intel(R) Xeon Phi(TM) coprocessors, the flag
-DLMP_INTEL_OFFLOAD should be set in the Makefile. Offload requires the use of
Intel compilers.

-----------------------------------------------------------------------------

The files in this package must be compiled with the Intel C++
compiler, i.e. icc/icpc.