Merge branch 'develop' into fix-set-command

@ -29,6 +29,7 @@ OPT.

* :doc:`ave/grid <fix_ave_grid>`
* :doc:`ave/histo <fix_ave_histo>`
* :doc:`ave/histo/weight <fix_ave_histo>`
* :doc:`ave/moments <fix_ave_moments>`
* :doc:`ave/time <fix_ave_time>`
* :doc:`aveforce <fix_aveforce>`
* :doc:`balance <fix_balance>`

@ -29,6 +29,7 @@ Available topics in mostly chronological order are:

- `Rename of fix STORE/PERATOM to fix STORE/ATOM and change of arguments`_
- `Use Output::get_dump_by_id() instead of Output::find_dump()`_
- `Refactored grid communication using Grid3d/Grid2d classes instead of GridComm`_
- `FLERR as first argument to minimum image functions in Domain class`_

----

@ -610,3 +611,47 @@ KSpace solvers which use distributed FFT grids:

- ``src/KSPACE/pppm.cpp``

This change is **required** or else the code will not compile.

FLERR as first argument to minimum image functions in Domain class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionchanged:: TBD

The ``Domain::minimum_image()`` and ``Domain::minimum_image_big()``
functions were changed to take the ``FLERR`` macro as the first argument.
This way the error message indicates *where* the function was called
instead of pointing to the implementation of the function. Example:

Old:

.. code-block:: c++

   double delx1 = x[i1][0] - x[i2][0];
   double dely1 = x[i1][1] - x[i2][1];
   double delz1 = x[i1][2] - x[i2][2];
   domain->minimum_image(delx1, dely1, delz1);
   double r1 = sqrt(delx1 * delx1 + dely1 * dely1 + delz1 * delz1);

   double delx2 = x[i3][0] - x[i2][0];
   double dely2 = x[i3][1] - x[i2][1];
   double delz2 = x[i3][2] - x[i2][2];
   domain->minimum_image_big(delx2, dely2, delz2);
   double r2 = sqrt(delx2 * delx2 + dely2 * dely2 + delz2 * delz2);

New:

.. code-block:: c++

   double delx1 = x[i1][0] - x[i2][0];
   double dely1 = x[i1][1] - x[i2][1];
   double delz1 = x[i1][2] - x[i2][2];
   domain->minimum_image(FLERR, delx1, dely1, delz1);
   double r1 = sqrt(delx1 * delx1 + dely1 * dely1 + delz1 * delz1);

   double delx2 = x[i3][0] - x[i2][0];
   double dely2 = x[i3][1] - x[i2][1];
   double delz2 = x[i3][2] - x[i2][2];
   domain->minimum_image_big(FLERR, delx2, dely2, delz2);
   double r2 = sqrt(delx2 * delx2 + dely2 * dely2 + delz2 * delz2);

This change is **required** or else the code will not compile.

@ -75,15 +75,34 @@ section below for examples where this has been done.

**Differences between the GPU and KOKKOS packages:**

* The GPU package accelerates only pair force, neighbor list, and (parts
  of) PPPM calculations (and runs the remaining force computations on
  the CPU concurrently). The KOKKOS package attempts to run most of the
  calculation on the GPU, but can transparently support non-accelerated
  code (with a performance penalty due to having data transfers between
  host and GPU).
* The list of which styles are accelerated by the GPU or KOKKOS package
  differs, with some overlap.
* The GPU package requires neighbor lists to be built on the CPU when using
  hybrid pair styles, exclusion lists, or a triclinic simulation box.
* The GPU package benefits from running multiple MPI processes (2-8) per
  GPU to parallelize the non-GPU accelerated styles. The KOKKOS package
  usually does not, especially when all parts of the calculation have
  KOKKOS support.
* The GPU package can be compiled for CUDA, HIP, or OpenCL and thus
  supports NVIDIA, AMD, and Intel GPUs well. On NVIDIA or AMD hardware,
  using native CUDA or HIP compilation, respectively, with either GPU or
  KOKKOS results in equal or better performance over OpenCL.
* OpenCL in the GPU package supports NVIDIA, AMD, and Intel GPUs at the
  *same time* and with the *same executable*. KOKKOS currently does not
  support OpenCL.
* The GPU package supports single precision floating point, mixed
  precision floating point, and double precision floating point math on
  the GPU. This must be chosen at compile time. KOKKOS currently only
  supports double precision floating point math. Using single or mixed
  precision (recommended) results in significantly improved performance
  on consumer GPUs for some loss in accuracy (which is rather small with
  mixed precision). Single and mixed precision support for KOKKOS is in
  development (no ETA yet).
* Some pair styles (for example :doc:`snap <pair_snap>`, :doc:`mliap
  <pair_mliap>`, or :doc:`reaxff <pair_reaxff>`) in the KOKKOS package have
  seen extensive optimizations and specializations for GPUs and CPUs.
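
As a hedged illustration (not part of the diff above), the two packages
are typically enabled at run time with command-line switches like the
following; the executable name, process count, and input file name are
placeholders:

.. code-block:: bash

   # GPU package: several MPI ranks sharing one GPU via suffix/package flags
   mpirun -np 4 ./lmp -sf gpu -pk gpu 1 -in in.script
   # KOKKOS package: one MPI rank driving one GPU
   mpirun -np 1 ./lmp -k on g 1 -sf kk -in in.script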

@ -1,16 +1,218 @@

Measuring performance
=====================

Factors that influence performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before trying to make your simulation run faster, you should understand
how it currently performs and where the bottlenecks are. We generally
distinguish between serial performance (how fast can a single process do
the calculations?) and parallel efficiency (how much faster does a
calculation get by using more processes?). There are many factors
affecting both; the lists below discuss some commonly known as well as
some less well-known factors.

Factors affecting serial performance (in no specific order):

* CPU hardware: clock rate, cache sizes, CPU architecture (instructions
  per clock, vectorization support, fused multiply-add support, and more)
* RAM speed and number of channels that the CPU can use to access RAM
* Cooling: CPUs can change their clock rate based on thermal load, so the
  degree of cooling can affect the speed of a CPU. Sometimes even the
  temperature of neighboring compute nodes in a cluster can make a
  difference.
* Compiler optimization: most of LAMMPS is written to be easy to modify,
  and thus compiler optimization can speed up calculations. However, too
  aggressive compiler optimization can produce incorrect results or
  crashes (during compilation or at runtime).
* Source code improvements: styles in the OPT, OPENMP, and INTEL packages
  can be faster than their base implementation due to improved data
  access patterns, cache efficiency, or vectorization. Compiler optimization
  is required to take full advantage of these.
* Number and kind of fixes, computes, or variables used during a simulation,
  especially if they result in collective communication operations
* Pair style cutoffs and system density: calculations get slower the more
  neighbors there are in the neighbor list and thus the more interactions
  that need to be computed. Force fields with pair styles that compute
  interactions between triples or quadruples of atoms or that use embedding
  energies or charge equilibration will need to walk the neighbor lists
  multiple times.
* Neighbor list settings: tradeoff between the neighbor list skin (larger
  skin = more neighbors, more distances to compute before applying the
  cutoff) and the frequency of neighbor list builds (larger skin = fewer
  neighbor list builds); see the sketch after this list.
* Proximity of per-atom data in physical memory for atoms that are
  close in space improves cache efficiency (thus LAMMPS will by default
  sort atoms in local storage accordingly)
* Using r-RESPA multi-timestepping or a SHAKE or RATTLE fix to constrain
  bonds with higher-frequency vibrations may allow a larger (outer) timestep
  and thus fewer force evaluations (usually the most time-consuming step in
  MD) for the same simulated time (with some tradeoff in accuracy).
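
As a sketch of the neighbor list tradeoff mentioned in the list above
(the values are illustrative placeholders, not recommendations):

.. code-block:: LAMMPS

   # larger skin: more neighbors kept per list, but fewer list rebuilds
   neighbor        2.0 bin
   # rebuild only when an atom may have moved more than half the skin
   neigh_modify    delay 0 every 1 check yes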

Factors affecting parallel efficiency (in no specific order):

* Bandwidth and latency of communication between processes. This can vary a
  lot between processes on the same CPU or physical node and processes
  on different physical nodes, and it also varies between different
  communication technologies (like Ethernet, InfiniBand, or other
  high-speed interconnects)
* Frequency and complexity of the communication patterns required
* Number of "work units" (usually correlated with the number of atoms
  and the choice of force field) per MPI process required for one time step
  (if this number becomes too small, the cost of communication becomes
  dominant).
* Choice of parallelization method (MPI-only, OpenMP-only, MPI+OpenMP,
  MPI+GPU, MPI+GPU+OpenMP)
* Algorithmic complexity of the chosen force field (pair-wise vs. many-body
  potential, Ewald vs. PPPM vs. (compensated or smoothed) cutoff-Coulomb)
* Communication cutoff: a larger cutoff results in more ghost atoms and
  thus more data that needs to be communicated
* Frequency of neighbor list builds: during a neighbor list build the
  domain decomposition is updated and the list of ghost atoms rebuilt,
  which requires multiple global communication steps
* FFT-grid settings and number of MPI processes for kspace style PPPM:
  PPPM uses parallel 3d FFTs, whose parallel efficiency drops much faster
  with the number of MPI processes than that of other parts of the force
  computation. Thus using MPI+OpenMP parallelization
  or :doc:`run style verlet/split <run_style>` can improve parallel
  efficiency by limiting the number of MPI processes used for the FFTs.
* Load (im-)balance: LAMMPS' domain decomposition assumes that atoms are
  evenly distributed across the entire simulation box. If there are
  areas of vacuum, this may lead to different amounts of work for
  different MPI processes. Using the :doc:`processors command
  <processors>` to change the spatial decomposition, MPI+OpenMP
  parallelization instead of MPI-only to get larger sub-domains, or the
  (fix) balance command (without or with switching to communication style
  tiled) to change the sub-domain volumes are all methods that
  can help to avoid load imbalances (see the sketch after this list).
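
As a minimal sketch of the last point, dynamic load balancing with
recursive bisectioning could be enabled as follows (the fix ID,
rebalancing interval, and threshold are illustrative, not recommended
values):

.. code-block:: LAMMPS

   # allow non-brick sub-domains so their volumes can adapt to atom density
   comm_style      tiled
   # rebalance every 1000 steps if the max/average imbalance exceeds 1.1
   fix             bal all balance 1000 1.1 rcb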

Examples comparing serial performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before looking at your own input deck(s), you should get some reference
data from a known input so that you know what kind of performance you
should expect from your input. For the following we therefore use the
``in.rhodo.scaled`` input file and ``data.rhodo`` data file from the
``bench`` folder. This is a system of 32000 atoms using the CHARMM force
field and long-range electrostatics running for 100 MD steps. The
performance data is printed at the end of a run and only measures the
performance during propagation, excluding the setup phase.

Running with a single MPI process on an AMD Ryzen Threadripper PRO
9985WX CPU (64 cores, 128 threads, base clock: 3.2GHz, max. clock
5.4GHz, L1/L2/L3 cache 5MB/64MB/256MB, 8 DDR5-6400 memory channels) one
gets the following performance report:

.. code-block::

   Performance: 1.232 ns/day, 19.476 hours/ns, 7.131 timesteps/s, 228.197 katom-step/s
   99.2% CPU use with 1 MPI tasks x 1 OpenMP threads

The %CPU value should be at 100% or very close. Lower values would
be an indication that there are *other* processes also using the same
CPU core and thus invalidating the performance data. The katom-step/s
value is best suited for comparisons, since it is fairly independent
of the system size. The ``in.rhodo.scaled`` input can easily be made
larger through replication in the three dimensions by setting the
variables "x", "y", "z" to values other than 1 from the command line
with the "-var" flag (see the command-line sketch after the list).
Example:

- 32000 atoms: 228.8 katom-step/s
- 64000 atoms: 231.6 katom-step/s
- 128000 atoms: 231.1 katom-step/s
- 256000 atoms: 226.4 katom-step/s
- 864000 atoms: 229.6 katom-step/s
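
A command line for such a replicated run might look like the following
(a sketch; the executable name ``lmp`` and the working directory are
assumptions that depend on your build):

.. code-block:: bash

   # 2x2x2 replication of the 32000 atom system = 256000 atoms, serial run
   ./lmp -in in.rhodo.scaled -var x 2 -var y 2 -var z 2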

Comparing to an AMD Ryzen 7 7840HS CPU (8 cores, 16 threads, base clock
3.8GHz, max. clock 5.1GHz, L1/L2/L3 cache 512kB/8MB/16MB, 2 DDR5-5600
memory channels), we get similar single core performance (~220
katom-step/s vs. ~230 katom-step/s) due to the similar clock and
architecture:

- 32000 atoms: 219.8 katom-step/s
- 64000 atoms: 222.5 katom-step/s
- 128000 atoms: 216.8 katom-step/s
- 256000 atoms: 221.0 katom-step/s
- 864000 atoms: 221.1 katom-step/s

Switching to an older Intel Xeon E5-2650 v4 CPU (12 cores, 12 threads,
base clock 2.2GHz, max. clock 2.9GHz, L1/L2/L3 cache 64kB/256kB/30MB, 4
DDR4-2400 memory channels) leads to a lower performance of approximately
109 katom-step/s due to differences in architecture and clock. In all
cases, when looking at multiple runs, the katom-step/s property
fluctuates by approximately 1% around the average.

From here on we are looking at the performance for the 256000 atom system
only and change several settings incrementally (a CMake sketch for the
compiler settings follows the list):

#. No compiler optimization GCC (-Og -g): 183.8 katom-step/s
#. Moderate optimization with debug info GCC (-O2 -g): 231.1 katom-step/s
#. Full compiler optimization GCC (-DNDEBUG -O3): 236.0 katom-step/s
#. Aggressive compiler optimization GCC (-O3 -ffast-math -march=native): 239.9 katom-step/s
#. Source code optimization in OPENMP package (1 thread): 266.7 katom-step/s
#. Use *fix nvt* instead of *fix npt* (compute virial only every 50 steps): 272.9 katom-step/s
#. Increase pair style cutoff by 2 :math:`\AA`: 181.2 katom-step/s
#. Use tight PPPM convergence (1.0e-6 instead of 1.0e-4): 161.9 katom-step/s
#. Use Ewald summation instead of PPPM (at 1.0e-4 convergence): 19.9 katom-step/s
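
One possible way to select such compiler flags with a CMake based build
(a sketch; the preset file, build directory layout, and flags are
assumptions and should be adapted to the case you want to reproduce):

.. code-block:: bash

   # configure a build with aggressive GCC optimization flags
   cmake -C ../cmake/presets/basic.cmake \
         -D CMAKE_CXX_FLAGS="-O3 -ffast-math -march=native" ../cmake
   cmake --build . --parallel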

The numbers show that gains from aggressive compiler optimizations are
rather small in LAMMPS; the data access optimizations in the OPENMP (and
OPT) packages are more prominent. On the other hand, using more
accurate force field settings causes, not unexpectedly, a significant
slowdown (to about half the speed). Finally, using regular Ewald
summation causes a massive slowdown due to its poor algorithmic scaling
with system size.

Examples comparing parallel performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The parallel performance comes on top of the serial performance:
using twice as many processors should increase the performance metric
by up to a factor of two. With the number of processors *N*, the
serial performance :math:`p_1`, and the performance for *N* processors
:math:`p_N`, we can define a *parallel efficiency* in percent as follows:

.. math::

   P_{eff} = \frac{p_N}{p_1 \cdot N} \cdot 100\%

For the AMD Ryzen Threadripper PRO 9985WX CPU and the serial
simulation settings of point 6. from above, we get the following
parallel efficiency data for the 256000 atom system:

- 1 MPI task: 273.6 katom-step/s, :math:`P_{eff} = 100\%`
- 2 MPI tasks: 530.6 katom-step/s, :math:`P_{eff} = 97\%`
- 4 MPI tasks: 1.021 Matom-step/s, :math:`P_{eff} = 93\%`
- 8 MPI tasks: 1.837 Matom-step/s, :math:`P_{eff} = 84\%`
- 16 MPI tasks: 3.574 Matom-step/s, :math:`P_{eff} = 82\%`
- 32 MPI tasks: 6.479 Matom-step/s, :math:`P_{eff} = 74\%`
- 64 MPI tasks: 9.032 Matom-step/s, :math:`P_{eff} = 52\%`
- 128 MPI tasks: 12.03 Matom-step/s, :math:`P_{eff} = 34\%`

The run with 128 MPI tasks uses CPU cores from hyper-threading.
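
Such runs can be launched with an MPI launcher in the usual way, for
example (a sketch; the launcher, executable name, and any CPU binding
options depend on your installation):

.. code-block:: bash

   # 16 MPI tasks on the 2x2x2 replicated (256000 atom) benchmark
   mpirun -np 16 ./lmp -in in.rhodo.scaled -var x 2 -var y 2 -var z 2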

For a small system with only 32000 atoms the parallel efficiency
drops off earlier when the number of work units is too small relative
to the communication overhead:

- 1 MPI task: 270.8 katom-step/s, :math:`P_{eff} = 100\%`
- 2 MPI tasks: 529.3 katom-step/s, :math:`P_{eff} = 98\%`
- 4 MPI tasks: 989.8 katom-step/s, :math:`P_{eff} = 91\%`
- 8 MPI tasks: 1.832 Matom-step/s, :math:`P_{eff} = 85\%`
- 16 MPI tasks: 3.463 Matom-step/s, :math:`P_{eff} = 80\%`
- 32 MPI tasks: 5.970 Matom-step/s, :math:`P_{eff} = 69\%`
- 64 MPI tasks: 7.477 Matom-step/s, :math:`P_{eff} = 42\%`
- 128 MPI tasks: 8.069 Matom-step/s, :math:`P_{eff} = 23\%`

Measuring performance of your input deck
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The best way to do this is to run your system (actual number of atoms)
for a modest number of timesteps (say 100 steps) on several different
processor counts, including a single processor if possible. Do this for
an equilibrated version of your system, so that the 100-step timings are
representative of a much longer run. There is typically no need to run
for 1000s of timesteps to get accurate timings; you can simply
extrapolate from short runs.

For the set of runs, look at the timing data printed to the screen and
log file at the end of each LAMMPS run. The

@ -28,12 +230,15 @@ breakdown and relative percentages. For example, trying different
options for speeding up the long-range solvers will have little impact
if they only consume 10% of the run time. If the pairwise time is
dominating, you may want to look at GPU or OMP versions of the pair
style, as discussed below. Comparing how the percentages change as you
increase the processor count gives you a sense of how different
operations within the timestep are scaling. If you are using PPPM as
the Kspace solver, you can turn on additional output with
:doc:`kspace_modify fftbench yes <kspace_modify>`, which measures the
time spent during PPPM on the 3d FFTs; these can be communication
intensive for larger processor counts. This provides an indication of
whether it is worth trying out alternatives to the default FFT settings
for additional performance.
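
For reference, enabling this FFT timing breakdown is a one-line addition
to the input; a minimal sketch (the PPPM accuracy value is just a
placeholder):

.. code-block:: LAMMPS

   kspace_style    pppm 1.0e-4
   # print a separate timing breakdown for the 3d FFTs at the end of a run
   kspace_modify   fftbench yes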

Another important detail in the timing info are the histograms of
atom counts and neighbor counts. If these vary widely across

@ -208,6 +208,7 @@ accelerated styles exist.

* :doc:`ave/grid <fix_ave_grid>` - compute per-grid time-averaged quantities
* :doc:`ave/histo <fix_ave_histo>` - compute/output time-averaged histograms
* :doc:`ave/histo/weight <fix_ave_histo>` - weighted version of fix ave/histo
* :doc:`ave/moments <fix_ave_moments>` - compute moments of scalar quantities
* :doc:`ave/time <fix_ave_time>` - compute/output global time-averaged quantities
* :doc:`aveforce <fix_aveforce>` - add an averaged force to each atom
* :doc:`balance <fix_balance>` - perform dynamic load-balancing

@ -82,10 +82,9 @@ specified values may represent calculations performed by computes and
fixes which store their own "group" definitions.

Each listed value can be the result of a compute or fix or the
evaluation of an equal-style or vector-style variable. The specified
indices can include a wildcard string. See the
:doc:`fix ave/correlate <fix_ave_correlate>` page for details on that.

The *Nevery* and *Nfreq* arguments specify on what time steps the input
values will be used to calculate correlation data and the frequency

doc/src/fix_ave_moments.rst (new file, 296 lines)

@ -0,0 +1,296 @@

.. index:: fix ave/moments

fix ave/moments command
=======================

Syntax
""""""

.. code-block:: LAMMPS

   fix ID group-ID ave/moments Nevery Nrepeat Nfreq value1 value2 ... moment1 moment2 ... keyword args ...

* ID, group-ID are documented in :doc:`fix <fix>` command
* ave/moments = style name of this fix command
* Nevery = use input values every this many time steps
* Nrepeat = # of times to use input values for calculating averages
* Nfreq = calculate averages every this many time steps
* one or more input values can be listed
* value = c_ID or c_ID[I] or f_ID or f_ID[I] or v_name or v_name[I]

  .. parsed-literal::

       c_ID = global scalar calculated by a compute with ID
       c_ID[I] = Ith component of global vector calculated by a compute with ID, I can include wildcard (see below)
       f_ID = global scalar calculated by a fix with ID
       f_ID[I] = Ith component of global vector calculated by a fix with ID, I can include wildcard (see below)
       v_name = value calculated by an equal-style variable with name
       v_name[I] = value calculated by a vector-style variable with name, I can include wildcard (see below)

* one or more moments to compute can be listed
* moment = *mean* or *stddev* or *variance* or *skew* or *kurtosis* (see exact definitions below)
* zero or more keyword/arg pairs may be appended
* keyword = *start* or *history*

  .. parsed-literal::

       *start* args = Nstart
         Nstart = invoke first after this time step
       *history* args = Nrecent
         Nrecent = keep a history of up to Nrecent outputs

Examples
""""""""

.. code-block:: LAMMPS

   fix 1 all ave/moments 1 1000 100 v_volume mean stddev
   fix 1 all ave/moments 1 200 1000 v_volume variance kurtosis history 10

Description
"""""""""""

.. versionadded:: TBD

Using one or more values as input, calculate the moments of the underlying
(population) distributions based on samples collected every few time
steps over a time step window. The definitions of the moments calculated
are given below.

The group specified with this command is ignored. However, note that
specified values may represent calculations performed by computes and
fixes which store their own "group" definitions.

Each listed value can be the result of a :doc:`compute <compute>` or
:doc:`fix <fix>` or the evaluation of an equal-style or vector-style
:doc:`variable <variable>`. In each case, the compute, fix, or variable
must produce a global quantity, not a per-atom or local quantity.
If you wish to spatial- or time-average or histogram per-atom
quantities from a compute, fix, or variable, then see the :doc:`fix
ave/chunk <fix_ave_chunk>`, :doc:`fix ave/atom <fix_ave_atom>`, or
:doc:`fix ave/histo <fix_ave_histo>` commands. If you wish to sum a
per-atom quantity into a single global quantity, see the :doc:`compute
reduce <compute_reduce>` command.

Many :doc:`computes <compute>` and :doc:`fixes <fix>` produce global
quantities. See their doc pages for details. :doc:`Variables <variable>`
of style *equal* and *vector* are the only ones that can be used with
this fix. Variables of style *atom* cannot be used, since they produce
per-atom values.

The input values must all be scalars or vectors with a bracketed term
appended, indicating that the :math:`I^\text{th}` value of the vector is
used.

The result of this fix can be accessed as a vector, containing the
interleaved moments of each input in order. If M moments are requested,
then the moments of input 1 will be the first M values in the vector
output by this fix, the moments of input 2 will be the next M values, etc.
If there are N values, the vector length will be N*M.
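
As a hypothetical sketch of this ordering (the variable names and fix ID
are made up for illustration), two inputs with two moments produce a
vector of length 4, which can be referenced as ``f_moms[1]`` through
``f_moms[4]``:

.. code-block:: LAMMPS

   variable mytemp equal temp
   variable mypress equal press
   # vector: [mean(mytemp), stddev(mytemp), mean(mypress), stddev(mypress)]
   fix moms all ave/moments 10 100 1000 v_mytemp v_mypress mean stddev
   thermo_style custom step f_moms[1] f_moms[2] f_moms[3] f_moms[4]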

----------

For input values from a compute or fix or variable, the bracketed index
I can be specified using a wildcard asterisk with the index to
effectively specify multiple values. This takes the form "\*" or "\*n"
or "m\*" or "m\*n". If :math:`N` is the size of the vector, then an
asterisk with no numeric values means all indices from 1 to :math:`N`.
A leading asterisk means all indices from 1 to n (inclusive). A
trailing asterisk means all indices from n to :math:`N` (inclusive). A
middle asterisk means all indices from m to n (inclusive).

Using a wildcard is the same as if the individual elements of the vector
or cells of the array had been listed one by one. For examples, see the
description of this capability in :doc:`fix ave/time <fix_ave_time>`.

----------

The :math:`N_\text{every}`, :math:`N_\text{repeat}`, and
:math:`N_\text{freq}` arguments specify on what time steps the input
values will be used in order to contribute to the average. The final
statistics are generated on time steps that are a multiple of
:math:`N_\text{freq}`\ . The average is over a window of up to
:math:`N_\text{repeat}` quantities, computed in the preceding portion of
the simulation once every :math:`N_\text{every}` time steps.

.. note::

   Contrary to most fix ave/* commands, it is not required that Nevery *
   Nrepeat <= Nfreq. This is to allow the user to choose the time
   window and number of samples contributing to the output at each
   Nfreq interval.

For example, if :math:`N_\text{freq}=100` and :math:`N_\text{repeat}=5`
(and :math:`N_\text{every}=1`), then on step 100 values from time steps
96, 97, 98, 99, and 100 will be used. The fix does not compute its
inputs on steps that are not required. If :math:`N_\text{freq}=5`,
:math:`N_\text{repeat}=8` and :math:`N_\text{every}=1`, then values
will first be calculated on step 5 from steps 1-5, on step 10 from 3-10,
on step 15 from 8-15 and so on, forming a rolling average over
timesteps that span a time window larger than Nfreq.
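
A fix definition matching the second case would look as follows (a
sketch; ``v_myvar`` stands for a hypothetical equal-style variable):

.. code-block:: LAMMPS

   # Nevery=1, Nrepeat=8, Nfreq=5: output every 5 steps, averaged over up
   # to the 8 most recent samples (a rolling window wider than Nfreq)
   fix roll all ave/moments 1 8 5 v_myvar mean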

----------

If a value begins with "c\_", a compute ID must follow which has been
previously defined in the input script. If no bracketed term is
appended, the global scalar calculated by the compute is used. If a
bracketed term is appended, the Ith element of the global vector
calculated by the compute is used. See the discussion above for how I
can be specified with a wildcard asterisk to effectively specify
multiple values.

If a value begins with "f\_", a fix ID must follow which has been
previously defined in the input script. If no bracketed term is
appended, the global scalar calculated by the fix is used. If a
bracketed term is appended, the Ith element of the global vector
calculated by the fix is used. See the discussion above for how I can
be specified with a wildcard asterisk to effectively specify multiple
values.

Note that some fixes only produce their values on certain time steps,
which must be compatible with *Nevery*, else an error will result.
Users can also write code for their own fix styles and :doc:`add them to
LAMMPS <Modify>`.

If a value begins with "v\_", a variable name must follow which has been
previously defined in the input script. Only equal-style or vector-style
variables can be used, which both produce global values. Vector-style
variables require a bracketed term to specify the Ith element of the
vector calculated by the variable.

Note that variables of style *equal* and *vector* define a formula which
can reference individual atom properties or thermodynamic keywords, or
they can invoke other computes, fixes, or variables when they are
evaluated, so this is a very general means of specifying quantities to
time average.

----------

The moments are output in the order requested in the arguments following
the last input. Any number and order of moments can be specified,
although it does not make much sense to specify the same moment multiple
times. All moments are computed using a correction of the sample estimators
used to obtain unbiased cumulants :math:`k_{1..4}` (see :ref:`(Cramer)
<Cramer1>`). The correction for variance is the standard Bessel
correction. For other moments, see :ref:`(Joanes) <Joanes1>`.

For *mean*, the arithmetic mean :math:`\bar{x} = \frac{1}{n}
\sum_{i=1}^{n} x_i` is calculated.

For *variance*, the Bessel-corrected sample variance :math:`var = k_2 =
\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2` is calculated.

For *stddev*, the Bessel-corrected sample standard deviation
:math:`stddev = \sqrt{k_2}` is calculated.

For *skew*, the adjusted Fisher--Pearson standardized moment :math:`G_1
= \frac{k_3}{k_2^{3/2}} = \frac{k_3}{stddev^3}` is calculated.

For *kurtosis*, the adjusted Fisher--Pearson standardized moment
:math:`G_2 = \frac{k_4}{k_2^2}` is calculated.

----------

Fix invocation and output can be modified by optional keywords.

The *start* keyword specifies that the first computation should be no
earlier than the step number given (but will still occur on a multiple
of *Nfreq*). The default is step 0. Often input values can be 0.0 at
time 0, so setting *start* to a larger value can avoid including a 0.0
in a longer series.

The *history* keyword stores the Nrecent most recent outputs on Nfreq
timesteps, so they can be accessed as global outputs of the fix. Nrecent
must be >= 1. The default is 1, meaning only the most recent output is
accessible. For example, if history 10 is specified and Nfreq = 1000,
then on timestep 20000, the Nfreq outputs from steps 20000, 19000, ...
11000 are available for access. See below for details on how to access
the history values.

For example, this will store the outputs of the previous 10 Nfreq
time steps, i.e. a window of 10000 time steps:

.. code-block:: LAMMPS

   fix 1 all ave/moments 1 200 1000 v_volume mean history 10

The previous results can be accessed as values in a global array output
by this fix. Each column of the array is the vector output of the N-th
preceding Nfreq timestep. For example, assuming a single moment is
calculated, the most recent result corresponding to the third input
value would be accessed as "f_name[3][1]", while "f_name[3][4]" is the
4th most recent, and so on. The current vector output is always the
first column of the array, corresponding to the most recent result.

To illustrate the utility of keeping output history, consider using
this fix in conjunction with :doc:`fix halt <fix_halt>` to stop a run
automatically if a quantity is converged to within some desired tolerance:

.. code-block:: LAMMPS

   variable target equal etot
   fix aveg all ave/moments 1 200 1000 v_target mean stddev history 10
   variable stopcond equal "abs(f_aveg[1]-f_aveg[1][10])<f_aveg[2]"
   fix fhalt all halt 1000 v_stopcond == 1

In this example, every 1000 time steps, the average and standard
deviation of the total energy over the previous 200 time steps are
calculated. If the difference between the most recent and 10-th most
recent average is lower than the most recent standard deviation, the run
is stopped.

----------

Restart, fix_modify, output, run start/stop, minimize info
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

No information about this fix is written to :doc:`binary restart files
<restart>`.

This fix produces a global vector and global array which can be accessed
by various :doc:`output commands <Howto_output>`. The values can be
accessed on any time step, but may not be current.

A global vector is produced with the # of elements = number of moments *
number of inputs. The moments are output in the order given in the fix
definition. An array is produced having # of rows = length of vector
output (with an ordering which matches the vector) and # of columns =
value of *history*. There is always at least one column.

Each element of the global vector or array can be either "intensive" or
"extensive", depending on whether the values contributing to the element
are "intensive" or "extensive". If a compute or fix provides the value
being time averaged, then the compute or fix determines whether the value
is intensive or extensive; see the page for that compute or fix for
further info. Values produced by a variable are treated as intensive.

No parameter of this fix can be used with the *start/stop* keywords of
the :doc:`run <run>` command. This fix is not invoked during
:doc:`energy minimization <minimize>`.

Restrictions
""""""""""""

This fix is part of the EXTRA-FIX package. It is only enabled if
LAMMPS was built with that package. See the :doc:`Build package
<Build_package>` page for more info.

Related commands
""""""""""""""""

:doc:`fix ave/time <fix_ave_time>`

Default
"""""""

The option defaults are history = 1, start = 0.

----------

.. _Cramer1:

**(Cramer)** Cramer, Mathematical Methods of Statistics, Princeton University Press (1946).

.. _Joanes1:

**(Joanes)** Joanes, Gill, The Statistician, 47, 183--189 (1998).

@ -100,6 +100,56 @@ first is assigned to intra-molecular interactions (i.e. both atoms
have the same molecule ID), the second to inter-molecular interactions
(i.e. interacting atoms have different molecule IDs).

.. admonition:: When **NOT** to use a hybrid pair style
   :class: warning

   Using pair style *hybrid* can be very tempting if you need a
   **many-body potential** supporting a mix of elements for which you
   cannot find a potential file that covers *all* of them. Regardless
   of how this is set up, there will be *errors*. The major use case
   where the error is *small* is when the many-body sub-styles are used
   on different objects (for example a slab and a liquid, or a metal and a
   nano-machining work piece). In that case the *mixed* terms
   **should** be provided by a pair-wise additive potential (like
   Lennard-Jones or Morse) to avoid unexpected behavior and reduce
   errors. LAMMPS cannot easily check for this condition and thus will
   accept good and bad choices alike.

   Outside of this, we *strongly* recommend *against* using pair style
   hybrid with many-body potentials for the following reasons:

   1. When trying to combine EAM or MEAM potentials, there is a *large*
      error in the embedding term, since it is computed separately for
      each sub-style only.

   2. When trying to combine many-body potentials like Stillinger-Weber,
      Tersoff, AIREBO, Vashishta, or similar, you have to understand
      that the potential of a sub-style cannot be applied in a pair-wise
      fashion but will need to be applied to multiples of atoms
      (e.g. a Tersoff potential of elements A and B includes the
      interactions A-A, B-B, A-B, A-A-A, A-A-B, A-B-B, A-B-A, B-A-A,
      B-A-B, B-B-A, B-B-B; AIREBO also considers all quadruples of
      atom elements).

   3. When one of the sub-styles uses charge equilibration (= QEq; like
      in ReaxFF or COMB), you get inconsistent QEq behavior: either you
      apply QEq to *all* atoms, but then you are missing the QEq
      parameters for the non-QEq pair style (and it would be
      inconsistent to apply QEq to pair styles that are not
      parameterized for it), or else you have no charges or fixed
      charges interacting with the QEq charges, which also leads to
      inconsistent behavior between the two sub-styles. When attempting
      to use multiple ReaxFF instances to combine different potential
      files, you might be able to work around the QEq limitations, but
      point 2. still applies.

   We understand that it is frustrating to not be able to run simulations
   due to a lack of available potential files, but that does not justify
   combining potentials in a broken way via pair style hybrid. This is
   not what the hybrid pair styles are designed for.
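
As an input-level sketch of the acceptable use case described in the
admonition above (a many-body potential for one material, a pair-wise
additive potential for the other material and for the mixed terms); the
atom types, file name, and coefficients are hypothetical placeholders:

.. code-block:: LAMMPS

   # type 1 = metal (EAM), type 2 = substrate (LJ); mixed 1-2 terms use LJ
   pair_style hybrid eam lj/cut 10.0
   pair_coeff 1 1 eam Cu_u3.eam
   pair_coeff 2 2 lj/cut 0.2 3.0
   pair_coeff 1 2 lj/cut 0.15 2.8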

----------

Here are two examples of hybrid simulations. The *hybrid* style could
be used for a simulation of a metal droplet on a LJ surface. The metal
atoms interact with each other via an *eam* potential, the surface atoms

@ -374,12 +424,11 @@ selected sub-style.

----------

.. note::

   Even though the command name "pair_style" would suggest that these are
   pair-wise interactions, several of the potentials defined via the
   pair_style command in LAMMPS are really many-body potentials, such as
   Tersoff, AIREBO, MEAM, ReaxFF, etc. The way to think about using these
   potentials in a hybrid setting is as follows.

A subset of atom types is assigned to the many-body potential with a
single :doc:`pair_coeff <pair_coeff>` command, using "\* \*" to include