Merge branch 'develop' into fix-set-command

Axel Kohlmeyer
2025-06-04 16:22:45 -04:00
160 changed files with 4080 additions and 887 deletions

View File

@ -29,6 +29,7 @@ OPT.
* :doc:`ave/grid <fix_ave_grid>`
* :doc:`ave/histo <fix_ave_histo>`
* :doc:`ave/histo/weight <fix_ave_histo>`
* :doc:`ave/moments <fix_ave_moments>`
* :doc:`ave/time <fix_ave_time>`
* :doc:`aveforce <fix_aveforce>`
* :doc:`balance <fix_balance>`

View File

@ -29,6 +29,7 @@ Available topics in mostly chronological order are:
- `Rename of fix STORE/PERATOM to fix STORE/ATOM and change of arguments`_
- `Use Output::get_dump_by_id() instead of Output::find_dump()`_
- `Refactored grid communication using Grid3d/Grid2d classes instead of GridComm`_
- `FLERR as first argument to minimum image functions in Domain class`_
----
@ -610,3 +611,47 @@ KSpace solvers which use distributed FFT grids:
- ``src/KSPACE/pppm.cpp``
This change is **required** or else the code will not compile.
FLERR as first argument to minimum image functions in Domain class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. versionchanged:: TBD
The ``Domain::minimum_image()`` and ``Domain::minimum_image_big()``
functions were changed to take the ``FLERR`` macro as the first argument.
This way an error message indicates *where* the function was called from
instead of pointing to the implementation of the function. Example:
Old:
.. code-block:: c++
double delx1 = x[i1][0] - x[i2][0];
double dely1 = x[i1][1] - x[i2][1];
double delz1 = x[i1][2] - x[i2][2];
domain->minimum_image(delx1, dely1, delz1);
double r1 = sqrt(delx1 * delx1 + dely1 * dely1 + delz1 * delz1);
double delx2 = x[i3][0] - x[i2][0];
double dely2 = x[i3][1] - x[i2][1];
double delz2 = x[i3][2] - x[i2][2];
domain->minimum_image_big(delx2, dely2, delz2);
double r2 = sqrt(delx2 * delx2 + dely2 * dely2 + delz2 * delz2);
New:
.. code-block:: c++
double delx1 = x[i1][0] - x[i2][0];
double dely1 = x[i1][1] - x[i2][1];
double delz1 = x[i1][2] - x[i2][2];
domain->minimum_image(FLERR, delx1, dely1, delz1);
double r1 = sqrt(delx1 * delx1 + dely1 * dely1 + delz1 * delz1);
double delx2 = x[i3][0] - x[i2][0];
double dely2 = x[i3][1] - x[i2][1];
double delz2 = x[i3][2] - x[i2][2];
domain->minimum_image_big(FLERR, delx2, dely2, delz2);
double r2 = sqrt(delx2 * delx2 + dely2 * dely2 + delz2 * delz2);
This change is **required** or else the code will not compile.

View File

@ -75,15 +75,34 @@ section below for examples where this has been done.
**Differences between the GPU and KOKKOS packages:**
* The GPU package accelerates only pair force, neighbor list, and (parts
of) PPPM calculations (and runs the remaining force computations on
the CPU concurrently). The KOKKOS package attempts to run most of the
calculation on the GPU, but can transparently support non-accelerated
code (with a performance penalty due to having data transfers between
host and GPU).
* The list of which styles are accelerated by the GPU or KOKKOS package
differs with some overlap.
* The GPU package requires neighbor lists to be built on the CPU when using
hybrid pair styles, exclusion lists, or a triclinic simulation box.
* The GPU package benefits from running multiple MPI processes (2-8) per
GPU to parallelize the non-GPU accelerated styles. The KOKKOS package
usually does not, especially when all parts of the calculation have
KOKKOS support.
* The GPU package can be compiled for CUDA, HIP, or OpenCL and thus
supports NVIDIA, AMD, and Intel GPUs well. On NVIDIA or AMD hardware,
using native CUDA or HIP compilation, respectively, with either GPU or
KOKKOS results in equal or better performance over OpenCL.
* OpenCL in the GPU package supports NVIDIA, AMD, and Intel GPUs at the
*same time* and with the *same executable*. KOKKOS currently does not
support OpenCL.
* The GPU package supports single precision floating point, mixed
precision floating point, and double precision floating point math on
the GPU. This must be chosen at compile time. KOKKOS currently only
supports double precision floating point math. Using single or mixed
precision (recommended) results in significantly improved performance
on consumer GPUs at some loss in accuracy (which is rather small with
mixed precision). Single and mixed precision support for KOKKOS is in
development (no ETA yet).
* Some pair styles (for example :doc:`snap <pair_snap>`, :doc:`mliap
<pair_mliap>`, or :doc:`reaxff <pair_reaxff>`) in the KOKKOS package have
seen extensive optimizations and specializations for GPUs and CPUs.

View File

@ -1,16 +1,218 @@
Measuring performance
=====================
Factors that influence performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before trying to make your simulation run faster, you should understand
how it currently performs and where the bottlenecks are. We generally
distinguish between serial performance (how fast can a single process do
the calculations?) and parallel efficiency (how much faster does a
calculation get by using more processes?). There are many factors
affecting both, and the lists below discuss some commonly known as well
as some lesser known factors.
Factors affecting serial performance (in no specific order):
* CPU hardware: clock rate, cache sizes, CPU architecture (instructions
per clock, vectorization support, fused multiply-add support and more)
* RAM speed and number of channels that the CPU can use to access RAM
* Cooling: CPUs can change the CPU clock based on thermal load, thus the
degree of cooling can affect the speed of a CPU. Sometimes even the
temperature of neighboring compute nodes in a cluster can make a
difference.
* Compiler optimization: most of LAMMPS is written to be easy to modify
and thus compiler optimization can speed up calculations. However, too
aggressive compiler optimization can produce incorrect results or
crashes (during compilation or at runtime).
* Source code improvements: styles in the OPT, OPENMP, and INTEL packages
can be faster than their base implementation due to improved data
access patterns, cache efficiency, or vectorization. Compiler optimization
is required to take full advantage of these.
* Number and kind of fixes, computes, or variables used during a simulation,
especially if they result in collective communication operations
* Pair style cutoffs and system density: calculations get slower the more
neighbors there are in the neighbor list and thus the more interactions
need to be computed. Force fields with pair styles that compute interactions
between triples or quadruples of atoms or that use embedding energies or
charge equilibration need to walk the neighbor lists multiple times.
* Neighbor list settings: tradeoff between neighbor list skin (larger
skin = more neighbors, more distances to compute before applying the
cutoff) and frequency of neighbor list builds (larger skin = fewer
neighbor list builds).
* Proximity in physical memory of the per-atom data of atoms that are
close in space improves cache efficiency (thus LAMMPS will by default
sort atoms in local storage accordingly); some related settings are
shown in the sketch after this list
* Using r-RESPA multi-timestepping or a SHAKE or RATTLE fix to constrain
bonds with higher-frequency vibrations may allow a larger (outer) timestep
and thus fewer force evaluations (usually the most time consuming step in
MD) for the same simulated time (with some tradeoff in accuracy).
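The following input fragment is a hedged sketch of how some of these
serial factors map onto LAMMPS commands; the specific values are
illustrative only and not recommendations for any particular system:

.. code-block:: LAMMPS

   # illustrative settings related to several of the factors above
   neighbor        2.0 bin                      # neighbor list skin distance
   neigh_modify    every 1 delay 10 check yes   # how often neighbor lists may be rebuilt
   atom_modify     sort 1000 2.0                # spatially sort per-atom data every 1000 steps
   fix             cons all shake 0.0001 20 0 m 1.0   # constrain bonds involving mass-1.0 atoms (hydrogens)
   run_style       respa 2 2 bond 1 pair 2      # r-RESPA with bonded terms on the inner level

A larger skin or less frequent rebuilds shift work between distance
checks and neighbor list construction; constraining the fastest bonds
and using r-RESPA allows a larger (outer) timestep, as discussed above.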
Factors affecting parallel efficiency (in no specific order):
* Bandwidth and latency of communication between processes. This can vary a
lot between processes on the same CPU or physical node and processes
on different physical nodes, and it also varies between different
communication technologies (like Ethernet, InfiniBand, or other
high-speed interconnects)
* Frequency and complexity of communication patterns required
* Number of "work units" (usually correlated with the number of atoms
and choice of force field) per MPI-process required for one time step
(if this number becomes too small, the cost of communication becomes
dominant).
* Choice of parallelization method (MPI-only, OpenMP-only, MPI+OpenMP,
MPI+GPU, MPI+GPU+OpenMP)
* Algorithmic complexity of the chosen force field (pair-wise vs. many-body
potential, Ewald vs. PPPM vs. (compensated or smoothed) cutoff-Coulomb)
* Communication cutoff: a larger cutoff results in more ghost atoms and
thus more data that needs to be communicated
* Frequency of neighbor list builds: during a neighbor list build the
domain decomposition is updated and the list of ghost atoms is rebuilt,
which requires multiple global communication steps
* FFT-grid settings and number of MPI processes for kspace style PPPM:
PPPM uses parallel 3d FFTs, whose parallel efficiency drops much faster
with the number of MPI processes than that of other parts of the force
computation. Thus using MPI+OpenMP parallelization or :doc:`run style
verlet/split <run_style>` can improve parallel efficiency by limiting
the number of MPI processes used for the FFTs.
* Load (im-)balance: LAMMPS' domain decomposition assumes that atoms are
evenly distributed across the entire simulation box. If there are
areas of vacuum, this may lead to different amounts of work for
different MPI processes. Using the :doc:`processors command
<processors>` to change the spatial decomposition, using MPI+OpenMP
parallelization instead of MPI-only to get larger sub-domains, or using
the balance or fix balance commands (with or without switching to the
*tiled* communication style) to change the sub-domain volumes are all
methods that can help to avoid load imbalance; some of these are shown
in the sketch after this list
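The fragment below is a hedged sketch of commands that address the
load-balancing and communication items above; the values are again
only illustrative:

.. code-block:: LAMMPS

   # illustrative settings related to parallel efficiency
   processors      * * 1                     # constrain the processor grid, e.g. for a slab geometry
   comm_modify     cutoff 12.0               # enlarge the ghost atom communication cutoff if needed
   comm_style      tiled                     # allow non-uniform sub-domain tiling
   fix             lb all balance 1000 1.1 rcb   # re-balance every 1000 steps via recursive bisection

Which of these (if any) helps depends strongly on the geometry of the
system and on the interconnect; measure before and after any change.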
Examples comparing serial performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before looking at your own input deck(s), you should get some reference
data from a known input so that you know what kind of performance you
should expect from your input. For the following we therefore use the
``in.rhodo.scaled`` input file and ``data.rhodo`` data file from the
``bench`` folder. This is a system of 32000 atoms using the CHARMM force
field and long-range electrostatics running for 100 MD steps. The
performance data is printed at the end of a run and only measures the
performance during propagation and excludes the setup phase.
Running with a single MPI process on an AMD Ryzen Threadripper PRO
9985WX CPU (64 cores, 128 threads, base clock: 3.2GHz, max. clock
5.4GHz, L1/L2/L3 cache 5MB/64MB/256MB, 8 DDR5-6400 memory channels) one
gets the following performance report:
.. code-block::
Performance: 1.232 ns/day, 19.476 hours/ns, 7.131 timesteps/s, 228.197 katom-step/s
99.2% CPU use with 1 MPI tasks x 1 OpenMP threads
The %CPU value should be at 100% or very close. Lower values would
be an indication that there are *other* processes also using the same
CPU core and thus invalidating the performance data. The katom-step/s
value is best suited for comparisons, since it is fairly independent
of the system size. The ``in.rhodo.scaled`` input can easily be made
larger through replication in the three dimensions by setting the
variables "x", "y", "z" to values other than 1 from the command line
with the "-var" flag (the mechanism is sketched after the list below).
Example:
- 32000 atoms: 228.8 katom-step/s
- 64000 atoms: 231.6 katom-step/s
- 128000 atoms: 231.1 katom-step/s
- 256000 atoms: 226.4 katom-step/s
- 864000 atoms: 229.6 katom-step/s
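The larger systems are generated as sketched below: the scaled bench
inputs define index-style variables that can be overridden from the
command line (for example with ``-var x 2 -var y 2 -var z 2`` for the
256000 atom case) and then replicate the 32000 atom cell accordingly.
This is a simplified sketch, not a verbatim excerpt of ``in.rhodo.scaled``:

.. code-block:: LAMMPS

   # replication mechanism used by the scaled bench inputs (sketch)
   variable        x index 1
   variable        y index 1
   variable        z index 1
   replicate       $x $y $z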
Comparing to an AMD Ryzen 7 7840HS CPU (8 cores, 16 threads, base clock
3.8GHz, max. clock 5.1GHz, L1/L2/L3 cache 512kB/8MB/16MB, 2 DDR5-5600
memory channels), we get similar single core performance (~220
katom-step/s vs. ~230 katom-step/s) due to the similar clock and
architecture:
- 32000 atoms: 219.8 katom-step/s
- 64000 atoms: 222.5 katom-step/s
- 128000 atoms: 216.8 katom-step/s
- 256000 atoms: 221.0 katom-step/s
- 864000 atoms: 221.1 katom-step/s
Switching to an older Intel Xeon E5-2650 v4 CPU (12 cores, 12 threads,
base clock 2.2GHz, max. clock 2.9GHz, L1/L2/L3 cache 64kB/256kB/30MB, 4
DDR4-2400 memory channels) leads to a lower performance of approximately
109 katom-step/s due to differences in architecture and clock. In all
cases, when looking at multiple runs, the katom-step/s property
fluctuates by approximately 1% around the average.
From here on, we look at the performance for the 256000 atom system only
and change several settings incrementally (an input sketch of some of
these variations follows the list):
#. No compiler optimization GCC (-Og -g): 183.8 katom-step/s
#. Moderate optimization with debug info GCC (-O2 -g): 231.1 katom-step/s
#. Full compiler optimization GCC (-DNDEBUG -O3): 236.0 katom-step/s
#. Aggressive compiler optimization GCC (-O3 -ffast-math -march=native): 239.9 katom-step/s
#. Source code optimization in OPENMP package (1 thread): 266.7 katom-step/s
#. Use *fix nvt* instead of *fix npt* (compute virial only every 50 steps): 272.9 katom-step/s
#. Increase pair style cutoff by 2 :math:`\AA`: 181.2 katom-step/s
#. Use tight PPPM convergence (1.0e-6 instead of 1.0e-4): 161.9 katom-step/s
#. Use Ewald summation instead of PPPM (at 1.0e-4 convergence): 19.9 katom-step/s
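The following fragment is a hedged sketch of how the force field related
variations (items 6-9) translate into input commands; the pair and
kspace lines are illustrative of a CHARMM-style setup and not verbatim
copies of the bench input:

.. code-block:: LAMMPS

   # illustrative input variations for the comparison above (one at a time)
   fix             1 all nvt temp 300.0 300.0 100.0   # item 6: fix nvt instead of fix npt
   pair_style      lj/charmm/coul/long 10.0 12.0      # item 7: pair cutoffs increased by 2 Angstrom
   kspace_style    pppm 1.0e-6                        # item 8: tighter PPPM convergence
   kspace_style    ewald 1.0e-4                       # item 9: regular Ewald instead of PPPM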
The numbers show that gains from aggressive compiler optimizations are
rather small in LAMMPS, while the data access optimizations in the
OPENMP (and OPT) packages are more prominent. On the other hand, using
more accurate force field settings causes, not unexpectedly, a
significant slowdown (to about half the speed). Finally, using regular
Ewald summation causes a massive slowdown due to its worse algorithmic
scaling with system size.
Examples comparing parallel performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Parallel performance builds on top of the serial performance: using
twice as many processors should increase the performance metric by up
to a factor of two. With the number of processors *N*, the serial
performance :math:`p_1`, and the performance on *N* processors
:math:`p_N`, we can define a *parallel efficiency* in percent as follows:
.. math::
P_{eff} = \frac{p_N}{p_1 \cdot N} \cdot 100\%
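For example, with :math:`p_1 = 273.6` katom-step/s on one MPI task and
:math:`p_2 = 530.6` katom-step/s on two MPI tasks (the first two entries
of the list below), this gives
:math:`P_{eff} = \frac{530.6}{2 \cdot 273.6} \cdot 100\% \approx 97\%`.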
For the AMD Ryzen Threadripper PRO 9985WX CPU and the serial
simulation settings of point 6. from above, we get the following
parallel efficiency data for the 256000 atom system:
- 1 MPI task: 273.6 katom-step/s, :math:`P_{eff} = 100\%`
- 2 MPI tasks: 530.6 katom-step/s, :math:`P_{eff} = 97\%`
- 4 MPI tasks: 1.021 Matom-step/s, :math:`P_{eff} = 93\%`
- 8 MPI tasks: 1.837 Matom-step/s, :math:`P_{eff} = 84\%`
- 16 MPI tasks: 3.574 Matom-step/s, :math:`P_{eff} = 82\%`
- 32 MPI tasks: 6.479 Matom-step/s, :math:`P_{eff} = 74\%`
- 64 MPI tasks: 9.032 Matom-step/s, :math:`P_{eff} = 52\%`
- 128 MPI tasks: 12.03 Matom-step/s, :math:`P_{eff} = 34\%`
The run with 128 MPI tasks uses CPU cores from hyper-threading, since the
CPU has only 64 physical cores.
For a small system with only 32000 atoms the parallel efficiency drops
off earlier, because the number of work units per MPI task becomes too
small relative to the communication overhead:
- 1 MPI task: 270.8 katom-step/s, :math:`P_{eff} = 100\%`
- 2 MPI tasks: 529.3 katom-step/s, :math:`P_{eff} = 98\%`
- 4 MPI tasks: 989.8 katom-step/s, :math:`P_{eff} = 91\%`
- 8 MPI tasks: 1.832 Matom-step/s, :math:`P_{eff} = 85\%`
- 16 MPI tasks: 3.463 Matom-step/s, :math:`P_{eff} = 80\%`
- 32 MPI tasks: 5.970 Matom-step/s, :math:`P_{eff} = 69\%`
- 64 MPI tasks: 7.477 Matom-step/s, :math:`P_{eff} = 42\%`
- 128 MPI tasks: 8.069 Matom-step/s, :math:`P_{eff} = 23\%`
Measuring performance of your input deck
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The best way to do this is to run your system (actual number of atoms)
for a modest number of timesteps (say 100 steps) on several different
processor counts, including a single processor if possible. Do this for
an equilibrated version of your system, so that the 100-step timings are
representative of a much longer run. There is typically no need to run
for thousands of timesteps to get accurate timings; you can simply
extrapolate from short runs.
For the set of runs, look at the timing data printed to the screen and
log file at the end of each LAMMPS run. The
@ -28,12 +230,15 @@ breakdown and relative percentages. For example, trying different
options for speeding up the long-range solvers will have little impact
if they only consume 10% of the run time. If the pairwise time is
dominating, you may want to look at GPU or OMP versions of the pair
style, as discussed below. Comparing how the percentages change as you
increase the processor count gives you a sense of how different
operations within the timestep are scaling. If you are using PPPM as
the Kspace solver, you can turn on additional output with
:doc:`kspace_modify fftbench yes <kspace_modify>`, which measures the
time spent during PPPM on the 3d FFTs; these can be communication
intensive for larger processor counts. This provides an indication of
whether it is worth trying out alternatives to the default FFT settings
for additional performance.
Another important detail in the timing info is the histograms of
atom counts and neighbor counts. If these vary widely across

View File

@ -208,6 +208,7 @@ accelerated styles exist.
* :doc:`ave/grid <fix_ave_grid>` - compute per-grid time-averaged quantities
* :doc:`ave/histo <fix_ave_histo>` - compute/output time-averaged histograms
* :doc:`ave/histo/weight <fix_ave_histo>` - weighted version of fix ave/histo
* :doc:`ave/moments <fix_ave_moments>` - compute moments of scalar quantities
* :doc:`ave/time <fix_ave_time>` - compute/output global time-averaged quantities
* :doc:`aveforce <fix_aveforce>` - add an averaged force to each atom
* :doc:`balance <fix_balance>` - perform dynamic load-balancing

View File

@ -82,10 +82,9 @@ specified values may represent calculations performed by computes and
fixes which store their own "group" definitions.
Each listed value can be the result of a compute or fix or the
evaluation of an equal-style or vector-style variable. The specified
indices can include a wildcard string. See the
:doc:`fix ave/correlate <fix_ave_correlate>` page for details on that.
The *Nevery* and *Nfreq* arguments specify on what time steps the input
values will be used to calculate correlation data and the frequency

doc/src/fix_ave_moments.rst (new file, 296 lines)
View File

@ -0,0 +1,296 @@
.. index:: fix ave/moments
fix ave/moments command
=======================
Syntax
""""""
.. code-block:: LAMMPS
fix ID group-ID ave/moments Nevery Nrepeat Nfreq value1 value2 ... moment1 moment2 ... keyword args ...
* ID, group-ID are documented in :doc:`fix <fix>` command
* ave/moments = style name of this fix command
* Nevery = use input values every this many time steps
* Nrepeat = # of times to use input values for calculating averages
* Nfreq = calculate averages every this many time steps
* one or more input values can be listed
* value = c_ID or c_ID[I] or f_ID or f_ID[I] or v_name or v_name[I]
.. parsed-literal::
c_ID = global scalar calculated by a compute with ID
c_ID[I] = Ith component of global vector calculated by a compute with ID, I can include wildcard (see below)
f_ID = global scalar calculated by a fix with ID
f_ID[I] = Ith component of global vector calculated by a fix with ID, I can include wildcard (see below)
v_name = value calculated by an equal-style variable with name
v_name[I] = value calculated by a vector-style variable with name, I can include wildcard (see below)
* one or more moments to compute can be listed
* moment = *mean* or *stddev* or *variance* or *skew* or *kurtosis*, see exact definitions below.
* zero or more keyword/arg pairs may be appended
* keyword = *start* or *history*
.. parsed-literal::
*start* args = Nstart
Nstart = invoke first after this time step
*history* args = Nrecent
Nrecent = keep a history of up to Nrecent outputs
Examples
""""""""
.. code-block:: LAMMPS
fix 1 all ave/moments 1 1000 100 v_volume mean stddev
fix 1 all ave/moments 1 200 1000 v_volume variance kurtosis history 10
Description
"""""""""""
.. versionadded:: TBD
Using one or more values as input, calculate the moments of the underlying
(population) distributions based on samples collected every few time
steps over a time step window. The definitions of the moments calculated
are given below.
The group specified with this command is ignored. However, note that
specified values may represent calculations performed by computes and
fixes which store their own "group" definitions.
Each listed value can be the result of a :doc:`compute <compute>` or
:doc:`fix <fix>` or the evaluation of an equal-style or vector-style
:doc:`variable <variable>`. In each case, the compute, fix, or variable
must produce a global quantity, not a per-atom or local quantity.
If you wish to spatial- or time-average or histogram per-atom
quantities from a compute, fix, or variable, then see the :doc:`fix
ave/chunk <fix_ave_chunk>`, :doc:`fix ave/atom <fix_ave_atom>`, or
:doc:`fix ave/histo <fix_ave_histo>` commands. If you wish to sum a
per-atom quantity into a single global quantity, see the :doc:`compute
reduce <compute_reduce>` command.
Many :doc:`computes <compute>` and :doc:`fixes <fix>` produce global
quantities. See their doc pages for details. :doc:`Variables <variable>`
of style *equal* and *vector* are the only ones that can be used with
this fix. Variables of style *atom* cannot be used, since they produce
per-atom values.
The input values must all be scalars or vectors with a bracketed term
appended, indicating the :math:`I^\text{th}` value of the vector is
used.
The result of this fix can be accessed as a vector, containing the
interleaved moments of each input in order. If M moments are requested,
then the moments of input 1 will be the first M values in the vector
output by this fix. The moments of input 2 will be the next M values, etc.
If there are N values, the vector length will be N*M.
----------
For input values from a compute or fix or variable, the bracketed index
I can be specified using a wildcard asterisk with the index to
effectively specify multiple values. This takes the form "\*" or "\*n"
or "m\*" or "m\*n". If :math:`N` is the size of the vector, then an
asterisk with no numeric values means all indices from 1 to :math:`N`.
A leading asterisk means all indices from 1 to n (inclusive). A
trailing asterisk means all indices from n to :math:`N` (inclusive). A
middle asterisk means all indices from m to n (inclusive).
Using a wildcard is the same as if the individual elements of the vector
or cells of the array had been listed one by one. For examples, see the
description of this capability in :doc:`fix ave/time <fix_ave_time>`.
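As a hypothetical illustration (the compute ID *myP* is not part of this
documentation), a wildcard can feed all six components of the pressure
tensor from a :doc:`compute pressure <compute_pressure>` into this fix
at once:

.. code-block:: LAMMPS

   # c_myP[*] expands to c_myP[1] ... c_myP[6]
   compute myP all pressure thermo_temp
   fix ptens all ave/moments 10 100 1000 c_myP[*] mean stddev

With 6 inputs and 2 moments, the output vector of this fix then has
12 elements.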
----------
The :math:`N_\text{every}`, :math:`N_\text{repeat}`, and
:math:`N_\text{freq}` arguments specify on what time steps the input
values will be used in order to contribute to the average. The final
statistics are generated on time steps that are a multiple of
:math:`N_\text{freq}`\ . The average is over a window of up to
:math:`N_\text{repeat}` quantities, computed in the preceding portion of
the simulation once every :math:`N_\text{every}` time steps.
.. note::
Contrary to most fix ave/* commands, it is not required that Nevery *
Nrepeat <= Nfreq. This is to allow the user to choose the time
window and number of samples contributing to the output at each
Nfreq interval.
For example, if :math:`N_\text{freq}=100` and :math:`N_\text{repeat}=5`
(and :math:`N_\text{every}=1`), then on step 100 values from time steps
96, 97, 98, 99, and 100 will be used. The fix does not compute its
inputs on steps that are not required. If :math:`N_\text{freq}=5`,
:math:`N_\text{repeat}=8` and :math:`N_\text{every}=1`, then values
will first be calculated on step 5 from steps 1-5, on step 10 from 3-10,
on step 15 from 8-15 and so on, forming a rolling average over
timesteps that span a time window larger than Nfreq.
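The first prose example above corresponds to an input sketch like the
following, reusing the *v_volume* variable from the Examples section:

.. code-block:: LAMMPS

   # Nevery = 1, Nrepeat = 5, Nfreq = 100: the mean reported on step 100
   # is computed from the values of v_volume at steps 96 through 100
   variable volume equal vol
   fix win all ave/moments 1 5 100 v_volume mean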
----------
If a value begins with "c\_", a compute ID must follow which has been
previously defined in the input script. If no bracketed term is
appended, the global scalar calculated by the compute is used. If a
bracketed term is appended, the Ith element of the global vector
calculated by the compute is used. See the discussion above for how I
can be specified with a wildcard asterisk to effectively specify
multiple values.
If a value begins with "f\_", a fix ID must follow which has been
previously defined in the input script. If no bracketed term is
appended, the global scalar calculated by the fix is used. If a
bracketed term is appended, the Ith element of the global vector
calculated by the fix is used. See the discussion above for how I can
be specified with a wildcard asterisk to effectively specify multiple
values.
Note that some fixes only produce their values on certain time steps,
which must be compatible with *Nevery*, else an error will result.
Users can also write code for their own fix styles and :doc:`add them to
LAMMPS <Modify>`.
If a value begins with "v\_", a variable name must follow which has been
previously defined in the input script. Only equal-style or vector-style
variables can be used, which both produce global values. Vector-style
variables require a bracketed term to specify the Ith element of the
vector calculated by the variable.
Note that variables of style *equal* and *vector* define a formula which
can reference individual atom properties or thermodynamic keywords, or
they can invoke other computes, fixes, or variables when they are
evaluated, so this is a very general means of specifying quantities to
time average.
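As a brief sketch (the variable and fix IDs are only illustrative), two
equal-style variables referencing thermodynamic keywords could be
monitored like this:

.. code-block:: LAMMPS

   variable et equal etotal      # total energy from a thermo keyword
   variable pr equal press       # pressure from a thermo keyword
   fix mom all ave/moments 10 100 1000 v_et v_pr mean variance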
----------
The moments are output in the order requested in the arguments following
the last input. Any number and order of moments can be specified,
although it does not make much sense to specify the same moment multiple
times. All moments are computed using a correction of the sample estimators
used to obtain unbiased cumulants :math:`k_{1..4}` (see :ref:`(Cramer)
<Cramer1>`). The correction for variance is the standard Bessel
correction. For other moments, see :ref:`(Joanes)<Joanes1>`.
For *mean*, the arithmetic mean :math:`\bar{x} = \frac{1}{n}
\sum_{i=1}^{n} x_i` is calculated.
For *variance*, the Bessel-corrected sample variance :math:`var = k_2 =
\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2` is calculated.
For *stddev*, the Bessel-corrected sample standard deviation
:math:`stddev = \sqrt{k_2}` is calculated.
For *skew*, the adjusted Fisher--Pearson standardized moment :math:`G_1
= \frac{k_3}{k_2^{3/2}} = \frac{k_3}{stddev^3}` is calculated.
For *kurtosis*, the adjusted Fisher--Pearson standardized moment
:math:`G_2 = \frac{k_4}{k_2^2}` is calculated.
----------
Fix invocation and output can be modified by optional keywords.
The *start* keyword specifies that the first computation should be no
earlier than the step number given (but will still occur on a multiple
of *Nfreq*). The default is step 0. Often input values can be 0.0 at
time 0, so setting *start* to a larger value can avoid including a 0.0
in a longer series.
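For example (a sketch with an illustrative variable and fix ID), the
following delays the first output until step 10000, a multiple of
*Nfreq*, so that early values do not enter the series:

.. code-block:: LAMMPS

   variable et equal etotal
   fix mom all ave/moments 10 100 1000 v_et mean stddev start 10000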
The *history* keyword stores the Nrecent most recent outputs on Nfreq
timesteps, so they can be accessed as global outputs of the fix. Nrecent
must be >= 1. The default is 1, meaning only the most recent output is
accessible. For example, if history 10 is specified and Nfreq = 1000,
then on timestep 20000, the Nfreq outputs from steps 20000, 19000, ...
11000 are available for access. See below for details on how to access
the history values.
For example, this will store the outputs of the previous 10 Nfreq
time steps, i.e. a window of 10000 time steps:
.. code-block:: LAMMPS
fix 1 all ave/moments 1 200 1000 v_volume mean history 10
The previous results can be accessed as values in a global array output
by this fix. Each column of the array is the vector output of the N-th
preceding Nfreq timestep. For example, assuming a single moment is
calculated, the most recent result corresponding to the third input
value would be accessed as "f_name[3][1]", while "f_name[3][4]" is the
4th most recent, and so on. The current vector output is always the
first column of the array, corresponding to the most recent result.
To illustrate the utility of keeping output history, consider using
this fix in conjunction with :doc:`fix halt <fix_halt>` to stop a run
automatically if a quantity is converged to within some desired tolerance:
.. code-block:: LAMMPS
variable target equal etot
fix aveg all ave/moments 1 200 1000 v_target mean stddev history 10
variable stopcond equal "abs(f_aveg[1]-f_aveg[1][10])<f_aveg[2]"
fix fhalt all halt 1000 v_stopcond == 1
In this example, every 1000 time steps, the average and standard
deviation of the total energy over the previous 200 time steps are
calculated. If the difference between the most recent and 10-th most
recent average is lower than the most recent standard deviation, the run
is stopped.
----------
Restart, fix_modify, output, run start/stop, minimize info
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
No information about this fix is written to :doc:`binary restart files
<restart>`.
This fix produces a global vector and global array which can be accessed
by various :doc:`output commands <Howto_output>`. The values can be
accessed on any time step, but may not be current.
A global vector is produced with the # of elements = number of moments *
number of inputs. The moments are output in the order given in the fix
definition. An array is produced having # of rows = length of vector
output (with an ordering which matches the vector) and # of columns =
value of *history*. There is always at least one column.
Each element of the global vector or array can be either "intensive" or
"extensive", depending on whether the values contributing to the element
are "intensive" or "extensive". If a compute or fix provides the value
being time averaged, then the compute or fix determines whether the value
is intensive or extensive; see the page for that compute or fix for
further info. Values produced by a variable are treated as intensive.
No parameter of this fix can be used with the *start/stop* keywords of
the :doc:`run <run>` command. This fix is not invoked during
:doc:`energy minimization <minimize>`.
Restrictions
""""""""""""
This compute is part of the EXTRA-FIX package. It is only enabled if
LAMMPS was built with that package. See the :doc:`Build package
<Build_package>` page for more info.
Related commands
""""""""""""""""
:doc:`fix ave/time <fix_ave_time>`
Default
"""""""
The option defaults are history = 1, start = 0.
----------
.. _Cramer1:
**(Cramer)** Cramer, Mathematical Methods of Statistics, Princeton University Press (1946).
.. _Joanes1:
**(Joanes)** Joanes, Gill, The Statistician, 47, 183--189 (1998).

View File

@ -100,6 +100,56 @@ first is assigned to intra-molecular interactions (i.e. both atoms
have the same molecule ID), the second to inter-molecular interactions
(i.e. interacting atoms have different molecule IDs).
.. admonition:: When **NOT** to use a hybrid pair style
:class: warning
Using pair style *hybrid* can be very tempting if you need a
**many-body potential** supporting a mix of elements for which you
cannot find a potential file that covers *all* of them. Regardless
of how this is set up, there will be *errors*. The major use case
where the error is *small* is when the many-body sub-styles are used
on different objects (for example a slab and a liquid, or a metal and a
nano-machining work piece). In that case the *mixed* terms
**should** be provided by a pair-wise additive potential (like
Lennard-Jones or Morse) to avoid unexpected behavior and reduce
errors. LAMMPS cannot easily check for this condition and thus will
accept good and bad choices alike.
Outside of this, we *strongly* recommend *against* using pair style
hybrid with many-body potentials for the following reasons:
1. When trying to combine EAM or MEAM potentials, there is a *large*
error in the embedding term, since it is computed separately for
each sub-style only.
2. When trying to combine many-body potentials like Stillinger-Weber,
Tersoff, AIREBO, Vashishta, or similar, you have to understand
that the potential of a sub-style cannot be applied in a pair-wise
fashion but needs to be applied to larger groups of atoms at a time
(e.g. a Tersoff potential of elements A and B includes the
interactions A-A, B-B, A-B, A-A-A, A-A-B, A-B-B, A-B-A, B-A-A,
B-A-B, B-B-A, B-B-B; AIREBO also considers all quadruples of
atom elements).
3. When one of the sub-styles uses charge equilibration (= QEq; like
in ReaxFF or COMB) you get inconsistent QEq behavior: either you
apply QEq to *all* atoms, but then you are missing the QEq
parameters for the non-QEq pair style (and it would be inconsistent
to apply QEq to pair styles that are not parameterized for QEq), or
else you have either no charges or fixed charges interacting with
the QEq charges, which also leads to inconsistent behavior between
the two sub-styles. When attempting to use multiple ReaxFF
instances to combine different potential files, you might be able
to work around the QEq limitations, but point 2. still applies.
We understand that it is frustrating to not be able to run simulations
due to lack of available potential files, but that does not justify
combining potentials in a broken way via pair style hybrid. This is
not what the hybrid pair styles are designed for.
----------
Here are two examples of hybrid simulations. The *hybrid* style could
be used for a simulation of a metal droplet on a LJ surface. The metal
atoms interact with each other via an *eam* potential, the surface atoms
@ -374,12 +424,11 @@ selected sub-style.
----------
.. note::
Even though the command name "pair_style" would suggest that these are
pair-wise interactions, several of the potentials defined via the
pair_style command in LAMMPS are really many-body potentials, such as
Tersoff, AIREBO, MEAM, ReaxFF, etc. The way to think about using these
potentials in a hybrid setting is as follows.
A subset of atom types is assigned to the many-body potential with a
single :doc:`pair_coeff <pair_coeff>` command, using "\* \*" to include