starting grammar, punctuation, and spelling review for developer info sections

Axel Kohlmeyer
2023-02-04 18:03:30 -05:00
parent a0a7e76cc3
commit d1550bf9f6
15 changed files with 366 additions and 357 deletions

View File

@ -1,52 +1,53 @@
Code design
-----------
This section explains some code design choices in LAMMPS with the goal
of helping developers write new code similar to the existing code.
Please see the section on :doc:`Requirements for contributed code
<Modify_style>` for more specific recommendations and guidelines. While
that section is organized more in the form of a checklist for code
contributors, the focus here is on overall code design strategy, choices
made between possible alternatives, and a discussion of some relevant C++
programming language constructs.
Historically, the basic design philosophy of the LAMMPS C++ code was a
"C with classes" style. The motivation was to make it easy to modify
LAMMPS for people without significant training in C++ programming. Data
structures and code constructs were used that resemble the previous
implementation(s) in Fortran. A contributing factor to this choice was
that at the time, C++ compilers were often not mature and some advanced
features contained bugs or did not function as the standard required.
There were also disagreements between compiler vendors as to how to
interpret the C++ standard documents.
However, C++ compilers and the C++ programming language have advanced
significantly. In 2020, the LAMMPS developers decided to require the
C++11 standard as the minimum C++ language standard for LAMMPS. Since
then, we have begun to replace C-style constructs with equivalent C++
functionality. This was taken either from the C++ standard library or
implemented as custom classes or functions. The goal is to improve
readability of the code and to increase code reuse through abstraction
of commonly used functionality.
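
As a small, hypothetical illustration of such a replacement (the variable
names here are made up):

.. code-block:: c++

   // legacy C style: C string comparison with strcmp()
   if (strcmp(arg[iarg], "yes") == 0) flag = 1;

   // equivalent C++ style: std::string comparison is easier to read
   if (std::string(arg[iarg]) == "yes") flag = 1;
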
.. note::
Please note that as of spring 2023 there is still a sizable chunk of
legacy code in LAMMPS that has not yet been refactored to reflect
these style conventions in full. LAMMPS has a large code base and
many contributors. There is also a hierarchy of precedence in which
the code is adapted. Highest priority has been the code in the
``src`` folder, followed by code in packages in order of their
popularity and complexity (simpler code gets adapted sooner), followed
by code in the ``lib`` folder. Source code that is downloaded from
external packages or libraries during compilation is not subject to
the conventions discussed here.
Object-oriented code
^^^^^^^^^^^^^^^^^^^^
LAMMPS is designed to be an object-oriented code. Each simulation is
represented by an instance of the LAMMPS class. When running in
parallel, each MPI process creates such an instance. This can be seen
in the ``main.cpp`` file where the core steps of running a LAMMPS
simulation are the following 3 lines of code:
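
The three lines themselves fall just outside this hunk; the sketch below
follows the structure of the current ``main.cpp`` (error handling omitted):

.. code-block:: c++

   LAMMPS *lammps = new LAMMPS(argc, argv, MPI_COMM_WORLD); // create instance
   lammps->input->file();  // process the input script (or stdin)
   delete lammps;          // shut down the simulation instance
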
@ -67,14 +68,14 @@ other special features.
The basic LAMMPS class hierarchy which is created by the LAMMPS class
constructor is shown in :ref:`class-topology`. When input commands
are processed, additional class instances are created, or deleted, or
replaced. Likewise, specific member functions of specific classes are
called to trigger actions such as creating atoms, computing forces,
computing properties, time-propagating the system, or writing output.
Compositing and Inheritance
===========================
LAMMPS makes extensive use of the object-oriented programming (OOP)
principles of *compositing* and *inheritance*. Classes like the
``LAMMPS`` class are a **composite** containing pointers to instances
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the
simulation and providing member functions that trigger certain
actions. Some of those classes like ``Force`` are themselves
composites, containing instances of classes describing different force
interactions. Similarly, the ``Modify`` class contains a list of
``Fix`` and ``Compute`` classes. If the input commands that
correspond to these classes include the word *style*, then LAMMPS
stores only a single instance of that class. E.g. *atom_style*,
@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived
classes are often referred to as "styles", e.g. pair styles, fix
styles, atom styles and so on.
This is the origin of the flexibility of LAMMPS. For example, pair
styles implement a variety of different non-bonded interatomic
potential functions. All details for the implementation of a
potential are stored and executed in a single class.
As mentioned above, there can be multiple instances of classes derived
from the ``Fix`` or ``Compute`` base classes. They represent a
different facet of LAMMPS' flexibility, as they provide methods which
can be called at different points within a timestep, as explained in
`Developer_flow`. This allows the input script to tailor how a specific
simulation is run, what diagnostic computations are performed, and how
the output of those computations is further processed or output.
Additional code sharing is possible by creating derived classes from the
derived classes (e.g., to implement an accelerated version of a pair
@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member
functions is which of the two member functions is called when executing
`base1->call()` versus `base2->call()`. Without polymorphism, a
function within the base class can only call member functions within the
same scope: that is, ``Base::call()`` will always call
``Base::normal()``. But for the `base2->call()` case, the call of the
virtual member function will be dispatched to ``Derived::poly()``
instead. This mechanism results in calling functions that are within
the scope of the class that was used to *create* the instance, even if
they are assigned to a pointer for their base class. This is the
desired behavior, and this way LAMMPS can even use styles that are loaded
at runtime from a shared object file with the :doc:`plugin command
<plugin>`.
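
The example classes themselves are elided from this hunk. A minimal,
self-contained sketch consistent with the description reads:

.. code-block:: c++

   #include <cstdio>

   class Base {
    public:
     virtual ~Base() = default;
     void call() { normal(); poly(); }
     void normal() { puts("Base::normal()"); }         // non-virtual
     virtual void poly() { puts("Base::poly()"); }     // virtual
   };

   class Derived : public Base {
    public:
     void normal() { puts("Derived::normal()"); }      // hides, no dispatch
     void poly() override { puts("Derived::poly()"); } // overrides
   };

   int main()
   {
     Base *base1 = new Base;
     Base *base2 = new Derived;
     base1->call();  // prints Base::normal() and Base::poly()
     base2->call();  // prints Base::normal() and Derived::poly()
     delete base1;
     delete base2;
   }
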
A special case of virtual functions are so-called pure functions. These
are virtual functions that are initialized to 0 in the class declaration
@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be
created and that derived classes **must** implement these functions.
Many of the functions listed with the various class styles in the
section :doc:`Modify` are pure functions. The motivation for this is
to define the interface or API of the functions, but defer their
implementation to the derived classes.
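
As a brief illustration, a pure function is a virtual function declared
with a trailing ``= 0`` (a sketch; the ``compute()`` signature follows the
``Pair`` class convention):

.. code-block:: c++

   class Pair {
    public:
     virtual ~Pair() = default;
     // pure function: no base class implementation, so every derived
     // pair style *must* provide its own compute() method
     virtual void compute(int eflag, int vflag) = 0;
   };
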
However, there are downsides to this. For example, calls to virtual
functions from within a constructor will *not* be in the scope of the
derived class, and thus it is good practice to either avoid calling them
or to provide an explicit scope such as ``Base::poly()`` or
``Derived::poly()``. Furthermore, any destructors in classes containing
virtual functions should be declared virtual too, so they will be
@ -208,8 +208,8 @@ dispatch.
that are intended to replace a virtual or pure function use the
``override`` property keyword. For the same reason, the use of
overloads or default arguments for virtual functions should be
avoided, as they lead to confusion over which function is supposed to
override which, and which arguments need to be declared.
Style Factories
===============
@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that
create an instance of a specific derived class, say ``PairLJCut`` and
return a pointer to the type of the common base class of that style,
``Pair`` in this case. To associate the factory function with the
style keyword, a ``std::map`` class is used with function pointers
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
"morse" for ``PairMorse``). A couple of typedefs help keep the code
readable, and a template function is used to implement the actual
factory functions for the individual classes. Below is an example
of such a factory function from the ``Force`` class as declared in
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
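
The declaration and implementation themselves fall outside this hunk. A
condensed, self-contained sketch of the pattern (simplified class names,
no LAMMPS headers) could look like this:

.. code-block:: c++

   #include <map>
   #include <string>

   class LAMMPS;
   class Pair { public: virtual ~Pair() = default; };
   class PairLJCut : public Pair { public: PairLJCut(LAMMPS *) {} };
   class PairMorse : public Pair { public: PairMorse(LAMMPS *) {} };

   // typedefs keep the map declaration readable
   typedef Pair *(*PairCreator)(LAMMPS *);
   typedef std::map<std::string, PairCreator> PairCreatorMap;

   // a single template implements the factory function for every style
   template <typename T> static Pair *pair_creator(LAMMPS *lmp)
   {
     return new T(lmp);
   }

   int main()
   {
     PairCreatorMap pair_map;
     // in LAMMPS these entries come from PairStyle() macro expansions
     // collected in style_pair.h; here they are written out by hand
     pair_map["lj/cut"] = &pair_creator<PairLJCut>;
     pair_map["morse"] = &pair_creator<PairMorse>;

     Pair *pair = pair_map["lj/cut"](nullptr); // look up by style keyword
     delete pair;
   }
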
@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams".
This is mainly motivated by better performance, better control over
formatting, and less effort to achieve specific formatting.
Since mixing "stdio" and "iostreams" can lead to unexpected
behavior. use of the latter is strongly discouraged. Also output to
the screen should not use the predefined ``stdout`` FILE pointer, but
rather the ``screen`` and ``logfile`` FILE pointers managed by the
LAMMPS class. Furthermore, output should generally only be done by
MPI rank 0 (``comm->me == 0``). Output that is sent to both
``screen`` and ``logfile`` should use the :cpp:func:`utils::logmesg()
convenience function <LAMMPS_NS::utils::logmesg>`.
Since mixing "stdio" and "iostreams" can lead to unexpected behavior,
use of the latter is strongly discouraged. Output to the screen should
*not* use the predefined ``stdout`` FILE pointer, but rather the
``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class.
Furthermore, output should generally only be done by MPI rank 0
(``comm->me == 0``). Output that is sent to both ``screen`` and
``logfile`` should use the :cpp:func:`utils::logmesg() convenience
function <LAMMPS_NS::utils::logmesg>`.
We discourage the use of stringstreams because the bundled {fmt} library
and the customized tokenizer classes provide the same functionality in a
cleaner way with better performance. This also helps maintain a
consistent programming syntax with code from many different
contributors.
Formatting with the {fmt} library
===================================
The LAMMPS source code includes a copy of the `{fmt} library
<https://fmt.dev>`_, which is preferred over formatting with the
"printf()" family of functions. The primary reason is that it allows
a typesafe default format for any type of supported data. This is
particularly useful for formatting integers of a given size (32-bit or
@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it
are future-proof.
Formatted strings are frequently created by calling the
``fmt::format()`` function, which will return a string as a
``std::string`` class instance. In contrast to the ``%`` placeholder in
``printf()``, the {fmt} library uses ``{}`` to embed format descriptors.
In the simplest case, no additional characters are needed, as {fmt} will
choose the default format based on the data type of the argument.
Otherwise, the ``fmt::print()`` function may be used instead of
``printf()`` or ``fprintf()``. In addition, several LAMMPS output
functions that originally accepted a single string as an argument have
been overloaded to accept a format string with optional arguments as
well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
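
As a hedged illustration (the message texts and variables are made up;
the calls follow the overloads described above):

.. code-block:: c++

   // build a std::string; "{}" picks a default format for each type
   std::string mesg = fmt::format("Created {} atoms on {} procs\n",
                                  natoms, nprocs);

   // overloaded LAMMPS output function with format string and arguments
   utils::logmesg(lmp, "Current temperature: {:.3f}\n", temp);
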
Summary of the {fmt} format syntax
==================================
@ -332,10 +331,11 @@ The syntax of the format string is "{[<argument id>][:<format spec>]}",
where either the argument id or the format spec (separated by a colon
':') is optional. The argument id is usually a number starting from 0
that is the index to the arguments following the format string. By
default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The
most common case for using argument id would be to use the same argument
in multiple places in the format string without having to provide it as
an argument multiple times. The argument id is rarely used in the LAMMPS
source code.
More common is the use of a format specifier, which starts with a colon.
This may optionally be followed by a fill character (default is ' '). If
@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating
point numbers. The final character in the format string would be an
indicator for the "presentation", i.e. 'd' for decimal presentation of
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
This mostly follows the "printf()" scheme but without requiring an
This mostly follows the "printf()" scheme, but without requiring an
additional length parameter to distinguish between different integer
widths. The {fmt} library will detect those and adapt the formatting
accordingly. For floating point numbers there are, correspondingly, 'g'
for generic presentation, 'e' for exponential presentation, and 'f' for
fixed point presentation.
Thus "{:8}" would represent *any* type argument using at least 8
characters; "{:<8}" would do this as left aligned, "{:^8}" as centered,
"{:>8}" as right aligned. If a specific presentation is selected, the
argument type must be compatible or else the {fmt} formatting code will
throw an exception. Some format string examples are given below:
The format string "{:8}" would thus represent *any* type argument and be
replaced by at least 8 characters; "{:<8}" would do this as left
aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific
presentation is selected, the argument type must be compatible or else
the {fmt} formatting code will throw an exception. Some format string
examples are given below:
.. code-block:: c++
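
   // illustrative examples; the original list falls outside this hunk
   fmt::format("{}", 42);          // -> "42"         (default format)
   fmt::format("{:8}", 42);        // -> "      42"   (numbers align right)
   fmt::format("{:<8}", 42);       // -> "42      "   (left aligned)
   fmt::format("{:^8}", 42);       // -> "   42   "   (centered)
   fmt::format("{:08d}", 42);      // -> "00000042"   (zero fill, decimal)
   fmt::format("{:10.4f}", 3.14);  // -> "    3.1400" (fixed point)
   fmt::format("{0} {0}", 42);     // -> "42 42"      (argument id reused)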
@ -392,12 +393,12 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
Memory management
^^^^^^^^^^^^^^^^^
Dynamic allocation of small data and objects can be done with the C++
operators "new" and "delete/delete[]". Large data should use the member
functions of the ``Memory`` class, most commonly, ``Memory::create()``,
``Memory::grow()``, and ``Memory::destroy()``, which provide variants
for vectors, 2d arrays, 3d arrays, etc. These can also be used for
small data.
The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
directly is strongly discouraged. To simplify adapting legacy code
@ -408,26 +409,24 @@ perform additional error checks for safety.
Use of these custom memory allocation functions is motivated by the
following considerations:
- Memory allocation failures on *any* MPI rank during a parallel run
will trigger an immediate abort of the entire parallel calculation.
- A failing "new" will trigger an exception, which is also captured by
LAMMPS and triggers a global abort.
- Allocation of multidimensional arrays will be done in a C compatible
fashion, but such that the storage of the actual data is stored in one
large contiguous block. Thus, when MPI communication is needed,
the data can be communicated directly (similar to Fortran arrays).
- the "destroy()" and "sfree()" functions may safely be called on NULL
pointers
- the "destroy()" functions will nullify the pointer variables making
"use after free" errors easy to detect
- it is possible to use a larger than default memory alignment (not on
- The "destroy()" and "sfree()" functions may safely be called on NULL
pointers.
- The "destroy()" functions will nullify the pointer variables, thus
making "use after free" errors easy to detect.
- It is possible to use a larger than default memory alignment (not on
all operating systems, since the allocated storage pointers must be
compatible with ``free()`` for technical reasons).
In the practical implementation of code this means that any pointer
variables that are class members should be initialized to a ``nullptr``
value in their respective constructors. That way, it is safe to call
``Memory::destroy()`` or ``delete[]`` on them before *any* allocation
outside the constructor. This helps prevent memory leaks.
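
A hedged sketch of this pattern (the data names are made up; the member
functions are those of the ``Memory`` class described above):

.. code-block:: c++

   // in the constructor: initialize member pointers to nullptr
   energies = nullptr;
   forces = nullptr;

   // (re-)allocation: destroy() is safe even before the first create()
   memory->destroy(energies);
   memory->create(energies, nmax, "example:energies");  // 1d vector
   memory->create(forces, nmax, 3, "example:forces");   // contiguous 2d array
   memory->grow(energies, 2*nmax, "example:energies");  // enlarge, keep data

   // in the destructor: free the storage; the pointers are nullified
   memory->destroy(energies);
   memory->destroy(forces);
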

View File

@ -28,7 +28,7 @@ The need to do this communication arises when data from the owned atoms
is updated (e.g. their positions) and this updated information needs to
be **copied** to the corresponding ghost atoms.
And second, *reverse communication*, which sends ghost atom information
from each processor to the owning processor to **accumulate** (sum)
the values with the corresponding owned atoms. The need for this
arises when data is computed and also stored with ghost atoms
@ -58,7 +58,7 @@ embedded-atom method (EAM) which compute intermediate values in the
first part of the compute() function that need to be stored by both
owned and ghost atoms for the second part of the force computation.
The *Comm* class methods perform the MPI communication for buffers of
per-atom data. They "call back" to the *Pair* class so it can *pack*
per-atom data. They "call back" to the *Pair* class, so it can *pack*
or *unpack* the buffer with data the *Pair* class owns. There are 4
such methods that the *Pair* class must define, assuming it uses both
forward and reverse communication:
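
The list of those four methods sits outside this hunk. As a hedged
sketch, a forward communication pack/unpack pair might look as follows
(``PairMine`` and its per-atom array ``myval`` are made-up names; the
signatures follow the ``Pair`` base class):

.. code-block:: c++

   int PairMine::pack_forward_comm(int n, int *list, double *buf,
                                   int /*pbc_flag*/, int * /*pbc*/)
   {
     // copy one per-atom value per listed atom into the buffer
     for (int i = 0; i < n; i++) buf[i] = myval[list[i]];
     return n;  // number of buffer entries filled
   }

   void PairMine::unpack_forward_comm(int n, int first, double *buf)
   {
     // receiving ghost atoms are consecutive, starting at index "first"
     for (int i = 0; i < n; i++) myval[first + i] = buf[i];
   }
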
@ -70,7 +70,7 @@ forward and reverse communication:
The arguments to these methods include the buffer and a list of atoms
to pack or unpack. The *Pair* class also must set the *comm_forward*
and *comm_reverse* variables, which store the number of values stored
in the communication buffers for each operation. This means, if
desired, it can choose to store multiple per-atom values in the
buffer, and they will be communicated together to minimize
@ -81,11 +81,11 @@ containing ``double`` values. To correctly store integers that may be
The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind
of forward and reverse communication operations using the same *Comm*
class methods. Likewise, the same pack/unpack methods and
comm_forward/comm_reverse variables must be defined by the calling
*Fix*, *Compute*, or *Dump* class.
For *Fix* classes, there is an optional second argument to the
*forward_comm()* and *reverse_comm()* call which can be used when the
fix performs multiple modes of communication, with different numbers
of values per atom. The fix should set the *comm_forward* and
@ -150,7 +150,7 @@ latter case, when the *ring* operation is complete, each processor can
examine its original buffer to extract modified values.
Note that the *ring* operation is similar to an MPI_Alltoall()
operation, where every processor effectively sends and receives data to
every other processor. The difference is that the *ring* operation
does it one step at a time, so the total volume of data does not need
to be stored by every processor. However, the *ring* operation is
@ -184,8 +184,8 @@ The *exchange_data()* method triggers the communication to be
performed. Each processor provides the vector of *N* datums to send,
and the size of each datum. All datums must be the same size.
The *create_atom()* and *exchange_atom()* methods are similar, except
that the size of each datum can be different. Typically, this is used
to communicate atoms, each with a variable amount of per-atom data, to
other processors.

View File

@ -45,9 +45,9 @@ other methods in the class.
zero before each timestep, so that forces (torques, etc) can be
accumulated.
Now for the ``Verlet::run()`` method. Its basic structure in high-level
pseudocode is shown below. In the actual code in ``src/verlet.cpp``
some of these operations are conditionally invoked.
.. code-block:: python
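
   # condensed sketch; the full pseudocode block is elided from this hunk
   loop over N timesteps:
       ev_set()                      # set energy/virial compute flags
       fix initial_integrate()       # first half of velocity-Verlet update
       if neighbor list rebuild needed:
           fix pre_exchange()
           exchange atoms with neighboring procs
           fix pre_neighbor()
           build neighbor lists
       else:
           communicate ghost atom positions
       fix pre_force()
       compute forces (pair, bond, angle, ..., kspace)
       reverse communicate ghost atom forces
       fix post_force()
       fix final_integrate()         # second half of velocity-Verlet update
       fix end_of_step()
       if output required on this step:
           write output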
@ -105,17 +105,17 @@ need it. These flags are passed to the various methods that compute
particle interactions, so that they either compute and tally the
corresponding data or can skip the extra calculations if the energy and
virial are not needed. See the comments for the ``Integrate::ev_set()``
method, which document the flag values.
At various points of the timestep, fixes are invoked,
e.g. ``fix->initial_integrate()``. In the code, this is actually done
via the Modify class, which stores all the Fix objects and lists of which
should be invoked at what point in the timestep. Fixes are the LAMMPS
mechanism for tailoring the operations of a timestep for a particular
simulation. As described elsewhere, each fix has one or more methods,
each of which is invoked at a specific stage of the timestep, as shown in
the timestep pseudocode. All the active fixes defined in an input
script that are flagged to have an ``initial_integrate()`` method are
invoked at the beginning of each timestep. Examples are :doc:`fix nve
<fix_nve>` or :doc:`fix nvt or fix npt <fix_nh>` which perform the
start-of-timestep velocity-Verlet integration operations to update
@ -131,9 +131,9 @@ can be changed using the :doc:`neigh_modify every/delay/check
<neigh_modify>` command. If not, coordinates of ghost atoms are
acquired by each processor via the ``forward_comm()`` method of the Comm
class. If neighbor lists need to be built, several operations within
the inner if clause of the pseudocode are first invoked. The
``pre_exchange()`` method of any defined fixes is invoked first.
Typically, this inserts or deletes particles from the system.
Periodic boundary conditions are then applied by the Domain class via
its ``pbc()`` method to remap particles that have moved outside the
@ -148,7 +148,7 @@ The box boundaries are then reset (if needed) via the ``reset_box()``
method of the Domain class, e.g. if box boundaries are shrink-wrapped to
current particle coordinates. A change in the box size or shape
requires internal information for communicating ghost atoms (Comm class)
and neighbor list bins (Neighbor class) to be updated. The ``setup()``
method of the Comm class and ``setup_bins()`` method of the Neighbor
class perform the update.
@ -217,20 +217,21 @@ file, and restart files. See the :doc:`thermo_style <thermo_style>`,
:doc:`dump <dump>`, and :doc:`restart <restart>` commands for more
details.
The flow of control during energy minimization iterations is similar to
that of a molecular dynamics timestep. Forces are computed, neighbor
lists are built as needed, atoms migrate to new processors, and atom
coordinates and forces are communicated to neighboring processors. The
only difference is what Fix class operations are invoked when. Only a
subset of LAMMPS fixes are useful during energy minimization, as
explained in their individual doc pages. The relevant Fix class methods
are ``min_pre_exchange()``, ``min_pre_force()``, and
``min_post_force()``. Each fix is invoked at the appropriate place
within the minimization iteration. For example, the
``min_post_force()`` method is analogous to the ``post_force()`` method
for dynamics; it is used to alter or constrain forces on each atom,
which affects the minimization procedure.
After all iterations are completed, there is a ``cleanup`` step which
calls the ``post_run()`` method of fixes to perform operations only required
at the end of a calculation (like freeing temporary storage or creating
final outputs).

View File

@ -70,7 +70,7 @@ A command can define multiple grids, each of a different size. Each
grid is an instantiation of the Grid2d or Grid3d class.
The command also defines what data it will store for each grid it
creates and it allocates the multidimensional array(s) needed to
store the data. No grid cell data is stored within the Grid2d or
Grid3d classes.
@ -115,7 +115,7 @@ which stores *Nvalues* per grid cell.
nyhi_out, nxlo_out, nxhi_out, nvalues,
"data3d_multi");
Note that these multidimensional arrays are allocated as contiguous
chunks of memory where the x-index of the grid varies fastest, then y,
and the z-index slowest. For multiple values per grid cell, the
Nvalues are contiguous, so their index varies even faster than the
@ -798,7 +798,7 @@ A value of -1 is returned if the data name is not recognized.
The *get_griddata_by_index()* method is called after the
*get_griddata_by_name()* method, using the data index it returned as
its argument. This method will return a pointer to the
multidimensional array which stores the requested data.
As in the discussion above of the Memory class *create_offset()*
methods, the dimensionality of the array associated with the returned

View File

@ -15,7 +15,7 @@ for more information about that part of the build process. LAMMPS
currently supports building with :doc:`conventional makefiles
<Build_make>` and through :doc:`CMake <Build_cmake>`. Those procedures
differ in how packages are enabled or disabled for inclusion into a
LAMMPS binary, so they cannot be mixed. The source files for each
package are in all-uppercase subdirectories of the ``src`` folder, for
example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS``
subdirectory is not a package but contains a dummy MPI library that is
@ -26,31 +26,31 @@ with traditional makefiles.
The ``lib`` directory contains the source code for several supporting
libraries or files with configuration settings to use globally installed
libraries that are required by some optional packages. They may
include python scripts that can transparently download additional source
code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``,
contains the source files, some of which are in different languages such
as Fortran or CUDA. These libraries are included in the LAMMPS build if
the corresponding package is installed.
LAMMPS C++ source files almost always come in pairs, such as
``src/run.cpp`` (implementation file) and ``src/run.h`` (header file).
Each pair of files defines a C++ class, for example the
:cpp:class:`LAMMPS_NS::Run` class, which contains the code invoked by
the :doc:`run <run>` command in a LAMMPS input script. As this example
illustrates, source file and class names often have a one-to-one
correspondence with a command used in a LAMMPS input script. Some
source files and classes do not have a corresponding input script
command, for example ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force`
class. They are discussed in the next section.
The names of all source files are in lower case and may use the
underscore character '_' to separate words. Apart from bundled,
externally maintained libraries, which may have different conventions,
all C and C++ header files have a ``.h`` extension, all C++ files have a
``.cpp`` extension, and C files a ``.c`` extension. A few C++ classes
and utility functions are implemented with only a ``.h`` file. Examples
are the Pointers and Commands classes or the MathVec functions.
Class topology
--------------
@ -62,35 +62,36 @@ associated source files in the ``src`` folder, for example the class
:cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp``
and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec`
corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full
lines in the figure represent compositing: that is, the class at the
base of the arrow holds a pointer to an instance of the class at the
tip. Dashed lines instead represent inheritance: the class at the tip
of the arrow is derived from the class at the base. Classes with a red
boundary are not instantiated directly, but they represent the base
classes for "styles". Those "styles" make up the bulk of the LAMMPS
code and only a few representative examples are included in the figure,
so it remains readable.
.. _class-topology:
.. figure:: JPG/lammps-classes.png
LAMMPS class topology
This figure shows relations of base classes of the LAMMPS
simulation package. Full lines indicate that a class holds an
instance of the class it is pointing to; dashed lines point to
derived classes that are given as examples of what classes may be
instantiated during a LAMMPS run based on the input commands and
accessed through the API defined by their respective base classes.
At the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class,
which holds pointers to class instances with specific purposes.
Those may hold instances of other classes, sometimes directly, or
only temporarily, sometimes as derived classes or derived classes
of derived classes, which may also hold instances of other
classes.
The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and
represents what is generally referred to an "instance" of LAMMPS. It is
a composite holding pointers to instances of other core classes
represents what is generally referred to as an "instance of LAMMPS". It
is a composite holding pointers to instances of other core classes
providing the core functionality of the MD engine in LAMMPS and through
them abstractions of the required operations. The constructor of the
LAMMPS class will instantiate those instances, process the command line
@ -102,42 +103,44 @@ LAMMPS while passing it the command line flags and input script. It
deletes the LAMMPS instance after the method reading the input returns
and shuts down the MPI environment before it exits the executable.
The :cpp:class:`LAMMPS_NS::Pointers` class is not shown in the
:ref:`class-topology` figure for clarity. It holds references to many
of the members of the `LAMMPS_NS::LAMMPS`, so that all classes derived
from :cpp:class:`LAMMPS_NS::Pointers` have direct access to those
references. From the class topology, all classes with a blue boundary are
referenced in the Pointers class and all classes in the second and third
columns that are not listed as derived classes are instead derived
from :cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer
references in Pointers, a pointer to the LAMMPS class instance needs to
be passed to the constructor. All constructors for classes derived from
it must do so and thus pass that pointer to the constructor for
:cpp:class:`LAMMPS_NS::Pointers`. The default constructor for
:cpp:class:`LAMMPS_NS::Pointers` is disabled to enforce this.
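
A condensed sketch of that constructor chain (the ``FixExample`` style is
hypothetical; ``Fix`` and ``Pointers`` are the real base classes):

.. code-block:: c++

   // each derived style forwards the LAMMPS instance pointer down the
   // inheritance chain until it reaches the Pointers constructor
   FixExample::FixExample(LAMMPS *lmp, int narg, char **arg) :
       Fix(lmp, narg, arg)  // Fix passes lmp on to Pointers
   {
     // references like atom, comm, force, memory are now usable
   }
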
Since all storage is supposed to be encapsulated (there are a few
exceptions), the LAMMPS class can also be instantiated multiple times by
a calling code. Outside the aforementioned exceptions, those LAMMPS
instances can be used alternately. As of the time of this writing
(early 2023) LAMMPS is not yet sufficiently thread-safe for concurrent
execution. When running in parallel with MPI, care has to be taken
that suitable copies of communicators are used to not create conflicts
between different instances.
The LAMMPS class currently holds instances of 19 classes representing
the core functionality. There are a handful of virtual parent classes
in LAMMPS that define what LAMMPS calls ``styles``. These are shaded
red in the :ref:`class-topology` figure. Each of these is the parent of a
number of child classes that implement the interface defined by the
parent class. There are two main categories of these ``styles``: some
may only have one instance active at a time (e.g. atom, pair, bond,
angle, dihedral, improper, kspace, comm) and there is a dedicated
pointer variable for each of them in the corresponding composite class.
Setups that require a mix of different such styles have to use a
*hybrid* class instance that acts as a proxy, and manages and forwards
calls to the corresponding sub-style class instances for the designated
subset of atoms or data. The composite class may also have lists of
class instances, e.g. ``Modify`` handles lists of compute and fix
styles, while ``Output`` handles a list of dump class instances.
The exceptions to this scheme are the ``command`` style classes. These
implement specific commands that can be invoked before, after, or in
@ -146,19 +149,19 @@ command() method called and then, after completion, the class instance
deleted. Examples for this are the create_box, create_atoms, minimize,
run, set, or velocity command styles.
For all those ``styles``, certain naming conventions are employed: for
the fix nve command the class is called FixNVE and the source files are
``fix_nve.h`` and ``fix_nve.cpp``. Similarly, for fix ave/time we have
FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``. Style names
are lower case and without spaces or special characters. A suffix or
words are appended with a forward slash '/' which denotes a variant of
the corresponding class without the suffix. To connect the style name
and the class name, LAMMPS uses macros like: ``AtomStyle()``,
``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the
corresponding header file. During configuration or compilation, files
with the pattern ``style_<name>.h`` are created that consist of a list
of include statements including all headers of all styles of a given
type that are currently active (or "installed).
type that are currently enabled (or "installed").
More details on individual classes in the :ref:`class-topology` are as
@ -172,8 +175,8 @@ follows:
that one or multiple simulations can be run, on the processors
allocated for a run, e.g. by the mpirun command.
- The Input class reads and processes input (strings and files), stores
variables, and invokes :doc:`commands <Commands_all>`.
- Command style classes are derived from the Command class. They provide
input script commands that perform one-time operations
@ -192,7 +195,7 @@ follows:
- The Atom class stores per-atom properties associated with atom styles.
More precisely, they are allocated and managed by a class derived from
the AtomVec class, and the Atom class simply stores pointers to them.
The classes derived from AtomVec represent the different atom styles,
and they are instantiated through the :doc:`atom_style <atom_style>`
command.
@ -206,18 +209,22 @@ follows:
class stores a single list (for all atoms). A NeighRequest class
instance is created by pair, fix, or compute styles when they need a
particular kind of neighbor list and use the NeighRequest properties
to select the neighbor list settings for the given request. There can
be multiple instances of the NeighRequest class. The Neighbor class
will try to optimize how the requests are processed. Depending on the
NeighRequest properties, neighbor lists are constructed from scratch,
aliased, or constructed by post-processing an existing list into
sub-lists.
- The Comm class performs inter-processor communication, typically of
ghost atom information. This usually involves MPI message exchanges
with 6 neighboring processors in the 3d logical grid of processors
mapped to the simulation box. There are two :doc:`communication styles
<comm_style>`, enabling different ways to perform the domain
decomposition.
- The Irregular class is used when atoms may migrate to arbitrary
processors.
- The Domain class stores the simulation box geometry, as well as
geometric Regions and any user definition of a Lattice. The latter
@ -246,7 +253,7 @@ follows:
file, dump file snapshots, and restart files. These correspond to the
:doc:`Thermo <thermo_style>`, :doc:`Dump <dump>`, and
:doc:`WriteRestart <write_restart>` classes respectively. The Dump
class is a base class, with several derived classes implementing
various dump style variants.
- The Timer class logs timing information, output at the end

View File

@ -1,12 +1,12 @@
Communication
^^^^^^^^^^^^^
Following the selected partitioning scheme, all per-atom data is
distributed across the MPI processes, which allows LAMMPS to handle very
large systems provided it uses a correspondingly large number of MPI
processes. The per-atom data includes atom IDs, positions, velocities,
types, etc. To be able to compute the short-range interactions, MPI
processes need not only access to the data of atoms they "own" but also
information about atoms from neighboring subdomains, in LAMMPS referred
to as "ghost" atoms. These are copies of atoms storing required
per-atom data for up to the communication cutoff distance. The green
@ -59,7 +59,7 @@ and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
include ghost atoms they received in the *x* stage, but only if they
are needed by rank 0 to fill its extended ghost atom regions in the
+/-*y* directions (blue rectangles). Thus, in this case, ranks 5 and
6 do not include ghost atoms they received from each other (in the *x*
stage) in the atoms they send to rank 0. The key point is that while
the pattern of communication is more complex in the irregular
@ -78,14 +78,14 @@ A "reverse" communication is when computed ghost atom attributes are
sent back to the processor who owns the atom. This is used (for
example) to sum partial forces on ghost atoms to the complete force on
owned atoms. The order of the two stages described in the
:ref:`ghost-atom-comm` figure is inverted, and the same lists of atoms
are used to pack and unpack message buffers with per-atom forces. When
a received buffer is unpacked, the ghost forces are summed to owned atom
forces. As in forward communication, forces on atoms in the four blue
corners of the diagrams are sent, received, and summed twice (once at
each stage) before owning processors have the full force.
These two operations are used in many places within LAMMPS aside from
exchange of coordinates and forces, for example by manybody potentials
to share intermediate per-atom values, or by rigid-body integrators to
enable each atom in a body to access body properties. Here are
@ -105,7 +105,7 @@ performed in LAMMPS:
atom pairs when building neighbor lists or computing forces.
- The cutoff distance for exchanging ghost atoms is typically equal to
the neighbor cutoff. But it can also be set to a larger value if needed,
e.g. half the diameter of a rigid body composed of multiple atoms or
over 3x the length of a stretched bond for dihedral interactions. It
can also exceed the periodic box size. For the regular communication
@ -113,7 +113,7 @@ performed in LAMMPS:
processor's subdomain, then multiple exchanges are performed in the
same direction. Each exchange is with the same neighbor processor,
but buffers are packed/unpacked using a different list of atoms. For
forward communication, in the first exchange, a processor sends only
owned atoms. In subsequent exchanges, it sends ghost atoms received
in previous exchanges. For the irregular pattern (right) overlaps of
a processor's extended ghost-atom subdomain with all other processors

View File

@ -40,28 +40,28 @@ orthogonal boxes.
.. _fft-parallel:
.. figure:: img/fft-decomp-parallel.png
:align: center
Parallel FFT in PPPM
Stages of a parallel FFT for a simulation domain overlaid with an
8x8x8 3d FFT grid, partitioned across 64 processors. Within each
of the 4 diagrams, grid cells of the same color are owned by a
single processor; for simplicity, only cells owned by 4 or 8 of
the 64 processors are colored. The two images on the left
illustrate brick-to-pencil communication. The two images on the
right illustrate pencil-to-pencil communication, which in this
case transposes the *y* and *z* dimensions of the grid.
Parallel 3d FFTs require substantial communication relative to their
computational cost. A 3d FFT is implemented by a series of 1d FFTs
along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus, the
FFT grid cannot be decomposed like atoms into 3 dimensions for parallel
processing of the FFTs, but only in 1 (as planes) or 2 (as pencils)
dimensions. In between the steps, the grid needs to be transposed so
that the FFT grid portion "owned" by each MPI process is complete in the
direction of the 1d FFTs it has to perform. LAMMPS uses the
pencil-decomposition algorithm as shown in the :ref:`fft-parallel`
figure.
Initially (far left), each processor owns a brick of same-color grid
cells (actually grid points) contained within its subdomain. A
@ -97,7 +97,7 @@ across all $P$ processors with a single call to ``MPI_Alltoall()``, but
this is typically much slower. However, for the specialized brick and
pencil tiling illustrated in the :ref:`fft-parallel` figure, collective
communication across the entire MPI communicator is not required. In
the example, an :math:`8^3` grid with 512 grid cells is partitioned
across 64 processors; each processor owns a 2x2x2 3d brick of grid
cells. The initial brick-to-pencil communication (upper left to upper
right) only requires collective communication within subgroups of 4
@ -132,7 +132,7 @@ grid/particle operations that LAMMPS supports:
- The fftMPI library allows each grid dimension to be a multiple of
small prime factors (2,3,5), and allows any number of processors to
perform the FFT. The resulting brick and pencil decompositions are
thus not always as well-aligned, but the size of subgroups of
processors for the two modes of communication (brick/pencil and
pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and
:math:`O(P^{\frac{1}{2}})`.
@ -143,21 +143,23 @@ grid/particle operations that LAMMPS supports:
in memory. This reordering can be done during the packing or
unpacking of buffers for MPI communication.
- For large systems and particularly many MPI processes, the dominant
cost for parallel FFTs is often the communication, not the computation
of 1d FFTs, even though the latter scales as :math:`N \log(N)` in the
number of grid points *N* per grid direction. This is due to the fact
that only a 2d decomposition into pencils is possible while atom data
(and their corresponding short-range force and energy computations)
can be decomposed efficiently in 3d.
Reducing the number of MPI processes involved in the MPI communication
will reduce this kind of overhead. By using a :doc:`hybrid MPI +
OpenMP parallelization <Speed_omp>` it is still possible to use all
processes for parallel computation. It will use OpenMP
parallelization inside the MPI domains. While that may have a lower
parallel efficiency for some parts of the computation, the loss can be
outweighed by the savings in communication overhead for the 3d FFTs.
As an alternative, it is also possible to start a :ref:`multi-partition
<partition>` calculation and then use the :doc:`verlet/split
integrator <run_style>` to perform the PPPM computation on a
dedicated, separate partition of MPI processes. This uses an integer
@ -175,7 +177,7 @@ grid/particle operations that LAMMPS supports:
manner consistent with processor subdomains, and provides methods for
forward and reverse communication of owned and ghost grid point
values. It is used for PPPM as an FFT grid (as outlined above) and
also for the MSM algorithm which uses a cascade of grid sizes from
also for the MSM algorithm, which uses a cascade of grid sizes from
fine to coarse to compute long-range Coulombic forces. The GridComm
class is also useful for models where continuum fields interact with
particles. For example, the two-temperature model (TTM) defines heat

View File

@ -3,12 +3,12 @@ Neighbor lists
To compute forces efficiently, each processor creates a Verlet-style
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
*j* = owned or ghost) with separation less than the applicable
neighbor list cutoff distance. In LAMMPS the neighbor lists are stored
in a multiple-page data structure; each page is a contiguous chunk of
memory which stores vectors of neighbor atoms *j* for many *i* atoms.
This allows pages to be incrementally allocated or deallocated in blocks
as needed. Neighbor lists typically consume the most memory of any data
*j* = owned or ghost) with separation less than the applicable neighbor
list cutoff distance. In LAMMPS, the neighbor lists are stored in a
multiple-page data structure; each page is a contiguous chunk of memory
which stores vectors of neighbor atoms *j* for many *i* atoms. This
allows pages to be incrementally allocated or deallocated in blocks as
needed. Neighbor lists typically consume the most memory of any data
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
every few timesteps, then used repeatedly each step for force or other
computations. The neighbor cutoff distance is :math:`R_n = R_f +
@ -16,7 +16,7 @@ computations. The neighbor cutoff distance is :math:`R_n = R_f +
the interatomic potential for computing short-range pairwise or manybody
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
be used for multiple steps assuming that atoms do not move very far
between consecutive time steps. Typically the code triggers
between consecutive time steps. Typically, the code triggers
reneighboring when any atom has moved half the skin distance since the
last reneighboring; this and other options of the neighbor list rebuild
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
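A minimal sketch of this trigger check is given below; the names and
the data layout are illustrative assumptions, not the actual
``Neighbor`` class code:

.. code-block:: c++

   // Rebuild check: has any owned atom moved more than half the skin
   // distance since the last neighbor list build?
   bool needs_rebuild(const double (*x)[3],      // current positions
                      const double (*xhold)[3],  // positions at last build
                      int nlocal, double skin)
   {
     const double trigsq = 0.25 * skin * skin;   // (skin/2)^2
     for (int i = 0; i < nlocal; ++i) {
       const double dx = x[i][0] - xhold[i][0];
       const double dy = x[i][1] - xhold[i][1];
       const double dz = x[i][2] - xhold[i][2];
       if (dx * dx + dy * dy + dz * dz > trigsq) return true;
     }
     return false;
   }

In a parallel run, the per-process result must also be combined across
all MPI ranks so that every process rebuilds its list on the same step.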
@ -26,10 +26,10 @@ their owning processor's subdomain are first migrated to new processors
via communication. Periodic boundary conditions are also (only)
enforced on these steps to ensure each atom is re-assigned to the
correct processor. After migration, the atoms owned by each processor
are stored in a contiguous vector. Periodically each processor
are stored in a contiguous vector. Periodically, each processor
spatially sorts owned atoms within its vector to reorder it for improved
cache efficiency in force computations and neighbor list building. For
that atoms are spatially binned and then reordered so that atoms in the
that, atoms are spatially binned and then reordered so that atoms in the
same bin are adjacent in the vector. Atom sorting can be disabled or
its settings modified with the :doc:`atom_modify <atom_modify>` command.
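A minimal sketch of such a reordering is a counting sort over
precomputed per-atom bin ids; this illustrates the idea only and is not
the actual implementation:

.. code-block:: c++

   #include <vector>

   // Compute a permutation that places atoms of the same bin next to
   // each other; per-atom arrays (x, v, type, ...) are then permuted
   // accordingly. Illustrative sketch only.
   void spatial_sort_order(std::vector<int> &perm,        // new order (out)
                           const std::vector<int> &binid, // bin id per atom
                           int nbins)
   {
     const int natoms = (int) binid.size();
     std::vector<int> offset(nbins + 1, 0);
     for (int i = 0; i < natoms; ++i) ++offset[binid[i] + 1];
     for (int b = 0; b < nbins; ++b) offset[b + 1] += offset[b];
     perm.assign(natoms, 0);
     for (int i = 0; i < natoms; ++i) perm[offset[binid[i]]++] = i;
     // perm[k] is the index of the atom to be placed at slot k
   }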
@ -44,7 +44,7 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
(left) and triclinic (right) domains. A regular grid of neighbor
bins (thin lines) overlays the entire simulation domain and need not
align with subdomain boundaries; only the portion overlapping the
augmented subdomain is shown. In the triclinic case it overlaps the
augmented subdomain is shown. In the triclinic case, it overlaps the
bounding box of the tilted rectangle. The blue- and red-shaded bins
represent a stencil of bins searched to find neighbors of a particular
atom (black dot).
@ -53,12 +53,12 @@ To build a local neighbor list in linear time, the simulation domain is
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
single MPI processor's subdomain. Each processor stores a set of
neighbor bins which overlap its subdomain extended by the neighbor
neighbor bins which overlap its subdomain, extended by the neighbor
cutoff distance :math:`R_n`. As illustrated, the bins need not align
with processor boundaries; an integer number of bins in each dimension
is fit to the size of the entire simulation box.
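A minimal sketch of this bin setup and of the mapping from a coordinate
to a bin index in one dimension is given below (the names are
illustrative assumptions, not the actual implementation):

.. code-block:: c++

   // Fit an integer number of bins to the box along one dimension and
   // map a coordinate to its bin index; y and z are handled the same.
   struct Bins1d {
     int nbin;          // number of bins along this dimension
     double lo, inv;    // box lower bound, inverse bin size
   };

   Bins1d setup_bins(double boxlo, double boxhi, double target_size)
   {
     Bins1d b;
     b.lo = boxlo;
     b.nbin = (int) ((boxhi - boxlo) / target_size);
     if (b.nbin < 1) b.nbin = 1;           // at least one bin
     b.inv = b.nbin / (boxhi - boxlo);     // inverse of the actual size
     return b;
   }

   int coord2bin(const Bins1d &b, double x)
   {
     return (int) ((x - b.lo) * b.inv);    // ghost atoms outside the box
   }                                       // need extra range handling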
Most often LAMMPS builds what it calls a "half" neighbor list where
Most often, LAMMPS builds what is called a "half" neighbor list where
each *i,j* neighbor pair is stored only once, with either atom *i* or
*j* as the central atom. The build can be done efficiently by using a
pre-computed "stencil" of bins around a central origin bin which
@ -67,18 +67,18 @@ is simply a list of integer offsets in *x,y,z* of nearby bins
surrounding the origin bin which are close enough to contain any
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
origin bin. Note that for a half neighbor list, the stencil can be
asymmetric since each atom only need store half its nearby neighbors.
asymmetric, since each atom need only store half its nearby neighbors.
These stencils are illustrated in the figure for a half list and a bin
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
63: 13 in the plane of bins that contains the origin bin and 25 in each
of the two planes above it in the *z* direction (75 for triclinic). The
reason the triclinic stencil has extra bins is because the bins tile the
bounding box of the entire triclinic domain and thus are not periodic
with respect to the simulation box itself. The stencil and logic for
determining which *i,j* pairs to include in the neighbor list are
altered slightly to account for this.
triclinic stencil has extra bins because the bins tile the bounding box
of the entire triclinic domain, and thus are not periodic with respect
to the simulation box itself. The stencil and logic for determining
which *i,j* pairs to include in the neighbor list are altered slightly
to account for this.
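The following sketch shows how such a half-list stencil could be
constructed; the function and variable names are assumptions for
illustration, not the actual LAMMPS code:

.. code-block:: c++

   #include <vector>

   struct Offset { int i, j, k; };

   // Build a "half" stencil of bin offsets around the origin bin.
   // bin_dist_sq(i,j,k) must return the squared minimum distance
   // between the origin bin and the bin offset by (i,j,k).
   std::vector<Offset> half_stencil_3d(int sx, int sy, int sz, // max offsets
                                       double cutsq,           // Rn*Rn
                                       double (*bin_dist_sq)(int, int, int))
   {
     std::vector<Offset> stencil;
     for (int k = 0; k <= sz; ++k)        // half list: only k >= 0
       for (int j = -sy; j <= sy; ++j)
         for (int i = -sx; i <= sx; ++i) {
           // in the k == 0 plane keep only the "upper" half of the bins
           if (k == 0 && (j < 0 || (j == 0 && i < 0))) continue;
           if (bin_dist_sq(i, j, k) < cutsq) stencil.push_back({i, j, k});
         }
     return stencil;
   }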
To build a neighbor list, a processor first loops over its "owned" plus
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
@ -95,7 +95,7 @@ supports:
been found to be optimal for many typical cases. Smaller bins incur
additional overhead to loop over; larger bins require more distance
calculations. Note that for smaller bin sizes, the 2d stencil in the
figure would be more semi-circular in shape (hemispherical in 3d),
figure would be of a more semicircular shape (hemispherical in 3d),
with bins near the corners of the square eliminated due to their
distance from the origin bin.
@ -111,8 +111,8 @@ supports:
symmetric stencil. It also includes lists with partial enumeration of
ghost atom neighbors. The full and ghost-atom lists are used by
various manybody interatomic potentials. Lists may also use different
criteria for inclusion of a pair interaction. Typically this simply
depends only on the distance between two atoms and the cutoff
  criteria for inclusion of a pairwise interaction. Typically, this
  depends only on the distance between two atoms and the cutoff
distance. But for finite-size coarse-grained particles with
individual diameters (e.g. polydisperse granular particles), it can
also depend on the diameters of the two particles.
@ -121,11 +121,11 @@ supports:
of the master neighbor list for the full system need to be generated,
one for each sub-style, which contains only the *i,j* pairs needed to
compute interactions between subsets of atoms for the corresponding
potential. This means not all *i* or *j* atoms owned by a processor
  potential. This means that not all *i* or *j* atoms owned by a processor
are included in a particular sub-list.
- Some models use different cutoff lengths for pairwise interactions
between different kinds of particles which are stored in a single
  between different kinds of particles, all stored in a single
neighbor list. One example is a solvated colloidal system with large
colloidal particles where colloid/colloid, colloid/solvent, and
solvent/solvent interaction cutoffs can be dramatically different.
@ -153,7 +153,7 @@ supports:
  For the newton pair *on* setting, atom *j* is only added to the list
  if its *z* coordinate is larger; if the *z* coordinates are equal, the
  *y* coordinate must be larger; and if those are equal too, the *x*
  coordinate must be larger. For
homogeneously dense systems that will result in picking neighbors from
homogeneously dense systems, that will result in picking neighbors from
  a sector of the same size, always in the same direction relative to the
"owned" atom and thus it should lead to similar length neighbor lists
and thus reduce the chance of a load imbalance.
"owned" atom, and thus it should lead to similar length neighbor lists
and reduce the chance of a load imbalance.
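  A sketch of this tie-break as a simple predicate (illustrative only;
  the actual test is embedded in the neighbor list build):

  .. code-block:: c++

     // Decide whether ghost atom j (position xj) is stored as a
     // neighbor of owned atom i (position xi) with "newton pair on".
     bool store_pair(const double *xi, const double *xj)
     {
       if (xj[2] > xi[2]) return true;   // z larger: keep the pair
       if (xj[2] < xi[2]) return false;
       if (xj[1] > xi[1]) return true;   // z equal: compare y
       if (xj[1] < xi[1]) return false;
       return xj[0] > xi[0];             // z and y equal: compare x
     }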

View File

@ -6,7 +6,7 @@ thread parallelism to predominantly distribute loops over local data
and thus follow an orthogonal parallelization strategy to the
decomposition into spatial domains used by the :doc:`MPI partitioning
<Developer_par_part>`. For clarity, this section discusses only the
implementation in the OPENMP package as it is the simplest. The INTEL
implementation in the OPENMP package, as it is the simplest. The INTEL
and KOKKOS packages offer additional options and are more complex,
since they support more features and different hardware like
co-processors or GPUs.
@ -14,7 +14,7 @@ or GPUs.
One of the key decisions when implementing the OPENMP package was to
keep the changes to the source code small, so that it would be easier to
maintain the code and keep it in sync with the non-threaded standard
implementation. this is achieved by a) making the OPENMP version a
implementation. This is achieved by a) making the OPENMP version a
derived class from the regular version (e.g. ``PairLJCutOMP`` from
``PairLJCut``) and overriding only methods that are multi-threaded or
need to be modified to support multi-threading (similar to what was done
@ -26,13 +26,13 @@ into three separate classes ``ThrOMP``, ``ThrData``, and ``FixOMP``.
available in the corresponding base class (e.g. ``Pair`` for
``PairLJCutOMP``) like multi-thread aware variants of the "tally"
functions. Those functions are made available through multiple
inheritance so those new functions have to have unique names to avoid
inheritance, so those new functions have to have unique names to avoid
ambiguities; typically ``_thr`` is appended to the name of the function.
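The resulting class pattern looks roughly like the following simplified
sketch (condensed from the actual headers; details may differ):

.. code-block:: c++

   #include "pair_lj_cut.h"   // regular, non-threaded pair style
   #include "thr_omp.h"       // multi-thread aware "_thr" helpers

   namespace LAMMPS_NS {

   class PairLJCutOMP : public PairLJCut, public ThrOMP {
    public:
     PairLJCutOMP(class LAMMPS *lmp) : PairLJCut(lmp), ThrOMP(lmp, THR_PAIR) {}

     // override only the methods that need to be multi-threaded
     void compute(int eflag, int vflag) override;
   };

   }   // namespace LAMMPS_NS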
``ThrData`` is a classes that manages per-thread data structures.
It is used instead of extending the corresponding storage to per-thread
arrays to avoid slowdowns due to "false sharing" when multiple threads
update adjacent elements in an array and thus force the CPU cache lines
to be reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
``ThrData`` is a class that manages per-thread data structures. It is
used instead of extending the corresponding storage to per-thread arrays
to avoid slowdowns due to "false sharing" when multiple threads update
adjacent elements in an array and thus force the CPU cache lines to be
reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
state" like settings and access to per-thread storage, it is activated
by the :doc:`package omp <package>` command.
@ -46,24 +46,24 @@ involve multiple atoms and thus there are race conditions when multiple
threads want to update per-atom data of the same atoms. Five possible
strategies have been considered to avoid this:
1) restructure the code so that there is no overlapping access possible
1. Restructure the code so that there is no overlapping access possible
when computing in parallel, e.g. by breaking lists into multiple
parts and synchronizing threads in between.
2) have each thread be "responsible" for a specific group of atoms and
2. Have each thread be "responsible" for a specific group of atoms and
compute these interactions multiple times, once on each thread that
is responsible for a given atom and then have each thread only update
is responsible for a given atom, and then have each thread only update
the properties of this atom.
3) use mutexes around functions and regions of code where the data race
could happen
4) use atomic operations when updating per-atom properties
5) use replicated per-thread data structures to accumulate data without
3. Use mutexes around functions and regions of code where the data race
could happen.
4. Use atomic operations when updating per-atom properties.
5. Use replicated per-thread data structures to accumulate data without
conflicts and then use a reduction to combine those results into the
data structures used by the regular style.
Option 5 was chosen for the OPENMP package because it would retain the
performance for the case of 1 thread and the code would be more
performance for the case of a single thread and the code would be more
maintainable. Option 1 would require extensive code changes,
particularly to the neighbor list code; options 2 would have incurred a
particularly to the neighbor list code; option 2 would have incurred a
2x or more performance penalty for the serial case; option 3 causes
significant overhead and would enforce serialization of operations in
inner loops and thus defeat the purpose of multi-threading; option 4
@ -80,7 +80,7 @@ equivalent to the number of CPU cores per CPU socket on high-end
supercomputers.
Thus arrays like the force array are dimensioned to the number of atoms
times the number of threads when enabling OpenMP support and inside the
times the number of threads when enabling OpenMP support, and inside the
compute functions a pointer to a different chunk is obtained by each thread.
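A minimal sketch of this replicated-array strategy (option 5 in the
list above) is shown below; the names and the data layout are
illustrative assumptions, not the actual ``ThrData``/``ThrOMP`` code:

.. code-block:: c++

   #include <omp.h>

   // Each thread accumulates forces into its own chunk of a replicated
   // force array; afterwards the chunks are reduced into the regular
   // force array f.
   void compute_forces(double **f_thr,  // one force chunk per thread
                       double *f,       // regular force array (3*nall)
                       int nall, int nthreads)
   {
     #pragma omp parallel
     {
       double *my_f = f_thr[omp_get_thread_num()];
       for (int m = 0; m < 3 * nall; ++m) my_f[m] = 0.0;

       // ... force computation writes only into my_f ...

       #pragma omp barrier                 // all chunks complete
       #pragma omp for                     // parallel reduction
       for (int m = 0; m < 3 * nall; ++m)
         for (int t = 0; t < nthreads; ++t) f[m] += f_thr[t][m];
     }
   }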
Similarly, accumulators like potential energy or virial are kept in
per-thread instances of the ``ThrData`` class and then only reduced and
@ -91,7 +91,7 @@ Loop scheduling
"""""""""""""""
Multi-thread parallelization is applied by distributing (outer) loops
statically across threads. Typically this would be the loop over local
statically across threads. Typically, this would be the loop over local
atoms *i* when processing *i,j* pairs of atoms from a neighbor list.
The design of the neighbor list code results in atoms having a similar
number of neighbors for homogeneous systems and thus load imbalances

View File

@ -7,36 +7,36 @@ distributed-memory parallelism is set with the :doc:`comm_style command
.. _domain-decomposition:
.. figure:: img/domain-decomp.png
:align: center
domain decomposition
Domain decomposition schemes
This figure shows the different kinds of domain decomposition used
for MPI parallelization: "brick" on the left with an orthogonal
(left) and a triclinic (middle) simulation domain, and a "tiled"
decomposition (right). The black lines show the division into
subdomains and the contained atoms are "owned" by the corresponding
MPI process. The green dashed lines indicate how subdomains are
extended with "ghost" atoms up to the communication cutoff distance.
This figure shows the different kinds of domain decomposition used
for MPI parallelization: "brick" on the left with an orthogonal
(left) and a triclinic (middle) simulation domain, and a "tiled"
decomposition (right). The black lines show the division into
subdomains, and the contained atoms are "owned" by the
corresponding MPI process. The green dashed lines indicate how
subdomains are extended with "ghost" atoms up to the communication
cutoff distance.
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
figure for the 2d case. Orthogonal means the box edges are aligned with
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
rectangular. Triclinic allows for a more general parallelepiped shape
in which edges are aligned with three arbitrary vectors and the box
faces are parallelograms. In each dimension box faces can be periodic,
or non-periodic with fixed or shrink-wrapped boundaries. In the fixed
case, atoms which move outside the face are deleted; shrink-wrapped
means the position of the box face adjusts continuously to enclose all
the atoms.
The LAMMPS simulation box is a 3d or 2d volume, which can be of
orthogonal or triclinic shape, as illustrated in the
:ref:`domain-decomposition` figure for the 2d case. Orthogonal means
the box edges are aligned with the *x*, *y*, *z* Cartesian axes, and the
box faces are thus all rectangular. Triclinic allows for a more general
parallelepiped shape in which edges are aligned with three arbitrary
vectors and the box faces are parallelograms. In each dimension, box
faces can be periodic, or non-periodic with fixed or shrink-wrapped
boundaries. In the fixed case, atoms which move outside the face are
deleted; shrink-wrapped means the position of the box face adjusts
continuously to enclose all the atoms.
For distributed-memory MPI parallelism, the simulation box is spatially
decomposed (partitioned) into non-overlapping subdomains which fill the
box. The default partitioning, "brick", is most suitable when atom
density is roughly uniform, as shown in the left-side images of the
:ref:`domain-decomposition` figure. The subdomains comprise a regular
grid and all subdomains are identical in size and shape. Both the
grid, and all subdomains are identical in size and shape. Both the
orthogonal and triclinic boxes can deform continuously during a
simulation, e.g. to compress a solid or shear a liquid, in which case
the processor subdomains likewise deform.
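For the regular "brick" partitioning, assigning an atom to its owning
processor reduces to simple arithmetic in each dimension, as in this
illustrative sketch (the names are assumptions, not the actual code):

.. code-block:: c++

   // Map an atom's x coordinate to the processor index along the x
   // dimension of a regular processor grid; y and z work the same.
   int owner_x(double x, double boxlo, double boxhi, int procgrid_x)
   {
     double frac = (x - boxlo) / (boxhi - boxlo);   // 0.0 <= frac < 1.0
     int ipx = (int) (frac * procgrid_x);
     if (ipx >= procgrid_x) ipx = procgrid_x - 1;   // guard the upper edge
     return ipx;
   }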
@ -76,14 +76,14 @@ the load imbalance:
The pictures above demonstrate different decompositions for a 2d system
with 12 MPI ranks. The atom colors indicate the load imbalance of each
subdomain with green being optimal and red the least optimal.
subdomain, with green being optimal and red the least optimal.
Due to the vacuum in the system, the default decomposition is unbalanced
with several MPI ranks without atoms (left). By forcing a 1x12x1
processor grid, every MPI rank does computations now, but number of
atoms per subdomain is still uneven and the thin slice shape increases
the amount of communication between subdomains (center left). With a
2x6x1 processor grid and shifting the subdomain divisions, the load
imbalance is further reduced and the amount of communication required
between subdomains is less (center right). And using the recursive
bisectioning leads to further improved decomposition (right).
Due to the vacuum in the system, the default decomposition is
unbalanced, with several MPI ranks without atoms (left). By forcing a
1x12x1 processor grid, every MPI rank now performs computations, but the
number of atoms per subdomain is still uneven, and the thin slice shape
increases the amount of communication between subdomains (center
left). With a 2x6x1 processor grid and shifting the subdomain divisions,
the load imbalance is further reduced and the amount of communication
required between subdomains is smaller (center right). Finally, using
recursive bisectioning leads to a further improved decomposition (right).

View File

@ -136,7 +136,7 @@ The LAMMPS Python module enables calling the LAMMPS C library API from
Python by dynamically loading functions in the LAMMPS shared library through
the `Python ctypes module <https://docs.python.org/3/library/ctypes.html>`_.
Because of the dynamic loading, it is **required** that LAMMPS is compiled
in :ref:`"shared" mode <exe>`. The Python interface is object oriented, but
in :ref:`"shared" mode <exe>`. The Python interface is object-oriented, but
otherwise tries to be very similar to the C library API. Three different
Python classes to run LAMMPS are available, and they build on each other.
More information on this is in the :doc:`Python_head`
@ -152,7 +152,7 @@ LAMMPS Fortran API
The LAMMPS Fortran module is a wrapper around calling functions from the
LAMMPS C library API. This is done using the ISO_C_BINDING feature in
Fortran 2003. The interface is object oriented but otherwise tries to
Fortran 2003. The interface is object-oriented but otherwise tries to
be very similar to the C library API and the basic Python module.
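Both wrapper layers ultimately call the same C functions. A minimal
C++ example of using that underlying C library API directly is shown
below; the function names follow recent versions of ``src/library.h``,
but details may differ between LAMMPS versions:

.. code-block:: c++

   #include "library.h"   // LAMMPS C library API
   #include <cstdio>

   int main()
   {
     void *handle = lammps_open_no_mpi(0, nullptr, nullptr);
     lammps_command(handle, "units lj");
     lammps_command(handle, "region box block 0 10 0 10 0 10");
     lammps_command(handle, "create_box 1 box");
     printf("natoms = %g\n", lammps_get_natoms(handle));
     lammps_close(handle);
     return 0;
   }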
.. toctree::

View File

@ -1071,7 +1071,7 @@ getting started, but not as a fully tested and supported feature of the
LAMMPS distribution. Any contributions to complete this are, of course,
welcome. Please also note that, for the case of creating a Python wrapper,
a fully supported :doc:`Ctypes based lammps module <Python_module>`
already exists. That module is designed to be object oriented while
already exists. That module is designed to be object-oriented while
SWIG will generate a 1:1 translation of the functions in the interface file.
Building the wrapper

View File

@ -730,18 +730,18 @@ is because there can only be one fix which monitors the global
pressure and changes the simulation box dimensions. So you have three
choices:
* Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
*dilate* all option so that it will dilate the positions of the
non-rigid particles as well. Use :doc:`fix nvt <fix_nh>` (or any
other thermostat) for the non-rigid particles.
* Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
the *dilate* all option so that it will dilate the center-of-mass
positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT
rigid styles for the rigid bodies.
* Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT
rigid styles for the rigid bodies. Use :doc:`fix nvt <fix_nh>` (or
any other thermostat) for the non-rigid particles.
#. Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
*dilate* all option so that it will dilate the positions of the
non-rigid particles as well. Use :doc:`fix nvt <fix_nh>` (or any
other thermostat) for the non-rigid particles.
#. Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
the *dilate* all option so that it will dilate the center-of-mass
positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT
rigid styles for the rigid bodies.
#. Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT
rigid styles for the rigid bodies. Use :doc:`fix nvt <fix_nh>` (or
any other thermostat) for the non-rigid particles.
In all cases, the rigid bodies and non-rigid particles both contribute
to the global pressure and the box is scaled the same by any of the

View File

@ -166,7 +166,7 @@ events, all the other replicas also run dynamics and event checking
with the same schedule, but the final states are always overwritten by
the state of the event replica.
The outer loop of the pseudo-code above continues until *N* steps of
The outer loop of the pseudocode above continues until *N* steps of
dynamics have been performed. Note that *N* only includes the
dynamics of stages 2 and 3, not the steps taken during dephasing or
the minimization iterations of quenching. The specified *N* is

View File

@ -94,7 +94,7 @@ class Pointers {
python(ptr->python) {}
virtual ~Pointers() = default;
// remove default members execept for the copy constructor
// remove other default members
Pointers() = delete;
Pointers(const Pointers &) = default;
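// For illustration, the same special-member idiom in a self-contained
// form (a hypothetical class, not LAMMPS code): explicitly deleting
// the members that must not be used makes misuse fail at compile time,
// while the copy constructor remains available.
class Example {
 public:
  Example(int id) : id_(id) {}
  Example(const Example &) = default;            // copying is allowed
  Example() = delete;                            // no default construction
  Example &operator=(const Example &) = delete;  // no assignment
 private:
  int id_;
};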