starting grammar, punctuation, and spelling review for developer info sections

This commit is contained in:
Axel Kohlmeyer
2023-02-04 18:03:30 -05:00
parent a0a7e76cc3
commit d1550bf9f6
15 changed files with 366 additions and 357 deletions


@ -1,52 +1,53 @@
Code design
-----------

This section explains some code design choices in LAMMPS with the goal
of helping developers write new code similar to the existing code.
Please see the section on :doc:`Requirements for contributed code
<Modify_style>` for more specific recommendations and guidelines. While
that section is organized more in the form of a checklist for code
contributors, the focus here is on overall code design strategy, choices
made between possible alternatives, and discussing some relevant C++
programming language constructs.

Historically, the basic design philosophy of the LAMMPS C++ code was a
"C with classes" style. The motivation was to make it easy to modify
LAMMPS for people without significant training in C++ programming. Data
structures and code constructs were used that resemble the previous
implementation(s) in Fortran. A contributing factor to this choice was
that at the time, C++ compilers were often not mature and some advanced
features contained bugs or did not function as the standard required.
There were also disagreements between compiler vendors as to how to
interpret the C++ standard documents.

However, C++ compilers and the C++ programming language have advanced
significantly. In 2020, the LAMMPS developers decided to require the
C++11 standard as the minimum C++ language standard for LAMMPS. Since
then, we have begun to replace C-style constructs with equivalent C++
functionality. This was taken either from the C++ standard library or
implemented as custom classes or functions. The goal is to improve
readability of the code and to increase code reuse through abstraction
of commonly used functionality.

.. note::

   Please note that as of spring 2023 there is still a sizable chunk of
   legacy code in LAMMPS that has not yet been refactored to reflect
   these style conventions in full. LAMMPS has a large code base and
   many contributors. There is also a hierarchy of precedence in which
   the code is adapted. Highest priority has been the code in the
   ``src`` folder, followed by code in packages in order of their
   popularity and complexity (simpler code gets adapted sooner),
   followed by code in the ``lib`` folder. Source code that is
   downloaded from external packages or libraries during compilation is
   not subject to the conventions discussed here.

Object-oriented code
^^^^^^^^^^^^^^^^^^^^

LAMMPS is designed to be an object-oriented code. Each simulation is
represented by an instance of the LAMMPS class. When running in
parallel, each MPI process creates such an instance. This can be seen
in the ``main.cpp`` file where the core steps of running a LAMMPS
simulation are the following 3 lines of code:
@ -67,14 +68,14 @@ other special features.
The basic LAMMPS class hierarchy which is created by the LAMMPS class
constructor is shown in :ref:`class-topology`. When input commands
are processed, additional class instances are created, or deleted, or
replaced. Likewise, specific member functions of specific classes are
called to trigger actions such as creating atoms, computing forces,
computing properties, time-propagating the system, or writing output.

Compositing and Inheritance
===========================

LAMMPS makes extensive use of the object-oriented programming (OOP)
principles of *compositing* and *inheritance*. Classes like the
``LAMMPS`` class are a **composite** containing pointers to instances
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the
simulation and providing member functions that trigger certain
actions. Some of those classes like ``Force`` are themselves
composites, containing instances of classes describing different force
interactions. Similarly, the ``Modify`` class contains a list of
``Fix`` and ``Compute`` classes. If the input commands that
correspond to these classes include the word *style*, then LAMMPS
stores only a single instance of that class. E.g. *atom_style*,
@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived
classes are often referred to as "styles", e.g. pair styles, fix
styles, atom styles and so on.

This is the origin of the flexibility of LAMMPS. For example, pair
styles implement a variety of different non-bonded interatomic
potential functions. All details for the implementation of a
potential are stored and executed in a single class.

As mentioned above, there can be multiple instances of classes derived
from the ``Fix`` or ``Compute`` base classes. They represent a
different facet of LAMMPS' flexibility, as they provide methods which
can be called at different points within a timestep, as explained in
`Developer_flow`. This allows the input script to tailor how a specific
simulation is run, what diagnostic computations are performed, and how
the output of those computations is further processed or output.

Additional code sharing is possible by creating derived classes from the
derived classes (e.g., to implement an accelerated version of a pair
@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member
functions is which of the two member functions is called when executing
`base1->call()` versus `base2->call()`. Without polymorphism, a
function within the base class can only call member functions within the
same scope: that is, ``Base::call()`` will always call
``Base::normal()``. But for the `base2->call()` case, the call of the
virtual member function will be dispatched to ``Derived::poly()``
instead. This mechanism results in calling functions that are within
the scope of the class that was used to *create* the instance, even if
they are assigned to a pointer for their base class. This is the
desired behavior, and this way LAMMPS can even use styles that are loaded
at runtime from a shared object file with the :doc:`plugin command
<plugin>`.
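
For reference, the kind of ``Base``/``Derived`` pair discussed above can
be sketched in a few self-contained lines (this is an illustration only,
not the exact listing from the manual):

.. code-block:: c++

   #include <iostream>

   class Base {
    public:
     virtual ~Base() = default;                                 // virtual destructor
     void normal() { std::cout << "Base::normal\n"; }           // non-virtual
     virtual void poly() { std::cout << "Base::poly\n"; }       // virtual
     void call() { normal(); poly(); }
   };

   class Derived : public Base {
    public:
     void normal() { std::cout << "Derived::normal\n"; }        // hides Base::normal()
     void poly() override { std::cout << "Derived::poly\n"; }   // overrides Base::poly()
   };

   int main()
   {
     Base *base1 = new Base;
     Base *base2 = new Derived;
     base1->call();   // prints Base::normal and Base::poly
     base2->call();   // prints Base::normal and Derived::poly (virtual dispatch)
     delete base1;
     delete base2;
     return 0;
   }
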
A special case of virtual functions is that of so-called pure functions.
These are virtual functions that are initialized to 0 in the class declaration
@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be
created and that derived classes **must** implement these functions.
Many of the functions listed with the various class styles in the
section :doc:`Modify` are pure functions. The motivation for this is
to define the interface or API of the functions, but defer their
implementation to the derived classes.

However, there are downsides to this. For example, calls to virtual
functions from within a constructor will *not* be in the scope of the
derived class, and thus it is good practice to either avoid calling them
or to provide an explicit scope such as ``Base::poly()`` or
``Derived::poly()``. Furthermore, any destructors in classes containing
virtual functions should be declared virtual too, so they will be
@ -208,8 +208,8 @@ dispatch.
that are intended to replace a virtual or pure function use the
``override`` property keyword. For the same reason, the use of
overloads or default arguments for virtual functions should be
avoided, as they lead to confusion over which function is supposed to
override which, and which arguments need to be declared.

Style Factories
===============
@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that
create an instance of a specific derived class, say ``PairLJCut``, and
return a pointer to the type of the common base class of that style,
``Pair`` in this case. To associate the factory function with the
style keyword, a ``std::map`` class is used with function pointers
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
"morse" for ``PairMorse``). A couple of typedefs help keep the code
readable, and a template function is used to implement the actual
factory functions for the individual classes. Below is an example
of such a factory function from the ``Force`` class as declared in
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
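
As a generic illustration of this pattern (deliberately simplified and
*not* the actual code from ``force.h`` and ``force.cpp``; all names are
invented for the example), such a map-based factory could be sketched
like this:

.. code-block:: c++

   #include <map>
   #include <string>

   class Pair {                               // common base class
    public:
     virtual ~Pair() = default;
   };
   class PairLJCut : public Pair { };         // style "lj/cut"
   class PairMorse : public Pair { };         // style "morse"

   // a factory function creates a new style instance and returns it
   // as a pointer to the common base class
   typedef Pair *(*PairCreator)();

   // one template implements the factory function for every style class
   template <typename T> static Pair *pair_creator() { return new T; }

   // map from style keyword to the matching factory function
   static std::map<std::string, PairCreator> pair_map = {
     {"lj/cut", &pair_creator<PairLJCut>},
     {"morse", &pair_creator<PairMorse>}
   };

   // look up the keyword and create an instance of the requested style
   static Pair *new_pair(const std::string &style)
   {
     auto entry = pair_map.find(style);
     if (entry == pair_map.end()) return nullptr;
     return (*entry->second)();
   }
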
@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams".
This is mainly motivated by better performance, better control over
formatting, and less effort to achieve specific formatting.

Since mixing "stdio" and "iostreams" can lead to unexpected behavior,
use of the latter is strongly discouraged. Output to the screen should
*not* use the predefined ``stdout`` FILE pointer, but rather the
``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class.
Furthermore, output should generally only be done by MPI rank 0
(``comm->me == 0``). Output that is sent to both ``screen`` and
``logfile`` should use the :cpp:func:`utils::logmesg() convenience
function <LAMMPS_NS::utils::logmesg>`.
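
As an illustration, a status message inside a class derived from
``Pointers`` is typically written like the following sketch (the
variable names are invented here, and the exact ``utils::logmesg()``
signature should be checked against the current ``utils.h``):

.. code-block:: c++

   // write to both screen and logfile, but only on MPI rank 0
   if (comm->me == 0)
     utils::logmesg(lmp, "  rebuilt neighbor lists {} times in {} steps\n",
                    nbuilds, nsteps);
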
We discourage the use of stringstreams because the bundled {fmt} library
and the customized tokenizer classes provide the same functionality in a
cleaner way with better performance. This also helps maintain a
consistent programming syntax with code from many different
contributors.
Formatting with the {fmt} library
=================================

The LAMMPS source code includes a copy of the `{fmt} library
<https://fmt.dev>`_, which is preferred over formatting with the
"printf()" family of functions. The primary reason is that it allows
a typesafe default format for any type of supported data. This is
particularly useful for formatting integers of a given size (32-bit or
@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it
are future-proof.

Formatted strings are frequently created by calling the
``fmt::format()`` function, which will return a string as a
``std::string`` class instance. In contrast to the ``%`` placeholder in
``printf()``, the {fmt} library uses ``{}`` to embed format descriptors.
In the simplest case, no additional characters are needed, as {fmt} will
choose the default format based on the data type of the argument.
Otherwise, the ``fmt::print()`` function may be used instead of
``printf()`` or ``fprintf()``. In addition, several LAMMPS output
functions that originally accepted a single string as argument have
been overloaded to accept a format string with optional arguments as
well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
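
A minimal, self-contained sketch of this usage (the values and messages
are made up for the example):

.. code-block:: c++

   #include <string>
   #include "fmt/format.h"

   int main()
   {
     // default formatting is chosen from the argument types
     std::string one = fmt::format("step {} of {}", 100, 5000);

     // explicit format descriptor, analogous to "%8.3f" with printf()
     std::string two = fmt::format("temperature = {:8.3f} K", 305.2471);

     // print directly instead of building a std::string first
     fmt::print("{}\n{}\n", one, two);
     return 0;
   }
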
Summary of the {fmt} format syntax
==================================
@ -332,10 +331,11 @@ The syntax of the format string is "{[<argument id>][:<format spec>]}",
where either the argument id or the format spec (separated by a colon
':') is optional. The argument id is usually a number starting from 0
that is the index to the arguments following the format string. By
default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The
most common case for using argument id would be to use the same argument
in multiple places in the format string without having to provide it as
an argument multiple times. The argument id is rarely used in the LAMMPS
source code.
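
For example, reusing an argument via its id looks like this (a
hypothetical one-liner, using the same headers as in the sketch above):

.. code-block:: c++

   // argument 0 is used twice, argument 1 once: "box is 10 x 10 x 20"
   std::string msg = fmt::format("box is {0} x {0} x {1}", 10, 20);
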
More common is the use of a format specifier, which starts with a colon.
This may optionally be followed by a fill character (default is ' '). If
@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating
point numbers. The final character in the format string would be an
indicator for the "presentation", i.e. 'd' for decimal presentation of
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
This mostly follows the "printf()" scheme, but without requiring an
additional length parameter to distinguish between different integer
widths. The {fmt} library will detect those and adapt the formatting
accordingly. For floating point numbers there are, correspondingly, 'g'
for generic presentation, 'e' for exponential presentation, and 'f' for
fixed point presentation.

The format string "{:8}" would thus represent *any* type argument and be
replaced by at least 8 characters; "{:<8}" would do this as left
aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific
presentation is selected, the argument type must be compatible or else
the {fmt} formatting code will throw an exception. Some format string
examples are given below:

.. code-block:: c++
@ -392,12 +393,12 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
Memory management
^^^^^^^^^^^^^^^^^

Dynamic allocation of small data and objects can be done with the C++
operators "new" and "delete"/"delete[]". Large data should use the member
functions of the ``Memory`` class, most commonly ``Memory::create()``,
``Memory::grow()``, and ``Memory::destroy()``, which provide variants
for vectors, 2d arrays, 3d arrays, etc. These can also be used for
small data.

The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
directly is strongly discouraged. To simplify adapting legacy code
@ -408,26 +409,24 @@ perform additional error checks for safety.
Use of these custom memory allocation functions is motivated by the
following considerations:

- Memory allocation failures on *any* MPI rank during a parallel run
  will trigger an immediate abort of the entire parallel calculation.
- A failing "new" will trigger an exception, which is also captured by
  LAMMPS and triggers a global abort.
- Allocation of multidimensional arrays will be done in a C compatible
  fashion, but such that the actual data is stored in one large
  contiguous block. Thus, when MPI communication is needed, the data
  can be communicated directly (similar to Fortran arrays).
- The "destroy()" and "sfree()" functions may safely be called on NULL
  pointers.
- The "destroy()" functions will nullify the pointer variables, thus
  making "use after free" errors easy to detect.
- It is possible to use a larger than default memory alignment (not on
  all operating systems, since the allocated storage pointers must be
  compatible with ``free()`` for technical reasons).

In the practical implementation of code this means that any pointer
variables that are class members should be initialized to a ``nullptr``
value in their respective constructors. That way, it is safe to call
``Memory::destroy()`` or ``delete[]`` on them before *any* allocation
outside the constructor. This helps prevent memory leaks.
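
Put together, a typical usage pattern looks like the following sketch
(the class context and array names are invented for illustration; the
exact ``Memory`` member signatures are declared in ``memory.h``):

.. code-block:: c++

   // class member declared in the header:  double **forces_old;

   // in the constructor: start from a well-defined state
   forces_old = nullptr;

   // (re-)allocate when needed; destroy() is safe to call on a nullptr
   memory->destroy(forces_old);
   memory->create(forces_old, nmax, 3, "fixfoo:forces_old");

   // in the destructor: release the storage again
   memory->destroy(forces_old);
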


@ -28,7 +28,7 @@ The need to do this communication arises when data from the owned atoms
is updated (e.g. their positions) and this updated information needs to
be **copied** to the corresponding ghost atoms.

And second, *reverse communication*, which sends ghost atom information
from each processor to the owning processor to **accumulate** (sum)
the values with the corresponding owned atoms. The need for this
arises when data is computed and also stored with ghost atoms
@ -58,7 +58,7 @@ embedded-atom method (EAM) which compute intermediate values in the
first part of the compute() function that need to be stored by both
owned and ghost atoms for the second part of the force computation.

The *Comm* class methods perform the MPI communication for buffers of
per-atom data. They "call back" to the *Pair* class, so it can *pack*
or *unpack* the buffer with data the *Pair* class owns. There are 4
such methods that the *Pair* class must define, assuming it uses both
forward and reverse communication:
@ -70,7 +70,7 @@ forward and reverse communication:
The arguments to these methods include the buffer and a list of atoms
to pack or unpack. The *Pair* class also must set the *comm_forward*
and *comm_reverse* variables, which store the number of values stored
in the communication buffers for each operation. This means, if
desired, it can choose to store multiple per-atom values in the
buffer, and they will be communicated together to minimize
@ -81,11 +81,11 @@ containing ``double`` values. To correctly store integers that may be
The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind
of forward and reverse communication operations using the same *Comm*
class methods. Likewise, the same pack/unpack methods and
comm_forward/comm_reverse variables must be defined by the calling
*Fix*, *Compute*, or *Dump* class.
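
As a rough sketch of how this division of labor looks in a pair style
(the class and the per-atom array ``rho`` are invented for the example,
and the method signatures shown here are assumptions that should be
checked against the current ``pair.h``):

.. code-block:: c++

   // pack one value per atom (comm_forward = 1) into the buffer
   int PairFoo::pack_forward_comm(int n, int *list, double *buf,
                                  int /*pbc_flag*/, int * /*pbc*/)
   {
     for (int i = 0; i < n; i++) buf[i] = rho[list[i]];
     return n;   // number of values packed
   }

   // unpack the received values into the ghost atoms starting at "first"
   void PairFoo::unpack_forward_comm(int n, int first, double *buf)
   {
     for (int i = 0; i < n; i++) rho[first + i] = buf[i];
   }
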
For *Fix* classes, there is an optional second argument to the
*forward_comm()* and *reverse_comm()* call which can be used when the
fix performs multiple modes of communication, with different numbers
of values per atom. The fix should set the *comm_forward* and
@ -150,7 +150,7 @@ latter case, when the *ring* operation is complete, each processor can
examine its original buffer to extract modified values.

Note that the *ring* operation is similar to an MPI_Alltoall()
operation, where every processor effectively sends data to and receives
data from every other processor. The difference is that the *ring*
operation does it one step at a time, so the total volume of data does
not need to be stored by every processor. However, the *ring* operation is
@ -184,8 +184,8 @@ The *exchange_data()* method triggers the communication to be
performed. Each processor provides the vector of *N* datums to send,
and the size of each datum. All datums must be the same size.

The *create_atom()* and *exchange_atom()* methods are similar, except
that the size of each datum can be different. Typically, this is used
to communicate atoms, each with a variable amount of per-atom data, to
other processors.


@ -45,9 +45,9 @@ other methods in the class.
zero before each timestep, so that forces (torques, etc) can be
accumulated.

Now for the ``Verlet::run()`` method. Its basic structure in high-level
pseudocode is shown below. In the actual code in ``src/verlet.cpp``
some of these operations are conditionally invoked.

.. code-block:: python
@ -105,17 +105,17 @@ need it. These flags are passed to the various methods that compute
particle interactions, so that they either compute and tally the
corresponding data or can skip the extra calculations if the energy and
virial are not needed. See the comments for the ``Integrate::ev_set()``
method, which document the flag values.

At various points of the timestep, fixes are invoked,
e.g. ``fix->initial_integrate()``. In the code, this is actually done
via the Modify class, which stores all the Fix objects and lists of which
should be invoked at what point in the timestep. Fixes are the LAMMPS
mechanism for tailoring the operations of a timestep for a particular
simulation. As described elsewhere, each fix has one or more methods,
each of which is invoked at a specific stage of the timestep, as shown in
the timestep pseudocode. All the active fixes defined in an input
script that are flagged to have an ``initial_integrate()`` method are
invoked at the beginning of each timestep. Examples are :doc:`fix nve
<fix_nve>` or :doc:`fix nvt or fix npt <fix_nh>` which perform the
start-of-timestep velocity-Verlet integration operations to update
@ -131,9 +131,9 @@ can be changed using the :doc:`neigh_modify every/delay/check
<neigh_modify>` command. If not, coordinates of ghost atoms are
acquired by each processor via the ``forward_comm()`` method of the Comm
class. If neighbor lists need to be built, several operations within
the inner if clause of the pseudocode are first invoked. The
``pre_exchange()`` method of any defined fixes is invoked first.
Typically, this inserts or deletes particles from the system.

Periodic boundary conditions are then applied by the Domain class via
its ``pbc()`` method to remap particles that have moved outside the
@ -148,7 +148,7 @@ The box boundaries are then reset (if needed) via the ``reset_box()``
method of the Domain class, e.g. if box boundaries are shrink-wrapped to
current particle coordinates. A change in the box size or shape
requires internal information for communicating ghost atoms (Comm class)
and neighbor list bins (Neighbor class) to be updated. The ``setup()``
method of the Comm class and ``setup_bins()`` method of the Neighbor
class perform the update.
@ -217,20 +217,21 @@ file, and restart files. See the :doc:`thermo_style <thermo_style>`,
:doc:`dump <dump>`, and :doc:`restart <restart>` commands for more
details.

The flow of control during energy minimization iterations is similar to
that of a molecular dynamics timestep. Forces are computed, neighbor
lists are built as needed, atoms migrate to new processors, and atom
coordinates and forces are communicated to neighboring processors. The
only difference is what Fix class operations are invoked when. Only a
subset of LAMMPS fixes are useful during energy minimization, as
explained in their individual doc pages. The relevant Fix class methods
are ``min_pre_exchange()``, ``min_pre_force()``, and
``min_post_force()``. Each fix is invoked at the appropriate place
within the minimization iteration. For example, the
``min_post_force()`` method is analogous to the ``post_force()`` method
for dynamics; it is used to alter or constrain forces on each atom,
which affects the minimization procedure.

After all iterations are completed, there is a ``cleanup`` step which
calls the ``post_run()`` method of fixes to perform operations only required
at the end of a calculation (like freeing temporary storage or creating
final outputs).


@ -70,7 +70,7 @@ A command can define multiple grids, each of a different size. Each
grid is an instantiation of the Grid2d or Grid3d class.

The command also defines what data it will store for each grid it
creates and it allocates the multidimensional array(s) needed to
store the data. No grid cell data is stored within the Grid2d or
Grid3d classes.
@ -115,7 +115,7 @@ which stores *Nvalues* per grid cell.
nyhi_out, nxlo_out, nxhi_out, nvalues,
"data3d_multi");

Note that these multidimensional arrays are allocated as contiguous
chunks of memory where the x-index of the grid varies fastest, then y,
and the z-index slowest. For multiple values per grid cell, the
Nvalues are contiguous, so their index varies even faster than the
@ -798,7 +798,7 @@ A value of -1 is returned if the data name is not recognized.
The *get_griddata_by_index()* method is called after the
*get_griddata_by_name()* method, using the data index it returned as
its argument. This method will return a pointer to the
multidimensional array which stores the requested data.

As in the discussion above of the Memory class *create_offset()*
methods, the dimensionality of the array associated with the returned


@ -15,7 +15,7 @@ for more information about that part of the build process. LAMMPS
currently supports building with :doc:`conventional makefiles
<Build_make>` and through :doc:`CMake <Build_cmake>`. Those procedures
differ in how packages are enabled or disabled for inclusion into a
LAMMPS binary, so they cannot be mixed. The source files for each
package are in all-uppercase subdirectories of the ``src`` folder, for
example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS``
subdirectory is not a package but contains a dummy MPI library that is
@ -26,31 +26,31 @@ with traditional makefiles.
The ``lib`` directory contains the source code for several supporting
libraries or files with configuration settings to use globally installed
libraries that are required by some optional packages. They may
include Python scripts that can transparently download additional source
code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``,
contains the source files, some of which are in different languages such
as Fortran or CUDA. These libraries are included in the LAMMPS build if
the corresponding package is installed.

LAMMPS C++ source files almost always come in pairs, such as
``src/run.cpp`` (implementation file) and ``src/run.h`` (header file).
Each pair of files defines a C++ class, for example the
:cpp:class:`LAMMPS_NS::Run` class, which contains the code invoked by
the :doc:`run <run>` command in a LAMMPS input script. As this example
illustrates, source file and class names often have a one-to-one
correspondence with a command used in a LAMMPS input script. Some
source files and classes do not have a corresponding input script
command, for example ``src/force.cpp`` and the
:cpp:class:`LAMMPS_NS::Force` class. They are discussed in the next
section.

The names of all source files are in lower case and may use the
underscore character '_' to separate words. Apart from bundled,
externally maintained libraries, which may have different conventions,
all C and C++ header files have a ``.h`` extension, all C++ files have a
``.cpp`` extension, and C files a ``.c`` extension. A few C++ classes
and utility functions are implemented with only a ``.h`` file. Examples
are the Pointers and Commands classes or the MathVec functions.

Class topology
--------------
@ -62,35 +62,36 @@ associated source files in the ``src`` folder, for example the class
:cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp``
and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec`
corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full
lines in the figure represent compositing: that is, the class at the
base of the arrow holds a pointer to an instance of the class at the
tip. Dashed lines instead represent inheritance: the class at the tip
of the arrow is derived from the class at the base. Classes with a red
boundary are not instantiated directly, but they represent the base
classes for "styles". Those "styles" make up the bulk of the LAMMPS
code and only a few representative examples are included in the figure,
so it remains readable.

.. _class-topology:

.. figure:: JPG/lammps-classes.png

   LAMMPS class topology

   This figure shows the relations of the base classes of the LAMMPS
   simulation package. Full lines indicate that a class holds an
   instance of the class it is pointing to; dashed lines point to
   derived classes that are given as examples of what classes may be
   instantiated during a LAMMPS run based on the input commands and
   accessed through the API defined by their respective base classes.
   At the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class,
   which holds pointers to class instances with specific purposes.
   Those may hold instances of other classes, sometimes directly, or
   only temporarily, sometimes as derived classes or derived classes
   of derived classes, which may also hold instances of other
   classes.
The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and
represents what is generally referred to as an "instance of LAMMPS". It
is a composite holding pointers to instances of other core classes
providing the core functionality of the MD engine in LAMMPS and through
them abstractions of the required operations. The constructor of the
LAMMPS class will instantiate those instances, process the command line
@ -102,42 +103,44 @@ LAMMPS while passing it the command line flags and input script. It
deletes the LAMMPS instance after the method reading the input returns
and shuts down the MPI environment before it exits the executable.

The :cpp:class:`LAMMPS_NS::Pointers` class is not shown in the
:ref:`class-topology` figure for clarity. It holds references to many
of the members of the `LAMMPS_NS::LAMMPS` class, so that all classes
derived from :cpp:class:`LAMMPS_NS::Pointers` have direct access to
those references. In the class topology, all classes with a blue
boundary are referenced in the Pointers class, and all classes in the
second and third columns that are not listed as derived classes are
instead derived from :cpp:class:`LAMMPS_NS::Pointers`. To initialize
the pointer references in Pointers, a pointer to the LAMMPS class
instance needs to be passed to the constructor. All constructors for
classes derived from it must do so and thus pass that pointer to the
constructor for :cpp:class:`LAMMPS_NS::Pointers`. The default
constructor for :cpp:class:`LAMMPS_NS::Pointers` is disabled to enforce
this.
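
For example, the constructor of a class derived (directly or indirectly)
from ``Pointers`` typically has the following shape (``FixFoo`` is an
invented name, and the exact ``Fix`` constructor signature should be
checked against ``fix.h``):

.. code-block:: c++

   // fix_foo.cpp (hypothetical example)
   FixFoo::FixFoo(LAMMPS *lmp, int narg, char **arg) : Fix(lmp, narg, arg)
   {
     // the Fix base class passes "lmp" on to Pointers, which initializes
     // the references (atom, comm, force, memory, ...) used by this class
     if (narg < 3) error->all(FLERR, "Illegal fix foo command");
   }
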
Since all storage is supposed to be encapsulated (there are a few
exceptions), the LAMMPS class can also be instantiated multiple times by
a calling code. Outside the aforementioned exceptions, those LAMMPS
instances can be used alternately. As of the time of this writing
(early 2023), LAMMPS is not yet sufficiently thread-safe for concurrent
execution. When running in parallel with MPI, care has to be taken
that suitable copies of communicators are used so as not to create
conflicts between different instances.

The LAMMPS class currently holds instances of 19 classes representing
the core functionality. There are a handful of virtual parent classes
in LAMMPS that define what LAMMPS calls ``styles``. These are shaded
red in the :ref:`class-topology` figure. Each of these is the parent of
a number of child classes that implement the interface defined by the
parent class. There are two main categories of these ``styles``: some
may only have one instance active at a time (e.g. atom, pair, bond,
angle, dihedral, improper, kspace, comm) and there is a dedicated
pointer variable for each of them in the corresponding composite class.
Setups that require a mix of different such styles have to use a
*hybrid* class instance that acts as a proxy, and manages and forwards
calls to the corresponding sub-style class instances for the designated
subset of atoms or data. The composite class may also have lists of
class instances, e.g. ``Modify`` handles lists of compute and fix
styles, while ``Output`` handles a list of dump class instances.

The exceptions to this scheme are the ``command`` style classes. These
implement specific commands that can be invoked before, after, or in
@ -146,19 +149,19 @@ command() method called and then, after completion, the class instance
deleted. Examples for this are the create_box, create_atoms, minimize, deleted. Examples for this are the create_box, create_atoms, minimize,
run, set, or velocity command styles. run, set, or velocity command styles.
For all those ``styles``, certain naming conventions are employed: for
the fix nve command the class is called FixNVE and the source files are
``fix_nve.h`` and ``fix_nve.cpp``.  Similarly, for fix ave/time we have
FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``.  Style names
are lower case and without spaces or special characters.  A suffix or
additional words are appended with a forward slash ('/') to denote a
variant of the corresponding class without the suffix.  To connect the
style name and the class name, LAMMPS uses macros like ``AtomStyle()``,
``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the
corresponding header file.  During configuration or compilation, files
with the pattern ``style_<name>.h`` are created that consist of a list
of include statements pulling in the headers of all styles of a given
type that are currently enabled (or "installed").

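As a concrete illustration of these conventions, here is a condensed
sketch of what the top of a style header looks like, using the fix nve
example named above.  It is abbreviated (the real ``fix_nve.h``
declares more members), but the ``FIX_CLASS`` / ``FixStyle()`` pattern
is what the generated ``style_fix.h`` file relies on to register the
style name with its class:

.. code-block:: c++

   // condensed sketch of fix_nve.h: the FixStyle() macro maps the style
   // name "nve" to the class FixNVE when the generated style_fix.h file
   // includes this header with FIX_CLASS defined
   #ifdef FIX_CLASS
   FixStyle(nve, FixNVE);
   #else

   #ifndef LMP_FIX_NVE_H
   #define LMP_FIX_NVE_H

   #include "fix.h"

   namespace LAMMPS_NS {

   class FixNVE : public Fix {
    public:
     FixNVE(class LAMMPS *, int, char **);
     int setmask() override;
     void init() override;
     void initial_integrate(int) override;
     void final_integrate() override;
     // ... further members omitted in this sketch ...
   };

   }    // namespace LAMMPS_NS

   #endif
   #endif
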
More details on individual classes in the :ref:`class-topology` are as
follows:

- The Universe class sets up partitions of processors so that one or
  multiple simulations can be run on the processors allocated for a
  run, e.g. by the mpirun command.
- The Input class reads and processes input (strings and files), stores
  variables, and invokes :doc:`commands <Commands_all>`.
- Command style classes are derived from the Command class.  They
  provide input script commands that perform one-time operations.
- The Atom class stores per-atom properties associated with atom styles.
  More precisely, they are allocated and managed by a class derived from
  the AtomVec class, and the Atom class simply stores pointers to them.
  The classes derived from AtomVec represent the different atom styles,
  and they are instantiated through the :doc:`atom_style <atom_style>`
  command.
- The Neighbor class builds and stores neighbor lists.  The NeighList
  class stores a single list (for all atoms).  A NeighRequest class
  instance is created by pair, fix, or compute styles when they need a
  particular kind of neighbor list; its properties select the neighbor
  list settings for the given request.  There can be multiple instances
  of the NeighRequest class.  The Neighbor class will try to optimize
  how the requests are processed.  Depending on the NeighRequest
  properties, neighbor lists are constructed from scratch, aliased, or
  constructed by post-processing an existing list into sub-lists.
- The Comm class performs inter-processor communication, typically of
  ghost atom information.  This usually involves MPI message exchanges
  with 6 neighboring processors in the 3d logical grid of processors
  mapped to the simulation box.  There are two :doc:`communication
  styles <comm_style>`, enabling different ways to perform the domain
  decomposition.
- The Irregular class is used when atoms may migrate to arbitrary
  processors.
- The Domain class stores the simulation box geometry, as well as
  geometric Regions and any user definition of a Lattice.
- The Output class manages the three kinds of LAMMPS output:
  thermodynamic output to the screen and log file, dump file snapshots,
  and restart files.  These correspond to the
  :doc:`Thermo <thermo_style>`, :doc:`Dump <dump>`, and
  :doc:`WriteRestart <write_restart>` classes respectively.  The Dump
  class is a base class, with several derived classes implementing
  various dump style variants.
- The Timer class logs timing information, output at the end of a run.

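To show how these core class instances are reached in practice, here is
a minimal sketch of a helper class.  The class name ``DemoHelper`` is
hypothetical; the pattern of deriving from the ``Pointers`` base class
and accessing members such as ``atom`` or ``comm`` is how LAMMPS
classes refer to each other:

.. code-block:: c++

   // minimal sketch (class name and purpose are hypothetical): classes
   // derived from Pointers can access the core class instances held by
   // the LAMMPS composite class, e.g. the Atom and Comm instances
   #include "pointers.h"
   #include "atom.h"
   #include "comm.h"

   namespace LAMMPS_NS {

   class DemoHelper : protected Pointers {
    public:
     DemoHelper(class LAMMPS *lmp) : Pointers(lmp) {}

     int count_owned_atoms() const
     {
       // atom->nlocal is the number of atoms owned by this MPI rank
       return atom->nlocal;
     }
   };

   }    // namespace LAMMPS_NS
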
Communication
^^^^^^^^^^^^^

Following the selected partitioning scheme, all per-atom data (atom
IDs, positions, velocities, types, etc.) is distributed across the MPI
processes, which allows LAMMPS to handle very large systems, provided
it uses a correspondingly large number of MPI processes.  To be able to
compute the short-range interactions, MPI processes need access not
only to the data of atoms they "own" but also to information about
atoms from neighboring subdomains, referred to in LAMMPS as "ghost"
atoms.  These are copies of atoms storing required per-atom data for up
to the communication cutoff distance.

and upper *y*-boundary of rank 0's subdomain.  In the *y* stage, ranks
4,5,6 send atoms in their blue-shaded regions to rank 0.  This may
include ghost atoms they received in the *x* stage, but only if they
are needed by rank 0 to fill its extended ghost atom regions in the
+/-*y* directions (blue rectangles).  Thus, in this case, ranks 5 and
6 do not include ghost atoms they received from each other (in the *x*
stage) in the atoms they send to rank 0.  The key point is that while
the pattern of communication is more complex in the irregular

A "reverse" communication is when computed ghost atom attributes are
sent back to the processor that owns the atom.  This is used (for
example) to sum partial forces on ghost atoms to the complete force on
owned atoms.  The order of the two stages described in the
:ref:`ghost-atom-comm` figure is inverted, and the same lists of atoms
are used to pack and unpack message buffers with per-atom forces.  When
a received buffer is unpacked, the ghost forces are summed to owned
atom forces.  As in forward communication, forces on atoms in the four
blue corners of the diagrams are sent, received, and summed twice (once
at each stage) before owning processors have the full force.

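To make the packing and unpacking described above more concrete, here
is a heavily condensed sketch of how a fix can participate in a forward
communication of one custom per-atom value.  The method names and
signatures follow the Fix interface; the class name ``FixMyProp`` and
the per-atom array ``myprop`` are hypothetical, and the reverse
counterparts (``pack_reverse_comm()`` / ``unpack_reverse_comm()``) work
analogously with the roles of owned and ghost atoms swapped:

.. code-block:: c++

   #include "fix.h"

   namespace LAMMPS_NS {

   // condensed sketch: forward communication of one hypothetical
   // per-atom quantity "myprop" from owned atoms to their ghost copies
   class FixMyProp : public Fix {
    public:
     using Fix::Fix;
     double *myprop = nullptr;   // one value per owned+ghost atom

     int pack_forward_comm(int n, int *list, double *buf,
                           int /*pbc_flag*/, int * /*pbc*/) override
     {
       // pack the values of the owned atoms selected in "list"
       for (int i = 0; i < n; i++) buf[i] = myprop[list[i]];
       return n;
     }

     void unpack_forward_comm(int first, int n, double *buf) override
     {
       // unpack received values into ghost atoms [first, first+n)
       for (int i = 0; i < n; i++) myprop[first + i] = buf[i];
     }
   };

   }    // namespace LAMMPS_NS

The style sets its ``comm_forward`` member to the number of values
communicated per atom and asks the Comm class to perform the exchange
when the data is needed.
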
These two operations are used in many places within LAMMPS aside from
the exchange of coordinates and forces, for example by manybody
potentials to share intermediate per-atom values, or by rigid-body
integrators to enable each atom in a body to access body properties.
Here are additional details about how these operations are performed
in LAMMPS:

- ... atom pairs when building neighbor lists or computing forces.
- The cutoff distance for exchanging ghost atoms is typically equal to
  the neighbor cutoff.  But it can also be set to a larger value if
  needed, e.g. half the diameter of a rigid body composed of multiple
  atoms or over 3x the length of a stretched bond for dihedral
  interactions.  It can also exceed the periodic box size.  For the
  regular communication pattern, if the cutoff distance is larger than
  the width of a processor's subdomain, then multiple exchanges are
  performed in the same direction.  Each exchange is with the same
  neighbor processor, but buffers are packed/unpacked using a different
  list of atoms.  For forward communication, in the first exchange, a
  processor sends only owned atoms.  In subsequent exchanges, it sends
  ghost atoms received in previous exchanges.  For the irregular
  pattern (right), overlaps of a processor's extended ghost-atom
  subdomain with all other processors

.. _fft-parallel:

.. figure:: img/fft-decomp-parallel.png
   :align: center

   Parallel FFT in PPPM

   Stages of a parallel FFT for a simulation domain overlaid with an
   8x8x8 3d FFT grid, partitioned across 64 processors.  Within each
   of the 4 diagrams, grid cells of the same color are owned by a
   single processor; for simplicity, only cells owned by 4 or 8 of
   the 64 processors are colored.  The two images on the left
   illustrate brick-to-pencil communication.  The two images on the
   right illustrate pencil-to-pencil communication, which in this
   case transposes the *y* and *z* dimensions of the grid.

Parallel 3d FFTs require substantial communication relative to their
computational cost.  A 3d FFT is implemented by a series of 1d FFTs
along the *x-*, *y-*, and *z-*\ direction of the FFT grid.  Thus, the
FFT grid cannot be decomposed like atoms into 3 dimensions for parallel
processing of the FFTs, but only in 1 (as planes) or 2 (as pencils)
dimensions, and in between the steps the grid needs to be transposed so
that the FFT grid portion "owned" by each MPI process is complete in
the direction of the 1d FFTs it has to perform.  LAMMPS uses the
pencil-decomposition algorithm as shown in the :ref:`fft-parallel`
figure.

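The cost balance can be sketched as follows (a back-of-the-envelope
estimate, not a formula taken from the LAMMPS sources): with a pencil
decomposition of an :math:`N \times N \times N` grid over :math:`P` MPI
processes, each process computes about :math:`N^2/P` 1d FFTs of length
:math:`N` in each of the three stages, so the local FFT work is

.. math::

   t_{\mathrm{FFT}} \;\sim\; 3\,\frac{N^2}{P}\, O(N \log N)
   \;=\; O\!\left(\frac{N^3}{P}\,\log N\right),

while each transpose has to move essentially all of the :math:`N^3/P`
grid values a process owns through MPI.  Because message counts and
latency grow with :math:`P`, the communication part tends to dominate
at large processor counts, which is the behavior described further
below.
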
Initially (far left), each processor owns a brick of same-color grid
cells (actually grid points) contained within its subdomain.

across all :math:`P` processors with a single call to
``MPI_Alltoall()``, but this is typically much slower.  However, for
the specialized brick and pencil tiling illustrated in the
:ref:`fft-parallel` figure, collective communication across the entire
MPI communicator is not required.  In the example, an :math:`8^3` grid
with 512 grid cells is partitioned across 64 processors; each processor
owns a 2x2x2 3d brick of grid cells.  The initial brick-to-pencil
communication (upper left to upper right) only requires collective
communication within subgroups of 4 processors.

Here are additional details about the parallel FFTs and grid/particle
operations that LAMMPS supports:

- The fftMPI library allows each grid dimension to be a multiple of
  small prime factors (2,3,5), and allows any number of processors to
  perform the FFT.  The resulting brick and pencil decompositions are
  thus not always as well-aligned, but the size of the subgroups of
  processors for the two modes of communication (brick/pencil and
  pencil/pencil) still scales as :math:`O(P^{\frac{1}{3}})` and
  :math:`O(P^{\frac{1}{2}})`.
- ... in memory.  This reordering can be done during the packing or
  unpacking of buffers for MPI communication.
- For large systems and particularly many MPI processes, the dominant
  cost for parallel FFTs is often the communication, not the
  computation of 1d FFTs, even though the latter scales as :math:`N
  \log(N)` in the number of grid points *N* per grid direction.  This
  is because only a 2d decomposition into pencils is possible, while
  atom data (and their corresponding short-range force and energy
  computations) can be decomposed efficiently in 3d.

Reducing the number of MPI processes involved in the MPI communication
reduces this kind of overhead.  With a :doc:`hybrid MPI + OpenMP
parallelization <Speed_omp>`, it is still possible to use all available
processor cores for the computation: OpenMP threads parallelize the
work inside each MPI domain.  While that may have a lower parallel
efficiency for some parts of the computation, the savings in 3d FFT
communication can more than compensate for it.

As an alternative, it is also possible to start a :ref:`multi-partition
<partition>` calculation and then use the :doc:`verlet/split
integrator <run_style>` to perform the PPPM computation on a dedicated,
separate partition of MPI processes.

- ... manner consistent with processor subdomains, and provides methods
  for forward and reverse communication of owned and ghost grid point
  values.  It is used for PPPM as an FFT grid (as outlined above) and
  also for the MSM algorithm, which uses a cascade of grid sizes from
  fine to coarse to compute long-range Coulombic forces.  The GridComm
  class is also useful for models where continuum fields interact with
  particles.  For example, the two-temperature model (TTM) defines heat

Neighbor lists
^^^^^^^^^^^^^^

To compute forces efficiently, each processor creates a Verlet-style
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
*j* = owned or ghost) with separation less than the applicable neighbor
list cutoff distance.  In LAMMPS, the neighbor lists are stored in a
multiple-page data structure; each page is a contiguous chunk of memory
which stores vectors of neighbor atoms *j* for many *i* atoms.  This
allows pages to be incrementally allocated or deallocated in blocks as
needed.  Neighbor lists typically consume the most memory of any data
structure in LAMMPS.  The neighbor list is rebuilt (from scratch) once
every few timesteps, then used repeatedly each step for force or other
computations.  The neighbor cutoff distance is :math:`R_n = R_f +
\Delta_s`, where :math:`R_f` is the force cutoff distance of the
interatomic potential for computing short-range pairwise or manybody
forces and :math:`\Delta_s` is a "skin" distance that allows the list
to be used for multiple steps assuming that atoms do not move very far
between consecutive time steps.  Typically, the code triggers
reneighboring when any atom has moved half the skin distance since the
last reneighboring; this and other options of the neighbor list rebuild
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.

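As an illustration of the half-skin trigger (a simplified sketch, not
the actual implementation in the Neighbor class; the array ``xhold``
stands for the coordinates saved at the last rebuild), the check
amounts to:

.. code-block:: c++

   // sketch: request a neighbor list rebuild when any local atom has
   // moved more than half the skin distance since the last rebuild
   bool needs_reneighboring(int nlocal, const double *const *x,
                            const double *const *xhold, double skin)
   {
     const double trigger_sq = 0.25 * skin * skin;   // (skin/2)^2
     for (int i = 0; i < nlocal; ++i) {
       const double dx = x[i][0] - xhold[i][0];
       const double dy = x[i][1] - xhold[i][1];
       const double dz = x[i][2] - xhold[i][2];
       if (dx * dx + dy * dy + dz * dz > trigger_sq) return true;
     }
     return false;   // the stored list can still be reused
   }

In a parallel run the decision additionally has to be agreed upon by
all MPI ranks (e.g. with an ``MPI_Allreduce()``) so that every rank
rebuilds its lists on the same step.
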
On timesteps when the neighbor lists are rebuilt, atoms that have moved
outside their owning processor's subdomain are first migrated to new
processors via communication.  Periodic boundary conditions are also
(only) enforced on these steps to ensure each atom is re-assigned to
the correct processor.  After migration, the atoms owned by each
processor are stored in a contiguous vector.  Periodically, each
processor spatially sorts owned atoms within its vector to reorder it
for improved cache efficiency in force computations and neighbor list
building.  For that, atoms are spatially binned and then reordered so
that atoms in the same bin are adjacent in the vector.  Atom sorting
can be disabled or its settings modified with the
:doc:`atom_modify <atom_modify>` command.

(left) and triclinic (right) domains.  A regular grid of neighbor
bins (thin lines) overlays the entire simulation domain and need not
align with subdomain boundaries; only the portion overlapping the
augmented subdomain is shown.  In the triclinic case, it overlaps the
bounding box of the tilted rectangle.  The blue- and red-shaded bins
represent a stencil of bins searched to find neighbors of a particular
atom (black dot).

To build a local neighbor list in linear time, the simulation domain is
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor
bins, as shown in the :ref:`neighbor-stencil` figure for 2d models and
a single MPI processor's subdomain.  Each processor stores a set of
neighbor bins which overlap its subdomain, extended by the neighbor
cutoff distance :math:`R_n`.  As illustrated, the bins need not align
with processor boundaries; an integer number in each dimension is fit
to the size of the entire simulation box.

Most often, LAMMPS builds what is called a "half" neighbor list where
each *i,j* neighbor pair is stored only once, with either atom *i* or
*j* as the central atom.  The build can be done efficiently by using a
pre-computed "stencil" of bins around a central origin bin.  A stencil
is simply a list of integer offsets in *x,y,z* of nearby bins
surrounding the origin bin which are close enough to contain any
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in
the origin bin.  Note that for a half neighbor list, the stencil can be
asymmetric, since each atom only needs to store half its nearby
neighbors.  These stencils are illustrated in the figure for a half
list and a bin size of :math:`\frac{1}{2} R_n`.  There are 13 red+blue
stencil bins in 2d (for the orthogonal case, 15 for triclinic).  In 3d
there would be 63, 13 in the plane of bins that contain the origin bin
and 25 in each of the two planes above it in the *z* direction (75 for
triclinic).  The triclinic stencil has extra bins because the bins tile
the bounding box of the entire triclinic domain, and thus are not
periodic with respect to the simulation box itself.  The stencil and
logic for determining which *i,j* pairs to include in the neighbor list
are altered slightly to account for this.

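The bin-and-stencil search can be sketched as follows.  This is an
illustration only (2d, flat arrays, no handling of bins at the edge of
the grid), not the actual LAMMPS data structures: atoms are hashed into
bins from their coordinates, per-bin linked lists record which atoms
sit in each bin, and candidate neighbors are gathered only from the
bins listed in the pre-computed stencil of offsets.

.. code-block:: c++

   #include <cmath>
   #include <vector>

   // 2d illustration of a cell-list style neighbor search:
   //   binhead[b] = index of the first atom in bin b (-1 if empty)
   //   next[i]    = index of the next atom in the same bin as atom i
   //   stencil    = integer bin offsets close enough to hold neighbors
   int coord2bin(double x, double y, double xlo, double ylo,
                 double binsize, int nbinx)
   {
     const int bx = static_cast<int>(std::floor((x - xlo) / binsize));
     const int by = static_cast<int>(std::floor((y - ylo) / binsize));
     return by * nbinx + bx;
   }

   std::vector<int> neighbors_of(int i, const std::vector<double> &x,
                                 const std::vector<double> &y,
                                 const std::vector<int> &stencil,
                                 const std::vector<int> &binhead,
                                 const std::vector<int> &next,
                                 int ibin, double rn)
   {
     std::vector<int> neigh;
     const double cutsq = rn * rn;
     for (int offset : stencil) {              // loop over stencil bins
       for (int j = binhead[ibin + offset]; j >= 0; j = next[j]) {
         if (j == i) continue;
         const double dx = x[i] - x[j], dy = y[i] - y[j];
         if (dx * dx + dy * dy <= cutsq) neigh.push_back(j);
       }
     }
     return neigh;
   }

For a half list, an additional ordering test (such as the
coordinate-based rule discussed at the end of this section) decides
whether a pair is stored with *i* or with *j* as the central atom.
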
To build a neighbor list, a processor first loops over its "owned" plus
"ghost" atoms and assigns each to a neighbor bin.

- ... been found to be optimal for many typical cases.  Smaller bins
  incur additional overhead to loop over; larger bins require more
  distance calculations.  Note that for smaller bin sizes, the 2d
  stencil in the figure would be of a more semicircular shape
  (hemispherical in 3d), with bins near the corners of the square
  eliminated due to their distance from the origin bin.
- ... symmetric stencil.  It also includes lists with partial
  enumeration of ghost atom neighbors.  The full and ghost-atom lists
  are used by various manybody interatomic potentials.  Lists may also
  use different criteria for inclusion of a pairwise interaction.
  Typically, this depends only on the distance between two atoms and
  the cutoff distance.  But for finite-size coarse-grained particles
  with individual diameters (e.g. polydisperse granular particles), it
  can also depend on the diameters of the two particles.
- ... of the master neighbor list for the full system need to be
  generated, one for each sub-style, which contains only the *i,j*
  pairs needed to compute interactions between subsets of atoms for the
  corresponding potential.  This means that not all *i* or *j* atoms
  owned by a processor are included in a particular sub-list.
- Some models use different cutoff lengths for pairwise interactions
  between different kinds of particles, which are stored in a single
  neighbor list.  One example is a solvated colloidal system with large
  colloidal particles where colloid/colloid, colloid/solvent, and
  solvent/solvent interaction cutoffs can be dramatically different.

For the newton pair *on* setting, the atom *j* is only added to the
list if its *z* coordinate is larger; or, if that is equal, if its *y*
coordinate is larger; and if that is equal, too, if its *x* coordinate
is larger.  For homogeneously dense systems, that will result in
picking neighbors from a similarly sized sector, always in the same
direction relative to the "owned" atom, and thus it should lead to
neighbor lists of similar length and reduce the chance of a load
imbalance.

Thread parallelism in LAMMPS is used to predominantly distribute loops
over local data and thus follows an orthogonal parallelization strategy
to the decomposition into spatial domains used by the :doc:`MPI
partitioning <Developer_par_part>`.  For clarity, this section
discusses only the implementation in the OPENMP package, as it is the
simplest.  The INTEL and KOKKOS packages offer additional options and
are more complex, since they support more features and different
hardware like co-processors or GPUs.

One of the key decisions when implementing the OPENMP package was to
keep the changes to the source code small, so that it would be easier
to maintain the code and keep it in sync with the non-threaded standard
implementation.  This is achieved by a) making the OPENMP version a
derived class from the regular version (e.g. ``PairLJCutOMP`` from
``PairLJCut``) and overriding only methods that are multi-threaded or
need to be modified to support multi-threading (similar to what was
done in other accelerator packages), and b) moving the generic
multi-thread support into three separate classes ``ThrOMP``,
``ThrData``, and ``FixOMP``.

``ThrOMP`` provides functionality that is not available in the
corresponding base class (e.g. ``Pair`` for ``PairLJCutOMP``), like
multi-thread aware variants of the "tally" functions.  Those functions
are made available through multiple inheritance, so those new functions
have to have unique names to avoid ambiguities; typically ``_thr`` is
appended to the name of the function.  ``ThrData`` is a class that
manages per-thread data structures.  It is used instead of extending
the corresponding storage to per-thread arrays to avoid slowdowns due
to "false sharing" when multiple threads update adjacent elements in an
array and thus force the CPU cache lines to be reset and re-fetched.
``FixOMP`` finally manages the "multi-thread state" like settings and
access to per-thread storage; it is activated by the :doc:`package omp
<package>` command.

Force computations usually involve multiple atoms and thus there are
race conditions when multiple threads want to update per-atom data of
the same atoms.  Five possible strategies have been considered to avoid
this:

1. Restructure the code so that there is no overlapping access possible
   when computing in parallel, e.g. by breaking lists into multiple
   parts and synchronizing threads in between.
2. Have each thread be "responsible" for a specific group of atoms and
   compute these interactions multiple times, once on each thread that
   is responsible for a given atom, and then have each thread only
   update the properties of this atom.
3. Use mutexes around functions and regions of code where the data race
   could happen.
4. Use atomic operations when updating per-atom properties.
5. Use replicated per-thread data structures to accumulate data without
   conflicts and then use a reduction to combine those results into the
   data structures used by the regular style (see the sketch below).

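The following sketch illustrates strategy 5 (it is an illustration of
the idea, not code from the OPENMP package): every thread accumulates
forces into its own chunk of an enlarged array and the chunks are
reduced into the master force array afterwards.

.. code-block:: c++

   #include <omp.h>

   // illustration of per-thread force replication and reduction:
   // "fall" holds nthreads chunks of nall 3d force vectors; each thread
   // writes only into its own chunk, so no synchronization is needed
   // inside the force loop itself
   void force_with_reduction(int nall, double (*f)[3], double (*fall)[3],
                             int nthreads)
   {
   #pragma omp parallel
     {
       const int tid = omp_get_thread_num();
       double (*fthr)[3] = fall + static_cast<size_t>(tid) * nall;

       // zero this thread's private chunk
       for (int i = 0; i < nall; ++i)
         fthr[i][0] = fthr[i][1] = fthr[i][2] = 0.0;

       // ... the actual force kernel would accumulate into fthr here ...

   #pragma omp barrier    // all chunks must be complete before reducing

       // reduction: sum the per-thread chunks into the master force array
   #pragma omp for schedule(static)
       for (int i = 0; i < nall; ++i)
         for (int t = 0; t < nthreads; ++t) {
           f[i][0] += fall[static_cast<size_t>(t) * nall + i][0];
           f[i][1] += fall[static_cast<size_t>(t) * nall + i][1];
           f[i][2] += fall[static_cast<size_t>(t) * nall + i][2];
         }
     }
   }

In the OPENMP package this bookkeeping is largely handled by
``ThrData`` and the helper functions in ``ThrOMP``, so individual
styles mostly contain just the threaded force loop.
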
Option 5 was chosen for the OPENMP package because it would retain the
performance for the case of a single thread and the code would be more
maintainable.  Option 1 would require extensive code changes,
particularly to the neighbor list code; option 2 would have incurred a
2x or more performance penalty for the serial case; option 3 causes
significant overhead and would enforce serialization of operations in
inner loops and thus defeat the purpose of multi-threading; option 4
would slow down the frequently executed per-atom updates, since atomic
operations are more costly than plain memory writes.

In practice, the number of threads per MPI process is usually small,
equivalent to the number of CPU cores per CPU socket on high-end
supercomputers.  Thus arrays like the force array are dimensioned to
the number of atoms times the number of threads when enabling OpenMP
support, and inside the compute functions a pointer to a different
chunk is obtained by each thread.  Similarly, accumulators like
potential energy or virial are kept in per-thread instances of the
``ThrData`` class and then only reduced and added to the global
accumulators at the end.

Loop scheduling
"""""""""""""""

Multi-thread parallelization is applied by distributing (outer) loops
statically across threads.  Typically, this would be the loop over
local atoms *i* when processing *i,j* pairs of atoms from a neighbor
list.  The design of the neighbor list code results in atoms having a
similar number of neighbors for homogeneous systems, and thus load
imbalances are small.

The partitioning of the simulation domain used for distributed-memory
parallelism is set with the :doc:`comm_style command <comm_style>`.

.. _domain-decomposition:

.. figure:: img/domain-decomp.png
   :align: center

   Domain decomposition schemes

   This figure shows the different kinds of domain decomposition used
   for MPI parallelization: "brick" on the left with an orthogonal
   (left) and a triclinic (middle) simulation domain, and a "tiled"
   decomposition (right).  The black lines show the division into
   subdomains, and the contained atoms are "owned" by the
   corresponding MPI process.  The green dashed lines indicate how
   subdomains are extended with "ghost" atoms up to the communication
   cutoff distance.

The LAMMPS simulation box is a 3d or 2d volume, which can be of
orthogonal or triclinic shape, as illustrated in the
:ref:`domain-decomposition` figure for the 2d case.  Orthogonal means
the box edges are aligned with the *x*, *y*, *z* Cartesian axes, and
the box faces are thus all rectangular.  Triclinic allows for a more
general parallelepiped shape in which edges are aligned with three
arbitrary vectors and the box faces are parallelograms.  In each
dimension, box faces can be periodic, or non-periodic with fixed or
shrink-wrapped boundaries.  In the fixed case, atoms which move outside
the face are deleted; shrink-wrapped means the position of the box face
adjusts continuously to enclose all the atoms.

For distributed-memory MPI parallelism, the simulation box is spatially
decomposed (partitioned) into non-overlapping subdomains which fill the
box.  The default partitioning, "brick", is most suitable when atom
density is roughly uniform, as shown in the left-side images of the
:ref:`domain-decomposition` figure.  The subdomains comprise a regular
grid, and all subdomains are identical in size and shape.  Both the
orthogonal and triclinic boxes can deform continuously during a
simulation, e.g. to compress a solid or shear a liquid, in which case
the processor subdomains likewise deform.

The pictures above demonstrate different decompositions for a 2d system
with 12 MPI ranks.  The atom colors indicate the load imbalance of each
subdomain, with green being optimal and red the least optimal.

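A common way to quantify this (used here as a working definition; the
exact quantity LAMMPS prints may be weighted differently) is the
imbalance factor, i.e. the cost of the busiest MPI rank relative to the
average cost:

.. math::

   I = \frac{\max_{p=1\ldots P} \, n_p}{\frac{1}{P} \sum_{p=1}^{P} n_p}

where :math:`n_p` is the number of atoms (or a weighted per-atom cost)
assigned to rank :math:`p`.  A perfectly balanced decomposition has
:math:`I = 1`; the larger :math:`I`, the more the busiest rank limits
the overall parallel efficiency.
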
Due to the vacuum in the system, the default decomposition is
unbalanced, with several MPI ranks without atoms (left).  By forcing a
1x12x1 processor grid, every MPI rank does computations now, but the
number of atoms per subdomain is still uneven, and the thin slice shape
increases the amount of communication between subdomains (center
left).  With a 2x6x1 processor grid and shifting the subdomain
divisions, the load imbalance is further reduced and the amount of
communication required between subdomains is less (center right).
Using recursive bisectioning leads to a further improved decomposition
(right).

The LAMMPS Python module enables calling the LAMMPS C library API from
Python by dynamically loading functions in the LAMMPS shared library
through the `Python ctypes module
<https://docs.python.org/3/library/ctypes.html>`_.  Because of the
dynamic loading, it is **required** that LAMMPS is compiled in
:ref:`"shared" mode <exe>`.  The Python interface is object-oriented,
but otherwise tries to be very similar to the C library API.  Three
different Python classes to run LAMMPS are available and they build on
each other.  More information on this is in the :doc:`Python_head`
documentation.

LAMMPS Fortran API
^^^^^^^^^^^^^^^^^^

The LAMMPS Fortran module is a wrapper around calling functions from
the LAMMPS C library API.  This is done using the ISO_C_BINDING feature
in Fortran 2003.  The interface is object-oriented but otherwise tries
to be very similar to the C library API and the basic Python module.

.. toctree::

getting started, but not as a fully tested and supported feature of the
LAMMPS distribution.  Any contributions to complete this are, of
course, welcome.  Please also note that for the case of creating a
Python wrapper, a fully supported :doc:`Ctypes based lammps module
<Python_module>` already exists.  That module is designed to be
object-oriented while SWIG will generate a 1:1 translation of the
functions in the interface file.

Building the wrapper

This is because there can only be one fix which monitors the global
pressure and changes the simulation box dimensions.  So you have 3
choices:

#. Use one of the 4 NPT or NPH styles for the rigid bodies.  Use the
   *dilate* all option so that it will dilate the positions of the
   non-rigid particles as well.  Use :doc:`fix nvt <fix_nh>` (or any
   other thermostat) for the non-rigid particles.
#. Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles.
   Use the *dilate* all option so that it will dilate the
   center-of-mass positions of the rigid bodies as well.  Use one of
   the 4 NVE or 2 NVT rigid styles for the rigid bodies.
#. Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
   pressure and change the box dimensions.  Use one of the 4 NVE or 2
   NVT rigid styles for the rigid bodies.  Use :doc:`fix nvt <fix_nh>`
   (or any other thermostat) for the non-rigid particles.

In all cases, the rigid bodies and non-rigid particles both contribute
to the global pressure and the box is scaled the same by any of the
three choices.

events, all the other replicas also run dynamics and event checking
with the same schedule, but the final states are always overwritten by
the state of the event replica.

The outer loop of the pseudocode above continues until *N* steps of
dynamics have been performed.  Note that *N* only includes the dynamics
of stages 2 and 3, not the steps taken during dephasing or the
minimization iterations of quenching.

class Pointers {
      python(ptr->python) {}

  virtual ~Pointers() = default;

  // remove other default members
  Pointers() = delete;
  Pointers(const Pointers &) = default;