starting grammar, punctuation, and spelling review for developer info sections
This commit is contained in:
@ -1,52 +1,53 @@
|
||||
Code design
|
||||
-----------
|
||||
|
||||
This section explains some of the code design choices in LAMMPS with
|
||||
the goal of helping developers write new code similar to the existing
|
||||
code. Please see the section on :doc:`Requirements for contributed
|
||||
code <Modify_style>` for more specific recommendations and guidelines.
|
||||
While that section is organized more in the form of a checklist for
|
||||
code contributors, the focus here is on overall code design strategy,
|
||||
choices made between possible alternatives, and discussing some
|
||||
relevant C++ programming language constructs.
|
||||
This section explains some code design choices in LAMMPS with the goal
|
||||
of helping developers write new code similar to the existing code.
|
||||
Please see the section on :doc:`Requirements for contributed code
|
||||
<Modify_style>` for more specific recommendations and guidelines. While
|
||||
that section is organized more in the form of a checklist for code
|
||||
contributors, the focus here is on overall code design strategy, choices
|
||||
made between possible alternatives, and discussing some relevant C++
|
||||
programming language constructs.
|
||||
|
||||
Historically, the basic design philosophy of the LAMMPS C++ code was a
|
||||
"C with classes" style. The motivation was to make it easy to modify
|
||||
LAMMPS for people without significant training in C++ programming.
|
||||
Data structures and code constructs were used that resemble the
|
||||
previous implementation(s) in Fortran. A contributing factor to this
|
||||
choice also was that at the time, C++ compilers were often not mature
|
||||
and some of the advanced features contained bugs or did not function
|
||||
as the standard required. There were also disagreements between
|
||||
compiler vendors as to how to interpret the C++ standard documents.
|
||||
LAMMPS for people without significant training in C++ programming. Data
|
||||
structures and code constructs were used that resemble the previous
|
||||
implementation(s) in Fortran. A contributing factor to this choice was
|
||||
that at the time, C++ compilers were often not mature and some advanced
|
||||
features contained bugs or did not function as the standard required.
|
||||
There were also disagreements between compiler vendors as to how to
|
||||
interpret the C++ standard documents.
|
||||
|
||||
However, C++ compilers have now advanced significantly. In 2020 we
|
||||
decided to to require the C++11 standard as the minimum C++ language
|
||||
standard for LAMMPS. Since then we have begun to also replace some of
|
||||
the C-style constructs with equivalent C++ functionality, either from
|
||||
the C++ standard library or as custom classes or functions, in order
|
||||
to improve readability of the code and to increase code reuse through
|
||||
abstraction of commonly used functionality.
|
||||
However, C++ compilers and the C++ programming language have advanced
|
||||
significantly. In 2020, the LAMMPS developers decided to require the
|
||||
C++11 standard as the minimum C++ language standard for LAMMPS. Since
|
||||
then, we have begun to replace C-style constructs with equivalent C++
|
||||
functionality. This was taken either from the C++ standard library or
|
||||
implemented as custom classes or functions. The goal is to improve
|
||||
readability of the code and to increase code reuse through abstraction
|
||||
of commonly used functionality.
|
||||
|
||||
.. note::
|
||||
|
||||
Please note that as of spring 2022 there is still a sizable chunk
|
||||
of legacy code in LAMMPS that has not yet been refactored to
|
||||
reflect these style conventions in full. LAMMPS has a large code
|
||||
base and many different contributors and there also is a hierarchy
|
||||
of precedence in which the code is adapted. Highest priority has
|
||||
been the code in the ``src`` folder, followed by code in packages
|
||||
in order of their popularity and complexity (simpler code is
|
||||
adapted sooner), followed by code in the ``lib`` folder. Source
|
||||
code that is downloaded from external packages or libraries during
|
||||
compilation is not subject to the conventions discussed here.
|
||||
Please note that as of spring 2023 there is still a sizable chunk of
|
||||
legacy code in LAMMPS that has not yet been refactored to reflect
|
||||
these style conventions in full. LAMMPS has a large code base and
|
||||
many contributors. There is also a hierarchy of precedence in which
|
||||
the code is adapted. Highest priority has been the code in the
|
||||
``src`` folder, followed by code in packages in order of their
|
||||
popularity and complexity (simpler code gets adapted sooner), followed
|
||||
by code in the ``lib`` folder. Source code that is downloaded from
|
||||
external packages or libraries during compilation is not subject to
|
||||
the conventions discussed here.
|
||||
|
||||
Object oriented code
|
||||
Object-oriented code
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
LAMMPS is designed to be an object oriented code. Each simulation is
|
||||
LAMMPS is designed to be an object-oriented code. Each simulation is
|
||||
represented by an instance of the LAMMPS class. When running in
|
||||
parallel each MPI process creates such an instance. This can be seen
|
||||
parallel, each MPI process creates such an instance. This can be seen
|
||||
in the ``main.cpp`` file where the core steps of running a LAMMPS
|
||||
simulation are the following 3 lines of code:
|
||||
|
||||
@ -67,14 +68,14 @@ other special features.
|
||||
The basic LAMMPS class hierarchy which is created by the LAMMPS class
|
||||
constructor is shown in :ref:`class-topology`. When input commands
|
||||
are processed, additional class instances are created, or deleted, or
|
||||
replaced. Likewise specific member functions of specific classes are
|
||||
replaced. Likewise, specific member functions of specific classes are
|
||||
called to trigger actions such creating atoms, computing forces,
|
||||
computing properties, time-propagating the system, or writing output.
|
||||
|
||||
Compositing and Inheritance
|
||||
===========================
|
||||
|
||||
LAMMPS makes extensive use of the object oriented programming (OOP)
|
||||
LAMMPS makes extensive use of the object-oriented programming (OOP)
|
||||
principles of *compositing* and *inheritance*. Classes like the
|
||||
``LAMMPS`` class are a **composite** containing pointers to instances
|
||||
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
|
||||
@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the
|
||||
simulation and providing member functions that trigger certain
|
||||
actions. Some of those classes like ``Force`` are themselves
|
||||
composites, containing instances of classes describing different force
|
||||
interactions. Similarly the ``Modify`` class contains a list of
|
||||
interactions. Similarly, the ``Modify`` class contains a list of
|
||||
``Fix`` and ``Compute`` classes. If the input commands that
|
||||
correspond to these classes include the word *style*, then LAMMPS
|
||||
stores only a single instance of that class. E.g. *atom_style*,
|
||||
@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived
|
||||
classes are often referred to as "styles", e.g. pair styles, fix
|
||||
styles, atom styles and so on.
|
||||
|
||||
This is the origin of the flexibility of LAMMPS. For example pair
|
||||
This is the origin of the flexibility of LAMMPS. For example, pair
|
||||
styles implement a variety of different non-bonded interatomic
|
||||
potentials functions. All details for the implementation of a
|
||||
potential are stored and executed in a single class.
|
||||
|
||||
As mentioned above, there can be multiple instances of classes derived
|
||||
from the ``Fix`` or ``Compute`` base classes. They represent a
|
||||
different facet of LAMMPS flexibility as they provide methods which
|
||||
can be called at different points in time within a timestep, as
|
||||
explained in `Developer_flow`. This allows the input script to tailor
|
||||
how a specific simulation is run, what diagnostic computations are
|
||||
performed, and how the output of those computations is further
|
||||
processed or output.
|
||||
different facet of LAMMPS' flexibility, as they provide methods which
|
||||
can be called at different points within a timestep, as explained in
|
||||
`Developer_flow`. This allows the input script to tailor how a specific
|
||||
simulation is run, what diagnostic computations are performed, and how
|
||||
the output of those computations is further processed or output.
|
||||
|
||||
Additional code sharing is possible by creating derived classes from the
|
||||
derived classes (e.g., to implement an accelerated version of a pair
|
||||
@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member
|
||||
functions is which of the two member functions is called when executing
|
||||
`base1->call()` versus `base2->call()`. Without polymorphism, a
|
||||
function within the base class can only call member functions within the
|
||||
same scope, that is ``Base::call()`` will always call
|
||||
``Base::normal()``. But for the `base2->call()` case the call of the
|
||||
same scope: that is, ``Base::call()`` will always call
|
||||
``Base::normal()``. But for the `base2->call()` case, the call of the
|
||||
virtual member function will be dispatched to ``Derived::poly()``
|
||||
instead. This mechanism means that functions are called within the
|
||||
scope of the class type that was used to *create* the class instance are
|
||||
invoked; even if they are assigned to a pointer using the type of a base
|
||||
class. This is the desired behavior and this way LAMMPS can even use
|
||||
styles that are loaded at runtime from a shared object file with the
|
||||
:doc:`plugin command <plugin>`.
|
||||
instead. This mechanism results in calling functions that are within
|
||||
the scope of the class that was used to *create* the instance, even if
|
||||
they are assigned to a pointer for their base class. This is the
|
||||
desired behavior, and this way LAMMPS can even use styles that are loaded
|
||||
at runtime from a shared object file with the :doc:`plugin command
|
||||
<plugin>`.
|
||||
|
||||
A special case of virtual functions are so-called pure functions. These
|
||||
are virtual functions that are initialized to 0 in the class declaration
|
||||
@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be
|
||||
created and that derived classes **must** implement these functions.
|
||||
Many of the functions listed with the various class styles in the
|
||||
section :doc:`Modify` are pure functions. The motivation for this is
|
||||
to define the interface or API of the functions but defer their
|
||||
to define the interface or API of the functions, but defer their
|
||||
implementation to the derived classes.
|
||||
|
||||
However, there are downsides to this. For example, calls to virtual
|
||||
functions from within a constructor, will not be in the scope of the
|
||||
derived class and thus it is good practice to either avoid calling them
|
||||
functions from within a constructor, will *not* be in the scope of the
|
||||
derived class, and thus it is good practice to either avoid calling them
|
||||
or to provide an explicit scope such as ``Base::poly()`` or
|
||||
``Derived::poly()``. Furthermore, any destructors in classes containing
|
||||
virtual functions should be declared virtual too, so they will be
|
||||
@ -208,8 +208,8 @@ dispatch.
|
||||
that are intended to replace a virtual or pure function use the
|
||||
``override`` property keyword. For the same reason, the use of
|
||||
overloads or default arguments for virtual functions should be
|
||||
avoided as they lead to confusion over which function is supposed to
|
||||
override which and which arguments need to be declared.
|
||||
avoided, as they lead to confusion over which function is supposed to
|
||||
override which, and which arguments need to be declared.
|
||||
|
||||
Style Factories
|
||||
===============
|
||||
@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that
|
||||
create an instance of a specific derived class, say ``PairLJCut`` and
|
||||
return a pointer to the type of the common base class of that style,
|
||||
``Pair`` in this case. To associate the factory function with the
|
||||
style keyword, an ``std::map`` class is used with function pointers
|
||||
style keyword, a ``std::map`` class is used with function pointers
|
||||
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
|
||||
"morse" for ``PairMorse``). A couple of typedefs help keep the code
|
||||
readable and a template function is used to implement the actual
|
||||
readable, and a template function is used to implement the actual
|
||||
factory functions for the individual classes. Below is an example
|
||||
of such a factory function from the ``Force`` class as declared in
|
||||
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
|
||||
@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams".
|
||||
This is mainly motivated by better performance, better control over
|
||||
formatting, and less effort to achieve specific formatting.
|
||||
|
||||
Since mixing "stdio" and "iostreams" can lead to unexpected
|
||||
behavior. use of the latter is strongly discouraged. Also output to
|
||||
the screen should not use the predefined ``stdout`` FILE pointer, but
|
||||
rather the ``screen`` and ``logfile`` FILE pointers managed by the
|
||||
LAMMPS class. Furthermore, output should generally only be done by
|
||||
MPI rank 0 (``comm->me == 0``). Output that is sent to both
|
||||
``screen`` and ``logfile`` should use the :cpp:func:`utils::logmesg()
|
||||
convenience function <LAMMPS_NS::utils::logmesg>`.
|
||||
Since mixing "stdio" and "iostreams" can lead to unexpected behavior,
|
||||
use of the latter is strongly discouraged. Output to the screen should
|
||||
*not* use the predefined ``stdout`` FILE pointer, but rather the
|
||||
``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class.
|
||||
Furthermore, output should generally only be done by MPI rank 0
|
||||
(``comm->me == 0``). Output that is sent to both ``screen`` and
|
||||
``logfile`` should use the :cpp:func:`utils::logmesg() convenience
|
||||
function <LAMMPS_NS::utils::logmesg>`.
|
||||
|
||||
We also discourage the use of stringstreams because the bundled {fmt}
|
||||
library and the customized tokenizer classes can provide the same
|
||||
functionality in a cleaner way with better performance. This also
|
||||
helps maintain a consistent programming syntax with code from many
|
||||
different contributors.
|
||||
We discourage the use of stringstreams because the bundled {fmt} library
|
||||
and the customized tokenizer classes provide the same functionality in a
|
||||
cleaner way with better performance. This also helps maintain a
|
||||
consistent programming syntax with code from many different
|
||||
contributors.
|
||||
|
||||
Formatting with the {fmt} library
|
||||
===================================
|
||||
|
||||
The LAMMPS source code includes a copy of the `{fmt} library
|
||||
<https://fmt.dev>`_ which is preferred over formatting with the
|
||||
<https://fmt.dev>`_, which is preferred over formatting with the
|
||||
"printf()" family of functions. The primary reason is that it allows
|
||||
a typesafe default format for any type of supported data. This is
|
||||
particularly useful for formatting integers of a given size (32-bit or
|
||||
@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it
|
||||
are future-proof.
|
||||
|
||||
Formatted strings are frequently created by calling the
|
||||
``fmt::format()`` function which will return a string as a
|
||||
``std::string`` class instance. In contrast to the ``%`` placeholder
|
||||
in ``printf()``, the {fmt} library uses ``{}`` to embed format
|
||||
descriptors. In the simplest case, no additional characters are
|
||||
needed as {fmt} will choose the default format based on the data type
|
||||
of the argument. Otherwise the ``fmt::print()`` function may be
|
||||
used instead of ``printf()`` or ``fprintf()``. In addition, several
|
||||
LAMMPS output functions, that originally accepted a single string as
|
||||
argument have been overloaded to accept a format string with optional
|
||||
arguments as well (e.g., ``Error::all()``, ``Error::one()``,
|
||||
``utils::logmesg()``).
|
||||
``fmt::format()`` function, which will return a string as a
|
||||
``std::string`` class instance. In contrast to the ``%`` placeholder in
|
||||
``printf()``, the {fmt} library uses ``{}`` to embed format descriptors.
|
||||
In the simplest case, no additional characters are needed, as {fmt} will
|
||||
choose the default format based on the data type of the argument.
|
||||
Otherwise, the ``fmt::print()`` function may be used instead of
|
||||
``printf()`` or ``fprintf()``. In addition, several LAMMPS output
|
||||
functions, that originally accepted a single string as argument have
|
||||
been overloaded to accept a format string with optional arguments as
|
||||
well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
|
||||
|
||||
Summary of the {fmt} format syntax
|
||||
==================================
|
||||
@ -332,10 +331,11 @@ The syntax of the format string is "{[<argument id>][:<format spec>]}",
|
||||
where either the argument id or the format spec (separated by a colon
|
||||
':') is optional. The argument id is usually a number starting from 0
|
||||
that is the index to the arguments following the format string. By
|
||||
default these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The most
|
||||
common case for using argument id would be to use the same argument in
|
||||
multiple places in the format string without having to provide it as an
|
||||
argument multiple times. In LAMMPS the argument id is rarely used.
|
||||
default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The
|
||||
most common case for using argument id would be to use the same argument
|
||||
in multiple places in the format string without having to provide it as
|
||||
an argument multiple times. The argument id is rarely used in the LAMMPS
|
||||
source code.
|
||||
|
||||
More common is the use of a format specifier, which starts with a colon.
|
||||
This may optionally be followed by a fill character (default is ' '). If
|
||||
@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating
|
||||
point numbers. The final character in the format string would be an
|
||||
indicator for the "presentation", i.e. 'd' for decimal presentation of
|
||||
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
|
||||
This mostly follows the "printf()" scheme but without requiring an
|
||||
This mostly follows the "printf()" scheme, but without requiring an
|
||||
additional length parameter to distinguish between different integer
|
||||
widths. The {fmt} library will detect those and adapt the formatting
|
||||
accordingly. For floating point numbers there are correspondingly, 'g'
|
||||
for generic presentation, 'e' for exponential presentation, and 'f' for
|
||||
fixed point presentation.
|
||||
|
||||
Thus "{:8}" would represent *any* type argument using at least 8
|
||||
characters; "{:<8}" would do this as left aligned, "{:^8}" as centered,
|
||||
"{:>8}" as right aligned. If a specific presentation is selected, the
|
||||
argument type must be compatible or else the {fmt} formatting code will
|
||||
throw an exception. Some format string examples are given below:
|
||||
The format string "{:8}" would thus represent *any* type argument and be
|
||||
replaced by at least 8 characters; "{:<8}" would do this as left
|
||||
aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific
|
||||
presentation is selected, the argument type must be compatible or else
|
||||
the {fmt} formatting code will throw an exception. Some format string
|
||||
examples are given below:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
@ -392,12 +393,12 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
|
||||
Memory management
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
Dynamical allocation of small data and objects can be done with the
|
||||
the C++ commands "new" and "delete/delete[]. Large data should use
|
||||
the member functions of the ``Memory`` class, most commonly,
|
||||
``Memory::create()``, ``Memory::grow()``, and ``Memory::destroy()``,
|
||||
which provide variants for vectors, 2d arrays, 3d arrays, etc.
|
||||
These can also be used for small data.
|
||||
Dynamical allocation of small data and objects can be done with the C++
|
||||
commands "new" and "delete/delete[]". Large data should use the member
|
||||
functions of the ``Memory`` class, most commonly, ``Memory::create()``,
|
||||
``Memory::grow()``, and ``Memory::destroy()``, which provide variants
|
||||
for vectors, 2d arrays, 3d arrays, etc. These can also be used for
|
||||
small data.
|
||||
|
||||
The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
|
||||
directly is strongly discouraged. To simplify adapting legacy code
|
||||
@ -408,26 +409,24 @@ perform additional error checks for safety.
|
||||
Use of these custom memory allocation functions is motivated by the
|
||||
following considerations:
|
||||
|
||||
- memory allocation failures on *any* MPI rank during a parallel run
|
||||
will trigger an immediate abort of the entire parallel calculation
|
||||
instead of stalling it
|
||||
- a failing "new" will trigger an exception which is also captured by
|
||||
LAMMPS and triggers a global abort
|
||||
- allocation of multi-dimensional arrays will be done in a C compatible
|
||||
fashion but so that the storage of the actual data is stored in one
|
||||
large contiguous block. Thus when MPI communication is needed,
|
||||
- Memory allocation failures on *any* MPI rank during a parallel run
|
||||
will trigger an immediate abort of the entire parallel calculation.
|
||||
- A failing "new" will trigger an exception, which is also captured by
|
||||
LAMMPS and triggers a global abort.
|
||||
- Allocation of multidimensional arrays will be done in a C compatible
|
||||
fashion, but such that the storage of the actual data is stored in one
|
||||
large contiguous block. Thus, when MPI communication is needed,
|
||||
the data can be communicated directly (similar to Fortran arrays).
|
||||
- the "destroy()" and "sfree()" functions may safely be called on NULL
|
||||
pointers
|
||||
- the "destroy()" functions will nullify the pointer variables making
|
||||
"use after free" errors easy to detect
|
||||
- it is possible to use a larger than default memory alignment (not on
|
||||
- The "destroy()" and "sfree()" functions may safely be called on NULL
|
||||
pointers.
|
||||
- The "destroy()" functions will nullify the pointer variables, thus
|
||||
making "use after free" errors easy to detect.
|
||||
- It is possible to use a larger than default memory alignment (not on
|
||||
all operating systems, since the allocated storage pointers must be
|
||||
compatible with ``free()`` for technical reasons)
|
||||
compatible with ``free()`` for technical reasons).
|
||||
|
||||
In the practical implementation of code this means that any pointer
|
||||
variables that are class members should be initialized to a
|
||||
``nullptr`` value in their respective constructors. That way it is
|
||||
safe to call ``Memory::destroy()`` or ``delete[]`` on them before
|
||||
*any* allocation outside the constructor. This helps prevent memory
|
||||
leaks.
|
||||
In the practical implementation of code this means, that any pointer
|
||||
variables, that are class members should be initialized to a ``nullptr``
|
||||
value in their respective constructors. That way, it is safe to call
|
||||
``Memory::destroy()`` or ``delete[]`` on them before *any* allocation
|
||||
outside the constructor. This helps prevent memory leaks.
|
||||
|
||||
@ -28,7 +28,7 @@ The need to do this communication arises when data from the owned atoms
|
||||
is updated (e.g. their positions) and this updated information needs to
|
||||
be **copied** to the corresponding ghost atoms.
|
||||
|
||||
And second, *reverse communication* which sends ghost atom information
|
||||
And second, *reverse communication*, which sends ghost atom information
|
||||
from each processor to the owning processor to **accumulate** (sum)
|
||||
the values with the corresponding owned atoms. The need for this
|
||||
arises when data is computed and also stored with ghost atoms
|
||||
@ -58,7 +58,7 @@ embedded-atom method (EAM) which compute intermediate values in the
|
||||
first part of the compute() function that need to be stored by both
|
||||
owned and ghost atoms for the second part of the force computation.
|
||||
The *Comm* class methods perform the MPI communication for buffers of
|
||||
per-atom data. They "call back" to the *Pair* class so it can *pack*
|
||||
per-atom data. They "call back" to the *Pair* class, so it can *pack*
|
||||
or *unpack* the buffer with data the *Pair* class owns. There are 4
|
||||
such methods that the *Pair* class must define, assuming it uses both
|
||||
forward and reverse communication:
|
||||
@ -70,7 +70,7 @@ forward and reverse communication:
|
||||
|
||||
The arguments to these methods include the buffer and a list of atoms
|
||||
to pack or unpack. The *Pair* class also must set the *comm_forward*
|
||||
and *comm_reverse* variables which store the number of values stored
|
||||
and *comm_reverse* variables, which store the number of values stored
|
||||
in the communication buffers for each operation. This means, if
|
||||
desired, it can choose to store multiple per-atom values in the
|
||||
buffer, and they will be communicated together to minimize
|
||||
@ -81,11 +81,11 @@ containing ``double`` values. To correctly store integers that may be
|
||||
|
||||
The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind
|
||||
of forward and reverse communication operations using the same *Comm*
|
||||
class methods. Likewise the same pack/unpack methods and
|
||||
class methods. Likewise, the same pack/unpack methods and
|
||||
comm_forward/comm_reverse variables must be defined by the calling
|
||||
*Fix*, *Compute*, or *Dump* class.
|
||||
|
||||
For *Fix* classes there is an optional second argument to the
|
||||
For *Fix* classes, there is an optional second argument to the
|
||||
*forward_comm()* and *reverse_comm()* call which can be used when the
|
||||
fix performs multiple modes of communication, with different numbers
|
||||
of values per atom. The fix should set the *comm_forward* and
|
||||
@ -150,7 +150,7 @@ latter case, when the *ring* operation is complete, each processor can
|
||||
examine its original buffer to extract modified values.
|
||||
|
||||
Note that the *ring* operation is similar to an MPI_Alltoall()
|
||||
operation where every processor effectively sends and receives data to
|
||||
operation, where every processor effectively sends and receives data to
|
||||
every other processor. The difference is that the *ring* operation
|
||||
does it one step at a time, so the total volume of data does not need
|
||||
to be stored by every processor. However, the *ring* operation is
|
||||
@ -184,8 +184,8 @@ The *exchange_data()* method triggers the communication to be
|
||||
performed. Each processor provides the vector of *N* datums to send,
|
||||
and the size of each datum. All datums must be the same size.
|
||||
|
||||
The *create_atom()* and *exchange_atom()* methods are similar except
|
||||
that the size of each datum can be different. Typically this is used
|
||||
The *create_atom()* and *exchange_atom()* methods are similar, except
|
||||
that the size of each datum can be different. Typically, this is used
|
||||
to communicate atoms, each with a variable amount of per-atom data, to
|
||||
other processors.
|
||||
|
||||
|
||||
@ -45,9 +45,9 @@ other methods in the class.
|
||||
zero before each timestep, so that forces (torques, etc) can be
|
||||
accumulated.
|
||||
|
||||
Now for the ``Verlet::run()`` method. Its basic structure in hi-level pseudo
|
||||
code is shown below. In the actual code in ``src/verlet.cpp`` some of
|
||||
these operations are conditionally invoked.
|
||||
Now for the ``Verlet::run()`` method. Its basic structure in hi-level
|
||||
pseudocode is shown below. In the actual code in ``src/verlet.cpp``
|
||||
some of these operations are conditionally invoked.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@ -105,17 +105,17 @@ need it. These flags are passed to the various methods that compute
|
||||
particle interactions, so that they either compute and tally the
|
||||
corresponding data or can skip the extra calculations if the energy and
|
||||
virial are not needed. See the comments for the ``Integrate::ev_set()``
|
||||
method which document the flag values.
|
||||
method, which document the flag values.
|
||||
|
||||
At various points of the timestep, fixes are invoked,
|
||||
e.g. ``fix->initial_integrate()``. In the code, this is actually done
|
||||
via the Modify class which stores all the Fix objects and lists of which
|
||||
via the Modify class, which stores all the Fix objects and lists of which
|
||||
should be invoked at what point in the timestep. Fixes are the LAMMPS
|
||||
mechanism for tailoring the operations of a timestep for a particular
|
||||
simulation. As described elsewhere, each fix has one or more methods,
|
||||
each of which is invoked at a specific stage of the timestep, as show in
|
||||
the timestep pseudo-code. All the active fixes defined in an input
|
||||
script, that are flagged to have an ``initial_integrate()`` method are
|
||||
the timestep pseudocode. All the active fixes defined in an input
|
||||
script, that are flagged to have an ``initial_integrate()`` method, are
|
||||
invoked at the beginning of each timestep. Examples are :doc:`fix nve
|
||||
<fix_nve>` or :doc:`fix nvt or fix npt <fix_nh>` which perform the
|
||||
start-of-timestep velocity-Verlet integration operations to update
|
||||
@ -131,9 +131,9 @@ can be changed using the :doc:`neigh_modify every/delay/check
|
||||
<neigh_modify>` command. If not, coordinates of ghost atoms are
|
||||
acquired by each processor via the ``forward_comm()`` method of the Comm
|
||||
class. If neighbor lists need to be built, several operations within
|
||||
the inner if clause of the pseudo-code are first invoked. The
|
||||
the inner if clause of the pseudocode are first invoked. The
|
||||
``pre_exchange()`` method of any defined fixes is invoked first.
|
||||
Typically this inserts or deletes particles from the system.
|
||||
Typically, this inserts or deletes particles from the system.
|
||||
|
||||
Periodic boundary conditions are then applied by the Domain class via
|
||||
its ``pbc()`` method to remap particles that have moved outside the
|
||||
@ -148,7 +148,7 @@ The box boundaries are then reset (if needed) via the ``reset_box()``
|
||||
method of the Domain class, e.g. if box boundaries are shrink-wrapped to
|
||||
current particle coordinates. A change in the box size or shape
|
||||
requires internal information for communicating ghost atoms (Comm class)
|
||||
and neighbor list bins (Neighbor class) be updated. The ``setup()``
|
||||
and neighbor list bins (Neighbor class) to be updated. The ``setup()``
|
||||
method of the Comm class and ``setup_bins()`` method of the Neighbor
|
||||
class perform the update.
|
||||
|
||||
@ -217,20 +217,21 @@ file, and restart files. See the :doc:`thermo_style <thermo_style>`,
|
||||
:doc:`dump <dump>`, and :doc:`restart <restart>` commands for more
|
||||
details.
|
||||
|
||||
The the flow of control during energy minimization iterations is
|
||||
similar to that of a molecular dynamics timestep. Forces are computed,
|
||||
neighbor lists are built as needed, atoms migrate to new processors, and
|
||||
atom coordinates and forces are communicated to neighboring processors.
|
||||
The only difference is what Fix class operations are invoked when. Only
|
||||
a subset of LAMMPS fixes are useful during energy minimization, as
|
||||
The flow of control during energy minimization iterations is similar to
|
||||
that of a molecular dynamics timestep. Forces are computed, neighbor
|
||||
lists are built as needed, atoms migrate to new processors, and atom
|
||||
coordinates and forces are communicated to neighboring processors. The
|
||||
only difference is what Fix class operations are invoked when. Only a
|
||||
subset of LAMMPS fixes are useful during energy minimization, as
|
||||
explained in their individual doc pages. The relevant Fix class methods
|
||||
are ``min_pre_exchange()``, ``min_pre_force()``, and ``min_post_force()``.
|
||||
Each fix is invoked at the appropriate place within the minimization
|
||||
iteration. For example, the ``min_post_force()`` method is analogous to
|
||||
the ``post_force()`` method for dynamics; it is used to alter or constrain
|
||||
forces on each atom, which affects the minimization procedure.
|
||||
are ``min_pre_exchange()``, ``min_pre_force()``, and
|
||||
``min_post_force()``. Each fix is invoked at the appropriate place
|
||||
within the minimization iteration. For example, the
|
||||
``min_post_force()`` method is analogous to the ``post_force()`` method
|
||||
for dynamics; it is used to alter or constrain forces on each atom,
|
||||
which affects the minimization procedure.
|
||||
|
||||
After all iterations are completed there is a ``cleanup`` step which
|
||||
After all iterations are completed, there is a ``cleanup`` step which
|
||||
calls the ``post_run()`` method of fixes to perform operations only required
|
||||
at the end of a calculations (like freeing temporary storage or creating
|
||||
at the end of a calculation (like freeing temporary storage or creating
|
||||
final outputs).
|
||||
|
||||
@ -70,7 +70,7 @@ A command can define multiple grids, each of a different size. Each
|
||||
grid is an instantiation of the Grid2d or Grid3d class.
|
||||
|
||||
The command also defines what data it will store for each grid it
|
||||
creates and it allocates the multi-dimensional array(s) needed to
|
||||
creates and it allocates the multidimensional array(s) needed to
|
||||
store the data. No grid cell data is stored within the Grid2d or
|
||||
Grid3d classes.
|
||||
|
||||
@ -115,7 +115,7 @@ which stores *Nvalues* per grid cell.
|
||||
nyhi_out, nxlo_out, nxhi_out, nvalues,
|
||||
"data3d_multi");
|
||||
|
||||
Note that these multi-dimensional arrays are allocated as contiguous
|
||||
Note that these multidimensional arrays are allocated as contiguous
|
||||
chunks of memory where the x-index of the grid varies fastest, then y,
|
||||
and the z-index slowest. For multiple values per grid cell, the
|
||||
Nvalues are contiguous, so their index varies even faster than the
|
||||
@ -798,7 +798,7 @@ A value of -1 is returned if the data name is not recognized.
|
||||
The *get_griddata_by_index()* method is called after the
|
||||
*get_griddata_by_name()* method, using the data index it returned as
|
||||
its argument. This method will return a pointer to the
|
||||
multi-dimensional array which stores the requested data.
|
||||
multidimensional array which stores the requested data.
|
||||
|
||||
As in the discussion above of the Memory class *create_offset()*
|
||||
methods, the dimensionality of the array associated with the returned
|
||||
|
||||
@ -15,7 +15,7 @@ for more information about that part of the build process. LAMMPS
|
||||
currently supports building with :doc:`conventional makefiles
|
||||
<Build_make>` and through :doc:`CMake <Build_cmake>`. Those procedures
|
||||
differ in how packages are enabled or disabled for inclusion into a
|
||||
LAMMPS binary so they cannot be mixed. The source files for each
|
||||
LAMMPS binary, so they cannot be mixed. The source files for each
|
||||
package are in all-uppercase subdirectories of the ``src`` folder, for
|
||||
example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS``
|
||||
subdirectory is not a package but contains a dummy MPI library, that is
|
||||
@ -26,31 +26,31 @@ with traditional makefiles.
|
||||
|
||||
The ``lib`` directory contains the source code for several supporting
|
||||
libraries or files with configuration settings to use globally installed
|
||||
libraries, that are required by some of the optional packages. They may
|
||||
libraries, that are required by some optional packages. They may
|
||||
include python scripts that can transparently download additional source
|
||||
code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``,
|
||||
contains the source files, some of which are in different languages such
|
||||
as Fortran or CUDA. These libraries included in the LAMMPS build,
|
||||
if the corresponding package is installed.
|
||||
as Fortran or CUDA. These libraries included in the LAMMPS build, if the
|
||||
corresponding package is installed.
|
||||
|
||||
LAMMPS C++ source files almost always come in pairs, such as
|
||||
``src/run.cpp`` (implementation file) and ``src/run.h`` (header file).
|
||||
Each pair of files defines a C++ class, for example the
|
||||
:cpp:class:`LAMMPS_NS::Run` class which contains the code invoked by the
|
||||
:doc:`run <run>` command in a LAMMPS input script. As this example
|
||||
:cpp:class:`LAMMPS_NS::Run` class, which contains the code invoked by
|
||||
the :doc:`run <run>` command in a LAMMPS input script. As this example
|
||||
illustrates, source file and class names often have a one-to-one
|
||||
correspondence with a command used in a LAMMPS input script. Some
|
||||
source files and classes do not have a corresponding input script
|
||||
command, e.g. ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force`
|
||||
command, for example ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force`
|
||||
class. They are discussed in the next section.
|
||||
|
||||
The names of all source files are in lower case and may use the
|
||||
underscore character '_' to separate words. Outside of bundled libraries
|
||||
which may have different conventions, all C and C++ header files have a
|
||||
``.h`` extension, all C++ files have a ``.cpp`` extension, and C files a
|
||||
``.c`` extension. A small number of C++ classes and utility functions
|
||||
are implemented with only a ``.h`` file. Examples are the Pointers and
|
||||
Commands classes or the MathVec functions.
|
||||
underscore character '_' to separate words. Apart from bundled,
|
||||
externally maintained libraries, which may have different conventions,
|
||||
all C and C++ header files have a ``.h`` extension, all C++ files have a
|
||||
``.cpp`` extension, and C files a ``.c`` extension. A few C++ classes
|
||||
and utility functions are implemented with only a ``.h`` file. Examples
|
||||
are the Pointers and Commands classes or the MathVec functions.
|
||||
|
||||
Class topology
|
||||
--------------
|
||||
@ -62,35 +62,36 @@ associated source files in the ``src`` folder, for example the class
|
||||
:cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp``
|
||||
and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec`
|
||||
corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full
|
||||
lines in the figure represent compositing: that is the class at the base
|
||||
of the arrow holds a pointer to an instance of the class at the tip.
|
||||
Dashed lines instead represent inheritance: the class to the tip of the
|
||||
arrow is derived from the class at the base. Classes with a red boundary
|
||||
are not instantiated directly, but they represent the base classes for
|
||||
"styles". Those "styles" make up the bulk of the LAMMPS code and only
|
||||
a few representative examples are included in the figure so it remains
|
||||
readable.
|
||||
lines in the figure represent compositing: that is, the class at the
|
||||
base of the arrow holds a pointer to an instance of the class at the
|
||||
tip. Dashed lines instead represent inheritance: the class at the tip
|
||||
of the arrow is derived from the class at the base. Classes with a red
|
||||
boundary are not instantiated directly, but they represent the base
|
||||
classes for "styles". Those "styles" make up the bulk of the LAMMPS
|
||||
code and only a few representative examples are included in the figure,
|
||||
so it remains readable.
|
||||
|
||||
.. _class-topology:
|
||||
.. figure:: JPG/lammps-classes.png
|
||||
|
||||
LAMMPS class topology
|
||||
|
||||
This figure shows some of the relations of the base classes of the
|
||||
LAMMPS simulation package. Full lines indicate that a class holds an
|
||||
This figure shows relations of base classes of the LAMMPS
|
||||
simulation package. Full lines indicate that a class holds an
|
||||
instance of the class it is pointing to; dashed lines point to
|
||||
derived classes that are given as examples of what classes may be
|
||||
instantiated during a LAMMPS run based on the input commands and
|
||||
accessed through the API define by their respective base classes. At
|
||||
the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class, which
|
||||
holds pointers to class instances with specific purposes. Those may
|
||||
hold instances of other classes, sometimes directly, or only
|
||||
temporarily, sometimes as derived classes or derived classes of
|
||||
derived classes, which may also hold instances of other classes.
|
||||
accessed through the API define by their respective base classes.
|
||||
At the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class,
|
||||
which holds pointers to class instances with specific purposes.
|
||||
Those may hold instances of other classes, sometimes directly, or
|
||||
only temporarily, sometimes as derived classes or derived classes
|
||||
of derived classes, which may also hold instances of other
|
||||
classes.
|
||||
|
||||
The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and
|
||||
represents what is generally referred to an "instance" of LAMMPS. It is
|
||||
a composite holding pointers to instances of other core classes
|
||||
represents what is generally referred to as an "instance of LAMMPS". It
|
||||
is a composite holding pointers to instances of other core classes
|
||||
providing the core functionality of the MD engine in LAMMPS and through
|
||||
them abstractions of the required operations. The constructor of the
|
||||
LAMMPS class will instantiate those instances, process the command line
|
||||
@ -102,42 +103,44 @@ LAMMPS while passing it the command line flags and input script. It
|
||||
deletes the LAMMPS instance after the method reading the input returns
|
||||
and shuts down the MPI environment before it exits the executable.
|
||||
|
||||
The :cpp:class:`LAMMPS_NS::Pointers` is not shown in the
|
||||
The :cpp:class:`LAMMPS_NS::Pointers` class is not shown in the
|
||||
:ref:`class-topology` figure for clarity. It holds references to many
|
||||
of the members of the `LAMMPS_NS::LAMMPS`, so that all classes derived
|
||||
from :cpp:class:`LAMMPS_NS::Pointers` have direct access to those
|
||||
reference. From the class topology all classes with blue boundary are
|
||||
references. From the class topology all classes with blue boundary are
|
||||
referenced in the Pointers class and all classes in the second and third
|
||||
columns, that are not listed as derived classes are instead derived from
|
||||
:cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer references
|
||||
in Pointers, a pointer to the LAMMPS class instance needs to be passed
|
||||
to the constructor and thus all constructors for classes derived from it
|
||||
must do so and pass this pointer to the constructor for Pointers.
|
||||
columns, that are not listed as derived classes, are instead derived
|
||||
from :cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer
|
||||
references in Pointers, a pointer to the LAMMPS class instance needs to
|
||||
be passed to the constructor. All constructors for classes derived from
|
||||
it, must do so and thus pass that pointer to the constructor for
|
||||
:cpp:class:`LAMMPS_NS::Pointers`. The default constructor for
|
||||
:cpp:class:`LAMMPS_NS::Pointers` is disabled to enforce this.
|
||||
|
||||
Since all storage is supposed to be encapsulated (there are a few
|
||||
exceptions), the LAMMPS class can also be instantiated multiple times by
|
||||
a calling code. Outside of the aforementioned exceptions, those LAMMPS
|
||||
a calling code. Outside the aforementioned exceptions, those LAMMPS
|
||||
instances can be used alternately. As of the time of this writing
|
||||
(early 2021) LAMMPS is not yet sufficiently thread-safe for concurrent
|
||||
(early 2023) LAMMPS is not yet sufficiently thread-safe for concurrent
|
||||
execution. When running in parallel with MPI, care has to be taken,
|
||||
that suitable copies of communicators are used to not create conflicts
|
||||
between different instances.
|
||||
|
||||
The LAMMPS class currently (early 2021) holds instances of 19 classes
|
||||
representing the core functionality. There are a handful of virtual
|
||||
parent classes in LAMMPS that define what LAMMPS calls ``styles``. They
|
||||
are shaded red in the :ref:`class-topology` figure. Each of these are
|
||||
parents of a number of child classes that implement the interface
|
||||
defined by the parent class. There are two main categories of these
|
||||
``styles``: some may only have one instance active at a time (e.g. atom,
|
||||
pair, bond, angle, dihedral, improper, kspace, comm) and there is a
|
||||
dedicated pointer variable for each of them in the composite class.
|
||||
The LAMMPS class currently holds instances of 19 classes representing
|
||||
the core functionality. There are a handful of virtual parent classes
|
||||
in LAMMPS that define what LAMMPS calls ``styles``. These are shaded
|
||||
red in the :ref:`class-topology` figure. Each of these are parents of a
|
||||
number of child classes that implement the interface defined by the
|
||||
parent class. There are two main categories of these ``styles``: some
|
||||
may only have one instance active at a time (e.g. atom, pair, bond,
|
||||
angle, dihedral, improper, kspace, comm) and there is a dedicated
|
||||
pointer variable for each of them in the corresponding composite class.
|
||||
Setups that require a mix of different such styles have to use a
|
||||
*hybrid* class that takes the place of the one allowed instance and then
|
||||
manages and forwards calls to the corresponding sub-styles for the
|
||||
designated subset of atoms or data. The composite class may also have
|
||||
lists of class instances, e.g. Modify handles lists of compute and fix
|
||||
styles, while Output handles a list of dump class instances.
|
||||
*hybrid* class instance that acts as a proxy, and manages and forwards
|
||||
calls to the corresponding sub-style class instances for the designated
|
||||
subset of atoms or data. The composite class may also have lists of
|
||||
class instances, e.g. ``Modify`` handles lists of compute and fix
|
||||
styles, while ``Output`` handles a list of dump class instances.
|
||||
|
||||
The exception to this scheme are the ``command`` style classes. These
|
||||
implement specific commands that can be invoked before, after, or in
|
||||
@ -146,19 +149,19 @@ command() method called and then, after completion, the class instance
|
||||
deleted. Examples for this are the create_box, create_atoms, minimize,
|
||||
run, set, or velocity command styles.
|
||||
|
||||
For all those ``styles`` certain naming conventions are employed: for
|
||||
For all those ``styles``, certain naming conventions are employed: for
|
||||
the fix nve command the class is called FixNVE and the source files are
|
||||
``fix_nve.h`` and ``fix_nve.cpp``. Similarly for fix ave/time we have
|
||||
``fix_nve.h`` and ``fix_nve.cpp``. Similarly, for fix ave/time we have
|
||||
FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``. Style names
|
||||
are lower case and without spaces or special characters. A suffix or
|
||||
words are appended with a forward slash '/' which denotes a variant of
|
||||
the corresponding class without the suffix. To connect the style name
|
||||
and the class name, LAMMPS uses macros like: ``AtomStyle()``,
|
||||
``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the
|
||||
corresponding header file. During configuration or compilation files
|
||||
corresponding header file. During configuration or compilation, files
|
||||
with the pattern ``style_<name>.h`` are created that consist of a list
|
||||
of include statements including all headers of all styles of a given
|
||||
type that are currently active (or "installed).
|
||||
type that are currently enabled (or "installed").
|
||||
|
||||
|
||||
More details on individual classes in the :ref:`class-topology` are as
|
||||
@ -172,8 +175,8 @@ follows:
|
||||
that one or multiple simulations can be run, on the processors
|
||||
allocated for a run, e.g. by the mpirun command.
|
||||
|
||||
- The Input class reads and processes input input strings and files,
|
||||
stores variables, and invokes :doc:`commands <Commands_all>`.
|
||||
- The Input class reads and processes input (strings and files), stores
|
||||
variables, and invokes :doc:`commands <Commands_all>`.
|
||||
|
||||
- Command style classes are derived from the Command class. They provide
|
||||
input script commands that perform one-time operations
|
||||
@ -192,7 +195,7 @@ follows:
|
||||
- The Atom class stores per-atom properties associated with atom styles.
|
||||
More precisely, they are allocated and managed by a class derived from
|
||||
the AtomVec class, and the Atom class simply stores pointers to them.
|
||||
The classes derived from AtomVec represent the different atom styles
|
||||
The classes derived from AtomVec represent the different atom styles,
|
||||
and they are instantiated through the :doc:`atom_style <atom_style>`
|
||||
command.
|
||||
|
||||
@ -207,17 +210,21 @@ follows:
|
||||
instance is created by pair, fix, or compute styles when they need a
|
||||
particular kind of neighbor list and use the NeighRequest properties
|
||||
to select the neighbor list settings for the given request. There can
|
||||
be multiple instances of the NeighRequest class and the Neighbor class
|
||||
will try to optimize how they are computed by creating copies or
|
||||
sub-lists where possible.
|
||||
be multiple instances of the NeighRequest class. The Neighbor class
|
||||
will try to optimize how the requests are processed. Depending on the
|
||||
NeighRequest properties, neighbor lists are constructed from scratch,
|
||||
aliased, or constructed by post-processing an existing list into
|
||||
sub-lists.
|
||||
|
||||
- The Comm class performs inter-processor communication, typically of
|
||||
ghost atom information. This usually involves MPI message exchanges
|
||||
with 6 neighboring processors in the 3d logical grid of processors
|
||||
mapped to the simulation box. There are two :doc:`communication styles
|
||||
<comm_style>` enabling different ways to do the domain decomposition.
|
||||
Sometimes the Irregular class is used, when atoms may migrate to
|
||||
arbitrary processors.
|
||||
<comm_style>`, enabling different ways to perform the domain
|
||||
decomposition.
|
||||
|
||||
- The Irregular class is used, when atoms may migrate to arbitrary
|
||||
processors.
|
||||
|
||||
- The Domain class stores the simulation box geometry, as well as
|
||||
geometric Regions and any user definition of a Lattice. The latter
|
||||
@ -246,7 +253,7 @@ follows:
|
||||
file, dump file snapshots, and restart files. These correspond to the
|
||||
:doc:`Thermo <thermo_style>`, :doc:`Dump <dump>`, and
|
||||
:doc:`WriteRestart <write_restart>` classes respectively. The Dump
|
||||
class is a base class with several derived classes implementing
|
||||
class is a base class, with several derived classes implementing
|
||||
various dump style variants.
|
||||
|
||||
- The Timer class logs timing information, output at the end
|
||||
|
||||
@ -1,12 +1,12 @@
|
||||
Communication
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
Following the partitioning scheme in use all per-atom data is
|
||||
Following the selected partitioning scheme, all per-atom data is
|
||||
distributed across the MPI processes, which allows LAMMPS to handle very
|
||||
large systems provided it uses a correspondingly large number of MPI
|
||||
processes. Since The per-atom data (atom IDs, positions, velocities,
|
||||
types, etc.) To be able to compute the short-range interactions MPI
|
||||
processes need not only access to data of atoms they "own" but also
|
||||
types, etc.) To be able to compute the short-range interactions, MPI
|
||||
processes need not only access to the data of atoms they "own" but also
|
||||
information about atoms from neighboring subdomains, in LAMMPS referred
|
||||
to as "ghost" atoms. These are copies of atoms storing required
|
||||
per-atom data for up to the communication cutoff distance. The green
|
||||
@ -59,7 +59,7 @@ and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
|
||||
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
|
||||
include ghost atoms they received in the *x* stage, but only if they
|
||||
are needed by rank 0 to fill its extended ghost atom regions in the
|
||||
+/-*y* directions (blue rectangles). Thus in this case, ranks 5 and
|
||||
+/-*y* directions (blue rectangles). Thus, in this case, ranks 5 and
|
||||
6 do not include ghost atoms they received from each other (in the *x*
|
||||
stage) in the atoms they send to rank 0. The key point is that while
|
||||
the pattern of communication is more complex in the irregular
|
||||
@ -78,14 +78,14 @@ A "reverse" communication is when computed ghost atom attributes are
|
||||
sent back to the processor who owns the atom. This is used (for
|
||||
example) to sum partial forces on ghost atoms to the complete force on
|
||||
owned atoms. The order of the two stages described in the
|
||||
:ref:`ghost-atom-comm` figure is inverted and the same lists of atoms
|
||||
:ref:`ghost-atom-comm` figure is inverted, and the same lists of atoms
|
||||
are used to pack and unpack message buffers with per-atom forces. When
|
||||
a received buffer is unpacked, the ghost forces are summed to owned atom
|
||||
forces. As in forward communication, forces on atoms in the four blue
|
||||
corners of the diagrams are sent, received, and summed twice (once at
|
||||
each stage) before owning processors have the full force.
|
||||
|
||||
These two operations are used many places within LAMMPS aside from
|
||||
These two operations are used in many places within LAMMPS aside from
|
||||
exchange of coordinates and forces, for example by manybody potentials
|
||||
to share intermediate per-atom values, or by rigid-body integrators to
|
||||
enable each atom in a body to access body properties. Here are
|
||||
@ -105,7 +105,7 @@ performed in LAMMPS:
|
||||
atom pairs when building neighbor lists or computing forces.
|
||||
|
||||
- The cutoff distance for exchanging ghost atoms is typically equal to
|
||||
the neighbor cutoff. But it can also chosen to be longer if needed,
|
||||
the neighbor cutoff. But it can also set to a larger value if needed,
|
||||
e.g. half the diameter of a rigid body composed of multiple atoms or
|
||||
over 3x the length of a stretched bond for dihedral interactions. It
|
||||
can also exceed the periodic box size. For the regular communication
|
||||
@ -113,7 +113,7 @@ performed in LAMMPS:
|
||||
processor's subdomain, then multiple exchanges are performed in the
|
||||
same direction. Each exchange is with the same neighbor processor,
|
||||
but buffers are packed/unpacked using a different list of atoms. For
|
||||
forward communication, in the first exchange a processor sends only
|
||||
forward communication, in the first exchange, a processor sends only
|
||||
owned atoms. In subsequent exchanges, it sends ghost atoms received
|
||||
in previous exchanges. For the irregular pattern (right) overlaps of
|
||||
a processor's extended ghost-atom subdomain with all other processors
|
||||
|
||||
@ -40,28 +40,28 @@ orthogonal boxes.
|
||||
|
||||
.. _fft-parallel:
|
||||
.. figure:: img/fft-decomp-parallel.png
|
||||
:align: center
|
||||
|
||||
parallel FFT in PPPM
|
||||
Parallel FFT in PPPM
|
||||
|
||||
Stages of a parallel FFT for a simulation domain overlaid
|
||||
with an 8x8x8 3d FFT grid, partitioned across 64 processors.
|
||||
Within each of the 4 diagrams, grid cells of the same color are
|
||||
owned by a single processor; for simplicity only cells owned by 4
|
||||
or 8 of the 64 processors are colored. The two images on the left
|
||||
Stages of a parallel FFT for a simulation domain overlaid with an
|
||||
8x8x8 3d FFT grid, partitioned across 64 processors. Within each
|
||||
of the 4 diagrams, grid cells of the same color are owned by a
|
||||
single processor; for simplicity, only cells owned by 4 or 8 of
|
||||
the 64 processors are colored. The two images on the left
|
||||
illustrate brick-to-pencil communication. The two images on the
|
||||
right illustrate pencil-to-pencil communication, which in this
|
||||
case transposes the *y* and *z* dimensions of the grid.
|
||||
|
||||
Parallel 3d FFTs require substantial communication relative to their
|
||||
computational cost. A 3d FFT is implemented by a series of 1d FFTs
|
||||
along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus the FFT
|
||||
grid cannot be decomposed like atoms into 3 dimensions for parallel
|
||||
along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus, the
|
||||
FFT grid cannot be decomposed like atoms into 3 dimensions for parallel
|
||||
processing of the FFTs but only in 1 (as planes) or 2 (as pencils)
|
||||
dimensions and in between the steps the grid needs to be transposed to
|
||||
have the FFT grid portion "owned" by each MPI process complete in the
|
||||
direction of the 1d FFTs it has to perform. LAMMPS uses the
|
||||
pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.
|
||||
pencil-decomposition algorithm as shown in the :ref:`fft-parallel`
|
||||
figure.
|
||||
|
||||
Initially (far left), each processor owns a brick of same-color grid
|
||||
cells (actually grid points) contained within in its subdomain. A
|
||||
@ -97,7 +97,7 @@ across all $P$ processors with a single call to ``MPI_Alltoall()``, but
|
||||
this is typically much slower. However, for the specialized brick and
|
||||
pencil tiling illustrated in :ref:`fft-parallel` figure, collective
|
||||
communication across the entire MPI communicator is not required. In
|
||||
the example an :math:`8^3` grid with 512 grid cells is partitioned
|
||||
the example, an :math:`8^3` grid with 512 grid cells is partitioned
|
||||
across 64 processors; each processor owns a 2x2x2 3d brick of grid
|
||||
cells. The initial brick-to-pencil communication (upper left to upper
|
||||
right) only requires collective communication within subgroups of 4
|
||||
@ -132,7 +132,7 @@ grid/particle operations that LAMMPS supports:
|
||||
- The fftMPI library allows each grid dimension to be a multiple of
|
||||
small prime factors (2,3,5), and allows any number of processors to
|
||||
perform the FFT. The resulting brick and pencil decompositions are
|
||||
thus not always as well-aligned but the size of subgroups of
|
||||
thus not always as well-aligned, but the size of subgroups of
|
||||
processors for the two modes of communication (brick/pencil and
|
||||
pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and
|
||||
:math:`O(P^{\frac{1}{2}})`.
|
||||
@ -143,21 +143,23 @@ grid/particle operations that LAMMPS supports:
|
||||
in memory. This reordering can be done during the packing or
|
||||
unpacking of buffers for MPI communication.
|
||||
|
||||
- For large systems and particularly a large number of MPI processes,
|
||||
the dominant cost for parallel FFTs is often the communication, not
|
||||
the computation of 1d FFTs, even though the latter scales as :math:`N
|
||||
\log(N)` in the number of grid points *N* per grid direction. This is
|
||||
due to the fact that only a 2d decomposition into pencils is possible
|
||||
while atom data (and their corresponding short-range force and energy
|
||||
computations) can be decomposed efficiently in 3d.
|
||||
- For large systems and particularly many MPI processes, the dominant
|
||||
cost for parallel FFTs is often the communication, not the computation
|
||||
of 1d FFTs, even though the latter scales as :math:`N \log(N)` in the
|
||||
number of grid points *N* per grid direction. This is due to the fact
|
||||
that only a 2d decomposition into pencils is possible while atom data
|
||||
(and their corresponding short-range force and energy computations)
|
||||
can be decomposed efficiently in 3d.
|
||||
|
||||
This can be addressed by reducing the number of MPI processes involved
|
||||
in the MPI communication by using :doc:`hybrid MPI + OpenMP
|
||||
parallelization <Speed_omp>`. This will use OpenMP parallelization
|
||||
inside the MPI domains and while that may have a lower parallel
|
||||
efficiency, it reduces the communication overhead.
|
||||
Reducing the number of MPI processes involved in the MPI communication
|
||||
will reduce this kind of overhead. By using a :doc:`hybrid MPI +
|
||||
OpenMP parallelization <Speed_omp>` it is still possible to use all
|
||||
processes for parallel computation. It will use OpenMP
|
||||
parallelization inside the MPI domains. While that may have a lower
|
||||
parallel efficiency for some part of the computation, that can be less
|
||||
than the communication overhead in the 3d FFTs.
|
||||
|
||||
As an alternative it is also possible to start a :ref:`multi-partition
|
||||
As an alternative, it is also possible to start a :ref:`multi-partition
|
||||
<partition>` calculation and then use the :doc:`verlet/split
|
||||
integrator <run_style>` to perform the PPPM computation on a
|
||||
dedicated, separate partition of MPI processes. This uses an integer
|
||||
@ -175,7 +177,7 @@ grid/particle operations that LAMMPS supports:
|
||||
manner consistent with processor subdomains, and provides methods for
|
||||
forward and reverse communication of owned and ghost grid point
|
||||
values. It is used for PPPM as an FFT grid (as outlined above) and
|
||||
also for the MSM algorithm which uses a cascade of grid sizes from
|
||||
also for the MSM algorithm, which uses a cascade of grid sizes from
|
||||
fine to coarse to compute long-range Coulombic forces. The GridComm
|
||||
class is also useful for models where continuum fields interact with
|
||||
particles. For example, the two-temperature model (TTM) defines heat
|
||||
|
||||
@ -3,12 +3,12 @@ Neighbor lists
|
||||
|
||||
To compute forces efficiently, each processor creates a Verlet-style
|
||||
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
|
||||
*j* = owned or ghost) with separation less than the applicable
|
||||
neighbor list cutoff distance. In LAMMPS the neighbor lists are stored
|
||||
in a multiple-page data structure; each page is a contiguous chunk of
|
||||
memory which stores vectors of neighbor atoms *j* for many *i* atoms.
|
||||
This allows pages to be incrementally allocated or deallocated in blocks
|
||||
as needed. Neighbor lists typically consume the most memory of any data
|
||||
*j* = owned or ghost) with separation less than the applicable neighbor
|
||||
list cutoff distance. In LAMMPS, the neighbor lists are stored in a
|
||||
multiple-page data structure; each page is a contiguous chunk of memory
|
||||
which stores vectors of neighbor atoms *j* for many *i* atoms. This
|
||||
allows pages to be incrementally allocated or deallocated in blocks as
|
||||
needed. Neighbor lists typically consume the most memory of any data
|
||||
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
|
||||
every few timesteps, then used repeatedly each step for force or other
|
||||
computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
||||
@ -16,7 +16,7 @@ computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
||||
the interatomic potential for computing short-range pairwise or manybody
|
||||
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
|
||||
be used for multiple steps assuming that atoms do not move very far
|
||||
between consecutive time steps. Typically the code triggers
|
||||
between consecutive time steps. Typically, the code triggers
|
||||
reneighboring when any atom has moved half the skin distance since the
|
||||
last reneighboring; this and other options of the neighbor list rebuild
|
||||
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
||||
@ -26,10 +26,10 @@ their owning processor's subdomain are first migrated to new processors
|
||||
via communication. Periodic boundary conditions are also (only)
|
||||
enforced on these steps to ensure each atom is re-assigned to the
|
||||
correct processor. After migration, the atoms owned by each processor
|
||||
are stored in a contiguous vector. Periodically each processor
|
||||
are stored in a contiguous vector. Periodically, each processor
|
||||
spatially sorts owned atoms within its vector to reorder it for improved
|
||||
cache efficiency in force computations and neighbor list building. For
|
||||
that atoms are spatially binned and then reordered so that atoms in the
|
||||
that, atoms are spatially binned and then reordered so that atoms in the
|
||||
same bin are adjacent in the vector. Atom sorting can be disabled or
|
||||
its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
||||
|
||||
@ -44,7 +44,7 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
||||
(left) and triclinic (right) domains. A regular grid of neighbor
|
||||
bins (thin lines) overlays the entire simulation domain and need not
|
||||
align with subdomain boundaries; only the portion overlapping the
|
||||
augmented subdomain is shown. In the triclinic case it overlaps the
|
||||
augmented subdomain is shown. In the triclinic case, it overlaps the
|
||||
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
||||
represent a stencil of bins searched to find neighbors of a particular
|
||||
atom (black dot).
|
||||
@ -53,12 +53,12 @@ To build a local neighbor list in linear time, the simulation domain is
|
||||
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
||||
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
||||
single MPI processor's subdomain. Each processor stores a set of
|
||||
neighbor bins which overlap its subdomain extended by the neighbor
|
||||
neighbor bins which overlap its subdomain, extended by the neighbor
|
||||
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
||||
with processor boundaries; an integer number in each dimension is fit to
|
||||
the size of the entire simulation box.
|
||||
|
||||
Most often LAMMPS builds what it calls a "half" neighbor list where
|
||||
Most often, LAMMPS builds what is called a "half" neighbor list where
|
||||
each *i,j* neighbor pair is stored only once, with either atom *i* or
|
||||
*j* as the central atom. The build can be done efficiently by using a
|
||||
pre-computed "stencil" of bins around a central origin bin which
|
||||
@ -67,18 +67,18 @@ is simply a list of integer offsets in *x,y,z* of nearby bins
|
||||
surrounding the origin bin which are close enough to contain any
|
||||
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
|
||||
origin bin. Note that for a half neighbor list, the stencil can be
|
||||
asymmetric since each atom only need store half its nearby neighbors.
|
||||
asymmetric, since each atom only need store half its nearby neighbors.
|
||||
|
||||
These stencils are illustrated in the figure for a half list and a bin
|
||||
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
|
||||
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
|
||||
63, 13 in the plane of bins that contain the origin bin and 25 in each
|
||||
of the two planes above it in the *z* direction (75 for triclinic). The
|
||||
reason the triclinic stencil has extra bins is because the bins tile the
|
||||
bounding box of the entire triclinic domain and thus are not periodic
|
||||
with respect to the simulation box itself. The stencil and logic for
|
||||
determining which *i,j* pairs to include in the neighbor list are
|
||||
altered slightly to account for this.
|
||||
triclinic stencil has extra bins because the bins tile the bounding box
|
||||
of the entire triclinic domain, and thus are not periodic with respect
|
||||
to the simulation box itself. The stencil and logic for determining
|
||||
which *i,j* pairs to include in the neighbor list are altered slightly
|
||||
to account for this.
|
||||
|
||||
To build a neighbor list, a processor first loops over its "owned" plus
|
||||
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
|
||||
@ -95,7 +95,7 @@ supports:
|
||||
been found to be optimal for many typical cases. Smaller bins incur
|
||||
additional overhead to loop over; larger bins require more distance
|
||||
calculations. Note that for smaller bin sizes, the 2d stencil in the
|
||||
figure would be more semi-circular in shape (hemispherical in 3d),
|
||||
figure would be of a more semicircular shape (hemispherical in 3d),
|
||||
with bins near the corners of the square eliminated due to their
|
||||
distance from the origin bin.
|
||||
|
||||
@ -111,8 +111,8 @@ supports:
|
||||
symmetric stencil. It also includes lists with partial enumeration of
|
||||
ghost atom neighbors. The full and ghost-atom lists are used by
|
||||
various manybody interatomic potentials. Lists may also use different
|
||||
criteria for inclusion of a pair interaction. Typically this simply
|
||||
depends only on the distance between two atoms and the cutoff
|
||||
criteria for inclusion of a pairwise interaction. Typically, this
|
||||
simply depends only on the distance between two atoms and the cutoff
|
||||
distance. But for finite-size coarse-grained particles with
|
||||
individual diameters (e.g. polydisperse granular particles), it can
|
||||
also depend on the diameters of the two particles.
|
||||
@ -121,11 +121,11 @@ supports:
|
||||
of the master neighbor list for the full system need to be generated,
|
||||
one for each sub-style, which contains only the *i,j* pairs needed to
|
||||
compute interactions between subsets of atoms for the corresponding
|
||||
potential. This means not all *i* or *j* atoms owned by a processor
|
||||
potential. This means, not all *i* or *j* atoms owned by a processor
|
||||
are included in a particular sub-list.
|
||||
|
||||
- Some models use different cutoff lengths for pairwise interactions
|
||||
between different kinds of particles which are stored in a single
|
||||
between different kinds of particles, which are stored in a single
|
||||
neighbor list. One example is a solvated colloidal system with large
|
||||
colloidal particles where colloid/colloid, colloid/solvent, and
|
||||
solvent/solvent interaction cutoffs can be dramatically different.
|
||||
@ -153,7 +153,7 @@ supports:
|
||||
For the newton pair *on* setting the atom *j* is only added to the
|
||||
list if its *z* coordinate is larger, or if equal the *y* coordinate
|
||||
is larger, and that is equal, too, the *x* coordinate is larger. For
|
||||
homogeneously dense systems that will result in picking neighbors from
|
||||
homogeneously dense systems, that will result in picking neighbors from
|
||||
a same size sector in always the same direction relative to the
|
||||
"owned" atom and thus it should lead to similar length neighbor lists
|
||||
and thus reduce the chance of a load imbalance.
|
||||
"owned" atom, and thus it should lead to similar length neighbor lists
|
||||
and reduce the chance of a load imbalance.
|
||||
|
||||
@ -6,7 +6,7 @@ thread parallelism to predominantly distribute loops over local data
|
||||
and thus follow an orthogonal parallelization strategy to the
|
||||
decomposition into spatial domains used by the :doc:`MPI partitioning
|
||||
<Developer_par_part>`. For clarity, this section discusses only the
|
||||
implementation in the OPENMP package as it is the simplest. The INTEL
|
||||
implementation in the OPENMP package, as it is the simplest. The INTEL
|
||||
and KOKKOS package offer additional options and are more complex since
|
||||
they support more features and different hardware like co-processors
|
||||
or GPUs.
|
||||
@ -14,7 +14,7 @@ or GPUs.
|
||||
One of the key decisions when implementing the OPENMP package was to
|
||||
keep the changes to the source code small, so that it would be easier to
|
||||
maintain the code and keep it in sync with the non-threaded standard
|
||||
implementation. this is achieved by a) making the OPENMP version a
|
||||
implementation. This is achieved by a) making the OPENMP version a
|
||||
derived class from the regular version (e.g. ``PairLJCutOMP`` from
|
||||
``PairLJCut``) and overriding only methods that are multi-threaded or
|
||||
need to be modified to support multi-threading (similar to what was done
|
||||
@ -26,13 +26,13 @@ into three separate classes ``ThrOMP``, ``ThrData``, and ``FixOMP``.
|
||||
available in the corresponding base class (e.g. ``Pair`` for
|
||||
``PairLJCutOMP``) like multi-thread aware variants of the "tally"
|
||||
functions. Those functions are made available through multiple
|
||||
inheritance so those new functions have to have unique names to avoid
|
||||
inheritance, so those new functions have to have unique names to avoid
|
||||
ambiguities; typically ``_thr`` is appended to the name of the function.
|
||||
``ThrData`` is a classes that manages per-thread data structures.
|
||||
It is used instead of extending the corresponding storage to per-thread
|
||||
arrays to avoid slowdowns due to "false sharing" when multiple threads
|
||||
update adjacent elements in an array and thus force the CPU cache lines
|
||||
to be reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
|
||||
``ThrData`` is a class that manages per-thread data structures. It is
|
||||
used instead of extending the corresponding storage to per-thread arrays
|
||||
to avoid slowdowns due to "false sharing" when multiple threads update
|
||||
adjacent elements in an array and thus force the CPU cache lines to be
|
||||
reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
|
||||
state" like settings and access to per-thread storage, it is activated
|
||||
by the :doc:`package omp <package>` command.
|
||||
|
||||
@ -46,24 +46,24 @@ involve multiple atoms and thus there are race conditions when multiple
|
||||
threads want to update per-atom data of the same atoms. Five possible
|
||||
strategies have been considered to avoid this:
|
||||
|
||||
1) restructure the code so that there is no overlapping access possible
|
||||
1. Restructure the code so that there is no overlapping access possible
|
||||
when computing in parallel, e.g. by breaking lists into multiple
|
||||
parts and synchronizing threads in between.
|
||||
2) have each thread be "responsible" for a specific group of atoms and
|
||||
2. Have each thread be "responsible" for a specific group of atoms and
|
||||
compute these interactions multiple times, once on each thread that
|
||||
is responsible for a given atom and then have each thread only update
|
||||
is responsible for a given atom, and then have each thread only update
|
||||
the properties of this atom.
|
||||
3) use mutexes around functions and regions of code where the data race
|
||||
could happen
|
||||
4) use atomic operations when updating per-atom properties
|
||||
5) use replicated per-thread data structures to accumulate data without
|
||||
3. Use mutexes around functions and regions of code where the data race
|
||||
could happen.
|
||||
4. Use atomic operations when updating per-atom properties.
|
||||
5. Use replicated per-thread data structures to accumulate data without
|
||||
conflicts and then use a reduction to combine those results into the
|
||||
data structures used by the regular style.
|
||||
|
||||
Option 5 was chosen for the OPENMP package because it would retain the
|
||||
performance for the case of 1 thread and the code would be more
|
||||
performance for the case of a single thread and the code would be more
|
||||
maintainable. Option 1 would require extensive code changes,
|
||||
particularly to the neighbor list code; options 2 would have incurred a
|
||||
particularly to the neighbor list code; option 2 would have incurred a
|
||||
2x or more performance penalty for the serial case; option 3 causes
|
||||
significant overhead and would enforce serialization of operations in
|
||||
inner loops and thus defeat the purpose of multi-threading; option 4
|
||||
@ -80,7 +80,7 @@ equivalent to the number of CPU cores per CPU socket on high-end
|
||||
supercomputers.
|
||||
|
||||
Thus arrays like the force array are dimensioned to the number of atoms
|
||||
times the number of threads when enabling OpenMP support and inside the
|
||||
times the number of threads when enabling OpenMP support, and inside the
|
||||
compute functions a pointer to a different chunk is obtained by each thread.
|
||||
Similarly, accumulators like potential energy or virial are kept in
|
||||
per-thread instances of the ``ThrData`` class and then only reduced and
|
||||
@ -91,7 +91,7 @@ Loop scheduling
|
||||
"""""""""""""""
|
||||
|
||||
Multi-thread parallelization is applied by distributing (outer) loops
|
||||
statically across threads. Typically this would be the loop over local
|
||||
statically across threads. Typically, this would be the loop over local
|
||||
atoms *i* when processing *i,j* pairs of atoms from a neighbor list.
|
||||
The design of the neighbor list code results in atoms having a similar
|
||||
number of neighbors for homogeneous systems and thus load imbalances
|
||||
|
||||
@ -7,36 +7,36 @@ distributed-memory parallelism is set with the :doc:`comm_style command
|
||||
|
||||
.. _domain-decomposition:
|
||||
.. figure:: img/domain-decomp.png
|
||||
:align: center
|
||||
|
||||
domain decomposition
|
||||
Domain decomposition schemes
|
||||
|
||||
This figure shows the different kinds of domain decomposition used
|
||||
for MPI parallelization: "brick" on the left with an orthogonal
|
||||
(left) and a triclinic (middle) simulation domain, and a "tiled"
|
||||
decomposition (right). The black lines show the division into
|
||||
subdomains and the contained atoms are "owned" by the corresponding
|
||||
MPI process. The green dashed lines indicate how subdomains are
|
||||
extended with "ghost" atoms up to the communication cutoff distance.
|
||||
subdomains, and the contained atoms are "owned" by the
|
||||
corresponding MPI process. The green dashed lines indicate how
|
||||
subdomains are extended with "ghost" atoms up to the communication
|
||||
cutoff distance.
|
||||
|
||||
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
|
||||
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
|
||||
figure for the 2d case. Orthogonal means the box edges are aligned with
|
||||
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
|
||||
rectangular. Triclinic allows for a more general parallelepiped shape
|
||||
in which edges are aligned with three arbitrary vectors and the box
|
||||
faces are parallelograms. In each dimension box faces can be periodic,
|
||||
or non-periodic with fixed or shrink-wrapped boundaries. In the fixed
|
||||
case, atoms which move outside the face are deleted; shrink-wrapped
|
||||
means the position of the box face adjusts continuously to enclose all
|
||||
the atoms.
|
||||
The LAMMPS simulation box is a 3d or 2d volume, which can be of
|
||||
orthogonal or triclinic shape, as illustrated in the
|
||||
:ref:`domain-decomposition` figure for the 2d case. Orthogonal means
|
||||
the box edges are aligned with the *x*, *y*, *z* Cartesian axes, and the
|
||||
box faces are thus all rectangular. Triclinic allows for a more general
|
||||
parallelepiped shape in which edges are aligned with three arbitrary
|
||||
vectors and the box faces are parallelograms. In each dimension, box
|
||||
faces can be periodic, or non-periodic with fixed or shrink-wrapped
|
||||
boundaries. In the fixed case, atoms which move outside the face are
|
||||
deleted; shrink-wrapped means the position of the box face adjusts
|
||||
continuously to enclose all the atoms.
|
||||
|
||||
For distributed-memory MPI parallelism, the simulation box is spatially
|
||||
decomposed (partitioned) into non-overlapping subdomains which fill the
|
||||
box. The default partitioning, "brick", is most suitable when atom
|
||||
density is roughly uniform, as shown in the left-side images of the
|
||||
:ref:`domain-decomposition` figure. The subdomains comprise a regular
|
||||
grid and all subdomains are identical in size and shape. Both the
|
||||
grid, and all subdomains are identical in size and shape. Both the
|
||||
orthogonal and triclinic boxes can deform continuously during a
|
||||
simulation, e.g. to compress a solid or shear a liquid, in which case
|
||||
the processor subdomains likewise deform.
|
||||
@ -76,14 +76,14 @@ the load imbalance:
|
||||
|
||||
The pictures above demonstrate different decompositions for a 2d system
|
||||
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
||||
subdomain with green being optimal and red the least optimal.
|
||||
subdomain, with green being optimal and red the least optimal.
|
||||
|
||||
Due to the vacuum in the system, the default decomposition is unbalanced
|
||||
with several MPI ranks without atoms (left). By forcing a 1x12x1
|
||||
processor grid, every MPI rank does computations now, but number of
|
||||
atoms per subdomain is still uneven and the thin slice shape increases
|
||||
the amount of communication between subdomains (center left). With a
|
||||
2x6x1 processor grid and shifting the subdomain divisions, the load
|
||||
imbalance is further reduced and the amount of communication required
|
||||
between subdomains is less (center right). And using the recursive
|
||||
bisectioning leads to further improved decomposition (right).
|
||||
Due to the vacuum in the system, the default decomposition is
|
||||
unbalanced, with several MPI ranks without atoms (left). By forcing a
|
||||
1x12x1 processor grid, every MPI rank does computations now, but the
|
||||
number of atoms per subdomain is still uneven, and the thin slice shape
|
||||
increases the amount of communication between subdomains (center
|
||||
left). With a 2x6x1 processor grid and shifting the subdomain divisions,
|
||||
the load imbalance is further reduced and the amount of communication
|
||||
required between subdomains is less (center right). And using the
|
||||
recursive bisectioning leads to further improved decomposition (right).
|
||||
|
||||
@ -136,7 +136,7 @@ The LAMMPS Python module enables calling the LAMMPS C library API from
|
||||
Python by dynamically loading functions in the LAMMPS shared library through
|
||||
the `Python ctypes module <https://docs.python.org/3/library/ctypes.html>`_.
|
||||
Because of the dynamic loading, it is **required** that LAMMPS is compiled
|
||||
in :ref:`"shared" mode <exe>`. The Python interface is object oriented, but
|
||||
in :ref:`"shared" mode <exe>`. The Python interface is object-oriented, but
|
||||
otherwise tries to be very similar to the C library API. Three different
|
||||
Python classes to run LAMMPS are available and they build on each other.
|
||||
More information on this is in the :doc:`Python_head`
|
||||
@ -152,7 +152,7 @@ LAMMPS Fortran API
|
||||
|
||||
The LAMMPS Fortran module is a wrapper around calling functions from the
|
||||
LAMMPS C library API. This is done using the ISO_C_BINDING feature in
|
||||
Fortran 2003. The interface is object oriented but otherwise tries to
|
||||
Fortran 2003. The interface is object-oriented but otherwise tries to
|
||||
be very similar to the C library API and the basic Python module.
|
||||
|
||||
.. toctree::
|
||||
|
||||
@ -1071,7 +1071,7 @@ getting started, but not as a fully tested and supported feature of the
|
||||
LAMMPS distribution. Any contributions to complete this are, of course,
|
||||
welcome. Please also note, that for the case of creating a Python wrapper,
|
||||
a fully supported :doc:`Ctypes based lammps module <Python_module>`
|
||||
already exists. That module is designed to be object oriented while
|
||||
already exists. That module is designed to be object-oriented while
|
||||
SWIG will generate a 1:1 translation of the functions in the interface file.
|
||||
|
||||
Building the wrapper
|
||||
|
||||
@ -730,15 +730,15 @@ is because there can only be one fix which monitors the global
|
||||
pressure and changes the simulation box dimensions. So you have 3
|
||||
choices:
|
||||
|
||||
* Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
|
||||
#. Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
|
||||
*dilate* all option so that it will dilate the positions of the
|
||||
non-rigid particles as well. Use :doc:`fix nvt <fix_nh>` (or any
|
||||
other thermostat) for the non-rigid particles.
|
||||
* Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
|
||||
#. Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
|
||||
the *dilate* all option so that it will dilate the center-of-mass
|
||||
positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT
|
||||
rigid styles for the rigid bodies.
|
||||
* Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
|
||||
#. Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
|
||||
pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT
|
||||
rigid styles for the rigid bodies. Use :doc:`fix nvt <fix_nh>` (or
|
||||
any other thermostat) for the non-rigid particles.
|
||||
|
||||
@ -166,7 +166,7 @@ events, all the other replicas also run dynamics and event checking
|
||||
with the same schedule, but the final states are always overwritten by
|
||||
the state of the event replica.
|
||||
|
||||
The outer loop of the pseudo-code above continues until *N* steps of
|
||||
The outer loop of the pseudocode above continues until *N* steps of
|
||||
dynamics have been performed. Note that *N* only includes the
|
||||
dynamics of stages 2 and 3, not the steps taken during dephasing or
|
||||
the minimization iterations of quenching. The specified *N* is
|
||||
|
||||
@ -94,7 +94,7 @@ class Pointers {
|
||||
python(ptr->python) {}
|
||||
virtual ~Pointers() = default;
|
||||
|
||||
// remove default members execept for the copy constructor
|
||||
// remove other default members
|
||||
|
||||
Pointers() = delete;
|
||||
Pointers(const Pointers &) = default;
|
||||
|
||||
Reference in New Issue
Block a user