starting grammar, punctuation, and spelling review for developer info sections
@@ -1,52 +1,53 @@
|
|||||||
Code design
|
Code design
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
This section explains some of the code design choices in LAMMPS with
|
This section explains some code design choices in LAMMPS with the goal
|
||||||
the goal of helping developers write new code similar to the existing
|
of helping developers write new code similar to the existing code.
|
||||||
code. Please see the section on :doc:`Requirements for contributed
|
Please see the section on :doc:`Requirements for contributed code
|
||||||
code <Modify_style>` for more specific recommendations and guidelines.
|
<Modify_style>` for more specific recommendations and guidelines. While
|
||||||
While that section is organized more in the form of a checklist for
|
that section is organized more in the form of a checklist for code
|
||||||
code contributors, the focus here is on overall code design strategy,
|
contributors, the focus here is on overall code design strategy, choices
|
||||||
choices made between possible alternatives, and discussing some
|
made between possible alternatives, and discussing some relevant C++
|
||||||
relevant C++ programming language constructs.
|
programming language constructs.
|
||||||
|
|
||||||
Historically, the basic design philosophy of the LAMMPS C++ code was a
|
Historically, the basic design philosophy of the LAMMPS C++ code was a
|
||||||
"C with classes" style. The motivation was to make it easy to modify
|
"C with classes" style. The motivation was to make it easy to modify
|
||||||
LAMMPS for people without significant training in C++ programming.
|
LAMMPS for people without significant training in C++ programming. Data
|
||||||
Data structures and code constructs were used that resemble the
|
structures and code constructs were used that resemble the previous
|
||||||
previous implementation(s) in Fortran. A contributing factor to this
|
implementation(s) in Fortran. A contributing factor to this choice was
|
||||||
choice also was that at the time, C++ compilers were often not mature
|
that at the time, C++ compilers were often not mature and some advanced
|
||||||
and some of the advanced features contained bugs or did not function
|
features contained bugs or did not function as the standard required.
|
||||||
as the standard required. There were also disagreements between
|
There were also disagreements between compiler vendors as to how to
|
||||||
compiler vendors as to how to interpret the C++ standard documents.
|
interpret the C++ standard documents.
|
||||||
|
|
||||||
However, C++ compilers have now advanced significantly. In 2020 we
|
However, C++ compilers and the C++ programming language have advanced
|
||||||
decided to to require the C++11 standard as the minimum C++ language
|
significantly. In 2020, the LAMMPS developers decided to require the
|
||||||
standard for LAMMPS. Since then we have begun to also replace some of
|
C++11 standard as the minimum C++ language standard for LAMMPS. Since
|
||||||
the C-style constructs with equivalent C++ functionality, either from
|
then, we have begun to replace C-style constructs with equivalent C++
|
||||||
the C++ standard library or as custom classes or functions, in order
|
functionality. This was taken either from the C++ standard library or
|
||||||
to improve readability of the code and to increase code reuse through
|
implemented as custom classes or functions. The goal is to improve
|
||||||
abstraction of commonly used functionality.
|
readability of the code and to increase code reuse through abstraction
|
||||||
|
of commonly used functionality.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
Please note that as of spring 2022 there is still a sizable chunk
|
Please note that as of spring 2023 there is still a sizable chunk of
|
||||||
of legacy code in LAMMPS that has not yet been refactored to
|
legacy code in LAMMPS that has not yet been refactored to reflect
|
||||||
reflect these style conventions in full. LAMMPS has a large code
|
these style conventions in full. LAMMPS has a large code base and
|
||||||
base and many different contributors and there also is a hierarchy
|
many contributors. There is also a hierarchy of precedence in which
|
||||||
of precedence in which the code is adapted. Highest priority has
|
the code is adapted. Highest priority has been the code in the
|
||||||
been the code in the ``src`` folder, followed by code in packages
|
``src`` folder, followed by code in packages in order of their
|
||||||
in order of their popularity and complexity (simpler code is
|
popularity and complexity (simpler code gets adapted sooner), followed
|
||||||
adapted sooner), followed by code in the ``lib`` folder. Source
|
by code in the ``lib`` folder. Source code that is downloaded from
|
||||||
code that is downloaded from external packages or libraries during
|
external packages or libraries during compilation is not subject to
|
||||||
compilation is not subject to the conventions discussed here.
|
the conventions discussed here.
|
||||||
|
|
||||||
Object oriented code
|
Object-oriented code
|
||||||
^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
LAMMPS is designed to be an object oriented code. Each simulation is
|
LAMMPS is designed to be an object-oriented code. Each simulation is
|
||||||
represented by an instance of the LAMMPS class. When running in
|
represented by an instance of the LAMMPS class. When running in
|
||||||
parallel each MPI process creates such an instance. This can be seen
|
parallel, each MPI process creates such an instance. This can be seen
|
||||||
in the ``main.cpp`` file where the core steps of running a LAMMPS
|
in the ``main.cpp`` file where the core steps of running a LAMMPS
|
||||||
simulation are the following 3 lines of code:
|
simulation are the following 3 lines of code:
|
||||||
|
|
||||||
@@ -67,14 +68,14 @@ other special features.
|
|||||||
The basic LAMMPS class hierarchy which is created by the LAMMPS class
|
The basic LAMMPS class hierarchy which is created by the LAMMPS class
|
||||||
constructor is shown in :ref:`class-topology`. When input commands
|
constructor is shown in :ref:`class-topology`. When input commands
|
||||||
are processed, additional class instances are created, or deleted, or
|
are processed, additional class instances are created, or deleted, or
|
||||||
replaced. Likewise specific member functions of specific classes are
|
replaced. Likewise, specific member functions of specific classes are
|
||||||
called to trigger actions such creating atoms, computing forces,
|
called to trigger actions such creating atoms, computing forces,
|
||||||
computing properties, time-propagating the system, or writing output.
|
computing properties, time-propagating the system, or writing output.
|
||||||
|
|
||||||
Compositing and Inheritance
|
Compositing and Inheritance
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
LAMMPS makes extensive use of the object oriented programming (OOP)
|
LAMMPS makes extensive use of the object-oriented programming (OOP)
|
||||||
principles of *compositing* and *inheritance*. Classes like the
|
principles of *compositing* and *inheritance*. Classes like the
|
||||||
``LAMMPS`` class are a **composite** containing pointers to instances
|
``LAMMPS`` class are a **composite** containing pointers to instances
|
||||||
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
|
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
|
||||||
@@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the
|
|||||||
simulation and providing member functions that trigger certain
|
simulation and providing member functions that trigger certain
|
||||||
actions. Some of those classes like ``Force`` are themselves
|
actions. Some of those classes like ``Force`` are themselves
|
||||||
composites, containing instances of classes describing different force
|
composites, containing instances of classes describing different force
|
||||||
interactions. Similarly the ``Modify`` class contains a list of
|
interactions. Similarly, the ``Modify`` class contains a list of
|
||||||
``Fix`` and ``Compute`` classes. If the input commands that
|
``Fix`` and ``Compute`` classes. If the input commands that
|
||||||
correspond to these classes include the word *style*, then LAMMPS
|
correspond to these classes include the word *style*, then LAMMPS
|
||||||
stores only a single instance of that class. E.g. *atom_style*,
|
stores only a single instance of that class. E.g. *atom_style*,
|
||||||
@@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived
|
|||||||
classes are often referred to as "styles", e.g. pair styles, fix
|
classes are often referred to as "styles", e.g. pair styles, fix
|
||||||
styles, atom styles and so on.
|
styles, atom styles and so on.
|
||||||
|
|
||||||
This is the origin of the flexibility of LAMMPS. For example pair
|
This is the origin of the flexibility of LAMMPS. For example, pair
|
||||||
styles implement a variety of different non-bonded interatomic
|
styles implement a variety of different non-bonded interatomic
|
||||||
potentials functions. All details for the implementation of a
|
potentials functions. All details for the implementation of a
|
||||||
potential are stored and executed in a single class.
|
potential are stored and executed in a single class.
|
||||||
|
|
||||||
As mentioned above, there can be multiple instances of classes derived
|
As mentioned above, there can be multiple instances of classes derived
|
||||||
from the ``Fix`` or ``Compute`` base classes. They represent a
|
from the ``Fix`` or ``Compute`` base classes. They represent a
|
||||||
different facet of LAMMPS flexibility as they provide methods which
|
different facet of LAMMPS' flexibility, as they provide methods which
|
||||||
can be called at different points in time within a timestep, as
|
can be called at different points within a timestep, as explained in
|
||||||
explained in `Developer_flow`. This allows the input script to tailor
|
`Developer_flow`. This allows the input script to tailor how a specific
|
||||||
how a specific simulation is run, what diagnostic computations are
|
simulation is run, what diagnostic computations are performed, and how
|
||||||
performed, and how the output of those computations is further
|
the output of those computations is further processed or output.
|
||||||
processed or output.
|
|
||||||
|
|
||||||
Additional code sharing is possible by creating derived classes from the
|
Additional code sharing is possible by creating derived classes from the
|
||||||
derived classes (e.g., to implement an accelerated version of a pair
|
derived classes (e.g., to implement an accelerated version of a pair
|
||||||
@@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member
|
|||||||
functions is which of the two member functions is called when executing
|
functions is which of the two member functions is called when executing
|
||||||
`base1->call()` versus `base2->call()`. Without polymorphism, a
|
`base1->call()` versus `base2->call()`. Without polymorphism, a
|
||||||
function within the base class can only call member functions within the
|
function within the base class can only call member functions within the
|
||||||
same scope, that is ``Base::call()`` will always call
|
same scope: that is, ``Base::call()`` will always call
|
||||||
``Base::normal()``. But for the `base2->call()` case the call of the
|
``Base::normal()``. But for the `base2->call()` case, the call of the
|
||||||
virtual member function will be dispatched to ``Derived::poly()``
|
virtual member function will be dispatched to ``Derived::poly()``
|
||||||
instead. This mechanism means that functions are called within the
|
instead. This mechanism results in calling functions that are within
|
||||||
scope of the class type that was used to *create* the class instance are
|
the scope of the class that was used to *create* the instance, even if
|
||||||
invoked; even if they are assigned to a pointer using the type of a base
|
they are assigned to a pointer for their base class. This is the
|
||||||
class. This is the desired behavior and this way LAMMPS can even use
|
desired behavior, and this way LAMMPS can even use styles that are loaded
|
||||||
styles that are loaded at runtime from a shared object file with the
|
at runtime from a shared object file with the :doc:`plugin command
|
||||||
:doc:`plugin command <plugin>`.
|
<plugin>`.
|
||||||
|
|
||||||
A special case of virtual functions are so-called pure functions. These
|
A special case of virtual functions are so-called pure functions. These
|
||||||
are virtual functions that are initialized to 0 in the class declaration
|
are virtual functions that are initialized to 0 in the class declaration
|
||||||
@@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be
|
|||||||
created and that derived classes **must** implement these functions.
|
created and that derived classes **must** implement these functions.
|
||||||
Many of the functions listed with the various class styles in the
|
Many of the functions listed with the various class styles in the
|
||||||
section :doc:`Modify` are pure functions. The motivation for this is
|
section :doc:`Modify` are pure functions. The motivation for this is
|
||||||
to define the interface or API of the functions but defer their
|
to define the interface or API of the functions, but defer their
|
||||||
implementation to the derived classes.
|
implementation to the derived classes.
|
||||||
|
|
||||||
However, there are downsides to this. For example, calls to virtual
|
However, there are downsides to this. For example, calls to virtual
|
||||||
functions from within a constructor, will not be in the scope of the
|
functions from within a constructor, will *not* be in the scope of the
|
||||||
derived class and thus it is good practice to either avoid calling them
|
derived class, and thus it is good practice to either avoid calling them
|
||||||
or to provide an explicit scope such as ``Base::poly()`` or
|
or to provide an explicit scope such as ``Base::poly()`` or
|
||||||
``Derived::poly()``. Furthermore, any destructors in classes containing
|
``Derived::poly()``. Furthermore, any destructors in classes containing
|
||||||
virtual functions should be declared virtual too, so they will be
|
virtual functions should be declared virtual too, so they will be
|
||||||
@@ -208,8 +208,8 @@ dispatch.
|
|||||||
that are intended to replace a virtual or pure function use the
|
that are intended to replace a virtual or pure function use the
|
||||||
``override`` property keyword. For the same reason, the use of
|
``override`` property keyword. For the same reason, the use of
|
||||||
overloads or default arguments for virtual functions should be
|
overloads or default arguments for virtual functions should be
|
||||||
avoided as they lead to confusion over which function is supposed to
|
avoided, as they lead to confusion over which function is supposed to
|
||||||
override which and which arguments need to be declared.
|
override which, and which arguments need to be declared.
|
||||||
|
|
||||||
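The points above about virtual dispatch, constructors, destructors, and the ``override`` keyword can be illustrated with a minimal, self-contained sketch. The ``Base`` and ``Derived`` classes and the ``created_as`` member are hypothetical names chosen for this illustration; they are not taken from the LAMMPS sources.

```cpp
#include <cassert>
#include <string>

class Base {
 public:
  // Inside a constructor the call cannot reach Derived::poly(),
  // so the scope is spelled out explicitly to make that obvious.
  Base() { created_as = Base::poly(); }
  // Virtual destructor so that deleting through a Base pointer is safe.
  virtual ~Base() = default;
  virtual std::string poly() const { return "Base"; }
  // Outside of constructors, this call is dispatched at runtime.
  std::string call() const { return poly(); }
  std::string created_as;
};

class Derived : public Base {
 public:
  // 'override' makes the compiler check that a virtual function is replaced.
  std::string poly() const override { return "Derived"; }
};

// Small usage demo: the object is created as Derived but held as Base.
void demo()
{
  Base *obj = new Derived;
  assert(obj->created_as == "Base");   // set during construction of Base
  assert(obj->call() == "Derived");    // dispatched to Derived::poly()
  delete obj;                          // runs ~Derived() and then ~Base()
}
```

If the signature of ``Derived::poly()`` did not match the one in ``Base`` (for example, a missing ``const``), the ``override`` keyword would turn the silent mismatch into a compile-time error.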
Style Factories
|
Style Factories
|
||||||
===============
|
===============
|
||||||
@@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that
|
|||||||
create an instance of a specific derived class, say ``PairLJCut`` and
|
create an instance of a specific derived class, say ``PairLJCut`` and
|
||||||
return a pointer to the type of the common base class of that style,
|
return a pointer to the type of the common base class of that style,
|
||||||
``Pair`` in this case. To associate the factory function with the
|
``Pair`` in this case. To associate the factory function with the
|
||||||
style keyword, an ``std::map`` class is used with function pointers
|
style keyword, a ``std::map`` class is used with function pointers
|
||||||
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
|
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
|
||||||
"morse" for ``PairMorse``). A couple of typedefs help keep the code
|
"morse" for ``PairMorse``). A couple of typedefs help keep the code
|
||||||
readable and a template function is used to implement the actual
|
readable, and a template function is used to implement the actual
|
||||||
factory functions for the individual classes. Below is an example
|
factory functions for the individual classes. Below is an example
|
||||||
of such a factory function from the ``Force`` class as declared in
|
of such a factory function from the ``Force`` class as declared in
|
||||||
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
|
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
|
||||||
@@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams".
|
|||||||
This is mainly motivated by better performance, better control over
|
This is mainly motivated by better performance, better control over
|
||||||
formatting, and less effort to achieve specific formatting.
|
formatting, and less effort to achieve specific formatting.
|
||||||
|
|
||||||
Since mixing "stdio" and "iostreams" can lead to unexpected
|
Since mixing "stdio" and "iostreams" can lead to unexpected behavior,
|
||||||
behavior. use of the latter is strongly discouraged. Also output to
|
use of the latter is strongly discouraged. Output to the screen should
|
||||||
the screen should not use the predefined ``stdout`` FILE pointer, but
|
*not* use the predefined ``stdout`` FILE pointer, but rather the
|
||||||
rather the ``screen`` and ``logfile`` FILE pointers managed by the
|
``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class.
|
||||||
LAMMPS class. Furthermore, output should generally only be done by
|
Furthermore, output should generally only be done by MPI rank 0
|
||||||
MPI rank 0 (``comm->me == 0``). Output that is sent to both
|
(``comm->me == 0``). Output that is sent to both ``screen`` and
|
||||||
``screen`` and ``logfile`` should use the :cpp:func:`utils::logmesg()
|
``logfile`` should use the :cpp:func:`utils::logmesg() convenience
|
||||||
convenience function <LAMMPS_NS::utils::logmesg>`.
|
function <LAMMPS_NS::utils::logmesg>`.
|
||||||
|
|
||||||
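The output convention described above (write only on MPI rank 0, and to both ``screen`` and ``logfile`` when they are open) can be sketched without any LAMMPS headers. The ``MiniLammps`` struct and the ``log_message()`` function below are hypothetical stand-ins for this illustration only; actual LAMMPS code should call ``utils::logmesg()`` instead.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the few LAMMPS class members used here.
struct MiniLammps {
  std::FILE *screen = nullptr;    // nullptr when screen output is disabled
  std::FILE *logfile = nullptr;   // nullptr when no log file is open
  int me = 0;                     // MPI rank (comm->me in LAMMPS)
};

// Write a message to every open output stream, but only on rank 0.
// Returns the number of streams that were written to.
int log_message(const MiniLammps &lmp, const std::string &mesg)
{
  if (lmp.me != 0) return 0;
  int nwritten = 0;
  if (lmp.screen) {
    std::fputs(mesg.c_str(), lmp.screen);
    ++nwritten;
  }
  if (lmp.logfile) {
    std::fputs(mesg.c_str(), lmp.logfile);
    ++nwritten;
  }
  return nwritten;
}

// Small usage demo with only the screen stream open.
void demo()
{
  MiniLammps lmp;
  lmp.screen = stdout;   // for this demo only
  assert(log_message(lmp, "setup complete\n") == 1);
  lmp.me = 1;            // any other rank stays silent
  assert(log_message(lmp, "setup complete\n") == 0);
}
```

Checking each FILE pointer before use matters because either stream may be disabled for a given run.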
We also discourage the use of stringstreams because the bundled {fmt}
|
We discourage the use of stringstreams because the bundled {fmt} library
|
||||||
library and the customized tokenizer classes can provide the same
|
and the customized tokenizer classes provide the same functionality in a
|
||||||
functionality in a cleaner way with better performance. This also
|
cleaner way with better performance. This also helps maintain a
|
||||||
helps maintain a consistent programming syntax with code from many
|
consistent programming syntax with code from many different
|
||||||
different contributors.
|
contributors.
|
||||||
|
|
||||||
Formatting with the {fmt} library
|
Formatting with the {fmt} library
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
The LAMMPS source code includes a copy of the `{fmt} library
|
The LAMMPS source code includes a copy of the `{fmt} library
|
||||||
<https://fmt.dev>`_ which is preferred over formatting with the
|
<https://fmt.dev>`_, which is preferred over formatting with the
|
||||||
"printf()" family of functions. The primary reason is that it allows
|
"printf()" family of functions. The primary reason is that it allows
|
||||||
a typesafe default format for any type of supported data. This is
|
a typesafe default format for any type of supported data. This is
|
||||||
particularly useful for formatting integers of a given size (32-bit or
|
particularly useful for formatting integers of a given size (32-bit or
|
||||||
@@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it
|
|||||||
are future-proof.
|
are future-proof.
|
||||||
|
|
||||||
Formatted strings are frequently created by calling the
|
Formatted strings are frequently created by calling the
|
||||||
``fmt::format()`` function which will return a string as a
|
``fmt::format()`` function, which will return a string as a
|
||||||
``std::string`` class instance. In contrast to the ``%`` placeholder
|
``std::string`` class instance. In contrast to the ``%`` placeholder in
|
||||||
in ``printf()``, the {fmt} library uses ``{}`` to embed format
|
``printf()``, the {fmt} library uses ``{}`` to embed format descriptors.
|
||||||
descriptors. In the simplest case, no additional characters are
|
In the simplest case, no additional characters are needed, as {fmt} will
|
||||||
needed as {fmt} will choose the default format based on the data type
|
choose the default format based on the data type of the argument.
|
||||||
of the argument. Otherwise the ``fmt::print()`` function may be
|
Otherwise, the ``fmt::print()`` function may be used instead of
|
||||||
used instead of ``printf()`` or ``fprintf()``. In addition, several
|
``printf()`` or ``fprintf()``. In addition, several LAMMPS output
|
||||||
LAMMPS output functions, that originally accepted a single string as
|
functions, that originally accepted a single string as argument have
|
||||||
argument have been overloaded to accept a format string with optional
|
been overloaded to accept a format string with optional arguments as
|
||||||
arguments as well (e.g., ``Error::all()``, ``Error::one()``,
|
well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
|
||||||
``utils::logmesg()``).
|
|
||||||
|
|
||||||
Summary of the {fmt} format syntax
|
Summary of the {fmt} format syntax
|
||||||
==================================
|
==================================
|
||||||
@@ -332,10 +331,11 @@ The syntax of the format string is "{[<argument id>][:<format spec>]}",
|
|||||||
where either the argument id or the format spec (separated by a colon
|
where either the argument id or the format spec (separated by a colon
|
||||||
':') is optional. The argument id is usually a number starting from 0
|
':') is optional. The argument id is usually a number starting from 0
|
||||||
that is the index to the arguments following the format string. By
|
that is the index to the arguments following the format string. By
|
||||||
default these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The most
|
default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The
|
||||||
common case for using argument id would be to use the same argument in
|
most common case for using argument id would be to use the same argument
|
||||||
multiple places in the format string without having to provide it as an
|
in multiple places in the format string without having to provide it as
|
||||||
argument multiple times. In LAMMPS the argument id is rarely used.
|
an argument multiple times. The argument id is rarely used in the LAMMPS
|
||||||
|
source code.
|
||||||
|
|
||||||
More common is the use of a format specifier, which starts with a colon.
|
More common is the use of a format specifier, which starts with a colon.
|
||||||
This may optionally be followed by a fill character (default is ' '). If
|
This may optionally be followed by a fill character (default is ' '). If
|
||||||
@@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating
|
|||||||
point numbers. The final character in the format string would be an
|
point numbers. The final character in the format string would be an
|
||||||
indicator for the "presentation", i.e. 'd' for decimal presentation of
|
indicator for the "presentation", i.e. 'd' for decimal presentation of
|
||||||
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
|
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
|
||||||
This mostly follows the "printf()" scheme but without requiring an
|
This mostly follows the "printf()" scheme, but without requiring an
|
||||||
additional length parameter to distinguish between different integer
|
additional length parameter to distinguish between different integer
|
||||||
widths. The {fmt} library will detect those and adapt the formatting
|
widths. The {fmt} library will detect those and adapt the formatting
|
||||||
accordingly. For floating point numbers there are correspondingly, 'g'
|
accordingly. For floating point numbers there are correspondingly, 'g'
|
||||||
for generic presentation, 'e' for exponential presentation, and 'f' for
|
for generic presentation, 'e' for exponential presentation, and 'f' for
|
||||||
fixed point presentation.
|
fixed point presentation.
|
||||||
|
|
||||||
Thus "{:8}" would represent *any* type argument using at least 8
|
The format string "{:8}" would thus represent *any* type argument and be
|
||||||
characters; "{:<8}" would do this as left aligned, "{:^8}" as centered,
|
replaced by at least 8 characters; "{:<8}" would do this as left
|
||||||
"{:>8}" as right aligned. If a specific presentation is selected, the
|
aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific
|
||||||
argument type must be compatible or else the {fmt} formatting code will
|
presentation is selected, the argument type must be compatible or else
|
||||||
throw an exception. Some format string examples are given below:
|
the {fmt} formatting code will throw an exception. Some format string
|
||||||
|
examples are given below:
|
||||||
|
|
||||||
.. code-block:: c++
|
.. code-block:: c++
|
||||||
|
|
||||||
@@ -392,12 +393,12 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
|
|||||||
Memory management
|
Memory management
|
||||||
^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Dynamical allocation of small data and objects can be done with the
|
Dynamical allocation of small data and objects can be done with the C++
|
||||||
the C++ commands "new" and "delete/delete[]. Large data should use
|
commands "new" and "delete/delete[]". Large data should use the member
|
||||||
the member functions of the ``Memory`` class, most commonly,
|
functions of the ``Memory`` class, most commonly, ``Memory::create()``,
|
||||||
``Memory::create()``, ``Memory::grow()``, and ``Memory::destroy()``,
|
``Memory::grow()``, and ``Memory::destroy()``, which provide variants
|
||||||
which provide variants for vectors, 2d arrays, 3d arrays, etc.
|
for vectors, 2d arrays, 3d arrays, etc. These can also be used for
|
||||||
These can also be used for small data.
|
small data.
|
||||||
|
|
||||||
The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
|
The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
|
||||||
directly is strongly discouraged. To simplify adapting legacy code
|
directly is strongly discouraged. To simplify adapting legacy code
|
||||||
@@ -408,26 +409,24 @@ perform additional error checks for safety.
|
|||||||
Use of these custom memory allocation functions is motivated by the
|
Use of these custom memory allocation functions is motivated by the
|
||||||
following considerations:
|
following considerations:
|
||||||
|
|
||||||
- memory allocation failures on *any* MPI rank during a parallel run
|
- Memory allocation failures on *any* MPI rank during a parallel run
|
||||||
will trigger an immediate abort of the entire parallel calculation
|
will trigger an immediate abort of the entire parallel calculation.
|
||||||
instead of stalling it
|
- A failing "new" will trigger an exception, which is also captured by
|
||||||
- a failing "new" will trigger an exception which is also captured by
|
LAMMPS and triggers a global abort.
|
||||||
LAMMPS and triggers a global abort
|
- Allocation of multidimensional arrays will be done in a C compatible
|
||||||
- allocation of multi-dimensional arrays will be done in a C compatible
|
fashion, but such that the storage of the actual data is stored in one
|
||||||
fashion but so that the storage of the actual data is stored in one
|
large contiguous block. Thus, when MPI communication is needed,
|
||||||
large contiguous block. Thus when MPI communication is needed,
|
|
||||||
the data can be communicated directly (similar to Fortran arrays).
|
the data can be communicated directly (similar to Fortran arrays).
|
||||||
- the "destroy()" and "sfree()" functions may safely be called on NULL
|
- The "destroy()" and "sfree()" functions may safely be called on NULL
|
||||||
pointers
|
pointers.
|
||||||
- the "destroy()" functions will nullify the pointer variables making
|
- The "destroy()" functions will nullify the pointer variables, thus
|
||||||
"use after free" errors easy to detect
|
making "use after free" errors easy to detect.
|
||||||
- it is possible to use a larger than default memory alignment (not on
|
- It is possible to use a larger than default memory alignment (not on
|
||||||
all operating systems, since the allocated storage pointers must be
|
all operating systems, since the allocated storage pointers must be
|
||||||
compatible with ``free()`` for technical reasons)
|
compatible with ``free()`` for technical reasons).
|
||||||
|
|
||||||
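The considerations about contiguous multidimensional storage and nullifying pointers on destruction can be sketched as follows. The helpers ``create2d()`` and ``destroy2d()`` are hypothetical names for this illustration and are not the actual ``Memory`` class API; they only demonstrate the layout idea, assuming ``double`` data and non-zero dimensions.

```cpp
#include <cassert>
#include <cstddef>

// Allocate an nrows x ncols array as ONE contiguous data block plus a
// vector of row pointers, so that array[i][j] syntax works as in C.
double **create2d(std::size_t nrows, std::size_t ncols)
{
  assert(nrows > 0 && ncols > 0);
  double *data = new double[nrows * ncols]();   // zero-initialized block
  double **array = new double *[nrows];
  for (std::size_t i = 0; i < nrows; ++i) array[i] = data + i * ncols;
  return array;
}

// Free the array and nullify the caller's pointer. Safe on nullptr.
void destroy2d(double **&array)
{
  if (array == nullptr) return;
  delete[] array[0];   // the contiguous data block
  delete[] array;      // the row pointers
  array = nullptr;
}

// Small usage demo.
void demo()
{
  double **a = create2d(3, 4);
  a[1][2] = 5.0;
  // Element (1,2) sits at offset 1*4+2 from the start of the data block,
  // so &a[0][0] could be handed to MPI as a single buffer of 12 doubles.
  assert(&a[0][0] + 6 == &a[1][2]);
  destroy2d(a);
  assert(a == nullptr);
  destroy2d(a);        // calling it again is harmless
}
```

Because the pointer is reset to ``nullptr``, an accidental access after ``destroy2d()`` fails immediately instead of silently reading freed memory.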
In the practical implementation of code this means that any pointer
|
In the practical implementation of code this means, that any pointer
|
||||||
variables that are class members should be initialized to a
|
variables, that are class members should be initialized to a ``nullptr``
|
||||||
``nullptr`` value in their respective constructors. That way it is
|
value in their respective constructors. That way, it is safe to call
|
||||||
safe to call ``Memory::destroy()`` or ``delete[]`` on them before
|
``Memory::destroy()`` or ``delete[]`` on them before *any* allocation
|
||||||
*any* allocation outside the constructor. This helps prevent memory
|
outside the constructor. This helps prevent memory leaks.
|
||||||
leaks.
|
|
||||||
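A minimal sketch of the constructor convention described above is shown below. The ``Example`` class and its members are hypothetical names for this illustration; the point is only that a pointer member initialized to ``nullptr`` can be passed to ``delete[]`` before any allocation has happened.

```cpp
#include <cassert>

class Example {
 public:
  // Pointer members start out as nullptr in the constructor.
  Example() : values(nullptr), nvalues(0) {}
  ~Example() { delete[] values; }   // safe even if resize() was never called

  // Copying is disabled to avoid freeing the same storage twice.
  Example(const Example &) = delete;
  Example &operator=(const Example &) = delete;

  // (Re)allocate storage. The unconditional delete[] is safe on the first
  // call because 'values' is nullptr, and it prevents a leak on later calls.
  void resize(int n)
  {
    assert(n > 0);
    delete[] values;
    values = new double[n]();
    nvalues = n;
  }

  int size() const { return nvalues; }
  double *data() { return values; }

 private:
  double *values;
  int nvalues;
};
```

Without the ``nullptr`` initialization, the first ``delete[]`` in ``resize()`` or in the destructor would operate on an indeterminate pointer, which is undefined behavior.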
|
|||||||
@@ -28,7 +28,7 @@ The need to do this communication arises when data from the owned atoms
|
|||||||
is updated (e.g. their positions) and this updated information needs to
|
is updated (e.g. their positions) and this updated information needs to
|
||||||
be **copied** to the corresponding ghost atoms.
|
be **copied** to the corresponding ghost atoms.
|
||||||
|
|
||||||
And second, *reverse communication* which sends ghost atom information
|
And second, *reverse communication*, which sends ghost atom information
|
||||||
from each processor to the owning processor to **accumulate** (sum)
|
from each processor to the owning processor to **accumulate** (sum)
|
||||||
the values with the corresponding owned atoms. The need for this
|
the values with the corresponding owned atoms. The need for this
|
||||||
arises when data is computed and also stored with ghost atoms
|
arises when data is computed and also stored with ghost atoms
|
||||||
@@ -58,7 +58,7 @@ embedded-atom method (EAM) which compute intermediate values in the
|
|||||||
first part of the compute() function that need to be stored by both
|
first part of the compute() function that need to be stored by both
|
||||||
owned and ghost atoms for the second part of the force computation.
|
owned and ghost atoms for the second part of the force computation.
|
||||||
The *Comm* class methods perform the MPI communication for buffers of
|
The *Comm* class methods perform the MPI communication for buffers of
|
||||||
per-atom data. They "call back" to the *Pair* class so it can *pack*
|
per-atom data. They "call back" to the *Pair* class, so it can *pack*
|
||||||
or *unpack* the buffer with data the *Pair* class owns. There are 4
|
or *unpack* the buffer with data the *Pair* class owns. There are 4
|
||||||
such methods that the *Pair* class must define, assuming it uses both
|
such methods that the *Pair* class must define, assuming it uses both
|
||||||
forward and reverse communication:
|
forward and reverse communication:
|
||||||
@ -70,7 +70,7 @@ forward and reverse communication:
|
|||||||
|
|
||||||
The arguments to these methods include the buffer and a list of atoms
|
The arguments to these methods include the buffer and a list of atoms
|
||||||
to pack or unpack. The *Pair* class also must set the *comm_forward*
|
to pack or unpack. The *Pair* class also must set the *comm_forward*
|
||||||
and *comm_reverse* variables which store the number of values stored
|
and *comm_reverse* variables, which store the number of values stored
|
||||||
in the communication buffers for each operation. This means, if
|
in the communication buffers for each operation. This means, if
|
||||||
desired, it can choose to store multiple per-atom values in the
|
desired, it can choose to store multiple per-atom values in the
|
||||||
buffer, and they will be communicated together to minimize
|
buffer, and they will be communicated together to minimize
|
||||||
@ -81,11 +81,11 @@ containing ``double`` values. To correctly store integers that may be
|
|||||||
|
|
||||||
The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind
|
The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind
|
||||||
of forward and reverse communication operations using the same *Comm*
|
of forward and reverse communication operations using the same *Comm*
|
||||||
class methods. Likewise the same pack/unpack methods and
|
class methods. Likewise, the same pack/unpack methods and
|
||||||
comm_forward/comm_reverse variables must be defined by the calling
|
comm_forward/comm_reverse variables must be defined by the calling
|
||||||
*Fix*, *Compute*, or *Dump* class.
|
*Fix*, *Compute*, or *Dump* class.
|
||||||
|
|
||||||
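The pack/unpack callbacks and the ``comm_forward``/``comm_reverse`` variables described above can be illustrated with a small standalone sketch. It is not LAMMPS code: ``ToyPair``, ``rho``, and ``demo_comm()`` are hypothetical names, the argument lists are simplified assumptions, and packing and unpacking happen on the same object here, whereas in a parallel run the *Comm* class would move the buffer between processors in between.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a pair style; not the LAMMPS Pair class.
struct ToyPair {
  int comm_forward = 1;       // values per atom in a forward communication buffer
  int comm_reverse = 1;       // values per atom in a reverse communication buffer
  std::vector<double> rho;    // per-atom data: owned atoms first, then ghost atoms

  // forward, sending side: copy values of the listed owned atoms into buf
  int pack_forward_comm(int n, const int *list, double *buf) const
  {
    int m = 0;
    for (int i = 0; i < n; ++i) buf[m++] = rho[list[i]];
    return m;    // number of doubles packed
  }

  // forward, receiving side: overwrite n ghost atoms starting at index first
  void unpack_forward_comm(int n, int first, const double *buf)
  {
    int m = 0;
    for (int i = first; i < first + n; ++i) rho[i] = buf[m++];
  }

  // reverse, sending side: copy values of n ghost atoms starting at index first
  int pack_reverse_comm(int n, int first, double *buf) const
  {
    int m = 0;
    for (int i = first; i < first + n; ++i) buf[m++] = rho[i];
    return m;
  }

  // reverse, receiving side: accumulate (sum) into the listed owned atoms
  void unpack_reverse_comm(int n, const int *list, const double *buf)
  {
    int m = 0;
    for (int i = 0; i < n; ++i) rho[list[i]] += buf[m++];
  }
};

// Usage sketch: atoms 0-2 are owned, atoms 3 and 4 are ghost copies of atoms 2 and 0.
inline void demo_comm()
{
  ToyPair p;
  p.rho = {1.0, 2.0, 3.0, 0.0, 0.0};
  const int sendlist[2] = {2, 0};
  std::vector<double> buf(2 * p.comm_forward);

  // forward communication: owned values are copied to the ghosts
  p.pack_forward_comm(2, sendlist, buf.data());
  p.unpack_forward_comm(2, 3, buf.data());
  assert(p.rho[3] == 3.0 && p.rho[4] == 1.0);

  // reverse communication: ghost values are summed into their owners
  p.rho[3] = 0.5;
  p.rho[4] = 0.25;
  p.pack_reverse_comm(2, 3, buf.data());
  p.unpack_reverse_comm(2, sendlist, buf.data());
  assert(p.rho[2] == 3.5 && p.rho[0] == 1.25);
}
```

The sketch shows the asymmetry between the two directions: forward communication overwrites ghost data, while reverse communication adds to owned data.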
For *Fix* classes there is an optional second argument to the
|
For *Fix* classes, there is an optional second argument to the
|
||||||
*forward_comm()* and *reverse_comm()* call which can be used when the
|
*forward_comm()* and *reverse_comm()* call which can be used when the
|
||||||
fix performs multiple modes of communication, with different numbers
|
fix performs multiple modes of communication, with different numbers
|
||||||
of values per atom. The fix should set the *comm_forward* and
|
of values per atom. The fix should set the *comm_forward* and
|
||||||
@ -150,7 +150,7 @@ latter case, when the *ring* operation is complete, each processor can
|
|||||||
examine its original buffer to extract modified values.
|
examine its original buffer to extract modified values.
|
||||||
|
|
||||||
Note that the *ring* operation is similar to an MPI_Alltoall()
|
Note that the *ring* operation is similar to an MPI_Alltoall()
|
||||||
operation where every processor effectively sends and receives data to
|
operation, where every processor effectively sends and receives data to
|
||||||
every other processor. The difference is that the *ring* operation
|
every other processor. The difference is that the *ring* operation
|
||||||
does it one step at a time, so the total volume of data does not need
|
does it one step at a time, so the total volume of data does not need
|
||||||
to be stored by every processor. However, the *ring* operation is
|
to be stored by every processor. However, the *ring* operation is
|
||||||
@ -184,8 +184,8 @@ The *exchange_data()* method triggers the communication to be
|
|||||||
performed. Each processor provides the vector of *N* datums to send,
|
performed. Each processor provides the vector of *N* datums to send,
|
||||||
and the size of each datum. All datums must be the same size.
|
and the size of each datum. All datums must be the same size.
|
||||||
|
|
||||||
The *create_atom()* and *exchange_atom()* methods are similar except
|
The *create_atom()* and *exchange_atom()* methods are similar, except
|
||||||
that the size of each datum can be different. Typically this is used
|
that the size of each datum can be different. Typically, this is used
|
||||||
to communicate atoms, each with a variable amount of per-atom data, to
|
to communicate atoms, each with a variable amount of per-atom data, to
|
||||||
other processors.
|
other processors.
|
||||||
|
|
||||||
|
|||||||
@ -45,9 +45,9 @@ other methods in the class.
|
|||||||
zero before each timestep, so that forces (torques, etc.) can be
|
zero before each timestep, so that forces (torques, etc.) can be
|
||||||
accumulated.
|
accumulated.
|
||||||
|
|
||||||
Now for the ``Verlet::run()`` method. Its basic structure in hi-level pseudo
|
Now for the ``Verlet::run()`` method. Its basic structure in high-level
|
||||||
code is shown below. In the actual code in ``src/verlet.cpp`` some of
|
pseudocode is shown below. In the actual code in ``src/verlet.cpp``
|
||||||
these operations are conditionally invoked.
|
some of these operations are conditionally invoked.
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
@ -105,17 +105,17 @@ need it. These flags are passed to the various methods that compute
|
|||||||
particle interactions, so that they either compute and tally the
|
particle interactions, so that they either compute and tally the
|
||||||
corresponding data or can skip the extra calculations if the energy and
|
corresponding data or can skip the extra calculations if the energy and
|
||||||
virial are not needed. See the comments for the ``Integrate::ev_set()``
|
virial are not needed. See the comments for the ``Integrate::ev_set()``
|
||||||
method which document the flag values.
|
method, which document the flag values.
|
||||||
|
|
||||||
At various points of the timestep, fixes are invoked,
|
At various points of the timestep, fixes are invoked,
|
||||||
e.g. ``fix->initial_integrate()``. In the code, this is actually done
|
e.g. ``fix->initial_integrate()``. In the code, this is actually done
|
||||||
via the Modify class which stores all the Fix objects and lists of which
|
via the Modify class, which stores all the Fix objects and lists of which
|
||||||
should be invoked at what point in the timestep. Fixes are the LAMMPS
|
should be invoked at what point in the timestep. Fixes are the LAMMPS
|
||||||
mechanism for tailoring the operations of a timestep for a particular
|
mechanism for tailoring the operations of a timestep for a particular
|
||||||
simulation. As described elsewhere, each fix has one or more methods,
|
simulation. As described elsewhere, each fix has one or more methods,
|
||||||
each of which is invoked at a specific stage of the timestep, as shown in
|
each of which is invoked at a specific stage of the timestep, as shown in
|
||||||
the timestep pseudo-code. All the active fixes defined in an input
|
the timestep pseudocode. All the active fixes defined in an input
|
||||||
script, that are flagged to have an ``initial_integrate()`` method are
|
script, that are flagged to have an ``initial_integrate()`` method, are
|
||||||
invoked at the beginning of each timestep. Examples are :doc:`fix nve
|
invoked at the beginning of each timestep. Examples are :doc:`fix nve
|
||||||
<fix_nve>` or :doc:`fix nvt or fix npt <fix_nh>` which perform the
|
<fix_nve>` or :doc:`fix nvt or fix npt <fix_nh>` which perform the
|
||||||
start-of-timestep velocity-Verlet integration operations to update
|
start-of-timestep velocity-Verlet integration operations to update
|
||||||
@ -131,9 +131,9 @@ can be changed using the :doc:`neigh_modify every/delay/check
|
|||||||
<neigh_modify>` command. If not, coordinates of ghost atoms are
|
<neigh_modify>` command. If not, coordinates of ghost atoms are
|
||||||
acquired by each processor via the ``forward_comm()`` method of the Comm
|
acquired by each processor via the ``forward_comm()`` method of the Comm
|
||||||
class. If neighbor lists need to be built, several operations within
|
class. If neighbor lists need to be built, several operations within
|
||||||
the inner if clause of the pseudo-code are first invoked. The
|
the inner if clause of the pseudocode are first invoked. The
|
||||||
``pre_exchange()`` method of any defined fixes is invoked first.
|
``pre_exchange()`` method of any defined fixes is invoked first.
|
||||||
Typically this inserts or deletes particles from the system.
|
Typically, this inserts or deletes particles from the system.
|
||||||
|
|
||||||
Periodic boundary conditions are then applied by the Domain class via
|
Periodic boundary conditions are then applied by the Domain class via
|
||||||
its ``pbc()`` method to remap particles that have moved outside the
|
its ``pbc()`` method to remap particles that have moved outside the
|
||||||
@ -148,7 +148,7 @@ The box boundaries are then reset (if needed) via the ``reset_box()``
|
|||||||
method of the Domain class, e.g. if box boundaries are shrink-wrapped to
|
method of the Domain class, e.g. if box boundaries are shrink-wrapped to
|
||||||
current particle coordinates. A change in the box size or shape
|
current particle coordinates. A change in the box size or shape
|
||||||
requires internal information for communicating ghost atoms (Comm class)
|
requires internal information for communicating ghost atoms (Comm class)
|
||||||
and neighbor list bins (Neighbor class) be updated. The ``setup()``
|
and neighbor list bins (Neighbor class) to be updated. The ``setup()``
|
||||||
method of the Comm class and ``setup_bins()`` method of the Neighbor
|
method of the Comm class and ``setup_bins()`` method of the Neighbor
|
||||||
class perform the update.
|
class perform the update.
|
||||||
|
|
||||||
@ -217,20 +217,21 @@ file, and restart files. See the :doc:`thermo_style <thermo_style>`,
|
|||||||
:doc:`dump <dump>`, and :doc:`restart <restart>` commands for more
|
:doc:`dump <dump>`, and :doc:`restart <restart>` commands for more
|
||||||
details.
|
details.
|
||||||
|
|
||||||
The the flow of control during energy minimization iterations is
|
The flow of control during energy minimization iterations is similar to
|
||||||
similar to that of a molecular dynamics timestep. Forces are computed,
|
that of a molecular dynamics timestep. Forces are computed, neighbor
|
||||||
neighbor lists are built as needed, atoms migrate to new processors, and
|
lists are built as needed, atoms migrate to new processors, and atom
|
||||||
atom coordinates and forces are communicated to neighboring processors.
|
coordinates and forces are communicated to neighboring processors. The
|
||||||
The only difference is what Fix class operations are invoked when. Only
|
only difference is what Fix class operations are invoked when. Only a
|
||||||
a subset of LAMMPS fixes are useful during energy minimization, as
|
subset of LAMMPS fixes are useful during energy minimization, as
|
||||||
explained in their individual doc pages. The relevant Fix class methods
|
explained in their individual doc pages. The relevant Fix class methods
|
||||||
are ``min_pre_exchange()``, ``min_pre_force()``, and ``min_post_force()``.
|
are ``min_pre_exchange()``, ``min_pre_force()``, and
|
||||||
Each fix is invoked at the appropriate place within the minimization
|
``min_post_force()``. Each fix is invoked at the appropriate place
|
||||||
iteration. For example, the ``min_post_force()`` method is analogous to
|
within the minimization iteration. For example, the
|
||||||
the ``post_force()`` method for dynamics; it is used to alter or constrain
|
``min_post_force()`` method is analogous to the ``post_force()`` method
|
||||||
forces on each atom, which affects the minimization procedure.
|
for dynamics; it is used to alter or constrain forces on each atom,
|
||||||
|
which affects the minimization procedure.
|
||||||
|
|
||||||
After all iterations are completed there is a ``cleanup`` step which
|
After all iterations are completed, there is a ``cleanup`` step which
|
||||||
calls the ``post_run()`` method of fixes to perform operations only required
|
calls the ``post_run()`` method of fixes to perform operations only required
|
||||||
at the end of a calculations (like freeing temporary storage or creating
|
at the end of a calculation (like freeing temporary storage or creating
|
||||||
final outputs).
|
final outputs).
|
||||||
|
|||||||
@ -70,7 +70,7 @@ A command can define multiple grids, each of a different size. Each
|
|||||||
grid is an instantiation of the Grid2d or Grid3d class.
|
grid is an instantiation of the Grid2d or Grid3d class.
|
||||||
|
|
||||||
The command also defines what data it will store for each grid it
|
The command also defines what data it will store for each grid it
|
||||||
creates and it allocates the multi-dimensional array(s) needed to
|
creates and it allocates the multidimensional array(s) needed to
|
||||||
store the data. No grid cell data is stored within the Grid2d or
|
store the data. No grid cell data is stored within the Grid2d or
|
||||||
Grid3d classes.
|
Grid3d classes.
|
||||||
|
|
||||||
@ -115,7 +115,7 @@ which stores *Nvalues* per grid cell.
|
|||||||
nyhi_out, nxlo_out, nxhi_out, nvalues,
|
nyhi_out, nxlo_out, nxhi_out, nvalues,
|
||||||
"data3d_multi");
|
"data3d_multi");
|
||||||
|
|
||||||
Note that these multi-dimensional arrays are allocated as contiguous
|
Note that these multidimensional arrays are allocated as contiguous
|
||||||
chunks of memory where the x-index of the grid varies fastest, then y,
|
chunks of memory where the x-index of the grid varies fastest, then y,
|
||||||
and the z-index slowest. For multiple values per grid cell, the
|
and the z-index slowest. For multiple values per grid cell, the
|
||||||
Nvalues are contiguous, so their index varies even faster than the
|
Nvalues are contiguous, so their index varies even faster than the
|
||||||
@ -798,7 +798,7 @@ A value of -1 is returned if the data name is not recognized.
|
|||||||
The *get_griddata_by_index()* method is called after the
|
The *get_griddata_by_index()* method is called after the
|
||||||
*get_griddata_by_name()* method, using the data index it returned as
|
*get_griddata_by_name()* method, using the data index it returned as
|
||||||
its argument. This method will return a pointer to the
|
its argument. This method will return a pointer to the
|
||||||
multi-dimensional array which stores the requested data.
|
multidimensional array which stores the requested data.
|
||||||
|
|
||||||
As in the discussion above of the Memory class *create_offset()*
|
As in the discussion above of the Memory class *create_offset()*
|
||||||
methods, the dimensionality of the array associated with the returned
|
methods, the dimensionality of the array associated with the returned
|
||||||
|
|||||||
@ -15,7 +15,7 @@ for more information about that part of the build process. LAMMPS
|
|||||||
currently supports building with :doc:`conventional makefiles
|
currently supports building with :doc:`conventional makefiles
|
||||||
<Build_make>` and through :doc:`CMake <Build_cmake>`. Those procedures
|
<Build_make>` and through :doc:`CMake <Build_cmake>`. Those procedures
|
||||||
differ in how packages are enabled or disabled for inclusion into a
|
differ in how packages are enabled or disabled for inclusion into a
|
||||||
LAMMPS binary so they cannot be mixed. The source files for each
|
LAMMPS binary, so they cannot be mixed. The source files for each
|
||||||
package are in all-uppercase subdirectories of the ``src`` folder, for
|
package are in all-uppercase subdirectories of the ``src`` folder, for
|
||||||
example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS``
|
example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS``
|
||||||
subdirectory is not a package but contains a dummy MPI library, that is
|
subdirectory is not a package but contains a dummy MPI library, that is
|
||||||
@ -26,31 +26,31 @@ with traditional makefiles.
|
|||||||
|
|
||||||
The ``lib`` directory contains the source code for several supporting
|
The ``lib`` directory contains the source code for several supporting
|
||||||
libraries or files with configuration settings to use globally installed
|
libraries or files with configuration settings to use globally installed
|
||||||
libraries, that are required by some of the optional packages. They may
|
libraries, that are required by some optional packages. They may
|
||||||
include Python scripts that can transparently download additional source
|
include Python scripts that can transparently download additional source
|
||||||
code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``,
|
code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``,
|
||||||
contains the source files, some of which are in different languages such
|
contains the source files, some of which are in different languages such
|
||||||
as Fortran or CUDA. These libraries included in the LAMMPS build,
|
as Fortran or CUDA. These libraries are included in the LAMMPS build, if the
|
||||||
if the corresponding package is installed.
|
corresponding package is installed.
|
||||||
|
|
||||||
LAMMPS C++ source files almost always come in pairs, such as
|
LAMMPS C++ source files almost always come in pairs, such as
|
||||||
``src/run.cpp`` (implementation file) and ``src/run.h`` (header file).
|
``src/run.cpp`` (implementation file) and ``src/run.h`` (header file).
|
||||||
Each pair of files defines a C++ class, for example the
|
Each pair of files defines a C++ class, for example the
|
||||||
:cpp:class:`LAMMPS_NS::Run` class which contains the code invoked by the
|
:cpp:class:`LAMMPS_NS::Run` class, which contains the code invoked by
|
||||||
:doc:`run <run>` command in a LAMMPS input script. As this example
|
the :doc:`run <run>` command in a LAMMPS input script. As this example
|
||||||
illustrates, source file and class names often have a one-to-one
|
illustrates, source file and class names often have a one-to-one
|
||||||
correspondence with a command used in a LAMMPS input script. Some
|
correspondence with a command used in a LAMMPS input script. Some
|
||||||
source files and classes do not have a corresponding input script
|
source files and classes do not have a corresponding input script
|
||||||
command, e.g. ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force`
|
command, for example ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force`
|
||||||
class. They are discussed in the next section.
|
class. They are discussed in the next section.
|
||||||
|
|
||||||
The names of all source files are in lower case and may use the
|
The names of all source files are in lower case and may use the
|
||||||
underscore character '_' to separate words. Outside of bundled libraries
|
underscore character '_' to separate words. Apart from bundled,
|
||||||
which may have different conventions, all C and C++ header files have a
|
externally maintained libraries, which may have different conventions,
|
||||||
``.h`` extension, all C++ files have a ``.cpp`` extension, and C files a
|
all C and C++ header files have a ``.h`` extension, all C++ files have a
|
||||||
``.c`` extension. A small number of C++ classes and utility functions
|
``.cpp`` extension, and C files a ``.c`` extension. A few C++ classes
|
||||||
are implemented with only a ``.h`` file. Examples are the Pointers and
|
and utility functions are implemented with only a ``.h`` file. Examples
|
||||||
Commands classes or the MathVec functions.
|
are the Pointers and Commands classes or the MathVec functions.
|
||||||
|
|
||||||
Class topology
|
Class topology
|
||||||
--------------
|
--------------
|
||||||
@ -62,35 +62,36 @@ associated source files in the ``src`` folder, for example the class
|
|||||||
:cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp``
|
:cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp``
|
||||||
and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec`
|
and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec`
|
||||||
corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full
|
corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full
|
||||||
lines in the figure represent compositing: that is the class at the base
|
lines in the figure represent compositing: that is, the class at the
|
||||||
of the arrow holds a pointer to an instance of the class at the tip.
|
base of the arrow holds a pointer to an instance of the class at the
|
||||||
Dashed lines instead represent inheritance: the class to the tip of the
|
tip. Dashed lines instead represent inheritance: the class at the tip
|
||||||
arrow is derived from the class at the base. Classes with a red boundary
|
of the arrow is derived from the class at the base. Classes with a red
|
||||||
are not instantiated directly, but they represent the base classes for
|
boundary are not instantiated directly, but they represent the base
|
||||||
"styles". Those "styles" make up the bulk of the LAMMPS code and only
|
classes for "styles". Those "styles" make up the bulk of the LAMMPS
|
||||||
a few representative examples are included in the figure so it remains
|
code and only a few representative examples are included in the figure,
|
||||||
readable.
|
so it remains readable.
|
||||||
|
|
||||||
.. _class-topology:
|
.. _class-topology:
|
||||||
.. figure:: JPG/lammps-classes.png
|
.. figure:: JPG/lammps-classes.png
|
||||||
|
|
||||||
LAMMPS class topology
|
LAMMPS class topology
|
||||||
|
|
||||||
This figure shows some of the relations of the base classes of the
|
This figure shows relations of base classes of the LAMMPS
|
||||||
LAMMPS simulation package. Full lines indicate that a class holds an
|
simulation package. Full lines indicate that a class holds an
|
||||||
instance of the class it is pointing to; dashed lines point to
|
instance of the class it is pointing to; dashed lines point to
|
||||||
derived classes that are given as examples of what classes may be
|
derived classes that are given as examples of what classes may be
|
||||||
instantiated during a LAMMPS run based on the input commands and
|
instantiated during a LAMMPS run based on the input commands and
|
||||||
accessed through the API define by their respective base classes. At
|
accessed through the API defined by their respective base classes.
|
||||||
the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class, which
|
At the core is the :cpp:class:`LAMMPS <LAMMPS_NS::LAMMPS>` class,
|
||||||
holds pointers to class instances with specific purposes. Those may
|
which holds pointers to class instances with specific purposes.
|
||||||
hold instances of other classes, sometimes directly, or only
|
Those may hold instances of other classes, sometimes directly, or
|
||||||
temporarily, sometimes as derived classes or derived classes of
|
only temporarily, sometimes as derived classes or derived classes
|
||||||
derived classes, which may also hold instances of other classes.
|
of derived classes, which may also hold instances of other
|
||||||
|
classes.
|
||||||
|
|
||||||
The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and
|
The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and
|
||||||
represents what is generally referred to an "instance" of LAMMPS. It is
|
represents what is generally referred to as an "instance of LAMMPS". It
|
||||||
a composite holding pointers to instances of other core classes
|
is a composite holding pointers to instances of other core classes
|
||||||
providing the core functionality of the MD engine in LAMMPS and through
|
providing the core functionality of the MD engine in LAMMPS and through
|
||||||
them abstractions of the required operations. The constructor of the
|
them abstractions of the required operations. The constructor of the
|
||||||
LAMMPS class will instantiate those instances, process the command line
|
LAMMPS class will instantiate those instances, process the command line
|
||||||
@ -102,42 +103,44 @@ LAMMPS while passing it the command line flags and input script. It
|
|||||||
deletes the LAMMPS instance after the method reading the input returns
|
deletes the LAMMPS instance after the method reading the input returns
|
||||||
and shuts down the MPI environment before it exits the executable.
|
and shuts down the MPI environment before it exits the executable.
|
||||||
|
|
||||||
The :cpp:class:`LAMMPS_NS::Pointers` is not shown in the
|
The :cpp:class:`LAMMPS_NS::Pointers` class is not shown in the
|
||||||
:ref:`class-topology` figure for clarity. It holds references to many
|
:ref:`class-topology` figure for clarity. It holds references to many
|
||||||
of the members of the :cpp:class:`LAMMPS_NS::LAMMPS` class, so that all classes derived
|
of the members of the :cpp:class:`LAMMPS_NS::LAMMPS` class, so that all classes derived
|
||||||
from :cpp:class:`LAMMPS_NS::Pointers` have direct access to those
|
from :cpp:class:`LAMMPS_NS::Pointers` have direct access to those
|
||||||
reference. From the class topology all classes with blue boundary are
|
references. In the class topology, all classes with a blue boundary are
|
||||||
referenced in the Pointers class and all classes in the second and third
|
referenced in the Pointers class and all classes in the second and third
|
||||||
columns, that are not listed as derived classes are instead derived from
|
columns, that are not listed as derived classes, are instead derived
|
||||||
:cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer references
|
from :cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer
|
||||||
in Pointers, a pointer to the LAMMPS class instance needs to be passed
|
references in Pointers, a pointer to the LAMMPS class instance needs to
|
||||||
to the constructor and thus all constructors for classes derived from it
|
be passed to the constructor. All constructors for classes derived from
|
||||||
must do so and pass this pointer to the constructor for Pointers.
|
it must do so and thus pass that pointer to the constructor for
|
||||||
|
:cpp:class:`LAMMPS_NS::Pointers`. The default constructor for
|
||||||
|
:cpp:class:`LAMMPS_NS::Pointers` is disabled to enforce this.
|
||||||
|
|
||||||
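The constructor chaining described above can be sketched in a few lines. This is a reduced illustration under stated assumptions, not the actual LAMMPS source: the ``Memory``, ``Error``, and ``LAMMPS`` structs are stripped-down stand-ins, and ``ToyFix`` and ``demo_pointers()`` are hypothetical names.

```cpp
#include <cassert>

// Reduced stand-ins; the real classes have many more members.
struct Memory { int allocations = 0; };
struct Error { int warnings = 0; };

struct LAMMPS {
  Memory *memory = nullptr;
  Error *error = nullptr;
};

class Pointers {
 public:
  // bind references to the pointers stored in the LAMMPS instance
  explicit Pointers(LAMMPS *ptr) : lmp(ptr), memory(ptr->memory), error(ptr->error) {}
  Pointers() = delete;    // derived classes must pass a LAMMPS pointer
  virtual ~Pointers() = default;

 protected:
  LAMMPS *lmp;
  Memory *&memory;    // reference to a pointer, so later changes remain visible
  Error *&error;
};

// Hypothetical derived class: its constructor must forward the LAMMPS pointer.
class ToyFix : public Pointers {
 public:
  explicit ToyFix(LAMMPS *ptr) : Pointers(ptr) {}
  void grow() { ++memory->allocations; }    // direct access, no lmp->memory needed
};

inline void demo_pointers()
{
  LAMMPS lmp;
  ToyFix fix(&lmp);     // created before the Memory instance is assigned
  Memory mem;
  lmp.memory = &mem;    // the reference held in Pointers sees the new pointer
  fix.grow();
  assert(mem.allocations == 1);
}
```

Storing references to the pointers, rather than copies of them, is the design choice assumed here: a derived class keeps working when the top-level instance assigns or replaces one of its members after the derived class has been constructed.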
Since all storage is supposed to be encapsulated (there are a few
|
Since all storage is supposed to be encapsulated (there are a few
|
||||||
exceptions), the LAMMPS class can also be instantiated multiple times by
|
exceptions), the LAMMPS class can also be instantiated multiple times by
|
||||||
a calling code. Outside of the aforementioned exceptions, those LAMMPS
|
a calling code. Outside the aforementioned exceptions, those LAMMPS
|
||||||
instances can be used alternately. As of the time of this writing
|
instances can be used alternately. As of the time of this writing
|
||||||
(early 2021) LAMMPS is not yet sufficiently thread-safe for concurrent
|
(early 2023) LAMMPS is not yet sufficiently thread-safe for concurrent
|
||||||
execution. When running in parallel with MPI, care has to be taken,
|
execution. When running in parallel with MPI, care has to be taken,
|
||||||
that suitable copies of communicators are used to not create conflicts
|
that suitable copies of communicators are used to not create conflicts
|
||||||
between different instances.
|
between different instances.
|
||||||
|
|
||||||
The LAMMPS class currently (early 2021) holds instances of 19 classes
|
The LAMMPS class currently holds instances of 19 classes representing
|
||||||
representing the core functionality. There are a handful of virtual
|
the core functionality. There are a handful of virtual parent classes
|
||||||
parent classes in LAMMPS that define what LAMMPS calls ``styles``. They
|
in LAMMPS that define what LAMMPS calls ``styles``. These are shaded
|
||||||
are shaded red in the :ref:`class-topology` figure. Each of these are
|
red in the :ref:`class-topology` figure. Each of these is the parent of a
|
||||||
parents of a number of child classes that implement the interface
|
number of child classes that implement the interface defined by the
|
||||||
defined by the parent class. There are two main categories of these
|
parent class. There are two main categories of these ``styles``: some
|
||||||
``styles``: some may only have one instance active at a time (e.g. atom,
|
may only have one instance active at a time (e.g. atom, pair, bond,
|
||||||
pair, bond, angle, dihedral, improper, kspace, comm) and there is a
|
angle, dihedral, improper, kspace, comm) and there is a dedicated
|
||||||
dedicated pointer variable for each of them in the composite class.
|
pointer variable for each of them in the corresponding composite class.
|
||||||
Setups that require a mix of different such styles have to use a
|
Setups that require a mix of different such styles have to use a
|
||||||
*hybrid* class that takes the place of the one allowed instance and then
|
*hybrid* class instance that acts as a proxy, and manages and forwards
|
||||||
manages and forwards calls to the corresponding sub-styles for the
|
calls to the corresponding sub-style class instances for the designated
|
||||||
designated subset of atoms or data. The composite class may also have
|
subset of atoms or data. The composite class may also have lists of
|
||||||
lists of class instances, e.g. Modify handles lists of compute and fix
|
class instances, e.g. ``Modify`` handles lists of compute and fix
|
||||||
styles, while Output handles a list of dump class instances.
|
styles, while ``Output`` handles a list of dump class instances.
|
||||||
|
|
||||||
The exception to this scheme are the ``command`` style classes. These
|
The exception to this scheme are the ``command`` style classes. These
|
||||||
implement specific commands that can be invoked before, after, or in
|
implement specific commands that can be invoked before, after, or in
|
||||||
@ -146,19 +149,19 @@ command() method called and then, after completion, the class instance
|
|||||||
deleted. Examples for this are the create_box, create_atoms, minimize,
|
deleted. Examples for this are the create_box, create_atoms, minimize,
|
||||||
run, set, or velocity command styles.
|
run, set, or velocity command styles.
|
||||||
|
|
||||||
For all those ``styles`` certain naming conventions are employed: for
|
For all those ``styles``, certain naming conventions are employed: for
|
||||||
the fix nve command the class is called FixNVE and the source files are
|
the fix nve command the class is called FixNVE and the source files are
|
||||||
``fix_nve.h`` and ``fix_nve.cpp``. Similarly for fix ave/time we have
|
``fix_nve.h`` and ``fix_nve.cpp``. Similarly, for fix ave/time we have
|
||||||
FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``. Style names
|
FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``. Style names
|
||||||
are lower case and without spaces or special characters. A suffix or
|
are lower case and without spaces or special characters. A suffix or
|
||||||
words are appended with a forward slash '/' which denotes a variant of
|
words are appended with a forward slash '/' which denotes a variant of
|
||||||
the corresponding class without the suffix. To connect the style name
|
the corresponding class without the suffix. To connect the style name
|
||||||
and the class name, LAMMPS uses macros like: ``AtomStyle()``,
|
and the class name, LAMMPS uses macros like: ``AtomStyle()``,
|
||||||
``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the
|
``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the
|
||||||
corresponding header file. During configuration or compilation files
|
corresponding header file. During configuration or compilation, files
|
||||||
with the pattern ``style_<name>.h`` are created that consist of a list
|
with the pattern ``style_<name>.h`` are created that consist of a list
|
||||||
of include statements including all headers of all styles of a given
|
of include statements including all headers of all styles of a given
|
||||||
type that are currently active (or "installed).
|
type that are currently enabled (or "installed").
|
||||||
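
As a rough illustration of the naming convention described above, the
following Python sketch derives the source file names from a style
category and a style name, and assembles the text of a
``style_<name>.h`` file. The helper names ``style_files`` and
``style_header`` are hypothetical and not part of LAMMPS; the real
files are produced by the build system, and class names such as FixNVE
cannot be derived mechanically from the style name.

```python
def style_files(category, style):
    """Return (header, source) file names, e.g. for ('fix', 'ave/time')."""
    # Assumption: the forward slash in a style name maps to an underscore.
    stem = f"{category}_{style.replace('/', '_')}"
    return f"{stem}.h", f"{stem}.cpp"


def style_header(category, enabled_styles):
    """Build the text of a style_<category>.h file: one include per style."""
    lines = [f'#include "{style_files(category, s)[0]}"'
             for s in sorted(enabled_styles)]
    return "\n".join(lines) + "\n"


print(style_files("fix", "ave/time"))
print(style_header("fix", ["nve", "ave/time"]), end="")
```

Only the styles passed in as enabled appear in the generated list, which
mirrors how a style that is not installed is simply absent from the
corresponding include file.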
|
|
||||||
|
|
||||||
More details on individual classes in the :ref:`class-topology` are as
|
More details on individual classes in the :ref:`class-topology` are as
|
||||||
@ -172,8 +175,8 @@ follows:
|
|||||||
that one or multiple simulations can be run, on the processors
|
that one or multiple simulations can be run, on the processors
|
||||||
allocated for a run, e.g. by the mpirun command.
|
allocated for a run, e.g. by the mpirun command.
|
||||||
|
|
||||||
- The Input class reads and processes input input strings and files,
|
- The Input class reads and processes input (strings and files), stores
|
||||||
stores variables, and invokes :doc:`commands <Commands_all>`.
|
variables, and invokes :doc:`commands <Commands_all>`.
|
||||||
|
|
||||||
- Command style classes are derived from the Command class. They provide
|
- Command style classes are derived from the Command class. They provide
|
||||||
input script commands that perform one-time operations
|
input script commands that perform one-time operations
|
||||||
@ -192,7 +195,7 @@ follows:
|
|||||||
- The Atom class stores per-atom properties associated with atom styles.
|
- The Atom class stores per-atom properties associated with atom styles.
|
||||||
More precisely, they are allocated and managed by a class derived from
|
More precisely, they are allocated and managed by a class derived from
|
||||||
the AtomVec class, and the Atom class simply stores pointers to them.
|
the AtomVec class, and the Atom class simply stores pointers to them.
|
||||||
The classes derived from AtomVec represent the different atom styles
|
The classes derived from AtomVec represent the different atom styles,
|
||||||
and they are instantiated through the :doc:`atom_style <atom_style>`
|
and they are instantiated through the :doc:`atom_style <atom_style>`
|
||||||
command.
|
command.
|
||||||
|
|
||||||
@ -206,18 +209,22 @@ follows:
|
|||||||
class stores a single list (for all atoms). A NeighRequest class
|
class stores a single list (for all atoms). A NeighRequest class
|
||||||
instance is created by pair, fix, or compute styles when they need a
|
instance is created by pair, fix, or compute styles when they need a
|
||||||
particular kind of neighbor list and use the NeighRequest properties
|
particular kind of neighbor list and use the NeighRequest properties
|
||||||
to select the neighbor list settings for the given request. There can
|
to select the neighbor list settings for the given request. There can
|
||||||
be multiple instances of the NeighRequest class and the Neighbor class
|
be multiple instances of the NeighRequest class. The Neighbor class
|
||||||
will try to optimize how they are computed by creating copies or
|
will try to optimize how the requests are processed. Depending on the
|
||||||
sub-lists where possible.
|
NeighRequest properties, neighbor lists are constructed from scratch,
|
||||||
|
aliased, or constructed by post-processing an existing list into
|
||||||
|
sub-lists.
|
||||||
|
|
||||||
- The Comm class performs inter-processor communication, typically of
|
- The Comm class performs inter-processor communication, typically of
|
||||||
ghost atom information. This usually involves MPI message exchanges
|
ghost atom information. This usually involves MPI message exchanges
|
||||||
with 6 neighboring processors in the 3d logical grid of processors
|
with 6 neighboring processors in the 3d logical grid of processors
|
||||||
mapped to the simulation box. There are two :doc:`communication styles
|
mapped to the simulation box. There are two :doc:`communication styles
|
||||||
<comm_style>` enabling different ways to do the domain decomposition.
|
<comm_style>`, enabling different ways to perform the domain
|
||||||
Sometimes the Irregular class is used, when atoms may migrate to
|
decomposition.
|
||||||
arbitrary processors.
|
|
||||||
|
- The Irregular class is used when atoms may migrate to arbitrary
|
||||||
|
processors.
|
||||||
|
|
||||||
- The Domain class stores the simulation box geometry, as well as
|
- The Domain class stores the simulation box geometry, as well as
|
||||||
geometric Regions and any user definition of a Lattice. The latter
|
geometric Regions and any user definition of a Lattice. The latter
|
||||||
@ -246,7 +253,7 @@ follows:
|
|||||||
file, dump file snapshots, and restart files. These correspond to the
|
file, dump file snapshots, and restart files. These correspond to the
|
||||||
:doc:`Thermo <thermo_style>`, :doc:`Dump <dump>`, and
|
:doc:`Thermo <thermo_style>`, :doc:`Dump <dump>`, and
|
||||||
:doc:`WriteRestart <write_restart>` classes respectively. The Dump
|
:doc:`WriteRestart <write_restart>` classes respectively. The Dump
|
||||||
class is a base class with several derived classes implementing
|
class is a base class, with several derived classes implementing
|
||||||
various dump style variants.
|
various dump style variants.
|
||||||
|
|
||||||
- The Timer class logs timing information, output at the end
|
- The Timer class logs timing information, output at the end
|
||||||
|
|||||||
@ -1,12 +1,12 @@
|
|||||||
Communication
|
Communication
|
||||||
^^^^^^^^^^^^^
|
^^^^^^^^^^^^^
|
||||||
|
|
||||||
Following the partitioning scheme in use all per-atom data is
|
Following the selected partitioning scheme, all per-atom data is
|
||||||
distributed across the MPI processes, which allows LAMMPS to handle very
|
distributed across the MPI processes, which allows LAMMPS to handle very
|
||||||
large systems provided it uses a correspondingly large number of MPI
|
large systems provided it uses a correspondingly large number of MPI
|
||||||
processes. The per-atom data comprises atom IDs, positions, velocities,
|
processes. The per-atom data comprises atom IDs, positions, velocities,
|
||||||
types, etc. To be able to compute the short-range interactions MPI
|
types, etc. To be able to compute the short-range interactions, MPI
|
||||||
processes need not only access to data of atoms they "own" but also
|
processes need not only access to the data of atoms they "own" but also
|
||||||
information about atoms from neighboring subdomains, in LAMMPS referred
|
information about atoms from neighboring subdomains, in LAMMPS referred
|
||||||
to as "ghost" atoms. These are copies of atoms storing required
|
to as "ghost" atoms. These are copies of atoms storing required
|
||||||
per-atom data for up to the communication cutoff distance. The green
|
per-atom data for up to the communication cutoff distance. The green
|
||||||
@ -59,7 +59,7 @@ and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
|
|||||||
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
|
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
|
||||||
include ghost atoms they received in the *x* stage, but only if they
|
include ghost atoms they received in the *x* stage, but only if they
|
||||||
are needed by rank 0 to fill its extended ghost atom regions in the
|
are needed by rank 0 to fill its extended ghost atom regions in the
|
||||||
+/-*y* directions (blue rectangles). Thus in this case, ranks 5 and
|
+/-*y* directions (blue rectangles). Thus, in this case, ranks 5 and
|
||||||
6 do not include ghost atoms they received from each other (in the *x*
|
6 do not include ghost atoms they received from each other (in the *x*
|
||||||
stage) in the atoms they send to rank 0. The key point is that while
|
stage) in the atoms they send to rank 0. The key point is that while
|
||||||
the pattern of communication is more complex in the irregular
|
the pattern of communication is more complex in the irregular
|
||||||
@ -78,14 +78,14 @@ A "reverse" communication is when computed ghost atom attributes are
|
|||||||
sent back to the processor that owns the atom. This is used (for
|
sent back to the processor that owns the atom. This is used (for
|
||||||
example) to sum partial forces on ghost atoms to the complete force on
|
example) to sum partial forces on ghost atoms to the complete force on
|
||||||
owned atoms. The order of the two stages described in the
|
owned atoms. The order of the two stages described in the
|
||||||
:ref:`ghost-atom-comm` figure is inverted and the same lists of atoms
|
:ref:`ghost-atom-comm` figure is inverted, and the same lists of atoms
|
||||||
are used to pack and unpack message buffers with per-atom forces. When
|
are used to pack and unpack message buffers with per-atom forces. When
|
||||||
a received buffer is unpacked, the ghost forces are summed to owned atom
|
a received buffer is unpacked, the ghost forces are summed to owned atom
|
||||||
forces. As in forward communication, forces on atoms in the four blue
|
forces. As in forward communication, forces on atoms in the four blue
|
||||||
corners of the diagrams are sent, received, and summed twice (once at
|
corners of the diagrams are sent, received, and summed twice (once at
|
||||||
each stage) before owning processors have the full force.
|
each stage) before owning processors have the full force.
|
||||||
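
The unpack step of a reverse communication can be sketched in a few
lines of Python. This is a single-process illustration only: the
function name ``reverse_comm_sum`` and the plain lists standing in for
message buffers are assumptions made for this example, not the LAMMPS
implementation, which exchanges packed buffers via MPI.

```python
def reverse_comm_sum(owned_forces, send_list, ghost_forces):
    """Sum forces accumulated on ghost copies onto the owning atoms.

    send_list[n] is the index of the owned atom whose ghost copy
    produced ghost_forces[n]; it is the same list that was used to
    pack the forward communication.
    """
    for owner, f_ghost in zip(send_list, ghost_forces):
        fx, fy, fz = owned_forces[owner]
        gx, gy, gz = f_ghost
        owned_forces[owner] = (fx + gx, fy + gy, fz + gz)
    return owned_forces


owned = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
# Owned atom 1 has two ghost copies; both partial forces are added to it.
result = reverse_comm_sum(owned, [1, 1], [(0.5, 0.5, 0.0), (0.0, 0.5, 1.0)])
print(result)
```

An owned atom may appear more than once in the list, so its force can
receive several partial contributions during one unpack.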
|
|
||||||
These two operations are used many places within LAMMPS aside from
|
These two operations are used in many places within LAMMPS aside from
|
||||||
exchange of coordinates and forces, for example by manybody potentials
|
exchange of coordinates and forces, for example by manybody potentials
|
||||||
to share intermediate per-atom values, or by rigid-body integrators to
|
to share intermediate per-atom values, or by rigid-body integrators to
|
||||||
enable each atom in a body to access body properties. Here are
|
enable each atom in a body to access body properties. Here are
|
||||||
@ -105,7 +105,7 @@ performed in LAMMPS:
|
|||||||
atom pairs when building neighbor lists or computing forces.
|
atom pairs when building neighbor lists or computing forces.
|
||||||
|
|
||||||
- The cutoff distance for exchanging ghost atoms is typically equal to
|
- The cutoff distance for exchanging ghost atoms is typically equal to
|
||||||
the neighbor cutoff. But it can also chosen to be longer if needed,
|
the neighbor cutoff. But it can also be set to a larger value if needed,
|
||||||
e.g. half the diameter of a rigid body composed of multiple atoms or
|
e.g. half the diameter of a rigid body composed of multiple atoms or
|
||||||
over 3x the length of a stretched bond for dihedral interactions. It
|
over 3x the length of a stretched bond for dihedral interactions. It
|
||||||
can also exceed the periodic box size. For the regular communication
|
can also exceed the periodic box size. For the regular communication
|
||||||
@ -113,7 +113,7 @@ performed in LAMMPS:
|
|||||||
processor's subdomain, then multiple exchanges are performed in the
|
processor's subdomain, then multiple exchanges are performed in the
|
||||||
same direction. Each exchange is with the same neighbor processor,
|
same direction. Each exchange is with the same neighbor processor,
|
||||||
but buffers are packed/unpacked using a different list of atoms. For
|
but buffers are packed/unpacked using a different list of atoms. For
|
||||||
forward communication, in the first exchange a processor sends only
|
forward communication, in the first exchange, a processor sends only
|
||||||
owned atoms. In subsequent exchanges, it sends ghost atoms received
|
owned atoms. In subsequent exchanges, it sends ghost atoms received
|
||||||
in previous exchanges. For the irregular pattern (right) overlaps of
|
in previous exchanges. For the irregular pattern (right) overlaps of
|
||||||
a processor's extended ghost-atom subdomain with all other processors
|
a processor's extended ghost-atom subdomain with all other processors
|
||||||
|
|||||||
@ -40,28 +40,28 @@ orthogonal boxes.
|
|||||||
|
|
||||||
.. _fft-parallel:
|
.. _fft-parallel:
|
||||||
.. figure:: img/fft-decomp-parallel.png
|
.. figure:: img/fft-decomp-parallel.png
|
||||||
:align: center
|
|
||||||
|
|
||||||
parallel FFT in PPPM
|
Parallel FFT in PPPM
|
||||||
|
|
||||||
Stages of a parallel FFT for a simulation domain overlaid
|
Stages of a parallel FFT for a simulation domain overlaid with an
|
||||||
with an 8x8x8 3d FFT grid, partitioned across 64 processors.
|
8x8x8 3d FFT grid, partitioned across 64 processors. Within each
|
||||||
Within each of the 4 diagrams, grid cells of the same color are
|
of the 4 diagrams, grid cells of the same color are owned by a
|
||||||
owned by a single processor; for simplicity only cells owned by 4
|
single processor; for simplicity, only cells owned by 4 or 8 of
|
||||||
or 8 of the 64 processors are colored. The two images on the left
|
the 64 processors are colored. The two images on the left
|
||||||
illustrate brick-to-pencil communication. The two images on the
|
illustrate brick-to-pencil communication. The two images on the
|
||||||
right illustrate pencil-to-pencil communication, which in this
|
right illustrate pencil-to-pencil communication, which in this
|
||||||
case transposes the *y* and *z* dimensions of the grid.
|
case transposes the *y* and *z* dimensions of the grid.
|
||||||
|
|
||||||
Parallel 3d FFTs require substantial communication relative to their
|
Parallel 3d FFTs require substantial communication relative to their
|
||||||
computational cost. A 3d FFT is implemented by a series of 1d FFTs
|
computational cost. A 3d FFT is implemented by a series of 1d FFTs
|
||||||
along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus the FFT
|
along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus, the
|
||||||
grid cannot be decomposed like atoms into 3 dimensions for parallel
|
FFT grid cannot be decomposed like atoms into 3 dimensions for parallel
|
||||||
processing of the FFTs but only in 1 (as planes) or 2 (as pencils)
|
processing of the FFTs but only in 1 (as planes) or 2 (as pencils)
|
||||||
dimensions and in between the steps the grid needs to be transposed to
|
dimensions and in between the steps the grid needs to be transposed to
|
||||||
have the FFT grid portion "owned" by each MPI process complete in the
|
have the FFT grid portion "owned" by each MPI process complete in the
|
||||||
direction of the 1d FFTs it has to perform. LAMMPS uses the
|
direction of the 1d FFTs it has to perform. LAMMPS uses the
|
||||||
pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.
|
pencil-decomposition algorithm as shown in the :ref:`fft-parallel`
|
||||||
|
figure.
|
||||||
|
|
||||||
Initially (far left), each processor owns a brick of same-color grid
|
Initially (far left), each processor owns a brick of same-color grid
|
||||||
cells (actually grid points) contained within its subdomain. A
|
cells (actually grid points) contained within its subdomain. A
|
||||||
@ -97,7 +97,7 @@ across all $P$ processors with a single call to ``MPI_Alltoall()``, but
|
|||||||
this is typically much slower. However, for the specialized brick and
|
this is typically much slower. However, for the specialized brick and
|
||||||
pencil tiling illustrated in the :ref:`fft-parallel` figure, collective
|
pencil tiling illustrated in the :ref:`fft-parallel` figure, collective
|
||||||
communication across the entire MPI communicator is not required. In
|
communication across the entire MPI communicator is not required. In
|
||||||
the example an :math:`8^3` grid with 512 grid cells is partitioned
|
the example, an :math:`8^3` grid with 512 grid cells is partitioned
|
||||||
across 64 processors; each processor owns a 2x2x2 3d brick of grid
|
across 64 processors; each processor owns a 2x2x2 3d brick of grid
|
||||||
cells. The initial brick-to-pencil communication (upper left to upper
|
cells. The initial brick-to-pencil communication (upper left to upper
|
||||||
right) only requires collective communication within subgroups of 4
|
right) only requires collective communication within subgroups of 4
|
||||||
@ -132,7 +132,7 @@ grid/particle operations that LAMMPS supports:
|
|||||||
- The fftMPI library allows each grid dimension to be a multiple of
|
- The fftMPI library allows each grid dimension to be a multiple of
|
||||||
small prime factors (2,3,5), and allows any number of processors to
|
small prime factors (2,3,5), and allows any number of processors to
|
||||||
perform the FFT. The resulting brick and pencil decompositions are
|
perform the FFT. The resulting brick and pencil decompositions are
|
||||||
thus not always as well-aligned but the sizes of subgroups of
|
thus not always as well-aligned, but the sizes of subgroups of
|
||||||
processors for the two modes of communication (brick/pencil and
|
processors for the two modes of communication (brick/pencil and
|
||||||
pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and
|
pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and
|
||||||
:math:`O(P^{\frac{1}{2}})`.
|
:math:`O(P^{\frac{1}{2}})`.
|
||||||
@ -143,21 +143,23 @@ grid/particle operations that LAMMPS supports:
|
|||||||
in memory. This reordering can be done during the packing or
|
in memory. This reordering can be done during the packing or
|
||||||
unpacking of buffers for MPI communication.
|
unpacking of buffers for MPI communication.
|
||||||
|
|
||||||
- For large systems and particularly a large number of MPI processes,
|
- For large systems and particularly many MPI processes, the dominant
|
||||||
the dominant cost for parallel FFTs is often the communication, not
|
cost for parallel FFTs is often the communication, not the computation
|
||||||
the computation of 1d FFTs, even though the latter scales as :math:`N
|
of 1d FFTs, even though the latter scales as :math:`N \log(N)` in the
|
||||||
\log(N)` in the number of grid points *N* per grid direction. This is
|
number of grid points *N* per grid direction. This is due to the fact
|
||||||
due to the fact that only a 2d decomposition into pencils is possible
|
that only a 2d decomposition into pencils is possible while atom data
|
||||||
while atom data (and their corresponding short-range force and energy
|
(and their corresponding short-range force and energy computations)
|
||||||
computations) can be decomposed efficiently in 3d.
|
can be decomposed efficiently in 3d.
|
||||||
|
|
||||||
This can be addressed by reducing the number of MPI processes involved
|
Reducing the number of MPI processes involved in the MPI communication
|
||||||
in the MPI communication by using :doc:`hybrid MPI + OpenMP
|
will reduce this kind of overhead. By using a :doc:`hybrid MPI +
|
||||||
parallelization <Speed_omp>`. This will use OpenMP parallelization
|
OpenMP parallelization <Speed_omp>` it is still possible to use all
|
||||||
inside the MPI domains and while that may have a lower parallel
|
processes for parallel computation. It will use OpenMP
|
||||||
efficiency, it reduces the communication overhead.
|
parallelization inside the MPI domains. While that may have a lower
|
||||||
|
parallel efficiency for some part of the computation, that can be less
|
||||||
|
than the communication overhead in the 3d FFTs.
|
||||||
|
|
||||||
As an alternative it is also possible to start a :ref:`multi-partition
|
As an alternative, it is also possible to start a :ref:`multi-partition
|
||||||
<partition>` calculation and then use the :doc:`verlet/split
|
<partition>` calculation and then use the :doc:`verlet/split
|
||||||
integrator <run_style>` to perform the PPPM computation on a
|
integrator <run_style>` to perform the PPPM computation on a
|
||||||
dedicated, separate partition of MPI processes. This uses an integer
|
dedicated, separate partition of MPI processes. This uses an integer
|
||||||
@ -175,7 +177,7 @@ grid/particle operations that LAMMPS supports:
|
|||||||
manner consistent with processor subdomains, and provides methods for
|
manner consistent with processor subdomains, and provides methods for
|
||||||
forward and reverse communication of owned and ghost grid point
|
forward and reverse communication of owned and ghost grid point
|
||||||
values. It is used for PPPM as an FFT grid (as outlined above) and
|
values. It is used for PPPM as an FFT grid (as outlined above) and
|
||||||
also for the MSM algorithm which uses a cascade of grid sizes from
|
also for the MSM algorithm, which uses a cascade of grid sizes from
|
||||||
fine to coarse to compute long-range Coulombic forces. The GridComm
|
fine to coarse to compute long-range Coulombic forces. The GridComm
|
||||||
class is also useful for models where continuum fields interact with
|
class is also useful for models where continuum fields interact with
|
||||||
particles. For example, the two-temperature model (TTM) defines heat
|
particles. For example, the two-temperature model (TTM) defines heat
|
||||||
|
|||||||
@ -3,12 +3,12 @@ Neighbor lists
|
|||||||
|
|
||||||
To compute forces efficiently, each processor creates a Verlet-style
|
To compute forces efficiently, each processor creates a Verlet-style
|
||||||
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
|
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
|
||||||
*j* = owned or ghost) with separation less than the applicable
|
*j* = owned or ghost) with separation less than the applicable neighbor
|
||||||
neighbor list cutoff distance. In LAMMPS the neighbor lists are stored
|
list cutoff distance. In LAMMPS, the neighbor lists are stored in a
|
||||||
in a multiple-page data structure; each page is a contiguous chunk of
|
multiple-page data structure; each page is a contiguous chunk of memory
|
||||||
memory which stores vectors of neighbor atoms *j* for many *i* atoms.
|
which stores vectors of neighbor atoms *j* for many *i* atoms. This
|
||||||
This allows pages to be incrementally allocated or deallocated in blocks
|
allows pages to be incrementally allocated or deallocated in blocks as
|
||||||
as needed. Neighbor lists typically consume the most memory of any data
|
needed. Neighbor lists typically consume the most memory of any data
|
||||||
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
|
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
|
||||||
every few timesteps, then used repeatedly each step for force or other
|
every few timesteps, then used repeatedly each step for force or other
|
||||||
computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
||||||
@ -16,7 +16,7 @@ computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
|||||||
the interatomic potential for computing short-range pairwise or manybody
|
the interatomic potential for computing short-range pairwise or manybody
|
||||||
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
|
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
|
||||||
be used for multiple steps assuming that atoms do not move very far
|
be used for multiple steps assuming that atoms do not move very far
|
||||||
between consecutive time steps. Typically the code triggers
|
between consecutive time steps. Typically, the code triggers
|
||||||
reneighboring when any atom has moved half the skin distance since the
|
reneighboring when any atom has moved half the skin distance since the
|
||||||
last reneighboring; this and other options of the neighbor list rebuild
|
last reneighboring; this and other options of the neighbor list rebuild
|
||||||
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
||||||
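
The half-skin criterion follows from a worst-case argument: if no atom
has moved more than :math:`\Delta_s / 2`, then no pair separation can
have shrunk by more than :math:`\Delta_s`, so a pair that was outside
:math:`R_n = R_f + \Delta_s` at the last build is still outside
:math:`R_f`. A minimal Python sketch of such a check is shown below.
The function name ``needs_reneighbor`` and the use of plain coordinate
tuples are assumptions for illustration; they do not reproduce the
LAMMPS code.

```python
def needs_reneighbor(positions, positions_at_last_build, skin):
    """Return True if any atom moved more than half the skin distance."""
    trigger_sq = (0.5 * skin) ** 2
    for (x, y, z), (x0, y0, z0) in zip(positions, positions_at_last_build):
        dsq = (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        if dsq > trigger_sq:
            return True
    return False


old = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
new = [(0.1, 0.0, 0.0), (1.0, 1.0, 1.6)]
print(needs_reneighbor(new, old, skin=2.0))  # second atom moved 0.6 < 1.0
print(needs_reneighbor(new, old, skin=1.0))  # second atom moved 0.6 > 0.5
```

Comparing squared distances avoids a square root per atom, which is why
the threshold is squared once up front.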
@ -26,10 +26,10 @@ their owning processor's subdomain are first migrated to new processors
|
|||||||
via communication. Periodic boundary conditions are also (only)
|
via communication. Periodic boundary conditions are also (only)
|
||||||
enforced on these steps to ensure each atom is re-assigned to the
|
enforced on these steps to ensure each atom is re-assigned to the
|
||||||
correct processor. After migration, the atoms owned by each processor
|
correct processor. After migration, the atoms owned by each processor
|
||||||
are stored in a contiguous vector. Periodically each processor
|
are stored in a contiguous vector. Periodically, each processor
|
||||||
spatially sorts owned atoms within its vector to reorder it for improved
|
spatially sorts owned atoms within its vector to reorder it for improved
|
||||||
cache efficiency in force computations and neighbor list building. For
|
cache efficiency in force computations and neighbor list building. For
|
||||||
that atoms are spatially binned and then reordered so that atoms in the
|
that, atoms are spatially binned and then reordered so that atoms in the
|
||||||
same bin are adjacent in the vector. Atom sorting can be disabled or
|
same bin are adjacent in the vector. Atom sorting can be disabled or
|
||||||
its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
||||||
|
|
||||||
@ -44,7 +44,7 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
|||||||
(left) and triclinic (right) domains. A regular grid of neighbor
|
(left) and triclinic (right) domains. A regular grid of neighbor
|
||||||
bins (thin lines) overlays the entire simulation domain and need not
|
bins (thin lines) overlays the entire simulation domain and need not
|
||||||
align with subdomain boundaries; only the portion overlapping the
|
align with subdomain boundaries; only the portion overlapping the
|
||||||
augmented subdomain is shown. In the triclinic case it overlaps the
|
augmented subdomain is shown. In the triclinic case, it overlaps the
|
||||||
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
||||||
represent a stencil of bins searched to find neighbors of a particular
|
represent a stencil of bins searched to find neighbors of a particular
|
||||||
atom (black dot).
|
atom (black dot).
|
||||||
@ -53,12 +53,12 @@ To build a local neighbor list in linear time, the simulation domain is
|
|||||||
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
||||||
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
||||||
single MPI processor's subdomain. Each processor stores a set of
|
single MPI processor's subdomain. Each processor stores a set of
|
||||||
neighbor bins which overlap its subdomain extended by the neighbor
|
neighbor bins which overlap its subdomain, extended by the neighbor
|
||||||
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
||||||
with processor boundaries; an integer number in each dimension is fit to
|
with processor boundaries; an integer number in each dimension is fit to
|
||||||
the size of the entire simulation box.
|
the size of the entire simulation box.
|
||||||
|
|
||||||
Most often LAMMPS builds what it calls a "half" neighbor list where
|
Most often, LAMMPS builds what is called a "half" neighbor list where
|
||||||
each *i,j* neighbor pair is stored only once, with either atom *i* or
|
each *i,j* neighbor pair is stored only once, with either atom *i* or
|
||||||
*j* as the central atom. The build can be done efficiently by using a
|
*j* as the central atom. The build can be done efficiently by using a
|
||||||
pre-computed "stencil" of bins around a central origin bin which
|
pre-computed "stencil" of bins around a central origin bin which
|
||||||
@ -67,18 +67,18 @@ is simply a list of integer offsets in *x,y,z* of nearby bins
|
|||||||
surrounding the origin bin which are close enough to contain any
|
surrounding the origin bin which are close enough to contain any
|
||||||
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
|
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
|
||||||
origin bin. Note that for a half neighbor list, the stencil can be
|
origin bin. Note that for a half neighbor list, the stencil can be
|
||||||
asymmetric since each atom only need store half its nearby neighbors.
|
asymmetric, since each atom only need store half its nearby neighbors.
|
||||||
|
|
||||||
These stencils are illustrated in the figure for a half list and a bin
|
These stencils are illustrated in the figure for a half list and a bin
|
||||||
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
|
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
|
||||||
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
|
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
|
||||||
63, 13 in the plane of bins that contain the origin bin and 25 in each
|
63, 13 in the plane of bins that contain the origin bin and 25 in each
|
||||||
of the two planes above it in the *z* direction (75 for triclinic). The
|
of the two planes above it in the *z* direction (75 for triclinic). The
|
||||||
reason the triclinic stencil has extra bins is because the bins tile the
|
triclinic stencil has extra bins because the bins tile the bounding box
|
||||||
bounding box of the entire triclinic domain and thus are not periodic
|
of the entire triclinic domain, and thus are not periodic with respect
|
||||||
with respect to the simulation box itself. The stencil and logic for
|
to the simulation box itself. The stencil and logic for determining
|
||||||
determining which *i,j* pairs to include in the neighbor list are
|
which *i,j* pairs to include in the neighbor list are altered slightly
|
||||||
altered slightly to account for this.
|
to account for this.
|
||||||
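
The bin counts quoted above for the orthogonal case can be checked with
a short Python sketch. It enumerates all bin offsets whose closest
possible distance to the origin bin is below the cutoff and keeps the
origin bin plus one half of the remaining bins. The function name
``half_stencil`` and the particular choice of which half to keep are
assumptions made for this illustration, not a copy of the LAMMPS
stencil classes, and the triclinic case is not covered.

```python
import math
from itertools import product


def half_stencil(bin_size, cutoff, dim=3):
    """List the (i, j, k) bin offsets of a half stencil for cubic bins."""
    reach = math.ceil(cutoff / bin_size) + 1
    axis = range(-reach, reach + 1)
    zrange = axis if dim == 3 else [0]
    stencil = []
    for k, j, i in product(zrange, axis, axis):
        # Closest approach between the origin bin and the bin at this offset.
        gaps = [max(abs(d) - 1, 0) * bin_size for d in (i, j, k)]
        if sum(g * g for g in gaps) >= cutoff * cutoff:
            continue
        # Keep the origin bin and the "upper" half of the surrounding bins.
        if k > 0 or (k == 0 and j > 0) or (k == 0 and j == 0 and i >= 0):
            stencil.append((i, j, k))
    return stencil


print(len(half_stencil(0.5, 1.0, dim=2)))  # 13
print(len(half_stencil(0.5, 1.0, dim=3)))  # 63
```

With a bin size of half the cutoff, every bin within two bins of the
origin passes the distance test, so the 3d count is 13 bins in the
origin plane plus 25 in each of the two planes above it.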
|
|
||||||
To build a neighbor list, a processor first loops over its "owned" plus
|
To build a neighbor list, a processor first loops over its "owned" plus
|
||||||
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
|
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
|
||||||
@ -95,7 +95,7 @@ supports:
|
|||||||
been found to be optimal for many typical cases. Smaller bins incur
|
been found to be optimal for many typical cases. Smaller bins incur
|
||||||
additional overhead to loop over; larger bins require more distance
|
additional overhead to loop over; larger bins require more distance
|
||||||
calculations. Note that for smaller bin sizes, the 2d stencil in the
|
calculations. Note that for smaller bin sizes, the 2d stencil in the
|
||||||
figure would be more semi-circular in shape (hemispherical in 3d),
|
figure would be of a more semicircular shape (hemispherical in 3d),
|
||||||
with bins near the corners of the square eliminated due to their
|
with bins near the corners of the square eliminated due to their
|
||||||
distance from the origin bin.
|
distance from the origin bin.
|
||||||
|
|
||||||
@ -111,8 +111,8 @@ supports:
|
|||||||
symmetric stencil. It also includes lists with partial enumeration of
|
symmetric stencil. It also includes lists with partial enumeration of
|
||||||
ghost atom neighbors. The full and ghost-atom lists are used by
|
ghost atom neighbors. The full and ghost-atom lists are used by
|
||||||
various manybody interatomic potentials. Lists may also use different
|
various manybody interatomic potentials. Lists may also use different
|
||||||
criteria for inclusion of a pair interaction. Typically this simply
|
criteria for inclusion of a pairwise interaction. Typically, this
|
||||||
depends only on the distance between two atoms and the cutoff
|
depends only on the distance between two atoms and the cutoff
|
||||||
distance. But for finite-size coarse-grained particles with
|
distance. But for finite-size coarse-grained particles with
|
||||||
individual diameters (e.g. polydisperse granular particles), it can
|
individual diameters (e.g. polydisperse granular particles), it can
|
||||||
also depend on the diameters of the two particles.
|
also depend on the diameters of the two particles.
|
||||||
@ -121,11 +121,11 @@ supports:
|
|||||||
of the master neighbor list for the full system need to be generated,
|
of the master neighbor list for the full system need to be generated,
|
||||||
one for each sub-style, which contains only the *i,j* pairs needed to
|
one for each sub-style, which contains only the *i,j* pairs needed to
|
||||||
compute interactions between subsets of atoms for the corresponding
|
compute interactions between subsets of atoms for the corresponding
|
||||||
potential. This means not all *i* or *j* atoms owned by a processor
|
potential. This means, not all *i* or *j* atoms owned by a processor
|
||||||
are included in a particular sub-list.
|
are included in a particular sub-list.
|
||||||
|
|
||||||
- Some models use different cutoff lengths for pairwise interactions
|
- Some models use different cutoff lengths for pairwise interactions
|
||||||
between different kinds of particles which are stored in a single
|
between different kinds of particles, which are stored in a single
|
||||||
neighbor list. One example is a solvated colloidal system with large
|
neighbor list. One example is a solvated colloidal system with large
|
||||||
colloidal particles where colloid/colloid, colloid/solvent, and
|
colloidal particles where colloid/colloid, colloid/solvent, and
|
||||||
solvent/solvent interaction cutoffs can be dramatically different.
|
solvent/solvent interaction cutoffs can be dramatically different.
|
||||||
@ -153,7 +153,7 @@ supports:
|
|||||||
For the newton pair *on* setting the atom *j* is only added to the
|
For the newton pair *on* setting the atom *j* is only added to the
|
||||||
list if its *z* coordinate is larger, or if equal the *y* coordinate
|
list if its *z* coordinate is larger, or if equal the *y* coordinate
|
||||||
is larger, and that is equal, too, the *x* coordinate is larger. For
|
is larger, and that is equal, too, the *x* coordinate is larger. For
|
||||||
homogeneously dense systems that will result in picking neighbors from
|
homogeneously dense systems, that will result in picking neighbors from
|
||||||
a same size sector in always the same direction relative to the
|
a same size sector in always the same direction relative to the
|
||||||
"owned" atom and thus it should lead to similar length neighbor lists
|
"owned" atom, and thus it should lead to similar length neighbor lists
|
||||||
and thus reduce the chance of a load imbalance.
|
and reduce the chance of a load imbalance.
|
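The coordinate tie-breaking rule described in this hunk for the newton pair *on* setting can be sketched as follows. This is a minimal illustration and not the LAMMPS neighbor list code; the function name `keep_pair` and the use of `std::array` for positions are assumptions made for the example.

```cpp
#include <array>

// Hypothetical helper (not a LAMMPS function): decide whether atom j is
// added to the half neighbor list of atom i. Atom j is kept if its z
// coordinate is larger; if z is equal, if its y coordinate is larger;
// and if y is equal as well, if its x coordinate is larger.
bool keep_pair(const std::array<double, 3> &xi, const std::array<double, 3> &xj)
{
  if (xj[2] != xi[2]) return xj[2] > xi[2];
  if (xj[1] != xi[1]) return xj[1] > xi[1];
  return xj[0] > xi[0];
}
```

Because exactly one of `keep_pair(a, b)` and `keep_pair(b, a)` is true for two atoms at different positions, each pair is stored only once, and the kept neighbors always come from the same half-space relative to the "owned" atom.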
||||||
|
|||||||
@ -6,7 +6,7 @@ thread parallelism to predominantly distribute loops over local data
|
|||||||
and thus follow an orthogonal parallelization strategy to the
|
and thus follow an orthogonal parallelization strategy to the
|
||||||
decomposition into spatial domains used by the :doc:`MPI partitioning
|
decomposition into spatial domains used by the :doc:`MPI partitioning
|
||||||
<Developer_par_part>`. For clarity, this section discusses only the
|
<Developer_par_part>`. For clarity, this section discusses only the
|
||||||
implementation in the OPENMP package as it is the simplest. The INTEL
|
implementation in the OPENMP package, as it is the simplest. The INTEL
|
||||||
and KOKKOS package offer additional options and are more complex since
|
and KOKKOS package offer additional options and are more complex since
|
||||||
they support more features and different hardware like co-processors
|
they support more features and different hardware like co-processors
|
||||||
or GPUs.
|
or GPUs.
|
||||||
@ -14,7 +14,7 @@ or GPUs.
|
|||||||
One of the key decisions when implementing the OPENMP package was to
|
One of the key decisions when implementing the OPENMP package was to
|
||||||
keep the changes to the source code small, so that it would be easier to
|
keep the changes to the source code small, so that it would be easier to
|
||||||
maintain the code and keep it in sync with the non-threaded standard
|
maintain the code and keep it in sync with the non-threaded standard
|
||||||
implementation. this is achieved by a) making the OPENMP version a
|
implementation. This is achieved by a) making the OPENMP version a
|
||||||
derived class from the regular version (e.g. ``PairLJCutOMP`` from
|
derived class from the regular version (e.g. ``PairLJCutOMP`` from
|
||||||
``PairLJCut``) and overriding only methods that are multi-threaded or
|
``PairLJCut``) and overriding only methods that are multi-threaded or
|
||||||
need to be modified to support multi-threading (similar to what was done
|
need to be modified to support multi-threading (similar to what was done
|
||||||
@ -26,13 +26,13 @@ into three separate classes ``ThrOMP``, ``ThrData``, and ``FixOMP``.
|
|||||||
available in the corresponding base class (e.g. ``Pair`` for
|
available in the corresponding base class (e.g. ``Pair`` for
|
||||||
``PairLJCutOMP``) like multi-thread aware variants of the "tally"
|
``PairLJCutOMP``) like multi-thread aware variants of the "tally"
|
||||||
functions. Those functions are made available through multiple
|
functions. Those functions are made available through multiple
|
||||||
inheritance so those new functions have to have unique names to avoid
|
inheritance, so those new functions have to have unique names to avoid
|
||||||
ambiguities; typically ``_thr`` is appended to the name of the function.
|
ambiguities; typically ``_thr`` is appended to the name of the function.
|
||||||
``ThrData`` is a classes that manages per-thread data structures.
|
``ThrData`` is a class that manages per-thread data structures. It is
|
||||||
It is used instead of extending the corresponding storage to per-thread
|
used instead of extending the corresponding storage to per-thread arrays
|
||||||
arrays to avoid slowdowns due to "false sharing" when multiple threads
|
to avoid slowdowns due to "false sharing" when multiple threads update
|
||||||
update adjacent elements in an array and thus force the CPU cache lines
|
adjacent elements in an array and thus force the CPU cache lines to be
|
||||||
to be reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
|
reset and re-fetched. ``FixOMP`` finally manages the "multi-thread
|
||||||
state" like settings and access to per-thread storage, it is activated
|
state" like settings and access to per-thread storage, it is activated
|
||||||
by the :doc:`package omp <package>` command.
|
by the :doc:`package omp <package>` command.
|
||||||
|
|
||||||
@ -46,24 +46,24 @@ involve multiple atoms and thus there are race conditions when multiple
|
|||||||
threads want to update per-atom data of the same atoms. Five possible
|
threads want to update per-atom data of the same atoms. Five possible
|
||||||
strategies have been considered to avoid this:
|
strategies have been considered to avoid this:
|
||||||
|
|
||||||
1) restructure the code so that there is no overlapping access possible
|
1. Restructure the code so that there is no overlapping access possible
|
||||||
when computing in parallel, e.g. by breaking lists into multiple
|
when computing in parallel, e.g. by breaking lists into multiple
|
||||||
parts and synchronizing threads in between.
|
parts and synchronizing threads in between.
|
||||||
2) have each thread be "responsible" for a specific group of atoms and
|
2. Have each thread be "responsible" for a specific group of atoms and
|
||||||
compute these interactions multiple times, once on each thread that
|
compute these interactions multiple times, once on each thread that
|
||||||
is responsible for a given atom and then have each thread only update
|
is responsible for a given atom, and then have each thread only update
|
||||||
the properties of this atom.
|
the properties of this atom.
|
||||||
3) use mutexes around functions and regions of code where the data race
|
3. Use mutexes around functions and regions of code where the data race
|
||||||
could happen
|
could happen.
|
||||||
4) use atomic operations when updating per-atom properties
|
4. Use atomic operations when updating per-atom properties.
|
||||||
5) use replicated per-thread data structures to accumulate data without
|
5. Use replicated per-thread data structures to accumulate data without
|
||||||
conflicts and then use a reduction to combine those results into the
|
conflicts and then use a reduction to combine those results into the
|
||||||
data structures used by the regular style.
|
data structures used by the regular style.
|
||||||
|
|
||||||
Option 5 was chosen for the OPENMP package because it would retain the
|
Option 5 was chosen for the OPENMP package because it would retain the
|
||||||
performance for the case of 1 thread and the code would be more
|
performance for the case of a single thread and the code would be more
|
||||||
maintainable. Option 1 would require extensive code changes,
|
maintainable. Option 1 would require extensive code changes,
|
||||||
particularly to the neighbor list code; options 2 would have incurred a
|
particularly to the neighbor list code; option 2 would have incurred a
|
||||||
2x or more performance penalty for the serial case; option 3 causes
|
2x or more performance penalty for the serial case; option 3 causes
|
||||||
significant overhead and would enforce serialization of operations in
|
significant overhead and would enforce serialization of operations in
|
||||||
inner loops and thus defeat the purpose of multi-threading; option 4
|
inner loops and thus defeat the purpose of multi-threading; option 4
|
||||||
@ -80,7 +80,7 @@ equivalent to the number of CPU cores per CPU socket on high-end
|
|||||||
supercomputers.
|
supercomputers.
|
||||||
|
|
||||||
Thus arrays like the force array are dimensioned to the number of atoms
|
Thus arrays like the force array are dimensioned to the number of atoms
|
||||||
times the number of threads when enabling OpenMP support and inside the
|
times the number of threads when enabling OpenMP support, and inside the
|
||||||
compute functions a pointer to a different chunk is obtained by each thread.
|
compute functions a pointer to a different chunk is obtained by each thread.
|
||||||
Similarly, accumulators like potential energy or virial are kept in
|
Similarly, accumulators like potential energy or virial are kept in
|
||||||
per-thread instances of the ``ThrData`` class and then only reduced and
|
per-thread instances of the ``ThrData`` class and then only reduced and
|
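The replicated per-thread storage described in this hunk (option 5) can be sketched as below. This is a simplified illustration under stated assumptions, not the OPENMP package implementation: forces are scalar instead of 3-vectors, the work is split over a flat pair list instead of a neighbor list, and the name `accumulate_forces` is hypothetical. Without OpenMP enabled in the compiler, the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: each thread accumulates into its own chunk of a
// force buffer dimensioned natoms * nthreads, so no two threads ever
// write to the same element. A reduction then sums the chunks.
std::vector<double> accumulate_forces(const std::vector<std::pair<int, int>> &pairs,
                                      const std::vector<double> &fpair,
                                      int natoms, int nthreads)
{
  std::vector<double> fthr(static_cast<std::size_t>(natoms) * nthreads, 0.0);
  const long long npairs = static_cast<long long>(pairs.size());

#pragma omp parallel for schedule(static)
  for (int t = 0; t < nthreads; ++t) {
    // pointer to the chunk that belongs to thread t
    double *f = fthr.data() + static_cast<std::size_t>(t) * natoms;
    const long long begin = npairs * t / nthreads;
    const long long end = npairs * (t + 1) / nthreads;
    for (long long k = begin; k < end; ++k) {
      f[pairs[k].first] += fpair[k];     // force on atom i
      f[pairs[k].second] -= fpair[k];    // equal and opposite force on atom j
    }
  }

  // reduction: combine the per-thread chunks into one array
  std::vector<double> ftotal(natoms, 0.0);
#pragma omp parallel for schedule(static)
  for (int i = 0; i < natoms; ++i)
    for (int t = 0; t < nthreads; ++t)
      ftotal[i] += fthr[static_cast<std::size_t>(t) * natoms + i];

  return ftotal;
}
```

The cost of this approach is the extra memory and the reduction step, both of which grow with the number of threads; in exchange, the accumulation loop needs no locks or atomic operations.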
||||||
@ -91,7 +91,7 @@ Loop scheduling
|
|||||||
"""""""""""""""
|
"""""""""""""""
|
||||||
|
|
||||||
Multi-thread parallelization is applied by distributing (outer) loops
|
Multi-thread parallelization is applied by distributing (outer) loops
|
||||||
statically across threads. Typically this would be the loop over local
|
statically across threads. Typically, this would be the loop over local
|
||||||
atoms *i* when processing *i,j* pairs of atoms from a neighbor list.
|
atoms *i* when processing *i,j* pairs of atoms from a neighbor list.
|
||||||
The design of the neighbor list code results in atoms having a similar
|
The design of the neighbor list code results in atoms having a similar
|
||||||
number of neighbors for homogeneous systems and thus load imbalances
|
number of neighbors for homogeneous systems and thus load imbalances
|
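The static distribution of the loop over local atoms *i* mentioned in this hunk can be illustrated with a small index calculation. The struct `Range` and the function `static_range` are hypothetical names for this sketch and are not taken from the LAMMPS sources.

```cpp
#include <algorithm>

// Half-open index range [from, to) of local atoms handled by one thread.
struct Range {
  int from;
  int to;
};

// Hypothetical helper: split nlocal loop iterations into nthreads
// contiguous blocks whose sizes differ by at most one.
Range static_range(int nlocal, int nthreads, int tid)
{
  const int chunk = nlocal / nthreads;
  const int rem = nlocal % nthreads;    // the first 'rem' threads get one extra atom
  const int from = tid * chunk + std::min(tid, rem);
  const int to = from + chunk + (tid < rem ? 1 : 0);
  return {from, to};
}
```

With 10 local atoms and 3 threads this gives the blocks [0, 4), [4, 7), and [7, 10). A static split like this only balances well when each atom has a similar number of neighbors, which is the property of the neighbor list design that the text relies on.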
||||||
|
|||||||
@ -7,36 +7,36 @@ distributed-memory parallelism is set with the :doc:`comm_style command
|
|||||||
|
|
||||||
.. _domain-decomposition:
|
.. _domain-decomposition:
|
||||||
.. figure:: img/domain-decomp.png
|
.. figure:: img/domain-decomp.png
|
||||||
:align: center
|
|
||||||
|
|
||||||
domain decomposition
|
Domain decomposition schemes
|
||||||
|
|
||||||
This figure shows the different kinds of domain decomposition used
|
This figure shows the different kinds of domain decomposition used
|
||||||
for MPI parallelization: "brick" on the left with an orthogonal
|
for MPI parallelization: "brick" on the left with an orthogonal
|
||||||
(left) and a triclinic (middle) simulation domain, and a "tiled"
|
(left) and a triclinic (middle) simulation domain, and a "tiled"
|
||||||
decomposition (right). The black lines show the division into
|
decomposition (right). The black lines show the division into
|
||||||
subdomains and the contained atoms are "owned" by the corresponding
|
subdomains, and the contained atoms are "owned" by the
|
||||||
MPI process. The green dashed lines indicate how subdomains are
|
corresponding MPI process. The green dashed lines indicate how
|
||||||
extended with "ghost" atoms up to the communication cutoff distance.
|
subdomains are extended with "ghost" atoms up to the communication
|
||||||
|
cutoff distance.
|
||||||
|
|
||||||
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
|
The LAMMPS simulation box is a 3d or 2d volume, which can be of
|
||||||
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
|
orthogonal or triclinic shape, as illustrated in the
|
||||||
figure for the 2d case. Orthogonal means the box edges are aligned with
|
:ref:`domain-decomposition` figure for the 2d case. Orthogonal means
|
||||||
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
|
the box edges are aligned with the *x*, *y*, *z* Cartesian axes, and the
|
||||||
rectangular. Triclinic allows for a more general parallelepiped shape
|
box faces are thus all rectangular. Triclinic allows for a more general
|
||||||
in which edges are aligned with three arbitrary vectors and the box
|
parallelepiped shape in which edges are aligned with three arbitrary
|
||||||
faces are parallelograms. In each dimension box faces can be periodic,
|
vectors and the box faces are parallelograms. In each dimension, box
|
||||||
or non-periodic with fixed or shrink-wrapped boundaries. In the fixed
|
faces can be periodic, or non-periodic with fixed or shrink-wrapped
|
||||||
case, atoms which move outside the face are deleted; shrink-wrapped
|
boundaries. In the fixed case, atoms which move outside the face are
|
||||||
means the position of the box face adjusts continuously to enclose all
|
deleted; shrink-wrapped means the position of the box face adjusts
|
||||||
the atoms.
|
continuously to enclose all the atoms.
|
||||||
|
|
||||||
For distributed-memory MPI parallelism, the simulation box is spatially
|
For distributed-memory MPI parallelism, the simulation box is spatially
|
||||||
decomposed (partitioned) into non-overlapping subdomains which fill the
|
decomposed (partitioned) into non-overlapping subdomains which fill the
|
||||||
box. The default partitioning, "brick", is most suitable when atom
|
box. The default partitioning, "brick", is most suitable when atom
|
||||||
density is roughly uniform, as shown in the left-side images of the
|
density is roughly uniform, as shown in the left-side images of the
|
||||||
:ref:`domain-decomposition` figure. The subdomains comprise a regular
|
:ref:`domain-decomposition` figure. The subdomains comprise a regular
|
||||||
grid and all subdomains are identical in size and shape. Both the
|
grid, and all subdomains are identical in size and shape. Both the
|
||||||
orthogonal and triclinic boxes can deform continuously during a
|
orthogonal and triclinic boxes can deform continuously during a
|
||||||
simulation, e.g. to compress a solid or shear a liquid, in which case
|
simulation, e.g. to compress a solid or shear a liquid, in which case
|
||||||
the processor subdomains likewise deform.
|
the processor subdomains likewise deform.
|
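For the default "brick" partitioning of an orthogonal box, where all subdomains are identical in size and shape, the owning subdomain of a position follows from simple index arithmetic. The sketch below is an illustration only: the function name `brick_owner` and the ordering of the returned index (x fastest, z slowest) are assumptions and do not claim to match how LAMMPS or MPI numbers ranks.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper: map a position inside an orthogonal box to the
// index of the subdomain that owns it in a regular procgrid[0] x
// procgrid[1] x procgrid[2] brick decomposition.
int brick_owner(const std::array<double, 3> &x,
                const std::array<double, 3> &boxlo,
                const std::array<double, 3> &boxhi,
                const std::array<int, 3> &procgrid)
{
  std::array<int, 3> idx{};
  for (int d = 0; d < 3; ++d) {
    // fractional position in [0,1) along dimension d
    const double frac = (x[d] - boxlo[d]) / (boxhi[d] - boxlo[d]);
    const int i = static_cast<int>(frac * procgrid[d]);
    // clamp so that positions on or slightly outside a face stay in range
    idx[d] = std::max(0, std::min(i, procgrid[d] - 1));
  }
  // assumed ordering: x index varies fastest, z index slowest
  return (idx[2] * procgrid[1] + idx[1]) * procgrid[0] + idx[0];
}
```

A triclinic box would first need the position converted to fractional (lamda) coordinates, and a "tiled" decomposition cannot use this uniform arithmetic at all since its subdomains differ in size.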
||||||
@ -76,14 +76,14 @@ the load imbalance:
|
|||||||
|
|
||||||
The pictures above demonstrate different decompositions for a 2d system
|
The pictures above demonstrate different decompositions for a 2d system
|
||||||
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
||||||
subdomain with green being optimal and red the least optimal.
|
subdomain, with green being optimal and red the least optimal.
|
||||||
|
|
||||||
Due to the vacuum in the system, the default decomposition is unbalanced
|
Due to the vacuum in the system, the default decomposition is
|
||||||
with several MPI ranks without atoms (left). By forcing a 1x12x1
|
unbalanced, with several MPI ranks without atoms (left). By forcing a
|
||||||
processor grid, every MPI rank does computations now, but number of
|
1x12x1 processor grid, every MPI rank does computations now, but the
|
||||||
atoms per subdomain is still uneven and the thin slice shape increases
|
number of atoms per subdomain is still uneven, and the thin slice shape
|
||||||
the amount of communication between subdomains (center left). With a
|
increases the amount of communication between subdomains (center
|
||||||
2x6x1 processor grid and shifting the subdomain divisions, the load
|
left). With a 2x6x1 processor grid and shifting the subdomain divisions,
|
||||||
imbalance is further reduced and the amount of communication required
|
the load imbalance is further reduced and the amount of communication
|
||||||
between subdomains is less (center right). And using the recursive
|
required between subdomains is less (center right). And using the
|
||||||
bisectioning leads to further improved decomposition (right).
|
recursive bisectioning leads to further improved decomposition (right).
|
||||||
|
|||||||
@ -136,7 +136,7 @@ The LAMMPS Python module enables calling the LAMMPS C library API from
|
|||||||
Python by dynamically loading functions in the LAMMPS shared library through
|
Python by dynamically loading functions in the LAMMPS shared library through
|
||||||
the `Python ctypes module <https://docs.python.org/3/library/ctypes.html>`_.
|
the `Python ctypes module <https://docs.python.org/3/library/ctypes.html>`_.
|
||||||
Because of the dynamic loading, it is **required** that LAMMPS is compiled
|
Because of the dynamic loading, it is **required** that LAMMPS is compiled
|
||||||
in :ref:`"shared" mode <exe>`. The Python interface is object oriented, but
|
in :ref:`"shared" mode <exe>`. The Python interface is object-oriented, but
|
||||||
otherwise tries to be very similar to the C library API. Three different
|
otherwise tries to be very similar to the C library API. Three different
|
||||||
Python classes to run LAMMPS are available and they build on each other.
|
Python classes to run LAMMPS are available and they build on each other.
|
||||||
More information on this is in the :doc:`Python_head`
|
More information on this is in the :doc:`Python_head`
|
||||||
@ -152,7 +152,7 @@ LAMMPS Fortran API
|
|||||||
|
|
||||||
The LAMMPS Fortran module is a wrapper around calling functions from the
|
The LAMMPS Fortran module is a wrapper around calling functions from the
|
||||||
LAMMPS C library API. This is done using the ISO_C_BINDING feature in
|
LAMMPS C library API. This is done using the ISO_C_BINDING feature in
|
||||||
Fortran 2003. The interface is object oriented but otherwise tries to
|
Fortran 2003. The interface is object-oriented but otherwise tries to
|
||||||
be very similar to the C library API and the basic Python module.
|
be very similar to the C library API and the basic Python module.
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
|
|||||||
@ -1071,7 +1071,7 @@ getting started, but not as a fully tested and supported feature of the
|
|||||||
LAMMPS distribution. Any contributions to complete this are, of course,
|
LAMMPS distribution. Any contributions to complete this are, of course,
|
||||||
welcome. Please also note, that for the case of creating a Python wrapper,
|
welcome. Please also note, that for the case of creating a Python wrapper,
|
||||||
a fully supported :doc:`Ctypes based lammps module <Python_module>`
|
a fully supported :doc:`Ctypes based lammps module <Python_module>`
|
||||||
already exists. That module is designed to be object oriented while
|
already exists. That module is designed to be object-oriented while
|
||||||
SWIG will generate a 1:1 translation of the functions in the interface file.
|
SWIG will generate a 1:1 translation of the functions in the interface file.
|
||||||
|
|
||||||
Building the wrapper
|
Building the wrapper
|
||||||
|
|||||||
@ -730,18 +730,18 @@ is because there can only be one fix which monitors the global
|
|||||||
pressure and changes the simulation box dimensions. So you have 3
|
pressure and changes the simulation box dimensions. So you have 3
|
||||||
choices:
|
choices:
|
||||||
|
|
||||||
* Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
|
#. Use one of the 4 NPT or NPH styles for the rigid bodies. Use the
|
||||||
*dilate* all option so that it will dilate the positions of the
|
*dilate* all option so that it will dilate the positions of the
|
||||||
non-rigid particles as well. Use :doc:`fix nvt <fix_nh>` (or any
|
non-rigid particles as well. Use :doc:`fix nvt <fix_nh>` (or any
|
||||||
other thermostat) for the non-rigid particles.
|
other thermostat) for the non-rigid particles.
|
||||||
* Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
|
#. Use :doc:`fix npt <fix_nh>` for the group of non-rigid particles. Use
|
||||||
the *dilate* all option so that it will dilate the center-of-mass
|
the *dilate* all option so that it will dilate the center-of-mass
|
||||||
positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT
|
positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT
|
||||||
rigid styles for the rigid bodies.
|
rigid styles for the rigid bodies.
|
||||||
* Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
|
#. Use :doc:`fix press/berendsen <fix_press_berendsen>` to compute the
|
||||||
pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT
|
pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT
|
||||||
rigid styles for the rigid bodies. Use :doc:`fix nvt <fix_nh>` (or
|
rigid styles for the rigid bodies. Use :doc:`fix nvt <fix_nh>` (or
|
||||||
any other thermostat) for the non-rigid particles.
|
any other thermostat) for the non-rigid particles.
|
||||||
|
|
||||||
In all case, the rigid bodies and non-rigid particles both contribute
|
In all case, the rigid bodies and non-rigid particles both contribute
|
||||||
to the global pressure and the box is scaled the same by any of the
|
to the global pressure and the box is scaled the same by any of the
|
||||||
|
|||||||
@ -166,7 +166,7 @@ events, all the other replicas also run dynamics and event checking
|
|||||||
with the same schedule, but the final states are always overwritten by
|
with the same schedule, but the final states are always overwritten by
|
||||||
the state of the event replica.
|
the state of the event replica.
|
||||||
|
|
||||||
The outer loop of the pseudo-code above continues until *N* steps of
|
The outer loop of the pseudocode above continues until *N* steps of
|
||||||
dynamics have been performed. Note that *N* only includes the
|
dynamics have been performed. Note that *N* only includes the
|
||||||
dynamics of stages 2 and 3, not the steps taken during dephasing or
|
dynamics of stages 2 and 3, not the steps taken during dephasing or
|
||||||
the minimization iterations of quenching. The specified *N* is
|
the minimization iterations of quenching. The specified *N* is
|
||||||
|
|||||||
@ -94,7 +94,7 @@ class Pointers {
|
|||||||
python(ptr->python) {}
|
python(ptr->python) {}
|
||||||
virtual ~Pointers() = default;
|
virtual ~Pointers() = default;
|
||||||
|
|
||||||
// remove default members execept for the copy constructor
|
// remove other default members
|
||||||
|
|
||||||
Pointers() = delete;
|
Pointers() = delete;
|
||||||
Pointers(const Pointers &) = default;
|
Pointers(const Pointers &) = default;
|
||||||
|
|||||||