starting grammar, punctuation, and spelling review for developer info sections

This commit is contained in:
Axel Kohlmeyer
2023-02-04 18:03:30 -05:00
parent a0a7e76cc3
commit d1550bf9f6
15 changed files with 366 additions and 357 deletions

View File

@ -1,52 +1,53 @@
Code design
-----------
This section explains some of the code design choices in LAMMPS with
the goal of helping developers write new code similar to the existing
code. Please see the section on :doc:`Requirements for contributed
code <Modify_style>` for more specific recommendations and guidelines.
While that section is organized more in the form of a checklist for
code contributors, the focus here is on overall code design strategy,
choices made between possible alternatives, and discussing some
relevant C++ programming language constructs.
This section explains some code design choices in LAMMPS with the goal
of helping developers write new code similar to the existing code.
Please see the section on :doc:`Requirements for contributed code
<Modify_style>` for more specific recommendations and guidelines. While
that section is organized more in the form of a checklist for code
contributors, the focus here is on overall code design strategy, choices
made between possible alternatives, and discussing some relevant C++
programming language constructs.
Historically, the basic design philosophy of the LAMMPS C++ code was a
"C with classes" style. The motivation was to make it easy to modify
LAMMPS for people without significant training in C++ programming.
Data structures and code constructs were used that resemble the
previous implementation(s) in Fortran. A contributing factor to this
choice also was that at the time, C++ compilers were often not mature
and some of the advanced features contained bugs or did not function
as the standard required. There were also disagreements between
compiler vendors as to how to interpret the C++ standard documents.
LAMMPS for people without significant training in C++ programming. Data
structures and code constructs were used that resemble the previous
implementation(s) in Fortran. A contributing factor to this choice was
that at the time, C++ compilers were often not mature and some advanced
features contained bugs or did not function as the standard required.
There were also disagreements between compiler vendors as to how to
interpret the C++ standard documents.
However, C++ compilers have now advanced significantly. In 2020 we
decided to to require the C++11 standard as the minimum C++ language
standard for LAMMPS. Since then we have begun to also replace some of
the C-style constructs with equivalent C++ functionality, either from
the C++ standard library or as custom classes or functions, in order
to improve readability of the code and to increase code reuse through
abstraction of commonly used functionality.
However, C++ compilers and the C++ programming language have advanced
significantly. In 2020, the LAMMPS developers decided to require the
C++11 standard as the minimum C++ language standard for LAMMPS. Since
then, we have begun to replace C-style constructs with equivalent C++
functionality. This was taken either from the C++ standard library or
implemented as custom classes or functions. The goal is to improve
readability of the code and to increase code reuse through abstraction
of commonly used functionality.
.. note::
Please note that as of spring 2022 there is still a sizable chunk
of legacy code in LAMMPS that has not yet been refactored to
reflect these style conventions in full. LAMMPS has a large code
base and many different contributors and there also is a hierarchy
of precedence in which the code is adapted. Highest priority has
been the code in the ``src`` folder, followed by code in packages
in order of their popularity and complexity (simpler code is
adapted sooner), followed by code in the ``lib`` folder. Source
code that is downloaded from external packages or libraries during
compilation is not subject to the conventions discussed here.
Please note that as of spring 2023 there is still a sizable chunk of
legacy code in LAMMPS that has not yet been refactored to reflect
these style conventions in full. LAMMPS has a large code base and
many contributors. There is also a hierarchy of precedence in which
the code is adapted. Highest priority has been the code in the
``src`` folder, followed by code in packages in order of their
popularity and complexity (simpler code gets adapted sooner), followed
by code in the ``lib`` folder. Source code that is downloaded from
external packages or libraries during compilation is not subject to
the conventions discussed here.
Object oriented code
Object-oriented code
^^^^^^^^^^^^^^^^^^^^
LAMMPS is designed to be an object oriented code. Each simulation is
LAMMPS is designed to be an object-oriented code. Each simulation is
represented by an instance of the LAMMPS class. When running in
parallel each MPI process creates such an instance. This can be seen
parallel, each MPI process creates such an instance. This can be seen
in the ``main.cpp`` file where the core steps of running a LAMMPS
simulation are the following 3 lines of code:
@ -67,14 +68,14 @@ other special features.
The basic LAMMPS class hierarchy which is created by the LAMMPS class
constructor is shown in :ref:`class-topology`. When input commands
are processed, additional class instances are created, or deleted, or
replaced. Likewise specific member functions of specific classes are
replaced. Likewise, specific member functions of specific classes are
called to trigger actions such creating atoms, computing forces,
computing properties, time-propagating the system, or writing output.
Compositing and Inheritance
===========================
LAMMPS makes extensive use of the object oriented programming (OOP)
LAMMPS makes extensive use of the object-oriented programming (OOP)
principles of *compositing* and *inheritance*. Classes like the
``LAMMPS`` class are a **composite** containing pointers to instances
of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the
simulation and providing member functions that trigger certain
actions. Some of those classes like ``Force`` are themselves
composites, containing instances of classes describing different force
interactions. Similarly the ``Modify`` class contains a list of
interactions. Similarly, the ``Modify`` class contains a list of
``Fix`` and ``Compute`` classes. If the input commands that
correspond to these classes include the word *style*, then LAMMPS
stores only a single instance of that class. E.g. *atom_style*,
@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived
classes are often referred to as "styles", e.g. pair styles, fix
styles, atom styles and so on.
This is the origin of the flexibility of LAMMPS. For example pair
This is the origin of the flexibility of LAMMPS. For example, pair
styles implement a variety of different non-bonded interatomic
potentials functions. All details for the implementation of a
potential are stored and executed in a single class.
As mentioned above, there can be multiple instances of classes derived
from the ``Fix`` or ``Compute`` base classes. They represent a
different facet of LAMMPS flexibility as they provide methods which
can be called at different points in time within a timestep, as
explained in `Developer_flow`. This allows the input script to tailor
how a specific simulation is run, what diagnostic computations are
performed, and how the output of those computations is further
processed or output.
different facet of LAMMPS' flexibility, as they provide methods which
can be called at different points within a timestep, as explained in
`Developer_flow`. This allows the input script to tailor how a specific
simulation is run, what diagnostic computations are performed, and how
the output of those computations is further processed or output.
Additional code sharing is possible by creating derived classes from the
derived classes (e.g., to implement an accelerated version of a pair
@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member
functions is which of the two member functions is called when executing
`base1->call()` versus `base2->call()`. Without polymorphism, a
function within the base class can only call member functions within the
same scope, that is ``Base::call()`` will always call
``Base::normal()``. But for the `base2->call()` case the call of the
same scope: that is, ``Base::call()`` will always call
``Base::normal()``. But for the `base2->call()` case, the call of the
virtual member function will be dispatched to ``Derived::poly()``
instead. This mechanism means that functions are called within the
scope of the class type that was used to *create* the class instance are
invoked; even if they are assigned to a pointer using the type of a base
class. This is the desired behavior and this way LAMMPS can even use
styles that are loaded at runtime from a shared object file with the
:doc:`plugin command <plugin>`.
instead. This mechanism results in calling functions that are within
the scope of the class that was used to *create* the instance, even if
they are assigned to a pointer for their base class. This is the
desired behavior, and this way LAMMPS can even use styles that are loaded
at runtime from a shared object file with the :doc:`plugin command
<plugin>`.
A special case of virtual functions are so-called pure functions. These
are virtual functions that are initialized to 0 in the class declaration
@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be
created and that derived classes **must** implement these functions.
Many of the functions listed with the various class styles in the
section :doc:`Modify` are pure functions. The motivation for this is
to define the interface or API of the functions but defer their
to define the interface or API of the functions, but defer their
implementation to the derived classes.
However, there are downsides to this. For example, calls to virtual
functions from within a constructor, will not be in the scope of the
derived class and thus it is good practice to either avoid calling them
functions from within a constructor, will *not* be in the scope of the
derived class, and thus it is good practice to either avoid calling them
or to provide an explicit scope such as ``Base::poly()`` or
``Derived::poly()``. Furthermore, any destructors in classes containing
virtual functions should be declared virtual too, so they will be
@ -208,8 +208,8 @@ dispatch.
that are intended to replace a virtual or pure function use the
``override`` property keyword. For the same reason, the use of
overloads or default arguments for virtual functions should be
avoided as they lead to confusion over which function is supposed to
override which and which arguments need to be declared.
avoided, as they lead to confusion over which function is supposed to
override which, and which arguments need to be declared.
Style Factories
===============
@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that
create an instance of a specific derived class, say ``PairLJCut`` and
return a pointer to the type of the common base class of that style,
``Pair`` in this case. To associate the factory function with the
style keyword, an ``std::map`` class is used with function pointers
style keyword, a ``std::map`` class is used with function pointers
indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and
"morse" for ``PairMorse``). A couple of typedefs help keep the code
readable and a template function is used to implement the actual
readable, and a template function is used to implement the actual
factory functions for the individual classes. Below is an example
of such a factory function from the ``Force`` class as declared in
``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h``
@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams".
This is mainly motivated by better performance, better control over
formatting, and less effort to achieve specific formatting.
Since mixing "stdio" and "iostreams" can lead to unexpected
behavior. use of the latter is strongly discouraged. Also output to
the screen should not use the predefined ``stdout`` FILE pointer, but
rather the ``screen`` and ``logfile`` FILE pointers managed by the
LAMMPS class. Furthermore, output should generally only be done by
MPI rank 0 (``comm->me == 0``). Output that is sent to both
``screen`` and ``logfile`` should use the :cpp:func:`utils::logmesg()
convenience function <LAMMPS_NS::utils::logmesg>`.
Since mixing "stdio" and "iostreams" can lead to unexpected behavior,
use of the latter is strongly discouraged. Output to the screen should
*not* use the predefined ``stdout`` FILE pointer, but rather the
``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class.
Furthermore, output should generally only be done by MPI rank 0
(``comm->me == 0``). Output that is sent to both ``screen`` and
``logfile`` should use the :cpp:func:`utils::logmesg() convenience
function <LAMMPS_NS::utils::logmesg>`.
We also discourage the use of stringstreams because the bundled {fmt}
library and the customized tokenizer classes can provide the same
functionality in a cleaner way with better performance. This also
helps maintain a consistent programming syntax with code from many
different contributors.
We discourage the use of stringstreams because the bundled {fmt} library
and the customized tokenizer classes provide the same functionality in a
cleaner way with better performance. This also helps maintain a
consistent programming syntax with code from many different
contributors.
Formatting with the {fmt} library
===================================
The LAMMPS source code includes a copy of the `{fmt} library
<https://fmt.dev>`_ which is preferred over formatting with the
<https://fmt.dev>`_, which is preferred over formatting with the
"printf()" family of functions. The primary reason is that it allows
a typesafe default format for any type of supported data. This is
particularly useful for formatting integers of a given size (32-bit or
@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it
are future-proof.
Formatted strings are frequently created by calling the
``fmt::format()`` function which will return a string as a
``std::string`` class instance. In contrast to the ``%`` placeholder
in ``printf()``, the {fmt} library uses ``{}`` to embed format
descriptors. In the simplest case, no additional characters are
needed as {fmt} will choose the default format based on the data type
of the argument. Otherwise the ``fmt::print()`` function may be
used instead of ``printf()`` or ``fprintf()``. In addition, several
LAMMPS output functions, that originally accepted a single string as
argument have been overloaded to accept a format string with optional
arguments as well (e.g., ``Error::all()``, ``Error::one()``,
``utils::logmesg()``).
``fmt::format()`` function, which will return a string as a
``std::string`` class instance. In contrast to the ``%`` placeholder in
``printf()``, the {fmt} library uses ``{}`` to embed format descriptors.
In the simplest case, no additional characters are needed, as {fmt} will
choose the default format based on the data type of the argument.
Otherwise, the ``fmt::print()`` function may be used instead of
``printf()`` or ``fprintf()``. In addition, several LAMMPS output
functions, that originally accepted a single string as argument have
been overloaded to accept a format string with optional arguments as
well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
Summary of the {fmt} format syntax
==================================
@ -332,10 +331,11 @@ The syntax of the format string is "{[<argument id>][:<format spec>]}",
where either the argument id or the format spec (separated by a colon
':') is optional. The argument id is usually a number starting from 0
that is the index to the arguments following the format string. By
default these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The most
common case for using argument id would be to use the same argument in
multiple places in the format string without having to provide it as an
argument multiple times. In LAMMPS the argument id is rarely used.
default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The
most common case for using argument id would be to use the same argument
in multiple places in the format string without having to provide it as
an argument multiple times. The argument id is rarely used in the LAMMPS
source code.
More common is the use of a format specifier, which starts with a colon.
This may optionally be followed by a fill character (default is ' '). If
@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating
point numbers. The final character in the format string would be an
indicator for the "presentation", i.e. 'd' for decimal presentation of
integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc.
This mostly follows the "printf()" scheme but without requiring an
This mostly follows the "printf()" scheme, but without requiring an
additional length parameter to distinguish between different integer
widths. The {fmt} library will detect those and adapt the formatting
accordingly. For floating point numbers there are correspondingly, 'g'
for generic presentation, 'e' for exponential presentation, and 'f' for
fixed point presentation.
Thus "{:8}" would represent *any* type argument using at least 8
characters; "{:<8}" would do this as left aligned, "{:^8}" as centered,
"{:>8}" as right aligned. If a specific presentation is selected, the
argument type must be compatible or else the {fmt} formatting code will
throw an exception. Some format string examples are given below:
The format string "{:8}" would thus represent *any* type argument and be
replaced by at least 8 characters; "{:<8}" would do this as left
aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific
presentation is selected, the argument type must be compatible or else
the {fmt} formatting code will throw an exception. Some format string
examples are given below:
.. code-block:: c++
@ -392,12 +393,12 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
Memory management
^^^^^^^^^^^^^^^^^
Dynamical allocation of small data and objects can be done with the
the C++ commands "new" and "delete/delete[]. Large data should use
the member functions of the ``Memory`` class, most commonly,
``Memory::create()``, ``Memory::grow()``, and ``Memory::destroy()``,
which provide variants for vectors, 2d arrays, 3d arrays, etc.
These can also be used for small data.
Dynamical allocation of small data and objects can be done with the C++
commands "new" and "delete/delete[]". Large data should use the member
functions of the ``Memory`` class, most commonly, ``Memory::create()``,
``Memory::grow()``, and ``Memory::destroy()``, which provide variants
for vectors, 2d arrays, 3d arrays, etc. These can also be used for
small data.
The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()``
directly is strongly discouraged. To simplify adapting legacy code
@ -408,26 +409,24 @@ perform additional error checks for safety.
Use of these custom memory allocation functions is motivated by the
following considerations:
- memory allocation failures on *any* MPI rank during a parallel run
will trigger an immediate abort of the entire parallel calculation
instead of stalling it
- a failing "new" will trigger an exception which is also captured by
LAMMPS and triggers a global abort
- allocation of multi-dimensional arrays will be done in a C compatible
fashion but so that the storage of the actual data is stored in one
large contiguous block. Thus when MPI communication is needed,
- Memory allocation failures on *any* MPI rank during a parallel run
will trigger an immediate abort of the entire parallel calculation.
- A failing "new" will trigger an exception, which is also captured by
LAMMPS and triggers a global abort.
- Allocation of multidimensional arrays will be done in a C compatible
fashion, but such that the storage of the actual data is stored in one
large contiguous block. Thus, when MPI communication is needed,
the data can be communicated directly (similar to Fortran arrays).
- the "destroy()" and "sfree()" functions may safely be called on NULL
pointers
- the "destroy()" functions will nullify the pointer variables making
"use after free" errors easy to detect
- it is possible to use a larger than default memory alignment (not on
- The "destroy()" and "sfree()" functions may safely be called on NULL
pointers.
- The "destroy()" functions will nullify the pointer variables, thus
making "use after free" errors easy to detect.
- It is possible to use a larger than default memory alignment (not on
all operating systems, since the allocated storage pointers must be
compatible with ``free()`` for technical reasons)
compatible with ``free()`` for technical reasons).
In the practical implementation of code this means that any pointer
variables that are class members should be initialized to a
``nullptr`` value in their respective constructors. That way it is
safe to call ``Memory::destroy()`` or ``delete[]`` on them before
*any* allocation outside the constructor. This helps prevent memory
leaks.
In the practical implementation of code this means, that any pointer
variables, that are class members should be initialized to a ``nullptr``
value in their respective constructors. That way, it is safe to call
``Memory::destroy()`` or ``delete[]`` on them before *any* allocation
outside the constructor. This helps prevent memory leaks.