diff --git a/doc/src/Developer_code_design.rst b/doc/src/Developer_code_design.rst index 786bd51c26..844dbd0512 100644 --- a/doc/src/Developer_code_design.rst +++ b/doc/src/Developer_code_design.rst @@ -1,52 +1,53 @@ Code design ----------- -This section explains some of the code design choices in LAMMPS with -the goal of helping developers write new code similar to the existing -code. Please see the section on :doc:`Requirements for contributed -code ` for more specific recommendations and guidelines. -While that section is organized more in the form of a checklist for -code contributors, the focus here is on overall code design strategy, -choices made between possible alternatives, and discussing some -relevant C++ programming language constructs. +This section explains some code design choices in LAMMPS with the goal +of helping developers write new code similar to the existing code. +Please see the section on :doc:`Requirements for contributed code +` for more specific recommendations and guidelines. While +that section is organized more in the form of a checklist for code +contributors, the focus here is on overall code design strategy, choices +made between possible alternatives, and discussing some relevant C++ +programming language constructs. Historically, the basic design philosophy of the LAMMPS C++ code was a "C with classes" style. The motivation was to make it easy to modify -LAMMPS for people without significant training in C++ programming. -Data structures and code constructs were used that resemble the -previous implementation(s) in Fortran. A contributing factor to this -choice also was that at the time, C++ compilers were often not mature -and some of the advanced features contained bugs or did not function -as the standard required. There were also disagreements between -compiler vendors as to how to interpret the C++ standard documents. +LAMMPS for people without significant training in C++ programming. Data +structures and code constructs were used that resemble the previous +implementation(s) in Fortran. A contributing factor to this choice was +that at the time, C++ compilers were often not mature and some advanced +features contained bugs or did not function as the standard required. +There were also disagreements between compiler vendors as to how to +interpret the C++ standard documents. -However, C++ compilers have now advanced significantly. In 2020 we -decided to to require the C++11 standard as the minimum C++ language -standard for LAMMPS. Since then we have begun to also replace some of -the C-style constructs with equivalent C++ functionality, either from -the C++ standard library or as custom classes or functions, in order -to improve readability of the code and to increase code reuse through -abstraction of commonly used functionality. +However, C++ compilers and the C++ programming language have advanced +significantly. In 2020, the LAMMPS developers decided to require the +C++11 standard as the minimum C++ language standard for LAMMPS. Since +then, we have begun to replace C-style constructs with equivalent C++ +functionality. This was taken either from the C++ standard library or +implemented as custom classes or functions. The goal is to improve +readability of the code and to increase code reuse through abstraction +of commonly used functionality. .. note:: - Please note that as of spring 2022 there is still a sizable chunk - of legacy code in LAMMPS that has not yet been refactored to - reflect these style conventions in full. LAMMPS has a large code - base and many different contributors and there also is a hierarchy - of precedence in which the code is adapted. Highest priority has - been the code in the ``src`` folder, followed by code in packages - in order of their popularity and complexity (simpler code is - adapted sooner), followed by code in the ``lib`` folder. Source - code that is downloaded from external packages or libraries during - compilation is not subject to the conventions discussed here. + Please note that as of spring 2023 there is still a sizable chunk of + legacy code in LAMMPS that has not yet been refactored to reflect + these style conventions in full. LAMMPS has a large code base and + many contributors. There is also a hierarchy of precedence in which + the code is adapted. Highest priority has been the code in the + ``src`` folder, followed by code in packages in order of their + popularity and complexity (simpler code gets adapted sooner), followed + by code in the ``lib`` folder. Source code that is downloaded from + external packages or libraries during compilation is not subject to + the conventions discussed here. -Object oriented code +Object-oriented code ^^^^^^^^^^^^^^^^^^^^ -LAMMPS is designed to be an object oriented code. Each simulation is +LAMMPS is designed to be an object-oriented code. Each simulation is represented by an instance of the LAMMPS class. When running in -parallel each MPI process creates such an instance. This can be seen +parallel, each MPI process creates such an instance. This can be seen in the ``main.cpp`` file where the core steps of running a LAMMPS simulation are the following 3 lines of code: @@ -67,14 +68,14 @@ other special features. The basic LAMMPS class hierarchy which is created by the LAMMPS class constructor is shown in :ref:`class-topology`. When input commands are processed, additional class instances are created, or deleted, or -replaced. Likewise specific member functions of specific classes are +replaced. Likewise, specific member functions of specific classes are called to trigger actions such creating atoms, computing forces, computing properties, time-propagating the system, or writing output. Compositing and Inheritance =========================== -LAMMPS makes extensive use of the object oriented programming (OOP) +LAMMPS makes extensive use of the object-oriented programming (OOP) principles of *compositing* and *inheritance*. Classes like the ``LAMMPS`` class are a **composite** containing pointers to instances of other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``, @@ -83,7 +84,7 @@ functionality by storing and manipulating data related to the simulation and providing member functions that trigger certain actions. Some of those classes like ``Force`` are themselves composites, containing instances of classes describing different force -interactions. Similarly the ``Modify`` class contains a list of +interactions. Similarly, the ``Modify`` class contains a list of ``Fix`` and ``Compute`` classes. If the input commands that correspond to these classes include the word *style*, then LAMMPS stores only a single instance of that class. E.g. *atom_style*, @@ -100,19 +101,18 @@ derived class variant was instantiated. In LAMMPS these derived classes are often referred to as "styles", e.g. pair styles, fix styles, atom styles and so on. -This is the origin of the flexibility of LAMMPS. For example pair +This is the origin of the flexibility of LAMMPS. For example, pair styles implement a variety of different non-bonded interatomic potentials functions. All details for the implementation of a potential are stored and executed in a single class. As mentioned above, there can be multiple instances of classes derived from the ``Fix`` or ``Compute`` base classes. They represent a -different facet of LAMMPS flexibility as they provide methods which -can be called at different points in time within a timestep, as -explained in `Developer_flow`. This allows the input script to tailor -how a specific simulation is run, what diagnostic computations are -performed, and how the output of those computations is further -processed or output. +different facet of LAMMPS' flexibility, as they provide methods which +can be called at different points within a timestep, as explained in +`Developer_flow`. This allows the input script to tailor how a specific +simulation is run, what diagnostic computations are performed, and how +the output of those computations is further processed or output. Additional code sharing is possible by creating derived classes from the derived classes (e.g., to implement an accelerated version of a pair @@ -164,15 +164,15 @@ The difference in behavior of the ``normal()`` and the ``poly()`` member functions is which of the two member functions is called when executing `base1->call()` versus `base2->call()`. Without polymorphism, a function within the base class can only call member functions within the -same scope, that is ``Base::call()`` will always call -``Base::normal()``. But for the `base2->call()` case the call of the +same scope: that is, ``Base::call()`` will always call +``Base::normal()``. But for the `base2->call()` case, the call of the virtual member function will be dispatched to ``Derived::poly()`` -instead. This mechanism means that functions are called within the -scope of the class type that was used to *create* the class instance are -invoked; even if they are assigned to a pointer using the type of a base -class. This is the desired behavior and this way LAMMPS can even use -styles that are loaded at runtime from a shared object file with the -:doc:`plugin command `. +instead. This mechanism results in calling functions that are within +the scope of the class that was used to *create* the instance, even if +they are assigned to a pointer for their base class. This is the +desired behavior, and this way LAMMPS can even use styles that are loaded +at runtime from a shared object file with the :doc:`plugin command +`. A special case of virtual functions are so-called pure functions. These are virtual functions that are initialized to 0 in the class declaration @@ -189,12 +189,12 @@ This has the effect that an instance of the base class cannot be created and that derived classes **must** implement these functions. Many of the functions listed with the various class styles in the section :doc:`Modify` are pure functions. The motivation for this is -to define the interface or API of the functions but defer their +to define the interface or API of the functions, but defer their implementation to the derived classes. However, there are downsides to this. For example, calls to virtual -functions from within a constructor, will not be in the scope of the -derived class and thus it is good practice to either avoid calling them +functions from within a constructor, will *not* be in the scope of the +derived class, and thus it is good practice to either avoid calling them or to provide an explicit scope such as ``Base::poly()`` or ``Derived::poly()``. Furthermore, any destructors in classes containing virtual functions should be declared virtual too, so they will be @@ -208,8 +208,8 @@ dispatch. that are intended to replace a virtual or pure function use the ``override`` property keyword. For the same reason, the use of overloads or default arguments for virtual functions should be - avoided as they lead to confusion over which function is supposed to - override which and which arguments need to be declared. + avoided, as they lead to confusion over which function is supposed to + override which, and which arguments need to be declared. Style Factories =============== @@ -219,10 +219,10 @@ uses a programming pattern called `Factory`. Those are functions that create an instance of a specific derived class, say ``PairLJCut`` and return a pointer to the type of the common base class of that style, ``Pair`` in this case. To associate the factory function with the -style keyword, an ``std::map`` class is used with function pointers +style keyword, a ``std::map`` class is used with function pointers indexed by their keyword (for example "lj/cut" for ``PairLJCut`` and "morse" for ``PairMorse``). A couple of typedefs help keep the code -readable and a template function is used to implement the actual +readable, and a template function is used to implement the actual factory functions for the individual classes. Below is an example of such a factory function from the ``Force`` class as declared in ``force.h`` and implemented in ``force.cpp``. The file ``style_pair.h`` @@ -279,26 +279,26 @@ from and writing to files and console instead of C++ "iostreams". This is mainly motivated by better performance, better control over formatting, and less effort to achieve specific formatting. -Since mixing "stdio" and "iostreams" can lead to unexpected -behavior. use of the latter is strongly discouraged. Also output to -the screen should not use the predefined ``stdout`` FILE pointer, but -rather the ``screen`` and ``logfile`` FILE pointers managed by the -LAMMPS class. Furthermore, output should generally only be done by -MPI rank 0 (``comm->me == 0``). Output that is sent to both -``screen`` and ``logfile`` should use the :cpp:func:`utils::logmesg() -convenience function `. +Since mixing "stdio" and "iostreams" can lead to unexpected behavior, +use of the latter is strongly discouraged. Output to the screen should +*not* use the predefined ``stdout`` FILE pointer, but rather the +``screen`` and ``logfile`` FILE pointers managed by the LAMMPS class. +Furthermore, output should generally only be done by MPI rank 0 +(``comm->me == 0``). Output that is sent to both ``screen`` and +``logfile`` should use the :cpp:func:`utils::logmesg() convenience +function `. -We also discourage the use of stringstreams because the bundled {fmt} -library and the customized tokenizer classes can provide the same -functionality in a cleaner way with better performance. This also -helps maintain a consistent programming syntax with code from many -different contributors. +We discourage the use of stringstreams because the bundled {fmt} library +and the customized tokenizer classes provide the same functionality in a +cleaner way with better performance. This also helps maintain a +consistent programming syntax with code from many different +contributors. Formatting with the {fmt} library =================================== The LAMMPS source code includes a copy of the `{fmt} library -`_ which is preferred over formatting with the +`_, which is preferred over formatting with the "printf()" family of functions. The primary reason is that it allows a typesafe default format for any type of supported data. This is particularly useful for formatting integers of a given size (32-bit or @@ -313,17 +313,16 @@ been included into the C++20 language standard, so changes to adopt it are future-proof. Formatted strings are frequently created by calling the -``fmt::format()`` function which will return a string as a -``std::string`` class instance. In contrast to the ``%`` placeholder -in ``printf()``, the {fmt} library uses ``{}`` to embed format -descriptors. In the simplest case, no additional characters are -needed as {fmt} will choose the default format based on the data type -of the argument. Otherwise the ``fmt::print()`` function may be -used instead of ``printf()`` or ``fprintf()``. In addition, several -LAMMPS output functions, that originally accepted a single string as -argument have been overloaded to accept a format string with optional -arguments as well (e.g., ``Error::all()``, ``Error::one()``, -``utils::logmesg()``). +``fmt::format()`` function, which will return a string as a +``std::string`` class instance. In contrast to the ``%`` placeholder in +``printf()``, the {fmt} library uses ``{}`` to embed format descriptors. +In the simplest case, no additional characters are needed, as {fmt} will +choose the default format based on the data type of the argument. +Otherwise, the ``fmt::print()`` function may be used instead of +``printf()`` or ``fprintf()``. In addition, several LAMMPS output +functions, that originally accepted a single string as argument have +been overloaded to accept a format string with optional arguments as +well (e.g., ``Error::all()``, ``Error::one()``, ``utils::logmesg()``). Summary of the {fmt} format syntax ================================== @@ -332,10 +331,11 @@ The syntax of the format string is "{[][:]}", where either the argument id or the format spec (separated by a colon ':') is optional. The argument id is usually a number starting from 0 that is the index to the arguments following the format string. By -default these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The most -common case for using argument id would be to use the same argument in -multiple places in the format string without having to provide it as an -argument multiple times. In LAMMPS the argument id is rarely used. +default, these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.). The +most common case for using argument id would be to use the same argument +in multiple places in the format string without having to provide it as +an argument multiple times. The argument id is rarely used in the LAMMPS +source code. More common is the use of a format specifier, which starts with a colon. This may optionally be followed by a fill character (default is ' '). If @@ -347,18 +347,19 @@ width, which may be followed by a dot '.' and a precision for floating point numbers. The final character in the format string would be an indicator for the "presentation", i.e. 'd' for decimal presentation of integers, 'x' for hexadecimal, 'o' for octal, 'c' for character etc. -This mostly follows the "printf()" scheme but without requiring an +This mostly follows the "printf()" scheme, but without requiring an additional length parameter to distinguish between different integer widths. The {fmt} library will detect those and adapt the formatting accordingly. For floating point numbers there are correspondingly, 'g' for generic presentation, 'e' for exponential presentation, and 'f' for fixed point presentation. -Thus "{:8}" would represent *any* type argument using at least 8 -characters; "{:<8}" would do this as left aligned, "{:^8}" as centered, -"{:>8}" as right aligned. If a specific presentation is selected, the -argument type must be compatible or else the {fmt} formatting code will -throw an exception. Some format string examples are given below: +The format string "{:8}" would thus represent *any* type argument and be +replaced by at least 8 characters; "{:<8}" would do this as left +aligned, "{:^8}" as centered, "{:>8}" as right aligned. If a specific +presentation is selected, the argument type must be compatible or else +the {fmt} formatting code will throw an exception. Some format string +examples are given below: .. code-block:: c++ @@ -392,12 +393,12 @@ documentation `_ website. Memory management ^^^^^^^^^^^^^^^^^ -Dynamical allocation of small data and objects can be done with the -the C++ commands "new" and "delete/delete[]. Large data should use -the member functions of the ``Memory`` class, most commonly, -``Memory::create()``, ``Memory::grow()``, and ``Memory::destroy()``, -which provide variants for vectors, 2d arrays, 3d arrays, etc. -These can also be used for small data. +Dynamical allocation of small data and objects can be done with the C++ +commands "new" and "delete/delete[]". Large data should use the member +functions of the ``Memory`` class, most commonly, ``Memory::create()``, +``Memory::grow()``, and ``Memory::destroy()``, which provide variants +for vectors, 2d arrays, 3d arrays, etc. These can also be used for +small data. The use of ``malloc()``, ``calloc()``, ``realloc()`` and ``free()`` directly is strongly discouraged. To simplify adapting legacy code @@ -408,26 +409,24 @@ perform additional error checks for safety. Use of these custom memory allocation functions is motivated by the following considerations: -- memory allocation failures on *any* MPI rank during a parallel run - will trigger an immediate abort of the entire parallel calculation - instead of stalling it -- a failing "new" will trigger an exception which is also captured by - LAMMPS and triggers a global abort -- allocation of multi-dimensional arrays will be done in a C compatible - fashion but so that the storage of the actual data is stored in one - large contiguous block. Thus when MPI communication is needed, +- Memory allocation failures on *any* MPI rank during a parallel run + will trigger an immediate abort of the entire parallel calculation. +- A failing "new" will trigger an exception, which is also captured by + LAMMPS and triggers a global abort. +- Allocation of multidimensional arrays will be done in a C compatible + fashion, but such that the storage of the actual data is stored in one + large contiguous block. Thus, when MPI communication is needed, the data can be communicated directly (similar to Fortran arrays). -- the "destroy()" and "sfree()" functions may safely be called on NULL - pointers -- the "destroy()" functions will nullify the pointer variables making - "use after free" errors easy to detect -- it is possible to use a larger than default memory alignment (not on +- The "destroy()" and "sfree()" functions may safely be called on NULL + pointers. +- The "destroy()" functions will nullify the pointer variables, thus + making "use after free" errors easy to detect. +- It is possible to use a larger than default memory alignment (not on all operating systems, since the allocated storage pointers must be - compatible with ``free()`` for technical reasons) + compatible with ``free()`` for technical reasons). -In the practical implementation of code this means that any pointer -variables that are class members should be initialized to a -``nullptr`` value in their respective constructors. That way it is -safe to call ``Memory::destroy()`` or ``delete[]`` on them before -*any* allocation outside the constructor. This helps prevent memory -leaks. +In the practical implementation of code this means, that any pointer +variables, that are class members should be initialized to a ``nullptr`` +value in their respective constructors. That way, it is safe to call +``Memory::destroy()`` or ``delete[]`` on them before *any* allocation +outside the constructor. This helps prevent memory leaks. diff --git a/doc/src/Developer_comm_ops.rst b/doc/src/Developer_comm_ops.rst index 5eac8e46de..de09cbb260 100644 --- a/doc/src/Developer_comm_ops.rst +++ b/doc/src/Developer_comm_ops.rst @@ -28,7 +28,7 @@ The need to do this communication arises when data from the owned atoms is updated (e.g. their positions) and this updated information needs to be **copied** to the corresponding ghost atoms. -And second, *reverse communication* which sends ghost atom information +And second, *reverse communication*, which sends ghost atom information from each processor to the owning processor to **accumulate** (sum) the values with the corresponding owned atoms. The need for this arises when data is computed and also stored with ghost atoms @@ -58,7 +58,7 @@ embedded-atom method (EAM) which compute intermediate values in the first part of the compute() function that need to be stored by both owned and ghost atoms for the second part of the force computation. The *Comm* class methods perform the MPI communication for buffers of -per-atom data. They "call back" to the *Pair* class so it can *pack* +per-atom data. They "call back" to the *Pair* class, so it can *pack* or *unpack* the buffer with data the *Pair* class owns. There are 4 such methods that the *Pair* class must define, assuming it uses both forward and reverse communication: @@ -70,7 +70,7 @@ forward and reverse communication: The arguments to these methods include the buffer and a list of atoms to pack or unpack. The *Pair* class also must set the *comm_forward* -and *comm_reverse* variables which store the number of values stored +and *comm_reverse* variables, which store the number of values stored in the communication buffers for each operation. This means, if desired, it can choose to store multiple per-atom values in the buffer, and they will be communicated together to minimize @@ -81,11 +81,11 @@ containing ``double`` values. To correctly store integers that may be The *Fix*, *Compute*, and *Dump* classes can also invoke the same kind of forward and reverse communication operations using the same *Comm* -class methods. Likewise the same pack/unpack methods and +class methods. Likewise, the same pack/unpack methods and comm_forward/comm_reverse variables must be defined by the calling *Fix*, *Compute*, or *Dump* class. -For *Fix* classes there is an optional second argument to the +For *Fix* classes, there is an optional second argument to the *forward_comm()* and *reverse_comm()* call which can be used when the fix performs multiple modes of communication, with different numbers of values per atom. The fix should set the *comm_forward* and @@ -150,7 +150,7 @@ latter case, when the *ring* operation is complete, each processor can examine its original buffer to extract modified values. Note that the *ring* operation is similar to an MPI_Alltoall() -operation where every processor effectively sends and receives data to +operation, where every processor effectively sends and receives data to every other processor. The difference is that the *ring* operation does it one step at a time, so the total volume of data does not need to be stored by every processor. However, the *ring* operation is @@ -184,8 +184,8 @@ The *exchange_data()* method triggers the communication to be performed. Each processor provides the vector of *N* datums to send, and the size of each datum. All datums must be the same size. -The *create_atom()* and *exchange_atom()* methods are similar except -that the size of each datum can be different. Typically this is used +The *create_atom()* and *exchange_atom()* methods are similar, except +that the size of each datum can be different. Typically, this is used to communicate atoms, each with a variable amount of per-atom data, to other processors. diff --git a/doc/src/Developer_flow.rst b/doc/src/Developer_flow.rst index d582043492..115d6ee6ae 100644 --- a/doc/src/Developer_flow.rst +++ b/doc/src/Developer_flow.rst @@ -45,9 +45,9 @@ other methods in the class. zero before each timestep, so that forces (torques, etc) can be accumulated. -Now for the ``Verlet::run()`` method. Its basic structure in hi-level pseudo -code is shown below. In the actual code in ``src/verlet.cpp`` some of -these operations are conditionally invoked. +Now for the ``Verlet::run()`` method. Its basic structure in hi-level +pseudocode is shown below. In the actual code in ``src/verlet.cpp`` +some of these operations are conditionally invoked. .. code-block:: python @@ -105,17 +105,17 @@ need it. These flags are passed to the various methods that compute particle interactions, so that they either compute and tally the corresponding data or can skip the extra calculations if the energy and virial are not needed. See the comments for the ``Integrate::ev_set()`` -method which document the flag values. +method, which document the flag values. At various points of the timestep, fixes are invoked, e.g. ``fix->initial_integrate()``. In the code, this is actually done -via the Modify class which stores all the Fix objects and lists of which +via the Modify class, which stores all the Fix objects and lists of which should be invoked at what point in the timestep. Fixes are the LAMMPS mechanism for tailoring the operations of a timestep for a particular simulation. As described elsewhere, each fix has one or more methods, each of which is invoked at a specific stage of the timestep, as show in -the timestep pseudo-code. All the active fixes defined in an input -script, that are flagged to have an ``initial_integrate()`` method are +the timestep pseudocode. All the active fixes defined in an input +script, that are flagged to have an ``initial_integrate()`` method, are invoked at the beginning of each timestep. Examples are :doc:`fix nve ` or :doc:`fix nvt or fix npt ` which perform the start-of-timestep velocity-Verlet integration operations to update @@ -131,9 +131,9 @@ can be changed using the :doc:`neigh_modify every/delay/check ` command. If not, coordinates of ghost atoms are acquired by each processor via the ``forward_comm()`` method of the Comm class. If neighbor lists need to be built, several operations within -the inner if clause of the pseudo-code are first invoked. The +the inner if clause of the pseudocode are first invoked. The ``pre_exchange()`` method of any defined fixes is invoked first. -Typically this inserts or deletes particles from the system. +Typically, this inserts or deletes particles from the system. Periodic boundary conditions are then applied by the Domain class via its ``pbc()`` method to remap particles that have moved outside the @@ -148,7 +148,7 @@ The box boundaries are then reset (if needed) via the ``reset_box()`` method of the Domain class, e.g. if box boundaries are shrink-wrapped to current particle coordinates. A change in the box size or shape requires internal information for communicating ghost atoms (Comm class) -and neighbor list bins (Neighbor class) be updated. The ``setup()`` +and neighbor list bins (Neighbor class) to be updated. The ``setup()`` method of the Comm class and ``setup_bins()`` method of the Neighbor class perform the update. @@ -217,20 +217,21 @@ file, and restart files. See the :doc:`thermo_style `, :doc:`dump `, and :doc:`restart ` commands for more details. -The the flow of control during energy minimization iterations is -similar to that of a molecular dynamics timestep. Forces are computed, -neighbor lists are built as needed, atoms migrate to new processors, and -atom coordinates and forces are communicated to neighboring processors. -The only difference is what Fix class operations are invoked when. Only -a subset of LAMMPS fixes are useful during energy minimization, as +The flow of control during energy minimization iterations is similar to +that of a molecular dynamics timestep. Forces are computed, neighbor +lists are built as needed, atoms migrate to new processors, and atom +coordinates and forces are communicated to neighboring processors. The +only difference is what Fix class operations are invoked when. Only a +subset of LAMMPS fixes are useful during energy minimization, as explained in their individual doc pages. The relevant Fix class methods -are ``min_pre_exchange()``, ``min_pre_force()``, and ``min_post_force()``. -Each fix is invoked at the appropriate place within the minimization -iteration. For example, the ``min_post_force()`` method is analogous to -the ``post_force()`` method for dynamics; it is used to alter or constrain -forces on each atom, which affects the minimization procedure. +are ``min_pre_exchange()``, ``min_pre_force()``, and +``min_post_force()``. Each fix is invoked at the appropriate place +within the minimization iteration. For example, the +``min_post_force()`` method is analogous to the ``post_force()`` method +for dynamics; it is used to alter or constrain forces on each atom, +which affects the minimization procedure. -After all iterations are completed there is a ``cleanup`` step which +After all iterations are completed, there is a ``cleanup`` step which calls the ``post_run()`` method of fixes to perform operations only required -at the end of a calculations (like freeing temporary storage or creating +at the end of a calculation (like freeing temporary storage or creating final outputs). diff --git a/doc/src/Developer_grid.rst b/doc/src/Developer_grid.rst index 8adf36d6f4..cd6d8d12ab 100644 --- a/doc/src/Developer_grid.rst +++ b/doc/src/Developer_grid.rst @@ -70,7 +70,7 @@ A command can define multiple grids, each of a different size. Each grid is an instantiation of the Grid2d or Grid3d class. The command also defines what data it will store for each grid it -creates and it allocates the multi-dimensional array(s) needed to +creates and it allocates the multidimensional array(s) needed to store the data. No grid cell data is stored within the Grid2d or Grid3d classes. @@ -115,7 +115,7 @@ which stores *Nvalues* per grid cell. nyhi_out, nxlo_out, nxhi_out, nvalues, "data3d_multi"); -Note that these multi-dimensional arrays are allocated as contiguous +Note that these multidimensional arrays are allocated as contiguous chunks of memory where the x-index of the grid varies fastest, then y, and the z-index slowest. For multiple values per grid cell, the Nvalues are contiguous, so their index varies even faster than the @@ -798,7 +798,7 @@ A value of -1 is returned if the data name is not recognized. The *get_griddata_by_index()* method is called after the *get_griddata_by_name()* method, using the data index it returned as its argument. This method will return a pointer to the -multi-dimensional array which stores the requested data. +multidimensional array which stores the requested data. As in the discussion above of the Memory class *create_offset()* methods, the dimensionality of the array associated with the returned diff --git a/doc/src/Developer_org.rst b/doc/src/Developer_org.rst index 563dae7b69..f1804655ae 100644 --- a/doc/src/Developer_org.rst +++ b/doc/src/Developer_org.rst @@ -15,7 +15,7 @@ for more information about that part of the build process. LAMMPS currently supports building with :doc:`conventional makefiles ` and through :doc:`CMake `. Those procedures differ in how packages are enabled or disabled for inclusion into a -LAMMPS binary so they cannot be mixed. The source files for each +LAMMPS binary, so they cannot be mixed. The source files for each package are in all-uppercase subdirectories of the ``src`` folder, for example ``src/MOLECULE`` or ``src/EXTRA-MOLECULE``. The ``src/STUBS`` subdirectory is not a package but contains a dummy MPI library, that is @@ -26,31 +26,31 @@ with traditional makefiles. The ``lib`` directory contains the source code for several supporting libraries or files with configuration settings to use globally installed -libraries, that are required by some of the optional packages. They may +libraries, that are required by some optional packages. They may include python scripts that can transparently download additional source code on request. Each subdirectory, like ``lib/poems`` or ``lib/gpu``, contains the source files, some of which are in different languages such -as Fortran or CUDA. These libraries included in the LAMMPS build, -if the corresponding package is installed. +as Fortran or CUDA. These libraries included in the LAMMPS build, if the +corresponding package is installed. LAMMPS C++ source files almost always come in pairs, such as ``src/run.cpp`` (implementation file) and ``src/run.h`` (header file). Each pair of files defines a C++ class, for example the -:cpp:class:`LAMMPS_NS::Run` class which contains the code invoked by the -:doc:`run ` command in a LAMMPS input script. As this example +:cpp:class:`LAMMPS_NS::Run` class, which contains the code invoked by +the :doc:`run ` command in a LAMMPS input script. As this example illustrates, source file and class names often have a one-to-one correspondence with a command used in a LAMMPS input script. Some source files and classes do not have a corresponding input script -command, e.g. ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force` +command, for example ``src/force.cpp`` and the :cpp:class:`LAMMPS_NS::Force` class. They are discussed in the next section. The names of all source files are in lower case and may use the -underscore character '_' to separate words. Outside of bundled libraries -which may have different conventions, all C and C++ header files have a -``.h`` extension, all C++ files have a ``.cpp`` extension, and C files a -``.c`` extension. A small number of C++ classes and utility functions -are implemented with only a ``.h`` file. Examples are the Pointers and -Commands classes or the MathVec functions. +underscore character '_' to separate words. Apart from bundled, +externally maintained libraries, which may have different conventions, +all C and C++ header files have a ``.h`` extension, all C++ files have a +``.cpp`` extension, and C files a ``.c`` extension. A few C++ classes +and utility functions are implemented with only a ``.h`` file. Examples +are the Pointers and Commands classes or the MathVec functions. Class topology -------------- @@ -62,35 +62,36 @@ associated source files in the ``src`` folder, for example the class :cpp:class:`LAMMPS_NS::Memory` corresponds to the files ``memory.cpp`` and ``memory.h``, or the class :cpp:class:`LAMMPS_NS::AtomVec` corresponds to the files ``atom_vec.cpp`` and ``atom_vec.h``. Full -lines in the figure represent compositing: that is the class at the base -of the arrow holds a pointer to an instance of the class at the tip. -Dashed lines instead represent inheritance: the class to the tip of the -arrow is derived from the class at the base. Classes with a red boundary -are not instantiated directly, but they represent the base classes for -"styles". Those "styles" make up the bulk of the LAMMPS code and only -a few representative examples are included in the figure so it remains -readable. +lines in the figure represent compositing: that is, the class at the +base of the arrow holds a pointer to an instance of the class at the +tip. Dashed lines instead represent inheritance: the class at the tip +of the arrow is derived from the class at the base. Classes with a red +boundary are not instantiated directly, but they represent the base +classes for "styles". Those "styles" make up the bulk of the LAMMPS +code and only a few representative examples are included in the figure, +so it remains readable. .. _class-topology: .. figure:: JPG/lammps-classes.png LAMMPS class topology - This figure shows some of the relations of the base classes of the - LAMMPS simulation package. Full lines indicate that a class holds an - instance of the class it is pointing to; dashed lines point to - derived classes that are given as examples of what classes may be - instantiated during a LAMMPS run based on the input commands and - accessed through the API define by their respective base classes. At - the core is the :cpp:class:`LAMMPS ` class, which - holds pointers to class instances with specific purposes. Those may - hold instances of other classes, sometimes directly, or only - temporarily, sometimes as derived classes or derived classes of - derived classes, which may also hold instances of other classes. + This figure shows relations of base classes of the LAMMPS + simulation package. Full lines indicate that a class holds an + instance of the class it is pointing to; dashed lines point to + derived classes that are given as examples of what classes may be + instantiated during a LAMMPS run based on the input commands and + accessed through the API define by their respective base classes. + At the core is the :cpp:class:`LAMMPS ` class, + which holds pointers to class instances with specific purposes. + Those may hold instances of other classes, sometimes directly, or + only temporarily, sometimes as derived classes or derived classes + of derived classes, which may also hold instances of other + classes. The :cpp:class:`LAMMPS_NS::LAMMPS` class is the topmost class and -represents what is generally referred to an "instance" of LAMMPS. It is -a composite holding pointers to instances of other core classes +represents what is generally referred to as an "instance of LAMMPS". It +is a composite holding pointers to instances of other core classes providing the core functionality of the MD engine in LAMMPS and through them abstractions of the required operations. The constructor of the LAMMPS class will instantiate those instances, process the command line @@ -102,42 +103,44 @@ LAMMPS while passing it the command line flags and input script. It deletes the LAMMPS instance after the method reading the input returns and shuts down the MPI environment before it exits the executable. -The :cpp:class:`LAMMPS_NS::Pointers` is not shown in the +The :cpp:class:`LAMMPS_NS::Pointers` class is not shown in the :ref:`class-topology` figure for clarity. It holds references to many of the members of the `LAMMPS_NS::LAMMPS`, so that all classes derived from :cpp:class:`LAMMPS_NS::Pointers` have direct access to those -reference. From the class topology all classes with blue boundary are +references. From the class topology all classes with blue boundary are referenced in the Pointers class and all classes in the second and third -columns, that are not listed as derived classes are instead derived from -:cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer references -in Pointers, a pointer to the LAMMPS class instance needs to be passed -to the constructor and thus all constructors for classes derived from it -must do so and pass this pointer to the constructor for Pointers. +columns, that are not listed as derived classes, are instead derived +from :cpp:class:`LAMMPS_NS::Pointers`. To initialize the pointer +references in Pointers, a pointer to the LAMMPS class instance needs to +be passed to the constructor. All constructors for classes derived from +it, must do so and thus pass that pointer to the constructor for +:cpp:class:`LAMMPS_NS::Pointers`. The default constructor for +:cpp:class:`LAMMPS_NS::Pointers` is disabled to enforce this. Since all storage is supposed to be encapsulated (there are a few exceptions), the LAMMPS class can also be instantiated multiple times by -a calling code. Outside of the aforementioned exceptions, those LAMMPS +a calling code. Outside the aforementioned exceptions, those LAMMPS instances can be used alternately. As of the time of this writing -(early 2021) LAMMPS is not yet sufficiently thread-safe for concurrent +(early 2023) LAMMPS is not yet sufficiently thread-safe for concurrent execution. When running in parallel with MPI, care has to be taken, that suitable copies of communicators are used to not create conflicts between different instances. -The LAMMPS class currently (early 2021) holds instances of 19 classes -representing the core functionality. There are a handful of virtual -parent classes in LAMMPS that define what LAMMPS calls ``styles``. They -are shaded red in the :ref:`class-topology` figure. Each of these are -parents of a number of child classes that implement the interface -defined by the parent class. There are two main categories of these -``styles``: some may only have one instance active at a time (e.g. atom, -pair, bond, angle, dihedral, improper, kspace, comm) and there is a -dedicated pointer variable for each of them in the composite class. +The LAMMPS class currently holds instances of 19 classes representing +the core functionality. There are a handful of virtual parent classes +in LAMMPS that define what LAMMPS calls ``styles``. These are shaded +red in the :ref:`class-topology` figure. Each of these are parents of a +number of child classes that implement the interface defined by the +parent class. There are two main categories of these ``styles``: some +may only have one instance active at a time (e.g. atom, pair, bond, +angle, dihedral, improper, kspace, comm) and there is a dedicated +pointer variable for each of them in the corresponding composite class. Setups that require a mix of different such styles have to use a -*hybrid* class that takes the place of the one allowed instance and then -manages and forwards calls to the corresponding sub-styles for the -designated subset of atoms or data. The composite class may also have -lists of class instances, e.g. Modify handles lists of compute and fix -styles, while Output handles a list of dump class instances. +*hybrid* class instance that acts as a proxy, and manages and forwards +calls to the corresponding sub-style class instances for the designated +subset of atoms or data. The composite class may also have lists of +class instances, e.g. ``Modify`` handles lists of compute and fix +styles, while ``Output`` handles a list of dump class instances. The exception to this scheme are the ``command`` style classes. These implement specific commands that can be invoked before, after, or in @@ -146,19 +149,19 @@ command() method called and then, after completion, the class instance deleted. Examples for this are the create_box, create_atoms, minimize, run, set, or velocity command styles. -For all those ``styles`` certain naming conventions are employed: for +For all those ``styles``, certain naming conventions are employed: for the fix nve command the class is called FixNVE and the source files are -``fix_nve.h`` and ``fix_nve.cpp``. Similarly for fix ave/time we have +``fix_nve.h`` and ``fix_nve.cpp``. Similarly, for fix ave/time we have FixAveTime and ``fix_ave_time.h`` and ``fix_ave_time.cpp``. Style names are lower case and without spaces or special characters. A suffix or words are appended with a forward slash '/' which denotes a variant of the corresponding class without the suffix. To connect the style name and the class name, LAMMPS uses macros like: ``AtomStyle()``, ``PairStyle()``, ``BondStyle()``, ``RegionStyle()``, and so on in the -corresponding header file. During configuration or compilation files +corresponding header file. During configuration or compilation, files with the pattern ``style_.h`` are created that consist of a list of include statements including all headers of all styles of a given -type that are currently active (or "installed). +type that are currently enabled (or "installed"). More details on individual classes in the :ref:`class-topology` are as @@ -172,8 +175,8 @@ follows: that one or multiple simulations can be run, on the processors allocated for a run, e.g. by the mpirun command. -- The Input class reads and processes input input strings and files, - stores variables, and invokes :doc:`commands `. +- The Input class reads and processes input (strings and files), stores + variables, and invokes :doc:`commands `. - Command style classes are derived from the Command class. They provide input script commands that perform one-time operations @@ -192,7 +195,7 @@ follows: - The Atom class stores per-atom properties associated with atom styles. More precisely, they are allocated and managed by a class derived from the AtomVec class, and the Atom class simply stores pointers to them. - The classes derived from AtomVec represent the different atom styles + The classes derived from AtomVec represent the different atom styles, and they are instantiated through the :doc:`atom_style ` command. @@ -206,18 +209,22 @@ follows: class stores a single list (for all atoms). A NeighRequest class instance is created by pair, fix, or compute styles when they need a particular kind of neighbor list and use the NeighRequest properties - to select the neighbor list settings for the given request. There can - be multiple instances of the NeighRequest class and the Neighbor class - will try to optimize how they are computed by creating copies or - sub-lists where possible. + to select the neighbor list settings for the given request. There can + be multiple instances of the NeighRequest class. The Neighbor class + will try to optimize how the requests are processed. Depending on the + NeighRequest properties, neighbor lists are constructed from scratch, + aliased, or constructed by post-processing an existing list into + sub-lists. - The Comm class performs inter-processor communication, typically of ghost atom information. This usually involves MPI message exchanges with 6 neighboring processors in the 3d logical grid of processors mapped to the simulation box. There are two :doc:`communication styles - ` enabling different ways to do the domain decomposition. - Sometimes the Irregular class is used, when atoms may migrate to - arbitrary processors. + `, enabling different ways to perform the domain + decomposition. + +- The Irregular class is used, when atoms may migrate to arbitrary + processors. - The Domain class stores the simulation box geometry, as well as geometric Regions and any user definition of a Lattice. The latter @@ -246,7 +253,7 @@ follows: file, dump file snapshots, and restart files. These correspond to the :doc:`Thermo `, :doc:`Dump `, and :doc:`WriteRestart ` classes respectively. The Dump - class is a base class with several derived classes implementing + class is a base class, with several derived classes implementing various dump style variants. - The Timer class logs timing information, output at the end diff --git a/doc/src/Developer_par_comm.rst b/doc/src/Developer_par_comm.rst index ec3ef87f47..88f0bf7fe9 100644 --- a/doc/src/Developer_par_comm.rst +++ b/doc/src/Developer_par_comm.rst @@ -1,12 +1,12 @@ Communication ^^^^^^^^^^^^^ -Following the partitioning scheme in use all per-atom data is +Following the selected partitioning scheme, all per-atom data is distributed across the MPI processes, which allows LAMMPS to handle very large systems provided it uses a correspondingly large number of MPI processes. Since The per-atom data (atom IDs, positions, velocities, -types, etc.) To be able to compute the short-range interactions MPI -processes need not only access to data of atoms they "own" but also +types, etc.) To be able to compute the short-range interactions, MPI +processes need not only access to the data of atoms they "own" but also information about atoms from neighboring subdomains, in LAMMPS referred to as "ghost" atoms. These are copies of atoms storing required per-atom data for up to the communication cutoff distance. The green @@ -59,7 +59,7 @@ and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks 4,5,6 send atoms in their blue-shaded regions to rank 0. This may include ghost atoms they received in the *x* stage, but only if they are needed by rank 0 to fill its extended ghost atom regions in the -+/-*y* directions (blue rectangles). Thus in this case, ranks 5 and ++/-*y* directions (blue rectangles). Thus, in this case, ranks 5 and 6 do not include ghost atoms they received from each other (in the *x* stage) in the atoms they send to rank 0. The key point is that while the pattern of communication is more complex in the irregular @@ -78,14 +78,14 @@ A "reverse" communication is when computed ghost atom attributes are sent back to the processor who owns the atom. This is used (for example) to sum partial forces on ghost atoms to the complete force on owned atoms. The order of the two stages described in the -:ref:`ghost-atom-comm` figure is inverted and the same lists of atoms +:ref:`ghost-atom-comm` figure is inverted, and the same lists of atoms are used to pack and unpack message buffers with per-atom forces. When a received buffer is unpacked, the ghost forces are summed to owned atom forces. As in forward communication, forces on atoms in the four blue corners of the diagrams are sent, received, and summed twice (once at each stage) before owning processors have the full force. -These two operations are used many places within LAMMPS aside from +These two operations are used in many places within LAMMPS aside from exchange of coordinates and forces, for example by manybody potentials to share intermediate per-atom values, or by rigid-body integrators to enable each atom in a body to access body properties. Here are @@ -105,7 +105,7 @@ performed in LAMMPS: atom pairs when building neighbor lists or computing forces. - The cutoff distance for exchanging ghost atoms is typically equal to - the neighbor cutoff. But it can also chosen to be longer if needed, + the neighbor cutoff. But it can also set to a larger value if needed, e.g. half the diameter of a rigid body composed of multiple atoms or over 3x the length of a stretched bond for dihedral interactions. It can also exceed the periodic box size. For the regular communication @@ -113,7 +113,7 @@ performed in LAMMPS: processor's subdomain, then multiple exchanges are performed in the same direction. Each exchange is with the same neighbor processor, but buffers are packed/unpacked using a different list of atoms. For - forward communication, in the first exchange a processor sends only + forward communication, in the first exchange, a processor sends only owned atoms. In subsequent exchanges, it sends ghost atoms received in previous exchanges. For the irregular pattern (right) overlaps of a processor's extended ghost-atom subdomain with all other processors diff --git a/doc/src/Developer_par_long.rst b/doc/src/Developer_par_long.rst index cd240ffd21..73b92e47c2 100644 --- a/doc/src/Developer_par_long.rst +++ b/doc/src/Developer_par_long.rst @@ -40,28 +40,28 @@ orthogonal boxes. .. _fft-parallel: .. figure:: img/fft-decomp-parallel.png - :align: center - parallel FFT in PPPM + Parallel FFT in PPPM - Stages of a parallel FFT for a simulation domain overlaid - with an 8x8x8 3d FFT grid, partitioned across 64 processors. - Within each of the 4 diagrams, grid cells of the same color are - owned by a single processor; for simplicity only cells owned by 4 - or 8 of the 64 processors are colored. The two images on the left - illustrate brick-to-pencil communication. The two images on the - right illustrate pencil-to-pencil communication, which in this - case transposes the *y* and *z* dimensions of the grid. + Stages of a parallel FFT for a simulation domain overlaid with an + 8x8x8 3d FFT grid, partitioned across 64 processors. Within each + of the 4 diagrams, grid cells of the same color are owned by a + single processor; for simplicity, only cells owned by 4 or 8 of + the 64 processors are colored. The two images on the left + illustrate brick-to-pencil communication. The two images on the + right illustrate pencil-to-pencil communication, which in this + case transposes the *y* and *z* dimensions of the grid. Parallel 3d FFTs require substantial communication relative to their computational cost. A 3d FFT is implemented by a series of 1d FFTs -along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus the FFT -grid cannot be decomposed like atoms into 3 dimensions for parallel +along the *x-*, *y-*, and *z-*\ direction of the FFT grid. Thus, the +FFT grid cannot be decomposed like atoms into 3 dimensions for parallel processing of the FFTs but only in 1 (as planes) or 2 (as pencils) dimensions and in between the steps the grid needs to be transposed to have the FFT grid portion "owned" by each MPI process complete in the direction of the 1d FFTs it has to perform. LAMMPS uses the -pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure. +pencil-decomposition algorithm as shown in the :ref:`fft-parallel` +figure. Initially (far left), each processor owns a brick of same-color grid cells (actually grid points) contained within in its subdomain. A @@ -97,7 +97,7 @@ across all $P$ processors with a single call to ``MPI_Alltoall()``, but this is typically much slower. However, for the specialized brick and pencil tiling illustrated in :ref:`fft-parallel` figure, collective communication across the entire MPI communicator is not required. In -the example an :math:`8^3` grid with 512 grid cells is partitioned +the example, an :math:`8^3` grid with 512 grid cells is partitioned across 64 processors; each processor owns a 2x2x2 3d brick of grid cells. The initial brick-to-pencil communication (upper left to upper right) only requires collective communication within subgroups of 4 @@ -132,7 +132,7 @@ grid/particle operations that LAMMPS supports: - The fftMPI library allows each grid dimension to be a multiple of small prime factors (2,3,5), and allows any number of processors to perform the FFT. The resulting brick and pencil decompositions are - thus not always as well-aligned but the size of subgroups of + thus not always as well-aligned, but the size of subgroups of processors for the two modes of communication (brick/pencil and pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and :math:`O(P^{\frac{1}{2}})`. @@ -143,21 +143,23 @@ grid/particle operations that LAMMPS supports: in memory. This reordering can be done during the packing or unpacking of buffers for MPI communication. -- For large systems and particularly a large number of MPI processes, - the dominant cost for parallel FFTs is often the communication, not - the computation of 1d FFTs, even though the latter scales as :math:`N - \log(N)` in the number of grid points *N* per grid direction. This is - due to the fact that only a 2d decomposition into pencils is possible - while atom data (and their corresponding short-range force and energy - computations) can be decomposed efficiently in 3d. +- For large systems and particularly many MPI processes, the dominant + cost for parallel FFTs is often the communication, not the computation + of 1d FFTs, even though the latter scales as :math:`N \log(N)` in the + number of grid points *N* per grid direction. This is due to the fact + that only a 2d decomposition into pencils is possible while atom data + (and their corresponding short-range force and energy computations) + can be decomposed efficiently in 3d. - This can be addressed by reducing the number of MPI processes involved - in the MPI communication by using :doc:`hybrid MPI + OpenMP - parallelization `. This will use OpenMP parallelization - inside the MPI domains and while that may have a lower parallel - efficiency, it reduces the communication overhead. + Reducing the number of MPI processes involved in the MPI communication + will reduce this kind of overhead. By using a :doc:`hybrid MPI + + OpenMP parallelization ` it is still possible to use all + processes for parallel computation. It will use OpenMP + parallelization inside the MPI domains. While that may have a lower + parallel efficiency for some part of the computation, that can be less + than the communication overhead in the 3d FFTs. - As an alternative it is also possible to start a :ref:`multi-partition + As an alternative, it is also possible to start a :ref:`multi-partition ` calculation and then use the :doc:`verlet/split integrator ` to perform the PPPM computation on a dedicated, separate partition of MPI processes. This uses an integer @@ -175,7 +177,7 @@ grid/particle operations that LAMMPS supports: manner consistent with processor subdomains, and provides methods for forward and reverse communication of owned and ghost grid point values. It is used for PPPM as an FFT grid (as outlined above) and - also for the MSM algorithm which uses a cascade of grid sizes from + also for the MSM algorithm, which uses a cascade of grid sizes from fine to coarse to compute long-range Coulombic forces. The GridComm class is also useful for models where continuum fields interact with particles. For example, the two-temperature model (TTM) defines heat diff --git a/doc/src/Developer_par_neigh.rst b/doc/src/Developer_par_neigh.rst index 43cbf1b0e2..c3eaf9c0f7 100644 --- a/doc/src/Developer_par_neigh.rst +++ b/doc/src/Developer_par_neigh.rst @@ -3,12 +3,12 @@ Neighbor lists To compute forces efficiently, each processor creates a Verlet-style neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned, -*j* = owned or ghost) with separation less than the applicable -neighbor list cutoff distance. In LAMMPS the neighbor lists are stored -in a multiple-page data structure; each page is a contiguous chunk of -memory which stores vectors of neighbor atoms *j* for many *i* atoms. -This allows pages to be incrementally allocated or deallocated in blocks -as needed. Neighbor lists typically consume the most memory of any data +*j* = owned or ghost) with separation less than the applicable neighbor +list cutoff distance. In LAMMPS, the neighbor lists are stored in a +multiple-page data structure; each page is a contiguous chunk of memory +which stores vectors of neighbor atoms *j* for many *i* atoms. This +allows pages to be incrementally allocated or deallocated in blocks as +needed. Neighbor lists typically consume the most memory of any data structure in LAMMPS. The neighbor list is rebuilt (from scratch) once every few timesteps, then used repeatedly each step for force or other computations. The neighbor cutoff distance is :math:`R_n = R_f + @@ -16,7 +16,7 @@ computations. The neighbor cutoff distance is :math:`R_n = R_f + the interatomic potential for computing short-range pairwise or manybody forces and :math:`\Delta_s` is a "skin" distance that allows the list to be used for multiple steps assuming that atoms do not move very far -between consecutive time steps. Typically the code triggers +between consecutive time steps. Typically, the code triggers reneighboring when any atom has moved half the skin distance since the last reneighboring; this and other options of the neighbor list rebuild can be adjusted with the :doc:`neigh_modify ` command. @@ -26,10 +26,10 @@ their owning processor's subdomain are first migrated to new processors via communication. Periodic boundary conditions are also (only) enforced on these steps to ensure each atom is re-assigned to the correct processor. After migration, the atoms owned by each processor -are stored in a contiguous vector. Periodically each processor +are stored in a contiguous vector. Periodically, each processor spatially sorts owned atoms within its vector to reorder it for improved cache efficiency in force computations and neighbor list building. For -that atoms are spatially binned and then reordered so that atoms in the +that, atoms are spatially binned and then reordered so that atoms in the same bin are adjacent in the vector. Atom sorting can be disabled or its settings modified with the :doc:`atom_modify ` command. @@ -44,7 +44,7 @@ its settings modified with the :doc:`atom_modify ` command. (left) and triclinic (right) domains. A regular grid of neighbor bins (thin lines) overlays the entire simulation domain and need not align with subdomain boundaries; only the portion overlapping the - augmented subdomain is shown. In the triclinic case it overlaps the + augmented subdomain is shown. In the triclinic case, it overlaps the bounding box of the tilted rectangle. The blue- and red-shaded bins represent a stencil of bins searched to find neighbors of a particular atom (black dot). @@ -53,12 +53,12 @@ To build a local neighbor list in linear time, the simulation domain is overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins, as shown in the :ref:`neighbor-stencil` figure for 2d models and a single MPI processor's subdomain. Each processor stores a set of -neighbor bins which overlap its subdomain extended by the neighbor +neighbor bins which overlap its subdomain, extended by the neighbor cutoff distance :math:`R_n`. As illustrated, the bins need not align with processor boundaries; an integer number in each dimension is fit to the size of the entire simulation box. -Most often LAMMPS builds what it calls a "half" neighbor list where +Most often, LAMMPS builds what is called a "half" neighbor list where each *i,j* neighbor pair is stored only once, with either atom *i* or *j* as the central atom. The build can be done efficiently by using a pre-computed "stencil" of bins around a central origin bin which @@ -67,18 +67,18 @@ is simply a list of integer offsets in *x,y,z* of nearby bins surrounding the origin bin which are close enough to contain any neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the origin bin. Note that for a half neighbor list, the stencil can be -asymmetric since each atom only need store half its nearby neighbors. +asymmetric, since each atom only need store half its nearby neighbors. These stencils are illustrated in the figure for a half list and a bin size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in 2d (for the orthogonal case, 15 for triclinic). In 3d there would be 63, 13 in the plane of bins that contain the origin bin and 25 in each of the two planes above it in the *z* direction (75 for triclinic). The -reason the triclinic stencil has extra bins is because the bins tile the -bounding box of the entire triclinic domain and thus are not periodic -with respect to the simulation box itself. The stencil and logic for -determining which *i,j* pairs to include in the neighbor list are -altered slightly to account for this. +triclinic stencil has extra bins because the bins tile the bounding box +of the entire triclinic domain, and thus are not periodic with respect +to the simulation box itself. The stencil and logic for determining +which *i,j* pairs to include in the neighbor list are altered slightly +to account for this. To build a neighbor list, a processor first loops over its "owned" plus "ghost" atoms and assigns each to a neighbor bin. This uses an integer @@ -95,7 +95,7 @@ supports: been found to be optimal for many typical cases. Smaller bins incur additional overhead to loop over; larger bins require more distance calculations. Note that for smaller bin sizes, the 2d stencil in the - figure would be more semi-circular in shape (hemispherical in 3d), + figure would be of a more semicircular shape (hemispherical in 3d), with bins near the corners of the square eliminated due to their distance from the origin bin. @@ -111,8 +111,8 @@ supports: symmetric stencil. It also includes lists with partial enumeration of ghost atom neighbors. The full and ghost-atom lists are used by various manybody interatomic potentials. Lists may also use different - criteria for inclusion of a pair interaction. Typically this simply - depends only on the distance between two atoms and the cutoff + criteria for inclusion of a pairwise interaction. Typically, this + simply depends only on the distance between two atoms and the cutoff distance. But for finite-size coarse-grained particles with individual diameters (e.g. polydisperse granular particles), it can also depend on the diameters of the two particles. @@ -121,11 +121,11 @@ supports: of the master neighbor list for the full system need to be generated, one for each sub-style, which contains only the *i,j* pairs needed to compute interactions between subsets of atoms for the corresponding - potential. This means not all *i* or *j* atoms owned by a processor + potential. This means, not all *i* or *j* atoms owned by a processor are included in a particular sub-list. - Some models use different cutoff lengths for pairwise interactions - between different kinds of particles which are stored in a single + between different kinds of particles, which are stored in a single neighbor list. One example is a solvated colloidal system with large colloidal particles where colloid/colloid, colloid/solvent, and solvent/solvent interaction cutoffs can be dramatically different. @@ -153,7 +153,7 @@ supports: For the newton pair *on* setting the atom *j* is only added to the list if its *z* coordinate is larger, or if equal the *y* coordinate is larger, and that is equal, too, the *x* coordinate is larger. For - homogeneously dense systems that will result in picking neighbors from + homogeneously dense systems, that will result in picking neighbors from a same size sector in always the same direction relative to the - "owned" atom and thus it should lead to similar length neighbor lists - and thus reduce the chance of a load imbalance. + "owned" atom, and thus it should lead to similar length neighbor lists + and reduce the chance of a load imbalance. diff --git a/doc/src/Developer_par_openmp.rst b/doc/src/Developer_par_openmp.rst index 91c649a7b8..7e2021de67 100644 --- a/doc/src/Developer_par_openmp.rst +++ b/doc/src/Developer_par_openmp.rst @@ -6,7 +6,7 @@ thread parallelism to predominantly distribute loops over local data and thus follow an orthogonal parallelization strategy to the decomposition into spatial domains used by the :doc:`MPI partitioning `. For clarity, this section discusses only the -implementation in the OPENMP package as it is the simplest. The INTEL +implementation in the OPENMP package, as it is the simplest. The INTEL and KOKKOS package offer additional options and are more complex since they support more features and different hardware like co-processors or GPUs. @@ -14,7 +14,7 @@ or GPUs. One of the key decisions when implementing the OPENMP package was to keep the changes to the source code small, so that it would be easier to maintain the code and keep it in sync with the non-threaded standard -implementation. this is achieved by a) making the OPENMP version a +implementation. This is achieved by a) making the OPENMP version a derived class from the regular version (e.g. ``PairLJCutOMP`` from ``PairLJCut``) and overriding only methods that are multi-threaded or need to be modified to support multi-threading (similar to what was done @@ -26,13 +26,13 @@ into three separate classes ``ThrOMP``, ``ThrData``, and ``FixOMP``. available in the corresponding base class (e.g. ``Pair`` for ``PairLJCutOMP``) like multi-thread aware variants of the "tally" functions. Those functions are made available through multiple -inheritance so those new functions have to have unique names to avoid +inheritance, so those new functions have to have unique names to avoid ambiguities; typically ``_thr`` is appended to the name of the function. -``ThrData`` is a classes that manages per-thread data structures. -It is used instead of extending the corresponding storage to per-thread -arrays to avoid slowdowns due to "false sharing" when multiple threads -update adjacent elements in an array and thus force the CPU cache lines -to be reset and re-fetched. ``FixOMP`` finally manages the "multi-thread +``ThrData`` is a class that manages per-thread data structures. It is +used instead of extending the corresponding storage to per-thread arrays +to avoid slowdowns due to "false sharing" when multiple threads update +adjacent elements in an array and thus force the CPU cache lines to be +reset and re-fetched. ``FixOMP`` finally manages the "multi-thread state" like settings and access to per-thread storage, it is activated by the :doc:`package omp ` command. @@ -46,24 +46,24 @@ involve multiple atoms and thus there are race conditions when multiple threads want to update per-atom data of the same atoms. Five possible strategies have been considered to avoid this: -1) restructure the code so that there is no overlapping access possible +1. Restructure the code so that there is no overlapping access possible when computing in parallel, e.g. by breaking lists into multiple parts and synchronizing threads in between. -2) have each thread be "responsible" for a specific group of atoms and +2. Have each thread be "responsible" for a specific group of atoms and compute these interactions multiple times, once on each thread that - is responsible for a given atom and then have each thread only update + is responsible for a given atom, and then have each thread only update the properties of this atom. -3) use mutexes around functions and regions of code where the data race - could happen -4) use atomic operations when updating per-atom properties -5) use replicated per-thread data structures to accumulate data without +3. Use mutexes around functions and regions of code where the data race + could happen. +4. Use atomic operations when updating per-atom properties. +5. Use replicated per-thread data structures to accumulate data without conflicts and then use a reduction to combine those results into the data structures used by the regular style. Option 5 was chosen for the OPENMP package because it would retain the -performance for the case of 1 thread and the code would be more +performance for the case of a single thread and the code would be more maintainable. Option 1 would require extensive code changes, -particularly to the neighbor list code; options 2 would have incurred a +particularly to the neighbor list code; option 2 would have incurred a 2x or more performance penalty for the serial case; option 3 causes significant overhead and would enforce serialization of operations in inner loops and thus defeat the purpose of multi-threading; option 4 @@ -80,7 +80,7 @@ equivalent to the number of CPU cores per CPU socket on high-end supercomputers. Thus arrays like the force array are dimensioned to the number of atoms -times the number of threads when enabling OpenMP support and inside the +times the number of threads when enabling OpenMP support, and inside the compute functions a pointer to a different chunk is obtained by each thread. Similarly, accumulators like potential energy or virial are kept in per-thread instances of the ``ThrData`` class and then only reduced and @@ -91,7 +91,7 @@ Loop scheduling """"""""""""""" Multi-thread parallelization is applied by distributing (outer) loops -statically across threads. Typically this would be the loop over local +statically across threads. Typically, this would be the loop over local atoms *i* when processing *i,j* pairs of atoms from a neighbor list. The design of the neighbor list code results in atoms having a similar number of neighbors for homogeneous systems and thus load imbalances diff --git a/doc/src/Developer_par_part.rst b/doc/src/Developer_par_part.rst index be42857f63..06042c59a5 100644 --- a/doc/src/Developer_par_part.rst +++ b/doc/src/Developer_par_part.rst @@ -7,36 +7,36 @@ distributed-memory parallelism is set with the :doc:`comm_style command .. _domain-decomposition: .. figure:: img/domain-decomp.png - :align: center - domain decomposition + Domain decomposition schemes - This figure shows the different kinds of domain decomposition used - for MPI parallelization: "brick" on the left with an orthogonal - (left) and a triclinic (middle) simulation domain, and a "tiled" - decomposition (right). The black lines show the division into - subdomains and the contained atoms are "owned" by the corresponding - MPI process. The green dashed lines indicate how subdomains are - extended with "ghost" atoms up to the communication cutoff distance. + This figure shows the different kinds of domain decomposition used + for MPI parallelization: "brick" on the left with an orthogonal + (left) and a triclinic (middle) simulation domain, and a "tiled" + decomposition (right). The black lines show the division into + subdomains, and the contained atoms are "owned" by the + corresponding MPI process. The green dashed lines indicate how + subdomains are extended with "ghost" atoms up to the communication + cutoff distance. -The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal -or triclinic in shape, as illustrated in the :ref:`domain-decomposition` -figure for the 2d case. Orthogonal means the box edges are aligned with -the *x*, *y*, *z* Cartesian axes, and the box faces are thus all -rectangular. Triclinic allows for a more general parallelepiped shape -in which edges are aligned with three arbitrary vectors and the box -faces are parallelograms. In each dimension box faces can be periodic, -or non-periodic with fixed or shrink-wrapped boundaries. In the fixed -case, atoms which move outside the face are deleted; shrink-wrapped -means the position of the box face adjusts continuously to enclose all -the atoms. +The LAMMPS simulation box is a 3d or 2d volume, which can be of +orthogonal or triclinic shape, as illustrated in the +:ref:`domain-decomposition` figure for the 2d case. Orthogonal means +the box edges are aligned with the *x*, *y*, *z* Cartesian axes, and the +box faces are thus all rectangular. Triclinic allows for a more general +parallelepiped shape in which edges are aligned with three arbitrary +vectors and the box faces are parallelograms. In each dimension, box +faces can be periodic, or non-periodic with fixed or shrink-wrapped +boundaries. In the fixed case, atoms which move outside the face are +deleted; shrink-wrapped means the position of the box face adjusts +continuously to enclose all the atoms. For distributed-memory MPI parallelism, the simulation box is spatially decomposed (partitioned) into non-overlapping subdomains which fill the box. The default partitioning, "brick", is most suitable when atom density is roughly uniform, as shown in the left-side images of the :ref:`domain-decomposition` figure. The subdomains comprise a regular -grid and all subdomains are identical in size and shape. Both the +grid, and all subdomains are identical in size and shape. Both the orthogonal and triclinic boxes can deform continuously during a simulation, e.g. to compress a solid or shear a liquid, in which case the processor subdomains likewise deform. @@ -76,14 +76,14 @@ the load imbalance: The pictures above demonstrate different decompositions for a 2d system with 12 MPI ranks. The atom colors indicate the load imbalance of each -subdomain with green being optimal and red the least optimal. +subdomain, with green being optimal and red the least optimal. -Due to the vacuum in the system, the default decomposition is unbalanced -with several MPI ranks without atoms (left). By forcing a 1x12x1 -processor grid, every MPI rank does computations now, but number of -atoms per subdomain is still uneven and the thin slice shape increases -the amount of communication between subdomains (center left). With a -2x6x1 processor grid and shifting the subdomain divisions, the load -imbalance is further reduced and the amount of communication required -between subdomains is less (center right). And using the recursive -bisectioning leads to further improved decomposition (right). +Due to the vacuum in the system, the default decomposition is +unbalanced, with several MPI ranks without atoms (left). By forcing a +1x12x1 processor grid, every MPI rank does computations now, but the +number of atoms per subdomain is still uneven, and the thin slice shape +increases the amount of communication between subdomains (center +left). With a 2x6x1 processor grid and shifting the subdomain divisions, +the load imbalance is further reduced and the amount of communication +required between subdomains is less (center right). And using the +recursive bisectioning leads to further improved decomposition (right). diff --git a/doc/src/Library.rst b/doc/src/Library.rst index 3533c347c8..09561cda82 100644 --- a/doc/src/Library.rst +++ b/doc/src/Library.rst @@ -136,7 +136,7 @@ The LAMMPS Python module enables calling the LAMMPS C library API from Python by dynamically loading functions in the LAMMPS shared library through the `Python ctypes module `_. Because of the dynamic loading, it is **required** that LAMMPS is compiled -in :ref:`"shared" mode `. The Python interface is object oriented, but +in :ref:`"shared" mode `. The Python interface is object-oriented, but otherwise tries to be very similar to the C library API. Three different Python classes to run LAMMPS are available and they build on each other. More information on this is in the :doc:`Python_head` @@ -152,7 +152,7 @@ LAMMPS Fortran API The LAMMPS Fortran module is a wrapper around calling functions from the LAMMPS C library API. This is done using the ISO_C_BINDING feature in -Fortran 2003. The interface is object oriented but otherwise tries to +Fortran 2003. The interface is object-oriented but otherwise tries to be very similar to the C library API and the basic Python module. .. toctree:: diff --git a/doc/src/Tools.rst b/doc/src/Tools.rst index 0421cf33e4..20e3d2c094 100644 --- a/doc/src/Tools.rst +++ b/doc/src/Tools.rst @@ -1071,7 +1071,7 @@ getting started, but not as a fully tested and supported feature of the LAMMPS distribution. Any contributions to complete this are, of course, welcome. Please also note, that for the case of creating a Python wrapper, a fully supported :doc:`Ctypes based lammps module ` -already exists. That module is designed to be object oriented while +already exists. That module is designed to be object-oriented while SWIG will generate a 1:1 translation of the functions in the interface file. Building the wrapper diff --git a/doc/src/fix_rigid.rst b/doc/src/fix_rigid.rst index 3a2477f90a..884d883f5a 100644 --- a/doc/src/fix_rigid.rst +++ b/doc/src/fix_rigid.rst @@ -730,18 +730,18 @@ is because there can only be one fix which monitors the global pressure and changes the simulation box dimensions. So you have 3 choices: -* Use one of the 4 NPT or NPH styles for the rigid bodies. Use the - *dilate* all option so that it will dilate the positions of the - non-rigid particles as well. Use :doc:`fix nvt ` (or any - other thermostat) for the non-rigid particles. -* Use :doc:`fix npt ` for the group of non-rigid particles. Use - the *dilate* all option so that it will dilate the center-of-mass - positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT - rigid styles for the rigid bodies. -* Use :doc:`fix press/berendsen ` to compute the - pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT - rigid styles for the rigid bodies. Use :doc:`fix nvt ` (or - any other thermostat) for the non-rigid particles. +#. Use one of the 4 NPT or NPH styles for the rigid bodies. Use the + *dilate* all option so that it will dilate the positions of the + non-rigid particles as well. Use :doc:`fix nvt ` (or any + other thermostat) for the non-rigid particles. +#. Use :doc:`fix npt ` for the group of non-rigid particles. Use + the *dilate* all option so that it will dilate the center-of-mass + positions of the rigid bodies as well. Use one of the 4 NVE or 2 NVT + rigid styles for the rigid bodies. +#. Use :doc:`fix press/berendsen ` to compute the + pressure and change the box dimensions. Use one of the 4 NVE or 2 NVT + rigid styles for the rigid bodies. Use :doc:`fix nvt ` (or + any other thermostat) for the non-rigid particles. In all case, the rigid bodies and non-rigid particles both contribute to the global pressure and the box is scaled the same by any of the diff --git a/doc/src/prd.rst b/doc/src/prd.rst index 0f6e9e481f..7c6af3abb5 100644 --- a/doc/src/prd.rst +++ b/doc/src/prd.rst @@ -166,7 +166,7 @@ events, all the other replicas also run dynamics and event checking with the same schedule, but the final states are always overwritten by the state of the event replica. -The outer loop of the pseudo-code above continues until *N* steps of +The outer loop of the pseudocode above continues until *N* steps of dynamics have been performed. Note that *N* only includes the dynamics of stages 2 and 3, not the steps taken during dephasing or the minimization iterations of quenching. The specified *N* is diff --git a/src/pointers.h b/src/pointers.h index 09efa49cbc..7e5d5dbeb0 100644 --- a/src/pointers.h +++ b/src/pointers.h @@ -94,7 +94,7 @@ class Pointers { python(ptr->python) {} virtual ~Pointers() = default; - // remove default members execept for the copy constructor + // remove other default members Pointers() = delete; Pointers(const Pointers &) = default;