Merge pull request #2957 from akohlmey/next_release_version

Step version strings for stable release
Merge pull request #2966 from akohlmey/cmake-tweaks
2021-09-29 20:40:00 -04:00 · 2021-09-29 19:46:33 -04:00 · 2021-09-29 18:42:00 -04:00 · 2021-09-29 15:13:55 -04:00 · 2021-09-29 14:40:22 -04:00 · 2021-09-29 14:04:01 -04:00
573 changed files with 5196 additions and 4341 deletions
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -5,8 +5,9 @@ Thank your for considering to contribute to the LAMMPS software project.
 The following is a set of guidelines as well as explanations of policies and work flows for contributing to the LAMMPS molecular dynamics software project. These guidelines focus on submitting issues or pull requests on the LAMMPS GitHub project.

 Thus please also have a look at:
-* [The Section on submitting new features for inclusion in LAMMPS of the Manual](https://lammps.sandia.gov/doc/Modify_contribute.html)
-* [The LAMMPS GitHub Tutorial in the Manual](http://lammps.sandia.gov/doc/Howto_github.html)
+* [The guide for submitting new features in the LAMMPS manual](https://lammps.sandia.gov/doc/Modify_contribute.html)
+* [The guide on programming style and requirement in the LAMMPS manual](https://lammps.sandia.gov/doc/Modify_contribute.html)
+* [The GitHub tutorial in the LAMMPS manual](http://lammps.sandia.gov/doc/Howto_github.html)

 ## Table of Contents

@@ -26,11 +27,11 @@ __

 ## I don't want to read this whole thing I just have a question!

-> **Note:** Please do not file an issue to ask a general question about LAMMPS, its features, how to use specific commands, or how perform simulations or analysis in LAMMPS. Instead post your question to either the ['lammps-users' mailing list](https://lammps.sandia.gov/mail.html) or the [LAMMPS Material Science Discourse forum](https://matsci.org/lammps). You do not need to be subscribed to post to the list (but a mailing list subscription avoids having your post delayed until it is approved by a mailing list moderator). Most posts to the mailing list receive a response within less than 24 hours. Before posting to the mailing list, please read the [mailing list guidelines](https://lammps.sandia.gov/guidelines.html). Following those guidelines will help greatly to get a helpful response. Always mention which LAMMPS version you are using. The LAMMPS forum was recently created as part of a larger effort to build a materials science community and have discussions not just about using LAMMPS. Thus the forum may be also used for discussions that would be off-topic for the mailing list. Those will just have to be moved to a more general category.
+> **Note:** Please do not file an issue to ask a general question about LAMMPS, its features, how to use specific commands, or how perform simulations or analysis in LAMMPS. Instead post your question to either the ['lammps-users' mailing list](https://lammps.sandia.gov/mail.html) or the [LAMMPS Material Science Discourse forum](https://matsci.org/lammps). You do not need to be subscribed to post to the list (but a mailing list subscription avoids having your post delayed until it is approved by a mailing list moderator). Most posts to the mailing list receive a response within less than 24 hours. Before posting to the mailing list, please read the [mailing list guidelines](https://lammps.sandia.gov/guidelines.html). Following those guidelines will help greatly to get a helpful response. Always mention which LAMMPS version you are using. The LAMMPS forum was recently created as part of a larger effort to build a materials science community and have discussions not just about using LAMMPS. Thus the forum may be also used for discussions that would be off-topic for the mailing list. Those will just have to be posted to a more general category.

 ## How Can I Contribute?

-There are several ways how you can actively contribute to the LAMMPS project: you can discuss compiling and using LAMMPS, and solving LAMMPS related problems with other LAMMPS users on the lammps-users mailing list, you can report bugs or suggest enhancements by creating issues on GitHub (or posting them to the lammps-users mailing list or posting in the LAMMPS Materials Science Discourse forum), and you can contribute by submitting pull requests on GitHub or e-mail your code
+There are several ways how you can actively contribute to the LAMMPS project: you can discuss compiling and using LAMMPS, and solving LAMMPS related problems with other LAMMPS users on the lammps-users mailing list or the forum, you can report bugs or suggest enhancements by creating issues on GitHub (or posting them to the lammps-users mailing list or posting in the LAMMPS Materials Science Discourse forum), and you can contribute by submitting pull requests on GitHub or e-mail your code
 to one of the [LAMMPS core developers](https://lammps.sandia.gov/authors.html). As you may see from the aforementioned developer page, the LAMMPS software package includes the efforts of a very large number of contributors beyond the principal authors and maintainers.

 ### Discussing How To Use LAMMPS
@@ -62,37 +63,12 @@ To be able to submit an issue on GitHub, you have to register for an account (fo

 ### Contributing Code

-We encourage users to submit new features or modifications for LAMMPS to the core developers so they can be added to the LAMMPS distribution. The preferred way to manage and coordinate this is by submitting a pull request at the LAMMPS project on GitHub. For any larger modifications or programming project, you are encouraged to contact the LAMMPS developers ahead of time, in order to discuss implementation strategies and coding guidelines, that will make it easier to integrate your contribution and result in less work for everybody involved. You are also encouraged to search through the list of open issues on GitHub and submit a new issue for a planned feature, so you would not duplicate the work of others (and possibly get scooped by them) or have your work duplicated by others.
+We encourage users to submit new features or modifications for LAMMPS. Instructions, guidelines, requirements,
+and recommendations are in the following sections of the LAMMPS manual:
+* [The guide for submitting new features in the LAMMPS manual](https://lammps.sandia.gov/doc/Modify_contribute.html)
+* [The guide on programming style and requirement in the LAMMPS manual](https://lammps.sandia.gov/doc/Modify_contribute.html)
+* [The GitHub tutorial in the LAMMPS manual](http://lammps.sandia.gov/doc/Howto_github.html)

-How quickly your contribution will be integrated depends largely on how much effort it will cause to integrate and test it, how much it requires changes to the core code base, and of how much interest it is to the larger LAMMPS community. Please see below for a checklist of typical requirements. Once you have prepared everything, see [this tutorial](https://lammps.sandia.gov/doc/Howto_github.html)
- for instructions on how to submit your changes or new files through a GitHub pull request
-
-Here is a checklist of steps you need to follow to submit a single file or user package for our consideration. Following these steps will save both you and us time. See existing files in packages in the source directory for examples. If you are uncertain, please ask on the lammps-users mailing list.
-
-* C++ source code must be compatible with the C++-11 standard. Packages may require a later standard, if justified.
-* All source files you provide must compile with the most current version of LAMMPS with multiple configurations. In particular you need to test compiling LAMMPS from scratch with `-DLAMMPS_BIGBIG` set in addition to the default `-DLAMMPS_SMALLBIG` setting. Your code will need to work correctly in serial and in parallel using MPI.
-* For consistency with the rest of LAMMPS and especially, if you want your contribution(s) to be added to main LAMMPS code or one of its standard packages, it needs to be written in a style compatible with other LAMMPS source files. This means: 2-character indentation per level, no tabs, no trailing whitespace, no lines over 80 characters. I/O is done via the C-style stdio library, style class header files should not import any system headers, STL containers should be avoided in headers, and forward declarations used where possible or needed. All added code should be placed into the LAMMPS_NS namespace or a sub-namespace; global or static variables should be avoided, as they conflict with the modular nature of LAMMPS and the C++ class structure. There MUST NOT be any "using namespace XXX;" statements in headers. In the implementation file (<name>.cpp) system includes should be placed in angular brackets (<>) and for c-library functions the C++ style header files should be included (<cstdio> instead of <stdio.h>, or <cstring> instead of <string.h>). This all is so the developers can more easily understand, integrate, and maintain your contribution and reduce conflicts with other parts of LAMMPS. This basically means that the code accesses data structures, performs its operations, and is formatted similar to other LAMMPS source files, including the use of the error class for error and warning messages.
-* Source, style name, and documentation file should follow the following naming convention: style names should be lowercase and words separated by a forward slash; for a new fix style 'foo/bar', the class should be named FixFooBar, the name of the source files should be 'fix_foo_bar.h' and 'fix_foo_bar.cpp' and the corresponding documentation should be in a file 'fix_foo_bar.rst'.
-* If you want your contribution to be added as a user-contributed feature, and it is a single file (actually a `<name>.cpp` and `<name>.h` file) it can be rapidly added to the USER-MISC directory. Include the one-line entry to add to the USER-MISC/README file in that directory, along with the 2 source files. You can do this multiple times if you wish to contribute several individual features.
-* If you want your contribution to be added as a user-contribution and it is several related features, it is probably best to make it a user package directory with a name like FOO. In addition to your new files, the directory should contain a README text file. The README should contain your name and contact information and a brief description of what your new package does. If your files depend on other LAMMPS style files also being installed (e.g. because your file is a derived class from the other LAMMPS class), then an Install.sh file is also needed to check for those dependencies. See other README and Install.sh files in other USER directories as examples. Send us a tarball of this FOO directory.
-* Your new source files need to have the LAMMPS copyright, GPL notice, and your name and email address at the top, like other user-contributed LAMMPS source files. They need to create a class that is inside the LAMMPS namespace. If the file is for one of the USER packages, including USER-MISC, then we are not as picky about the coding style (see above). I.e. the files do not need to be in the same stylistic format and syntax as other LAMMPS files, though that would be nice for developers as well as users who try to read your code.
-* You **must** also create or extend a documentation file for each new command or style you are adding to LAMMPS. For simplicity and convenience, the documentation of groups of closely related commands or styles may be combined into a single file. This will be one file for a single-file feature. For a package, it might be several files. These are files in the [reStructuredText](https://docutils.sourceforge.io/rst.html) markup language, that are then converted to HTML and PDF. The tools for this conversion are included in the source distribution, and the translation can be as simple as doing "make html pdf" in the doc folder. Thus the documentation source files must be in the same format and style as other `<name>.rst` files in the lammps/doc/src directory for similar commands and styles; use one or more of them as a starting point. An introduction to reStructuredText can be found at [https://docutils.sourceforge.io/docs/user/rst/quickstart.html](https://docutils.sourceforge.io/docs/user/rst/quickstart.html). The text files can include mathematical expressions and symbol in ".. math::" sections or ":math:" expressions or figures (see doc/JPG for examples), or even additional PDF files with further details (see doc/PDF for examples). The doc page should also include literature citations as appropriate; see the bottom of doc/fix_nh.rst for examples and the earlier part of the same file for how to format the cite itself. The "Restrictions" section of the doc page should indicate that your command is only available if LAMMPS is built with the appropriate USER-MISC or FOO package. See other user package doc files for examples of how to do this. The prerequisite for building the HTML format files are Python 3.x and virtualenv. Please run at least `make html`, `make pdf` and `make spelling` and carefully inspect and proofread the resulting HTML format doc page as well as the output produced to the screen. Make sure that all spelling errors are fixed or the necessary false positives are added to the `doc/utils/sphinx-config/false_positives.txt` file.  For new styles, those usually also need to be added to lists on the respective overview pages. This can be checked for also with `make style_check`.
-* For a new package (or even a single command) you should include one or more example scripts demonstrating its use. These should run in no more than a couple minutes, even on a single processor, and not require large data files as input. See directories under examples/PACKAGES for examples of input scripts other users provided for their packages. These example inputs are also required for validating memory accesses and testing for memory leaks with valgrind
-* For new utility functions or class (i.e. anything that does not depend on a LAMMPS object), new unit tests should be added to the unittest tree.
-* When adding a new LAMMPS style, a .yaml file with a test configuration and reference data should be added for the styles where a suitable tester program already exists (e.g. pair styles, bond styles, etc.).
-* If there is a paper of yours describing your feature (either the algorithm/science behind the feature itself, or its initial usage, or its implementation in LAMMPS), you can add the citation to the <name>.cpp source file. See src/EFF/atom_vec_electron.cpp for an example. A LaTeX citation is stored in a variable at the top of the file and a single line of code that references the variable is added to the constructor of the class. Whenever a user invokes your feature from their input script, this will cause LAMMPS to output the citation to a log.cite file and prompt the user to examine the file. Note that you should only use this for a paper you or your group authored. E.g. adding a cite in the code for a paper by Nose and Hoover if you write a fix that implements their integrator is not the intended usage. That kind of citation should just be in the doc page you provide.
-
-Finally, as a general rule-of-thumb, the more clear and self-explanatory you make your documentation and README files, and the easier you make it for people to get started, e.g. by providing example scripts, the more likely it is that users will try out your new feature.
-
-If the new features/files are broadly useful we may add them as core files to LAMMPS or as part of a standard package. Else we will add them as a user-contributed file or package. Examples of user packages are in src sub-directories that start with USER. The USER-MISC package is simply a collection of (mostly) unrelated single files, which is the simplest way to have your contribution quickly added to the LAMMPS distribution. You can see a list of the both standard and user packages by typing "make package" in the LAMMPS src directory.
-
-Note that by providing us files to release, you are agreeing to make them open-source, i.e. we can release them under the terms of the GPL, used as a license for the rest of LAMMPS. See Section 1.4 for details.
-
-With user packages and files, all we are really providing (aside from the fame and fortune that accompanies having your name in the source code and on the Authors page of the LAMMPS WWW site), is a means for you to distribute your work to the LAMMPS user community, and a mechanism for others to easily try out your new feature. This may help you find bugs or make contact with new collaborators. Note that you are also implicitly agreeing to support your code which means answer questions, fix bugs, and maintain it if LAMMPS changes in some way that breaks it (an unusual event).
-
-To be able to submit an issue on GitHub, you have to register for an account (for GitHub in general). If you do not want to do that, or have other reservations or difficulties to submit a pull request, you can - as an alternative - contact one or more of the core LAMMPS developers and ask if one of them would be interested in manually merging your code into LAMMPS and send them your source code. Since the effort to merge a pull request is a small fraction of the effort of integrating source code manually (which would usually be done by converting the contribution into a pull request), your chances to have your new code included quickly are the best with a pull request.
-
-If you prefer to submit patches or full files, you should first make certain, that your code works correctly with the latest patch-level version of LAMMPS and contains all bug fixes from it. Then create a gzipped tar file of all changed or added files or a corresponding patch file using 'diff -u' or 'diff -c' and compress it with gzip. Please only use gzip compression, as this works well on all platforms.

 ## GitHub Workflows

@@ -102,17 +78,17 @@ This section briefly summarizes the steps that will happen **after** you have su

 After submitting an issue, one or more of the LAMMPS developers will review it and categorize it by assigning labels. Confirmed bug reports will be labeled `bug`; if the bug report also contains a suggestion for how to fix it, it will be labeled `bugfix`; if the issue is a feature request, it will be labeled `enhancement`. Other labels may be attached as well, depending on which parts of the LAMMPS code are affected. If the assessment is, that the issue does not warrant any changes, the `wontfix` label will be applied and if the submission is incorrect or something that should not be submitted as an issue, the `invalid` label will be applied. In both of the last two cases, the issue will then be closed without further action.

-For feature requests, what happens next is that developers may comment on the viability or relevance of the request, discuss and make suggestions for how to implement it. If a LAMMPS developer or user is planning to implement the feature, the issue will be assigned to that developer. For developers, that are not yet listed as LAMMPS project collaborators, they will receive an invitation to be added to the LAMMPS project as a collaborator so they can get assigned. If the requested feature or enhancement is implemented, it will usually be submitted as a pull request, which will contain a reference to the issue number. And once the pull request is reviewed and accepted for inclusion into LAMMPS, the issue will be closed. For details on how pull requests are processed, please see below.
+For feature requests, what happens next is that developers may comment on the viability or relevance of the request, discuss and make suggestions for how to implement it. If a LAMMPS developer or user is planning to implement the feature, the issue will be assigned to that developer. For developers, that are not yet listed as LAMMPS project collaborators, they will receive an invitation to be added to the LAMMPS project as a collaborator so they can get assigned. If the requested feature or enhancement is implemented, it will be submitted as a pull request, which will contain a reference to the issue number. And once the pull request is reviewed and accepted for inclusion into LAMMPS, the issue will be closed. For details on how pull requests are processed, please see below. Feature requests may be labeled with `volunteer_needed` if none of the LAMMPS developers has the time and the required knowledge implement the feature.

-For bug reports, the next step is that one of the core LAMMPS developers will self-assign to the issue and try to confirm the bug. If confirmed, the `bug` label and potentially other labels are added to classify the issue and its impact to LAMMPS. Before confirming, further questions may be asked or requests for providing additional input files or details about the steps required to reproduce the issue. Any bugfix is likely to be submitted as a pull request (more about that below) and since most bugs require only local changes, the bugfix may be included in a pull request specifically set up to collect such local bugfixes or small enhancements. Once the bugfix is included in the master branch, the issue will be closed.
+For bug reports, the next step is that one of the core LAMMPS developers will self-assign to the issue and try to confirm the bug. If confirmed, the `bug` label and potentially other labels are added to classify the issue and its impact to LAMMPS. Otherwise the `unconfirmed` label will be applied and some comment about what was tried to confirm the bug added. Before confirming, further questions may be asked or requests for providing additional input files or details about the steps required to reproduce the issue. Any bugfix will be submitted as a pull request (more about that below) and since most bugs require only local changes, the bugfix may be included in a pull request specifically set up to collect such local bugfixes or small enhancements. Once the bugfix is included in the master branch, the issue will be closed.

 ### Pull Requests

-For submitting pull requests, there is a [detailed tutorial](https://lammps.sandia.gov/doc/Howto_github.html) in the LAMMPS manual. Thus only a brief breakdown of the steps is presented here. Please note, that the LAMMPS developers are still reviewing and trying to improve the process. If you are unsure about something, do not hesitate to post a question on the lammps-users mailing list or contact one fo the core LAMMPS developers.
-Immediately after the submission, the LAMMPS continuing integration server at ci.lammps.org will download your submitted branch and perform a simple compilation test, i.e. will test whether your submitted code can be compiled under various conditions. It will also do a check on whether your included documentation translates cleanly. Whether these tests are successful or fail will be recorded. If a test fails, please inspect the corresponding output on the CI server and take the necessary steps, if needed, so that the code can compile cleanly again. The test will be re-run each the pull request is updated with a push to the remote branch on GitHub.
-Next a LAMMPS core developer will self-assign and do an overall technical assessment of the submission. If you are not yet registered as a LAMMPS collaborator, you will receive an invitation for that. As part of the assessment, the pull request will be categorized with labels. There are two special labels: `needs_work` (indicates that work from the submitter of the pull request is needed) and `work_in_progress` (indicates, that the assigned LAMMPS developer will make changes, if not done by the contributor who made the submit). 
+Pull requests are the **only** way that changes get made to the LAMMPS distribution.  So also the LAMMPS core developers will submit pull requests for their own changes and discuss them on GitHub.  Thus if you submit a pull request it will be treated in a similar fashion. When you submit a pull request you may opt to submit a "Draft" pull request.  That means your changes are visible and will be subject to testing, but reviewers will not be (auto-)assigned and comments will take into account that this is not complete. On the other hand, this is a perfect way to ask the LAMMPS developers for comments on non-obvious changes and get feedback and possible suggestions for improvements or recommendations about what to avoid.
+Immediately after the submission, the LAMMPS continuing integration server at ci.lammps.org will download your submitted branch and perform a number of tests: it will tests whether it compiles cleanly under various conditions, it will also do a check on whether your included documentation translates cleanly and run some unit tests and other checks. Whether these tests are successful or fail will be recorded.  If a test fails, please inspect the corresponding output on the CI server and take the necessary steps, if needed, so that the code can compile cleanly again.  The test will be re-run each time the pull request is updated with a push to the remote branch on GitHub.  If you are unsure about what you need to change, ask a question in the discussion area of the pull request.
+Next a LAMMPS core developer will self-assign and do an overall technical assessment of the submission.  If you submitted a draft pull request, this will not happen unless you mark it "ready for review".  If you are not yet invited as a LAMMPS collaborator, and your contribution seems significant, you may also receive an invitation for collaboration on the LAMMPS repository.  As part of the assessment, the pull request will be categorized with labels. There are two special labels: `needs_work` (indicates that work from the submitter of the pull request is needed) and `work_in_progress` (indicates, that the assigned LAMMPS developer will make changes, if not done by the contributor who made the submit). 
 You may also receive comments and suggestions on the overall submission or specific details and on occasion specific requests for changes as part of the review. If permitted, also additional changes may be pushed into your pull request branch or a pull request may be filed in your LAMMPS fork on GitHub to include those changes.
 The LAMMPS developer may then decide to assign the pull request to another developer (e.g. when that developer is more knowledgeable about the submitted feature or enhancement or has written the modified code). It may also happen, that additional developers are requested to provide a review and approve the changes. For submissions, that may change the general behavior of LAMMPS, or where a possibility of unwanted side effects exists, additional tests may be requested by the assigned developer.
-If the assigned developer is satisfied and considers the submission ready for inclusion into LAMMPS, the pull request will receive approvals and be merged into the master branch by one of the core LAMMPS developers. After the pull request is merged, you may delete the feature branch used for the pull request in your personal LAMMPS fork.
-Since the learning curve for git is quite steep for efficiently managing remote repositories, local and remote branches, pull requests and more, do not hesitate to ask questions, if you are not sure about how to do certain steps that are asked of you. Even if the changes asked of you do not make sense to you, they may be important for the LAMMPS developers. Please also note, that these all are guidelines and nothing set in stone. So depending on the nature of the contribution, the workflow may be adjusted.
+If the assigned developer is satisfied and considers the submission ready for inclusion into LAMMPS, the pull request will receive approvals and be merged into the master branch by one of the core LAMMPS developers. After the pull request is merged, you may delete the feature branch used for the pull request in your personal LAMMPS fork.  The minimum requirement to merge a pull request is that all automated tests have to pass and at least one LAMMPS developer has approved integrating the submitted code. Since the approver will not be the person merging a pull request, you will have at least two LAMMPS developers that looked at your contribution.
+Since the learning curve for git is quite steep for efficiently managing remote repositories, local and remote branches, pull requests and more, do not hesitate to ask questions, if you are not sure about how to do certain steps that are asked of you. Even if the changes asked of you do not make sense to you, they may be important for the LAMMPS developers. Please also note, that these all are guidelines and nothing set in stone. So depending on the nature of the contribution, the work flow may be adjusted.

--- a/cmake/CMakeLists.txt
+++ b/cmake/CMakeLists.txt
@@ -36,7 +36,11 @@ find_package(Git)

 # by default, install into $HOME/.local (not /usr/local), so that no root access (and sudo!!) is needed
 if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
-  set(CMAKE_INSTALL_PREFIX "$ENV{HOME}/.local" CACHE PATH "Default install path" FORCE)
+  if((CMAKE_SYSTEM_NAME STREQUAL "Windows") AND (NOT CMAKE_CROSSCOMPILING))
+    set(CMAKE_INSTALL_PREFIX "$ENV{USERPROFILE}/LAMMPS" CACHE PATH "Default install path" FORCE)
+  else()
+    set(CMAKE_INSTALL_PREFIX "$ENV{HOME}/.local" CACHE PATH "Default install path" FORCE)
+  endif()
 endif()

 # If enabled, no need to use LD_LIBRARY_PATH / DYLD_LIBRARY_PATH when installed
@@ -90,6 +94,10 @@ endif()
 set(CMAKE_CXX_STANDARD 11)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Use compiler extensions")
+# ugly hack for MSVC which by default always reports an old C++ standard in the __cplusplus macro
+if(MSVC)
+  add_compile_options(/Zc:__cplusplus)
+endif()

 # export all symbols when building a .dll file on windows
 if((CMAKE_SYSTEM_NAME STREQUAL "Windows") AND BUILD_SHARED_LIBS)
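The new default install location chosen in the first hunk of `cmake/CMakeLists.txt` can be restated as a small Python sketch. This is only an illustration of the branch logic, not part of the LAMMPS build system: the function name `default_install_prefix` and its parameters are hypothetical, and the sketch assumes the user did not set `CMAKE_INSTALL_PREFIX` explicitly (the case guarded by `CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT`).

```python
def default_install_prefix(system_name, cross_compiling, env):
    """Mirror the if/else branch added to cmake/CMakeLists.txt (illustrative only).

    system_name     -- stands in for CMAKE_SYSTEM_NAME
    cross_compiling -- stands in for CMAKE_CROSSCOMPILING
    env             -- mapping of environment variables (unset ones expand to "")
    """
    # Native Windows builds install below the user's profile folder.
    if system_name == "Windows" and not cross_compiling:
        return env.get("USERPROFILE", "") + "/LAMMPS"
    # Everything else, including cross-compiling to Windows, keeps $HOME/.local.
    return env.get("HOME", "") + "/.local"


print(default_install_prefix("Windows", False, {"USERPROFILE": "C:/Users/alice"}))
print(default_install_prefix("Windows", True, {"HOME": "/home/alice"}))
```

The second call shows the effect of the `NOT CMAKE_CROSSCOMPILING` condition: a Windows target built on another host still gets the `$HOME/.local` default.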
--- a/cmake/Modules/OpenCLLoader.cmake
+++ b/cmake/Modules/OpenCLLoader.cmake
@@ -1,6 +1,6 @@
 message(STATUS "Downloading and building OpenCL loader library")
-set(OPENCL_LOADER_URL "${LAMMPS_THIRDPARTY_URL}/opencl-loader-2021.06.30.tar.gz" CACHE STRING "URL for OpenCL loader tarball")
-set(OPENCL_LOADER_MD5 "f9e55dd550cfbf77f46507adf7cb8fd2" CACHE STRING "MD5 checksum of OpenCL loader tarball")
+set(OPENCL_LOADER_URL "${LAMMPS_THIRDPARTY_URL}/opencl-loader-2021.09.18.tar.gz" CACHE STRING "URL for OpenCL loader tarball")
+set(OPENCL_LOADER_MD5 "3b3882627964bd02e5c3b02065daac3c" CACHE STRING "MD5 checksum of OpenCL loader tarball")
 mark_as_advanced(OPENCL_LOADER_URL)
 mark_as_advanced(OPENCL_LOADER_MD5)

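The hunk above updates the tarball URL and its MD5 checksum together, since a stale checksum would reject the new download. How the module consumes `OPENCL_LOADER_MD5` is not shown in this diff. As a hedged sketch of what such a checksum comparison amounts to, the following Python uses the standard `hashlib` module; the helper names `md5_hexdigest` and `checksum_ok` are hypothetical, and the in-memory stream stands in for a downloaded tarball.

```python
import hashlib
import io


def md5_hexdigest(stream, chunk_size=1 << 20):
    """Return the MD5 hex digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(stream, expected_md5):
    """Compare the stream's digest with the expected value (case-insensitive)."""
    return md5_hexdigest(stream) == expected_md5.lower()


# In-memory stand-in for a tarball; a real check would open the file with "rb".
fake_tarball = io.BytesIO(b"hello")
print(checksum_ok(fake_tarball, "5d41402abc4b2a76b9719d911017c592"))  # True
```

Reading in chunks keeps memory use flat for large archives. Note that MD5 here only detects corrupted or mismatched downloads; it is not a defense against deliberate tampering.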
--- a/cmake/Modules/Packages/GPU.cmake
+++ b/cmake/Modules/Packages/GPU.cmake
@@ -71,44 +71,47 @@ if(GPU_API STREQUAL "CUDA")
  # build arch/gencode commands for nvcc based on CUDA toolkit version and use choice
  # --arch translates directly instead of JIT, so this should be for the preferred or most common architecture
  set(GPU_CUDA_GENCODE "-arch=${GPU_ARCH}")
-  # Fermi (GPU Arch 2.x) is supported by CUDA 3.2 to CUDA 8.0
-  if((CUDA_VERSION VERSION_GREATER_EQUAL "3.2") AND (CUDA_VERSION VERSION_LESS "9.0"))
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_20,code=[sm_20,compute_20] ")
-  endif()
-  # Kepler (GPU Arch 3.0) is supported by CUDA 5 to CUDA 10.2
-  if((CUDA_VERSION VERSION_GREATER_EQUAL "5.0") AND (CUDA_VERSION VERSION_LESS "11.0"))
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_30,code=[sm_30,compute_30] ")
-  endif()
-  # Kepler (GPU Arch 3.5) is supported by CUDA 5 to CUDA 11
-  if((CUDA_VERSION VERSION_GREATER_EQUAL "5.0") AND (CUDA_VERSION VERSION_LESS "12.0"))
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_35,code=[sm_35,compute_35]")
-  endif()
-  # Maxwell (GPU Arch 5.x) is supported by CUDA 6 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "6.0")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52]")
-  endif()
-  # Pascal (GPU Arch 6.x) is supported by CUDA 8 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "8.0")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_60,code=[sm_60,compute_60] -gencode arch=compute_61,code=[sm_61,compute_61]")
-  endif()
-  # Volta (GPU Arch 7.0) is supported by CUDA 9 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "9.0")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_70,code=[sm_70,compute_70]")
-  endif()
-  # Turing (GPU Arch 7.5) is supported by CUDA 10 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "10.0")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_75,code=[sm_75,compute_75]")
-  endif()
-  # Ampere (GPU Arch 8.0) is supported by CUDA 11 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "11.0")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_80,code=[sm_80,compute_80]")
-  endif()
-  # Ampere (GPU Arch 8.6) is supported by CUDA 11.1 and later
-  if(CUDA_VERSION VERSION_GREATER_EQUAL "11.1")
-    string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_86,code=[sm_86,compute_86]")
-  endif()
+
+  # apply the following to build "fat" CUDA binaries only for known CUDA toolkits
  if(CUDA_VERSION VERSION_GREATER_EQUAL "12.0")
-    message(WARNING "Unsupported CUDA version. Use at your own risk.")
+    message(WARNING "Untested CUDA Toolkit version. Use at your own risk")
+  else()
+    # Fermi (GPU Arch 2.x) is supported by CUDA 3.2 to CUDA 8.0
+    if((CUDA_VERSION VERSION_GREATER_EQUAL "3.2") AND (CUDA_VERSION VERSION_LESS "9.0"))
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_20,code=[sm_20,compute_20] ")
+    endif()
+    # Kepler (GPU Arch 3.0) is supported by CUDA 5 to CUDA 10.2
+    if((CUDA_VERSION VERSION_GREATER_EQUAL "5.0") AND (CUDA_VERSION VERSION_LESS "11.0"))
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_30,code=[sm_30,compute_30] ")
+    endif()
+    # Kepler (GPU Arch 3.5) is supported by CUDA 5 to CUDA 11
+    if((CUDA_VERSION VERSION_GREATER_EQUAL "5.0") AND (CUDA_VERSION VERSION_LESS "12.0"))
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_35,code=[sm_35,compute_35]")
+    endif()
+    # Maxwell (GPU Arch 5.x) is supported by CUDA 6 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "6.0")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52]")
+    endif()
+    # Pascal (GPU Arch 6.x) is supported by CUDA 8 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "8.0")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_60,code=[sm_60,compute_60] -gencode arch=compute_61,code=[sm_61,compute_61]")
+    endif()
+    # Volta (GPU Arch 7.0) is supported by CUDA 9 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "9.0")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_70,code=[sm_70,compute_70]")
+    endif()
+    # Turing (GPU Arch 7.5) is supported by CUDA 10 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "10.0")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_75,code=[sm_75,compute_75]")
+    endif()
+    # Ampere (GPU Arch 8.0) is supported by CUDA 11 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "11.0")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_80,code=[sm_80,compute_80]")
+    endif()
+    # Ampere (GPU Arch 8.6) is supported by CUDA 11.1 and later
+    if(CUDA_VERSION VERSION_GREATER_EQUAL "11.1")
+      string(APPEND GPU_CUDA_GENCODE " -gencode arch=compute_86,code=[sm_86,compute_86]")
+    endif()
  endif()

  cuda_compile_fatbin(GPU_GEN_OBJS ${GPU_LIB_CU} OPTIONS ${CUDA_REQUEST_PIC}
@ -214,13 +217,20 @@ elseif(GPU_API STREQUAL "OPENCL")
 elseif(GPU_API STREQUAL "HIP")
  if(NOT DEFINED HIP_PATH)
      if(NOT DEFINED ENV{HIP_PATH})
-          set(HIP_PATH "/opt/rocm/hip" CACHE PATH "Path to which HIP has been installed")
+          set(HIP_PATH "/opt/rocm/hip" CACHE PATH "Path to HIP installation")
      else()
-          set(HIP_PATH $ENV{HIP_PATH} CACHE PATH "Path to which HIP has been installed")
+          set(HIP_PATH $ENV{HIP_PATH} CACHE PATH "Path to HIP installation")
      endif()
  endif()
-  set(CMAKE_MODULE_PATH "${HIP_PATH}/cmake" ${CMAKE_MODULE_PATH})
-  find_package(HIP REQUIRED)
+  if(NOT DEFINED ROCM_PATH)
+      if(NOT DEFINED ENV{ROCM_PATH})
+          set(ROCM_PATH "/opt/rocm" CACHE PATH "Path to ROCm installation")
+      else()
+          set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "Path to ROCm installation")
+      endif()
+  endif()
+  list(APPEND CMAKE_PREFIX_PATH ${HIP_PATH} ${ROCM_PATH})
+  find_package(hip REQUIRED)
  option(HIP_USE_DEVICE_SORT "Use GPU sorting" ON)

  if(NOT DEFINED HIP_PLATFORM)
@ -322,10 +332,11 @@ elseif(GPU_API STREQUAL "HIP")

  set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES "${LAMMPS_LIB_BINARY_DIR}/gpu/*_cubin.h ${LAMMPS_LIB_BINARY_DIR}/gpu/*.cu.cpp")

-  hip_add_library(gpu STATIC ${GPU_LIB_SOURCES})
+  add_library(gpu STATIC ${GPU_LIB_SOURCES})
  target_include_directories(gpu PRIVATE ${LAMMPS_LIB_BINARY_DIR}/gpu)
  target_compile_definitions(gpu PRIVATE -D_${GPU_PREC_SETTING} -DMPI_GERYON -DUCL_NO_EXIT)
  target_compile_definitions(gpu PRIVATE -DUSE_HIP)
+  target_link_libraries(gpu PRIVATE hip::host)

  if(HIP_USE_DEVICE_SORT)
    # add hipCUB
@ -374,8 +385,9 @@ elseif(GPU_API STREQUAL "HIP")
    endif()
  endif()

-  hip_add_executable(hip_get_devices ${LAMMPS_LIB_SOURCE_DIR}/gpu/geryon/ucl_get_devices.cpp)
+  add_executable(hip_get_devices ${LAMMPS_LIB_SOURCE_DIR}/gpu/geryon/ucl_get_devices.cpp)
  target_compile_definitions(hip_get_devices PRIVATE -DUCL_HIP)
+  target_link_libraries(hip_get_devices hip::host)

  if(HIP_PLATFORM STREQUAL "nvcc")
    target_compile_definitions(gpu PRIVATE -D__HIP_PLATFORM_NVCC__)
--- a/cmake/Modules/Packages/KOKKOS.cmake
+++ b/cmake/Modules/Packages/KOKKOS.cmake
@ -1,6 +1,8 @@
 ########################################################################
 # As of version 3.3.0 Kokkos requires C++14
-set(CMAKE_CXX_STANDARD 14)
+if(CMAKE_CXX_STANDARD LESS 14)
+  set(CMAKE_CXX_STANDARD 14)
+endif()
 ########################################################################
 # consistency checks and Kokkos options/settings required by LAMMPS
 if(Kokkos_ENABLE_CUDA)
--- a/cmake/Modules/Packages/LATTE.cmake
+++ b/cmake/Modules/Packages/LATTE.cmake
@ -19,6 +19,14 @@ if(DOWNLOAD_LATTE)
  set(LATTE_MD5 "820e73a457ced178c08c71389a385de7" CACHE STRING "MD5 checksum of LATTE tarball")
  mark_as_advanced(LATTE_URL)
  mark_as_advanced(LATTE_MD5)
+
+  # CMake cannot pass BLAS or LAPACK library variable to external project if they are a list
+  list(LENGTH BLAS_LIBRARIES NUM_BLAS)
+  list(LENGTH LAPACK_LIBRARIES NUM_LAPACK)
+  if((NUM_BLAS GREATER 1) OR (NUM_LAPACK GREATER 1))
+    message(FATAL_ERROR "Cannot compile downloaded LATTE library due to a technical limitation")
+  endif()
+
  include(ExternalProject)
  ExternalProject_Add(latte_build
    URL     ${LATTE_URL}
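
Reviewer note on the guard added in this hunk: `list(LENGTH ...)` takes the name of a list variable, not its expanded value, and stores the element count in the output variable. The following is a minimal standalone sketch of that check, not the project's code. `DEMO_BLAS_LIBRARIES` and `NUM_DEMO_BLAS` are hypothetical stand-ins for the variables used above, and the library paths are made up for illustration.

```cmake
# Standalone sketch; run in script mode with: cmake -P list_length_check.cmake
# DEMO_BLAS_LIBRARIES is a hypothetical stand-in for BLAS_LIBRARIES.
# A semicolon-separated string is a CMake list with two elements here.
set(DEMO_BLAS_LIBRARIES "/usr/lib/libblas.so;/usr/lib/libm.so")

# list(LENGTH) expects the *name* of the list variable, not "${...}"
list(LENGTH DEMO_BLAS_LIBRARIES NUM_DEMO_BLAS)

# More than one element means the value cannot be handed to an
# external project as a single library argument.
if(NUM_DEMO_BLAS GREATER 1)
  message(STATUS "list has ${NUM_DEMO_BLAS} entries and cannot be passed as a single value")
endif()
```

In the actual module the same test ends in `message(FATAL_ERROR ...)` so that configuration stops before the external project is set up.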
--- a/cmake/Modules/Packages/ML-HDNNP.cmake
+++ b/cmake/Modules/Packages/ML-HDNNP.cmake
@ -45,12 +45,12 @@ if(DOWNLOAD_N2P2)
    # get path to MPI include directory when cross-compiling to windows
    if((CMAKE_SYSTEM_NAME STREQUAL Windows) AND CMAKE_CROSSCOMPILING)
      get_target_property(N2P2_MPI_INCLUDE MPI::MPI_CXX INTERFACE_INCLUDE_DIRECTORIES)
-      set(N2P2_PROJECT_OPTIONS "-I ${N2P2_MPI_INCLUDE} -DMPICH_SKIP_MPICXX=1")
+      set(N2P2_PROJECT_OPTIONS "-I${N2P2_MPI_INCLUDE}")
      set(MPI_CXX_COMPILER ${CMAKE_CXX_COMPILER})
    endif()
    if(CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
      get_target_property(N2P2_MPI_INCLUDE MPI::MPI_CXX INTERFACE_INCLUDE_DIRECTORIES)
-      set(N2P2_PROJECT_OPTIONS "-I ${N2P2_MPI_INCLUDE} -DMPICH_SKIP_MPICXX=1")
+      set(N2P2_PROJECT_OPTIONS "-I${N2P2_MPI_INCLUDE}")
      set(MPI_CXX_COMPILER ${CMAKE_CXX_COMPILER})
    endif()
  endif()
@ -69,6 +69,12 @@ if(DOWNLOAD_N2P2)
  # echo final flag for debugging
  message(STATUS "N2P2 BUILD OPTIONS: ${N2P2_BUILD_OPTIONS}")

+  # must have "sed" command to compile n2p2 library (for now)
+  find_program(HAVE_SED sed)
+  if(NOT HAVE_SED)
+    message(FATAL_ERROR "Must have 'sed' program installed to compile 'n2p2' library for ML-HDNNP package")
+  endif()
+
  # download compile n2p2 library. much patch MPI calls in LAMMPS interface to accommodate MPI-2 (e.g. for cross-compiling)
  include(ExternalProject)
  ExternalProject_Add(n2p2_build
--- a/cmake/Modules/Packages/ML-QUIP.cmake
+++ b/cmake/Modules/Packages/ML-QUIP.cmake
@ -38,7 +38,7 @@ if(DOWNLOAD_QUIP)
  set(temp "${temp}HAVE_LOCAL_E_MIX=0\nHAVE_QC=0\nHAVE_GAP=1\nHAVE_DESCRIPTORS_NONCOMMERCIAL=1\n")
  set(temp "${temp}HAVE_TURBOGAP=0\nHAVE_QR=1\nHAVE_THIRDPARTY=0\nHAVE_FX=0\nHAVE_SCME=0\nHAVE_MTP=0\n")
  set(temp "${temp}HAVE_MBD=0\nHAVE_TTM_NF=0\nHAVE_CH4=0\nHAVE_NETCDF4=0\nHAVE_MDCORE=0\nHAVE_ASAP=0\n")
-  set(temp "${temp}HAVE_CGAL=0\nHAVE_METIS=0\nHAVE_LMTO_TBE=0\n")
+  set(temp "${temp}HAVE_CGAL=0\nHAVE_METIS=0\nHAVE_LMTO_TBE=0\nHAVE_SCALAPACK=0\n")
  file(WRITE ${CMAKE_BINARY_DIR}/quip.config "${temp}")

  message(STATUS "QUIP download via git requested - we will build our own")
@ -50,7 +50,7 @@ if(DOWNLOAD_QUIP)
    GIT_TAG origin/public
    GIT_SHALLOW YES
    GIT_PROGRESS YES
-    PATCH_COMMAND cp ${CMAKE_BINARY_DIR}/quip.config <SOURCE_DIR>/arch/Makefile.lammps
+    PATCH_COMMAND ${CMAKE_COMMAND} -E copy_if_different ${CMAKE_BINARY_DIR}/quip.config <SOURCE_DIR>/arch/Makefile.lammps
    CONFIGURE_COMMAND env QUIP_ARCH=lammps make config
    BUILD_COMMAND env QUIP_ARCH=lammps make libquip
    INSTALL_COMMAND ""
--- a/cmake/Modules/Packages/MSCG.cmake
+++ b/cmake/Modules/Packages/MSCG.cmake
@ -12,6 +12,13 @@ if(DOWNLOAD_MSCG)
  mark_as_advanced(MSCG_URL)
  mark_as_advanced(MSCG_MD5)

+  # CMake cannot pass BLAS or LAPACK library variable to external project if they are a list
+  list(LENGTH BLAS_LIBRARIES NUM_BLAS)
+  list(LENGTH LAPACK_LIBRARIES NUM_LAPACK)
+  if((NUM_BLAS GREATER 1) OR (NUM_LAPACK GREATER 1))
+    message(FATAL_ERROR "Cannot compile downloaded MSCG library due to a technical limitation")
+  endif()
+
  include(ExternalProject)
  ExternalProject_Add(mscg_build
    URL     ${MSCG_URL}
--- a/cmake/Modules/Packages/SCAFACOS.cmake
+++ b/cmake/Modules/Packages/SCAFACOS.cmake
@ -23,6 +23,11 @@ if(DOWNLOAD_SCAFACOS)
  file(DOWNLOAD ${LAMMPS_THIRDPARTY_URL}/scafacos-1.0.1-fix.diff ${CMAKE_CURRENT_BINARY_DIR}/scafacos-1.0.1.fix.diff
          EXPECTED_HASH MD5=4baa1333bb28fcce102d505e1992d032)

+  find_program(HAVE_PATCH patch)
+  if(NOT HAVE_PATCH)
+    message(FATAL_ERROR "The 'patch' program is required to build the ScaFaCoS library")
+  endif()
+
  include(ExternalProject)
  ExternalProject_Add(scafacos_build
    URL     ${SCAFACOS_URL}
--- a/cmake/Modules/Packages/VORONOI.cmake
+++ b/cmake/Modules/Packages/VORONOI.cmake
@ -26,6 +26,11 @@ if(DOWNLOAD_VORO)
    set(VORO_BUILD_OPTIONS CXX=${CMAKE_CXX_COMPILER} CFLAGS=${VORO_BUILD_CFLAGS})
  endif()

+  find_program(HAVE_PATCH patch)
+  if(NOT HAVE_PATCH)
+    message(FATAL_ERROR "The 'patch' program is required to build the voro++ library")
+  endif()
+
  ExternalProject_Add(voro_build
    URL     ${VORO_URL}
    URL_MD5 ${VORO_MD5}
--- a/cmake/iwyu/iwyu-extra-map.imp
+++ b/cmake/iwyu/iwyu-extra-map.imp
@ -1,7 +1,27 @@
 [
-  { include: [ "<bits/types/struct_rusage.h>", private, "<sys/resource.h>", public ] },
-  { include: [ "<bits/exception.h>", public, "<exception>", public ] },
   { include: [ "@<Eigen/.*>", private, "<Eigen/Eigen>", public ] },
   { include: [ "@<gtest/.*>", private, "\"gtest/gtest.h\"", public ] },
   { include: [ "@<gmock/.*>", private, "\"gmock/gmock.h\"", public ] },
+  { include: [ "@<(cell|c_loops|container).hh>", private, "<voro++.hh>", public ] },
+  { include: [ "@\"atom_vec_.*.h\"", public, "\"style_atom.h\"", public ] },
+  { include: [ "@\"body_.*.h\"", public, "\"style_body.h\"", public ] },
+  { include: [ "@\"compute_.*.h\"", public, "\"style_compute.h\"", public ] },
+  { include: [ "@\"fix_.*.h\"", public, "\"style_fix.h\"", public ] },
+  { include: [ "@\"dump_.*.h\"", public, "\"style_dump.h\"", public ] },
+  { include: [ "@\"min_.*.h\"", public, "\"style_minimize.h\"", public ] },
+  { include: [ "@\"reader_.*.h\"", public, "\"style_reader.h\"", public ] },
+  { include: [ "@\"region_.*.h\"", public, "\"style_region.h\"", public ] },
+  { include: [ "@\"pair_.*.h\"", public, "\"style_pair.h\"", public ] },
+  { include: [ "@\"angle_.*.h\"", public, "\"style_angle.h\"", public ] },
+  { include: [ "@\"bond_.*.h\"", public, "\"style_bond.h\"", public ] },
+  { include: [ "@\"dihedral_.*.h\"", public, "\"style_dihedral.h\"", public ] },
+  { include: [ "@\"improper_.*.h\"", public, "\"style_improper.h\"", public ] },
+  { include: [ "@\"kspace_.*.h\"", public, "\"style_kspace.h\"", public ] },
+  { include: [ "@\"nbin_.*.h\"", public, "\"style_nbin.h\"", public ] },
+  { include: [ "@\"npair_.*.h\"", public, "\"style_npair.h\"", public ] },
+  { include: [ "@\"nstencil_.*.h\"", public, "\"style_nstencil.h\"", public ] },
+  { include: [ "@\"ntopo_.*.h\"", public, "\"style_ntopo.h\"", public ] },
+  { include: [ "<float.h>",   public, "<cfloat>", public ] },
+  { include: [ "<limits.h>",  public, "<climits>", public ] },
+  { include: [ "<bits/types/struct_tm.h>", private, "<ctime>", public ] },
 ]
--- a/cmake/presets/hip_amd.cmake
+++ b/cmake/presets/hip_amd.cmake
@ -0,0 +1,30 @@
+# preset that will enable hip (clang/clang++) with support for MPI and OpenMP (on Linux boxes)
+
+# prefer flang over gfortran, if available
+find_program(CLANG_FORTRAN NAMES flang gfortran f95)
+set(ENV{OMPI_FC} ${CLANG_FORTRAN})
+
+set(CMAKE_CXX_COMPILER "hipcc" CACHE STRING "" FORCE)
+set(CMAKE_C_COMPILER "hipcc" CACHE STRING "" FORCE)
+set(CMAKE_Fortran_COMPILER ${CLANG_FORTRAN} CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS_DEBUG "-Wall -Wextra -g" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-Wall -Wextra -g -O2 -DNDEBUG" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS_RELEASE "-O3 -DNDEBUG" CACHE STRING "" FORCE)
+set(CMAKE_Fortran_FLAGS_DEBUG "-Wall -Wextra -g -std=f2003" CACHE STRING "" FORCE)
+set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-Wall -Wextra -g -O2 -DNDEBUG -std=f2003" CACHE STRING "" FORCE)
+set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -DNDEBUG -std=f2003" CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS_DEBUG "-Wall -Wextra -g" CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS_RELWITHDEBINFO "-Wall -Wextra -g -O2 -DNDEBUG" CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS_RELEASE "-O3 -DNDEBUG" CACHE STRING "" FORCE)
+
+set(MPI_CXX "hipcc" CACHE STRING "" FORCE)
+set(MPI_CXX_COMPILER "mpicxx" CACHE STRING "" FORCE)
+
+unset(HAVE_OMP_H_INCLUDE CACHE)
+set(OpenMP_C "hipcc" CACHE STRING "" FORCE)
+set(OpenMP_C_FLAGS "-fopenmp" CACHE STRING "" FORCE)
+set(OpenMP_C_LIB_NAMES "omp" CACHE STRING "" FORCE)
+set(OpenMP_CXX "hipcc" CACHE STRING "" FORCE)
+set(OpenMP_CXX_FLAGS "-fopenmp" CACHE STRING "" FORCE)
+set(OpenMP_CXX_LIB_NAMES "omp" CACHE STRING "" FORCE)
+set(OpenMP_omp_LIBRARY "libomp.so" CACHE PATH "" FORCE)
--- a/cmake/presets/most.cmake
+++ b/cmake/presets/most.cmake
@ -24,6 +24,7 @@ set(ALL_PACKAGES
  DRUDE
  EFF
  EXTRA-COMPUTE
+  EXTRA-DUMP
  EXTRA-FIX
  EXTRA-MOLECULE
  EXTRA-PAIR
--- a/doc/github-development-workflow.md
+++ b/doc/github-development-workflow.md
@ -6,7 +6,7 @@ choices the LAMMPS developers have agreed on. Git and GitHub provide the
 tools, but do not set policies, so it is up to the developers to come to
 an agreement as to how to define and interpret policies. This document
 is likely to change as our experiences and needs change and we try to
-adapt accordingly. Last change 2018-12-19.
+adapt accordingly. Last change 2021-09-02.

 ## Table of Contents

@ -23,10 +23,10 @@ adapt accordingly. Last change 2018-12-19.

 In the interest of consistency, ONLY ONE of the core LAMMPS developers
 should doing the merging itself.  This is currently
-[@akohlmey](https://github.com/akohlmey) (Axel Kohlmeyer).
-If this assignment needs to be changed, it shall be done right after a
-stable release.  If the currently assigned developer cannot merge outstanding pull
-requests in a timely manner, or in other extenuating circumstances,
+[@akohlmey](https://github.com/akohlmey) (Axel Kohlmeyer).  If this
+assignment needs to be changed, it shall be done right after a stable
+release.  If the currently assigned developer cannot merge outstanding
+pull requests in a timely manner, or in other extenuating circumstances,
 other core LAMMPS developers with merge rights can merge pull requests,
 when necessary.

@ -55,13 +55,14 @@ the required changes or ask the submitter of the pull request to implement
 them.  Even though, all LAMMPS developers may have write access to pull
 requests (if enabled by the submitter, which is the default), only the
 submitter or the assignee of a pull request may do so.  During this
-period the `work_in_progress` label shall be applied to the pull
+period the `work_in_progress` label may be applied to the pull
 request.  The assignee gets to decide what happens to the pull request
 next, e.g. whether it should be assigned to a different developer for
 additional checks and changes, or is recommended to be merged.  Removing
 the `work_in_progress` label and assigning the pull request to the
 developer tasked with merging signals that a pull request is ready to be
-merged.
+merged.  In addition, a `ready_for_merge` label may be assigned
+to signal that this pull request should be merged quickly.

 ### Pull Request Reviews

@ -97,108 +98,50 @@ rationale behind choices made.  Exceptions to this policy are technical
 discussions, that are centered on tools or policies themselves
 (git, GitHub, c++) rather than on the content of the pull request.

-### Checklist for Pull Requests
-
-Here are some items to check:
-  * source and text files should not have CR/LF line endings (use dos2unix to remove)
-  * every new command or style should have documentation. The names of
-  source files (c++ and manual) should follow the name of the style.
-  (example: `src/fix_nve.cpp`, `src/fix_nve.h` for `fix nve` command,
-  implementing the class `FixNVE`, documented in `doc/src/fix_nve.rst`)
-  * all new style names should be lower case, the must be no dashes,
-  blanks, or underscores separating words, only forward slashes.
-  * new style docs should be added to the "overview" files in
-  `doc/src/Commands_*.rst`, `doc/src/{fixes,computes,pairs,bonds,...}.rst`
-  * check whether manual cleanly translates with `make html` and `make pdf`
-  * if documentation is (still) provided as a .txt file, convert to .rst
-  and remove the .txt file. For files in doc/txt the conversion is automatic.
-  * remove all .txt files in `doc/txt` that are out of sync with their .rst counterparts in `doc/src`
-  * check spelling of manual with `make spelling` in doc folder
-  * check style tables and command lists with `make style_check`
-  * new source files in packages should be added to `src/.gitignore`
-  * removed or renamed files in packages should be added to `src/Purge.list`
-  * C++ source files should use C++ style include files for accessing
-  C-library APIs, e.g. `#include <cstdlib>` instead of `#include <stdlib.h>`.
-  And they should use angular brackets instead of double quotes. Full list:
-    * assert.h -> cassert
-    * ctype.h -> cctype
-    * errno.h -> cerrno
-    * float.h -> cfloat
-    * limits.h -> climits
-    * math.h -> cmath
-    * complex.h -> complex
-    * setjmp.h -> csetjmp
-    * signal.h -> csignal
-    * stddef.h -> cstddef
-    * stdint.h -> cstdint
-    * stdio.h -> cstdio
-    * stdlib.h -> cstdlib
-    * string.h -> cstring
-    * time.h -> ctime
-    * Do NOT replace (as they are C++-11): `inttypes.h` and `stdint.h`.
-  * Code must follow the C++-11 standard. C++98-only is no longer accepted
-  * Code should use `nullptr` instead of `NULL` where applicable.
-  in individual special purpose packages
-  * indentation is 2 spaces per level
-  * there should be NO tabs and no trailing whitespace (review the "checkstyle" test on pull requests)
-  * header files, especially of new styles, should not include any
-  other headers, except the header with the base class or cstdio.
-  Forward declarations should be used instead when possible.
-  * iostreams should be avoided. LAMMPS uses stdio from the C-library.
-  * use of STL in headers and class definitions should be avoided.
-  exception is <string>, but it won't need to be explicitly included
-  since pointers.h already includes it. so std::string can be used directly.
-  * there MUST NOT be any "using namespace XXX;" statements in headers.
-  * static class members should be avoided at all cost.
-  * anything storing atom IDs should be using `tagint` and not `int`.
-  This can be flagged by the compiler only for pointers and only when
-  compiling LAMMPS with `-DLAMMPS_BIGBIG`.
-  * when including both `lmptype.h` (and using defines or macros from it)
-  and `mpi.h`, `lmptype.h` must be included first.
-  * see https://github.com/lammps/lammps/blob/master/doc/include-file-conventions.md
-  for general include file conventions and best practices
-  * when pair styles are added, check if settings for flags like
-  `single_enable`, `writedata`, `reinitflag`, `manybody_flag`
-  and others are correctly set and supported.
-
 ## GitHub Issues

 The GitHub issue tracker is the location where the LAMMPS developers
 and other contributors or LAMMPS users can report issues or bugs with
-the LAMMPS code or request new features to be added. Feature requests
-are usually indicated by a `[Feature Request]` marker in the subject.
-Issues are assigned to a person, if this person is working on this
-feature or working to resolve an issue. Issues that have nobody working
-on them at the moment, have the label `volunteer needed` attached.
+the LAMMPS code or request new features to be added. Bug reports have
+a `[Bug]` marker in the subject line; suggestions for changes or
+adding new functionality are indicated by a `[Feature Request]`
+marker in the subject. This is automatically done when using the
+corresponding template for submitting an issue.  Issues may be assigned
+to one or more developers if they are working on the requested feature
+or working to resolve the issue.  Issues that have nobody working
+on them at the moment or in the near future have the label
+`volunteer needed` attached.

 When an issue, say `#125` is resolved by a specific pull request,
 the comment for the pull request shall contain the text `closes #125`
 or `fixes #125`, so that the issue is automatically deleted when
-the pull request is merged.
+the pull request is merged.  The template for pull requests includes
+a header where connections between pull requests and issues can be listed
+and thus where this comment should be placed.

 ## Milestones and Release Planning

 LAMMPS uses a continuous release development model with incremental
 changes, i.e. significant effort is made - including automated pre-merge
-testing - that the code in the branch "master" does not get broken.
-More extensive testing (including regression testing) is performed after
-code is merged to the "master" branch. There are patch releases of
-LAMMPS every 1-3 weeks at a point, when the LAMMPS developers feel, that
-a sufficient amount of changes have happened, and the post-merge testing
-has been successful. These patch releases are marked with a
-`patch_<version date>` tag and the "unstable" branch follows only these
-versions (and thus is always supposed to be of production quality,
-unlike "master", which may be temporary broken, in the case of larger
-change sets or unexpected incompatibilities or side effects.
+testing - that the code in the branch "master" does not get easily
+broken.  These tests are run after every update to a pull request.  More
+extensive and time-consuming tests (including regression testing) are
+performed after code is merged to the "master" branch. There are patch
+releases of LAMMPS every 3-5 weeks, at a point when the LAMMPS
+developers feel that a sufficient number of changes have accumulated and
+the post-merge testing has been successful. These patch releases are
+marked with a `patch_<version date>` tag and the "unstable" branch
+follows only these versions (and thus is always supposed to be of
+production quality, unlike "master", which may be temporarily broken in
+the case of larger change sets or unexpected incompatibilities or side
+effects).

-About 3-4 times each year, there are going to be "stable" releases
-of LAMMPS.  These have seen additional, manual testing and review of
+About 1-2 times each year, there are going to be "stable" releases of
+LAMMPS.  These have seen additional, manual testing and review of
 results from testing with instrumented code and static code analysis.
-Also, in the last 2-3 patch releases before a stable release are
-"release candidate" versions which only contain bugfixes and
-documentation updates.  For release planning and the information of
-code contributors, issues and pull requests being actively worked on
-are assigned a "milestone", which corresponds to the next stable
-release or the stable release after that, with a tentative release
-date.
-
+Also, the last 1-3 patch releases before a stable release are "release
+candidate" versions which only contain bugfixes and documentation
+updates.  For release planning and the information of code contributors,
+issues and pull requests being actively worked on are assigned a
+"milestone", which corresponds to the next stable release or the stable
+release after that, with a tentative release date.
--- a/doc/include-file-conventions.md
+++ b/doc/include-file-conventions.md
@ -1,128 +0,0 @@
-# Outline of include file conventions in LAMMPS
-
-This purpose of this document is to provide a point of reference
-for LAMMPS developers and contributors as to what include files
-and definitions to put where into LAMMPS source.
-Last change 2020-08-31
-
-## Table of Contents
-
-  * [Motivation](#motivation)
-  * [Rules](#rules)
-  * [Tools](#tools)
-  * [Legacy Code](#legacy-code)
-
-## Motivation
-
-The conventions outlined in this document are supposed to help make
-maintenance of the LAMMPS software easier.  By trying to achieve
-consistency across files contributed by different developers, it will
-become easier for the code maintainers to modify and adjust files and,
-overall, the chance for errors or portability issues will be reduced.
-The rules employed are supposed to minimize naming conflicts and
-simplify dependencies between files and thus speed up compilation. They
-may, as well, make otherwise hidden dependencies visible.
-
-## Rules
-
-Below are the various rules that are applied.  Not all are enforced
-strictly and automatically.  If there are no significant side effects,
-exceptions may be possible for cases where a full compliance to the
-rules may require a large effort compared to the benefit.
-
-### Core Files Versus Package Files
-
-All rules listed below are most strictly observed for core LAMMPS files,
-which are the files that are not part of a package, and the files of the
-packages MOLECULE, MANYBODY, KSPACE, and RIGID.  On the other end of
-the spectrum are USER packages and legacy packages that predate these
-rules and thus may not be fully compliant.  Also, new contributions
-will be checked more closely, while existing code will be incrementally
-adapted to the rules as time and required effort permits.
-
-### System Versus Local Header Files
-
-All system- or library-provided include files are included with angular
-brackets (examples: `#include <cstring>` or `#include <mpi.h>`) while
-include files provided with LAMMPS are included with double quotes
-(examples: `#include "pointers.h"` or `#include "compute_temp.h"`).
-
-For headers declaring functions of the C-library, the corresponding
-C++ versions should be included (examples: `#include <cstdlib>` or
-`#include <cctypes>` instead of `#include <stdlib.h>` or
-`#include<ctypes.h>` ).
-
-### C++ Standard Compliance
-
-LAMMPS core files use standard conforming C++ compatible with the
-C++11 standard, unless explicitly noted.  Also, LAMMPS uses the C-style
-stdio library for I/O instead of iostreams.  Since using both at the
-same time can cause problems, iostreams should be avoided where possible.
-
-### Lean Header Files
-
-Header files will typically contain the definition of a (single) class.
-These header files should have as few include statements as possible.
-This is particularly important for classes that implement a "style" and
-thus use a macro of the kind `SomeStyle(some/name,SomeName)`. These will
-all be included in the auto-generated `"some_style.h"` files which
-results in a high potential for direct or indirect symbol name clashes.
-
-In the ideal case, the header would only include one file defining the
-parent class. That would typically be either `#include "pointers.h"` for
-the `Pointers` class, or a header of a class derived from it like
-`#include "pair.h"` for the `Pair` class and so on.  References to other
-classes inside the class should be make through pointers, for which forward
-declarations (inside the `LAMMPS_NS` or the new class' namespace) can
-be employed.  The full definition will then be included into the corresponding
-implementation file.  In the given example from above, the header file
-would be called `some_name.h` and the implementation `some_name.cpp` (all
-lower case with underscores, while the class itself would be in camel case
-and no underscores `SomeName`, and the style name with lower case names separated by
-a forward slash).
-
-### Implementation Files
-
-In the implementation files (typically, those would have the same base name
-as the corresponding header with a .cpp extension instead of .h) include
-statements should follow the "include what you use" principle.
-
-### Order of Include Statements
-
-Include files should be included in this order:
-* the header matching the implementation (`some_class.h` for file `some_class.cpp`)
-* mpi.h  (only if needed)
-* LAMMPS local headers (preferably in alphabetical order)
-* system and library headers (anything that is using angular brackets; preferably in alphabetical order)
-* conditional include statements (i.e. anything bracketed with ifdefs)
-
-### Special Cases and Exceptions
-
-#### pointers.h
-
-The `pointer.h` header file also includes (in this order) `lmptype.h`,
-`mpi.h`, `cstddef`, `cstdio`, `string`, `utils.h`, and `fmt/format.h`
-and through `lmptype.h` indirectly also `climits`, `cstdlib`, `cinttypes`.
-This means any header including `pointers.h` can assume that `FILE`,
-`NULL`, `INT_MAX` are defined, and the may freely use the std::string
-for arguments. Corresponding implementation files do not need to include
-those headers.
-
-## Tools
-
-The [Include What You Use tool](https://include-what-you-use.org/)
-can be used to provide supporting information about compliance with
-the rules listed here.  Through setting `-DENABLE_IWYU=on` when running
-CMake, a custom build target is added that will enable recording
-the compilation commands and then run the `iwyu_tool` using the
-recorded compilation commands information when typing `make iwyu`.
-
-## Legacy Code
-
-A lot of code predates the application of the rules in this document
-and the rules themselves are a moving target.  So there are going to be
-significant chunks of code that do not fully comply.  This applies
-for example to the REAXFF, or the ATC package.  The LAMMPS
-developers are dedicated to make an effort to improve the compliance
-and welcome volunteers wanting to help with the process.
-
--- a/doc/lammps.1
+++ b/doc/lammps.1
@ -1,4 +1,4 @@
-.TH LAMMPS "31 August 2021" "2021-08-31"
+.TH LAMMPS "29 September 2021" "2021-09-29"
 .SH NAME
 .B LAMMPS
 \- Molecular Dynamics Simulator.
@ -54,7 +54,7 @@ using
 this <machine name> parameter can be chosen arbitrarily at configuration
 time, but more common is to just use
 .B lmp
-without a suffix. In this manpage we will use
+without a suffix. In this man page we will use
 .B lmp
 to represent any of those names.

@ -94,7 +94,7 @@ Enable or disable general KOKKOS support, as provided by the KOKKOS
 package.  Even if LAMMPS is built with this package, this switch must
 be set to \fBon\fR to enable running with KOKKOS-enabled styles. More
 details on this switch and its optional keyword value pairs are discussed
-at: https://lammps.sandia.gov/doc/Run_options.html
+at: https://docs.lammps.org/Run_options.html
 .TP
 \fB\-l <log file>\fR or \fB\-log <log file>\fR
 Specify a log file for LAMMPS to write status information to.
@ -122,6 +122,38 @@ to perform client/server messaging with another application.
 .B LAMMPS
 can act as either a client or server (or both).
 .TP
+\fB\-mdi '<mdi_flags>'\fR
+This flag is only recognized and used when
+.B LAMMPS
+has support for the MolSSI
+Driver Interface (MDI) included as part of the MDI package.  This flag is
+specific to the MDI library and controls how
+.B LAMMPS
+interacts with MDI.  There are usually multiple flags that have to follow it
+and those have to be placed in quotation marks.  For more information about
+how to launch LAMMPS in MDI client/server mode please refer to the
+MDI How-to at  https://docs.lammps.org/Howto_mdi.html
+.TP
+\fB\-c\fR or \fB\-cite <style or filename>\fR
+Select how and where to output a reminder about citing contributions
+to the
+.B LAMMPS
+code that were used during the run. Available keywords
+for styles are "both", "none", "screen", or "log".  Any other keyword
+will be considered a file name to write the detailed citation info to
+instead of logfile or screen.  Default is the "log" style where there
+is a short summary in the screen output and detailed citations
+in BibTeX format in the logfile.  The option "both" selects the detailed
+output for both, "none", the short output for both, and "screen" will
+write the detailed info to the screen and the short version to the log
+file.  If a dedicated citation info file is requested, the screen and
+log file output will be in the short format (same as with "none").
+
+See https://docs.lammps.org/Intro_citing.html for more details on
+how to correctly reference and cite
+.B LAMMPS
+.
+.TP
 \fB\-nc\fR or \fB\-nocite\fR
 Disable writing the "log.cite" file which is normally written to
 list references for specific cite-able features used during a
@ -202,7 +234,7 @@ the standard output. If <file name> is "none", (most) screen
 output will be suppressed.  In multi-partition mode only
 some high-level all-partition information is written to the
 screen or "<file name>" file, the remainder is written in a
-per-partition file "screen.N" or "<file name>.N" 
+per-partition file "screen.N" or "<file name>.N"
 with "N" being the respective partition number, and unless
 overridden by the \-pscreen flag (see above).
 .TP
@ -218,8 +250,19 @@ and then "omp") and thus requires two arguments. Along with the
 "-package" command-line switch, this is a convenient mechanism for
 invoking styles from accelerator packages and setting their options
 without having to edit an input script.
+.TP
+\fB\-sr\fR or \fB\-skiprun\fR
+Insert the command "timer timeout 0 every 1" at the
+beginning of an input file or after a "clear" command.
+This has the effect that the entire
+.B LAMMPS
+input script is processed without executing actual
+"run" or "minimize" or similar commands (their main loops are skipped).
+This can be helpful and convenient to test input scripts of long running
+calculations for correctness to avoid having them crash after a
+long time due to a typo or syntax error in the middle or at the end.

-See https://lammps.sandia.gov/doc/Run_options.html for additional
+See https://docs.lammps.org/Run_options.html for additional
 details and discussions on command-line options.

 .SH LAMMPS BASICS
@ -254,7 +297,7 @@ the chapter on errors in the
 manual gives some additional information about error messages, if possible.

 .SH COPYRIGHT
-© 2003--2020 Sandia Corporation
+© 2003--2021 Sandia Corporation

 This package is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License version 2 as
--- a/doc/msi2lmp.1
+++ b/doc/msi2lmp.1
@ -98,7 +98,7 @@ msi2lmp decane -c 0 -f oplsaa


 .SH COPYRIGHT
-© 2003--2019 Sandia Corporation
+© 2003--2021 Sandia Corporation

 This package is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License version 2 as
--- a/doc/src/Build_development.rst
+++ b/doc/src/Build_development.rst
@ -56,16 +56,18 @@ Report missing and unneeded '#include' statements (CMake only)
 --------------------------------------------------------------

 The conventions for how and when to use and order include statements in
-LAMMPS are `documented in a separate file <https://github.com/lammps/lammps/blob/master/doc/include-file-conventions.md>`_
-(also included in the source code distribution).  To assist with following
+LAMMPS are documented in :doc:`Modify_style`.  To assist with following
 these conventions one can use the `Include What You Use tool <https://include-what-you-use.org/>`_.
-This is still under development and for large and complex projects like LAMMPS
+This tool is still under development and for large and complex projects like LAMMPS
 there are some false positives, so suggested changes need to be verified manually.
-It is recommended to use at least version 0.14, which has much fewer incorrect
-reports than earlier versions.
+It is recommended to use at least version 0.16, which has far fewer incorrect
+reports than earlier versions.  To install the IWYU toolkit, you need to have
+the clang compiler **and** its development package installed.  Download the IWYU
+version that matches the version of the clang compiler, configure, build, and
+install it.

-The necessary steps to generate the report can be enabled via a
-CMake variable:
+The necessary steps to generate the report can be enabled via a CMake variable
+during CMake configuration.

 .. code-block:: bash

--- a/doc/src/Build_manual.rst
+++ b/doc/src/Build_manual.rst
@ -22,7 +22,6 @@ files. Here is a list with descriptions:
   .gitignore       # list of files and folders to be ignored by git
   doxygen-warn.log # logfile with warnings from running doxygen
   github-development-workflow.md   # notes on the LAMMPS development workflow
-   include-file-conventions.md      # notes on LAMMPS' include file conventions

 If you downloaded LAMMPS as a tarball from `the LAMMPS website <lws_>`_,
 the html folder and the PDF files should be included.
@ -75,8 +74,8 @@ folder.  The following ``make`` commands are available:
 .. code-block:: bash

   make html          # generate HTML in html dir using Sphinx
-   make pdf           # generate PDF  as Manual.pdf using Sphinx and pdflatex
-   make fetch         # fetch HTML pages and PDF files from LAMMPS web site
+   make pdf           # generate PDF  as Manual.pdf using Sphinx and PDFLaTeX
+   make fetch         # fetch HTML pages and PDF files from LAMMPS website
                      #  and unpack into the html_www folder and Manual_www.pdf
   make epub          # generate LAMMPS.epub in ePUB format using Sphinx
   make mobi          # generate LAMMPS.mobi in MOBI format using ebook-convert
--- a/doc/src/Build_settings.rst
+++ b/doc/src/Build_settings.rst
@ -71,7 +71,8 @@ LAMMPS can use them if they are available on your system.

         -D FFTW3_INCLUDE_DIR=path   # path to FFTW3 include files
         -D FFTW3_LIBRARY=path       # path to FFTW3 libraries
-         -D FFT_FFTW_THREADS=on      # enable using threaded FFTW3 libraries
+         -D FFTW3_OMP_LIBRARY=path   # path to FFTW3 OpenMP wrapper libraries
+         -D FFT_FFTW_THREADS=on      # enable using OpenMP threaded FFTW3 libraries
         -D MKL_INCLUDE_DIR=path     # ditto for Intel MKL library
         -D FFT_MKL_THREADS=on       # enable using threaded FFTs with MKL libraries
         -D MKL_LIBRARY=path         # path to MKL libraries
--- a/doc/src/Commands_input.rst
+++ b/doc/src/Commands_input.rst
@ -1,55 +1,75 @@
 LAMMPS input scripts
 ====================

-LAMMPS executes by reading commands from a input script (text file),
-one line at a time.  When the input script ends, LAMMPS exits.  Each
-command causes LAMMPS to take some action.  It may set an internal
-variable, read in a file, or run a simulation.  Most commands have
-default settings, which means you only need to use the command if you
-wish to change the default.
+LAMMPS executes calculations by reading commands from an input script (text file), one
+line at a time.  When the input script ends, LAMMPS exits.  This is different
+from programs that read and process the entire input before starting a calculation.
+
+Each command causes LAMMPS to take some immediate action without regard
+for any commands that may be processed later. Commands may set an
+internal variable, read in a file, or run a simulation.  These actions
+can be grouped into three categories:
+
+a) commands that change a global setting (examples: timestep, newton,
+   echo, log, thermo, restart),
+b) commands that add, modify, remove, or replace "styles" that are
+   executed during a "run" (examples: pair_style, fix, compute, dump,
+   thermo_style, pair_modify), and
+c) commands that execute a "run" or perform some other computation or
+   operation (examples: print, run, minimize, temper, write_dump, rerun,
+   read_data, read_restart)
+
+Commands in category a) have default settings, which means you only
+need to use the command if you wish to change the defaults.

 In many cases, the ordering of commands in an input script is not
-important.  However the following rules apply:
+important, but can have consequences when the global state is changed
+between commands in the c) category. The following rules apply:

 (1) LAMMPS does not read your entire input script and then perform a
-simulation with all the settings.  Rather, the input script is read
-one line at a time and each command takes effect when it is read.
-Thus this sequence of commands:
+    simulation with all the settings.  Rather, the input script is read
+    one line at a time and each command takes effect when it is read.
+    Thus this sequence of commands:

-.. code-block:: LAMMPS
+    .. code-block:: LAMMPS

-   timestep 0.5
-   run      100
-   run      100
+       timestep 0.5
+       run      100
+       run      100

-does something different than this sequence:
+    does something different than this sequence:

-.. code-block:: LAMMPS
+    .. code-block:: LAMMPS

-   run      100
-   timestep 0.5
-   run      100
+       run      100
+       timestep 0.5
+       run      100

-In the first case, the specified timestep (0.5 fs) is used for two
-simulations of 100 timesteps each.  In the second case, the default
-timestep (1.0 fs) is used for the first 100 step simulation and a 0.5 fs
-timestep is used for the second one.
+    In the first case, the specified timestep (0.5 fs) is used for two
+    simulations of 100 timesteps each.  In the second case, the default
+    timestep (1.0 fs) is used for the first 100 step simulation and a
+    0.5 fs timestep is used for the second one.

 (2) Some commands are only valid when they follow other commands.  For
-example you cannot set the temperature of a group of atoms until atoms
-have been defined and a group command is used to define which atoms
-belong to the group.
+    example you cannot set the temperature of a group of atoms until
+    atoms have been defined and a group command is used to define which
+    atoms belong to the group.

 (3) Sometimes command B will use values that can be set by command A.
-This means command A must precede command B in the input script if it
-is to have the desired effect.  For example, the
-:doc:`read_data <read_data>` command initializes the system by setting
-up the simulation box and assigning atoms to processors.  If default
-values are not desired, the :doc:`processors <processors>` and
-:doc:`boundary <boundary>` commands need to be used before read_data to
-tell LAMMPS how to map processors to the simulation box.
+    This means command A must precede command B in the input script if
+    it is to have the desired effect.  For example, the :doc:`read_data
+    <read_data>` command initializes the system by setting up the
+    simulation box and assigning atoms to processors.  If default values
+    are not desired, the :doc:`processors <processors>` and
+    :doc:`boundary <boundary>` commands need to be used before read_data
+    to tell LAMMPS how to map processors to the simulation box.

 Many input script errors are detected by LAMMPS and an ERROR or
 WARNING message is printed.  The :doc:`Errors <Errors>` page gives
 more information on what errors mean.  The documentation for each
 command lists restrictions on how the command can be used.
+
+You can use the :ref:`-skiprun <skiprun>` command line flag
+to have LAMMPS skip the execution of any "run", "minimize", or similar
+commands to check the entire input for correct syntax to avoid crashes
+on typos or syntax errors in long runs.
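The order dependence described in rule (1) of the Commands_input.rst change can be sketched with a toy interpreter. This is only an illustration, not LAMMPS code: the function name `simulated_time` and the constant `DEFAULT_TIMESTEP` are hypothetical, and the default of 1.0 is taken from the example text (the actual default depends on the units in use).

```python
# Toy model of line-at-a-time input processing; this is not LAMMPS code.
DEFAULT_TIMESTEP = 1.0  # assumption: matches the 1.0 fs default quoted in the text


def simulated_time(script):
    """Process commands in order; each one takes effect as soon as it is read."""
    timestep = DEFAULT_TIMESTEP
    elapsed = 0.0
    for line in script.strip().splitlines():
        words = line.split()
        if not words:
            continue
        command, value = words[0], words[1]
        if command == "timestep":
            # category a): change a global setting
            timestep = float(value)
        elif command == "run":
            # category c): perform a run with whatever setting is current
            elapsed += int(value) * timestep
        else:
            raise ValueError(f"unknown command: {command}")
    return elapsed


first = simulated_time("""
timestep 0.5
run      100
run      100
""")

second = simulated_time("""
run      100
timestep 0.5
run      100
""")

print(first, second)
```

In this sketch the first script covers 100 time units and the second 150, because the first `run` of the second script still uses the default timestep.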
--- a/doc/src/Developer.rst
+++ b/doc/src/Developer.rst
@ -11,6 +11,7 @@ of time and requests from the LAMMPS user community.
   :maxdepth: 1

   Developer_org
+   Developer_parallel
   Developer_flow
   Developer_write
   Developer_notes
--- a/doc/src/Developer_par_comm.rst
+++ b/doc/src/Developer_par_comm.rst
@ -0,0 +1,120 @@
+Communication
+^^^^^^^^^^^^^
+
+Following the partitioning scheme in use, all per-atom data (atom IDs,
+positions, velocities, types, etc.) is distributed across the MPI
+processes, which allows LAMMPS to handle very large systems provided it
+uses a correspondingly large number of MPI processes.  To be able to
+compute the short-range interactions, MPI processes need access not only
+to data of atoms they "own" but also to information about atoms from
+neighboring sub-domains, referred to in LAMMPS as "ghost" atoms.
+These are copies of atoms storing required
+per-atom data for up to the communication cutoff distance. The green
+dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
+the extended ghost-atom sub-domain for one processor.
+
+This approach is also used to implement periodic boundary
+conditions: atoms that lie within the cutoff distance across a periodic
+boundary are also stored as ghost atoms and taken from the periodic
+replication of the sub-domain, which may be the same sub-domain, e.g. if
+running in serial.  As a consequence of this, force computation in
+LAMMPS is not subject to minimum image conventions and thus cutoffs may
+be larger than half the simulation domain.
+
+.. _ghost-atom-comm:
+.. figure:: img/ghost-comm.png
+   :align: center
+
+   ghost atom communication
+
+   This figure shows the ghost atom communication patterns between
+   sub-domains for "brick" (left) and "tiled" communication styles for
+   2d simulations.  The numbers indicate MPI process ranks.  Here the
+   sub-domains are drawn spatially separated for clarity.  The
+   dashed-line box is the extended sub-domain of processor 0 which
+   includes its ghost atoms.  The red- and blue-shaded boxes are the
+   regions of communicated ghost atoms.
+
+Efficient communication patterns are needed to update the "ghost" atom
+data, since that needs to be done at every MD time step or minimization
+step.  The diagrams of the :ref:`ghost-atom-comm` figure illustrate how ghost
+atom communication is performed in two stages for a 2d simulation (three
+in 3d) for both a regular and irregular partitioning of the simulation
+box.  For the regular case (left) atoms are exchanged first in the
+*x*-direction, then in *y*, with four neighbors in the grid of processor
+sub-domains.
+
+In the *x* stage, processor ranks 1 and 2 send owned atoms in their
+red-shaded regions to rank 0 (and vice versa).  Then in the *y* stage,
+ranks 3 and 4 send atoms in their blue-shaded regions to rank 0, which
+includes ghost atoms they received in the *x* stage.  Rank 0 thus
+acquires all its ghost atoms; atoms in the solid blue corner regions
+are communicated twice before rank 0 receives them.
+
+For the irregular case (right) the two stages are similar, but a
+processor can have more than one neighbor in each direction.  In the
+*x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
+rank 0 (and vice versa).  These include only atoms between the lower
+and upper *y*-boundary of rank 0's sub-domain.  In the *y* stage, ranks
+4,5,6 send atoms in their blue-shaded regions to rank 0.  This may
+include ghost atoms they received in the *x* stage, but only if they
+are needed by rank 0 to fill its extended ghost atom regions in
+the +/-*y* directions (blue rectangles).  Thus in this case, ranks 5 and
+6 do not include ghost atoms they received from each other (in the *x*
+stage) in the atoms they send to rank 0.  The key point is that while
+the pattern of communication is more complex in the irregular
+partitioning case, it can still proceed in two stages (three in 3d)
+via atom exchanges with only neighboring processors.
+
+When attributes of owned atoms are sent to neighboring processors to
+become attributes of their ghost atoms, LAMMPS calls this a "forward"
+communication.  On timesteps when atoms migrate to new owning processors
+and neighbor lists are rebuilt, each processor creates a list of its
+owned atoms which are ghost atoms in each of its neighbor processors.
+These lists are used to pack per-atom coordinates (for example) into
+message buffers in subsequent steps until the next reneighboring.
+
+A "reverse" communication is when computed ghost atom attributes are
+sent back to the processor that owns the atom.  This is used (for
+example) to sum partial forces on ghost atoms to the complete force on
+owned atoms.  The order of the two stages described in the
+:ref:`ghost-atom-comm` figure is inverted and the same lists of atoms
+are used to pack and unpack message buffers with per-atom forces.  When
+a received buffer is unpacked, the ghost forces are summed to owned atom
+forces.  As in forward communication, forces on atoms in the four blue
+corners of the diagrams are sent, received, and summed twice (once at
+each stage) before owning processors have the full force.
+
+These two operations are used in many places within LAMMPS aside from
+exchange of coordinates and forces, for example by manybody potentials
+to share intermediate per-atom values, or by rigid-body integrators to
+enable each atom in a body to access body properties.  Here are
+additional details about how these communication operations are
+performed in LAMMPS:
+
+- When exchanging data with different processors, forward and reverse
+  communication is done using ``MPI_Send()`` and ``MPI_Irecv()`` calls.
+  If a processor is "exchanging" atoms with itself, only the pack and
+  unpack operations are performed, e.g. to create ghost atoms across
+  periodic boundaries when running on a single processor.
+
+- For forward communication of owned atom coordinates, periodic box
+  lengths are added and subtracted when the receiving processor is
+  across a periodic boundary from the sender.  There is then no need to
+  apply a minimum image convention when calculating distances between
+  atom pairs when building neighbor lists or computing forces.
+
+- The cutoff distance for exchanging ghost atoms is typically equal to
+  the neighbor cutoff.  But it can also be chosen to be longer if needed,
+  e.g. half the diameter of a rigid body composed of multiple atoms or
+  over 3x the length of a stretched bond for dihedral interactions.  It
+  can also exceed the periodic box size.  For the regular communication
+  pattern (left), if the cutoff distance extends beyond a neighbor
+  processor's sub-domain, then multiple exchanges are performed in the
+  same direction.  Each exchange is with the same neighbor processor,
+  but buffers are packed/unpacked using a different list of atoms. For
+  forward communication, in the first exchange a processor sends only
+  owned atoms.  In subsequent exchanges, it sends ghost atoms received
+  in previous exchanges.  For the irregular pattern (right) overlaps of
+  a processor's extended ghost-atom sub-domain with all other processors
+  in each dimension are detected.
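The forward and reverse communication described in Developer_par_comm.rst can be sketched for the simplest case mentioned there: a single processor "exchanging" atoms with itself across a periodic boundary, where only pack and unpack operations happen. The following 1d toy model is an assumption-laden illustration, not LAMMPS code. All function names, the soft repulsive `toy_force`, and the `0 < dx` tie-break that counts each periodic pair once are hypothetical. It checks that forces computed from shifted ghost coordinates, followed by summing ghost forces back to their owners, agree with a minimum-image reference.

```python
import math


def toy_force(dx, cutoff):
    """Force on atom i from atom j at separation dx = x_j - x_i (soft repulsion)."""
    return -math.copysign(cutoff - abs(dx), dx)


def forward_comm(x_owned, box, cutoff):
    """Create ghost copies of owned atoms near a periodic boundary.

    The box length is added or subtracted so that plain coordinate
    differences can be used afterwards.  Returns (coordinate, owner) pairs.
    """
    ghosts = []
    for owner, x in enumerate(x_owned):
        if x < cutoff:
            ghosts.append((x + box, owner))
        if x >= box - cutoff:
            ghosts.append((x - box, owner))
    return ghosts


def forces_with_ghosts(x_owned, box, cutoff):
    ghosts = forward_comm(x_owned, box, cutoff)
    f_owned = [0.0] * len(x_owned)
    f_ghost = [0.0] * len(ghosts)
    for i in range(len(x_owned)):
        for j in range(i + 1, len(x_owned)):      # owned-owned pairs, each once
            dx = x_owned[j] - x_owned[i]
            if abs(dx) < cutoff:
                f = toy_force(dx, cutoff)
                f_owned[i] += f
                f_owned[j] -= f
        for g, (xg, _) in enumerate(ghosts):      # owned-ghost pairs
            dx = xg - x_owned[i]
            if 0.0 < dx < cutoff:                 # toy tie-break: count each pair once
                f = toy_force(dx, cutoff)
                f_owned[i] += f
                f_ghost[g] -= f
    # Reverse communication: sum ghost forces onto the owning atoms.
    for g, (_, owner) in enumerate(ghosts):
        f_owned[owner] += f_ghost[g]
    return f_owned


def forces_minimum_image(x_owned, box, cutoff):
    """Reference calculation; only valid for cutoff < box / 2."""
    f = [0.0] * len(x_owned)
    for i in range(len(x_owned)):
        for j in range(i + 1, len(x_owned)):
            dx = x_owned[j] - x_owned[i]
            dx -= box * round(dx / box)
            if abs(dx) < cutoff:
                fij = toy_force(dx, cutoff)
                f[i] += fij
                f[j] -= fij
    return f


positions = [0.5, 1.2, 5.0, 9.0, 9.8]
with_ghosts = forces_with_ghosts(positions, box=10.0, cutoff=2.5)
reference = forces_minimum_image(positions, box=10.0, cutoff=2.5)
max_diff = max(abs(a - b) for a, b in zip(with_ghosts, reference))
print(max_diff)
```

The ghost-based version never applies a minimum image convention, which is why it would also remain usable for cutoffs larger than half the box, provided enough periodic images are created; the reference function here does not have that property.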
--- a/doc/src/Developer_par_long.rst
+++ b/doc/src/Developer_par_long.rst
@ -0,0 +1,188 @@
+Long-range interactions
+^^^^^^^^^^^^^^^^^^^^^^^
+
+For charged systems, LAMMPS can compute long-range Coulombic
+interactions via the FFT-based particle-particle/particle-mesh (PPPM)
+method implemented in :doc:`kspace style pppm and its variants
+<kspace_style>`.  For that, Coulombic interactions are partitioned into
+short- and long-range components.  The short-ranged portion is computed
+in real space as a loop over pairs of charges within a cutoff distance,
+using neighbor lists.  The long-range portion is computed in reciprocal
+space using a kspace style.  For the PPPM implementation the simulation
+cell is overlaid with a regular FFT grid in 3d. It proceeds in several stages:
+
+a) each atom's point charge is interpolated to nearby FFT grid points,
+b) a forward 3d FFT is performed,
+c) a convolution operation is performed in reciprocal space,
+d) one or more inverse 3d FFTs are performed, and
+e) electric field values from grid points near each atom are interpolated to compute
+   its forces.
+
+For any of the spatial-decomposition partitioning schemes each processor
+owns the brick-shaped portion of FFT grid points contained within its
+sub-domain.  The two interpolation operations use a stencil of grid
+points surrounding each atom.  To accommodate the stencil size, each
+processor also stores a few layers of ghost grid points surrounding its
+brick.  Forward and reverse communication of grid point values is
+performed similarly to the corresponding :doc:`atom data communication
+<Developer_par_comm>`.  In this case, electric field values on owned
+grid points are sent to neighboring processors to become ghost point
+values.  Likewise charge values on ghost points are sent and summed to
+values on owned points.
+
+For triclinic simulation boxes, the FFT grid planes are parallel to
+the box faces, but the mapping of charge and electric field values
+to/from grid points is done in reduced coordinates where the tilted
+box is conceptually a unit cube, so that the stencil and FFT
+operations are unchanged.  However the FFT grid size required for a
+given accuracy is larger for triclinic domains than it is for
+orthogonal boxes.
+
+.. _fft-parallel:
+.. figure:: img/fft-decomp-parallel.png
+   :align: center
+
+   parallel FFT in PPPM
+
+   Stages of a parallel FFT for a simulation domain overlaid
+   with an 8x8x8 3d FFT grid, partitioned across 64 processors.
+   Within each of the 4 diagrams, grid cells of the same color are
+   owned by a single processor; for simplicity only cells owned by 4
+   or 8 of the 64 processors are colored.  The two images on the left
+   illustrate brick-to-pencil communication.  The two images on the
+   right illustrate pencil-to-pencil communication, which in this
+   case transposes the *y* and *z* dimensions of the grid.
+
+Parallel 3d FFTs require substantial communication relative to their
+computational cost.  A 3d FFT is implemented by a series of 1d FFTs
+along the *x-*, *y-*, and *z-*\ direction of the FFT grid.  Thus the FFT
+grid cannot be decomposed like atoms into 3 dimensions for parallel
+processing of the FFTs but only in 1 (as planes) or 2 (as pencils)
+dimensions and in between the steps the grid needs to be transposed to
+have the FFT grid portion "owned" by each MPI process complete in the
+direction of the 1d FFTs it has to perform. LAMMPS uses the
+pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.
+
+Initially (far left), each processor owns a brick of same-color grid
+cells (actually grid points) contained within its sub-domain.  A
+brick-to-pencil communication operation converts this layout to 1d
+pencils in the *x*-dimension (center left).  Again, cells of the same
+color are owned by the same processor.  Each processor can then compute
+a 1d FFT on each pencil of data it wholly owns using a call to the
+configured FFT library.  A pencil-to-pencil communication then converts
+this layout to pencils in the *y* dimension (center right) which
+effectively transposes the *x* and *y* dimensions of the grid, followed
+by 1d FFTs in *y*.  A final transpose of pencils from *y* to *z* (far
+right) followed by 1d FFTs in *z* completes the forward FFT.  The data
+is left in a *z*-pencil layout for the convolution operation.  One or
+more inverse FFTs then perform the sequence of 1d FFTs and communication
+steps in reverse order; the final layout of resulting grid values is the
+same as the initial brick layout.
+
+Each communication operation within the FFT (brick-to-pencil or
+pencil-to-pencil or pencil-to-brick) converts one tiling of the 3d grid
+to another, where a tiling in this context means an assignment of a
+small brick-shaped subset of grid points to each processor, the union of
+which comprises the entire grid.  The parallel `fftMPI library
+<https://lammps.github.io/fftmpi/>`_ written for LAMMPS allows arbitrary
+definitions of the tiling so that an irregular partitioning of the
+simulation domain can use it directly.  Transforming data from one
+tiling to another is implemented in `fftMPI` using point-to-point
+communication, where each processor sends data to a few other
+processors, since each tile in the initial tiling overlaps with a
+handful of tiles in the final tiling.
+
+The transformations could also be done using collective communication
+across all *P* processors with a single call to ``MPI_Alltoall()``, but
+this is typically much slower.  However, for the specialized brick and
+pencil tiling illustrated in the :ref:`fft-parallel` figure, collective
+communication across the entire MPI communicator is not required.  In
+the example an :math:`8^3` grid with 512 grid cells is partitioned
+across 64 processors; each processor owns a 2x2x2 3d brick of grid
+cells.  The initial brick-to-pencil communication (upper left to upper
+right) only requires collective communication within subgroups of 4
+processors, as illustrated by the 4 colors.  More generally, a
+brick-to-pencil communication can be performed by partitioning *P*
+processors into :math:`P^{\frac{2}{3}}` subgroups of
+:math:`P^{\frac{1}{3}}` processors each.  Each subgroup performs
+collective communication only within its subgroup.  Similarly,
+pencil-to-pencil communication can be performed by partitioning *P*
+processors into :math:`P^{\frac{1}{2}}` subgroups of
+:math:`P^{\frac{1}{2}}` processors each.  This is illustrated in the
+figure for the :math:`y \Rightarrow z` communication (center).  An
+eight-processor subgroup owns the front *yz* plane of data and performs
+collective communication within the subgroup to transpose from a
+*y*-pencil to *z*-pencil layout.
+
+LAMMPS invokes point-to-point communication by default, but also
+provides the option of partitioned collective communication when using a
+:doc:`kspace_modify collective yes <kspace_modify>` command to switch to
+that mode.  In the latter case, the code detects the size of the
+disjoint subgroups and partitions the single *P*-size communicator into
+multiple smaller communicators, each of which invokes collective
+communication.  Testing on a large IBM Blue Gene/Q machine at Argonne
+National Laboratory showed a significant improvement in FFT performance for
+large processor counts; partitioned collective communication was faster
+than point-to-point communication or global collective communication
+involving all *P* processors.
+
+Here are some additional details about FFTs for long-range and related
+grid/particle operations that LAMMPS supports:
+
+- The fftMPI library allows each grid dimension to be a multiple of
+  small prime factors (2,3,5), and allows any number of processors to
+  perform the FFT.  The resulting brick and pencil decompositions are
+  thus not always as well-aligned but the size of subgroups of
+  processors for the two modes of communication (brick/pencil and
+  pencil/pencil) still scale as :math:`O(P^{\frac{1}{3}})` and
+  :math:`O(P^{\frac{1}{2}})`.
+
+- For efficiency in performing 1d FFTs, the grid transpose
+  operations illustrated in the :ref:`fft-parallel` figure also involve
+  reordering the 3d data so that a different dimension is contiguous
+  in memory.  This reordering can be done during the packing or
+  unpacking of buffers for MPI communication.
+
+- For large systems and particularly a large number of MPI processes,
+  the dominant cost for parallel FFTs is often the communication, not
+  the computation of 1d FFTs, even though the latter scales as :math:`N
+  \log(N)` in the number of grid points *N* per grid direction.  This is
+  due to the fact that only a 2d decomposition into pencils is possible
+  while atom data (and their corresponding short-range force and energy
+  computations) can be decomposed efficiently in 3d.
+
+  This can be addressed by reducing the number of MPI processes involved
+  in the MPI communication by using :doc:`hybrid MPI + OpenMP
+  parallelization <Speed_omp>`.  This will use OpenMP parallelization
+  inside the MPI domains and while that may have a lower parallel
+  efficiency, it reduces the communication overhead.
+
+  As an alternative it is also possible to start a :ref:`multi-partition
+  <partition>` calculation and then use the :doc:`verlet/split
+  integrator <run_style>` to perform the PPPM computation on a
+  dedicated, separate partition of MPI processes.  This uses an integer
+  "1:*p*" mapping of *p* sub-domains of the atom decomposition to one
+  sub-domain of the FFT grid decomposition and where pairwise non-bonded
+  and bonded forces and energies are computed on the larger partition
+  and the PPPM kspace computation concurrently on the smaller partition.
+
+- LAMMPS also implements PPPM-based solvers for other long-range
+  interactions, dipole and dispersion (Lennard-Jones), which can be used
+  in conjunction with long-range Coulombics for point charges.
+
+- LAMMPS implements a ``GridComm`` class which overlays the simulation
+  domain with a regular grid, partitions it across processors in a
+  manner consistent with processor sub-domains, and provides methods for
+  forward and reverse communication of owned and ghost grid point
+  values.  It is used for PPPM as an FFT grid (as outlined above) and
+  also for the MSM algorithm which uses a cascade of grid sizes from
+  fine to coarse to compute long-range Coulombic forces.  The GridComm
+  class is also useful for models where continuum fields interact with
+  particles.  For example, the two-temperature model (TTM) defines heat
+  transfer between atoms (particles) and electrons (continuum gas) where
+  spatial variations in the electron temperature are computed by finite
+  differences of a discretized heat equation on a regular grid.  The
+  :doc:`fix ttm/grid <fix_ttm>` command uses the ``GridComm`` class
+  internally to perform its grid operations on a distributed grid
+  instead of the original :doc:`fix ttm <fix_ttm>` which uses a
+  replicated grid.
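The statement in Developer_par_long.rst that a 3d FFT is a series of 1d FFTs separated by transposes can be checked with a small serial sketch. It rests on several assumptions: a naive 1d DFT stands in for the configured FFT library, an in-memory axis rotation stands in for the brick-to-pencil and pencil-to-pencil MPI communication, and all function names are hypothetical. The rotation also shows the reordering that makes a different dimension contiguous before each set of 1d transforms.

```python
import cmath


def dft_1d(values):
    """Naive 1d discrete Fourier transform of one pencil of data."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def transform_last_axis(grid):
    """Apply 1d DFTs along the contiguous (last) index of grid[a][b][c]."""
    return [[dft_1d(pencil) for pencil in plane] for plane in grid]


def rotate_axes(grid):
    """Reorder grid[a][b][c] into new[c][a][b], so index b becomes contiguous."""
    na, nb, nc = len(grid), len(grid[0]), len(grid[0][0])
    return [[[grid[a][b][c] for b in range(nb)] for a in range(na)]
            for c in range(nc)]


def dft_3d_by_pencils(grid):
    """Three rounds of 1d DFTs, each followed by a transpose of the layout."""
    for _ in range(3):
        grid = transform_last_axis(grid)
        grid = rotate_axes(grid)
    return grid  # three rotations restore the original index order


def dft_3d_direct(grid):
    """Reference: evaluate the triple sum of the 3d DFT directly."""
    na, nb, nc = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0j] * nc for _ in range(nb)] for _ in range(na)]
    for p in range(na):
        for q in range(nb):
            for r in range(nc):
                total = 0j
                for a in range(na):
                    for b in range(nb):
                        for c in range(nc):
                            phase = p * a / na + q * b / nb + r * c / nc
                            total += grid[a][b][c] * cmath.exp(-2j * cmath.pi * phase)
                out[p][q][r] = total
    return out


# Small non-cubic grid so that a wrong axis order would be detected.
grid = [[[complex(a + 2 * b - c, a * b * c) for c in range(4)]
         for b in range(3)] for a in range(2)]
result = dft_3d_by_pencils(grid)
reference = dft_3d_direct(grid)
max_error = max(
    abs(result[p][q][r] - reference[p][q][r])
    for p in range(2) for q in range(3) for r in range(4)
)
print(max_error)
```

In the parallel setting each `rotate_axes` step corresponds to a communication operation between tilings, which is where the cost discussed in the text arises; the 1d transforms themselves stay local to each processor.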
--- a/doc/src/Developer_par_neigh.rst
+++ b/doc/src/Developer_par_neigh.rst
@ -0,0 +1,159 @@
+Neighbor lists
+^^^^^^^^^^^^^^
+
+To compute forces efficiently, each processor creates a Verlet-style
+neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
+*j* = owned or ghost) with separation less than the applicable
+neighbor list cutoff distance.  In LAMMPS the neighbor lists are stored
+in a multiple-page data structure; each page is a contiguous chunk of
+memory which stores vectors of neighbor atoms *j* for many *i* atoms.
+This allows pages to be incrementally allocated or deallocated in blocks
+as needed.  Neighbor lists typically consume the most memory of any data
+structure in LAMMPS.  The neighbor list is rebuilt (from scratch) once
+every few timesteps, then used repeatedly each step for force or other
+computations.  The neighbor cutoff distance is :math:`R_n = R_f +
+\Delta_s`, where :math:`R_f` is the (largest) force cutoff defined by
+the interatomic potential for computing short-range pairwise or manybody
+forces and :math:`\Delta_s` is a "skin" distance that allows the list to
+be used for multiple steps assuming that atoms do not move very far
+between consecutive time steps.  Typically the code triggers
+reneighboring when any atom has moved half the skin distance since the
+last reneighboring; this and other options of the neighbor list rebuild
+can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
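
The half-skin trigger can be illustrated with a short Python sketch. It is not the LAMMPS implementation; the function name ``needs_rebuild`` and its arguments are hypothetical. The reasoning behind the factor of one half: if no atom has moved more than half the skin distance, no pair can have reduced its separation by more than the full skin, so a pair that was outside *R_n* at the last build cannot yet be inside *R_f*.

```python
def needs_rebuild(positions, positions_at_last_build, skin):
    """Return True if any atom moved more than half the skin distance.

    Both position arguments are sequences of (x, y, z) tuples in the
    same atom order (an assumption of this sketch).
    """
    limit_sq = (0.5 * skin) ** 2
    for (x, y, z), (x0, y0, z0) in zip(positions, positions_at_last_build):
        dx, dy, dz = x - x0, y - y0, z - z0
        # Compare squared distances to avoid a square root per atom.
        if dx * dx + dy * dy + dz * dz > limit_sq:
            return True
    return False
```

With a skin of 0.3, a displacement of 0.2 exceeds the limit of 0.15 and would trigger a rebuild, while a displacement of 0.1 would not.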
+
+On steps when reneighboring is performed, atoms which have moved outside
+their owning processor's sub-domain are first migrated to new processors
+via communication.  Periodic boundary conditions are also (only)
+enforced on these steps to ensure each atom is re-assigned to the
+correct processor.  After migration, the atoms owned by each processor
+are stored in a contiguous vector.  Periodically each processor
+spatially sorts owned atoms within its vector to reorder it for improved
+cache efficiency in force computations and neighbor list building.  To
+do so, atoms are spatially binned and then reordered so that atoms in the
+same bin are adjacent in the vector.  Atom sorting can be disabled or
+its settings modified with the :doc:`atom_modify <atom_modify>` command.
+
+.. _neighbor-stencil:
+.. figure:: img/neigh-stencil.png
+   :align: center
+
+   neighbor list stencils
+
+   A 2d simulation sub-domain (thick black line) and the corresponding
+   ghost atom cutoff region (dashed blue line) for both orthogonal
+   (left) and triclinic (right) domains.  A regular grid of neighbor
+   bins (thin lines) overlays the entire simulation domain and need not
+   align with sub-domain boundaries; only the portion overlapping the
+   augmented sub-domain is shown.  In the triclinic case it overlaps the
+   bounding box of the tilted rectangle.  The blue- and red-shaded bins
+   represent a stencil of bins searched to find neighbors of a particular
+   atom (black dot).
+
+To build a local neighbor list in linear time, the simulation domain is
+overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
+as shown in the :ref:`neighbor-stencil` figure for 2d models and a
+single MPI processor's sub-domain.  Each processor stores a set of
+neighbor bins which overlap its sub-domain extended by the neighbor
+cutoff distance :math:`R_n`.  As illustrated, the bins need not align
+with processor boundaries; an integer number in each dimension is fit to
+the size of the entire simulation box.
+
+Most often LAMMPS builds what it calls a "half" neighbor list where
+each *i,j* neighbor pair is stored only once, with either atom *i* or
+*j* as the central atom.  The build can be done efficiently by using a
+pre-computed "stencil" of bins around a central origin bin which
+contains the atom whose neighbors are being searched for.  A stencil
+is simply a list of integer offsets in *x,y,z* of nearby bins
+surrounding the origin bin which are close enough to contain any
+neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
+origin bin.  Note that for a half neighbor list, the stencil can be
+asymmetric since each atom needs to store only half of its nearby neighbors.
+
+These stencils are illustrated in the figure for a half list and a bin
+size of :math:`\frac{1}{2} R_n`.  There are 13 red+blue stencil bins in
+2d (for the orthogonal case; 15 for triclinic).  In 3d there would be
+63: 13 in the plane of bins that contains the origin bin and 25 in each
+of the two planes above it in the *z* direction (75 for triclinic).  The
+triclinic stencil has extra bins because the bins tile the
+bounding box of the entire triclinic domain and thus are not periodic
+with respect to the simulation box itself.  The stencil and logic for
+determining which *i,j* pairs to include in the neighbor list are
+altered slightly to account for this.
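
The orthogonal bin counts quoted above (13 in 2d, 63 in 3d) can be reproduced by enumerating the offsets of a half stencil. The Python sketch below is only an illustration: ``half_stencil`` and its ``reach`` parameter are hypothetical names, and the rule for choosing which half to keep is one possible convention, not necessarily the one LAMMPS uses. It applies no distance-based trimming, which is assumed to be unnecessary for a bin size of half the cutoff.

```python
from itertools import product


def half_stencil(ndim, reach=2):
    """Enumerate bin offsets of a half stencil for an orthogonal box.

    reach is the number of bins spanned by the cutoff in each direction;
    reach=2 corresponds to a bin size of half the neighbor cutoff.
    """
    origin = (0,) * ndim
    offsets = []
    for off in product(range(-reach, reach + 1), repeat=ndim):
        # Keep the origin bin and every bin that sorts "above" it when
        # offsets are compared starting from the last dimension.
        if tuple(reversed(off)) >= origin:
            offsets.append(off)
    return offsets
```

In 2d the full block has 5 x 5 = 25 bins; dropping the "lower" half leaves (25 - 1) / 2 + 1 = 13. In 3d the same argument gives (125 - 1) / 2 + 1 = 63, which is also 13 bins in the origin plane plus 25 in each of the two planes above it.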
+
+To build a neighbor list, a processor first loops over its "owned" plus
+"ghost" atoms and assigns each to a neighbor bin.  This uses an integer
+vector to create a linked list of atom indices within each bin.  It then
+performs a triply-nested loop over its owned atoms *i*, the stencil of
+bins surrounding atom *i*'s bin, and the *j* atoms in each stencil bin
+(including ghost atoms).  If the distance :math:`r_{ij} < R_n`, then
+atom *j* is added to the vector of atom *i*'s neighbors.
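
The binning step and the triply-nested loop can be sketched in Python for a 2d system. This is a simplified illustration and not LAMMPS code: the function name and arguments are hypothetical, bins are kept in a dictionary instead of a fixed array, and a symmetric stencil combined with a ``j > i`` test is used to store each pair once (one possible rule for a half list).

```python
import math


def build_neighbor_list(pos, nlocal, cutoff, binsize):
    """Half neighbor list for 2d points; owned atoms come first in pos."""
    xlo = min(p[0] for p in pos)
    ylo = min(p[1] for p in pos)

    def bin_of(p):
        return (int((p[0] - xlo) // binsize), int((p[1] - ylo) // binsize))

    # Linked list per bin: binhead maps a bin to its most recently added
    # atom and nxt[i] is the next atom in the same bin (-1 ends the chain).
    binhead = {}
    nxt = [-1] * len(pos)
    for i, p in enumerate(pos):
        b = bin_of(p)
        nxt[i] = binhead.get(b, -1)
        binhead[b] = i

    # Symmetric stencil of bin offsets that covers the cutoff distance.
    reach = math.ceil(cutoff / binsize)
    stencil = [(sx, sy)
               for sx in range(-reach, reach + 1)
               for sy in range(-reach, reach + 1)]

    cutsq = cutoff * cutoff
    neighbors = [[] for _ in range(nlocal)]
    for i in range(nlocal):                      # loop 1: owned atoms
        bx, by = bin_of(pos[i])
        for sx, sy in stencil:                   # loop 2: stencil bins
            j = binhead.get((bx + sx, by + sy), -1)
            while j >= 0:                        # loop 3: atoms in the bin
                if j > i:                        # store each pair only once
                    dx = pos[i][0] - pos[j][0]
                    dy = pos[i][1] - pos[j][1]
                    if dx * dx + dy * dy < cutsq:
                        neighbors[i].append(j)
                j = nxt[j]
    return neighbors
```

Because ghost atoms are stored after the owned atoms, the ``j > i`` test keeps every owned/ghost pair, while each owned/owned pair appears in only one of the two lists.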
+
+Here are additional details about neighbor list build options LAMMPS
+supports:
+
+- The choice of bin size is an option; a size half of :math:`R_n` has
+  been found to be optimal for many typical cases.  Smaller bins incur
+  additional overhead to loop over; larger bins require more distance
+  calculations.  Note that for smaller bin sizes, the 2d stencil in the
+  figure would be more semi-circular in shape (hemispherical in 3d),
+  with bins near the corners of the square eliminated due to their
+  distance from the origin bin.
+
+- Depending on the interatomic potential(s) and other commands used in
+  an input script, multiple neighbor lists and stencils with different
+  attributes may be needed.  This includes lists with different cutoff
+  distances, e.g. for force computation versus occasional diagnostic
+  computations such as a radial distribution function, or for the
+  r-RESPA time integrator which can partition pairwise forces by
+  distance into subsets computed at different time intervals.  It
+  includes "full" lists (as opposed to half lists) where each *i,j* pair
+  appears twice, stored once with *i* and once with *j*, and which use a larger
+  symmetric stencil.  It also includes lists with partial enumeration of
+  ghost atom neighbors.  The full and ghost-atom lists are used by
+  various manybody interatomic potentials.  Lists may also use different
+  criteria for inclusion of a pair interaction.  Typically this
+  depends only on the distance between two atoms and the cutoff
+  distance.  But for finite-size coarse-grained particles with
+  individual diameters (e.g. polydisperse granular particles), it can
+  also depend on the diameters of the two particles.
+
+- When using :doc:`pair style hybrid <pair_hybrid>`, multiple sub-lists
+  of the master neighbor list for the full system need to be generated,
+  one for each sub-style, which contains only the *i,j* pairs needed to
+  compute interactions between subsets of atoms for the corresponding
+  potential.  This means not all *i* or *j* atoms owned by a processor
+  are included in a particular sub-list.
+
+- Some models use different cutoff lengths for pairwise interactions
+  between different kinds of particles which are stored in a single
+  neighbor list.  One example is a solvated colloidal system with large
+  colloidal particles where colloid/colloid, colloid/solvent, and
+  solvent/solvent interaction cutoffs can be dramatically different.
+  Another is a model of polydisperse finite-size granular particles;
+  pairs of particles interact only when they are in contact with each
+  other.  Mixtures with particle size ratios as high as 10-100x may be
+  used to model realistic systems.  Efficient neighbor list building
+  algorithms for these kinds of systems are available in LAMMPS.  They
+  include a method which uses different stencils for different cutoff
+  lengths and trims the stencil to only include bins that straddle the
+  cutoff sphere surface.  More recently a method which uses both
+  multiple stencils and multiple bin sizes was developed; it builds
+  neighbor lists efficiently for systems with particles of any size
+  ratio, though other considerations (timestep size, force computations)
+  may limit the ability to model systems with huge polydispersity.
+
+- For small and sparse systems and as a fallback method, LAMMPS also
+  supports neighbor list construction without binning by using a full
+  :math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
+  using the :doc:`neighbor nsq <neighbor>` command.
+
+- Depending on the "pair" setting of the :doc:`newton <newton>` command,
+  the "half" neighbor lists may contain **all** pairs of atoms where
+  atom *j* is a ghost atom (i.e. when the newton pair setting is *off*).
+  For the newton pair *on* setting, atom *j* is only added to the
+  list if its *z* coordinate is larger, or if that is equal, its *y*
+  coordinate is larger, or if that is equal too, its *x* coordinate is
+  larger.  For homogeneously dense systems this results in picking
+  neighbors from a sector of the same size and always in the same
+  direction relative to the "owned" atom; it should thus lead to
+  neighbor lists of similar length and reduce the chance of a load
+  imbalance.
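
The coordinate rule for the newton pair *on* setting amounts to a lexicographic comparison in the order *z*, *y*, *x*. The Python sketch below follows the wording of the text above; the function name is hypothetical, and the handling of exactly coincident coordinates in the actual source code may differ.

```python
def keep_ghost_pair(xi, xj):
    """Decide whether ghost atom j is stored in owned atom i's half list.

    xi and xj are (x, y, z) tuples.  Coordinates are compared z first,
    then y, then x, and j is kept only if it sorts strictly above i.
    """
    return (xj[2], xj[1], xj[0]) > (xi[2], xi[1], xi[0])
```

For any two atoms at distinct positions, exactly one of ``keep_ghost_pair(a, b)`` and ``keep_ghost_pair(b, a)`` is true, so a pair that straddles a processor boundary is stored by only one of the two processors involved.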
--- a/doc/src/Developer_par_openmp.rst
+++ b/doc/src/Developer_par_openmp.rst
@ -0,0 +1,114 @@
+OpenMP Parallelism
+^^^^^^^^^^^^^^^^^^
+
+The styles in the INTEL, KOKKOS, and OPENMP packages can use OpenMP
+thread parallelism, predominantly to distribute loops over local data,
+and thus follow a parallelization strategy orthogonal to the
+decomposition into spatial domains used by the :doc:`MPI partitioning
+<Developer_par_part>`.  For clarity, this section discusses only the
+implementation in the OPENMP package as it is the simplest.  The INTEL
+and KOKKOS packages offer additional options and are more complex since
+they support more features and different hardware like co-processors
+or GPUs.
+
+One of the key decisions when implementing the OPENMP package was to
+keep the changes to the source code small, so that it would be easier to
+maintain the code and keep it in sync with the non-threaded standard
+implementation.  This is achieved by a) making the OPENMP version a
+derived class from the regular version (e.g. ``PairLJCutOMP`` from
+``PairLJCut``) and overriding only methods that are multi-threaded or
+need to be modified to support multi-threading (similar to what was done
+in the OPT package), b) keeping the structure in the modified code very
+similar so that side-by-side comparisons are still useful, and c)
+offloading additional functionality and multi-thread support functions
+into three separate classes ``ThrOMP``, ``ThrData``, and ``FixOMP``.
+``ThrOMP`` provides additional, multi-thread aware functionality not
+available in the corresponding base class (e.g. ``Pair`` for
+``PairLJCutOMP``) like multi-thread aware variants of the "tally"
+functions. Those functions are made available through multiple
+inheritance so those new functions have to have unique names to avoid
+ambiguities; typically ``_thr`` is appended to the name of the function.
+``ThrData`` is a class that manages per-thread data structures.
+It is used instead of extending the corresponding storage to per-thread
+arrays to avoid slowdowns due to "false sharing" when multiple threads
+update adjacent elements in an array and thus force the CPU cache lines
+to be reset and re-fetched.  Finally, ``FixOMP`` manages the "multi-thread
+state" like settings and access to per-thread storage; it is activated
+by the :doc:`package omp <package>` command.
+
+Avoiding data races
+"""""""""""""""""""
+
+A key problem when implementing thread parallelism in an MD code is
+to avoid data races when updating accumulated properties like forces,
+energies, and stresses.  When interactions are computed, they always
+involve multiple atoms and thus there are race conditions when multiple
+threads want to update per-atom data of the same atoms.  Five possible
+strategies have been considered to avoid this:
+
+1) restructure the code so that there is no overlapping access possible
+   when computing in parallel, e.g. by breaking lists into multiple
+   parts and synchronizing threads in between.
+2) have each thread be "responsible" for a specific group of atoms and
+   compute these interactions multiple times, once on each thread that
+   is responsible for a given atom and then have each thread only update
+   the properties of this atom.
+3) use mutexes around functions and regions of code where the data race
+   could happen
+4) use atomic operations when updating per-atom properties
+5) use replicated per-thread data structures to accumulate data without
+   conflicts and then use a reduction to combine those results into the
+   data structures used by the regular style.
+
+Option 5 was chosen for the OPENMP package because it would retain the
+performance for the case of 1 thread and the code would be more
+maintainable.  Option 1 would require extensive code changes,
+particularly to the neighbor list code; option 2 would have incurred a
+2x or more performance penalty for the serial case; option 3 causes
+significant overhead and would enforce serialization of operations in
+inner loops and thus defeat the purpose of multi-threading; option 4
+slows down the serial case although not quite as badly as option 2.  The
+downside of option 5 is that the overhead of the reduction operations
+grows with the number of threads used, so there would be a crossover
+point where options 2 or 4 would result in faster execution.  That is
+why option 2 for example is used in the GPU package because a GPU is a
+processor with a massive number of threads.  However, since the MPI
+parallelization is generally more effective for typical MD systems, the
+expectation is that thread parallelism is only used for a smaller number
+of threads (2-8).  At the time of its implementation, that number was
+equivalent to the number of CPU cores per CPU socket on high-end
+supercomputers.
+
+Thus arrays like the force array are dimensioned to the number of atoms
+times the number of threads when enabling OpenMP support and inside the
+compute functions a pointer to a different chunk is obtained by each thread.
+Similarly, accumulators like potential energy or virial are kept in
+per-thread instances of the ``ThrData`` class and then only reduced and
+stored in their global counterparts at the end of the force computation.
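
The replicated-storage strategy (option 5) can be sketched in Python. This is a conceptual illustration only, not the OPENMP package code: threads are emulated by a sequential loop, forces are scalars for brevity, the pair list is split across threads instead of the loop over atoms, and the name ``accumulate_forces`` is hypothetical.

```python
def accumulate_forces(natoms, pairs, nthreads):
    """Sum pair forces using one private force segment per thread.

    pairs holds (i, j, fij) tuples with a scalar force for brevity.
    Threads are emulated by a sequential loop over thread ids.
    """
    # Force storage is natoms * nthreads long; thread tid owns the
    # segment that starts at tid * natoms.
    f = [0.0] * (natoms * nthreads)
    for tid in range(nthreads):
        offset = tid * natoms
        # Each emulated thread processes its own share of the pairs.
        for i, j, fij in pairs[tid::nthreads]:
            f[offset + i] += fij   # no other thread writes to this segment
            f[offset + j] -= fij
    # Reduction: fold all per-thread segments into the first one.
    for tid in range(1, nthreads):
        offset = tid * natoms
        for i in range(natoms):
            f[i] += f[offset + i]
    return f[:natoms]
```

The reduction loop touches ``natoms * (nthreads - 1)`` entries, which shows why the overhead of this approach grows with the number of threads.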
+
+
+Loop scheduling
+"""""""""""""""
+
+Multi-thread parallelization is applied by distributing (outer) loops
+statically across threads.  Typically this would be the loop over local
+atoms *i* when processing *i,j* pairs of atoms from a neighbor list.
+The design of the neighbor list code results in atoms having a similar
+number of neighbors for homogeneous systems and thus load imbalances
+across threads are not common and typically happen for systems where
+the MPI parallelization would also be unbalanced, which would typically
+have a more pronounced impact on the performance.  This same loop
+scheduling scheme can also be applied to the reduction operations on
+per-atom data to try and reduce the overhead of the reduction operation.
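
A static distribution of an outer loop assigns each thread one contiguous block of iterations. The Python sketch below shows one common way to compute such blocks; the helper name ``static_chunk`` is hypothetical and the exact formula used in the LAMMPS source may differ.

```python
def static_chunk(n, nthreads, tid):
    """Return the half-open range [start, stop) of loop indices for thread tid.

    The n iterations are split into nthreads contiguous blocks whose
    sizes differ by at most one.
    """
    base, extra = divmod(n, nthreads)
    # The first `extra` threads each take one additional iteration.
    start = tid * base + min(tid, extra)
    stop = start + base + (1 if tid < extra else 0)
    return start, stop
```

For 10 atoms and 4 threads this yields the blocks [0, 3), [3, 6), [6, 8), and [8, 10). The same index ranges could be reused when reducing per-thread data, with each thread summing the copies for its own block of atoms.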
+
+Neighbor list parallelization
+"""""""""""""""""""""""""""""
+
+In addition to the parallelization of force computations, the
+generation of the neighbor lists is also parallelized.  As explained
+previously, neighbor lists are built by looping over "owned" atoms and
+storing the neighbors in "pages".  In the OPENMP variants of the
+neighbor list code, each thread operates on a different chunk of "owned"
+atoms and allocates and fills its own set of pages with neighbor list
+data.  This is achieved by each thread keeping its own instance of the
+:cpp:class:`MyPage <LAMMPS_NS::MyPage>` page allocator class.
--- a/doc/src/Developer_par_part.rst
+++ b/doc/src/Developer_par_part.rst
@ -0,0 +1,89 @@
+Partitioning
+^^^^^^^^^^^^
+
+The underlying spatial decomposition strategy used by LAMMPS for
+distributed-memory parallelism is set with the :doc:`comm_style command
+<comm_style>` and can be either "brick" (a regular grid) or "tiled".
+
+.. _domain-decomposition:
+.. figure:: img/domain-decomp.png
+   :align: center
+
+   domain decomposition
+
+   This figure shows the different kinds of domain decomposition used
+   for MPI parallelization: "brick" on the left with an orthogonal
+   (left) and a triclinic (middle) simulation domain, and a "tiled"
+   decomposition (right).  The black lines show the division into
+   sub-domains and the contained atoms are "owned" by the corresponding
+   MPI process. The green dashed lines indicate how sub-domains are
+   extended with "ghost" atoms up to the communication cutoff distance.
+
+The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
+or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
+figure for the 2d case.  Orthogonal means the box edges are aligned with
+the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
+rectangular.  Triclinic allows for a more general parallelepiped shape
+in which edges are aligned with three arbitrary vectors and the box
+faces are parallelograms.  In each dimension box faces can be periodic,
+or non-periodic with fixed or shrink-wrapped boundaries.  In the fixed
+case, atoms which move outside the face are deleted; shrink-wrapped
+means the position of the box face adjusts continuously to enclose all
+the atoms.
+
+For distributed-memory MPI parallelism, the simulation box is spatially
+decomposed (partitioned) into non-overlapping sub-domains which fill the
+box. The default partitioning, "brick", is most suitable when atom
+density is roughly uniform, as shown in the left-side images of the
+:ref:`domain-decomposition` figure.  The sub-domains comprise a regular
+grid and all sub-domains are identical in size and shape.  Both the
+orthogonal and triclinic boxes can deform continuously during a
+simulation, e.g. to compress a solid or shear a liquid, in which case
+the processor sub-domains likewise deform.
+
+
+For models with non-uniform density, the number of particles per
+processor can be load-imbalanced with the default partitioning.  This
+reduces parallel efficiency, as the overall simulation rate is limited
+by the slowest processor, i.e. the one with the largest computational
+load.  For such models, LAMMPS supports multiple strategies to reduce
+the load imbalance:
+
+- The processor grid decomposition is by default based on the simulation
+  cell volume and tries to optimize the volume to surface ratio for the sub-domains.
+  This can be changed with the :doc:`processors command <processors>`.
+- The parallel planes defining the size of the sub-domains can be shifted
+  with the :doc:`balance command <balance>`.  This can be done in addition
+  to choosing a better-suited processor grid.
+- The recursive bisectioning algorithm in combination with the "tiled"
+  communication style can produce a partitioning with equal numbers of
+  particles in each sub-domain.
+
+
+.. |decomp1| image:: img/decomp-regular.png
+   :width: 24%
+
+.. |decomp2| image:: img/decomp-processors.png
+   :width: 24%
+
+.. |decomp3| image:: img/decomp-balance.png
+   :width: 24%
+
+.. |decomp4| image:: img/decomp-rcb.png
+   :width: 24%
+
+|decomp1|  |decomp2|  |decomp3|  |decomp4|
+
+The pictures above demonstrate different decompositions for a 2d system
+with 12 MPI ranks.  The atom colors indicate the load imbalance of each
+sub-domain with green being optimal and red the least optimal.
+
+Due to the vacuum in the system, the default decomposition is unbalanced
+with several MPI ranks without atoms (left).  By forcing a 1x12x1
+processor grid, every MPI rank now does computations, but the number of
+atoms per sub-domain is still uneven and the thin slice shape increases
+the amount of communication between sub-domains (center left).  With a
+2x6x1 processor grid and shifted sub-domain divisions, the load
+imbalance is further reduced and less communication is required
+between sub-domains (center right).  Using recursive bisectioning
+leads to a further improved decomposition (right).
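
The idea behind recursive bisectioning can be sketched in Python for a serial 2d point set. This is a toy illustration under simplifying assumptions, not the parallel algorithm in LAMMPS: the function name ``rcb`` is hypothetical, every point has equal weight, and the cut is always placed perpendicular to the longer extent of the current point set.

```python
def rcb(points, nparts):
    """Split 2d points into nparts groups of nearly equal size."""
    if nparts == 1:
        return [points]
    if not points:
        return [[] for _ in range(nparts)]
    # Cut perpendicular to the longer extent of the current point set.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dim = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[dim])
    # Give each side a share of points proportional to its share of parts.
    nleft = nparts // 2
    cut = len(ordered) * nleft // nparts
    return rcb(ordered[:cut], nleft) + rcb(ordered[cut:], nparts - nleft)
```

Because each cut is placed by particle count instead of by volume, regions of vacuum do not produce empty sub-domains, at the cost of sub-domains that no longer form a regular grid.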
--- a/doc/src/Developer_parallel.rst
+++ b/doc/src/Developer_parallel.rst
@ -0,0 +1,28 @@
+Parallel algorithms
+-------------------
+
+LAMMPS is designed to enable running simulations in parallel using the
+MPI parallel communication standard with distributed data via domain
+decomposition.  The parallelization aims to be efficient and result in good
+strong scaling (= good speedup for the same system) and good weak
+scaling (= the computational cost of enlarging the system is
+proportional to the system size).  Additional parallelization using GPUs
+or OpenMP can also be applied within the sub-domain assigned to an MPI
+process.  For clarity, most of the following illustrations show the 2d
+simulation case. The underlying algorithms in those cases, however,
+apply to both 2d and 3d cases equally well.
+
+.. note::
+
+   The text and most of the figures in this chapter were adapted
+   for the manual from the section on parallel algorithms in the
+   :ref:`new LAMMPS paper <lammps_paper>`.
+
+.. toctree::
+   :maxdepth: 1
+
+   Developer_par_part
+   Developer_par_comm
+   Developer_par_neigh
+   Developer_par_long
+   Developer_par_openmp
--- a/doc/src/Developer_utils.rst
+++ b/doc/src/Developer_utils.rst
@ -203,9 +203,15 @@ Convenience functions
 .. doxygenfunction:: date2num
   :project: progguide

+.. doxygenfunction:: current_date
+   :project: progguide
+
 Customized standard functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+.. doxygenfunction:: binary_search
+   :project: progguide
+
 .. doxygenfunction:: merge_sort
   :project: progguide

--- a/doc/src/Errors_debug.rst
+++ b/doc/src/Errors_debug.rst
@ -40,11 +40,10 @@ We use it to show how to identify the origin of a segmentation fault.

 After recompiling LAMMPS and running the input you should get something like this:

-.. code-block:
+.. code-block::

   $ ./lmp -in in.melt
   LAMMPS (19 Mar 2020)
-   OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:94)
     using 1 OpenMP thread(s) per MPI task
   Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
   Created orthogonal box = (0 0 0) to (16.796 16.796 16.796)
--- a/doc/src/Errors_messages.rst
+++ b/doc/src/Errors_messages.rst
@ -714,7 +714,7 @@ Doc page with :doc:`WARNING messages <Errors_warnings>`

 *Cannot create/grow a vector/array of pointers for %s*
   LAMMPS code is making an illegal call to the templated memory
-   allocaters, to create a vector or array of pointers.
+   allocators, to create a vector or array of pointers.

 *Cannot create_atoms after reading restart file with per-atom info*
   The per-atom info was stored to be used when by a fix that you may
@ -7879,19 +7879,19 @@ keyword to allow for additional bonds to be formed
 *Unexpected end of -reorder file*
   Self-explanatory.

-*Unexpected empty line in AngleCoeffs section*
+*Unexpected empty line in Angle Coeffs section*
   Read a blank line where there should be coefficient data.

-*Unexpected empty line in BondCoeffs section*
+*Unexpected empty line in Bond Coeffs section*
   Read a blank line where there should be coefficient data.

-*Unexpected empty line in DihedralCoeffs section*
+*Unexpected empty line in Dihedral Coeffs section*
   Read a blank line where there should be coefficient data.

-*Unexpected empty line in ImproperCoeffs section*
+*Unexpected empty line in Improper Coeffs section*
   Read a blank line where there should be coefficient data.

-*Unexpected empty line in PairCoeffs section*
+*Unexpected empty line in Pair Coeffs section*
   Read a blank line where there should be coefficient data.

 *Unexpected end of custom file*
--- a/doc/src/Examples.rst
+++ b/doc/src/Examples.rst
@ -27,7 +27,7 @@ be quickly post-processed into a movie using commands described on the
 :doc:`dump image <dump_image>` doc page.

 Animations of many of the examples can be viewed on the Movies section
-of the `LAMMPS web site <https://www.lammps.org/movies.html>`_.
+of the `LAMMPS website <https://www.lammps.org/movies.html>`_.

 There are two kinds of sub-directories in the examples folder.  Lower
 case named directories contain one or a few simple, quick-to-run
@ -169,7 +169,7 @@ Running the simulation produces the files *dump.indent* and
 *log.lammps*\ .  You can visualize the dump file of snapshots with a
 variety of third-party tools highlighted on the
 `Visualization <https://www.lammps.org/viz.html>`_ page of the LAMMPS
-web site.
+website.

 If you uncomment the :doc:`dump image <dump_image>` line(s) in the input
 script a series of JPG images will be produced by the run (assuming
--- a/doc/src/Install_windows.rst
+++ b/doc/src/Install_windows.rst
@ -12,7 +12,7 @@ Note that each installer package has a date in its name, which
 corresponds to the LAMMPS version of the same date.  Installers for
 current and older versions of LAMMPS are available.  32-bit and 64-bit
 installers are available, and each installer contains both a serial
-and parallel executable.  The installer web site also explains how to
+and parallel executable.  The installer website also explains how to
 install the Windows MPI package (MPICH2 from Argonne National Labs),
 needed to run in parallel with MPI.

--- a/doc/src/Intro_citing.rst
+++ b/doc/src/Intro_citing.rst
@ -4,28 +4,41 @@ Citing LAMMPS
 Core Algorithms
 ^^^^^^^^^^^^^^^

-Since LAMMPS is a community project, there is not a single one
-publication or reference that describes **all** of LAMMPS.
-The canonical publication that describes the foundation, that is
-the basic spatial decomposition approach, the neighbor finding,
-and basic communications algorithms used in LAMMPS is:
+The paper mentioned below is the best overview of LAMMPS, but there are
+also publications describing particular models or algorithms implemented
+in LAMMPS or complementary software that it has interfaces to.  Please
+see below for how to cite contributions to LAMMPS.

- `S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995). <http://www.sandia.gov/~sjplimp/papers/jcompphys95.pdf>`_
+.. _lammps_paper:

-So any project using LAMMPS (or a derivative application using LAMMPS as
-a simulation engine) should cite this paper. A new publication
-describing the developments and improvements of LAMMPS in the 25 years
-since then is currently in preparation.
+The latest canonical publication that describes the basic features, the
+source code design, the program structure, the spatial decomposition
+approach, the neighbor finding, basic communications algorithms, and how
+users and developers have contributed to LAMMPS is:
+
+  `LAMMPS - A flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comp. Phys. Comm. (accepted 09/2021), DOI:10.1016/j.cpc.2021.108171 <https://doi.org/10.1016/j.cpc.2021.108171>`_
+
+So a project using LAMMPS or a derivative application that uses LAMMPS
+as a simulation engine should cite this paper.  The paper is expected to
+be published in its final form under the same DOI in the first half
+of 2022.  Please also give the URL of the LAMMPS website in your paper,
+namely https://www.lammps.org.
+
+The original publication describing the parallel algorithms used in the
+initial versions of LAMMPS is:
+
+  `S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995). <http://www.sandia.gov/~sjplimp/papers/jcompphys95.pdf>`_


 DOI for the LAMMPS code
 ^^^^^^^^^^^^^^^^^^^^^^^

-LAMMPS developers use the `Zenodo service at CERN
-<https://zenodo.org/>`_ to create digital object identifies (DOI) for
-stable releases of the LAMMPS code. There are two types of DOIs for the
-LAMMPS source code: the canonical DOI for **all** versions of LAMMPS,
-which will always point to the **latest** stable release version is:
+LAMMPS developers use the `Zenodo service at CERN <https://zenodo.org/>`_
+to create digital object identifiers (DOI) for stable releases of the
+LAMMPS source code. There are two types of DOIs for the LAMMPS source code.
+
+The canonical DOI for **all** versions of LAMMPS, which will always
+point to the **latest** stable release version, is:

 - DOI: `10.5281/zenodo.3726416 <https://dx.doi.org/10.5281/zenodo.3726416>`_

@ -45,11 +58,13 @@ about LAMMPS and its features.
 Citing contributions
 ^^^^^^^^^^^^^^^^^^^^

-LAMMPS has many features and that use either previously published
-methods and algorithms or novel features.  It also includes potential
-parameter filed for specific models.  Where available, a reminder about
-references for optional features used in a specific run is printed to
-the screen and log file.  Style and output location can be selected with
-the :ref:`-cite command-line switch <cite>`.  Additional references are
+LAMMPS has many features that use either previously published methods
+and algorithms or novel features.  It also includes potential parameter
+files for specific models.  Where available, a reminder about references
+for optional features used in a specific run is printed to the screen
+and log file.  Style and output location can be selected with the
+:ref:`-cite command-line switch <cite>`.  Additional references are
 given in the documentation of the :doc:`corresponding commands
-<Commands_all>` or in the :doc:`Howto tutorials <Howto>`.
+<Commands_all>` or in the :doc:`Howto tutorials <Howto>`.  So please
+make certain that you provide the proper acknowledgments and citations
+in any published works using LAMMPS.
--- a/doc/src/Intro_website.rst
+++ b/doc/src/Intro_website.rst
@ -26,7 +26,7 @@ available online are listed below.
 * `Tutorials <https://www.lammps.org/tutorials.html>`_

 * `Pre- and post-processing tools for LAMMPS <https://www.lammps.org/prepost.html>`_
-* `Other software usable with LAMMPS <https://www.lammps.org/offsite.html>`_
+* `Other software usable with LAMMPS <https://www.lammps.org/external.html>`_
 * `Viz tools usable with LAMMPS <https://www.lammps.org/viz.html>`_

 * `Benchmark performance <https://www.lammps.org/bench.html>`_
--- a/doc/src/Library_create.rst
+++ b/doc/src/Library_create.rst
@ -10,6 +10,7 @@ This section documents the following functions:
 - :cpp:func:`lammps_mpi_init`
 - :cpp:func:`lammps_mpi_finalize`
 - :cpp:func:`lammps_kokkos_finalize`
+- :cpp:func:`lammps_python_finalize`

 --------------------

@ -33,7 +34,7 @@ simple example demonstrating its use:
     int lmpargc = sizeof(lmpargv)/sizeof(const char *);

     /* create LAMMPS instance */
-     handle = lammps_open_no_mpi(lmpargc, lmpargv, NULL);
+     handle = lammps_open_no_mpi(lmpargc, (char **)lmpargv, NULL);
     if (handle == NULL) {
       printf("LAMMPS initialization failed");
       lammps_mpi_finalize();
@ -104,3 +105,13 @@ calling program.

 .. doxygenfunction:: lammps_mpi_finalize
   :project: progguide
+
+-----------------------
+
+.. doxygenfunction:: lammps_kokkos_finalize
+   :project: progguide
+
+-----------------------
+
+.. doxygenfunction:: lammps_python_finalize
+   :project: progguide
--- a/doc/src/Modify.rst
+++ b/doc/src/Modify.rst
@ -9,13 +9,15 @@ this.
 If you add a new feature to LAMMPS and think it will be of interest to
 general users, we encourage you to submit it for inclusion in LAMMPS
 as a pull request on our `GitHub site <https://github.com/lammps/lammps>`_,
-after reading :doc:`this page <Modify_contribute>`.
+after reading about :doc:`how to prepare your code for submission <Modify_contribute>`
+and :doc:`the style requirements and recommendations <Modify_style>`.

 .. toctree::
   :maxdepth: 1

   Modify_overview
   Modify_contribute
+   Modify_style

 .. toctree::
   :maxdepth: 1
--- a/doc/src/Modify_contribute.rst
+++ b/doc/src/Modify_contribute.rst
@ -1,22 +1,20 @@
 Submitting new features for inclusion in LAMMPS
 ===============================================

-We encourage users to submit new features or modifications for LAMMPS to
-`the core developers <https://www.lammps.org/authors.html>`_ so they
-can be added to the LAMMPS distribution. The preferred way to manage and
-coordinate this is via the LAMMPS project on `GitHub
-<https://github.com/lammps/lammps>`_.  Please see the :doc:`GitHub
-Tutorial <Howto_github>` for a demonstration on how to do that.  An
-alternative is to contact the LAMMPS developers or the indicated
-developer of a package or feature directly and send in your contribution
-via e-mail, but that can add a significant delay on getting your
-contribution included, depending on how busy the respective developer
-is, how complex a task it would be to integrate that code, and how
-many - if any - changes are required before the code can be included.
+We encourage LAMMPS users to submit new features they wrote for LAMMPS
+to be included into the LAMMPS distribution and thus become easily
+accessible to all LAMMPS users.  The LAMMPS source code is managed with
+git and public development is hosted on `GitHub
+<https://github.com/lammps/lammps>`_.  You can monitor the repository to
+be notified of releases, follow the ongoing development, and comment on
+topics of interest to you.
+
+Communication with the LAMMPS developers
+----------------------------------------

 For any larger modifications or programming project, you are encouraged
 to contact the LAMMPS developers ahead of time in order to discuss
-implementation strategies and coding guidelines. That will make it
+implementation strategies and coding guidelines.  That will make it
 easier to integrate your contribution and results in less work for
 everybody involved.  You are also encouraged to search through the list
 of `open issues on GitHub <https://github.com/lammps/lammps/issues>`_
@ -24,235 +22,105 @@ and submit a new issue for a planned feature, so you would not duplicate
 the work of others (and possibly get scooped by them) or have your work
 duplicated by others.

-For informal communication with (some of) the LAMMPS developers you may
-ask to join the `LAMMPS developers on Slack
-<https://lammps.slack.com>`_.  This slack work space is by invitation
-only. Thus for access, please send an e-mail to ``slack@lammps.org``
-explaining what part of LAMMPS you are working on.  Only discussions
-related to LAMMPS development are tolerated, so this is **NOT** for
-people that look for help with compiling, installing, or using
-LAMMPS. Please contact the
-`lammps-users mailing list <https://www.lammps.org/mail.html>`_ or the
-`LAMMPS forum <https://www.lammps.org/forum.html>`_ for those purposes
-instead.
+For informal communication with the LAMMPS developers you may ask to
+join the `LAMMPS developers on Slack <https://lammps.slack.com>`_.  This
+Slack workspace is by invitation only.  Thus for access, please send an
+e-mail to ``slack@lammps.org`` explaining what part of LAMMPS you are
+working on.  Only discussions related to LAMMPS development are
+tolerated in that workspace, so this is **NOT** for people who look for
+help with compiling, installing, or using LAMMPS.  Please post a message
+to the `lammps-users mailing list <https://www.lammps.org/mail.html>`_
+or the `LAMMPS forum <https://www.lammps.org/forum.html>`_ for those
+purposes.

-How quickly your contribution will be integrated depends largely on how
-much effort it will cause to integrate and test it, how many and what
-kind of changes it requires to the core codebase, and of how much
-interest it is to the larger LAMMPS community.  Please see below for a
-checklist of typical requirements.  Once you have prepared everything,
-see the :doc:`LAMMPS GitHub Tutorial <Howto_github>` page for
-instructions on how to submit your changes or new files through a GitHub
-pull request.  If you prefer to submit patches or full files, you should
-first make certain, that your code works correctly with the latest
-patch-level version of LAMMPS and contains all bug fixes from it.  Then
-create a gzipped tar file of all changed or added files or a
-corresponding patch file using 'diff -u' or 'diff -c' and compress it
-with gzip.  Please only use gzip compression, as this works well and is
-available on all platforms.
+Packages versus individual files
+--------------------------------
+
+The remainder of this chapter describes how to add new "style" files of
+various kinds to LAMMPS.  Packages are simply collections of one or more
+such new class files which are invoked as a new style within a LAMMPS
+input script.  In some cases, collections of supporting functions or
+classes are included as separate files in a package, especially when
+they can be shared between multiple styles. If designed correctly, these
+additions typically do not require any changes to the core code of
+LAMMPS; they are simply add-on files that are compiled with the rest of
+LAMMPS.  To make those styles work, you may need some trivial changes to
+the core code; an example of a trivial change is making a parent-class
+method "virtual" when you derive a new child class from it.
+
+If you think your new feature or package requires some non-trivial
+changes in core LAMMPS files, you should communicate with the LAMMPS
+developers `on Slack <https://lammps.org/slack.html>`_, `on GitHub
+<https://github.com/lammps/lammps/issues>`_, or `via email
+<https://www.lammps.org/authors.html>`_, since we may have
+recommendations about which changes to make where, or may not want to
+include certain changes for some reason and thus you would need to look
+for alternatives.
+
+Time and effort required
+------------------------
+
+How quickly your contribution will be integrated can vary a lot.  It
+depends largely on how much effort it will cause the LAMMPS developers
+to integrate and test it, how many and what kind of changes to the core
+code are required, how quickly you can address them, and how much it is
+of interest to the larger LAMMPS community.  Please see the section
+on :doc:`LAMMPS programming style and requirements <Modify_style>` for
+instructions, recommendations, and formal requirements.  A small,
+modular, well written contribution may be integrated within hours, but a
+complex change that will require a redesign of some core functionality
+in LAMMPS for a clean integration can take many months until it is
+considered ready for inclusion (though this is rare).
+
+
+Submission procedure
+--------------------
+
+All changes to LAMMPS (including those from LAMMPS developers) are
+integrated via pull requests on GitHub and cannot be merged without
+passing the automated testing and an approving review by a LAMMPS core
+developer.  Thus before submitting your contribution, you should first
+make certain that your added or modified code compiles and works
+correctly with the latest patch-level or development version of LAMMPS
+and contains all bug fixes from it.
+
+Once you have prepared everything, see the :doc:`LAMMPS GitHub Tutorial
+<Howto_github>` page for instructions on how to submit your changes or
+new files through a GitHub pull request yourself.  If you are unable or
+unwilling to submit via GitHub yourself, you may also submit patch files
+or full files to the LAMMPS developers and ask them to submit a pull
+request on GitHub on your behalf.  In that case, create a gzipped tar
+file of all changed or added files, or a corresponding patch file in
+'diff -u' or 'diff -c' format compressed with gzip.  Please only use
+gzip compression, as this works well and is available on all platforms.

 If the new features/files are broadly useful we may add them as core
 files to LAMMPS or as part of a :doc:`package <Packages_list>`.  All
 packages are listed and described on the :doc:`Packages details
 <Packages_details>` doc page.

-Note that by providing us files to release, you are agreeing to make
-them open-source, i.e. we can release them under the terms of the GPL
-(version 2), used as a license for the rest of LAMMPS.  And as part of
-a LGPL (version 2.1) distribution that we make available to developers
-on request only and with files that are authorized for that kind of
-distribution removed (e.g. interface to FFTW).  See the
+Licensing
+---------
+
+Note that by providing us files to release, you agree to make them
+open-source, i.e. we can release them under the terms of the GPL
+(version 2) with the rest of LAMMPS, and similarly as part of an LGPL
+(version 2.1) distribution of LAMMPS that we make available to
+developers on request only and with files that are not authorized for
+that kind of distribution removed (e.g. interface to FFTW).  See the
 :doc:`LAMMPS license <Intro_opensource>` page for details.

-.. note::
+External contributions
+----------------------

-   If you prefer to actively develop and support your add-on feature
-   yourself, then you may wish to make it available for download from
-   your own website, as a user package that LAMMPS users can add to
-   their copy of LAMMPS.  See the `Offsite LAMMPS packages and tools
-   <https://www.lammps.org/offsite.html>`_ page of the LAMMPS web site
-   for examples of groups that do this.  We are happy to advertise your
-   package and web site from that page.  Simply email the `developers
-   <https://www.lammps.org/authors.html>`_ with info about your package
-   and we will post it there.  We recommend to name external packages
-   USER-\<name\> so they can be easily distinguished from bundled packages
-   that do not have the USER- prefix.
+If you prefer to do so, you can also develop and support your add-on
+feature **without** having it included in the LAMMPS distribution, for
+example as a download from a website of your own.  See the `External
+LAMMPS packages and tools <https://www.lammps.org/external.html>`_ page
+of the LAMMPS website for examples of groups that do this.  We are happy
+to advertise your package and website from that page.  Simply email the
+`developers <https://www.lammps.org/authors.html>`_ with info about your
+package and we will post it there.  We recommend naming external
+packages USER-\<name\> so they can be easily distinguished from bundled
+packages that do not have the USER- prefix.

-.. _lws: https://www.lammps.org
-
-The previous sections of this page describe how to add new "style"
-files of various kinds to LAMMPS.  Packages are simply collections of
-one or more new class files which are invoked as a new style within a
-LAMMPS input script.  If designed correctly, these additions typically
-do not require changes to the main core of LAMMPS; they are simply
-add-on files.  If you think your new feature requires non-trivial
-changes in core LAMMPS files, you should `communicate with the
-developers <https://www.lammps.org/authors.html>`_, since we may or
-may not want to include those changes for some reason.  An example of a
-trivial change is making a parent-class method "virtual" when you derive
-a new child class from it.
-
-Here is a checklist of steps you need to follow to submit a single file
-or package for our consideration.  Following these steps will save
-both you and us time. Please have a look at the existing files in
-packages in the src directory for examples. If you are uncertain, please ask.
-
-* All source files you provide must compile with the most current
-  version of LAMMPS with multiple configurations. In particular you
-  need to test compiling LAMMPS from scratch with -DLAMMPS_BIGBIG
-  set in addition to the default -DLAMMPS_SMALLBIG setting. Your code
-  will need to work correctly in serial and in parallel using MPI.
-
-* For consistency with the rest of LAMMPS and especially, if you want
-  your contribution(s) to be added to main LAMMPS code or one of its
-  standard packages, it needs to be written in a style compatible with
-  other LAMMPS source files. This means: 2-character indentation per
-  level, **no tabs**, no lines over 100 characters. I/O is done via
-  the C-style stdio library (mixing of stdio and iostreams is generally
-  discouraged), class header files should not import any system headers
-  outside of <cstdio>, STL containers should be avoided in headers,
-  system header from the C library should use the C++-style names
-  (<cstdlib>, <cstdio>, or <cstring>) instead of the C-style names
-  <stdlib.h>, <stdio.h>, or <string.h>), and forward declarations
-  used where possible or needed to avoid including headers.
-  All added code should be placed into the LAMMPS_NS namespace or a
-  sub-namespace; global or static variables should be avoided, as they
-  conflict with the modular nature of LAMMPS and the C++ class structure.
-  Header files must **not** import namespaces with *using*\ .
-  This all is so the developers can more easily understand, integrate,
-  and maintain your contribution and reduce conflicts with other parts
-  of LAMMPS.  This basically means that the code accesses data
-  structures, performs its operations, and is formatted similar to other
-  LAMMPS source files, including the use of the error class for error
-  and warning messages.
-
-* To simplify reformatting contributed code in a way that is compatible
-  with the LAMMPS formatting styles, you can use clang-format (version 8
-  or later).  The LAMMPS distribution includes a suitable ``.clang-format``
-  file which will be applied if you run ``clang-format -i some_file.cpp``
-  on your files inside the LAMMPS src tree.  Please only reformat files
-  that you have contributed.  For header files containing a
-  ``SomeStyle(keyword, ClassName)`` macros it is required to have this
-  macro embedded with a pair of ``// clang-format off``, ``// clang-format on``
-  commends and the line must be terminated with a semi-colon (;).
-  Example:
-
-  .. code-block:: c++
-
-     #ifdef COMMAND_CLASS
-     // clang-format off
-     CommandStyle(run,Run);
-     // clang-format on
-     #else
-
-     #ifndef LMP_RUN_H
-     [...]
-
-  You may also use ``// clang-format on/off`` throughout your file
-  to protect sections of the file from being reformatted.
-
-* Please review the list of :doc:`available Packages <Packages_details>`
-  to see if your contribution could be added to be added to one of them.
-  It should fit into the general purposed of that package.  If it does not
-  fit well, it can be added to one of the EXTRA- packages or the MISC package.
-
-* If your contribution has several related features that are not covered
-  by one of the existing packages or is dependent on a library (bundled
-  or external), it is best to make it a package directory with a name
-  like FOO.  In addition to your new files, the directory should contain
-  a README text file.  The README should contain your name and contact
-  information and a brief description of what your new package does.  If
-  your files depend on other LAMMPS style files also being installed
-  (e.g. because your file is a derived class from the other LAMMPS
-  class), then an Install.sh file is also needed to check for those
-  dependencies and modifications to src/Depend.sh to trigger the checks.
-  See other README and Install.sh files in other directories as examples.
-  Similarly for CMake support changes need to be made to cmake/CMakeLists.txt,
-  the files in cmake/presets, and possibly a file to cmake/Modules/Packages/
-  added.  Please check out how this is handled for existing packages and
-  ask the LAMMPS developers if you need assistance.  Please submit a pull
-  request on GitHub or send us a tarball of this FOO directory and all
-  modified files.  Pull requests are strongly encouraged since they greatly
-  reduce the effort required to integrate a contribution and simplify the
-  process of adjusting the contributed code to cleanly fit into the
-  LAMMPS distribution.
-
-* Your new source files need to have the LAMMPS copyright, GPL notice,
-  and your name and email address at the top, like other
-  user-contributed LAMMPS source files.  They need to create a class
-  that is inside the LAMMPS namespace.  To simplify maintenance, we
-  may ask to adjust the programming style and formatting style to closer
-  match the rest of LAMMPS.  We bundle a clang-format configuration file
-  that can help with adjusting the formatting, although this is not a
-  strict requirement.
-
-* You **must** also create a **documentation** file for each new command
-  or style you are adding to LAMMPS.  For simplicity and convenience,
-  the documentation of groups of closely related commands or styles may
-  be combined into a single file.  This will be one file for a
-  single-file feature.  For a package, it might be several files.  These
-  are text files with a .rst extension using the `reStructuredText
-  <rst_>`_ markup language, that are then converted to HTML and PDF
-  using the `Sphinx <sphinx_>`_ documentation generator tool.  Running
-  Sphinx with the included configuration requires Python 3.x.
-  Configuration settings and custom extensions for this conversion are
-  included in the source distribution, and missing python packages will
-  be transparently downloaded into a virtual environment via pip. Thus,
-  if your local system is missing required packages, you need access to
-  the internet. The translation can be as simple as doing "make html
-  pdf" in the doc folder.  As appropriate, the text files can include
-  inline mathematical expression or figures (see doc/JPG for examples).
-  Additional PDF files with further details (see doc/PDF for examples)
-  may also be included.  The page should also include literature
-  citations as appropriate; see the bottom of doc/fix_nh.rst for
-  examples and the earlier part of the same file for how to format the
-  cite itself.  Citation labels must be unique across all .rst files.
-  The "Restrictions" section of the page should indicate if your
-  command is only available if LAMMPS is built with the appropriate
-  FOO package.  See other package doc files for examples of
-  how to do this.  Please run at least "make html" and "make spelling"
-  and carefully inspect and proofread the resulting HTML format doc page
-  before submitting your code.  Upon submission of a pull request,
-  checks for error free completion of the HTML and PDF build will be
-  performed and also a spell check, a check for correct anchors and
-  labels, and a check for completeness of references all styles in their
-  corresponding tables and lists is run.  In case the spell check
-  reports false positives they can be added to the file
-  doc/utils/sphinx-config/false_positives.txt
-
-* For a new package (or even a single command) you should include one or
-  more example scripts demonstrating its use.  These should run in no
-  more than a couple minutes, even on a single processor, and not require
-  large data files as input.  See directories under examples/PACKAGES for
-  examples of input scripts other users provided for their packages.
-  These example inputs are also required for validating memory accesses
-  and testing for memory leaks with valgrind
-
-* If there is a paper of yours describing your feature (either the
-  algorithm/science behind the feature itself, or its initial usage, or
-  its implementation in LAMMPS), you can add the citation to the \*.cpp
-  source file.  See src/EFF/atom_vec_electron.cpp for an example.
-  A LaTeX citation is stored in a variable at the top of the file and
-  a single line of code registering this variable is added to the
-  constructor of the class.  If there is additional functionality (which
-  may have been added later) described in a different publication,
-  additional citation descriptions may be added for as long as they
-  are only registered when the corresponding keyword activating this
-  functionality is used.  With these options it is possible to have
-  LAMMPS output a specific citation reminder whenever a user invokes
-  your feature from their input script.  Note that you should only use
-  this for the most relevant paper for a feature and a publication that
-  you or your group authored.  E.g. adding a citation in the code for
-  a paper by Nose and Hoover if you write a fix that implements their
-  integrator is not the intended usage.  That kind of citation should
-  just be included in the documentation page you provide describing
-  your contribution.  If you are not sure what the best option would
-  be, please contact the LAMMPS developers for advice.
-
-Finally, as a general rule-of-thumb, the more clear and
-self-explanatory you make your documentation and README files, and the
-easier you make it for people to get started, e.g. by providing example
-scripts, the more likely it is that users will try out your new feature.
-
-.. _rst: https://docutils.readthedocs.io/en/sphinx-docs/user/rst/quickstart.html
-.. _sphinx: https://sphinx-doc.org
--- a/doc/src/Modify_overview.rst
+++ b/doc/src/Modify_overview.rst
@ -40,8 +40,10 @@ then your pair_foo.h file should be structured as follows:
 .. code-block:: c++

   #ifdef PAIR_CLASS
-   PairStyle(foo,PairFoo)
+   // clang-format off
+   PairStyle(foo,PairFoo);
   #else
+   // clang-format on
   ...
   (class definition for PairFoo)
   ...
--- a/doc/src/Modify_style.rst
+++ b/doc/src/Modify_style.rst
@ -0,0 +1,439 @@
+LAMMPS programming style and requirements for contributions
+===========================================================
+
+The following is a summary of the current requirements and
+recommendations for including contributed source code or documentation
+into the LAMMPS software distribution.
+
+Motivation
+----------
+
+The LAMMPS developers are committed to providing a software package that
+is versatile, reliable, high-quality, efficient, portable, and easy to
+maintain and modify.  Achieving all of these goals is challenging since
+a large part of LAMMPS consists of contributed code from many different
+authors and not many of them are professionally trained programmers and
+familiar with the idiosyncrasies of maintaining a large software
+package.  In addition, changes that interfere with the parallel
+efficiency of the core code must be avoided.  As LAMMPS continues to
+grow and more features and functionality are added, it becomes a
+necessity to be more discriminating with new contributions while also
+working at the same time to improve the existing code.
+
+The following requirements and recommendations are provided to help
+maintain or improve that status.  Where possible we utilize
+available continuous integration tools to search for common programming
+mistakes, portability limitations, incompatible formatting, and
+undesired side effects.  It is indicated which requirements are strict,
+and which represent a preference and thus are negotiable or optional.
+
+Please feel free to contact the LAMMPS core developers in case you need
+additional explanations or clarifications or in case you need assistance
+in realizing the (strict) requirements for your contributions.
+
+Licensing requirements (strict)
+-------------------------------
+
+Contributing authors agree when submitting a pull request that their
+contributions can be distributed under the LAMMPS license
+conditions.  This is the GNU General Public License version 2 (not 3 or later)
+for the publicly distributed versions, e.g. on the LAMMPS homepage or on
+GitHub.  On request we also make a version of LAMMPS available under
+LGPL 2.1 terms; this will usually be the latest available or a previous
+stable version with a few LGPL 2.1 incompatible files removed.
+
+Your new source files should have the LAMMPS copyright, GPL notice, and
+your name and email address at the top, like other user-contributed
+LAMMPS source files.
+
+Contributions may be under a different license as long as that
+license does not conflict with the aforementioned terms.  Contributions
+that use code with a conflicting license can be split into two parts:
+
+1. the core parts (i.e. parts that must be in the `src` tree) that are
+   licensed under compatible terms and bundled with the LAMMPS sources
+2. an external library that must be downloaded and compiled (either
+   separately or as part of the LAMMPS compilation)
+
+Please note that this split-license mode may complicate including the
+contribution in binary packages.
+
+Using Pull Requests on GitHub (preferred)
+-----------------------------------------
+
+All contributions to LAMMPS are processed as pull requests on GitHub
+(this also applies to the work of the core LAMMPS developers).  A
+:doc:`tutorial for submitting pull requests on GitHub <Howto_github>` is
+provided.  If this is still problematic, contributors may contact any of
+the core LAMMPS developers for help or to create a pull request on their
+behalf.  This latter way of submission may delay the integration as it
+depends on the amount of time required to prepare the pull request and
+on how much free time the LAMMPS developer in question can spend on this
+task.
+
+Integration Testing (strict)
+----------------------------
+
+Contributed code, like all pull requests, must pass the automated
+tests on GitHub before it can be merged with the LAMMPS distribution.
+These tests compile LAMMPS in a variety of environments and settings and
+run the bundled unit tests.  At the discretion of the LAMMPS developer
+managing the pull request, additional tests may be activated that check
+for "side effects" by running a collection of input decks and verifying
+that they produce consistent results.  Also, the translation of the
+documentation to HTML and PDF is tested.
+
+More specifically, this means that contributed source code **must**
+compile with the most current version of LAMMPS with ``-DLAMMPS_BIGBIG``
+in addition to the default setting of ``-DLAMMPS_SMALLBIG``.  The code
+needs to work correctly in both cases and also in serial and parallel
+using MPI.
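
One common way for a contribution to work with only one of the two settings is to hard-code ``int`` where the integer width depends on the build. The following is a minimal sketch under stated assumptions, not the actual LAMMPS header: the type names `tagint` and `bigint` follow LAMMPS usage, but the typedefs are reduced to the two cases named above, and `max_tag` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the build-dependent integer types (assumption:
// the real definitions are more elaborate and live in a LAMMPS header).
#if defined(LAMMPS_BIGBIG)
typedef int64_t tagint;    // atom IDs are 64-bit with -DLAMMPS_BIGBIG
#else
typedef int32_t tagint;    // atom IDs are 32-bit with the default -DLAMMPS_SMALLBIG
#endif
typedef int64_t bigint;    // system-wide counts are 64-bit in both settings

static_assert(sizeof(bigint) == 8, "bigint must be a 64-bit integer");

// Hypothetical helper: return the largest atom ID in a list.
// Using tagint instead of int keeps the function correct for both settings.
tagint max_tag(const std::vector<tagint> &tags)
{
  assert(!tags.empty());    // precondition: at least one atom ID
  tagint maxval = tags[0];
  for (tagint t : tags)
    if (t > maxval) maxval = t;
  return maxval;
}
```

Compiling the same file once without extra flags and once with ``-DLAMMPS_BIGBIG`` exercises both branches, which is the kind of check the automated tests perform on the full code base.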
+
+Some "disruptive" changes may break tests and require updates to the
+testing tools or scripts or tests themselves.  This is rare.  If in
+doubt, contact the LAMMPS developer that is assigned to the pull request
+for further details and explanations and suggestions of what needs to be
+done.
+
+Documentation (strict)
+----------------------
+
+Contributions that add new styles or commands or augment existing ones
+must include the corresponding new or modified documentation in
+`reStructuredText format <rst_>`_ (.rst files in the ``doc/src/`` folder).  The
+documentation shall be written in American English and the .rst file
+must use only ASCII characters so it can be cleanly translated to PDF
+files (via `sphinx <https://sphinx-doc.org>`_ and PDFLaTeX).  Special characters may be included via
+embedded math expressions typeset in a LaTeX subset.
+
+.. _rst: https://docutils.readthedocs.io/en/sphinx-docs/user/rst/quickstart.html
+
+When adding new commands, they need to be integrated into the sphinx
+documentation system, and the corresponding command tables and lists
+updated.  When translating the documentation into HTML files there should
+be no warnings.  When adding a new package, some lists describing
+packages must also be updated, a package-specific description added,
+and, if necessary, package-specific build instructions included.
+
+As appropriate, the text files with the documentation can include inline
+mathematical expressions or figures (see ``doc/JPG`` for examples).
+Additional PDF files with further details (see ``doc/PDF`` for examples) may
+also be included.  The page should also include literature citations as
+appropriate; see the bottom of ``doc/fix_nh.rst`` for examples and the
+earlier part of the same file for how to format the cite itself.
+Citation labels must be unique across **all** .rst files.  The
+"Restrictions" section of the page should indicate if your command is
+only available if LAMMPS is built with the appropriate FOO package.  See
+other package doc files for examples of how to do this.
+
+Please run at least "make html" and "make spelling" and carefully
+inspect and proofread the resulting HTML format doc page before
+submitting your code.  Upon submission of a pull request, checks for
+error-free completion of the HTML and PDF build will be performed, as
+well as a spell check, a check for correct anchors and labels, and a
+check that all styles are referenced in their corresponding tables and
+lists.  In case the spell check reports false positives, they can be
+added to the file ``doc/utils/sphinx-config/false_positives.txt``.
+
+Contributions that add or modify the library interface or "public" APIs
+from the C++ code or the Fortran module must include suitable doxygen
+comments in the source and corresponding changes to the documentation
+sources for the "Programmer Guide" section of the LAMMPS manual.
+
+Examples (preferred)
+--------------------
+
+In most cases, it is preferred that example scripts (simple, small, fast
+to complete on 1 CPU) are included that demonstrate the use of new or
+extended functionality. These are typically under the examples or
+examples/PACKAGES directory.  A few guidelines for such example input
+decks:
+
+- commands that generate output should be commented out (except when the
+  output is the sole purpose of the feature, e.g. for a new compute).
+
+- commands like :doc:`log <log>`, :doc:`echo <echo>`, :doc:`package
+  <package>`, :doc:`processors <processors>`, :doc:`suffix <suffix>` may
+  **not** be used in the input file (exception: "processors * * 1" or
+  similar is acceptable when used to avoid unwanted domain decomposition
+  of empty volumes).
+
+- outside of the log files no generated output should be included
+
+- custom thermo_style settings may not include output measuring CPU or other time
+  as that makes comparing the thermo output between different runs more complicated.
+
+- input files should be named ``in.name``, data files should be named
+  ``data.name`` and log files should be named ``log.version.name.<compiler>.<ncpu>``
+
+- the total file size of all the inputs and outputs should be small
+
+- where possible, potential files from the "potentials" folder or data
+  files from other folders should be re-used through symbolic links
+
+Howto document (optional)
+-------------------------
+
+If your feature requires some more complex steps and explanations to be
+used correctly or some external or bundled tools or scripts, we
+recommend that you also contribute a :doc:`Howto document <Howto>`
+providing some more background information and some tutorial material.
+This can also be used to provide more in-depth explanations for bundled
+examples.
+
+As a general rule-of-thumb, the more clear and self-explanatory you make
+your documentation, README files and examples, and the easier you make
+it for people to get started, the more likely it is that users will try
+out your new feature.
+
+Programming Style Requirements (varied)
+---------------------------------------
+
+The LAMMPS developers aim to employ a consistent programming style and
+naming conventions across the entire code base, as this helps with
+maintenance, debugging, and understanding the code, both for developers
+and users.
+
+The files `pair_lj_cut.h`, `pair_lj_cut.cpp`, `utils.h`, and `utils.cpp`
+may serve as representative examples.
+
+Command or Style names, file names, and keywords (mostly strict)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+All user-visible command or style names should be all lower case and
+should only use letters, numbers, or forward slashes.  They should be
+descriptive and initialisms should be avoided unless they are well
+established (e.g. lj for Lennard-Jones).  For a compute style
+"some/name" the source files must be called `compute_some_name.h` and
+`compute_some_name.cpp`. The "include guard" would then be
+`LMP_COMPUTE_SOME_NAME_H` and the class name `ComputeSomeName`.
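
The naming rule above can be sketched as a small function that derives the file names, include guard, and class name from a style category and a style name. This is only an illustration: `StyleNames` and `style_names` are hypothetical names that are not part of LAMMPS, and class names in the existing code base may capitalize well-established initialisms differently, so the generated class name is a starting point rather than a rule.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper type, for illustration only.
struct StyleNames {
  std::string header, source, guard, classname;
};

// kind:  style category, e.g. "compute"
// style: user-visible style name, e.g. "some/name"
StyleNames style_names(const std::string &kind, const std::string &style)
{
  const std::string full = kind + "/" + style;
  std::string base, guard = "LMP_", classname;
  bool upper_next = true;    // capitalize the first letter of each component
  for (char c : full) {
    if (c == '/') {
      // forward slashes become underscores in file names and include guards
      base += '_';
      guard += '_';
      upper_next = true;
    } else {
      const unsigned char uc = static_cast<unsigned char>(c);
      const char upper = static_cast<char>(std::toupper(uc));
      base += c;
      guard += upper;
      classname += upper_next ? upper : c;
      upper_next = false;
    }
  }
  return {base + ".h", base + ".cpp", guard + "_H", classname};
}
```

For the compute style "some/name" this yields `compute_some_name.h`, `compute_some_name.cpp`, `LMP_COMPUTE_SOME_NAME_H`, and `ComputeSomeName`, matching the example in the text.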
+
+Whitespace and permissions (preferred)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Source files should not contain TAB characters unless required by the
+syntax (e.g. in makefiles) and no trailing whitespace.  Text files
+should be added with Unix-style line endings (LF-only). Git will
+automatically convert those in both directions when running on Windows;
+use dos2unix on Linux machines to convert files.  Text files should have
+a line ending on the last line.
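The whitespace rules above are easy to check automatically. The following is a minimal sketch of such a check, assuming the file contents have already been read into a string; `check_whitespace` is a hypothetical name and not a tool shipped with LAMMPS. The `tabs_allowed` flag covers files such as makefiles where TABs are required by the syntax.

```cpp
#include <string>
#include <vector>

// Hypothetical checker, not a LAMMPS tool: reports violations of the
// whitespace rules above for the contents of one text file.
std::vector<std::string> check_whitespace(const std::string &text, bool tabs_allowed = false)
{
  std::vector<std::string> problems;
  if (text.empty()) return problems;

  if (!tabs_allowed && text.find('\t') != std::string::npos)
    problems.push_back("TAB character");
  if (text.find('\r') != std::string::npos)
    problems.push_back("CR character (non-Unix line ending)");
  if (text.back() != '\n')
    problems.push_back("no line ending on last line");

  // look for a blank or TAB directly in front of a line ending
  std::string::size_type start = 0;
  while (start < text.size()) {
    std::string::size_type end = text.find('\n', start);
    if (end == std::string::npos) end = text.size();
    if (end > start && (text[end - 1] == ' ' || text[end - 1] == '\t')) {
      problems.push_back("trailing whitespace");
      break;
    }
    start = end + 1;
  }
  return problems;
}
```

An empty result means the text passes all four checks; each kind of problem is reported at most once.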
+
+All files should have 0644 permissions, i.e. writable by the user only,
+readable by all, and without executable permissions.  Executable
+permissions (0755) should only be set on shell scripts, Python scripts,
+or similar scripts for interpreted languages.
+
+Indentation and Placement of Braces (strongly preferred)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+LAMMPS uses 2 characters per indentation level and lines should be
+kept within 100 characters wide.
+
+For new files added to the "src" tree, a `clang-format
+<https://clang.llvm.org/docs/ClangFormat.html>`_ configuration file is
+provided under the name `.clang-format`.  This file is compatible with
+clang-format version 8 and later. With that file present files can be
+reformatted according to the configuration with a command like:
+`clang-format -i new-file.cpp`.  Ideally, this is done while writing the
+code or before a pull request is submitted.  Blocks of code where the
+reformatting from clang-format yields undesirable output may be
+protected by placing a pair of `// clang-format off` and `// clang-format
+on` comments around that block.
+
+Programming language standards (required)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The core of LAMMPS is written in C++11 in a style that can be mostly
+described as "C with classes".  Advanced C++ features like operator
+overloading or excessive use of templates are avoided with the intent to
+keep the code readable to programmers that have limited C++ programming
+experience.  C++ constructs are acceptable when they help improving the
+readability and reliability of the code, e.g. when using the
+`std::string` class instead of manipulating pointers and calling the
+string functions of the C library.  In addition, a number of convenient
+:doc:`utility functions and classes <Developer_utils>` for recurring
+tasks are provided.
+
+Included Fortran code has to be compatible with the Fortran 2003
+standard.  Python code must be compatible with Python 3.5.  Large parts
+of LAMMPS (including the :ref:`PYTHON package <PKG-PYTHON>`) are also
+compatible with Python 2.7.  Compatibility with Python 2.7 is
+desirable, but compatibility with Python 3.5 is **required**.
+
+Compatibility with these older programming language standards is very
+important to maintain portability, especially with HPC cluster
+environments, which tend to be running older software stacks.  LAMMPS
+users may be required to use those older tools or may not have the
+option to install newer compilers.
+
+Programming conventions (varied)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following is a collection of conventions that should be applied when
+writing code for LAMMPS.  Following these steps will make it much easier
+to integrate your contribution. Please have a look at the existing files
+in packages in the src directory for examples.  As a demonstration of
+how code can be adapted to these conventions, you may compare the REAXFF
+package with what it looked like when it was called USER-REAXC.  If
+you are uncertain, please ask.
+
+- headers from the system or from installed libraries are included with
+  angle brackets (example: ``#include <vector>``), while local include
+  files use double quotes (example: ``#include "atom.h"``).
+
+- when including system header files from the C library use the
+  C++-style names (``<cstdlib>`` or ``<cstring>``) instead of the
+  C-style names (``<stdlib.h>`` or ``<string.h>``)
+
+- the order of ``#include`` statements in a file ``some_name.cpp`` that
+  implements a class ``SomeName`` defined in a header file
+  ``some_name.h`` should be as follows:
+
+  - ``#include "some_name.h"`` followed by an empty line
+
+  - LAMMPS include files e.g. ``#include "comm.h"`` or ``#include
+    "modify.h"`` in alphabetical order followed by an empty line
+
+  - system header files from the C++ or C standard library followed by
+    an empty line
+
+  - ``using namespace LAMMPS_NS`` or other namespace imports.
+
+- I/O is done via the C-style stdio library and **not** iostreams.
+
+- Output to the screen and the logfile should use the corresponding
+  FILE pointers and should only be done on MPI rank 0.  Use the
+  :cpp:func:`utils::logmesg` convenience function where possible.
+
+- Header files, especially those defining a "style", should only use
+  the absolute minimum number of include files and **must not** contain
+  any ``using`` statements. Typically that would be only the header for
+  the base class. Instead any include statements should be put into the
+  corresponding implementation files and forward declarations be used.
+  For implementation files, the "include what you use" principle should
+  be employed.  However, there is the notable exception that when the
+  ``pointers.h`` header is included (or one of the base classes derived
+  from it) certain headers will always be included and thus do not need
+  to be explicitly specified.
+  These are: `mpi.h`, `cstddef`, `cstdio`, `cstdlib`, `string`, `utils.h`,
+  `vector`, `fmt/format.h`, `climits`, `cinttypes`.
+  This also means any such file can assume that `FILE`, `NULL`, and
+  `INT_MAX` are defined.
+
+- Header files that define a new LAMMPS style (i.e. that have a
+  ``SomeStyle(some/name,SomeName);`` macro in them) should only use the
+  include file for the base class and otherwise use forward declarations
+  and pointers; when interfacing to a library use the PIMPL (pointer
+  to implementation) approach where you have a pointer to a struct
+  that contains all library specific data (and thus requires the library
+  header) but use a forward declaration and define the struct only in
+  the implementation file. This is a **strict** requirement since this
+  is where type clashes between packages and hard to find bugs have
+  regularly manifested in the past.
+
+- Please use clang-format only to reformat files that you have
+  contributed.  For header files containing a ``SomeStyle(keyword,
+  ClassName)`` macro it is required to have this macro enclosed in a
+  pair of ``// clang-format off``, ``// clang-format on`` comments and
+  the line must be terminated with a semicolon (;).  Example:
+
+  .. code-block:: c++
+
+     #ifdef COMMAND_CLASS
+     // clang-format off
+     CommandStyle(run,Run);
+     // clang-format on
+     #else
+
+     #ifndef LMP_RUN_H
+     [...]
+
+  You may also use ``// clang-format on/off`` throughout your files
+  to protect individual sections from being reformatted.
+
+- We rarely accept new styles in the core src folder.  Thus please
+  review the list of :doc:`available Packages <Packages_details>` to see
+  if your contribution could be added to one of them.  It
+  should fit into the general purpose of that package.  If it does not
+  fit well, it may be added to one of the EXTRA- packages or the MISC
+  package.
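The PIMPL requirement from the list above can be sketched as follows. This is a single-file illustration under stated assumptions: `FixSomeName`, `LibData`, and `some_library_handle` are hypothetical names, the class does not derive from a LAMMPS base class as a real style would, and the struct standing in for the library type would normally come from the library's own header.

```cpp
// ---- fix_some_name.h (hypothetical header) ----
// Only a forward declaration is visible here: no library header is needed.
class FixSomeName {
 public:
  FixSomeName();
  ~FixSomeName();
  FixSomeName(const FixSomeName &) = delete;               // avoid double delete
  FixSomeName &operator=(const FixSomeName &) = delete;    // of the owned pointer
  double evaluate(double x);

 private:
  struct LibData;      // defined only in the implementation file
  LibData *libdata;    // pointer to all library specific data
};

// ---- fix_some_name.cpp (hypothetical implementation) ----
// A real implementation file would start with:
//   #include "fix_some_name.h"
//   #include <some_library.h>    // library header included only here

// stand-in for a type that the external library header would provide
struct some_library_handle {
  double scale;
};

// the struct is completed here, out of sight of any other package
struct FixSomeName::LibData {
  some_library_handle handle;
};

FixSomeName::FixSomeName() : libdata(new LibData)
{
  libdata->handle.scale = 2.0;
}

FixSomeName::~FixSomeName()
{
  delete libdata;
}

double FixSomeName::evaluate(double x)
{
  return libdata->handle.scale * x;
}
```

Because the header only names `LibData` without defining it, other files that include the header never see the library's types, which is what prevents the type clashes mentioned above.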
+
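The I/O convention from the list above (C-style stdio, output only on MPI rank 0, separate FILE pointers for screen and logfile) can be illustrated with a small stand-alone function. This is not `utils::logmesg` and does not use MPI; `print_on_rank0` is a hypothetical name and the `me` argument merely plays the role of the MPI rank.

```cpp
#include <cstdio>
#include <string>

// Stand-in for the pattern described above; this is NOT utils::logmesg().
// "me" plays the role of the MPI rank; screen and logfile may be null.
// Returns the number of streams that were written to.
int print_on_rank0(int me, FILE *screen, FILE *logfile, const std::string &mesg)
{
  if (me != 0) return 0;    // only MPI rank 0 produces output

  int nwritten = 0;
  if (screen) {
    fputs(mesg.c_str(), screen);
    ++nwritten;
  }
  if (logfile) {
    fputs(mesg.c_str(), logfile);
    ++nwritten;
  }
  return nwritten;
}
```

Checking each pointer before writing allows either stream to be disabled, and returning early on non-zero ranks keeps parallel runs from printing the same message once per process.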
+
+Contributing a package
+----------------------
+
+If your contribution has several related features that are not covered
+by one of the existing packages or is dependent on a library (bundled or
+external), it is best to make it a package directory with a name like
+FOO.  In addition to your new files, the directory should contain a
+README text file.  The README should contain your name and contact
+information and a brief description of what your new package does.
+
+
+Build system (strongly preferred)
+---------------------------------
+
+LAMMPS currently supports two build systems: one that is based on
+:doc:`traditional Makefiles <Build_make>` and one that is based on
+:doc:`CMake <Build_cmake>`.  Thus your contribution must be compatible
+with and support both.
+
+For a single pair of header and implementation files that are an
+independent feature, it is usually only required to add them to
+``src/.gitignore``.
+
+For traditional make, if your contributed files or package depend on
+other LAMMPS style files or packages also being installed (e.g. because
+your file is a derived class from the other LAMMPS class), then an
+Install.sh file is also needed to check for those dependencies, and
+src/Depend.sh needs to be modified to trigger the checks.  See other
+README and Install.sh files in other directories as examples.
+
+Similarly for CMake support, changes may need to be made to
+cmake/CMakeLists.txt, some of the files in cmake/presets, and possibly a
+file with specific instructions needs to be added to
+cmake/Modules/Packages/.  Please check out how this is handled for
+existing packages and ask the LAMMPS developers if you need assistance.
+
+
+Citation reminder (suggested)
+-----------------------------
+
+If there is a paper of yours describing your feature (either the
+algorithm/science behind the feature itself, or its initial usage, or
+its implementation in LAMMPS), you can add the citation to the \*.cpp
+source file.  See ``src/DIFFRACTION/compute_saed.cpp`` for an example.
+A BibTeX format citation is stored in a string variable at the top
+of the file and a single line of code registering this variable is
+added to the constructor of the class.  When your feature is used,
+by default, LAMMPS will print the brief info and the DOI
+in the first line to the screen and the full citation to the log file.
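The mechanism described above (a citation string at the top of the file, one registering line in the constructor, brief output to the screen and full output to the log) can be sketched in isolation. Everything here is a hypothetical stand-in: `CitationLog`, `ComputeSomeName`, and the bibliographic data are placeholders, and the actual LAMMPS class and its interface may differ.

```cpp
#include <cstdio>
#include <set>
#include <string>

// Hypothetical stand-in for the citation bookkeeping described above;
// the real LAMMPS class and its interface may differ.
class CitationLog {
 public:
  // returns true if the citation was not registered before
  bool add(const std::string &cite, FILE *screen, FILE *logfile)
  {
    if (!seen.insert(cite).second) return false;    // report each citation only once

    // brief info: everything up to the end of the first line
    std::string brief = cite.substr(0, cite.find('\n'));
    if (screen) fprintf(screen, "- %s\n", brief.c_str());
    if (logfile) fputs(cite.c_str(), logfile);
    return true;
  }

 private:
  std::set<std::string> seen;
};

// BibTeX citation stored in a string at the top of the source file.
// All bibliographic data here is a placeholder.
static const char cite_compute_some_name[] =
    "compute some/name command: doi:PLACEHOLDER\n\n"
    "@Article{Placeholder,\n"
    " author = {A. Author},\n"
    " title = {Placeholder Title}\n"
    "}\n\n";

// hypothetical style whose constructor registers the citation with one line
class ComputeSomeName {
 public:
  ComputeSomeName(CitationLog *citeme, FILE *screen, FILE *logfile)
  {
    if (citeme) citeme->add(cite_compute_some_name, screen, logfile);
  }
};
```

Keeping the brief description and DOI on the first line of the string is what allows the short screen output to be derived from the same variable as the full log entry.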
+
+If there is additional functionality (which may have been added later)
+described in a different publication, additional citation descriptions
+may be added for as long as they are only registered when the
+corresponding keyword activating this functionality is used.  With these
+options it is possible to have LAMMPS output a specific citation
+reminder whenever a user invokes your feature from their input script.
+Please note that you should *only* use this for the *most* relevant
+paper for a feature and a publication that you or your group authored.
+E.g. adding a citation in the code for a paper by Nose and Hoover if you
+write a fix that implements their integrator is not the intended usage.
+That latter kind of citation should just be included in the
+documentation page you provide describing your contribution.  If you are
+not sure what the best option would be, please contact the LAMMPS
+developers for advice.
+
+
+Testing (optional)
+------------------
+
+If your contribution contains new utility functions or a supporting class
+(i.e. anything that does not depend on a LAMMPS object), new unit tests
+should be added to a suitable folder in the ``unittest`` tree.
+When adding a new LAMMPS style computing forces or selected fixes,
+a ``.yaml`` file with a test configuration and reference data should be
+added for the styles where a suitable tester program already exists
+(e.g. pair styles, bond styles, etc.). Please see
+:ref:`this section in the manual <testing>` for more information on
+how to enable, run, and expand testing.
--- a/doc/src/PDF/CG-DNA.pdf
+++ b/doc/src/PDF/CG-DNA.pdf
--- a/doc/src/PDF/colvars-refman-lammps.pdf
+++ b/doc/src/PDF/colvars-refman-lammps.pdf
--- a/doc/src/Packages_details.rst
+++ b/doc/src/Packages_details.rst
@ -915,7 +915,7 @@ This package has :ref:`specific installation instructions <gpu>` on the :doc:`Bu
 * :doc:`package gpu <package>`
 * :doc:`Commands <Commands_all>` pages (:doc:`pair <Commands_pair>`, :doc:`kspace <Commands_kspace>`)
  for styles followed by (g)
-* `Benchmarks page <https://www.lammps.org/bench.html>`_ of web site
+* `Benchmarks page <https://www.lammps.org/bench.html>`_ of website

 ----------

@ -1027,7 +1027,7 @@ This package has :ref:`specific installation instructions <intel>` on the :doc:`
 * Search the :doc:`commands <Commands_all>` pages (:doc:`fix <Commands_fix>`, :doc:`compute <Commands_compute>`,
  :doc:`pair <Commands_pair>`, :doc:`bond, angle, dihedral, improper <Commands_bond>`, :doc:`kspace <Commands_kspace>`) for styles followed by (i)
 * src/INTEL/TEST
-* `Benchmarks page <https://www.lammps.org/bench.html>`_ of web site
+* `Benchmarks page <https://www.lammps.org/bench.html>`_ of website

 ----------

@ -1164,7 +1164,7 @@ This package has :ref:`specific installation instructions <kokkos>` on the :doc:
 * Search the :doc:`commands <Commands_all>` pages (:doc:`fix <Commands_fix>`, :doc:`compute <Commands_compute>`,
  :doc:`pair <Commands_pair>`, :doc:`bond, angle, dihedral, improper <Commands_bond>`,
  :doc:`kspace <Commands_kspace>`) for styles followed by (k)
-* `Benchmarks page <https://www.lammps.org/bench.html>`_ of web site
+* `Benchmarks page <https://www.lammps.org/bench.html>`_ of website

 ----------

@ -1242,7 +1242,7 @@ A fix command which wraps the LATTE DFTB code, so that molecular
 dynamics can be run with LAMMPS using density-functional tight-binding
 quantum forces calculated by LATTE.

-More information on LATTE can be found at this web site:
+More information on LATTE can be found at this website:
 `https://github.com/lanl/LATTE <latte-home_>`_.  A brief technical
 description is given with the :doc:`fix latte <fix_latte>` command.

@ -2017,7 +2017,7 @@ the :doc:`Build extras <Build_extras>` page.
 * Search the :doc:`commands <Commands_all>` pages (:doc:`fix <Commands_fix>`, :doc:`compute <Commands_compute>`,
  :doc:`pair <Commands_pair>`, :doc:`bond, angle, dihedral, improper <Commands_bond>`,
  :doc:`kspace <Commands_kspace>`) for styles followed by (o)
-* `Benchmarks page <https://www.lammps.org/bench.html>`_ of web site
+* `Benchmarks page <https://www.lammps.org/bench.html>`_ of website

 ----------

@ -2051,7 +2051,7 @@ This package has :ref:`specific installation instructions <opt>` on the :doc:`Bu
 * :doc:`OPT package <Speed_opt>`
 * :doc:`Section 2.6 -sf opt <Run_options>`
 * Search the :doc:`pair style <Commands_pair>` page for styles followed by (t)
-* `Benchmarks page <https://www.lammps.org/bench.html>`_ of web site
+* `Benchmarks page <https://www.lammps.org/bench.html>`_ of website

 .. _PKG-ORIENT:

@ -2248,16 +2248,16 @@ PYTHON package

 A :doc:`python <python>` command which allow you to execute Python code
 from a LAMMPS input script.  The code can be in a separate file or
-embedded in the input script itself.  See the :doc:`Python call <Python_call>` page for an overview of using Python from
-LAMMPS in this manner and all the :doc:`Python <Python_head>` manual pages
-for other ways to use LAMMPS and Python together.
+embedded in the input script itself.  See the :doc:`Python call
+<Python_call>` page for an overview of using Python from LAMMPS in this
+manner and all the :doc:`Python <Python_head>` manual pages for other
+ways to use LAMMPS and Python together.

 .. note::

-   Building with the PYTHON package assumes you have a Python
-   shared library available on your system, which needs to be a Python 2
-   version, 2.6 or later.  Python 3 is not yet supported.  See the
-   lib/python/README for more details.
+   Building with the PYTHON package assumes you have a Python development
+   environment (headers and libraries) available on your system, which needs
+   to be either Python version 2.7 or Python 3.5 and later.

 **Install:**

--- a/doc/src/Run_basics.rst
+++ b/doc/src/Run_basics.rst
@ -2,17 +2,25 @@ Basics of running LAMMPS
 ========================

 LAMMPS is run from the command line, reading commands from a file via
-the -in command line flag, or from standard input.
-Using the "-in in.file" variant is recommended:
+the -in command line flag, or from standard input.  Using the "-in
+in.file" variant is recommended (see note below).  The name of the
+LAMMPS executable is either ``lmp`` or ``lmp_<machine>`` with
+``<machine>`` being the machine string used when compiling LAMMPS.  This
+is required when compiling LAMMPS with the traditional build system
+(e.g. with ``make mpi``), but optional when using CMake to configure and
+build LAMMPS:

 .. code-block:: bash

   $ lmp_serial -in in.file
   $ lmp_serial < in.file
+   $ lmp -in in.file
+   $ lmp < in.file
   $ /path/to/lammps/src/lmp_serial -i in.file
   $ mpirun -np 4 lmp_mpi -in in.file
+   $ mpiexec -np 4 lmp -in in.file
   $ mpirun -np 8 /path/to/lammps/src/lmp_mpi -in in.file
-   $ mpirun -np 6 /usr/local/bin/lmp -in in.file
+   $ mpiexec -n 6 /usr/local/bin/lmp -in in.file

 You normally run the LAMMPS command in the directory where your input
 script is located.  That is also where output files are produced by
@ -23,7 +31,7 @@ executable itself can be placed elsewhere.
 .. note::

   The redirection operator "<" will not always work when running
-   in parallel with mpirun; for those systems the -in form is required.
+   in parallel with mpirun or mpiexec; for those systems the -in form is required.

 As LAMMPS runs it prints info to the screen and a logfile named
 *log.lammps*\ .  More info about output is given on the
--- a/doc/src/Run_options.rst
+++ b/doc/src/Run_options.rst
@ -2,7 +2,7 @@ Command-line options
 ====================

 At run time, LAMMPS recognizes several optional command-line switches
-which may be used in any order.  Either the full word or a one-or-two
+which may be used in any order.  Either the full word or a one or two
 letter abbreviation can be used:

 * :ref:`-e or -echo <echo>`
@ -22,6 +22,7 @@ letter abbreviation can be used:
 * :ref:`-r2data or -restart2data <restart2data>`
 * :ref:`-r2dump or -restart2dump <restart2dump>`
 * :ref:`-sc or -screen <screen>`
+* :ref:`-sr or -skiprun <skiprun>`
 * :ref:`-sf or -suffix <suffix>`
 * :ref:`-v or -var <var>`

@ -241,10 +242,11 @@ links with from the lib/message directory.  See the
 **-cite style** or **file name**

 Select how and where to output a reminder about citing contributions
-to the LAMMPS code that were used during the run. Available styles are
-"both", "none", "screen", or "log".  Any flag will be considered a file
-name to write the detailed citation info to.  Default is the "log" style
-where there is a short summary in the screen output and detailed citations
+to the LAMMPS code that were used during the run. Available keywords
+for styles are "both", "none", "screen", or "log".  Any other keyword
+will be considered a file name to write the detailed citation info to
+instead of logfile or screen.  Default is the "log" style where there
+is a short summary in the screen output and detailed citations
 in BibTeX format in the logfile.  The option "both" selects the detailed
 output for both, "none", the short output for both, and "screen" will
 write the detailed info to the screen and the short version to the log
@ -532,6 +534,21 @@ partition screen files file.N.

 ----------

+.. _skiprun:
+
+**-skiprun**
+
+Insert the command :doc:`timer timeout 0 every 1 <timer>` at the
+beginning of an input file or after a :doc:`clear <clear>` command.
+This has the effect that the entire LAMMPS input script is processed
+without executing actual :doc:`run <run>` or :doc:`minimize <minimize>`
+and similar commands (their main loops are skipped).  This can be
+helpful and convenient to test input scripts of long running
+calculations for correctness to avoid having them crash after a
+long time due to a typo or syntax error in the middle or at the end.
+
+----------
+
 .. _suffix:

 **-suffix style args**
--- a/doc/src/Speed.rst
+++ b/doc/src/Speed.rst
@ -13,7 +13,7 @@ for certain kinds of hardware, including multi-core CPUs, GPUs, and
 Intel Xeon Phi co-processors.

 The `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS
-web site gives performance results for the various accelerator
+website gives performance results for the various accelerator
 packages discussed on the :doc:`Speed packages <Speed_packages>` doc
 page, for several of the standard LAMMPS benchmark problems, as a
 function of problem size and number of compute nodes, on different
--- a/doc/src/Speed_gpu.rst
+++ b/doc/src/Speed_gpu.rst
@ -153,7 +153,7 @@ usually resulting in inferior performance compared to using LAMMPS' native
 threading and vectorization support in the OPENMP and INTEL packages.

 See the `Benchmark page <https://www.lammps.org/bench.html>`_ of the
-LAMMPS web site for performance of the GPU package on various
+LAMMPS website for performance of the GPU package on various
 hardware, including the Titan HPC platform at ORNL.

 You should also experiment with how many MPI tasks per GPU to use to
--- a/doc/src/Speed_kokkos.rst
+++ b/doc/src/Speed_kokkos.rst
@ -407,7 +407,7 @@ Generally speaking, the following rules of thumb apply:
  by switching to single or mixed precision mode.

 See the `Benchmark page <https://www.lammps.org/bench.html>`_ of the
-LAMMPS web site for performance of the KOKKOS package on different
+LAMMPS website for performance of the KOKKOS package on different
 hardware.

 Advanced Kokkos options
--- a/doc/src/Speed_packages.rst
+++ b/doc/src/Speed_packages.rst
@ -144,7 +144,7 @@ sub-directories with Make.py commands and input scripts for using all
 the accelerator packages on various machines.  See the README files in
 those directories.

-As mentioned above, the `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS web site gives
+As mentioned above, the `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS website gives
 performance results for the various accelerator packages for several
 of the standard LAMMPS benchmark problems, as a function of problem
 size and number of compute nodes, on different hardware platforms.
--- a/doc/src/Tools.rst
+++ b/doc/src/Tools.rst
@ -7,7 +7,7 @@ steps are often necessary to setup and analyze a simulation.  A list
 of such tools can be found on the `LAMMPS webpage <lws_>`_ at these links:

 * `Pre/Post processing <https://www.lammps.org/prepost.html>`_
-* `Offsite LAMMPS packages & tools <https://www.lammps.org/offsite.html>`_
+* `External LAMMPS packages & tools <https://www.lammps.org/external.html>`_
 * `Pizza.py toolkit <pizza_>`_

 The last link for `Pizza.py <pizza_>`_ is a Python-based tool developed at
--- a/doc/src/bond_oxdna.rst
+++ b/doc/src/bond_oxdna.rst
@ -99,10 +99,10 @@ duplexes or arrays of DNA/RNA duplexes can be found in
 examples/PACKAGES/cgdna/util/.

 Please cite :ref:`(Henrich) <Henrich0>` in any publication that uses
-this implementation.  The article contains general information
+this implementation. Updated documentation that contains general information
 on the model, its implementation and performance as well as the structure of
-the data and input file. The preprint version of the article can be found
-`here <PDF/CG-DNA.pdf>`_.
+the data and input file can be found `here <PDF/CG-DNA.pdf>`_.
+
 Please cite also the relevant oxDNA/oxRNA publications. These are
 :ref:`(Ouldridge) <Ouldridge0>` and
 :ref:`(Ouldridge-DPhil) <Ouldridge-DPhil0>` for oxDNA,
--- a/doc/src/compute_angle.rst
+++ b/doc/src/compute_angle.rst
@ -23,22 +23,23 @@ Examples
 Description
 """""""""""

-Define a computation that extracts the angle energy calculated by each
-of the angle sub-styles used in the doc:`angle_style hybrid <angle_hybrid>`
-command.  These values are made accessible
-for output or further processing by other commands.  The group
-specified for this command is ignored.
+Define a computation that extracts the angle energy calculated by each of the
+angle sub-styles used in the :doc:`angle_style hybrid <angle_hybrid>` command.
+These values are made accessible for output or further processing by other
+commands.  The group specified for this command is ignored.

-This compute is useful when using :doc:`angle_style hybrid <angle_hybrid>` if you want to know the portion of the total
-energy contributed by one or more of the hybrid sub-styles.
+This compute is useful when using :doc:`angle_style hybrid <angle_hybrid>` if
+you want to know the portion of the total energy contributed by one or more of
+the hybrid sub-styles.

 Output info
 """""""""""

-This compute calculates a global vector of length N where N is the
-number of sub_styles defined by the :doc:`angle_style hybrid <angle_style>` command, which can be accessed by indices
-1-N.  These values can be used by any command that uses global scalar
-or vector values from a compute as input.  See the :doc:`Howto output <Howto_output>` page for an overview of LAMMPS output
+This compute calculates a global vector of length N where N is the number of
+sub_styles defined by the :doc:`angle_style hybrid <angle_style>` command,
+which can be accessed by indices 1-N.  These values can be used by any command
+that uses global scalar or vector values from a compute as input.  See the
+:doc:`Howto output <Howto_output>` page for an overview of LAMMPS output
 options.

 The vector values are "extensive" and will be in energy
@ -46,7 +47,8 @@ The vector values are "extensive" and will be in energy

 Restrictions
 """"""""""""
- none
+
+none

 Related commands
 """"""""""""""""
--- a/doc/src/fix_bond_swap.rst
+++ b/doc/src/fix_bond_swap.rst
@ -88,18 +88,18 @@ and bond partners B2 of B1 a is performed.  For each pair of A1-A2 and
 B1-B2 bonds to be eligible for swapping, the following 4 criteria must
 be met:

-(1) All 4 monomers must be in the fix group.
+1. All 4 monomers must be in the fix group.

-(2) All 4 monomers must be owned by the processor (not ghost atoms).
-This insures that another processor does not attempt to swap bonds
-involving the same atoms on the same timestep.  Note that this also
-means that bond pairs which straddle processor boundaries are not
-eligible for swapping on this step.
+2. All 4 monomers must be owned by the processor (not ghost atoms).
+   This ensures that another processor does not attempt to swap bonds
+   involving the same atoms on the same timestep.  Note that this also
+   means that bond pairs which straddle processor boundaries are not
+   eligible for swapping on this step.

-(3) The distances between 4 pairs of atoms -- (A1,A2), (B1,B2),
-(A1,B2), (B1,A2) -- must all be less than the specified *cutoff*\ .
+3. The distances between 4 pairs of atoms -- (A1,A2), (B1,B2), (A1,B2),
+   (B1,A2) -- must all be less than the specified *cutoff*.

-(4) The molecule IDs of A1 and B1 must be the same (see below).
+4. The molecule IDs of A1 and B1 must be the same (see below).

 If an eligible B1 partner is found, the energy change due to swapping
 the 2 bonds is computed.  This includes changes in pairwise, bond, and
--- a/doc/src/fix_brownian.rst
+++ b/doc/src/fix_brownian.rst
@ -8,9 +8,8 @@ fix brownian command
 fix brownian/sphere command
 ===========================

-fix brownian/sphere command
-===========================
-
+fix brownian/asphere command
+============================

 Syntax
 """"""
--- a/doc/src/fix_wall_gran_region.rst
+++ b/doc/src/fix_wall_gran_region.rst
@ -66,7 +66,7 @@ non-granular particles and simpler wall geometries, respectively.
 Here are snapshots of example models using this command.  Corresponding
 input scripts can be found in examples/granregion.  Movies of these
 simulations are `here on the Movies page <https://www.lammps.org/movies.html#granregion>`_
-of the LAMMPS web site.
+of the LAMMPS website.

 .. |wallgran1| image:: img/gran_funnel.png
   :width: 48%
--- a/doc/src/group.rst
+++ b/doc/src/group.rst
@ -38,7 +38,7 @@ Syntax
       *intersect* args = two or more group IDs
       *dynamic* args = parent-ID keyword value ...
         one or more keyword/value pairs may be appended
-         keyword = *region* or *var* or *every*
+         keyword = *region* or *var* or *property* or *every*
           *region* value = region-ID
           *var* value = name of variable
           *property* value = name of custom integer or floating point vector
--- a/doc/src/img/decomp-balance.png
+++ b/doc/src/img/decomp-balance.png
--- a/doc/src/img/decomp-processors.png
+++ b/doc/src/img/decomp-processors.png
--- a/doc/src/img/decomp-rcb.png
+++ b/doc/src/img/decomp-rcb.png
--- a/doc/src/img/decomp-regular.png
+++ b/doc/src/img/decomp-regular.png
--- a/doc/src/img/domain-decomp.png
+++ b/doc/src/img/domain-decomp.png
--- a/doc/src/img/fft-decomp-parallel.png
+++ b/doc/src/img/fft-decomp-parallel.png
--- a/doc/src/img/ghost-comm.png
+++ b/doc/src/img/ghost-comm.png
--- a/doc/src/img/neigh-stencil.png
+++ b/doc/src/img/neigh-stencil.png
--- a/doc/src/package.rst
+++ b/doc/src/package.rst
@ -27,7 +27,7 @@ Syntax
             on = set Newton pairwise flag on (currently not allowed)
           *pair/only* = *off* or *on*
             off = apply "gpu" suffix to all available styles in the GPU package (default)
-             on  - apply "gpu" suffix only pair styles
+             on  = apply "gpu" suffix only pair styles
           *binsize* value = size
             size = bin size for neighbor list construction (distance units)
           *split* = fraction
--- a/doc/src/pair_hybrid.rst
+++ b/doc/src/pair_hybrid.rst
@ -198,8 +198,8 @@ same:

 Coefficients must be defined for each pair of atoms types via the
 :doc:`pair_coeff <pair_coeff>` command as described above, or in the
-data file read by the :doc:`read_data <read_data>` commands, or by
-mixing as described below.
+"Pair Coeffs" or "PairIJ Coeffs" section of the data file read by the
+:doc:`read_data <read_data>` command, or by mixing as described below.

 For all of the *hybrid*, *hybrid/overlay*, and *hybrid/scaled* styles,
 every atom type pair I,J (where I <= J) must be assigned to at least one
@ -208,14 +208,21 @@ examples above, or in the data file read by the :doc:`read_data
 <read_data>`, or by mixing as described below.  Also all sub-styles
 must be used at least once in a :doc:`pair_coeff <pair_coeff>` command.

+.. note::
+
+   LAMMPS never performs mixing of parameters from different sub-styles,
+   **even** if they use the same type of coefficients, e.g. contain
+   a Lennard-Jones potential variant.  Those parameters must be provided
+   explicitly.
+
 If you want there to be no interactions between a particular pair of
-atom types, you have 3 choices.  You can assign the type pair to some
-sub-style and use the :doc:`neigh_modify exclude type <neigh_modify>`
+atom types, you have 3 choices.  You can assign the pair of atom types
+to some sub-style and use the :doc:`neigh_modify exclude type <neigh_modify>`
 command.  You can assign it to some sub-style and set the coefficients
 so that there is effectively no interaction (e.g. epsilon = 0.0 in a LJ
 potential).  Or, for *hybrid*, *hybrid/overlay*, or *hybrid/scaled*
 simulations, you can use this form of the pair_coeff command in your
-input script:
+input script or the "PairIJ Coeffs" section of your data file:

 .. code-block:: LAMMPS

@ -238,19 +245,20 @@ styles with different requirements.

 ----------

-Different force fields (e.g. CHARMM vs AMBER) may have different rules
-for applying weightings that change the strength of pairwise
-interactions between pairs of atoms that are also 1-2, 1-3, and 1-4
-neighbors in the molecular bond topology, as normally set by the
-:doc:`special_bonds <special_bonds>` command.  Different weights can be
-assigned to different pair hybrid sub-styles via the :doc:`pair_modify
-special <pair_modify>` command. This allows multiple force fields to be
-used in a model of a hybrid system, however, there is no consistent
-approach to determine parameters automatically for the interactions
-between the two force fields, this is only recommended when particles
+Different force fields (e.g. CHARMM vs. AMBER) may have different rules
+for applying exclusions or weights that change the strength of pairwise
+non-bonded interactions between pairs of atoms that are also 1-2, 1-3,
+and 1-4 neighbors in the molecular bond topology. This is normally a
+global setting defined by the :doc:`special_bonds <special_bonds>` command.
+However, different weights can be assigned to different hybrid
+sub-styles via the :doc:`pair_modify special <pair_modify>` command.
+This allows multiple force fields to be used in a model of a hybrid
+system.  However, there is no consistent approach to determine parameters
+automatically for the interactions **between** atoms of the two force
+fields, thus this approach is only recommended when particles
 described by the different force fields do not mix.

-Here is an example for mixing CHARMM and AMBER: The global *amber*
+Here is an example for combining CHARMM and AMBER: The global *amber*
 setting sets the 1-4 interactions to non-zero scaling factors and
 then overrides them with 0.0 only for CHARMM:

@ -260,7 +268,7 @@ then overrides them with 0.0 only for CHARMM:
   pair_style hybrid lj/charmm/coul/long 8.0 10.0 lj/cut/coul/long 10.0
   pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0

-The this input achieves the same effect:
+This input achieves the same effect:

 .. code-block:: LAMMPS

@ -270,9 +278,9 @@ The this input achieves the same effect:
   pair_modify pair lj/cut/coul/long special coul 0.0 0.0 0.83333333
   pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0

-Here is an example for mixing Tersoff with OPLS/AA based on
-a data file that defines bonds for all atoms where for the
-Tersoff part of the system the force constants for the bonded
+Here is an example for combining Tersoff with OPLS/AA based on
+a data file that defines bonds for all atoms where - for the
+Tersoff part of the system - the force constants for the bonded
 interactions have been set to 0. Note the global settings are
 effectively *lj/coul 0.0 0.0 0.5* as required for OPLS/AA:

--- a/doc/src/pair_oxdna.rst
+++ b/doc/src/pair_oxdna.rst
@ -104,10 +104,10 @@ A simple python setup tool which creates single straight or helical DNA strands,
 DNA duplexes or arrays of DNA duplexes can be found in examples/PACKAGES/cgdna/util/.

 Please cite :ref:`(Henrich) <Henrich1>` in any publication that uses
-this implementation.  The article contains general information
+this implementation. Updated documentation that contains general information
 on the model, its implementation and performance as well as the structure of
-the data and input file. The preprint version of the article can be found
-`here <PDF/CG-DNA.pdf>`_.
+the data and input file can be found `here <PDF/CG-DNA.pdf>`_.
+
 Please cite also the relevant oxDNA publications
 :ref:`(Ouldridge) <Ouldridge1>`,
 :ref:`(Ouldridge-DPhil) <Ouldridge-DPhil1>`
--- a/doc/src/pair_oxdna2.rst
+++ b/doc/src/pair_oxdna2.rst
@ -113,10 +113,10 @@ A simple python setup tool which creates single straight or helical DNA strands,
 DNA duplexes or arrays of DNA duplexes can be found in examples/PACKAGES/cgdna/util/.

 Please cite :ref:`(Henrich) <Henrich2>` in any publication that uses
-this implementation.  The article contains general information
+this implementation. Updated documentation that contains general information
 on the model, its implementation and performance as well as the structure of
-the data and input file. The preprint version of the article can be found
-`here <PDF/CG-DNA.pdf>`_.
+the data and input file can be found `here <PDF/CG-DNA.pdf>`_.
+
 Please cite also the relevant oxDNA2 publications
 :ref:`(Snodin) <Snodin2>` and :ref:`(Sulc) <Sulc2>`.

--- a/doc/src/read_data.rst
+++ b/doc/src/read_data.rst
@ -619,7 +619,7 @@ of analysis.
   * - bond
     - atom-ID molecule-ID atom-type x y z
   * - charge
-     - atom-type q x y z
+     - atom-ID atom-type q x y z
   * - dipole
     - atom-ID atom-type q x y z mux muy muz
   * - dpd
--- a/doc/utils/requirements.txt
+++ b/doc/utils/requirements.txt
@ -1,7 +1,7 @@
 Sphinx==4.0.3
-sphinxcontrib-spelling
+sphinxcontrib-spelling==7.2.1
 git+git://github.com/akohlmey/sphinx-fortran@parallel-read
-sphinx_tabs
-breathe
-Pygments
-six
+sphinx_tabs==3.2.0
+breathe==4.31.0
+Pygments==2.10.0
+six==1.16.0
--- a/doc/utils/sphinx-config/conf.py.in
+++ b/doc/utils/sphinx-config/conf.py.in
@ -418,6 +418,7 @@ html_context['current_version'] = os.environ.get('LAMMPS_WEBSITE_BUILD_VERSION',
 html_context['git_commit'] = git_commit
 html_context['versions'] = [
  ('latest', 'https://docs.lammps.org/latest/'),
+  ('stable', 'https://docs.lammps.org/stable/'),
  (version, 'https://docs.lammps.org/')
 ]
 html_context['downloads'] = [('PDF', 'Manual.pdf')]
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@ -72,7 +72,8 @@ Alexey
 ali
 aliceblue
 Allinger
-allocaters
+allocator
+allocators
 allosws
 AlO
 Alonso
@ -1008,6 +1009,7 @@ FFmpeg
 ffplay
 fft
 fftbench
+fftMPI
 fftw
 fgets
 fhg
@ -1384,6 +1386,7 @@ inhomogeneities
 inhomogeneous
 init
 initialdelay
+initialisms
 initializations
 InitiatorIDs
 initio
@ -1690,6 +1693,7 @@ Lett
 Leuven
 Leven
 Lewy
+LF
 LGPL
 lgvdw
 Liang
@ -2095,6 +2099,7 @@ Murdick
 Murtola
 Murty
 Muser
+mutexes
 Muto
 muVT
 mux
@ -2435,6 +2440,7 @@ packings
 padua
 Padua
 pafi
+PairIJ
 palegoldenrod
 palegreen
 paleturquoise
--- a/examples/COUPLE/lammps_quest/README
+++ b/examples/COUPLE/lammps_quest/README
@ -1,15 +1,21 @@
+IMPORTANT NOTE: This example has not been updated since 2014,
+so it is not likely to work anymore out of the box.  There have
+been changes to LAMMPS and its library interface that would need
+to be applied. Please see the manual for the documentation of
+the library interface.
+
 This directory has an application that runs classical MD via LAMMPS,
 but uses quantum forces calculated by the Quest DFT (density
 functional) code in place of the usual classical MD forces calculated
 by a pair style in LAMMPS.

-lmpqst.cpp	    main program
-		    it links LAMMPS as a library
-		    it invokes Quest as an executable
-in.lammps	    LAMMPS input script, without the run command
-si_111.in	    Quest input script for an 8-atom Si unit cell
-lmppath.h	    contains path to LAMMPS home directory
-qstexe.h	    contains full pathname to Quest executable
+lmpqst.cpp          main program
+                    it links LAMMPS as a library
+                    it invokes Quest as an executable
+in.lammps           LAMMPS input script, without the run command
+si_111.in           Quest input script for an 8-atom Si unit cell
+lmppath.h           contains path to LAMMPS home directory
+qstexe.h            contains full pathname to Quest executable

 After editing the Makefile, lmppath.h, and qstexe.h to make them
 suitable for your box, type:
--- a/examples/COUPLE/lammps_spparks/README
+++ b/examples/COUPLE/lammps_spparks/README
@ -1,3 +1,9 @@
+IMPORTANT NOTE: This example has not been updated since 2013,
+so it is not likely to work anymore out of the box.  There have
+been changes to LAMMPS and its library interface that would need
+to be applied. Please see the manual for the documentation of
+the library interface.
+
 This directory has an application that models grain growth in the
 presence of strain.

--- a/examples/COUPLE/multiple/multiple.cpp
+++ b/examples/COUPLE/multiple/multiple.cpp
@ -5,7 +5,7 @@

   Copyright (2003) Sandia Corporation.  Under the terms of Contract
   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
-   certain rights in this software.  This software is distributed under 
+   certain rights in this software.  This software is distributed under
   the GNU General Public License.

   See the README file in the top-level LAMMPS directory.
@ -28,13 +28,9 @@
 #include <cstdlib>
 #include <cstring>

-#include "lammps.h"         // these are LAMMPS include files
-#include "input.h"
-#include "atom.h"
+#define LAMMPS_LIB_MPI  // to make lammps_open() visible
 #include "library.h"

-using namespace LAMMPS_NS;
-
 int main(int narg, char **arg)
 {
  // setup MPI and various communicators
@ -68,13 +64,13 @@ int main(int narg, char **arg)
  int instance = me*ninstance / nprocs;
  MPI_Comm comm_lammps;
  MPI_Comm_split(MPI_COMM_WORLD,instance,0,&comm_lammps);
-  
+
  // each instance: unique screen file, log file, temperature

  char str1[32],str2[32],str3[32];

  char **lmparg = new char*[8];
-  lmparg[0] = NULL;                 // required placeholder for program name
+  lmparg[0] = (char *) "LAMMPS";              // required placeholder for program name
  lmparg[1] = (char *) "-screen";
  sprintf(str1,"screen.%d",instance);
  lmparg[2] = str1;
@ -86,13 +82,9 @@ int main(int narg, char **arg)
  sprintf(str3,"%g",temperature + instance*tdelta);
  lmparg[7] = str3;

-  // open N instances of LAMMPS
-  // either of these methods will work
+  // create N instances of LAMMPS

-  LAMMPS *lmp = new LAMMPS(8,lmparg,comm_lammps);
-
-  //LAMMPS *lmp;
-  //lammps_open(8,lmparg,comm_lammps,(void **) &lmp);
+  void *lmp = lammps_open(8,lmparg,comm_lammps,NULL);

  delete [] lmparg;

@ -102,8 +94,8 @@ int main(int narg, char **arg)

  // query final temperature and print result for each instance

-  double *ptr = (double *) 
-    lammps_extract_compute(lmp,(char *) "thermo_temp",0,0);
+  double *ptr = (double *)
+    lammps_extract_compute(lmp,"thermo_temp",LMP_STYLE_GLOBAL,LMP_TYPE_SCALAR);
  double finaltemp = *ptr;

  double *temps = new double[ninstance];
@ -112,7 +104,7 @@ int main(int narg, char **arg)
  int me_lammps;
  MPI_Comm_rank(comm_lammps,&me_lammps);
  if (me_lammps == 0) temps[instance] = finaltemp;
-  
+
  double *alltemps = new double[ninstance];
  MPI_Allreduce(temps,alltemps,ninstance,MPI_DOUBLE,MPI_SUM,MPI_COMM_WORLD);

@ -125,7 +117,7 @@ int main(int narg, char **arg)

  // delete LAMMPS instances

-  delete lmp;
+  lammps_close(lmp);

  // close down MPI

--- a/examples/COUPLE/plugin/README
+++ b/examples/COUPLE/plugin/README
@ -13,7 +13,7 @@ like below.

 mpicc -c -O -Wall -g -I$HOME/lammps/src liblammpsplugin.c
 mpicc -c -O -Wall -g simple.c
-mpicc simple.o liblammsplugin.o -ldl -o simpleC
+mpicc simple.o liblammpsplugin.o -ldl -o simpleC

 You also need to build LAMMPS as a shared library
 (see examples/COUPLE/README), e.g. 
--- a/examples/COUPLE/plugin/liblammpsplugin.c
+++ b/examples/COUPLE/plugin/liblammpsplugin.c
@ -31,51 +31,105 @@ liblammpsplugin_t *liblammpsplugin_load(const char *lib)
  if (lib == NULL) return NULL;
  handle = dlopen(lib,RTLD_NOW|RTLD_GLOBAL);
  if (handle == NULL) return NULL;
-  
+
  lmp = (liblammpsplugin_t *) malloc(sizeof(liblammpsplugin_t));
  lmp->handle = handle;

 #define ADDSYM(symbol) lmp->symbol = dlsym(handle,"lammps_" #symbol)
  ADDSYM(open);
  ADDSYM(open_no_mpi);
+  ADDSYM(open_fortran);
  ADDSYM(close);
-  ADDSYM(version);
+
+  ADDSYM(mpi_init);
+  ADDSYM(mpi_finalize);
+  ADDSYM(kokkos_finalize);
+  ADDSYM(python_finalize);
+
  ADDSYM(file);
  ADDSYM(command);
  ADDSYM(commands_list);
  ADDSYM(commands_string);
-  ADDSYM(free);
-  ADDSYM(extract_setting);
-  ADDSYM(extract_global);
+
+  ADDSYM(get_natoms);
+  ADDSYM(get_thermo);
+
  ADDSYM(extract_box);
+  ADDSYM(reset_box);
+
+  ADDSYM(memory_usage);
+  ADDSYM(get_mpi_comm);
+
+  ADDSYM(extract_setting);
+  ADDSYM(extract_global_datatype);
+  ADDSYM(extract_global);
+
+  ADDSYM(extract_atom_datatype);
  ADDSYM(extract_atom);
+
  ADDSYM(extract_compute);
  ADDSYM(extract_fix);
  ADDSYM(extract_variable);
-
-  ADDSYM(get_thermo);
-  ADDSYM(get_natoms);
-
  ADDSYM(set_variable);
-  ADDSYM(reset_box);

  ADDSYM(gather_atoms);
  ADDSYM(gather_atoms_concat);
  ADDSYM(gather_atoms_subset);
  ADDSYM(scatter_atoms);
  ADDSYM(scatter_atoms_subset);
+  ADDSYM(gather_bonds);

-  ADDSYM(set_fix_external_callback);
+  ADDSYM(create_atoms);

-  ADDSYM(config_has_package);
-  ADDSYM(config_package_count);
-  ADDSYM(config_package_name);
+  ADDSYM(find_pair_neighlist);
+  ADDSYM(find_fix_neighlist);
+  ADDSYM(find_compute_neighlist);
+  ADDSYM(neighlist_num_elements);
+  ADDSYM(neighlist_element_neighbors);
+
+  ADDSYM(version);
+  ADDSYM(get_os_info);
+
+  ADDSYM(config_has_mpi_support);
  ADDSYM(config_has_gzip_support);
  ADDSYM(config_has_png_support);
  ADDSYM(config_has_jpeg_support);
  ADDSYM(config_has_ffmpeg_support);
  ADDSYM(config_has_exceptions);
-  ADDSYM(create_atoms);
+
+  ADDSYM(config_has_package);
+  ADDSYM(config_package_count);
+  ADDSYM(config_package_name);
+
+  ADDSYM(config_accelerator);
+  ADDSYM(has_gpu_device);
+  ADDSYM(get_gpu_device_info);
+
+  ADDSYM(has_style);
+  ADDSYM(style_count);
+  ADDSYM(style_name);
+
+  ADDSYM(has_id);
+  ADDSYM(id_count);
+  ADDSYM(id_name);
+
+  ADDSYM(plugin_count);
+  ADDSYM(plugin_name);
+
+  ADDSYM(set_fix_external_callback);
+  ADDSYM(fix_external_get_force);
+  ADDSYM(fix_external_set_energy_global);
+  ADDSYM(fix_external_set_energy_peratom);
+  ADDSYM(fix_external_set_virial_global);
+  ADDSYM(fix_external_set_virial_peratom);
+  ADDSYM(fix_external_set_vector_length);
+  ADDSYM(fix_external_set_vector);
+
+  ADDSYM(free);
+
+  ADDSYM(is_running);
+  ADDSYM(force_timeout);
+
 #ifdef LAMMPS_EXCEPTIONS
  lmp->has_exceptions = 1;
  ADDSYM(has_error);
--- a/examples/COUPLE/plugin/liblammpsplugin.h
+++ b/examples/COUPLE/plugin/liblammpsplugin.h
@ -39,75 +39,121 @@ extern "C" {

 #if defined(LAMMPS_BIGBIG)
 typedef void (*FixExternalFnPtr)(void *, int64_t, int, int64_t *, double **, double **);
-#elif defined(LAMMPS_SMALLBIG)
-typedef void (*FixExternalFnPtr)(void *, int64_t, int, int *, double **, double **);
-#else
+#elif defined(LAMMPS_SMALLSMALL)
 typedef void (*FixExternalFnPtr)(void *, int, int, int *, double **, double **);
+#else
+typedef void (*FixExternalFnPtr)(void *, int64_t, int, int *, double **, double **);
 #endif

-  
 struct _liblammpsplugin {
  int abiversion;
  int has_exceptions;
  void *handle;
-  void (*open)(int, char **, MPI_Comm, void **);
-  void (*open_no_mpi)(int, char **, void **);
+  void *(*open)(int, char **, MPI_Comm, void **);
+  void *(*open_no_mpi)(int, char **, void **);
+  void *(*open_fortran)(int, char **, void **, int);
  void (*close)(void *);
-  int  (*version)(void *);
+
+  void (*mpi_init)();
+  void (*mpi_finalize)();
+  void (*kokkos_finalize)();
+  void (*python_finalize)();
+
  void (*file)(void *, char *);
-  char *(*command)(void *, char *);
-  void (*commands_list)(void *, int, char **);
-  void (*commands_string)(void *, char *);
-  void (*free)(void *);
-  int (*extract_setting)(void *, char *);
-  void *(*extract_global)(void *, char *);
-  void (*extract_box)(void *, double *, double *,
-		      double *, double *, double *, int *, int *);
-  void *(*extract_atom)(void *, char *);
-  void *(*extract_compute)(void *, char *, int, int);
-  void *(*extract_fix)(void *, char *, int, int, int, int);
-  void *(*extract_variable)(void *, char *, char *);
+  char *(*command)(void *, const char *);
+  void (*commands_list)(void *, int, const char **);
+  void (*commands_string)(void *, const char *);

+  double (*get_natoms)(void *);
  double (*get_thermo)(void *, char *);
-  int (*get_natoms)(void *);

-  int (*set_variable)(void *, char *, char *);
+  void (*extract_box)(void *, double *, double *,
+                      double *, double *, double *, int *, int *);
  void (*reset_box)(void *, double *, double *, double, double, double);

+  void (*memory_usage)(void *, double *);
+  int (*get_mpi_comm)(void *);
+
+  int (*extract_setting)(void *, const char *);
+  int (*extract_global_datatype)(void *, const char *);
+  void *(*extract_global)(void *, const char *);
+
+  int (*extract_atom_datatype)(void *, const char *);
+  void *(*extract_atom)(void *, const char *);
+
+  void *(*extract_compute)(void *, const char *, int, int);
+  void *(*extract_fix)(void *, const char *, int, int, int, int);
+  void *(*extract_variable)(void *, const char *, char *);
+  int (*set_variable)(void *, char *, char *);
+
  void (*gather_atoms)(void *, char *, int, int, void *);
  void (*gather_atoms_concat)(void *, char *, int, int, void *);
  void (*gather_atoms_subset)(void *, char *, int, int, int, int *, void *);
  void (*scatter_atoms)(void *, char *, int, int, void *);
  void (*scatter_atoms_subset)(void *, char *, int, int, int, int *, void *);

-  void (*set_fix_external_callback)(void *, char *, FixExternalFnPtr, void*);
+  void (*gather_bonds)(void *, void *);
+  
+// lammps_create_atoms() takes tagint and imageint as args
+// ifdef ensures they are compatible with rest of LAMMPS
+// caller must match to how LAMMPS library is built

-  int (*config_has_package)(char * package_name);
-  int (*config_package_count)();
-  int (*config_package_name)(int index, char * buffer, int max_size);
+#ifndef LAMMPS_BIGBIG
+  void (*create_atoms)(void *, int, int *, int *, double *,
+                       double *, int *, int);
+#else
+  void (*create_atoms)(void *, int, int64_t *, int *, double *,
+                       double *, int64_t *, int);
+#endif
+
+  int (*find_pair_neighlist)(void *, const char *, int, int, int);
+  int (*find_fix_neighlist)(void *, const char *, int);
+  int (*find_compute_neighlist)(void *, char *, int);
+  int (*neighlist_num_elements)(void *, int);
+  void (*neighlist_element_neighbors)(void *, int, int, int *, int *, int **);
+
+  int (*version)(void *);
+  void (*get_os_info)(char *, int);
+
+  int (*config_has_mpi_support)();
  int (*config_has_gzip_support)();
  int (*config_has_png_support)();
  int (*config_has_jpeg_support)();
  int (*config_has_ffmpeg_support)();
  int (*config_has_exceptions)();

-  int (*find_pair_neighlist)(void* ptr, char * style, int exact, int nsub, int request);
-  int (*find_fix_neighlist)(void* ptr, char * id, int request);
-  int (*find_compute_neighlist)(void* ptr, char * id, int request);
-  int (*neighlist_num_elements)(void* ptr, int idx);
-  void (*neighlist_element_neighbors)(void * ptr, int idx, int element, int * iatom, int * numneigh, int ** neighbors);
+  int (*config_has_package)(const char *);
+  int (*config_package_count)();
+  int (*config_package_name)(int, char *, int);

-// lammps_create_atoms() takes tagint and imageint as args
-// ifdef insures they are compatible with rest of LAMMPS
-// caller must match to how LAMMPS library is built
+  int (*config_accelerator)(const char *, const char *, const char *);
+  int (*has_gpu_device)();
+  void (*get_gpu_device_info)(char *, int);

-#ifdef LAMMPS_BIGBIG
-  void (*create_atoms)(void *, int, int64_t *, int *,
-                         double *, double *, int64_t *, int);
-#else
-  void (*create_atoms)(void *, int, int *, int *,
-                         double *, double *, int *, int);
-#endif
+  int (*has_style)(void *, const char *, const char *);
+  int (*style_count)(void *, const char *);
+  int (*style_name)(void *, const char *, int, char *, int);
+
+  int (*has_id)(void *, const char *, const char *);
+  int (*id_count)(void *, const char *);
+  int (*id_name)(void *, const char *, int, char *, int);
+
+  int (*plugin_count)();
+  int (*plugin_name)(int, char *, char *, int);
+
+  void (*set_fix_external_callback)(void *, const char *, FixExternalFnPtr, void*);
+  double **(*fix_external_get_force)(void *, const char *);
+  void (*fix_external_set_energy_global)(void *, const char *, double);
+  void (*fix_external_set_energy_peratom)(void *, const char *, double *);
+  void (*fix_external_set_virial_global)(void *, const char *, double *);
+  void (*fix_external_set_virial_peratom)(void *, const char *, double **);
+  void (*fix_external_set_vector_length)(void *, const char *, int);
+  void (*fix_external_set_vector)(void *, const char *, int, double);
+
+  void (*free)(void *);
+
+  int (*is_running)(void *);
+  void (*force_timeout)(void *);

  int (*has_error)(void *);
  int (*get_last_error_message)(void *, char *, int);
@ -117,7 +163,7 @@ typedef struct _liblammpsplugin liblammpsplugin_t;

 liblammpsplugin_t *liblammpsplugin_load(const char *);
 int liblammpsplugin_release(liblammpsplugin_t *);
-  
+
 #undef LAMMPS
 #ifdef __cplusplus
 }
--- a/examples/COUPLE/plugin/log.simple.plugin.1
+++ b/examples/COUPLE/plugin/log.simple.plugin.1
@ -1,9 +1,12 @@
-LAMMPS (18 Feb 2020)
-Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
-Created orthogonal box = (0 0 0) to (6.71838 6.71838 6.71838)
+LAMMPS (31 Aug 2021)
+OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
+  using 1 OpenMP thread(s) per MPI task
+Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
+Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (6.7183848 6.7183848 6.7183848)
  1 by 1 by 1 MPI processor grid
 Created 256 atoms
-  create_atoms CPU = 0.000297844 secs
+  using lattice units in orthogonal box = (0.0000000 0.0000000 0.0000000) to (6.7183848 6.7183848 6.7183848)
+  create_atoms CPU = 0.001 seconds
 Neighbor list info ...
  update every 20 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
@ -14,108 +17,108 @@ Neighbor list info ...
  (1) pair lj/cut, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
-      stencil: half/bin/3d/newton
+      stencil: half/bin/3d
      bin: standard
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
       0         1.44   -6.7733681            0   -4.6218056   -5.0244179 
      10    1.1298532   -6.3095502            0   -4.6213906   -2.6058175 
-Loop time of 0.00164276 on 1 procs for 10 steps with 256 atoms
+Loop time of 0.00239712 on 1 procs for 10 steps with 256 atoms

-Performance: 2629719.113 tau/day, 6087.313 timesteps/s
-93.7% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1802163.347 tau/day, 4171.674 timesteps/s
+97.2% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0014956  | 0.0014956  | 0.0014956  |   0.0 | 91.04
+Pair    | 0.0020572  | 0.0020572  | 0.0020572  |   0.0 | 85.82
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 8.045e-05  | 8.045e-05  | 8.045e-05  |   0.0 |  4.90
-Output  | 1.1399e-05 | 1.1399e-05 | 1.1399e-05 |   0.0 |  0.69
-Modify  | 3.7431e-05 | 3.7431e-05 | 3.7431e-05 |   0.0 |  2.28
-Other   |            | 1.789e-05  |            |       |  1.09
+Comm    | 0.00018731 | 0.00018731 | 0.00018731 |   0.0 |  7.81
+Output  | 4.478e-05  | 4.478e-05  | 4.478e-05  |   0.0 |  1.87
+Modify  | 6.3637e-05 | 6.3637e-05 | 6.3637e-05 |   0.0 |  2.65
+Other   |            | 4.419e-05  |            |       |  1.84

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1431 ave 1431 max 1431 min
+Nghost:        1431.00 ave        1431 max        1431 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9984 ave 9984 max 9984 min
+Neighs:        9984.00 ave        9984 max        9984 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9984
-Ave neighs/atom = 39
+Ave neighs/atom = 39.000000
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      10    1.1298532   -6.3095502            0   -4.6213906   -2.6058175 
      20    0.6239063    -5.557644            0   -4.6254403   0.97451173 
-Loop time of 0.00199768 on 1 procs for 10 steps with 256 atoms
+Loop time of 0.00329271 on 1 procs for 10 steps with 256 atoms

-Performance: 2162504.180 tau/day, 5005.797 timesteps/s
-99.8% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1311987.619 tau/day, 3037.008 timesteps/s
+96.4% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0018518  | 0.0018518  | 0.0018518  |   0.0 | 92.70
+Pair    | 0.0029015  | 0.0029015  | 0.0029015  |   0.0 | 88.12
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 7.9768e-05 | 7.9768e-05 | 7.9768e-05 |   0.0 |  3.99
-Output  | 1.1433e-05 | 1.1433e-05 | 1.1433e-05 |   0.0 |  0.57
-Modify  | 3.6904e-05 | 3.6904e-05 | 3.6904e-05 |   0.0 |  1.85
-Other   |            | 1.773e-05  |            |       |  0.89
+Comm    | 0.00021807 | 0.00021807 | 0.00021807 |   0.0 |  6.62
+Output  | 4.9163e-05 | 4.9163e-05 | 4.9163e-05 |   0.0 |  1.49
+Modify  | 7.0573e-05 | 7.0573e-05 | 7.0573e-05 |   0.0 |  2.14
+Other   |            | 5.339e-05  |            |       |  1.62

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1431 ave 1431 max 1431 min
+Nghost:        1431.00 ave        1431 max        1431 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9952 ave 9952 max 9952 min
+Neighs:        9952.00 ave        9952 max        9952 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9952
-Ave neighs/atom = 38.875
+Ave neighs/atom = 38.875000
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 20
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      20    0.6239063   -5.5404291            0   -4.6082254    1.0394285 
      21   0.63845863   -5.5628733            0   -4.6089263   0.99398278 
-Loop time of 0.000304321 on 1 procs for 1 steps with 256 atoms
+Loop time of 0.000638039 on 1 procs for 1 steps with 256 atoms

-Performance: 1419553.695 tau/day, 3286.004 timesteps/s
-98.9% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 677074.599 tau/day, 1567.302 timesteps/s
+98.9% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00027815 | 0.00027815 | 0.00027815 |   0.0 | 91.40
+Pair    | 0.00042876 | 0.00042876 | 0.00042876 |   0.0 | 67.20
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 8.321e-06  | 8.321e-06  | 8.321e-06  |   0.0 |  2.73
-Output  | 1.0513e-05 | 1.0513e-05 | 1.0513e-05 |   0.0 |  3.45
-Modify  | 3.968e-06  | 3.968e-06  | 3.968e-06  |   0.0 |  1.30
-Other   |            | 3.365e-06  |            |       |  1.11
+Comm    | 5.2872e-05 | 5.2872e-05 | 5.2872e-05 |   0.0 |  8.29
+Output  | 0.00012218 | 0.00012218 | 0.00012218 |   0.0 | 19.15
+Modify  | 1.3762e-05 | 1.3762e-05 | 1.3762e-05 |   0.0 |  2.16
+Other   |            | 2.047e-05  |            |       |  3.21

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1431 ave 1431 max 1431 min
+Nghost:        1431.00 ave        1431 max        1431 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9705 ave 9705 max 9705 min
+Neighs:        9705.00 ave        9705 max        9705 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9705
-Ave neighs/atom = 37.9102
+Ave neighs/atom = 37.910156
 Neighbor list builds = 0
 Dangerous builds not checked
 Force on 1 atom via extract_atom: 26.9581
@ -124,136 +127,136 @@ Setting up Verlet run ...
  Unit style    : lj
  Current step  : 21
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      21   0.63845863   -5.5628733            0   -4.6089263   0.99398278 
      31    0.7494946   -5.7306417            0   -4.6107913   0.41043597 
-Loop time of 0.00196027 on 1 procs for 10 steps with 256 atoms
+Loop time of 0.00281277 on 1 procs for 10 steps with 256 atoms

-Performance: 2203779.175 tau/day, 5101.341 timesteps/s
-99.7% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1535852.558 tau/day, 3555.214 timesteps/s
+92.6% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0018146  | 0.0018146  | 0.0018146  |   0.0 | 92.57
+Pair    | 0.0024599  | 0.0024599  | 0.0024599  |   0.0 | 87.45
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 8.0268e-05 | 8.0268e-05 | 8.0268e-05 |   0.0 |  4.09
-Output  | 1.0973e-05 | 1.0973e-05 | 1.0973e-05 |   0.0 |  0.56
-Modify  | 3.6913e-05 | 3.6913e-05 | 3.6913e-05 |   0.0 |  1.88
-Other   |            | 1.756e-05  |            |       |  0.90
+Comm    | 0.00020234 | 0.00020234 | 0.00020234 |   0.0 |  7.19
+Output  | 3.6436e-05 | 3.6436e-05 | 3.6436e-05 |   0.0 |  1.30
+Modify  | 6.7542e-05 | 6.7542e-05 | 6.7542e-05 |   0.0 |  2.40
+Other   |            | 4.655e-05  |            |       |  1.65

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1431 ave 1431 max 1431 min
+Nghost:        1431.00 ave        1431 max        1431 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9688 ave 9688 max 9688 min
+Neighs:        9688.00 ave        9688 max        9688 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9688
-Ave neighs/atom = 37.8438
+Ave neighs/atom = 37.843750
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 31
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      31    0.7494946   -5.7306417            0   -4.6107913   0.41043597 
      51   0.71349216   -5.6772387            0   -4.6111811   0.52117681 
-Loop time of 0.00433063 on 1 procs for 20 steps with 256 atoms
+Loop time of 0.00560916 on 1 procs for 20 steps with 256 atoms

-Performance: 1995088.941 tau/day, 4618.261 timesteps/s
-99.3% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1540338.414 tau/day, 3565.598 timesteps/s
+99.2% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0035121  | 0.0035121  | 0.0035121  |   0.0 | 81.10
-Neigh   | 0.00050258 | 0.00050258 | 0.00050258 |   0.0 | 11.61
-Comm    | 0.00019444 | 0.00019444 | 0.00019444 |   0.0 |  4.49
-Output  | 1.2092e-05 | 1.2092e-05 | 1.2092e-05 |   0.0 |  0.28
-Modify  | 7.2917e-05 | 7.2917e-05 | 7.2917e-05 |   0.0 |  1.68
-Other   |            | 3.647e-05  |            |       |  0.84
+Pair    | 0.0044403  | 0.0044403  | 0.0044403  |   0.0 | 79.16
+Neigh   | 0.00056186 | 0.00056186 | 0.00056186 |   0.0 | 10.02
+Comm    | 0.00036797 | 0.00036797 | 0.00036797 |   0.0 |  6.56
+Output  | 3.676e-05  | 3.676e-05  | 3.676e-05  |   0.0 |  0.66
+Modify  | 0.00011282 | 0.00011282 | 0.00011282 |   0.0 |  2.01
+Other   |            | 8.943e-05  |            |       |  1.59

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1421 ave 1421 max 1421 min
+Nghost:        1421.00 ave        1421 max        1421 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9700 ave 9700 max 9700 min
+Neighs:        9700.00 ave        9700 max        9700 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9700
-Ave neighs/atom = 37.8906
+Ave neighs/atom = 37.890625
 Neighbor list builds = 1
 Dangerous builds not checked
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 51
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      51   0.71349216   -5.6772387            0   -4.6111811   0.52117681 
      61   0.78045421   -5.7781094            0   -4.6120011  0.093808941 
-Loop time of 0.00196567 on 1 procs for 10 steps with 256 atoms
+Loop time of 0.00373815 on 1 procs for 10 steps with 256 atoms

-Performance: 2197727.285 tau/day, 5087.332 timesteps/s
-99.7% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1155650.623 tau/day, 2675.117 timesteps/s
+98.0% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0018222  | 0.0018222  | 0.0018222  |   0.0 | 92.70
+Pair    | 0.0030908  | 0.0030908  | 0.0030908  |   0.0 | 82.68
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 7.8285e-05 | 7.8285e-05 | 7.8285e-05 |   0.0 |  3.98
-Output  | 1.0862e-05 | 1.0862e-05 | 1.0862e-05 |   0.0 |  0.55
-Modify  | 3.6719e-05 | 3.6719e-05 | 3.6719e-05 |   0.0 |  1.87
-Other   |            | 1.764e-05  |            |       |  0.90
+Comm    | 0.00038189 | 0.00038189 | 0.00038189 |   0.0 | 10.22
+Output  | 4.1615e-05 | 4.1615e-05 | 4.1615e-05 |   0.0 |  1.11
+Modify  | 0.00013851 | 0.00013851 | 0.00013851 |   0.0 |  3.71
+Other   |            | 8.533e-05  |            |       |  2.28

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1421 ave 1421 max 1421 min
+Nghost:        1421.00 ave        1421 max        1421 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9700 ave 9700 max 9700 min
+Neighs:        9700.00 ave        9700 max        9700 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9700
-Ave neighs/atom = 37.8906
+Ave neighs/atom = 37.890625
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
  Unit style    : lj
  Current step  : 61
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      61   0.78045421   -5.7781094            0   -4.6120011  0.093808941 
      81   0.77743907   -5.7735004            0   -4.6118971  0.090822641 
-Loop time of 0.00430528 on 1 procs for 20 steps with 256 atoms
+Loop time of 0.00612177 on 1 procs for 20 steps with 256 atoms

-Performance: 2006838.581 tau/day, 4645.460 timesteps/s
-99.8% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1411356.519 tau/day, 3267.029 timesteps/s
+98.6% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0034931  | 0.0034931  | 0.0034931  |   0.0 | 81.13
-Neigh   | 0.00050437 | 0.00050437 | 0.00050437 |   0.0 | 11.72
-Comm    | 0.0001868  | 0.0001868  | 0.0001868  |   0.0 |  4.34
-Output  | 1.1699e-05 | 1.1699e-05 | 1.1699e-05 |   0.0 |  0.27
-Modify  | 7.3308e-05 | 7.3308e-05 | 7.3308e-05 |   0.0 |  1.70
-Other   |            | 3.604e-05  |            |       |  0.84
+Pair    | 0.0047211  | 0.0047211  | 0.0047211  |   0.0 | 77.12
+Neigh   | 0.00083088 | 0.00083088 | 0.00083088 |   0.0 | 13.57
+Comm    | 0.00032716 | 0.00032716 | 0.00032716 |   0.0 |  5.34
+Output  | 3.9891e-05 | 3.9891e-05 | 3.9891e-05 |   0.0 |  0.65
+Modify  | 0.00010926 | 0.00010926 | 0.00010926 |   0.0 |  1.78
+Other   |            | 9.346e-05  |            |       |  1.53

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1405 ave 1405 max 1405 min
+Nghost:        1405.00 ave        1405 max        1405 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9701 ave 9701 max 9701 min
+Neighs:        9701.00 ave        9701 max        9701 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9701
-Ave neighs/atom = 37.8945
+Ave neighs/atom = 37.894531
 Neighbor list builds = 1
 Dangerous builds not checked
 Deleted 256 atoms, new total = 0
@ -261,34 +264,34 @@ Setting up Verlet run ...
  Unit style    : lj
  Current step  : 81
  Time step     : 0.005
-Per MPI rank memory allocation (min/avg/max) = 2.63 | 2.63 | 2.63 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 2.630 | 2.630 | 2.630 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      81    0.6239063   -5.5404291            0   -4.6082254    1.0394285 
      91   0.75393007   -5.7375259            0   -4.6110484   0.39357367 
-Loop time of 0.00195843 on 1 procs for 10 steps with 256 atoms
+Loop time of 0.00319065 on 1 procs for 10 steps with 256 atoms

-Performance: 2205851.941 tau/day, 5106.139 timesteps/s
-99.7% CPU use with 1 MPI tasks x no OpenMP threads
+Performance: 1353954.393 tau/day, 3134.154 timesteps/s
+99.2% CPU use with 1 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0018143  | 0.0018143  | 0.0018143  |   0.0 | 92.64
+Pair    | 0.0027828  | 0.0027828  | 0.0027828  |   0.0 | 87.22
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 7.8608e-05 | 7.8608e-05 | 7.8608e-05 |   0.0 |  4.01
-Output  | 1.0786e-05 | 1.0786e-05 | 1.0786e-05 |   0.0 |  0.55
-Modify  | 3.7106e-05 | 3.7106e-05 | 3.7106e-05 |   0.0 |  1.89
-Other   |            | 1.762e-05  |            |       |  0.90
+Comm    | 0.00023286 | 0.00023286 | 0.00023286 |   0.0 |  7.30
+Output  | 4.0459e-05 | 4.0459e-05 | 4.0459e-05 |   0.0 |  1.27
+Modify  | 7.3576e-05 | 7.3576e-05 | 7.3576e-05 |   0.0 |  2.31
+Other   |            | 6.094e-05  |            |       |  1.91

-Nlocal:    256 ave 256 max 256 min
+Nlocal:        256.000 ave         256 max         256 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Nghost:    1431 ave 1431 max 1431 min
+Nghost:        1431.00 ave        1431 max        1431 min
 Histogram: 1 0 0 0 0 0 0 0 0 0
-Neighs:    9705 ave 9705 max 9705 min
+Neighs:        9705.00 ave        9705 max        9705 min
 Histogram: 1 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9705
-Ave neighs/atom = 37.9102
+Ave neighs/atom = 37.910156
 Neighbor list builds = 0
 Dangerous builds not checked
 Total wall time: 0:00:00
--- a/examples/COUPLE/plugin/log.simple.plugin.4
+++ b/examples/COUPLE/plugin/log.simple.plugin.4
@ -1,9 +1,12 @@
-LAMMPS (18 Feb 2020)
-Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
-Created orthogonal box = (0 0 0) to (6.71838 6.71838 6.71838)
+LAMMPS (31 Aug 2021)
+OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
+  using 1 OpenMP thread(s) per MPI task
+Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
+Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (6.7183848 6.7183848 6.7183848)
  1 by 1 by 2 MPI processor grid
 Created 256 atoms
-  create_atoms CPU = 0.000265157 secs
+  using lattice units in orthogonal box = (0.0000000 0.0000000 0.0000000) to (6.7183848 6.7183848 6.7183848)
+  create_atoms CPU = 0.003 seconds
 Neighbor list info ...
  update every 20 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
@ -14,7 +17,7 @@ Neighbor list info ...
  (1) pair lj/cut, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
-      stencil: half/bin/3d/newton
+      stencil: half/bin/3d
      bin: standard
 Setting up Verlet run ...
  Unit style    : lj
@ -24,30 +27,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
       0         1.44   -6.7733681            0   -4.6218056   -5.0244179 
      10    1.1298532   -6.3095502            0   -4.6213906   -2.6058175 
-Loop time of 0.00115264 on 2 procs for 10 steps with 256 atoms
+Loop time of 0.00330899 on 2 procs for 10 steps with 256 atoms

-Performance: 3747912.946 tau/day, 8675.724 timesteps/s
-94.5% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 1305535.501 tau/day, 3022.073 timesteps/s
+75.7% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00074885 | 0.00075021 | 0.00075156 |   0.0 | 65.09
+Pair    | 0.0013522  | 0.0013813  | 0.0014104  |   0.1 | 41.74
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 0.00031829 | 0.00031943 | 0.00032056 |   0.0 | 27.71
-Output  | 9.306e-06  | 2.6673e-05 | 4.4041e-05 |   0.0 |  2.31
-Modify  | 2.0684e-05 | 2.0891e-05 | 2.1098e-05 |   0.0 |  1.81
-Other   |            | 3.544e-05  |            |       |  3.07
+Comm    | 0.00049139 | 0.00054241 | 0.00059342 |   0.0 | 16.39
+Output  | 3.6986e-05 | 0.00056588 | 0.0010948  |   0.0 | 17.10
+Modify  | 4.3909e-05 | 4.3924e-05 | 4.3939e-05 |   0.0 |  1.33
+Other   |            | 0.0007755  |            |       | 23.44

-Nlocal:    128 ave 128 max 128 min
+Nlocal:        128.000 ave         128 max         128 min
 Histogram: 2 0 0 0 0 0 0 0 0 0
-Nghost:    1109 ave 1109 max 1109 min
+Nghost:        1109.00 ave        1109 max        1109 min
 Histogram: 2 0 0 0 0 0 0 0 0 0
-Neighs:    4992 ave 4992 max 4992 min
+Neighs:        4992.00 ave        4992 max        4992 min
 Histogram: 2 0 0 0 0 0 0 0 0 0

 Total # of neighbors = 9984
-Ave neighs/atom = 39
+Ave neighs/atom = 39.000000
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
@ -58,30 +61,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      10    1.1298532   -6.3095502            0   -4.6213906   -2.6058175 
      20    0.6239063    -5.557644            0   -4.6254403   0.97451173 
-Loop time of 0.00120443 on 2 procs for 10 steps with 256 atoms
+Loop time of 0.00648485 on 2 procs for 10 steps with 256 atoms

-Performance: 3586761.860 tau/day, 8302.689 timesteps/s
-95.5% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 666168.017 tau/day, 1542.056 timesteps/s
+44.3% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00087798 | 0.00091359 | 0.0009492  |   0.0 | 75.85
+Pair    | 0.0022373  | 0.0024405  | 0.0026437  |   0.4 | 37.63
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 0.00016739 | 0.00020368 | 0.00023997 |   0.0 | 16.91
-Output  | 1.0124e-05 | 3.0513e-05 | 5.0901e-05 |   0.0 |  2.53
-Modify  | 1.89e-05   | 1.9812e-05 | 2.0725e-05 |   0.0 |  1.64
-Other   |            | 3.683e-05  |            |       |  3.06
+Comm    | 0.0024446  | 0.0026464  | 0.0028481  |   0.4 | 40.81
+Output  | 3.9069e-05 | 0.00059734 | 0.0011556  |   0.0 |  9.21
+Modify  | 4.869e-05  | 4.912e-05  | 4.9551e-05 |   0.0 |  0.76
+Other   |            | 0.0007515  |            |       | 11.59

-Nlocal:    128 ave 134 max 122 min
+Nlocal:        128.000 ave         134 max         122 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1109 ave 1115 max 1103 min
+Nghost:        1109.00 ave        1115 max        1103 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4976 ave 5205 max 4747 min
+Neighs:        4976.00 ave        5205 max        4747 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9952
-Ave neighs/atom = 38.875
+Ave neighs/atom = 38.875000
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
@ -92,34 +95,34 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      20    0.6239063   -5.5404291            0   -4.6082254    1.0394285 
      21   0.63845863   -5.5628733            0   -4.6089263   0.99398278 
-Loop time of 0.000206062 on 2 procs for 1 steps with 256 atoms
+Loop time of 0.00128072 on 2 procs for 1 steps with 256 atoms

-Performance: 2096456.406 tau/day, 4852.908 timesteps/s
-94.1% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 337310.921 tau/day, 780.812 timesteps/s
+60.2% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00012947 | 0.00013524 | 0.00014101 |   0.0 | 65.63
+Pair    | 0.00047351 | 0.00049064 | 0.00050777 |   0.0 | 38.31
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 1.858e-05  | 2.4113e-05 | 2.9647e-05 |   0.0 | 11.70
-Output  | 8.699e-06  | 2.4204e-05 | 3.9708e-05 |   0.0 | 11.75
-Modify  | 2.34e-06   | 2.3705e-06 | 2.401e-06  |   0.0 |  1.15
-Other   |            | 2.013e-05  |            |       |  9.77
+Comm    | 7.6767e-05 | 9.3655e-05 | 0.00011054 |   0.0 |  7.31
+Output  | 5.4217e-05 | 0.00026297 | 0.00047172 |   0.0 | 20.53
+Modify  | 1.1554e-05 | 1.2026e-05 | 1.2498e-05 |   0.0 |  0.94
+Other   |            | 0.0004214  |            |       | 32.91

-Nlocal:    128 ave 135 max 121 min
+Nlocal:        128.000 ave         135 max         121 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1109 ave 1116 max 1102 min
+Nghost:        1109.00 ave        1116 max        1102 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4852.5 ave 5106 max 4599 min
+Neighs:        4852.50 ave        5106 max        4599 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9705
-Ave neighs/atom = 37.9102
-Force on 1 atom via extract_atom: -18.109
-Force on 1 atom via extract_variable: -18.109
+Ave neighs/atom = 37.910156
 Neighbor list builds = 0
 Dangerous builds not checked
+Force on 1 atom via extract_atom: -18.109
+Force on 1 atom via extract_variable: -18.109
 Force on 1 atom via extract_atom: 26.9581
 Force on 1 atom via extract_variable: 26.9581
 Setting up Verlet run ...
@ -130,30 +133,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      21   0.63845863   -5.5628733            0   -4.6089263   0.99398278 
      31    0.7494946   -5.7306417            0   -4.6107913   0.41043597 
-Loop time of 0.00119048 on 2 procs for 10 steps with 256 atoms
+Loop time of 0.00784933 on 2 procs for 10 steps with 256 atoms

-Performance: 3628802.105 tau/day, 8400.005 timesteps/s
-98.0% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 550365.761 tau/day, 1273.995 timesteps/s
+59.6% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00085276 | 0.00089699 | 0.00094123 |   0.0 | 75.35
+Pair    | 0.0019235  | 0.0033403  | 0.0047572  |   2.5 | 42.56
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 0.00016896 | 0.00021444 | 0.00025992 |   0.0 | 18.01
-Output  | 9.413e-06  | 2.5939e-05 | 4.2465e-05 |   0.0 |  2.18
-Modify  | 1.8977e-05 | 2.0009e-05 | 2.1042e-05 |   0.0 |  1.68
-Other   |            | 3.31e-05   |            |       |  2.78
+Comm    | 0.0016948  | 0.003118   | 0.0045411  |   2.5 | 39.72
+Output  | 3.6445e-05 | 0.00064636 | 0.0012563  |   0.0 |  8.23
+Modify  | 6.2842e-05 | 6.3209e-05 | 6.3577e-05 |   0.0 |  0.81
+Other   |            | 0.0006814  |            |       |  8.68

-Nlocal:    128 ave 135 max 121 min
+Nlocal:        128.000 ave         135 max         121 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1109 ave 1116 max 1102 min
+Nghost:        1109.00 ave        1116 max        1102 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4844 ave 5096 max 4592 min
+Neighs:        4844.00 ave        5096 max        4592 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9688
-Ave neighs/atom = 37.8438
+Ave neighs/atom = 37.843750
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
@ -164,30 +167,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      31    0.7494946   -5.7306417            0   -4.6107913   0.41043597 
      51   0.71349216   -5.6772387            0   -4.6111811   0.52117681 
-Loop time of 0.00252603 on 2 procs for 20 steps with 256 atoms
+Loop time of 0.00696051 on 2 procs for 20 steps with 256 atoms

-Performance: 3420382.192 tau/day, 7917.551 timesteps/s
-99.2% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 1241287.730 tau/day, 2873.351 timesteps/s
+79.2% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0016245  | 0.0017014  | 0.0017784  |   0.2 | 67.36
-Neigh   | 0.00025359 | 0.0002563  | 0.00025901 |   0.0 | 10.15
-Comm    | 0.00036863 | 0.00045124 | 0.00053385 |   0.0 | 17.86
-Output  | 9.839e-06  | 2.8031e-05 | 4.6223e-05 |   0.0 |  1.11
-Modify  | 3.7027e-05 | 3.9545e-05 | 4.2063e-05 |   0.0 |  1.57
-Other   |            | 4.948e-05  |            |       |  1.96
+Pair    | 0.0028267  | 0.0036088  | 0.004391   |   1.3 | 51.85
+Neigh   | 0.00040272 | 0.00040989 | 0.00041707 |   0.0 |  5.89
+Comm    | 0.00081061 | 0.0015825  | 0.0023544  |   1.9 | 22.74
+Output  | 3.6006e-05 | 0.00062106 | 0.0012061  |   0.0 |  8.92
+Modify  | 6.8937e-05 | 7.1149e-05 | 7.336e-05  |   0.0 |  1.02
+Other   |            | 0.0006671  |            |       |  9.58

-Nlocal:    128 ave 132 max 124 min
+Nlocal:        128.000 ave         132 max         124 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1100 ave 1101 max 1099 min
+Nghost:        1100.00 ave        1101 max        1099 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4850 ave 4953 max 4747 min
+Neighs:        4850.00 ave        4953 max        4747 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9700
-Ave neighs/atom = 37.8906
+Ave neighs/atom = 37.890625
 Neighbor list builds = 1
 Dangerous builds not checked
 Setting up Verlet run ...
@ -198,30 +201,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      51   0.71349216   -5.6772387            0   -4.6111811   0.52117681 
      61   0.78045421   -5.7781094            0   -4.6120011  0.093808941 
-Loop time of 0.00115444 on 2 procs for 10 steps with 256 atoms
+Loop time of 0.00155862 on 2 procs for 10 steps with 256 atoms

-Performance: 3742065.976 tau/day, 8662.190 timesteps/s
-96.5% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 2771678.197 tau/day, 6415.922 timesteps/s
+95.0% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.00087346 | 0.00089311 | 0.00091275 |   0.0 | 77.36
+Pair    | 0.0012369  | 0.001266   | 0.001295   |   0.1 | 81.22
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 0.00016192 | 0.0001823  | 0.00020269 |   0.0 | 15.79
-Output  | 9.49e-06   | 2.6234e-05 | 4.2978e-05 |   0.0 |  2.27
-Modify  | 1.9095e-05 | 1.9843e-05 | 2.0591e-05 |   0.0 |  1.72
-Other   |            | 3.296e-05  |            |       |  2.85
+Comm    | 0.00019462 | 0.00022315 | 0.00025169 |   0.0 | 14.32
+Output  | 2.0217e-05 | 2.1945e-05 | 2.3673e-05 |   0.0 |  1.41
+Modify  | 2.562e-05  | 2.5759e-05 | 2.5898e-05 |   0.0 |  1.65
+Other   |            | 2.181e-05  |            |       |  1.40

-Nlocal:    128 ave 132 max 124 min
+Nlocal:        128.000 ave         132 max         124 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1100 ave 1101 max 1099 min
+Nghost:        1100.00 ave        1101 max        1099 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4850 ave 4953 max 4747 min
+Neighs:        4850.00 ave        4953 max        4747 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9700
-Ave neighs/atom = 37.8906
+Ave neighs/atom = 37.890625
 Neighbor list builds = 0
 Dangerous builds not checked
 Setting up Verlet run ...
@ -232,30 +235,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      61   0.78045421   -5.7781094            0   -4.6120011  0.093808941 
      81   0.77743907   -5.7735004            0   -4.6118971  0.090822641 
-Loop time of 0.00244325 on 2 procs for 20 steps with 256 atoms
+Loop time of 0.00351607 on 2 procs for 20 steps with 256 atoms

-Performance: 3536279.919 tau/day, 8185.833 timesteps/s
-99.0% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 2457288.612 tau/day, 5688.168 timesteps/s
+97.9% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0016916  | 0.0017038  | 0.001716   |   0.0 | 69.73
-Neigh   | 0.00025229 | 0.00025512 | 0.00025795 |   0.0 | 10.44
-Comm    | 0.00035772 | 0.00036918 | 0.00038064 |   0.0 | 15.11
-Output  | 1.0858e-05 | 2.7875e-05 | 4.4891e-05 |   0.0 |  1.14
-Modify  | 3.817e-05  | 3.9325e-05 | 4.048e-05  |   0.0 |  1.61
-Other   |            | 4.796e-05  |            |       |  1.96
+Pair    | 0.0023896  | 0.0024147  | 0.0024397  |   0.1 | 68.67
+Neigh   | 0.00037331 | 0.00040456 | 0.0004358  |   0.0 | 11.51
+Comm    | 0.00050571 | 0.00051343 | 0.00052116 |   0.0 | 14.60
+Output  | 2.6424e-05 | 5.6547e-05 | 8.667e-05  |   0.0 |  1.61
+Modify  | 5.0287e-05 | 5.1071e-05 | 5.1856e-05 |   0.0 |  1.45
+Other   |            | 7.58e-05   |            |       |  2.16

-Nlocal:    128 ave 128 max 128 min
+Nlocal:        128.000 ave         128 max         128 min
 Histogram: 2 0 0 0 0 0 0 0 0 0
-Nghost:    1088.5 ave 1092 max 1085 min
+Nghost:        1088.50 ave        1092 max        1085 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4850.5 ave 4851 max 4850 min
+Neighs:        4850.50 ave        4851 max        4850 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9701
-Ave neighs/atom = 37.8945
+Ave neighs/atom = 37.894531
 Neighbor list builds = 1
 Dangerous builds not checked
 Deleted 256 atoms, new total = 0
@ -267,30 +270,30 @@ Per MPI rank memory allocation (min/avg/max) = 2.624 | 2.624 | 2.624 Mbytes
 Step Temp E_pair E_mol TotEng Press 
      81    0.6239063   -5.5404291            0   -4.6082254    1.0394285 
      91   0.75393007   -5.7375259            0   -4.6110484   0.39357367 
-Loop time of 0.00118092 on 2 procs for 10 steps with 256 atoms
+Loop time of 0.0109747 on 2 procs for 10 steps with 256 atoms

-Performance: 3658158.625 tau/day, 8467.960 timesteps/s
-98.6% CPU use with 2 MPI tasks x no OpenMP threads
+Performance: 393631.731 tau/day, 911.185 timesteps/s
+53.5% CPU use with 2 MPI tasks x 1 OpenMP threads

 MPI task timing breakdown:
 Section |  min time  |  avg time  |  max time  |%varavg| %total
 ---------------------------------------------------------------
-Pair    | 0.0008476  | 0.00089265 | 0.00093771 |   0.0 | 75.59
+Pair    | 0.0012057  | 0.0012732  | 0.0013407  |   0.2 | 11.60
 Neigh   | 0          | 0          | 0          |   0.0 |  0.00
-Comm    | 0.00016335 | 0.00020946 | 0.00025557 |   0.0 | 17.74
-Output  | 8.87e-06   | 2.5733e-05 | 4.2595e-05 |   0.0 |  2.18
-Modify  | 1.8755e-05 | 1.9814e-05 | 2.0872e-05 |   0.0 |  1.68
-Other   |            | 3.326e-05  |            |       |  2.82
+Comm    | 0.00018882 | 0.00025686 | 0.00032489 |   0.0 |  2.34
+Output  | 2.1943e-05 | 0.0047067  | 0.0093915  |   6.8 | 42.89
+Modify  | 2.4614e-05 | 2.5439e-05 | 2.6264e-05 |   0.0 |  0.23
+Other   |            | 0.004712   |            |       | 42.94

-Nlocal:    128 ave 135 max 121 min
+Nlocal:        128.000 ave         135 max         121 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Nghost:    1109 ave 1116 max 1102 min
+Nghost:        1109.00 ave        1116 max        1102 min
 Histogram: 1 0 0 0 0 0 0 0 0 1
-Neighs:    4852.5 ave 5106 max 4599 min
+Neighs:        4852.50 ave        5106 max        4599 min
 Histogram: 1 0 0 0 0 0 0 0 0 1

 Total # of neighbors = 9705
-Ave neighs/atom = 37.9102
+Ave neighs/atom = 37.910156
 Neighbor list builds = 0
 Dangerous builds not checked
 Total wall time: 0:00:00
--- a/examples/COUPLE/plugin/simple.c
+++ b/examples/COUPLE/plugin/simple.c
@ -87,7 +87,7 @@ int main(int narg, char **arg)
      MPI_Abort(MPI_COMM_WORLD,1);
    }
  }
-  if (lammps == 1) plugin->open(0,NULL,comm_lammps,&lmp);
+  if (lammps == 1) lmp = plugin->open(0,NULL,comm_lammps,NULL);

  while (1) {
    if (me == 0) {
@ -139,7 +139,7 @@ int main(int narg, char **arg)

  cmds[0] = (char *)"run 10";
  cmds[1] = (char *)"run 20";
-  if (lammps == 1) plugin->commands_list(lmp,2,cmds);
+  if (lammps == 1) plugin->commands_list(lmp,2,(const char **)cmds);

  /* delete all atoms
     create_atoms() to create new ones with old coords, vels
@ -164,12 +164,13 @@ int main(int narg, char **arg)

  if (lammps == 1) {
    plugin->close(lmp);
+    MPI_Barrier(comm_lammps);
+    MPI_Comm_free(&comm_lammps);
    liblammpsplugin_release(plugin);
  }

  /* close down MPI */

-  if (lammps == 1) MPI_Comm_free(&comm_lammps);
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
 }
--- a/examples/PACKAGES/charge_regulation/in.chreg-polymer
+++ b/examples/PACKAGES/charge_regulation/in.chreg-polymer
@ -8,7 +8,7 @@ bond_style      harmonic
 bond_coeff      1 100 1.122462 # K R0
 velocity        all create 1.0 8008 loop geom

-pair_style      lj/cut/coul/long 1.122462 20
+pair_style      lj/cut/coul/long/soft 2 0.5 10.0  1.122462 20
 pair_coeff      * *  1.0 1.0 1.122462 # charges
 kspace_style    pppm 1.0e-3
 pair_modify     shift yes
--- a/lib/colvars/colvarmodule.cpp
+++ b/lib/colvars/colvarmodule.cpp
@ -1476,7 +1476,9 @@ int colvarmodule::write_output_files()
       bi != biases.end();
       bi++) {
    // Only write output files if they have not already been written this time step
-    if ((*bi)->output_freq == 0 || (cvm::step_absolute() % (*bi)->output_freq) != 0) {
+    if ((*bi)->output_freq == 0    ||
+        cvm::step_relative() == 0  ||
+        (cvm::step_absolute() % (*bi)->output_freq) != 0) {
      error_code |= (*bi)->write_output_files();
    }
    error_code |= (*bi)->write_state_to_replicas();
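
Review note on the hunk above: the new condition adds a third case in which a bias writes its output files, namely when `cvm::step_relative()` is zero. The sketch below isolates that predicate so the cases can be checked by hand. It is a minimal illustration, not Colvars code: the function name `should_write_now` and its plain integer parameters are hypothetical stand-ins for `output_freq`, `cvm::step_relative()` and `cvm::step_absolute()`.

```cpp
// Hypothetical stand-alone version of the condition in the hunk above.
// output_freq   : how often (in steps) periodic output is written; 0 disables it
// step_relative : steps completed in the current run
// step_absolute : steps completed overall
bool should_write_now(long output_freq, long step_relative, long step_absolute) {
  // Short-circuit evaluation guarantees the modulo is never reached
  // when output_freq is zero.
  return output_freq == 0 ||
         step_relative == 0 ||
         (step_absolute % output_freq) != 0;
}
```

With `output_freq = 10`, a call at absolute step 100 after 20 steps of the current run returns false, while the same absolute step with `step_relative = 0` now returns true; that second case is the only behavior this hunk changes.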
--- a/lib/colvars/colvars_version.h
+++ b/lib/colvars/colvars_version.h
@ -1,3 +1,3 @@
 #ifndef COLVARS_VERSION
-#define COLVARS_VERSION "2021-08-06"
+#define COLVARS_VERSION "2021-09-21"
 #endif
--- a/lib/gpu/geryon/ocl_device.h
+++ b/lib/gpu/geryon/ocl_device.h
@ -462,7 +462,6 @@ int UCL_Device::set_platform(int pid) {
  _num_devices = 0;
  for (int i=0; i<num_unpart; i++) {
    cl_uint num_subdevices = 1;
-    cl_device_id *subdevice_list = device_list + i;

    #ifdef CL_VERSION_1_2
    cl_device_affinity_domain adomain;
@ -479,19 +478,21 @@ int UCL_Device::set_platform(int pid) {
      CL_SAFE_CALL(clCreateSubDevices(device_list[i], props, 0, NULL,
                                      &num_subdevices));
    if (num_subdevices > 1) {
-      subdevice_list = new cl_device_id[num_subdevices];
+      cl_device_id *subdevice_list = new cl_device_id[num_subdevices];
      CL_SAFE_CALL(clCreateSubDevices(device_list[i], props, num_subdevices,
                                      subdevice_list, &num_subdevices));
+      for (int j=0; j<num_subdevices; j++) {
+        _cl_devices.push_back(device_list[i]);
+        add_properties(device_list[i]);
+        _num_devices++;
+      }
+      delete[] subdevice_list;
+    } else {
+      _cl_devices.push_back(device_list[i]);
+      add_properties(device_list[i]);
+      _num_devices++;
    }
    #endif
-
-    for (int j=0; j<num_subdevices; j++) {
-      _num_devices++;
-      _cl_devices.push_back(subdevice_list[j]);
-      add_properties(subdevice_list[j]);
-    }
-
-    if (num_subdevices > 1) delete[] subdevice_list;
  } // for i
  #endif

@ -555,16 +556,22 @@ void UCL_Device::add_properties(cl_device_id device_list) {
                               sizeof(float_width),&float_width,nullptr));
  op.preferred_vector_width32=float_width;

-  // Determine if double precision is supported
  cl_uint double_width;
  CL_SAFE_CALL(clGetDeviceInfo(device_list,
                               CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
                               sizeof(double_width),&double_width,nullptr));
  op.preferred_vector_width64=double_width;
-  if (double_width==0)
-    op.double_precision=false;
-  else
+
+  // Determine if double precision is supported: All bits in the mask must be set.
+  cl_device_fp_config double_mask = (CL_FP_FMA|CL_FP_ROUND_TO_NEAREST|CL_FP_ROUND_TO_ZERO|
+                                     CL_FP_ROUND_TO_INF|CL_FP_INF_NAN|CL_FP_DENORM);
+  cl_device_fp_config double_avail;
+  CL_SAFE_CALL(clGetDeviceInfo(device_list,CL_DEVICE_DOUBLE_FP_CONFIG,
+                               sizeof(double_avail),&double_avail,nullptr));
+  if ((double_avail & double_mask) == double_mask)
    op.double_precision=true;
+  else
+    op.double_precision=false;

  CL_SAFE_CALL(clGetDeviceInfo(device_list,
                               CL_DEVICE_PROFILING_TIMER_RESOLUTION,
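
Review note on the `add_properties()` hunk above: double precision support is now decided by an "all required bits set" test on the value returned for `CL_DEVICE_DOUBLE_FP_CONFIG`, instead of by a non-zero preferred vector width. The sketch below shows only the bitmask idiom `(avail & mask) == mask`. It deliberately avoids the OpenCL headers; the type `FpConfig` and the `k...` constants are placeholders chosen for illustration, not the real `CL_FP_*` definitions.

```cpp
#include <cstdint>

// Placeholder bit flags standing in for the CL_FP_* capability bits.
using FpConfig = std::uint64_t;
constexpr FpConfig kDenorm         = FpConfig{1} << 0;
constexpr FpConfig kInfNan         = FpConfig{1} << 1;
constexpr FpConfig kRoundToNearest = FpConfig{1} << 2;
constexpr FpConfig kRoundToZero    = FpConfig{1} << 3;
constexpr FpConfig kRoundToInf     = FpConfig{1} << 4;
constexpr FpConfig kFma            = FpConfig{1} << 5;

// Every capability that must be present for "full" double support.
constexpr FpConfig kRequiredDoubleBits =
    kFma | kRoundToNearest | kRoundToZero | kRoundToInf | kInfNan | kDenorm;

// True only if all required bits are set; extra bits in `avail` are ignored.
constexpr bool has_full_double_support(FpConfig avail) {
  return (avail & kRequiredDoubleBits) == kRequiredDoubleBits;
}

// A device reporting no capabilities at all must be rejected.
static_assert(!has_full_double_support(0), "empty config is not supported");
```

Comparing against the mask rather than testing `(avail & mask) != 0` matters here: the latter would accept a device that implements only one of the required capabilities.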
--- a/lib/gpu/geryon/ocl_timer.h
+++ b/lib/gpu/geryon/ocl_timer.h
@ -38,8 +38,10 @@ namespace ucl_opencl {
 /// Class for timing OpenCL events
 class UCL_Timer {
 public:
-  inline UCL_Timer() : _total_time(0.0f), _initialized(false), has_measured_time(false) { }
-  inline UCL_Timer(UCL_Device &dev) : _total_time(0.0f), _initialized(false), has_measured_time(false)
+  inline UCL_Timer() : start_event(nullptr), stop_event(nullptr), _total_time(0.0f),
+                       _initialized(false), has_measured_time(false) { }
+  inline UCL_Timer(UCL_Device &dev) : start_event(nullptr), stop_event(nullptr), _total_time(0.0f),
+                                      _initialized(false), has_measured_time(false)
    { init(dev); }

  inline ~UCL_Timer() { clear(); }
--- a/lib/gpu/lal_answer.cpp
+++ b/lib/gpu/lal_answer.cpp
@ -23,7 +23,7 @@ namespace LAMMPS_AL {

 template <class numtyp, class acctyp>
 AnswerT::Answer() : _allocated(false),_eflag(false),_vflag(false),
-                            _inum(0),_ilist(nullptr),_newton(false) {
+                    _inum(0),_ilist(nullptr),_newton(false) {
 }

 template <class numtyp, class acctyp>
--- a/lib/gpu/lal_answer.h
+++ b/lib/gpu/lal_answer.h
@ -127,9 +127,8 @@ class Answer {
  /// Add forces and torques from the GPU into a LAMMPS pointer
  void get_answers(double **f, double **tor);

-  inline double get_answers(double **f, double **tor, double *eatom,
-                            double **vatom, double *virial, double &ecoul,
-                            int &error_flag_in) {
+  inline double get_answers(double **f, double **tor, double *eatom, double **vatom,
+                            double *virial, double &ecoul, int &error_flag_in) {
    double ta=MPI_Wtime();
    time_answer.sync_stop();
    _time_cpu_idle+=MPI_Wtime()-ta;
--- a/lib/gpu/lal_born_coul_long.cpp
+++ b/lib/gpu/lal_born_coul_long.cpp
@ -34,7 +34,7 @@ BornCoulLongT::BornCoulLong() : BaseCharge<numtyp,acctyp>(),
 }

 template <class numtyp, class acctyp>
-BornCoulLongT::~BornCoulLongT() {
+BornCoulLongT::~BornCoulLong() {
  clear();
 }

--- a/lib/gpu/lal_born_coul_wolf.cpp
+++ b/lib/gpu/lal_born_coul_wolf.cpp
@ -34,7 +34,7 @@ BornCoulWolfT::BornCoulWolf() : BaseCharge<numtyp,acctyp>(),
 }

 template <class numtyp, class acctyp>
-BornCoulWolfT::~BornCoulWolfT() {
+BornCoulWolfT::~BornCoulWolf() {
  clear();
 }

--- a/lib/gpu/lal_buck_coul_long.cpp
+++ b/lib/gpu/lal_buck_coul_long.cpp
@ -34,7 +34,7 @@ BuckCoulLongT::BuckCoulLong() : BaseCharge<numtyp,acctyp>(),
 }

 template <class numtyp, class acctyp>
-BuckCoulLongT::~BuckCoulLongT() {
+BuckCoulLongT::~BuckCoulLong() {
  clear();
 }

--- a/lib/gpu/lal_device.cpp
+++ b/lib/gpu/lal_device.cpp
@ -333,6 +333,12 @@ int DeviceT::init_device(MPI_Comm world, MPI_Comm replica, const int ngpu,
    gpu_barrier();
  }

+  // check if double precision support is available
+  #if defined(_SINGLE_DOUBLE) || defined(_DOUBLE_DOUBLE)
+  if (!gpu->double_precision())
+    return -16;
+  #endif
+
  // Setup auto bin size calculation for calls from atom::sort
  // - This is repeated in neighbor init with additional info
  if (_user_cell_size<0.0) {
@ -348,7 +354,7 @@ int DeviceT::init_device(MPI_Comm world, MPI_Comm replica, const int ngpu,
 }

 template <class numtyp, class acctyp>
-int DeviceT::set_ocl_params(std::string s_config, std::string extra_args) {
+int DeviceT::set_ocl_params(std::string s_config, const std::string &extra_args) {
  #ifdef USE_OPENCL

  #include "lal_pre_ocl_config.h"
@ -368,7 +374,7 @@ int DeviceT::set_ocl_params(std::string s_config, std::string extra_args) {
  int token_count=0;
  std::string params[18];
  char ocl_config[2048];
-  strcpy(ocl_config,s_config.c_str());
+  strncpy(ocl_config,s_config.c_str(),2047);
  char *pch = strtok(ocl_config,",");
  _ocl_config_name=pch;
  pch = strtok(nullptr,",");
@ -546,14 +552,9 @@ int DeviceT::init_nbor(Neighbor *nbor, const int nlocal,
    return -3;

  if (_user_cell_size<0.0) {
-    #ifndef LAL_USE_OLD_NEIGHBOR
-    _neighbor_shared.setup_auto_cell_size(true,cutoff,nbor->simd_size());
-    #else
    _neighbor_shared.setup_auto_cell_size(false,cutoff,nbor->simd_size());
-    #endif
  } else
-    _neighbor_shared.setup_auto_cell_size(false,_user_cell_size,
-                                          nbor->simd_size());
+    _neighbor_shared.setup_auto_cell_size(false,_user_cell_size,nbor->simd_size());
  nbor->set_cutoff(cutoff);

  return 0;
@ -984,18 +985,16 @@ int DeviceT::compile_kernels() {
  _max_bio_shared_types=gpu_lib_data[17];
  _pppm_max_spline=gpu_lib_data[18];

-  if (static_cast<size_t>(_block_pair)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_block_bio_pair)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_block_ellipse)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_pppm_block)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_block_nbor_build)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_block_cell_2d)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_block_cell_2d)>gpu->group_size_dim(1) ||
-      static_cast<size_t>(_block_cell_id)>gpu->group_size_dim(0) ||
-      static_cast<size_t>(_max_shared_types*_max_shared_types*
-                          sizeof(numtyp)*17 > gpu->slm_size()) ||
-      static_cast<size_t>(_max_bio_shared_types*2*sizeof(numtyp) >
-                          gpu->slm_size()))
+  if (static_cast<size_t>(_block_pair) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_block_bio_pair) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_block_ellipse) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_pppm_block) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_block_nbor_build) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_block_cell_2d) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_block_cell_2d) > gpu->group_size_dim(1) ||
+      static_cast<size_t>(_block_cell_id) > gpu->group_size_dim(0) ||
+      static_cast<size_t>(_max_shared_types*_max_shared_types*sizeof(numtyp)*17 > gpu->slm_size()) ||
+      static_cast<size_t>(_max_bio_shared_types*2*sizeof(numtyp) > gpu->slm_size()))
    return -13;

  if (_block_pair % _simd_size != 0 || _block_bio_pair % _simd_size != 0 ||
@ -1071,9 +1070,8 @@ void lmp_clear_device() {
  global_device.clear_device();
 }

-double lmp_gpu_forces(double **f, double **tor, double *eatom,
-                      double **vatom, double *virial, double &ecoul,
-                      int &error_flag) {
+double lmp_gpu_forces(double **f, double **tor, double *eatom, double **vatom,
+                      double *virial, double &ecoul, int &error_flag) {
  return global_device.fix_gpu(f,tor,eatom,vatom,virial,ecoul,error_flag);
 }

--- a/lib/gpu/lal_device.h
+++ b/lib/gpu/lal_device.h
@ -163,17 +163,15 @@ class Device {
    { ans_queue.push(ans); }

  /// Add "answers" (force,energies,etc.) into LAMMPS structures
-  inline double fix_gpu(double **f, double **tor, double *eatom,
-                        double **vatom, double *virial, double &ecoul,
-                        int &error_flag) {
+  inline double fix_gpu(double **f, double **tor, double *eatom, double **vatom,
+                        double *virial, double &ecoul, int &error_flag) {
    error_flag=0;
    atom.data_unavail();
    if (ans_queue.empty()==false) {
      stop_host_timer();
      double evdw=0.0;
      while (ans_queue.empty()==false) {
-        evdw+=ans_queue.front()->get_answers(f,tor,eatom,vatom,virial,ecoul,
-                                             error_flag);
+        evdw += ans_queue.front()->get_answers(f,tor,eatom,vatom,virial,ecoul,error_flag);
        ans_queue.pop();
      }
      return evdw;
@ -350,7 +348,7 @@ class Device {
  int _data_in_estimate, _data_out_estimate;

  std::string _ocl_config_name, _ocl_config_string, _ocl_compile_string;
-  int set_ocl_params(std::string, std::string);
+  int set_ocl_params(std::string, const std::string &);
 };

 }
--- a/lib/gpu/lal_neighbor.cpp
+++ b/lib/gpu/lal_neighbor.cpp
@ -39,7 +39,7 @@ bool Neighbor::init(NeighborShared *shared, const int inum,
                    const int block_cell_2d, const int block_cell_id,
                    const int block_nbor_build, const int threads_per_atom,
                    const int simd_size, const bool time_device,
-                    const std::string compile_flags, const bool ilist_map) {
+                    const std::string &compile_flags, const bool ilist_map) {
  clear();
  _ilist_map = ilist_map;

@ -743,7 +743,7 @@ void Neighbor::build_nbor_list(double **x, const int inum, const int host_inum,
    mn = _max_nbors;
    const numtyp i_cell_size=static_cast<numtyp>(1.0/_cell_size);
    const int neigh_block=_block_cell_id;
-    const int GX=(int)ceil((float)nall/neigh_block);
+    const int GX=(int)ceil((double)nall/neigh_block);
    const numtyp sublo0=static_cast<numtyp>(sublo[0]);
    const numtyp sublo1=static_cast<numtyp>(sublo[1]);
    const numtyp sublo2=static_cast<numtyp>(sublo[2]);
--- a/lib/gpu/lal_neighbor.h
+++ b/lib/gpu/lal_neighbor.h
@ -71,7 +71,7 @@ class Neighbor {
            const int block_cell_2d, const int block_cell_id,
            const int block_nbor_build, const int threads_per_atom,
            const int simd_size, const bool time_device,
-            const std::string compile_flags, const bool ilist_map);
+            const std::string &compile_flags, const bool ilist_map);

  /// Set the cutoff+skin
  inline void set_cutoff(const double cutoff) {
--- a/lib/gpu/lal_neighbor_shared.cpp
+++ b/lib/gpu/lal_neighbor_shared.cpp
@ -89,7 +89,7 @@ double NeighborShared::best_cell_size(const double subx, const double suby,
 }

 void NeighborShared::compile_kernels(UCL_Device &dev, const int gpu_nbor,
-                                     const std::string flags) {
+                                     const std::string &flags) {
  if (_compiled)
          return;

--- a/lib/gpu/lal_neighbor_shared.h
+++ b/lib/gpu/lal_neighbor_shared.h
@ -87,7 +87,7 @@ class NeighborShared {

  /// Compile kernels for neighbor lists
  void compile_kernels(UCL_Device &dev, const int gpu_nbor,
-                       const std::string flags);
+                       const std::string &flags);

  // ----------------------------- Kernels
  UCL_Program *nbor_program, *build_program;
--- a/lib/gpu/lal_pppm.cpp
+++ b/lib/gpu/lal_pppm.cpp
@ -54,7 +54,7 @@ int PPPMT::bytes_per_atom() const {
 }

 template <class numtyp, class acctyp, class grdtyp, class grdtyp4>
-grdtyp * PPPMT::init(const int nlocal, const int nall, FILE *_screen,
+grdtyp *PPPMT::init(const int nlocal, const int nall, FILE *_screen,
                              const int order, const int nxlo_out,
                              const int nylo_out, const int nzlo_out,
                              const int nxhi_out, const int nyhi_out,
@ -69,14 +69,14 @@ grdtyp * PPPMT::init(const int nlocal, const int nall, FILE *_screen,

  flag=device->init(*ans,nlocal,nall);
  if (flag!=0)
-    return 0;
+    return nullptr;
  if (sizeof(grdtyp)==sizeof(double) && device->double_precision()==false) {
    flag=-15;
-    return 0;
+    return nullptr;
  }
  if (device->ptx_arch()>0.0 && device->ptx_arch()<1.1) {
    flag=-4;
-    return 0;
+    return nullptr;
  }

  ucl_device=device->gpu;
@ -168,7 +168,7 @@ grdtyp * PPPMT::init(const int nlocal, const int nall, FILE *_screen,
                                       UCL_READ_WRITE)==UCL_SUCCESS);
  if (!success) {
    flag=-3;
-    return 0;
+    return nullptr;
  }

  error_flag.device.zero();
@ -342,13 +342,15 @@ void PPPMT::interp(const grdtyp qqrd2e_scale) {
  vd_brick.update_device(true);
  time_in.stop();

+  int ainum=this->ans->inum();
+  if (ainum==0)
+    return;
+
  time_interp.start();
  // Compute the block size and grid size to keep all cores busy
  int BX=this->block_size();
  int GX=static_cast<int>(ceil(static_cast<double>(this->ans->inum())/BX));

-  int ainum=this->ans->inum();
-
  k_interp.set_size(GX,BX);
  k_interp.run(&atom->x, &atom->q, &ainum, &vd_brick, &d_rho_coeff,
               &_npts_x, &_npts_yx, &_brick_x, &_brick_y, &_brick_z, &_delxinv,
--- a/lib/pace/Makefile
+++ b/lib/pace/Makefile
@ -2,8 +2,8 @@ SHELL = /bin/sh

 # ------ FILES ------

-SRC_FILES = $(wildcard src/ML-PACE/*.cpp)
-SRC = $(filter-out src/ML-PACE/pair_pace.cpp, $(SRC_FILES))
+SRC_FILES = $(wildcard src/USER-PACE/*.cpp)
+SRC = $(filter-out src/USER-PACE/pair_pace.cpp, $(SRC_FILES))

 # ------ DEFINITIONS ------

@ -12,7 +12,7 @@ OBJ =   $(SRC:.cpp=.o)


 # ------ SETTINGS ------
-CXXFLAGS = -O3 -fPIC -Isrc/ML-PACE
+CXXFLAGS = -O3 -fPIC -Isrc/USER-PACE

 ARCHIVE =	ar
 ARCHFLAG =	-rc