Merge pull request #3131 from akohlmey/lammps-cxx-style

More general LAMMPS code design info for the Programmer guide section of the manual
2022-02-14 18:13:32 -05:00
parent 702a2dd3f6 baf443766a
commit 618b3ec94f
6 changed files with 605 additions and 143 deletions
--- a/doc/src/Developer.rst
+++ b/doc/src/Developer.rst
@ -11,6 +11,7 @@ of time and requests from the LAMMPS user community.
   :maxdepth: 1

   Developer_org
+   Developer_cxx_vs_c_style
   Developer_parallel
   Developer_flow
   Developer_write
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@ -0,0 +1,384 @@
+Code design
+-----------
+
+This section discusses some of the code design choices in LAMMPS and
+overall strategy in order to assist developers to write new code that
+will fit well with the remaining code.  Please see the section on
+:doc:`Requirements for contributed code <Modify_style>` for more
+specific recommendations and guidelines.  While that section is
+organized more in the form of a checklist for code contributors, the
+focus here is on overall code design strategy, choices made between
+possible alternatives, and a discussion of some relevant C++ programming
+language constructs.
+
+Historically, the basic design philosophy of the LAMMPS C++ code was
+that of a "C with classes" style.  This was motivated by the desire to
+make it easier to modify LAMMPS for people without significant training
+in C++ programming and by trying to use data structures and code constructs
+that somewhat resemble the previous implementation(s) in Fortran.
+A contributing factor for this choice also was that at the time the
+implementation of C++ compilers was not always very mature and some of
+the advanced features contained bugs or were not functioning exactly
+as the standard required; plus there was some disagreement between
+compiler vendors about how to interpret the C++ standard documents.
+
+However, C++ compilers have advanced a lot since then and with the
+transition to requiring the C++11 standard in 2020 as the minimum C++ language
+standard for LAMMPS, the decision was made to also replace some of the
+C-style constructs with equivalent C++ functionality, either from the
+C++ standard library or as custom classes or functions, in order to
+improve readability of the code and to increase code reuse through
+abstraction of commonly used functionality.
+
+.. note::
+
+   Please note that as of spring 2022 there is still a sizable chunk of
+   legacy code in LAMMPS that has not yet been refactored to reflect these
+   style conventions in full.  LAMMPS has a large code base and many
+   different contributors and there also is a hierarchy of precedence
+   in which the code is adapted.  The code in the ``src`` folder has the
+   highest priority, followed by code in packages in order of their popularity
+   and complexity (simpler code is adapted sooner), followed by code
+   in the ``lib`` folder.  Source code that is downloaded during compilation
+   is not subject to the conventions discussed here.
+
+Object oriented code
+^^^^^^^^^^^^^^^^^^^^
+
+LAMMPS is designed to be an object oriented code, that is each
+simulation is represented by an instance of the LAMMPS class.  When
+running in parallel, of course, each MPI process will create such an
+instance.  This can be seen in the ``main.cpp`` file where the core
+steps of running a LAMMPS simulation are the following 3 lines of code:
+
+.. code-block:: C++
+
+    LAMMPS *lammps = new LAMMPS(argc, argv, lammps_comm);
+    lammps->input->file();
+    delete lammps;
+
+The first line creates a LAMMPS class instance and passes the command
+line arguments and the global communicator to its constructor.  The
+second line tells the LAMMPS instance to process the input (either from
+standard input or the provided input file) until the end.  And the third
+line deletes that instance again.  The remainder of the ``main.cpp`` file
+is for error handling, MPI configuration and other special features.
+
+In the constructor of the LAMMPS class instance the basic LAMMPS class hierarchy
+is created as shown in :ref:`class-topology`.  While processing the input further
+class instances are created, or deleted, or replaced and specific member functions
+of specific classes are called to trigger actions like creating atoms, computing
+forces, computing properties, propagating the system, or writing output.
+
+Compositing and Inheritance
+===========================
+
+LAMMPS makes extensive use of the object oriented programming (OOP)
+principles of *compositing* and *inheritance*. Classes like the
+``LAMMPS`` class are a **composite** containing pointers to instances of
+other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
+``Modify``, and so on.  Each of these classes implements certain
+functionality by storing and manipulating data related to the simulation
+and providing member functions that trigger certain actions.  Some of
+those classes like ``Force`` are a composite again containing instances
+of classes describing the force interactions or ``Modify`` containing
+and calling fixes and computes.  In most cases (e.g. ``AtomVec``, ``Comm``,
+``Pair``, or ``Bond``) there is only one instance of those member classes
+allowed, but in a few cases (e.g. ``Region``, ``Fix``, ``Compute``, or
+``Dump``) there can be multiple instances and the parent class is
+maintaining a list of the pointers of instantiated classes instead
+of a single pointer.
+
+Changing behavior or adjusting how LAMMPS handles a simulation is
+implemented via **inheritance** where different variants of the
+functionality are realized by creating *derived* classes that can share
+common functionality in their base class and provide a consistent
+interface where the derived classes replace (dummy or pure) functions in
+the base class.  The higher level classes can then call those methods of
+the instantiated classes without having to know which specific derived
+class variant was instantiated.  In the LAMMPS documentation those
+derived classes are usually referred to as "styles", e.g. pair styles,
+fix styles, atom styles and so on.
+
+This is the origin of the flexibility of LAMMPS.  For example, it makes it
+possible to compute forces for very different non-bonded potential
+functions by having different pair styles (implemented as different
+classes derived from the ``Pair`` class) where the evaluation of the
+potential function is confined to the implementation of the individual
+classes.  Whenever a new :doc:`pair_style` or :doc:`bond_style` or
+:doc:`comm_style` or similar command is processed in the LAMMPS input
+any existing class instance is deleted and a new instance created in
+its place.
+
+Classes derived from ``Fix`` or ``Compute`` represent a different facet
+of LAMMPS' flexibility as there can be multiple instances of them and
+their member functions will be called at different phases of the time
+integration process (as explained in :doc:`Developer_flow`).  This way
+multiple manipulations of the entire or parts of the system can be
+programmed (with fix styles) or different computations can be performed
+and accessed and further processed or output through a common interface
+(with compute styles).
+
+Further code sharing is possible by creating derived classes from the
+derived classes (for instance to implement an accelerated version of a
+pair style) where then only a subset of the methods are replaced with
+the accelerated versions.
+
+Polymorphism
+============
+
+Polymorphism and dynamic dispatch are further OOP features that play an
+important part in how LAMMPS selects which code to execute.  In a nutshell,
+this is a mechanism where the decision of which member function to call
+from a class is determined at runtime and not when the code is compiled.
+To enable it, the function has to be declared as ``virtual`` and all
+corresponding functions in derived classes should use the ``override``
+specifier. Below is a brief example.
+
+.. code-block:: c++
+
+   class Base {
+   public:
+    virtual ~Base() = default;
+    void call();
+    void normal();
+    virtual void poly();
+   };
+
+   void Base::call() {
+    normal();
+    poly();
+   }
+
+   class Derived : public Base {
+   public:
+    ~Derived() override = default;
+    void normal();
+    void poly() override;
+   };
+
+   // [....]
+
+   Base *base1 = new Base();
+   Base *base2 = new Derived();
+
+   base1->call();
+   base2->call();
+
+The difference in behavior of the ``normal()`` and the ``poly()`` member
+functions is in which of the two member functions is called when
+executing ``base1->call()`` and ``base2->call()``.  Without polymorphism, a
+function within the base class will call only member functions within
+the same scope, that is ``Base::call()`` will always call
+``Base::normal()``.  But for ``base2->call()`` the call to the
+virtual member function will be dispatched to ``Derived::poly()``
+instead.  This mechanism ensures that the functions called are always those
+within the scope of the class type that was used to create the class
+instance, even if the instance is assigned to a pointer using the type of a
+base class. This is the desired behavior, and thanks to dynamic dispatch,
+LAMMPS can even use styles that are loaded at runtime from a shared object
+file with the :doc:`plugin command <plugin>`.
+
+A special case of virtual functions are so-called pure virtual functions,
+or pure functions for short. These are virtual functions that are
+initialized to 0 in the class declaration (see example below).
+
+.. code-block:: c++
+
+   class Base {
+   public:
+    virtual void pure() = 0;
+   };
+
+This has the effect that it will no longer be possible to create an
+instance of the base class and that derived classes **must** implement
+these functions.  Many of the functions listed with the various class
+styles in the section :doc:`Modify` are such pure functions.  The
+motivation for this is to define the interface or API of the functions
+but defer the implementation to the derived classes.
+
+However, there are downsides to this. For example, calls to virtual
+functions from within a constructor will not be dispatched to the
+derived class, since the derived part of the object does not exist yet
+at that point.  Thus it is good practice to either avoid calling them
+there or to provide an explicit scope like in ``Base::poly()``.
+Furthermore, any destructors in classes containing virtual functions
+should be declared virtual, too, so that deleting an instance through a
+pointer to the base class also runs the destructors of the derived classes.
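+
+The constructor pitfall can be illustrated with a minimal sketch.  The
+class names and the ``created_by`` member are hypothetical and chosen
+for illustration only; they are not part of LAMMPS.
+

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only; not part of LAMMPS.
class Base {
 public:
  // The virtual call below runs while only the Base part of the object
  // exists, so it always resolves to Base::name().
  Base() { created_by = name(); }
  virtual ~Base() = default;
  virtual std::string name() const { return "Base"; }

  std::string created_by;    // records which name() the constructor reached
};

class Derived : public Base {
 public:
  std::string name() const override { return "Derived"; }
};

// Small self-check that can be called from any test driver.
inline void demo_constructor_dispatch()
{
  Derived d;
  assert(d.created_by == "Base");    // no dispatch to Derived in the constructor
  assert(d.name() == "Derived");     // normal dynamic dispatch afterwards
}
```

+
+After construction has finished, the same call is dispatched to the
+derived class as usual, which is why moving such calls into a separate
+setup function is a common workaround.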
+
+.. admonition:: Important Notes
+
+   In order to be able to detect incompatibilities and to avoid unexpected
+   behavior already at compile time, it is crucial that all member functions
+   that are intended to replace a virtual or pure function use the ``override``
+   keyword.  For the same reason, overloads and default arguments should be
+   avoided for virtual functions as they lead to confusion over
+   which function is supposed to override which and which arguments need to be
+   declared.
+
+Style Factories
+===============
+
+In order to create class instances of the different styles, LAMMPS often
+uses a programming pattern called *Factory*.  Factories are functions that create
+an instance of a specific derived class, say ``PairLJCut``, and return a pointer
+to the type of the common base class of that style, ``Pair`` in this case.
+To associate the factory function with the style keyword, an ``std::map``
+class is used in which function pointers are indexed by their keyword
+(for example "lj/cut" for ``PairLJCut`` and "morse" for ``PairMorse``).
+A couple of typedefs help to keep the code readable and a template function
+is used to implement the actual factory functions for the individual classes.
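+
+The following is a simplified sketch of this pattern and not the actual
+LAMMPS implementation.  The classes are reduced stand-ins, and the names
+``PairCreator``, ``PairCreatorMap``, ``pair_creator()``, and ``new_pair()``
+are assumptions made for this example; the creator functions here also take
+no arguments, which is a further simplification.
+

```cpp
#include <cassert>
#include <map>
#include <string>

// Reduced stand-ins for a style base class and two derived styles.
class Pair {
 public:
  virtual ~Pair() = default;
  virtual std::string style() const = 0;
};

class PairLJCut : public Pair {
 public:
  std::string style() const override { return "lj/cut"; }
};

class PairMorse : public Pair {
 public:
  std::string style() const override { return "morse"; }
};

// typedefs keep the declaration of the map readable
typedef Pair *(*PairCreator)();
typedef std::map<std::string, PairCreator> PairCreatorMap;

// one template provides the factory function for every derived class
template <typename T> static Pair *pair_creator()
{
  return new T();
}

// associate each style keyword with its factory function
inline PairCreatorMap make_pair_map()
{
  PairCreatorMap creators;
  creators["lj/cut"] = &pair_creator<PairLJCut>;
  creators["morse"] = &pair_creator<PairMorse>;
  return creators;
}

// return a new instance for a known keyword, nullptr otherwise
inline Pair *new_pair(const std::string &keyword)
{
  static const PairCreatorMap creators = make_pair_map();
  auto it = creators.find(keyword);
  return (it != creators.end()) ? it->second() : nullptr;
}
```

+
+The caller only ever sees a ``Pair`` pointer, so adding a style in this
+sketch means adding one class and one map entry, with no changes to the
+code that uses the returned instance.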
+
+I/O and output formatting
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+C-style stdio versus C++ style iostreams
+========================================
+
+LAMMPS chooses to use the "stdio" library of the standard C library for
+reading from and writing to files and console instead of C++
+"iostreams".  This is mainly motivated by the better performance, better
+control over formatting, and less effort to achieve specific formatting.
+
+Since mixing "stdio" and "iostreams" can lead to unexpected behavior, using
+the latter is strongly discouraged.  Also, output to the screen should not
+use the predefined ``stdout`` FILE pointer, but rather the ``screen`` and
+``logfile`` FILE pointers managed by the LAMMPS class.  Furthermore, output
+should only be done by MPI rank 0 (``comm->me == 0``) and output that is
+sent to both ``screen`` and ``logfile`` should use the
+:cpp:func:`utils::logmesg() convenience function <LAMMPS_NS::utils::logmesg>`.
+
+We also discourage the use of stringstreams as the bundled {fmt} library
+and the customized tokenizer classes can provide the same functionality
+in a cleaner way with better performance. This will also help to retain
+a consistent programming style despite the many different contributors.
+
+Formatting with the {fmt} library
+===================================
+
+The LAMMPS source code includes a copy of the `{fmt} library
+<https://fmt.dev>`_ which is preferred over formatting with the
+"printf()" family of functions.  The primary reason is that it allows a
+typesafe default format for any type of supported data.  This is
+particularly useful for formatting integers of a given size (32-bit or
+64-bit) which may require different format strings depending on compile
+time settings or compilers/operating systems.  Furthermore, {fmt} gives
+better performance, has more functionality, a familiar formatting syntax
+that has similarities to ``format()`` in Python, and provides a facility
+that can be used to integrate format strings and a variable number of
+arguments into custom functions in a much simpler way than the varargs
+mechanism of the C library.  Finally, a subset of {fmt} has been included in
+the C++20 language standard as ``std::format``, so changes to adopt it are
+future proof.
+
+Formatted strings are frequently created by calling the
+``fmt::format()`` function which will return a string as ``std::string``
+class instance.  In contrast to the ``%`` placeholder in ``printf()``,
+the {fmt} library uses ``{}`` to embed format descriptors.  In the
+simplest case, no additional characters are needed as {fmt} will choose
+the default format based on the data type of the argument. Alternatively,
+the ``fmt::print()`` function may be used instead of ``printf()`` or
+``fprintf()``.  In addition, several LAMMPS output functions that
+originally accepted a single string as argument have been overloaded to
+accept a format string with optional arguments as well (e.g.
+``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
+
+Summary of the {fmt} format syntax
+==================================
+
+The syntax of the format string is "{[<argument id>][:<format spec>]}",
+where both the argument id and the format spec (introduced by a colon
+':') are optional.  The argument id is usually a number starting from 0
+that is the index to the arguments following the format string.  By
+default these are assigned in order (i.e. 0, 1, 2, 3, 4 etc.).  The most
+common case for using argument id would be to use the same argument in
+multiple places in the format string without having to provide it as an
+argument multiple times. In LAMMPS the argument id is rarely used.
+
+More common is the use of the format specifier, which starts with a
+colon.  This may optionally be followed by a fill character (default is
+' '). If provided, the fill character **must** be followed by an
+alignment character ('<', '^', '>' for left, centered, or right
+alignment; without one, numbers are right aligned and most other types
+are left aligned). The alignment character may be used without a fill
+character. The next important format parameter would be the minimum
+width, which may be followed by a dot '.'  and a precision for floating
+point numbers. The final character in the format string would be an
+indicator for the "presentation", i.e. 'd' for decimal presentation of
+integers, 'x' for hexadecimal, 'o' for octal, 'c' for character
+etc. This mostly follows the "printf()" scheme but without requiring an
+additional length parameter to distinguish between different integer
+widths. The {fmt} library will detect those and adapt the formatting
+accordingly.  For floating point numbers there are, correspondingly, 'g'
+for generic presentation, 'e' for exponential presentation, and 'f' for
+fixed point presentation.
+
+Thus "{:8}" would represent *any* type argument using at least 8
+characters; "{:<8}" would do this as left aligned, "{:^8}" as centered,
+"{:>8}" as right aligned.  If a specific presentation is selected, the
+argument type must be compatible or else the {fmt} formatting code will
+throw an exception. Some format string examples are given below:
+
+.. code-block:: C++
+
+   auto mesg = fmt::format("  CPU time: {:4d}:{:02d}:{:02d}\n", cpuh, cpum, cpus);
+   mesg = fmt::format("{:<8s}| {:<10.5g} | {:<10.5g} | {:<10.5g} |{:6.1f} |{:6.2f}\n",
+                      label, time_min, time, time_max, time_sq, tmp);
+   utils::logmesg(lmp,"{:>6} = max # of 1-2 neighbors\n",maxall);
+   utils::logmesg(lmp,"Lattice spacing in x,y,z = {:.8} {:.8} {:.8}\n",
+                  xlattice,ylattice,zlattice);
+
+which will create the following output lines:
+
+.. parsed-literal::
+
+     CPU time:    0:02:16
+     Pair    | 2.0133     | 2.0133     | 2.0133     |   0.0 | 84.21
+          4 = max # of 1-2 neighbors
+     Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
+
+A special feature of the {fmt} library is that format parameters like
+the width or the precision may also be provided as arguments. In that
+case a nested format is used where a pair of curly braces (with an
+optional argument id) "{}" are used instead of the value, for example
+"{:{}d}" will consume two integer arguments, the first will be the value
+shown and the second the minimum width.
+
+For more details and examples, please consult the `{fmt} syntax
+documentation <https://fmt.dev/latest/syntax.html>`_ website.
+
+
+Memory management
+^^^^^^^^^^^^^^^^^
+
+Dynamic allocation of data and objects should be done with either the
+C++ commands "new" and "delete/delete[]" or using member functions of
+the ``Memory`` class, most commonly, ``Memory::create()``,
+``Memory::grow()``, and ``Memory::destroy()``.  The use of ``malloc()``,
+``calloc()``, ``realloc()`` and ``free()`` directly is strongly
+discouraged.  To simplify adapting legacy code into the LAMMPS code base
+the member functions ``Memory::smalloc()``, ``Memory::srealloc()``, and
+``Memory::sfree()`` are available.
+
+Using those custom memory allocation functions is motivated by the
+following considerations:
+
+- memory allocation failures on *any* MPI rank during a parallel run
+  will trigger an immediate abort of the entire parallel calculation
+  instead of stalling it
+- a failing "new" will trigger an exception which is also captured by
+  LAMMPS and triggers a global abort
+- allocation of multi-dimensional arrays will be done in a C compatible
+  fashion but so that the actual data is stored in one
+  large consecutive block and thus when MPI communication is needed,
+  only this storage needs to be communicated (similar to Fortran arrays)
+- the "destroy()" and "sfree()" functions may safely be called on NULL
+  pointers
+- the "destroy()" functions will nullify the pointer variables making
+  "use after free" errors easy to detect
+- it is possible to use a larger than default memory alignment (not on
+  all operating systems, since the allocated storage pointers must be
+  compatible with ``free()`` for technical reasons)
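+
+The idea behind the contiguous multi-dimensional arrays and the
+pointer-nullifying "destroy()" can be sketched as follows.  The helpers
+``create2d()`` and ``destroy2d()`` are hypothetical names used only for
+this illustration; they are not the ``Memory`` class API, and the sketch
+assumes both dimensions are larger than zero.
+

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers sketching the idea; not the Memory class API.
// Assumes n1 > 0 and n2 > 0.
inline double **create2d(int n1, int n2)
{
  // one consecutive block holds all n1*n2 values
  double *data = new double[static_cast<std::size_t>(n1) * n2];
  // a separate array of row pointers allows the usual array[i][j] syntax
  double **array = new double *[n1];
  for (int i = 0; i < n1; ++i) array[i] = data + static_cast<std::size_t>(i) * n2;
  return array;
}

// Safe to call on a null pointer; resets the pointer after freeing.
inline void destroy2d(double **&array)
{
  if (array == nullptr) return;
  delete[] array[0];    // the data block starts at the first row
  delete[] array;       // the row pointers
  array = nullptr;
}

// Small self-check that can be called from any test driver.
inline void demo_contiguous_2d()
{
  double **a = create2d(3, 4);
  a[2][3] = 7.0;
  assert(&a[1][0] == &a[0][0] + 4);    // rows follow each other in memory
  destroy2d(a);
  assert(a == nullptr);
  destroy2d(a);                        // a second call is harmless
}
```

+
+With this layout, ``&array[0][0]`` together with the total element count
+describes the entire array, so a single buffer is sufficient when the
+data has to be communicated.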
+
+In the practical implementation of code this means that any pointer variables
+that are class members should be initialized to a ``nullptr`` value in their
+respective constructors.  That way it would be safe to call ``Memory::destroy()``
+or ``delete[]`` on them before *any* allocation outside the constructor.
+This helps to prevent memory leaks.
--- a/doc/src/Developer_notes.rst
+++ b/doc/src/Developer_notes.rst
@ -7,6 +7,61 @@ typically document what a variable stores, what a small section of
 code does, or what a function does and its input/outputs.  The topics
 on this page are intended to document code functionality at a higher level.

+Reading and parsing of text and text files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It is frequently required for a class in LAMMPS to read in additional
+data from a file, most commonly potential parameters from a potential
+file for manybody potentials.  LAMMPS provides several custom classes
+and convenience functions to simplify the process.  This offers the
+following benefits:
+
+- better code reuse and fewer lines of code needed to implement reading
+  and parsing data from a file
+- better detection of format errors, incompatible data, and better error messages
+- exit with an error message instead of silently converting only part of the
+  text to a number or returning a 0 on unrecognized text and thus reading incorrect values
+- re-entrant code through avoiding global static variables (as used by ``strtok()``)
+- transparent support for translating unsupported UTF-8 characters to their ASCII equivalents
+  (the text to value conversion functions **only** accept ASCII characters)
+
+In most cases (e.g. potential files) the same data is needed on all MPI
+ranks.  Then it is best to do the reading and parsing only on MPI rank
+0, and communicate the data later with one or more ``MPI_Bcast()``
+calls.  For reading generic text and potential parameter files the
+custom classes :cpp:class:`TextFileReader <LAMMPS_NS::TextFileReader>`
+and :cpp:class:`PotentialFileReader <LAMMPS_NS::PotentialFileReader>`
+are available. Those classes allow reading the file as individual lines
+for which they can return a tokenizer class (see below) for parsing the
+line, or they can return blocks of numbers as a vector directly.  The
+documentation on :ref:`File reader classes <file-reader-classes>` contains
+an example for a typical case.
+
+When reading per-atom data, the data in the file usually needs to include
+an atom ID so it can be associated with a particular atom.  In that case
+the data can be read in multi-line chunks and broadcast to all MPI ranks
+with :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>`.  Those chunks are then
+split into lines, parsed, and applied only to atoms the MPI rank
+"owns".
+
+For splitting a string (incrementally) into words and optionally
+converting those to numbers, the :cpp:class:`Tokenizer
+<LAMMPS_NS::Tokenizer>` and :cpp:class:`ValueTokenizer
+<LAMMPS_NS::ValueTokenizer>` can be used.  Those provide a superset of
+the functionality of ``strtok()`` from the C-library and the latter also
+includes conversion to different types.  Any errors while processing the
+string in those classes will result in an exception, which can be caught
+and the error processed as needed.  Unlike the C-library functions
+``atoi()``, ``atof()``, ``strtol()``, or ``strtod()`` the conversion
+will check if the converted text is a valid integer or floating point
+number and will not silently return an unexpected or incorrect value.
+For example, ``atoi()`` will return 12 when converting "12.5" while the
+ValueTokenizer class will throw an :cpp:class:`InvalidIntegerException
+<LAMMPS_NS::InvalidIntegerException>` if
+:cpp:func:`ValueTokenizer::next_int()
+<LAMMPS_NS::ValueTokenizer::next_int>` is called on the same string.
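+
+The difference between a partial and a strict conversion can be
+illustrated with the C++ standard library alone.  The ``strict_int()``
+helper below is a hypothetical stand-in written for this example; it
+does not show how ``ValueTokenizer`` is implemented and throws a
+standard exception instead of the LAMMPS exception classes.
+

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: accept the text only if all of it is an integer.
inline int strict_int(const std::string &text)
{
  std::size_t pos = 0;
  // std::stoi() throws if no conversion is possible or the value is out of range
  int value = std::stoi(text, &pos);
  // reject leftover characters such as the ".5" in "12.5"
  if (pos != text.size()) throw std::invalid_argument("Not a valid integer: " + text);
  return value;
}

// Small self-check that can be called from any test driver.
inline void demo_strict_conversion()
{
  assert(std::atoi("12.5") == 12);    // silent partial conversion
  assert(strict_int("42") == 42);

  bool rejected = false;
  try {
    strict_int("12.5");
  } catch (const std::invalid_argument &) {
    rejected = true;
  }
  assert(rejected);                   // the strict variant reports the problem
}
```

+
+Reporting the problem through an exception lets the calling code decide
+whether to abort with an error message or to recover.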
+
 Fix contributions to instantaneous energy, virial, and cumulative energy
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

--- a/doc/src/Developer_utils.rst
+++ b/doc/src/Developer_utils.rst
@ -21,18 +21,21 @@ In that case, the functions will stop with an error message, indicating
 the name of the problematic file, if possible unless the *error* argument
 is a NULL pointer.

-The :cpp:func:`fgets_trunc` function will work similar for ``fgets()``
-but it will read in a whole line (i.e. until the end of line or end
-of file), but store only as many characters as will fit into the buffer
-including a final newline character and the terminating NULL byte.
-If the line in the file is longer it will thus be truncated in the buffer.
-This function is used by :cpp:func:`read_lines_from_file` to read individual
-lines but make certain they follow the size constraints.
+The :cpp:func:`utils::fgets_trunc() <LAMMPS_NS::utils::fgets_trunc>`
+function works similarly to ``fgets()``: it will read in a whole
+line (i.e. until the end of line or end of file), but store only as many
+characters as will fit into the buffer including a final newline
+character and the terminating NULL byte.  If the line in the file is
+longer it will thus be truncated in the buffer.  This function is used
+by :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>` to read individual lines but
+make certain they follow the size constraints.

-The :cpp:func:`read_lines_from_file` function will read the requested
-number of lines of a maximum length into a buffer and will return 0
-if successful or 1 if not. It also guarantees that all lines are
-terminated with a newline character and the entire buffer with a
+The :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>` function will read the
+requested number of lines of a maximum length into a buffer and will
+return 0 if successful or 1 if not. It also guarantees that all lines
+are terminated with a newline character and the entire buffer with a
 NULL character.

 ----------
@ -62,7 +65,7 @@ silently returning the result of a partial conversion or zero in cases
 where the string is not a valid number.  This behavior improves
 detecting typos or issues when processing input files.

-Similarly the :cpp:func:`logical() <LAMMPS_NS::utils::logical>` function
+Similarly the :cpp:func:`utils::logical() <LAMMPS_NS::utils::logical>` function
 will convert a string into a boolean and will only accept certain words.

 The *do_abort* flag should be set to ``true`` in case  this function
@ -70,8 +73,8 @@ is called only on a single MPI rank, as that will then trigger the
 a call to ``Error::one()`` for errors instead of ``Error::all()``
 and avoids a "hanging" calculation when run in parallel.

-Please also see :cpp:func:`is_integer() <LAMMPS_NS::utils::is_integer>`
-and :cpp:func:`is_double() <LAMMPS_NS::utils::is_double>` for testing
+Please also see :cpp:func:`utils::is_integer() <LAMMPS_NS::utils::is_integer>`
+and :cpp:func:`utils::is_double() <LAMMPS_NS::utils::is_double>` for testing
 strings for compliance without conversion.

 ----------
@ -340,11 +343,11 @@ This code example should produce the following output:

 .. doxygenclass:: LAMMPS_NS::InvalidIntegerException
   :project: progguide
-   :members: what
+   :members:

 .. doxygenclass:: LAMMPS_NS::InvalidFloatException
   :project: progguide
-   :members: what
+   :members:

 ----------

@ -393,21 +396,26 @@ A typical code segment would look like this:

 ----------

+.. _file-reader-classes:
+
 File reader classes
 -------------------

 The purpose of the file reader classes is to simplify the recurring task
 of reading and parsing files. They can use the
-:cpp:class:`LAMMPS_NS::ValueTokenizer` class to process the read in
-text.  The :cpp:class:`LAMMPS_NS::TextFileReader` is a more general
-version while :cpp:class:`LAMMPS_NS::PotentialFileReader` is specialized
-to implement the behavior expected for looking up and reading/parsing
-files with potential parameters in LAMMPS.  The potential file reader
-class requires a LAMMPS instance, requires to be run on MPI rank 0 only,
-will use the :cpp:func:`LAMMPS_NS::utils::get_potential_file_path`
-function to look up and open the file, and will call the
-:cpp:class:`LAMMPS_NS::Error` class in case of failures to read or to
-convert numbers, so that LAMMPS will be aborted.
+:cpp:class:`ValueTokenizer <LAMMPS_NS::ValueTokenizer>` class to process
+the read in text.  The :cpp:class:`TextFileReader
+<LAMMPS_NS::TextFileReader>` is a more general version while
+:cpp:class:`PotentialFileReader <LAMMPS_NS::PotentialFileReader>` is
+specialized to implement the behavior expected for looking up and
+reading/parsing files with potential parameters in LAMMPS.  The
+potential file reader class requires a LAMMPS instance, requires to be
+run on MPI rank 0 only, will use the
+:cpp:func:`utils::get_potential_file_path
+<LAMMPS_NS::utils::get_potential_file_path>` function to look up and
+open the file, and will call the :cpp:class:`LAMMPS_NS::Error` class in
+case of failures to read or to convert numbers, so that LAMMPS will be
+aborted.

 .. code-block:: C++
   :caption: Use of PotentialFileReader class in pair style coul/streitz
@ -482,10 +490,10 @@ provided, as that is used to determine whether a new page of memory
 must be used.

 The :cpp:class:`MyPage <LAMMPS_NS::MyPage>` class offers two ways to
-reserve a chunk: 1) with :cpp:func:`get() <LAMMPS_NS::MyPage::get>` the
-chunk size needs to be known in advance, 2) with :cpp:func:`vget()
+reserve a chunk: 1) with :cpp:func:`MyPage::get() <LAMMPS_NS::MyPage::get>` the
+chunk size needs to be known in advance, 2) with :cpp:func:`MyPage::vget()
 <LAMMPS_NS::MyPage::vget>` a pointer to the next chunk is returned, but
-its size is registered later with :cpp:func:`vgot()
+its size is registered later with :cpp:func:`MyPage::vgot()
 <LAMMPS_NS::MyPage::vgot>`.

 .. code-block:: C++
@ -588,4 +596,3 @@ the communication buffers.

 .. doxygenunion:: LAMMPS_NS::ubuf
   :project: progguide
-
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@ -52,8 +52,8 @@ aij
 aimd
 airebo
 Aj
-ajs
 ajaramil
+ajs
 akohlmey
 Aktulga
 al
@ -119,10 +119,10 @@ Appl
 Apu
 arallel
 arccos
-arge
 Archlinux
 arcsin
 arg
+arge
 args
 argv
 arrhenius
@ -149,9 +149,9 @@ atc
 AtC
 ATC
 athermal
+athomps
 atime
 atimestep
-athomps
 atm
 atomeye
 atomfile
@ -196,7 +196,6 @@ Bagi
 Bagnold
 Baig
 Bajaj
-Bkappa
 Bal
 balancer
 Balankura
@ -215,8 +214,8 @@ barostatting
 Barostatting
 Barrat
 Barros
-Bartelt
 Bartels
+Bartelt
 barycenter
 barye
 Bashford
@ -258,7 +257,6 @@ bigint
 Bij
 bilayer
 bilayers
-biquadratic
 binsize
 binstyle
 binutils
@ -267,6 +265,7 @@ biomolecule
 Biomolecules
 Biophys
 Biosym
+biquadratic
 bisectioning
 bispectrum
 Bispectrum
@ -277,6 +276,7 @@ bitrate
 bitrates
 Bitzek
 Bjerrum
+Bkappa
 Blaise
 blanchedalmond
 blocksize
@ -315,14 +315,14 @@ Botu
 Bouguet
 Bourne
 boxcolor
-boxlo
 boxhi
-boxxlo
+boxlo
 boxxhi
-boxylo
+boxxlo
 boxyhi
-boxzlo
+boxylo
 boxzhi
+boxzlo
 bp
 bpclermont
 bpls
@ -422,13 +422,14 @@ Chaudhuri
 checkbox
 checkmark
 checkqeq
+checksum
 chemistries
 Chemnitz
 Cheng
 Chenoweth
 chiral
-ChiralIDs
 chiralIDs
+ChiralIDs
 chirality
 Cho
 ChooseOffset
@ -499,12 +500,12 @@ cond
 conda
 Conda
 Condens
-Connor
 conf
 config
 configfile
 configurational
 conformational
+Connor
 ConstMatrix
 Contrib
 cooperativity
@ -561,14 +562,14 @@ cstring
 cstyle
 csvr
 ctrl
-Ctypes
 ctypes
+Ctypes
 cuda
 Cuda
 CUDA
+cuFFT
 CuH
 Cui
-cuFFT
 Cummins
 Curk
 Cusentino
@ -630,11 +631,11 @@ de
 dE
 De
 deallocated
-decorrelation
 debye
 Debye
 Decius
 decompositions
+decorrelation
 decrementing
 deeppink
 deepskyblue
@ -643,8 +644,8 @@ defn
 deformable
 del
 delaystep
-DeleteIDs
 deleteIDs
+DeleteIDs
 delflag
 Dellago
 delocalization
@ -704,8 +705,8 @@ Dihedrals
 dihydride
 Dij
 dimdim
-dimensioned
 dimensionality
+dimensioned
 dimgray
 dipolar
 dir
@ -730,10 +731,10 @@ dmg
 dmi
 dnf
 DNi
-Dobson
 Dobnikar
-Dodds
+Dobson
 docenv
+Dodds
 dodgerblue
 dof
 doi
@ -741,10 +742,10 @@ Donadio
 Donev
 dotc
 Doty
+downarrow
 doxygen
 doxygenclass
 doxygenfunction
-downarrow
 Doye
 Doyl
 dpd
@ -813,8 +814,8 @@ eco
 ecoul
 ecp
 Ecut
-EdgeIDs
 edgeIDs
+EdgeIDs
 edihed
 edim
 edip
@ -826,8 +827,8 @@ ee
 Eebt
 ees
 eFF
-efield
 effm
+efield
 eflag
 eflux
 eg
@ -836,10 +837,10 @@ ehex
 eHEX
 Ei
 eigen
+eigendecomposition
 eigensolve
 eigensolver
 eigensolvers
-eigendecomposition
 eigenvalue
 eigenvalues
 eigenvector
@ -914,15 +915,15 @@ equilibrated
 equilibrates
 equilibrating
 equilibration
-Equilibria
 equilibria
+Equilibria
 equilization
 equipartitioning
-Ercolessi
-Erdmann
 eradius
 erate
 erc
+Ercolessi
+Erdmann
 erf
 erfc
 Erhart
@ -932,10 +933,10 @@ erotate
 errno
 Ertas
 ervel
-Espanol
-Eshelby
 eshelby
+Eshelby
 eskm
+Espanol
 esph
 estretch
 esu
@ -950,20 +951,20 @@ etol
 etot
 etotal
 etube
-Eulerian
 eulerian
+Eulerian
 eulerimplicit
 Europhys
 ev
 eV
+eval
+evals
 evalue
 Evanseck
 evdwl
-evector
 evec
 evecs
-eval
-evals
+evector
 Everaers
 Evgeny
 evirials
@ -992,13 +993,13 @@ fbMC
 Fc
 fcc
 fcm
-Fd
 fd
+Fd
 fdotr
 fdt
+fe
 Fehlberg
 Fellinger
-fe
 femtosecond
 femtoseconds
 fene
@ -1033,11 +1034,14 @@ filename
 Filename
 filenames
 Filenames
-Fily
 fileper
 filesystem
+filesystems
+Fily
 Fincham
 Finchham
+fingerprintconstants
+fingerprintsperelement
 Finnis
 Fiorin
 fixID
@ -1076,8 +1080,8 @@ forestgreen
 formatarg
 formulae
 Forschungszentrum
-Fortran
 fortran
+Fortran
 Fosado
 fourier
 fp
@ -1099,6 +1103,7 @@ fstyle
 fsw
 ftm
 ftol
+fuer
 fugacity
 Fumi
 func
@ -1106,7 +1111,6 @@ funcs
 functionalities
 functionals
 funroll
-fuer
 fx
 fy
 fz
@ -1145,8 +1149,8 @@ Germann
 Germano
 gerolf
 Gerolf
-getrusage
 Gershgorin
+getrusage
 getter
 gettimeofday
 gewald
@ -1223,8 +1227,8 @@ gsmooth
 gstyle
 GTL
 Gubbins
-Guericke
 Guenole
+Guericke
 gui
 Gumbsch
 Gunsteren
@ -1300,7 +1304,6 @@ histogrammed
 histogramming
 hma
 hmaktulga
-hplanck
 hoc
 Hochbruck
 Hofling
@ -1317,6 +1320,7 @@ howto
 Howto
 Hoy
 Hoyt
+hplanck
 Hs
 hstyle
 html
@ -1347,8 +1351,8 @@ hyperspherical
 hysteretic
 hz
 IAP
-Ibanez
 iatom
+Ibanez
 ibar
 ibm
 icc
@ -1439,8 +1443,8 @@ ipi
 ipp
 Ippolito
 IPv
-IPython
 ipython
+IPython
 Isele
 isenthalpic
 ish
@ -1456,8 +1460,8 @@ isotropically
 isovolume
 Isralewitz
 iter
-iters
 iteratively
+iters
 Ith
 Itsets
 itype
@ -1600,8 +1604,8 @@ KMP
 kmu
 Knizhnik
 knl
-Kofke
 kofke
+Kofke
 Kohlmeyer
 Kohn
 kokkos
@ -1653,15 +1657,15 @@ Lackmann
 Ladd
 lagrangian
 lambdai
-lamda
 LambdaLanczos
+lamda
 lammps
 Lammps
 LAMMPS
 lammpsplot
 lammpsplugin
-Lampis
 Lamoureux
+Lampis
 Lanczos
 Lande
 Landron
@ -1674,8 +1678,8 @@ larentzos
 Larentzos
 Laroche
 lars
-LATBOLTZ
 latboltz
+LATBOLTZ
 latencies
 Latour
 latourr
@ -1830,13 +1834,13 @@ Lyulin
 lz
 lzma
 Maaravi
-MACHDYN
 machdyn
+MACHDYN
 Mackay
 Mackrodt
+MacOS
 Macromolecules
 macroparticle
-MacOS
 Madura
 Magda
 Magdeburg
@ -1920,8 +1924,8 @@ mc
 McLachlan
 md
 mdf
-MDI
 mdi
+MDI
 mdpd
 mDPD
 meam
@ -1945,8 +1949,8 @@ Mei
 Melchor
 Meloni
 Melrose
-Mem
 mem
+Mem
 memalign
 MEMALIGN
 membered
@ -1960,10 +1964,10 @@ Merz
 meshless
 meso
 mesocnt
-MESODPD
 mesodpd
-MESONT
+MESODPD
 mesont
+MESONT
 mesoparticle
 mesoscale
 mesoscopic
@ -1998,8 +2002,8 @@ Militzer
 Minary
 mincap
 Mindlin
-minhbonds
 mingw
+minhbonds
 minima
 minimizations
 minimizer
@ -2098,6 +2102,7 @@ Muccioli
 mui
 Mukherjee
 Mulders
+Müller
 mult
 multi
 multibody
@ -2126,7 +2131,6 @@ muVT
 mux
 muy
 muz
-Müller
 mv
 mV
 Mvapich
@ -2146,9 +2150,9 @@ nabla
 Nagaosa
 Nakano
 nall
+namedtuple
 namespace
 namespaces
-namedtuple
 nan
 NaN
 Nandor
@ -2164,8 +2168,8 @@ nanometer
 nanometers
 nanoparticle
 nanoparticles
-Nanotube
 nanotube
+Nanotube
 nanotubes
 Narulkar
 nasa
@ -2201,8 +2205,8 @@ ncount
 nd
 ndescriptors
 ndihedrals
-ndihedraltypes
 Ndihedraltype
+ndihedraltypes
 Ndirango
 ndof
 Ndof
@ -2214,9 +2218,9 @@ Neel
 Neelov
 Negre
 nelem
-nelems
 Nelement
 Nelements
+nelems
 nemd
 netcdf
 netstat
@ -2250,8 +2254,8 @@ Nicklas
 Niklasson
 Nikolskiy
 nimpropers
-nimpropertypes
 Nimpropertype
+nimpropertypes
 Ninteger
 NiO
 Nissila
@ -2265,8 +2269,8 @@ nktv
 nl
 nlayers
 nlen
-Nlines
 nlines
+Nlines
 nlo
 nlocal
 Nlocal
@ -2274,16 +2278,16 @@ Nlog
 nlp
 nm
 Nm
-Nmax
 nmax
+Nmax
 nmc
-Nmin
 nmin
+Nmin
 Nmols
 nn
 nnodes
-Nocedal
 nO
+Nocedal
 nocite
 nocoeff
 nodeless
@ -2336,11 +2340,11 @@ Nrho
 Nroff
 nrow
 nrun
+ns
 Ns
 Nsample
 Nskip
 Nspecies
-ns
 nsq
 Nstart
 nstats
@ -2349,9 +2353,9 @@ Nsteplast
 Nstop
 nsub
 Nswap
+nt
 Nt
 Ntable
-nt
 ntheta
 nthreads
 ntimestep
@ -2394,11 +2398,11 @@ ocl
 octahedral
 octants
 Ohara
+O'Hearn
 ohenrich
 ok
 Okabe
 Okamoto
-O'Hearn
 O'Keefe
 OKeefe
 oldlace
@ -2456,8 +2460,8 @@ overdamped
 overlayed
 Ovito
 oxdna
-oxrna
 oxDNA
+oxrna
 oxRNA
 packings
 padua
@ -2506,7 +2510,6 @@ pc
 pchain
 Pchain
 pcmoves
-pmcmoves
 Pdamp
 pdb
 pdf
@ -2565,13 +2568,16 @@ Pieniazek
 Pieter
 pIm
 pimd
-pIp
 Piola
+pIp
 Pisarev
 Pishevar
 Pitera
 pj
 pjintve
+pKa
+pKb
+pKs
 planeforce
 Plathe
 Plimpton
@ -2580,10 +2586,8 @@ ploop
 PloS
 plt
 plumedfile
-pKa
-pKb
-pKs
 pmb
+pmcmoves
 Pmolrotate
 Pmoltrans
 pN
@ -2605,8 +2609,8 @@ polydisperse
 polydispersity
 polyelectrolyte
 polyhedra
-polymorphism
 Polym
+polymorphism
 popen
 Popov
 popstore
@ -2622,11 +2626,12 @@ Potapkin
 potin
 Pourtois
 powderblue
+PowerShell
 ppn
 pppm
-prd
 Prakash
 Praprotnik
+prd
 pre
 Pre
 prec
@ -2643,8 +2648,8 @@ Priya
 proc
 Proc
 procs
-Prony
 progguide
+Prony
 ps
 Ps
 pscreen
@ -2675,8 +2680,8 @@ px
 Px
 pxx
 Pxx
-Pxy
 pxy
+Pxy
 pxz
 py
 Py
@ -2693,13 +2698,13 @@ Pyy
 pyz
 pz
 Pz
-Pzz
 pzz
+Pzz
 qbmsst
 qcore
 qdist
-qE
 qe
+qE
 qeff
 qelectron
 qeq
@ -2775,15 +2780,15 @@ RDideal
 rdx
 reacter
 Readline
-realTypeMap
-real_t
 README
+real_t
 realtime
+realTypeMap
 reamin
 reax
-REAXFF
-ReaxFF
 reaxff
+ReaxFF
+REAXFF
 rebo
 recurse
 recursing
@ -2811,8 +2816,8 @@ Rensselaer
 reparameterizing
 repo
 representable
-Reproducibility
 reproducibility
+Reproducibility
 repuls
 reqid
 rescale
@ -2934,10 +2939,10 @@ rxd
 rxnave
 rxnsum
 ry
-rz
 Ryckaert
 Rycroft
 Rydbergs
+rz
 Rz
 Sabry
 saddlebrown
@ -2970,9 +2975,9 @@ Schimansky
 Schiotz
 Schlitter
 Schmid
-Schratt
 Schoen
 Schotte
+Schratt
 Schulten
 Schunk
 Schuring
@ -3010,6 +3015,7 @@ Setmask
 setpoint
 setvel
 sfftw
+sfree
 Sg
 Shan
 Shanno
@ -3027,8 +3033,8 @@ Shiga
 Shinoda
 Shiomi
 shlib
-SHM
 shm
+SHM
 shockvel
 shrinkexceed
 Shugaev
@ -3147,10 +3153,10 @@ stepwise
 Stesmans
 Stillinger
 stk
-Stockmayer
-Stoddard
 stochastically
 stochasticity
+Stockmayer
+Stoddard
 stoichiometric
 stoichiometry
 Stokesian
@ -3169,6 +3175,7 @@ Streiz
 strerror
 strided
 strietz
+stringstreams
 strmatch
 strncmp
 strstr
@ -3210,8 +3217,8 @@ Swiler
 Swinburne
 Swol
 Swope
-Sx
 sx
+Sx
 sy
 Sy
 symplectic
@ -3220,8 +3227,8 @@ sys
 sysdim
 Syst
 systemd
-Sz
 sz
+Sz
 Tabbernor
 tabinner
 Tadmor
@ -3268,9 +3275,9 @@ tfmc
 tfMC
 tgnpt
 tgnvt
+th
 Thakkar
 Thaokar
-th
 thb
 thei
 Theodorou
@ -3328,11 +3335,11 @@ Tmin
 tmp
 tN
 Tobias
+Toennies
 Tohoku
 tokenizer
 tokyo
 tol
-Toennies
 tomic
 toolchain
 topologies
@ -3404,10 +3411,12 @@ twojmax
 Tx
 txt
 Tyagi
+typeargs
+typedefs
 typeI
 typeJ
 typeN
-typeargs
+typesafe
 Tz
 Tzou
 ub
@ -3425,8 +3434,8 @@ uk
 ul
 ulb
 Uleft
-uloop
 Ulomek
+uloop
 ulsph
 Ultrafast
 uMech
@ -3491,6 +3500,7 @@ Valuev
 Vanden
 Vandenbrande
 Vanduyfhuys
+varargs
 varavg
 Varshalovich
 Varshney
@ -3584,10 +3594,10 @@ vzcm
 vzi
 Waals
 Wadley
-Waroquier
 wallstyle
 walltime
 Waltham
+Waroquier
 wavepacket
 wB
 Wbody
@ -3605,12 +3615,12 @@ whitesmoke
 whitespace
 Wi
 Wicaksono
-Widom
 widom
+Widom
 Wijk
 Wikipedia
-Wildcard
 wildcard
+Wildcard
 wildcards
 Winkler
 Wirnsberger
@ -3624,12 +3634,12 @@ Worley
 Wriggers
 Wuppertal
 Wurtzite
-Wysocki
 www
 wx
 Wx
 wy
 Wy
+Wysocki
 wz
 Wz
 xa
@ -3677,10 +3687,10 @@ xyz
 xz
 xzhou
 yaff
-yaml
-Yanxon
 YAFF
 Yamada
+yaml
+Yanxon
 Yaser
 Yazdani
 Ybar
@ -3711,14 +3721,15 @@ Yuya
 yx
 yy
 yz
+Zagaceta
 Zannoni
 Zavattieri
 zbl
 ZBL
 Zc
 zcm
-Zeeman
 zeeman
+Zeeman
 Zemer
 Zepeda
 zflag
@ -3732,29 +3743,23 @@ zi
 Zi
 ziegenhain
 Ziegenhain
+zincblende
 Zj
 zlim
 zlo
+Zm
 zmax
 zmin
 zmq
 zN
 zs
 zst
+Zstandard
+zstd
+Zstd
 zsu
 zu
 zx
 zy
 Zybin
 zz
-Zm
-PowerShell
-filesystems
-fingerprintconstants
-fingerprintsperelement
-Zagaceta
-zincblende
-Zstandard
-Zstd
-zstd
-checksum
--- a/src/tokenizer.h
+++ b/src/tokenizer.h
@ -52,10 +52,15 @@ class Tokenizer {
  std::vector<std::string> as_vector();
 };

+/** General Tokenizer exception class */
+
 class TokenizerException : public std::exception {
  std::string message;

 public:
+  // remove unused default constructor
+  TokenizerException() = delete;
+
  /** Thrown during retrieving or skipping tokens
   *
   * \param  msg    String with error message
@ -67,7 +72,10 @@ class TokenizerException : public std::exception {
  const char *what() const noexcept override { return message.c_str(); }
 };

+/** Exception thrown by ValueTokenizer when trying to convert an invalid integer string */
+
 class InvalidIntegerException : public TokenizerException {
+
 public:
  /** Thrown during converting string to integer number
   *
@ -78,6 +86,8 @@ class InvalidIntegerException : public TokenizerException {
  }
 };

+/** Exception thrown by ValueTokenizer when trying to convert an invalid floating point string */
+
 class InvalidFloatException : public TokenizerException {
 public:
  /** Thrown during converting string to floating point number