From 6887a16fa13c9b29f8447c85bf36341c3b5b8917 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 24 Jan 2022 17:49:32 -0500
Subject: [PATCH 01/11] start adding general code design doc.

---
 doc/src/Developer.rst                |  1 +
 doc/src/Developer_cxx_vs_c_style.rst | 73 ++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 doc/src/Developer_cxx_vs_c_style.rst

diff --git a/doc/src/Developer.rst b/doc/src/Developer.rst
index fd4a44a8a0..4d82f93625 100644
--- a/doc/src/Developer.rst
+++ b/doc/src/Developer.rst
@@ -11,6 +11,7 @@ of time and requests from the LAMMPS user community.
    :maxdepth: 1
 
    Developer_org
+   Developer_cxx_vs_c_style
    Developer_parallel
    Developer_flow
    Developer_write
diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
new file mode 100644
index 0000000000..868b183ec0
--- /dev/null
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -0,0 +1,73 @@
+Code design
+-----------
+
+This section discusses some of the code design choices in LAMMPS and
+its overall strategy in order to assist developers in writing new code
+that fits well with the rest of the code.  Please see the section on
+:doc:`Requirements for contributed code <Modify_style>` for more
+specific recommendations and guidelines.  Here the focus is on overall
+strategy and discussion of some relevant C++ programming language
+constructs.
+
+Historically, the basic design philosophy of the LAMMPS C++ code was
+that of a "C with classes" style.  This was motivated by the desire to
+make it easier to modify LAMMPS for people without significant training
+in C++ programming and to use data structures and code constructs
+that somewhat resemble the previous implementation(s) in Fortran.
+Another contributing factor for this choice was that, at the time, C++
+compilers were not always very mature: some of the advanced features
+contained bugs or did not function exactly as the standard required,
+and compiler vendors sometimes disagreed about how to interpret the
+C++ standard documents.
+
+However, C++ compilers have advanced a lot since then, and with the
+transition to requiring the C++11 standard in 2020 as the minimum C++ language
+standard for LAMMPS, the decision was made to also replace some of the
+C-style constructs with equivalent C++ functionality, either from the
+C++ standard library or as custom classes or functions, in order to
+improve readability of the code and to increase code reuse through
+abstraction of commonly used functionality.
+
+
+Object oriented code
+^^^^^^^^^^^^^^^^^^^^
+
+LAMMPS is designed to be an object oriented code, that is each simulation
+is represented by an instance of the LAMMPS class.  When running in parallel,
+of course, each MPI process will create such an instance.  This can be seen
+in the ``main.cpp`` file where the core steps of running a LAMMPS simulation
+are the following 3 lines of code:
+
+.. code-block:: C++
+
+    LAMMPS *lammps = new LAMMPS(argc, argv, lammps_comm);
+    lammps->input->file();
+    delete lammps;
+
+The first line creates a LAMMPS class instance and passes the command line arguments
+and the global communicator to its constructor.  The second line tells the LAMMPS
+instance to process the input (either from standard input or the provided input file)
+until the end.  And the third line deletes that instance again.  The remainder of
+the main.cpp file are for error handling, MPI configuration and other special features.
+
+In the constructor of the LAMMPS class instance the basic LAMMPS class hierachy
+is created as shown in :ref:`class-topology`.  While processing the input further
+class instances are created, or deleted, or replaced and specific member functions
+of specific classes are called to trigger actions like creating atoms, computing
+forces, computing properties, propagating the system, or writing output.
+
+
+Inheritance and Compositing
+===========================
+
+Polymorphism
+============
+
+
+I/O and output formatting
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Memory management
+^^^^^^^^^^^^^^^^^
+
+

From 1c7e1faeff8ecb214f30754c64cf38708fe29179 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Fri, 11 Feb 2022 11:59:27 -0500
Subject: [PATCH 02/11] add sections on inheritance, compositing, polymorphism

---
 doc/src/Developer_cxx_vs_c_style.rst        | 164 ++++++++++++++++++--
 doc/utils/sphinx-config/false_positives.txt |   1 +
 2 files changed, 150 insertions(+), 15 deletions(-)

diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
index 868b183ec0..0b0526e7f9 100644
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -32,11 +32,11 @@ abstraction of commonly used functionality.
 Object oriented code
 ^^^^^^^^^^^^^^^^^^^^
 
-LAMMPS is designed to be an object oriented code, that is each simulation
-is represented by an instance of the LAMMPS class.  When running in parallel,
-of course, each MPI process will create such an instance.  This can be seen
-in the ``main.cpp`` file where the core steps of running a LAMMPS simulation
-are the following 3 lines of code:
+LAMMPS is designed to be an object-oriented code, that is, each
+simulation is represented by an instance of the LAMMPS class.  When
+running in parallel, each MPI process creates its own such
+instance.  This can be seen in the ``main.cpp`` file where the core
+steps of running a LAMMPS simulation are the following 3 lines of code:
 
 .. code-block:: C++
 
@@ -44,30 +44,164 @@ are the following 3 lines of code:
     lammps->input->file();
     delete lammps;
 
-The first line creates a LAMMPS class instance and passes the command line arguments
-and the global communicator to its constructor.  The second line tells the LAMMPS
-instance to process the input (either from standard input or the provided input file)
-until the end.  And the third line deletes that instance again.  The remainder of
-the main.cpp file are for error handling, MPI configuration and other special features.
+The first line creates a LAMMPS class instance and passes the command
+line arguments and the global communicator to its constructor.  The
+second line tells the LAMMPS instance to process the input (either from
+standard input or the provided input file) until the end.  The third
+line deletes that instance again.  The remainder of the ``main.cpp`` file
+is for error handling, MPI configuration and other special features.
 
-In the constructor of the LAMMPS class instance the basic LAMMPS class hierachy
+
+In the constructor of the LAMMPS class instance the basic LAMMPS class hierarchy
 is created as shown in :ref:`class-topology`.  While processing the input further
 class instances are created, or deleted, or replaced and specific member functions
 of specific classes are called to trigger actions like creating atoms, computing
 forces, computing properties, propagating the system, or writing output.
 
-
-Inheritance and Compositing
+Composition and Inheritance
 ===========================
 
+LAMMPS makes extensive use of the object oriented programming (OOP)
+principles of *composition* and *inheritance*.  Classes like the
+``LAMMPS`` class are a **composite** containing pointers to instances of
+other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
+``Modify``, and so on.  Each of these classes implements certain
+functionality by storing and manipulating data related to the simulation
+and providing member functions that trigger certain actions.  Some of
+those classes are composites themselves: ``Force`` contains instances
+of classes describing the force interactions, and ``Modify`` contains
+and calls fixes and computes.  In most cases only one instance of such
+a member class is allowed, but in a few cases there can be multiple
+instances, and the parent class maintains a list of pointers to the
+instantiated classes.
+
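+A heavily simplified sketch of such a composite is shown below.  All
+class names (``MiniLAMMPS``, ``MiniForce``, ``MiniModify``, ``MiniPair``,
+``MiniFix``) are hypothetical stand-ins, not the actual LAMMPS
+declarations, and ``std::unique_ptr`` is used for brevity where LAMMPS
+itself manages plain pointers.
+
+.. code-block:: c++
+
+   #include <memory>
+   #include <string>
+   #include <vector>
+
+   // hypothetical stand-ins for member classes
+   struct MiniPair {
+     std::string style;
+   };
+
+   struct MiniFix {
+     std::string id;
+   };
+
+   // composite holding at most one pair style instance
+   struct MiniForce {
+     std::unique_ptr<MiniPair> pair;
+   };
+
+   // composite maintaining a list of any number of fixes
+   struct MiniModify {
+     std::vector<std::unique_ptr<MiniFix>> fixes;
+     void add_fix(const std::string &id) {
+       fixes.push_back(std::unique_ptr<MiniFix>(new MiniFix{id}));
+     }
+   };
+
+   // top-level composite, loosely modeled on the LAMMPS class
+   struct MiniLAMMPS {
+     std::unique_ptr<MiniForce> force{new MiniForce()};
+     std::unique_ptr<MiniModify> modify{new MiniModify()};
+   };
+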
+Changing behavior or adjusting how LAMMPS handles the simulation is
+implemented via **inheritance**, where different variants of the
+functionality are realized by creating *derived* classes that can share
+common functionality in their base class and provide a consistent
+interface where the derived classes replace (dummy or pure) functions in
+the base class. The higher level classes can then call those methods of
+the instantiated classes without having to know which specific derived
+class variant was instantiated.  In the LAMMPS documentation those
+derived classes are usually referred to as "styles", e.g. pair styles,
+fix styles, atom styles and so on.
+
+This is the origin of the flexibility of LAMMPS.  For example, it makes
+it possible to compute forces for very different non-bonded potential
+functions by having different pair styles (implemented as different
+classes derived from the ``Pair`` class) where the evaluation of the
+potential function is confined to the implementation of the individual
+classes.  Whenever a new :doc:`pair_style` or :doc:`bond_style` or
+:doc:`comm_style` or similar command is processed in the LAMMPS input,
+any existing class instance is deleted and a new instance is created in
+its place.
+
+Further code sharing is possible by creating derived classes from the
+derived classes (for instance, to implement an accelerated version of a
+pair style), where only a subset of the methods is replaced with
+the accelerated versions.
+
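+The following sketch illustrates both ideas with hypothetical, heavily
+simplified classes (``PairBase``, ``PairLJCutSketch``,
+``PairLJCutAccelSketch``, ``ForceSketch``).  They only mimic the
+structure described above and are not the actual LAMMPS ``Pair`` API.
+
+.. code-block:: c++
+
+   #include <memory>
+   #include <string>
+
+   // base class: shared functionality plus a virtual interface
+   class PairBase {
+   public:
+     virtual ~PairBase() = default;
+     int ncalls() const { return ncalls_; }  // shared by all styles
+     virtual std::string compute() = 0;      // provided by each style
+
+   protected:
+     int ncalls_ = 0;
+   };
+
+   class PairLJCutSketch : public PairBase {
+   public:
+     std::string compute() override { ++ncalls_; return "lj/cut"; }
+     double cutoff() const { return 2.5; }   // reused by derived classes
+   };
+
+   // derived from a derived class: only compute() is replaced
+   class PairLJCutAccelSketch : public PairLJCutSketch {
+   public:
+     std::string compute() override { ++ncalls_; return "lj/cut/accel"; }
+   };
+
+   // owner that replaces the current style, like a new pair_style command
+   class ForceSketch {
+   public:
+     void set_pair(PairBase *newpair) { pair.reset(newpair); }  // deletes old one
+     PairBase *get_pair() const { return pair.get(); }
+
+   private:
+     std::unique_ptr<PairBase> pair;
+   };
+
+The caller of ``compute()`` only needs a ``PairBase`` pointer, so replacing
+the style object does not require any change in the calling code.
+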
 Polymorphism
 ============
 
+Polymorphism and dynamic dispatch are further OOP features that play an
+important role in how LAMMPS selects which code to execute.  In a nutshell,
+this is a mechanism where the decision of which member function to call
+from a class is made at runtime and not when the code is compiled.
+To enable it, the function has to be declared as ``virtual`` and all
+corresponding functions in derived classes should use the ``override``
+specifier.  Below is a brief example.
+
+.. code-block:: c++
+
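+   #include <cstdio>
+
+   class Base {
+   public:
+    virtual ~Base() = default;
+    void call();
+    void normal() { printf("Base::normal()\n"); }
+    virtual void poly() { printf("Base::poly()\n"); }
+   };
+
+   void Base::call() {
+    normal();
+    poly();
+   }
+
+   class Derived : public Base {
+   public:
+    ~Derived() override = default;
+    void normal() { printf("Derived::normal()\n"); }
+    void poly() override { printf("Derived::poly()\n"); }
+   };
+
+   int main() {
+    Base *base1 = new Base();
+    Base *base2 = new Derived();
+
+    base1->call();   // prints "Base::normal()" and "Base::poly()"
+    base2->call();   // prints "Base::normal()" and "Derived::poly()"
+
+    delete base1;
+    delete base2;
+    return 0;
+   }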
+
+The difference in behavior of the ``normal()`` and the ``poly()`` member
+functions lies in which of the two versions is called when
+executing ``base1->call()`` and ``base2->call()``.  Without polymorphism, a
+function within the base class will call only member functions within
+the same scope, that is ``Base::call()`` will always call
+``Base::normal()``.  But for ``base2->call()`` the call of the
+virtual member function will be dispatched to ``Derived::poly()``
+instead.  This mechanism ensures that class instances always call the
+member functions of the class type that was used to create them, even
+if they are assigned to a pointer using the type of a base class.
+Thanks to dynamic dispatch, LAMMPS can even use styles that are loaded
+at runtime from a shared object file with the :doc:`plugin command <plugin>`.
+
+A special case of virtual functions are so-called pure virtual functions.
+These are virtual functions that are "initialized" to 0 in the class
+declaration (see example below).
+
+.. code-block:: c++
+
+   class Base {
+   public:
+    virtual void pure() = 0;
+   };
+
+This has the effect that it will no longer be possible to create an instance
+of the base class and that derived classes **must** implement these functions.
+Many of the functions listed with the various styles in the section :doc:`Modify`
+are such pure functions. The motivation for this is to define the interface
+or API of functions.
+
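+Because a class with a pure virtual function is abstract, the compiler
+itself enforces the interface.  A minimal sketch with the hypothetical
+names ``StyleInterface`` and ``MyStyle``:
+
+.. code-block:: c++
+
+   #include <type_traits>
+
+   // abstract base class: declares the interface only
+   class StyleInterface {
+   public:
+     virtual ~StyleInterface() = default;
+     virtual int setting() const = 0;  // pure: no default implementation
+   };
+
+   // concrete derived class: must implement all pure functions
+   class MyStyle : public StyleInterface {
+   public:
+     int setting() const override { return 42; }
+   };
+
+   // "StyleInterface s;" would not compile, but MyStyle can be used
+   // through a pointer or reference to the base class
+   static_assert(std::is_abstract<StyleInterface>::value, "base is abstract");
+   static_assert(!std::is_abstract<MyStyle>::value, "derived is concrete");
+
+   inline int query(const StyleInterface &style) { return style.setting(); }
+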
+However, there are downsides to this. For example, calls virtual functions
+from within a base class constructor will not be dispatched to the derived
+class (that part of the object has not been constructed yet), and thus
+it is good practice to either avoid calling them or to provide an explicit scope like
+in ``Base::poly()``.  Furthermore, destructors of classes containing
+virtual functions should be declared virtual, too, so that deleting an
+instance of a derived class through a pointer to its base class runs the
+destructors of both the derived and the base class.
+
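+Both effects can be observed with a small sketch.  The classes
+``CtorBase``, ``CtorDerived``, ``VBase``, and ``VDerived`` and the counter
+``ndestroyed`` are hypothetical and only serve as an illustration.
+
+.. code-block:: c++
+
+   #include <string>
+
+   class CtorBase {
+   public:
+     CtorBase() { seen_in_ctor = name(); }  // virtual call during construction
+     virtual ~CtorBase() = default;
+     virtual std::string name() const { return "CtorBase"; }
+     std::string seen_in_ctor;
+   };
+
+   class CtorDerived : public CtorBase {
+   public:
+     std::string name() const override { return "CtorDerived"; }
+   };
+
+   // counts destructor calls to show that both run when deleting via VBase*
+   static int ndestroyed = 0;
+
+   class VBase {
+   public:
+     virtual ~VBase() { ++ndestroyed; }
+   };
+
+   class VDerived : public VBase {
+   public:
+     ~VDerived() override { ++ndestroyed; }
+   };
+
+Constructing a ``CtorDerived`` stores ``"CtorBase"`` in ``seen_in_ctor``,
+while a later call to ``name()`` returns ``"CtorDerived"``.  Deleting a
+``VDerived`` through a ``VBase`` pointer runs both destructors.
+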
+.. admonition:: Important Notes
+
+   In order to be able to detect incompatibilities and to avoid unexpected
+   behavior already at compile time, it is crucial that all member functions
+   that are intended to replace a virtual or pure function use the ``override``
+   specifier.  For the same reason, one should avoid using overloads
+   or default arguments for virtual functions.
+
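+The pitfall with default arguments can be demonstrated with a short
+sketch using the hypothetical classes ``DefBase`` and ``DefDerived``:
+default argument values are taken from the static type of the pointer or
+reference, while the function body is selected by dynamic dispatch, so
+the two can silently disagree.
+
+.. code-block:: c++
+
+   class DefBase {
+   public:
+     virtual ~DefBase() = default;
+     virtual int scaled(int factor = 1) const { return 10 * factor; }
+   };
+
+   class DefDerived : public DefBase {
+   public:
+     // "override" lets the compiler check the signature: a mistake such as
+     // "int scaled(long factor) const override" is rejected instead of
+     // silently declaring a new, unrelated function
+     int scaled(int factor = 2) const override { return 100 * factor; }
+   };
+
+Calling ``scaled()`` on a ``DefDerived`` object through a ``const DefBase &``
+executes ``DefDerived::scaled()`` with the default value 1 from ``DefBase``
+(result 100), while calling it directly on the object uses the default
+value 2 (result 200).
+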
+Style Factories
+===============
+
+In order to create class instances of the different styles, LAMMPS often
+uses a programming pattern called *Factory*.  Those are functions that create
+an instance of a specific derived class, say ``PairLJCut``, and return a pointer
+to the type of the common base class of that style, ``Pair`` in this case.
+To associate the factory function with the style keyword, a ``std::map``
+class is used in which function pointers are indexed by their keyword
+(for example "lj/cut" for ``PairLJCut`` and "morse" for ``PairMorse``).
+A couple of typedefs help to keep the code readable and a template function
+is used to implement the actual factory functions for the individual classes.   
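+
+A minimal sketch of this pattern follows.  It is only loosely modeled on
+the LAMMPS approach: the names (``PairDemo``, ``PairCreator``,
+``PairCreatorMap``, ``pair_creator()``, ``create_pair()``) are
+hypothetical, and the ``LAMMPS`` pointer that real style constructors
+receive is omitted.
+
+.. code-block:: c++
+
+   #include <map>
+   #include <string>
+
+   // common base class of all styles of this kind
+   class PairDemo {
+   public:
+     virtual ~PairDemo() = default;
+     virtual std::string name() const = 0;
+   };
+
+   class PairLJCutDemo : public PairDemo {
+   public:
+     std::string name() const override { return "lj/cut"; }
+   };
+
+   class PairMorseDemo : public PairDemo {
+   public:
+     std::string name() const override { return "morse"; }
+   };
+
+   // typedefs keep the declarations readable
+   typedef PairDemo *(*PairCreator)();
+   typedef std::map<std::string, PairCreator> PairCreatorMap;
+
+   // one template provides the factory function for every derived class
+   template <typename T> PairDemo *pair_creator() { return new T(); }
+
+   // register the factory functions under their style keywords
+   PairCreatorMap make_pair_map() {
+     PairCreatorMap pmap;
+     pmap["lj/cut"] = &pair_creator<PairLJCutDemo>;
+     pmap["morse"] = &pair_creator<PairMorseDemo>;
+     return pmap;
+   }
+
+   // returns a new instance for a known keyword or nullptr otherwise
+   PairDemo *create_pair(const PairCreatorMap &pmap, const std::string &style) {
+     auto it = pmap.find(style);
+     if (it == pmap.end()) return nullptr;
+     return it->second();
+   }
+
+In this sketch the caller owns the returned instance and is responsible
+for deleting it.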
 
 I/O and output formatting
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Memory management
 ^^^^^^^^^^^^^^^^^
-
-
diff --git a/doc/utils/sphinx-config/false_positives.txt b/doc/utils/sphinx-config/false_positives.txt
index 1d4c27822b..935a31069f 100644
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@@ -3408,6 +3408,7 @@ typeI
 typeJ
 typeN
 typeargs
+typedefs
 Tz
 Tzou
 ub

From 1ab5b9d7fd4baa3891006a571231f495c15df1f3 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Sun, 13 Feb 2022 15:25:51 -0500
Subject: [PATCH 03/11] re-sort list of false positives alphabetically with
 "sort"

---
 doc/utils/sphinx-config/false_positives.txt | 228 ++++++++++----------
 1 file changed, 114 insertions(+), 114 deletions(-)

diff --git a/doc/utils/sphinx-config/false_positives.txt b/doc/utils/sphinx-config/false_positives.txt
index 935a31069f..b133bbb1f6 100644
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@@ -52,8 +52,8 @@ aij
 aimd
 airebo
 Aj
-ajs
 ajaramil
+ajs
 akohlmey
 Aktulga
 al
@@ -119,10 +119,10 @@ Appl
 Apu
 arallel
 arccos
-arge
 Archlinux
 arcsin
 arg
+arge
 args
 argv
 arrhenius
@@ -149,9 +149,9 @@ atc
 AtC
 ATC
 athermal
+athomps
 atime
 atimestep
-athomps
 atm
 atomeye
 atomfile
@@ -196,7 +196,6 @@ Bagi
 Bagnold
 Baig
 Bajaj
-Bkappa
 Bal
 balancer
 Balankura
@@ -215,8 +214,8 @@ barostatting
 Barostatting
 Barrat
 Barros
-Bartelt
 Bartels
+Bartelt
 barycenter
 barye
 Bashford
@@ -258,7 +257,6 @@ bigint
 Bij
 bilayer
 bilayers
-biquadratic
 binsize
 binstyle
 binutils
@@ -267,6 +265,7 @@ biomolecule
 Biomolecules
 Biophys
 Biosym
+biquadratic
 bisectioning
 bispectrum
 Bispectrum
@@ -277,6 +276,7 @@ bitrate
 bitrates
 Bitzek
 Bjerrum
+Bkappa
 Blaise
 blanchedalmond
 blocksize
@@ -315,14 +315,14 @@ Botu
 Bouguet
 Bourne
 boxcolor
-boxlo
 boxhi
-boxxlo
+boxlo
 boxxhi
-boxylo
+boxxlo
 boxyhi
-boxzlo
+boxylo
 boxzhi
+boxzlo
 bp
 bpclermont
 bpls
@@ -422,13 +422,14 @@ Chaudhuri
 checkbox
 checkmark
 checkqeq
+checksum
 chemistries
 Chemnitz
 Cheng
 Chenoweth
 chiral
-ChiralIDs
 chiralIDs
+ChiralIDs
 chirality
 Cho
 ChooseOffset
@@ -499,12 +500,12 @@ cond
 conda
 Conda
 Condens
-Connor
 conf
 config
 configfile
 configurational
 conformational
+Connor
 ConstMatrix
 Contrib
 cooperativity
@@ -561,14 +562,14 @@ cstring
 cstyle
 csvr
 ctrl
-Ctypes
 ctypes
+Ctypes
 cuda
 Cuda
 CUDA
+cuFFT
 CuH
 Cui
-cuFFT
 Cummins
 Curk
 Cusentino
@@ -630,11 +631,11 @@ de
 dE
 De
 deallocated
-decorrelation
 debye
 Debye
 Decius
 decompositions
+decorrelation
 decrementing
 deeppink
 deepskyblue
@@ -643,8 +644,8 @@ defn
 deformable
 del
 delaystep
-DeleteIDs
 deleteIDs
+DeleteIDs
 delflag
 Dellago
 delocalization
@@ -704,8 +705,8 @@ Dihedrals
 dihydride
 Dij
 dimdim
-dimensioned
 dimensionality
+dimensioned
 dimgray
 dipolar
 dir
@@ -730,10 +731,10 @@ dmg
 dmi
 dnf
 DNi
-Dobson
 Dobnikar
-Dodds
+Dobson
 docenv
+Dodds
 dodgerblue
 dof
 doi
@@ -741,10 +742,10 @@ Donadio
 Donev
 dotc
 Doty
+downarrow
 doxygen
 doxygenclass
 doxygenfunction
-downarrow
 Doye
 Doyl
 dpd
@@ -813,8 +814,8 @@ eco
 ecoul
 ecp
 Ecut
-EdgeIDs
 edgeIDs
+EdgeIDs
 edihed
 edim
 edip
@@ -826,8 +827,8 @@ ee
 Eebt
 ees
 eFF
-efield
 effm
+efield
 eflag
 eflux
 eg
@@ -836,10 +837,10 @@ ehex
 eHEX
 Ei
 eigen
+eigendecomposition
 eigensolve
 eigensolver
 eigensolvers
-eigendecomposition
 eigenvalue
 eigenvalues
 eigenvector
@@ -914,15 +915,15 @@ equilibrated
 equilibrates
 equilibrating
 equilibration
-Equilibria
 equilibria
+Equilibria
 equilization
 equipartitioning
-Ercolessi
-Erdmann
 eradius
 erate
 erc
+Ercolessi
+Erdmann
 erf
 erfc
 Erhart
@@ -932,10 +933,10 @@ erotate
 errno
 Ertas
 ervel
-Espanol
-Eshelby
 eshelby
+Eshelby
 eskm
+Espanol
 esph
 estretch
 esu
@@ -950,20 +951,20 @@ etol
 etot
 etotal
 etube
-Eulerian
 eulerian
+Eulerian
 eulerimplicit
 Europhys
 ev
 eV
+eval
+evals
 evalue
 Evanseck
 evdwl
-evector
 evec
 evecs
-eval
-evals
+evector
 Everaers
 Evgeny
 evirials
@@ -992,13 +993,13 @@ fbMC
 Fc
 fcc
 fcm
-Fd
 fd
+Fd
 fdotr
 fdt
+fe
 Fehlberg
 Fellinger
-fe
 femtosecond
 femtoseconds
 fene
@@ -1033,11 +1034,14 @@ filename
 Filename
 filenames
 Filenames
-Fily
 fileper
 filesystem
+filesystems
+Fily
 Fincham
 Finchham
+fingerprintconstants
+fingerprintsperelement
 Finnis
 Fiorin
 fixID
@@ -1076,8 +1080,8 @@ forestgreen
 formatarg
 formulae
 Forschungszentrum
-Fortran
 fortran
+Fortran
 Fosado
 fourier
 fp
@@ -1099,6 +1103,7 @@ fstyle
 fsw
 ftm
 ftol
+fuer
 fugacity
 Fumi
 func
@@ -1106,7 +1111,6 @@ funcs
 functionalities
 functionals
 funroll
-fuer
 fx
 fy
 fz
@@ -1145,8 +1149,8 @@ Germann
 Germano
 gerolf
 Gerolf
-getrusage
 Gershgorin
+getrusage
 getter
 gettimeofday
 gewald
@@ -1223,8 +1227,8 @@ gsmooth
 gstyle
 GTL
 Gubbins
-Guericke
 Guenole
+Guericke
 gui
 Gumbsch
 Gunsteren
@@ -1300,7 +1304,6 @@ histogrammed
 histogramming
 hma
 hmaktulga
-hplanck
 hoc
 Hochbruck
 Hofling
@@ -1317,6 +1320,7 @@ howto
 Howto
 Hoy
 Hoyt
+hplanck
 Hs
 hstyle
 html
@@ -1347,8 +1351,8 @@ hyperspherical
 hysteretic
 hz
 IAP
-Ibanez
 iatom
+Ibanez
 ibar
 ibm
 icc
@@ -1439,8 +1443,8 @@ ipi
 ipp
 Ippolito
 IPv
-IPython
 ipython
+IPython
 Isele
 isenthalpic
 ish
@@ -1456,8 +1460,8 @@ isotropically
 isovolume
 Isralewitz
 iter
-iters
 iteratively
+iters
 Ith
 Itsets
 itype
@@ -1600,8 +1604,8 @@ KMP
 kmu
 Knizhnik
 knl
-Kofke
 kofke
+Kofke
 Kohlmeyer
 Kohn
 kokkos
@@ -1653,15 +1657,15 @@ Lackmann
 Ladd
 lagrangian
 lambdai
-lamda
 LambdaLanczos
+lamda
 lammps
 Lammps
 LAMMPS
 lammpsplot
 lammpsplugin
-Lampis
 Lamoureux
+Lampis
 Lanczos
 Lande
 Landron
@@ -1674,8 +1678,8 @@ larentzos
 Larentzos
 Laroche
 lars
-LATBOLTZ
 latboltz
+LATBOLTZ
 latencies
 Latour
 latourr
@@ -1830,13 +1834,13 @@ Lyulin
 lz
 lzma
 Maaravi
-MACHDYN
 machdyn
+MACHDYN
 Mackay
 Mackrodt
+MacOS
 Macromolecules
 macroparticle
-MacOS
 Madura
 Magda
 Magdeburg
@@ -1920,8 +1924,8 @@ mc
 McLachlan
 md
 mdf
-MDI
 mdi
+MDI
 mdpd
 mDPD
 meam
@@ -1945,8 +1949,8 @@ Mei
 Melchor
 Meloni
 Melrose
-Mem
 mem
+Mem
 memalign
 MEMALIGN
 membered
@@ -1960,10 +1964,10 @@ Merz
 meshless
 meso
 mesocnt
-MESODPD
 mesodpd
-MESONT
+MESODPD
 mesont
+MESONT
 mesoparticle
 mesoscale
 mesoscopic
@@ -1998,8 +2002,8 @@ Militzer
 Minary
 mincap
 Mindlin
-minhbonds
 mingw
+minhbonds
 minima
 minimizations
 minimizer
@@ -2098,6 +2102,7 @@ Muccioli
 mui
 Mukherjee
 Mulders
+Müller
 mult
 multi
 multibody
@@ -2126,7 +2131,6 @@ muVT
 mux
 muy
 muz
-Müller
 mv
 mV
 Mvapich
@@ -2146,9 +2150,9 @@ nabla
 Nagaosa
 Nakano
 nall
+namedtuple
 namespace
 namespaces
-namedtuple
 nan
 NaN
 Nandor
@@ -2164,8 +2168,8 @@ nanometer
 nanometers
 nanoparticle
 nanoparticles
-Nanotube
 nanotube
+Nanotube
 nanotubes
 Narulkar
 nasa
@@ -2201,8 +2205,8 @@ ncount
 nd
 ndescriptors
 ndihedrals
-ndihedraltypes
 Ndihedraltype
+ndihedraltypes
 Ndirango
 ndof
 Ndof
@@ -2214,9 +2218,9 @@ Neel
 Neelov
 Negre
 nelem
-nelems
 Nelement
 Nelements
+nelems
 nemd
 netcdf
 netstat
@@ -2250,8 +2254,8 @@ Nicklas
 Niklasson
 Nikolskiy
 nimpropers
-nimpropertypes
 Nimpropertype
+nimpropertypes
 Ninteger
 NiO
 Nissila
@@ -2265,8 +2269,8 @@ nktv
 nl
 nlayers
 nlen
-Nlines
 nlines
+Nlines
 nlo
 nlocal
 Nlocal
@@ -2274,16 +2278,16 @@ Nlog
 nlp
 nm
 Nm
-Nmax
 nmax
+Nmax
 nmc
-Nmin
 nmin
+Nmin
 Nmols
 nn
 nnodes
-Nocedal
 nO
+Nocedal
 nocite
 nocoeff
 nodeless
@@ -2336,11 +2340,11 @@ Nrho
 Nroff
 nrow
 nrun
+ns
 Ns
 Nsample
 Nskip
 Nspecies
-ns
 nsq
 Nstart
 nstats
@@ -2349,9 +2353,9 @@ Nsteplast
 Nstop
 nsub
 Nswap
+nt
 Nt
 Ntable
-nt
 ntheta
 nthreads
 ntimestep
@@ -2394,11 +2398,11 @@ ocl
 octahedral
 octants
 Ohara
+O'Hearn
 ohenrich
 ok
 Okabe
 Okamoto
-O'Hearn
 O'Keefe
 OKeefe
 oldlace
@@ -2456,8 +2460,8 @@ overdamped
 overlayed
 Ovito
 oxdna
-oxrna
 oxDNA
+oxrna
 oxRNA
 packings
 padua
@@ -2506,7 +2510,6 @@ pc
 pchain
 Pchain
 pcmoves
-pmcmoves
 Pdamp
 pdb
 pdf
@@ -2565,13 +2568,16 @@ Pieniazek
 Pieter
 pIm
 pimd
-pIp
 Piola
+pIp
 Pisarev
 Pishevar
 Pitera
 pj
 pjintve
+pKa
+pKb
+pKs
 planeforce
 Plathe
 Plimpton
@@ -2580,10 +2586,8 @@ ploop
 PloS
 plt
 plumedfile
-pKa
-pKb
-pKs
 pmb
+pmcmoves
 Pmolrotate
 Pmoltrans
 pN
@@ -2605,8 +2609,8 @@ polydisperse
 polydispersity
 polyelectrolyte
 polyhedra
-polymorphism
 Polym
+polymorphism
 popen
 Popov
 popstore
@@ -2622,11 +2626,12 @@ Potapkin
 potin
 Pourtois
 powderblue
+PowerShell
 ppn
 pppm
-prd
 Prakash
 Praprotnik
+prd
 pre
 Pre
 prec
@@ -2643,8 +2648,8 @@ Priya
 proc
 Proc
 procs
-Prony
 progguide
+Prony
 ps
 Ps
 pscreen
@@ -2675,8 +2680,8 @@ px
 Px
 pxx
 Pxx
-Pxy
 pxy
+Pxy
 pxz
 py
 Py
@@ -2693,13 +2698,13 @@ Pyy
 pyz
 pz
 Pz
-Pzz
 pzz
+Pzz
 qbmsst
 qcore
 qdist
-qE
 qe
+qE
 qeff
 qelectron
 qeq
@@ -2775,15 +2780,15 @@ RDideal
 rdx
 reacter
 Readline
-realTypeMap
-real_t
 README
+real_t
 realtime
+realTypeMap
 reamin
 reax
-REAXFF
-ReaxFF
 reaxff
+ReaxFF
+REAXFF
 rebo
 recurse
 recursing
@@ -2811,8 +2816,8 @@ Rensselaer
 reparameterizing
 repo
 representable
-Reproducibility
 reproducibility
+Reproducibility
 repuls
 reqid
 rescale
@@ -2934,10 +2939,10 @@ rxd
 rxnave
 rxnsum
 ry
-rz
 Ryckaert
 Rycroft
 Rydbergs
+rz
 Rz
 Sabry
 saddlebrown
@@ -2970,9 +2975,9 @@ Schimansky
 Schiotz
 Schlitter
 Schmid
-Schratt
 Schoen
 Schotte
+Schratt
 Schulten
 Schunk
 Schuring
@@ -3027,8 +3032,8 @@ Shiga
 Shinoda
 Shiomi
 shlib
-SHM
 shm
+SHM
 shockvel
 shrinkexceed
 Shugaev
@@ -3147,10 +3152,10 @@ stepwise
 Stesmans
 Stillinger
 stk
-Stockmayer
-Stoddard
 stochastically
 stochasticity
+Stockmayer
+Stoddard
 stoichiometric
 stoichiometry
 Stokesian
@@ -3210,8 +3215,8 @@ Swiler
 Swinburne
 Swol
 Swope
-Sx
 sx
+Sx
 sy
 Sy
 symplectic
@@ -3220,8 +3225,8 @@ sys
 sysdim
 Syst
 systemd
-Sz
 sz
+Sz
 Tabbernor
 tabinner
 Tadmor
@@ -3268,9 +3273,9 @@ tfmc
 tfMC
 tgnpt
 tgnvt
+th
 Thakkar
 Thaokar
-th
 thb
 thei
 Theodorou
@@ -3328,11 +3333,11 @@ Tmin
 tmp
 tN
 Tobias
+Toennies
 Tohoku
 tokenizer
 tokyo
 tol
-Toennies
 tomic
 toolchain
 topologies
@@ -3404,11 +3409,11 @@ twojmax
 Tx
 txt
 Tyagi
+typeargs
+typedefs
 typeI
 typeJ
 typeN
-typeargs
-typedefs
 Tz
 Tzou
 ub
@@ -3426,8 +3431,8 @@ uk
 ul
 ulb
 Uleft
-uloop
 Ulomek
+uloop
 ulsph
 Ultrafast
 uMech
@@ -3585,10 +3590,10 @@ vzcm
 vzi
 Waals
 Wadley
-Waroquier
 wallstyle
 walltime
 Waltham
+Waroquier
 wavepacket
 wB
 Wbody
@@ -3606,12 +3611,12 @@ whitesmoke
 whitespace
 Wi
 Wicaksono
-Widom
 widom
+Widom
 Wijk
 Wikipedia
-Wildcard
 wildcard
+Wildcard
 wildcards
 Winkler
 Wirnsberger
@@ -3625,12 +3630,12 @@ Worley
 Wriggers
 Wuppertal
 Wurtzite
-Wysocki
 www
 wx
 Wx
 wy
 Wy
+Wysocki
 wz
 Wz
 xa
@@ -3678,10 +3683,10 @@ xyz
 xz
 xzhou
 yaff
-yaml
-Yanxon
 YAFF
 Yamada
+yaml
+Yanxon
 Yaser
 Yazdani
 Ybar
@@ -3712,14 +3717,15 @@ Yuya
 yx
 yy
 yz
+Zagaceta
 Zannoni
 Zavattieri
 zbl
 ZBL
 Zc
 zcm
-Zeeman
 zeeman
+Zeeman
 Zemer
 Zepeda
 zflag
@@ -3733,29 +3739,23 @@ zi
 Zi
 ziegenhain
 Ziegenhain
+zincblende
 Zj
 zlim
 zlo
+Zm
 zmax
 zmin
 zmq
 zN
 zs
 zst
+Zstandard
+zstd
+Zstd
 zsu
 zu
 zx
 zy
 Zybin
 zz
-Zm
-PowerShell
-filesystems
-fingerprintconstants
-fingerprintsperelement
-Zagaceta
-zincblende
-Zstandard
-Zstd
-zstd
-checksum

From 810717bfe53597dc0e5eac1986c7f8f14b6f9588 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Sun, 13 Feb 2022 15:49:50 -0500
Subject: [PATCH 04/11] discuss stdio vs iostreams and fmtlib

---
 doc/src/Developer_cxx_vs_c_style.rst        | 60 ++++++++++++++++++---
 doc/utils/sphinx-config/false_positives.txt |  2 +
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
index 0b0526e7f9..e4f04335a4 100644
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -51,7 +51,6 @@ standard input or the provided input file) until the end.  And the third
 line deletes that instance again.  The remainder of the main.cpp file
 are for error handling, MPI configuration and other special features.
 
-
 In the constructor of the LAMMPS class instance the basic LAMMPS class hierarchy
 is created as shown in :ref:`class-topology`.  While processing the input further
 class instances are created, or deleted, or replaced and specific member functions
@@ -151,9 +150,10 @@ the same scope, that is ``Base::call()`` will always call
 virtual member function will be dispatched to ``Derived::poly()``
 instead.  This mechanism allows to always call functions within the
 scope of the class type that was used to create the class instance, even
-if they are assigned to a pointer using the type of a base class.
-Thanks to dynamic dispatch, LAMMPS can even use styles that are loaded
-at runtime from a shared object file with the :doc:`plugin command <plugin>`.
+if they are assigned to a pointer using the type of a base class. This
+is the desired behavior, and thanks to dynamic dispatch, LAMMPS can even
+use styles that are loaded at runtime from a shared object file with the
+:doc:`plugin command <plugin>`.
 
 A special case of virtual functions are so-called pure functions. These
 are virtual functions that are initialized to 0 in the class declaration
@@ -170,9 +170,10 @@ This has the effect that it will no longer be possible to create an instance
 of the base class and that derived classes **must** implement these classes.
 Many of the functions listed with the various styles in the section :doc:`Modify`
 are such pure functions. The motivation for this is to define the interface
-or API of functions.
+or API of functions, but defer their implementation to
+the derived classes.
 
-However, there are downsides to this. For example, calls virtual functions
+However, there are downsides to this. For example, calls to virtual functions
 from within a constructor, will not be in the scope of the derived class and thus
 it is good practice to either avoid calling them or to provide an explicit scope like
 in ``Base::poly()``.  Furthermore, any destructors in classes containing
@@ -185,7 +186,9 @@ in the expected order before types are removed from dynamic dispatch.
    behavior already at compile time, it is crucial that all member functions
    that are intended to replace a virtual or pure function use the ``override``
    property keyword.  For the same reason it should be avoided to use overloads
-   or default arguments for virtual functions.
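+   or default arguments for virtual functions, as they lead to confusion over
+   which function is supposed to override which and which default values
+   are used: default arguments are taken from the static type of the pointer
+   or reference, not from the class of the actual object.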
 
 Style Factories
 ===============
@@ -198,10 +201,51 @@ To associate the factory function with the style keyword, an ``std::map``
 class is used in which function pointers are indexed by their keyword
 (for example "lj/cut" for ``PairLJCut`` and "morse" ``PairMorse``).
 A couple of typedefs help to keep the code readable and a template function
-is used to implement the actual factory functions for the individual classes.   
+is used to implement the actual factory functions for the individual classes.
 
 I/O and output formatting
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
+C-style stdio versus C++ style iostreams
+========================================
+
+LAMMPS uses the "stdio" functions of the standard C library for
+reading from and writing to files and the console instead of the "iostreams"
+that were introduced with C++.  This is mainly motivated by the better performance,
+better control over formatting, and less effort to achieve specific formatting.
+
+Since mixing "stdio" and "iostreams" can lead to unexpected behavior, using
+the latter is strongly discouraged.  Also, output to the screen should not
+use the predefined ``stdout`` FILE pointer, but rather the ``screen`` and
+``logfile`` FILE pointers managed by the LAMMPS class.  Furthermore, output
+should only be done by MPI rank 0 (``comm->me == 0``), and output that is
+sent to both ``screen`` and ``logfile`` should use the
+:cpp:func:`utils::logmesg() convenience function <LAMMPS_NS::utils::logmesg>`.
+
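+A sketch of this output pattern is shown below.  The helper ``log_both()``
+and the function ``report()`` are hypothetical and only mimic the behavior
+described above; actual LAMMPS code should call ``utils::logmesg()``.  Note
+that the check for MPI rank 0 is done by the caller.
+
+.. code-block:: c++
+
+   #include <cstdio>
+   #include <string>
+
+   // hypothetical helper: write the same message to screen and logfile;
+   // either FILE pointer may be null (e.g. when that output is disabled)
+   void log_both(std::FILE *screen, std::FILE *logfile, const std::string &mesg) {
+     if (screen) std::fputs(mesg.c_str(), screen);
+     if (logfile) std::fputs(mesg.c_str(), logfile);
+   }
+
+   // typical call site: only MPI rank 0 produces output
+   void report(int me, std::FILE *screen, std::FILE *logfile, int natoms) {
+     if (me == 0)
+       log_both(screen, logfile, "Created " + std::to_string(natoms) + " atoms\n");
+   }
+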
+Formatting with the {fmt} library
+===================================
+
+The LAMMPS source code includes a copy of the `{fmt} library
+<https://fmt.dev>`_ which is preferred over formatting with the
+"printf()" family of functions.  The primary reason is that it allows a
+typesafe default format for any type of supported data.  This is
+particularly useful for formatting integers of a given size (32-bit or
+64-bit) which may require different format strings depending on compile
+time settings or compilers/operating systems.  Furthermore, {fmt} gives
+better performance, has more functionality, a familiar formatting syntax
+that has similarities to ``format()`` in Python, and provides a facility
+that can be used to integrate format strings and a variable number of
+arguments into custom functions in a much simpler way than the varargs
+mechanism of the C library.  Finally, {fmt} has been included into the
+C++20 language standard, so changes to adopt it are future proof.
+
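+To illustrate the problem that {fmt} avoids, the sketch below formats a
+64-bit integer portably with only the standard library: the conversion
+specifier must come from the ``PRId64`` macro in ``<cinttypes>``, since
+``int64_t`` may be ``long`` or ``long long`` depending on the platform.
+With {fmt}, ``fmt::format("{} atoms", natoms)`` produces the same text
+and picks a suitable default for any integer type.  The function name
+``format_natoms()`` is hypothetical.
+
+.. code-block:: c++
+
+   #include <cinttypes>
+   #include <cstdint>
+   #include <cstdio>
+   #include <string>
+
+   // printf-style: the format string depends on the width of the integer
+   // type, so the PRId64 macro has to be spliced in to remain portable
+   std::string format_natoms(std::int64_t natoms) {
+     char buf[64];
+     std::snprintf(buf, sizeof(buf), "%" PRId64 " atoms", natoms);
+     return std::string(buf);
+   }
+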
+Formatted strings are most commonly created by calling the
+``fmt::format()`` function which will return a string as ``std::string``
+class instance.  In contrast to the ``%`` placeholder in ``printf()``,
+the {fmt} library uses ``{}`` to embed format descriptors.  In the
+simplest case, no additional characters are needed as {fmt} will choose
+the default format based on the data type of the argument.
+
+
 Memory management
 ^^^^^^^^^^^^^^^^^
diff --git a/doc/utils/sphinx-config/false_positives.txt b/doc/utils/sphinx-config/false_positives.txt
index b133bbb1f6..944f8409da 100644
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@@ -3414,6 +3414,7 @@ typedefs
 typeI
 typeJ
 typeN
+typesafe
 Tz
 Tzou
 ub
@@ -3497,6 +3498,7 @@ Valuev
 Vanden
 Vandenbrande
 Vanduyfhuys
+varargs
 varavg
 Varshalovich
 Varshney

From 12f746046f67594391a846629f208691793bd4e1 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 08:45:55 -0500
Subject: [PATCH 05/11] finalize {fmt} lib info

---
 doc/src/Developer_cxx_vs_c_style.rst | 63 +++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
index e4f04335a4..a40d231318 100644
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -239,12 +239,71 @@ arguments into custom functions in a much simpler way that the varargs
 mechanism of the C library.  Finally, {fmt} has been included into the
 C++20 language standard, so changes to adopt it are future proof.
 
-Formatted strings are most commonly created by calling the
+Formatted strings are frequently created by calling the
 ``fmt::format()`` function which will return a string as ``std::string``
 class instance.  In contrast to the ``%`` placeholder in ``printf()``,
 the {fmt} library uses ``{}`` to embed format descriptors.  In the
 simplest case, no additional characters are needed as {fmt} will choose
-the default format based on the data type of the argument.
+the default format based on the data type of the argument.  Similarly,
+the ``fmt::print()`` function may be used instead of ``printf()`` or
+``fprintf()``.  In addition, several LAMMPS output functions that
+originally accepted only a single string as argument have been
+overloaded to also accept a format string with optional arguments (e.g.
+``Error::all()``, ``Error::one()``, ``utils::logmesg()``).
+
+Summary of the {fmt} format syntax
+==================================
+
+The syntax of a replacement field is "{[<argument id>][:<format spec>]}",
+where both the argument id and the format spec (separated by a colon
+':') are optional.  The argument id is usually a number starting from 0
+that is the index into the arguments following the format string.  By
+default the arguments are consumed in order (i.e. 0, 1, 2, 3, 4 etc.).
+The most common reason for using an argument id is to insert the same
+argument in multiple places of the format string without having to pass
+it multiple times.  In LAMMPS the argument id is rarely used.
+
+More common is the use of the format specifier, which starts with a
+colon.  This may optionally be followed by a fill character (default is
+' ').  If provided, the fill character **must** be followed by an
+alignment character ('<', '^', or '>' for left, centered, or right
+alignment; the default is right alignment for numbers and left alignment
+for most other types such as strings).  The alignment character may be
+used without a fill character.  The next important format parameter is
+the minimum width, which may be followed by a dot '.' and a precision
+for floating point numbers.  The final character in the format spec is
+an indicator for the "presentation", e.g. 'd' for decimal presentation
+of integers, 'x' for hexadecimal, 'o' for octal, 'c' for character,
+etc.  This mostly follows the "printf()" scheme but without requiring an
+additional length modifier to distinguish between different integer
+widths; the {fmt} library detects those and adapts the formatting
+accordingly.  For floating point numbers the corresponding choices are
+'g' for generic, 'e' for exponential, and 'f' for fixed point presentation.
+
+Thus "{:8}" would format an argument of *any* type using at least 8
+characters; "{:<8}" would do this left aligned, "{:^8}" centered, and
+"{:>8}" right aligned.  If a specific presentation is selected, the
+argument type must be compatible or else the {fmt} formatting code will
+throw an exception.  Some format string examples are given below:
+
+.. code-block:: C++
+
+   auto mesg = fmt::format("  CPU time: {:4d}:{:02d}:{:02d}\n", cpuh, cpum, cpus);
+   mesg = fmt::format("{:<8s}| {:<10.5g} | {:<10.5g} | {:<10.5g} |{:6.1f} |{:6.2f}\n",
+                      label, time_min, time, time_max, time_sq, tmp);
+   utils::logmesg(lmp,"{:>6} = max # of 1-2 neighbors\n",maxall);
+   utils::logmesg(lmp,"Lattice spacing in x,y,z = {:.8} {:.8} {:.8}\n",
+                  xlattice,ylattice,zlattice);
+
+A special feature of the {fmt} library is that format parameters like
+the width or the precision may be also provided as arguments. In that
+case a nested format is used where a pair of curly braces (with an
+optional argument id) "{}" is used in place of the value.  For example,
+"{:{}d}" will consume two integer arguments: the first is the value to be
+shown and the second the minimum width.
+
+For more details and examples, please consult the `{fmt} syntax
+documentation <https://fmt.dev/latest/syntax.html>`_ website.
 
 
 Memory management

From 1a6b627fa0034afb00fc71279dc88569b1dcba66 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 11:54:37 -0500
Subject: [PATCH 06/11] add section about memory allocations

---
 doc/src/Developer_cxx_vs_c_style.rst | 72 +++++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 6 deletions(-)

diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
index a40d231318..993a6aa6a5 100644
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -5,9 +5,11 @@ This section discusses some of the code design choices in LAMMPS and
 overall strategy in order to assist developers to write new code that
 will fit well with the remaining code.  Please see the section on
 :doc:`Requirements for contributed code <Modify_style>` for more
-specific recommendations and guidelines.  Here the focus is on overall
-strategy and discussion of some relevant C++ programming language
-constructs.
+specific recommendations and guidelines.  While that section is
+organized more in the form of a checklist for code contributors, the
+focus here is on overall code design strategy, choices made between
+possible alternatives, and a discussion of some relevant C++ programming
+language constructs.
 
 Historically, the basic design philosophy of the LAMMPS C++ code was
 that of a "C with classes" style.  The was motivated by the desire to
@@ -28,6 +30,17 @@ C++ standard library or as custom classes or function, in order to
 improve readability of the code and to increase code reuse through
 abstraction of commonly used functionality.
 
+.. note::
+
+   As of spring 2022 there is still a sizable amount of legacy code in
+   LAMMPS that has not yet been refactored to fully reflect these style
+   conventions.  LAMMPS has a large code base and many different
+   contributors, and there is a hierarchy of precedence in which the
+   code is adapted.  The code in the ``src`` folder has the highest
+   priority, followed by code in packages in order of their popularity
+   and complexity (simpler code is adapted sooner), followed by code in
+   the ``lib`` folder.  Source code that is downloaded during compilation
+   is not subject to the conventions discussed here.
 
 Object oriented code
 ^^^^^^^^^^^^^^^^^^^^
@@ -210,9 +223,9 @@ C-style stdio versus C++ style iostreams
 ========================================
 
 LAMMPS chooses to use the "stdio" library of the standard C library for
-reading from and writing to files and console instead of "iostreams" that were
-introduced with C++.  This is mainly motivated by the better performance,
-better control over formatting, and less effort to achieve specific formatting.
+reading from and writing to files and the console instead of C++
+"iostreams".  This is mainly motivated by the better performance, better
+control over formatting, and less effort to achieve specific formatting.
 
 Since mixing "stdio" and "iostreams" can lead to unexpected behavior using
 the latter is strongly discouraged.  Also output to the screen should not
@@ -222,6 +235,11 @@ should only be done by MPI rank 0 (``comm->me == 0``) and output that is
 send to both ``screen`` and ``logfile`` should use the
 :cpp:func:`utils::logmesg() convenience function <LAMMPS_NS::utils::logmesg>`.
 
+We also discourage the use of stringstreams as the bundled {fmt} library
+and the customized tokenizer classes can provide the same functionality
+in a cleaner way with better performance. This will also help to retain
+a consistent programming style despite the many different contributors.
+
 Formatting with the {fmt} library
 ===================================
 
@@ -295,6 +313,15 @@ throw an exception. Some format string examples are given below:
    utils::logmesg(lmp,"Lattice spacing in x,y,z = {:.8} {:.8} {:.8}\n",
                   xlattice,ylattice,zlattice);
 
+which will create the following output lines:
+
+.. parsed-literal::
+
+     CPU time:    0:02:16
+     Pair    | 2.0133     | 2.0133     | 2.0133     |   0.0 | 84.21
+          4 = max # of 1-2 neighbors
+     Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
+
 A special feature of the {fmt} library is that format parameters like
 the width or the precision may be also provided as arguments. In that
 case a nested format is used where a pair of curly braces (with an
@@ -308,3 +335,36 @@ documentation <https://fmt.dev/latest/syntax.html>`_ website.
 
 Memory management
 ^^^^^^^^^^^^^^^^^
+
+Dynamic allocation of data and objects should be done either with the
+C++ operators "new" and "delete/delete[]" or with member functions of
+the ``Memory`` class, most commonly ``Memory::create()``,
+``Memory::grow()``, and ``Memory::destroy()``.  Calling ``malloc()``,
+``calloc()``, ``realloc()``, or ``free()`` directly is strongly
+discouraged.  To simplify adapting legacy code into the LAMMPS code base
+the member functions ``Memory::smalloc()``, ``Memory::srealloc()``, and
+``Memory::sfree()`` are available as replacements.
+
+Using those custom memory allocation functions is motivated by the
+following considerations:
+
+- memory allocation failures on *any* MPI rank during a parallel run will trigger
+  an immediate abort of the entire parallel calculation instead of stalling it
+- a failing "new" will trigger an exception which is also captured by LAMMPS and
+  triggers a global abort
+- allocation of multi-dimensional arrays will be done in a C compatible fashion
+  but so that the storage of the actual data is stored in one large consecutive block
+  and thus when MPI communication is needed, only this storage needs to be
+  communicated (similar to Fortran arrays)
+- the "destroy()" and "sfree()" functions may safely be called on NULL pointers
+- the "destroy()" functions will nullify the pointer variables making
+  "use after free" errors easy to detect
+- it is possible to use a large than default memory alignment (not on all operating
+  systems, since the allocated storage pointers must be compatible with ``free()``
+  for technical reasons)
+
+In the practical implementation of code this means that any pointer variables
+that are class members should be initialized to a ``nullptr`` value in their
+respective constructors.  That way it is safe to call ``Memory::destroy()``
+or ``delete[]`` on them at any time, even before *any* allocation has been
+made outside the constructor.  This helps to prevent memory leaks.
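The memory layout and the ``nullptr`` convention described in the memory management section above can be illustrated with a small standalone sketch. The helpers ``create_2d()`` and ``destroy_2d()`` and the class ``PairSketch`` are hypothetical names made up for this example. They are *not* the LAMMPS ``Memory`` API, whose functions differ in signature and in error handling (abort on failure). The sketch uses plain ``new[]``/``delete[]`` and shows only the layout. One block holds all elements and a second array holds the row pointers, so ``&array[0][0]`` covers all of the data, for example for a single MPI call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the LAMMPS Memory class): allocate a C-compatible
// nrows x ncols array whose elements live in one contiguous block.
template <typename T>
T **create_2d(T **&array, int nrows, int ncols)
{
  assert(nrows > 0 && ncols > 0);
  T *data = new T[static_cast<std::size_t>(nrows) * ncols];  // all elements
  array = new T *[nrows];                                     // row pointers
  for (int i = 0; i < nrows; ++i) array[i] = data + static_cast<std::size_t>(i) * ncols;
  return array;
}

// Hypothetical counterpart: frees both blocks, is a no-op for a null
// pointer, and resets the caller's pointer to make "use after free" visible.
template <typename T>
void destroy_2d(T **&array)
{
  if (array == nullptr) return;
  delete[] array[0];  // the contiguous data block starts at row 0
  delete[] array;     // the array of row pointers
  array = nullptr;
}

// Hypothetical class following the "initialize pointers to nullptr" rule.
class PairSketch {
 public:
  double **cutsq;

  PairSketch() : cutsq(nullptr) {}
  ~PairSketch() { destroy_2d(cutsq); }  // safe even if allocate() never ran

  // copying would lead to a double free, so it is disabled
  PairSketch(const PairSketch &) = delete;
  PairSketch &operator=(const PairSketch &) = delete;

  void allocate(int ntypes)
  {
    destroy_2d(cutsq);  // safe on the first call, frees old data otherwise
    create_2d(cutsq, ntypes + 1, ntypes + 1);
  }
};
```

``destroy_2d()`` accepts a null pointer and resets its argument. As a result, neither the destructor nor ``allocate()`` needs extra bookkeeping, which is what initializing the member to ``nullptr`` in the constructor buys.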

From fbf95c2cbc530b72df6c43aadf7afeb483335cd9 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 11:54:50 -0500
Subject: [PATCH 07/11] add notes about file parsing

---
 doc/src/Developer_notes.rst | 51 +++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/doc/src/Developer_notes.rst b/doc/src/Developer_notes.rst
index ab2e3826f2..23344de61b 100644
--- a/doc/src/Developer_notes.rst
+++ b/doc/src/Developer_notes.rst
@@ -7,6 +7,57 @@ typically document what a variable stores, what a small section of
 code does, or what a function does and its input/outputs.  The topics
 on this page are intended to document code functionality at a higher level.
 
+Reading and parsing of text and text files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It is frequently required for a class in LAMMPS to read in additional
+data from a file, most commonly potential parameters from a potential
+file for manybody potentials.  LAMMPS provides several custom classes
+and convenience functions to simplify the process.  This offers the
+following benefits:
+
+- better code reuse and fewer lines of code needed to implement reading
+  and parsing data from a file
+- better detection of format errors, incompatible data, and better error messages
+- exit with an error message instead of silently converting only part of the
+  text to a number or returning a 0 on unrecognized text and thus reading incorrect values
+- re-entrant code by avoiding global static variables (as used by ``strtok()``)
+- transparent support for translating unsupported UTF-8 characters to their ASCII equivalents
+  (the text to value conversion functions **only** accept ASCII characters)
+
+In most cases (e.g. potential files) the same data is needed on all MPI
+ranks.  Then it is best to do the reading and parsing only on MPI rank
+0, and communicate the data later with one or more ``MPI_Bcast()``
+calls.  For reading generic text and potential parameter files the
+custom classes :cpp:class:`TextFileReader <LAMMPS_NS::TextFileReader>`
+and :cpp:class:`PotentialFileReader <LAMMPS_NS::PotentialFileReader>`
+are available.  Those classes allow reading the file as individual lines
+for which they can return a tokenizer class (see below) for parsing the
+line, or they can return blocks of numbers as a vector directly.  The
+documentation on :ref:`File reader classes <file-reader-classes>`
+contains an example for a typical case.
+
+When reading per-atom data, the data in the file usually needs to include
+an atom ID so it can be associated with a particular atom.  In that case
+the data can be read in multi-line chunks and broadcast to all MPI ranks
+with :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>`.  Those chunks are then
+split into lines, parsed, and applied only to atoms the MPI rank
+"owns".
+
+For splitting a string (incrementally) into words and optionally
+converting those to numbers, the :cpp:class:`Tokenizer
+<LAMMPS_NS::Tokenizer>` and :cpp:class:`ValueTokenizer
+<LAMMPS_NS::ValueTokenizer>` can be used.  Those provide a superset
+of the functionality of ``strtok()`` from the C-library and the latter
+also includes conversion to different types.  Any errors while processing
+the string in those classes will result in an exception, which can
+be caught and the error processed as needed.  Unlike C-library functions
+like ``atoi()``, ``atof()``, ``strtol()``, or ``strtod()`` the
+conversion to numbers first checks of the string is a valid number
+and thus will not silently return an unexpected or incorrect value.
+
+
 Fix contributions to instantaneous energy, virial, and cumulative energy
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
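The re-entrancy benefit listed above for the tokenizer classes can be illustrated with a minimal standalone sketch. ``SimpleTokenizer`` is a hypothetical class written for this example, not the LAMMPS ``Tokenizer`` API. By default it treats whitespace as separators and throws ``std::out_of_range`` when no words are left. The scan position lives in the object rather than in a hidden static variable as with ``strtok()``, so two instances can be used in an interleaved fashion.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical minimal tokenizer (not the LAMMPS Tokenizer class). Unlike
// strtok(), it keeps its scan position in the object instead of a hidden
// static variable, so several instances can be used at the same time.
class SimpleTokenizer {
 public:
  explicit SimpleTokenizer(const std::string &text,
                           const std::string &separators = " \t\r\n") :
      text_(text), seps_(separators), pos_(text_.find_first_not_of(seps_))
  {
  }

  // true if at least one more word is available
  bool has_next() const { return pos_ != std::string::npos; }

  // return the next word; throw if the text is exhausted
  std::string next()
  {
    if (!has_next()) throw std::out_of_range("no more words");
    const std::string::size_type end = text_.find_first_of(seps_, pos_);
    const std::string word =
        (end == std::string::npos) ? text_.substr(pos_) : text_.substr(pos_, end - pos_);
    pos_ = (end == std::string::npos) ? std::string::npos : text_.find_first_not_of(seps_, end);
    return word;
  }

 private:
  std::string text_;            // declared before pos_, so initialized first
  std::string seps_;
  std::string::size_type pos_;  // start of the next word, or npos
};
```

Consider reading words alternately from ``SimpleTokenizer outer("Si Si Si 2.1 1.0")`` and ``SimpleTokenizer inner("a  b")``. This returns "Si", "a", "Si", "b", whereas two interleaved ``strtok()`` loops would overwrite each other's position.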

From 1a436c5aa9fd550d1c8bef281e6accc8efab12f8 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 11:55:04 -0500
Subject: [PATCH 08/11] fix some broken links

---
 doc/src/Developer_utils.rst | 63 ++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/doc/src/Developer_utils.rst b/doc/src/Developer_utils.rst
index 7172f81eb7..39ac9c716b 100644
--- a/doc/src/Developer_utils.rst
+++ b/doc/src/Developer_utils.rst
@@ -21,18 +21,21 @@ In that case, the functions will stop with an error message, indicating
 the name of the problematic file, if possible unless the *error* argument
 is a NULL pointer.
 
-The :cpp:func:`fgets_trunc` function will work similar for ``fgets()``
-but it will read in a whole line (i.e. until the end of line or end
-of file), but store only as many characters as will fit into the buffer
-including a final newline character and the terminating NULL byte.
-If the line in the file is longer it will thus be truncated in the buffer.
-This function is used by :cpp:func:`read_lines_from_file` to read individual
-lines but make certain they follow the size constraints.
+The :cpp:func:`utils::fgets_trunc() <LAMMPS_NS::utils::fgets_trunc>`
+function works similarly to ``fgets()``, but it will read in a whole
+line (i.e. until the end of line or end of file) and store only as many
+characters as will fit into the buffer, including a final newline
+character and the terminating NULL byte.  If the line in the file is
+longer, it will thus be truncated in the buffer.  This function is used
+by :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>` to read individual lines
+while making certain they respect the size constraints.
 
-The :cpp:func:`read_lines_from_file` function will read the requested
-number of lines of a maximum length into a buffer and will return 0
-if successful or 1 if not. It also guarantees that all lines are
-terminated with a newline character and the entire buffer with a
+The :cpp:func:`utils::read_lines_from_file()
+<LAMMPS_NS::utils::read_lines_from_file>` function will read the
+requested number of lines of a maximum length into a buffer and will
+return 0 if successful or 1 if not. It also guarantees that all lines
+are terminated with a newline character and the entire buffer with a
 NULL character.
 
 ----------
@@ -62,7 +65,7 @@ silently returning the result of a partial conversion or zero in cases
 where the string is not a valid number.  This behavior improves
 detecting typos or issues when processing input files.
 
-Similarly the :cpp:func:`logical() <LAMMPS_NS::utils::logical>` function
+Similarly, the :cpp:func:`utils::logical() <LAMMPS_NS::utils::logical>` function
 will convert a string into a boolean and will only accept certain words.
 
 The *do_abort* flag should be set to ``true`` in case  this function
@@ -70,8 +73,8 @@ is called only on a single MPI rank, as that will then trigger the
 a call to ``Error::one()`` for errors instead of ``Error::all()``
 and avoids a "hanging" calculation when run in parallel.
 
-Please also see :cpp:func:`is_integer() <LAMMPS_NS::utils::is_integer>`
-and :cpp:func:`is_double() <LAMMPS_NS::utils::is_double>` for testing
+Please also see :cpp:func:`utils::is_integer() <LAMMPS_NS::utils::is_integer>`
+and :cpp:func:`utils::is_double() <LAMMPS_NS::utils::is_double>` for testing
 strings for compliance without conversion.
 
 ----------
@@ -393,21 +396,26 @@ A typical code segment would look like this:
 
 ----------
 
+.. _file-reader-classes:
+
 File reader classes
 -------------------
 
 The purpose of the file reader classes is to simplify the recurring task
 of reading and parsing files. They can use the
-:cpp:class:`LAMMPS_NS::ValueTokenizer` class to process the read in
-text.  The :cpp:class:`LAMMPS_NS::TextFileReader` is a more general
-version while :cpp:class:`LAMMPS_NS::PotentialFileReader` is specialized
-to implement the behavior expected for looking up and reading/parsing
-files with potential parameters in LAMMPS.  The potential file reader
-class requires a LAMMPS instance, requires to be run on MPI rank 0 only,
-will use the :cpp:func:`LAMMPS_NS::utils::get_potential_file_path`
-function to look up and open the file, and will call the
-:cpp:class:`LAMMPS_NS::Error` class in case of failures to read or to
-convert numbers, so that LAMMPS will be aborted.
+:cpp:class:`ValueTokenizer <LAMMPS_NS::ValueTokenizer>` class to process
+the text they read.  The :cpp:class:`TextFileReader
+<LAMMPS_NS::TextFileReader>` is a more general version while
+:cpp:class:`PotentialFileReader <LAMMPS_NS::PotentialFileReader>` is
+specialized to implement the behavior expected for looking up and
+reading/parsing files with potential parameters in LAMMPS.  The
+potential file reader class requires a LAMMPS instance, must be used
+on MPI rank 0 only, will use the
+:cpp:func:`utils::get_potential_file_path()
+<LAMMPS_NS::utils::get_potential_file_path>` function to look up and
+open the file, and will call the :cpp:class:`LAMMPS_NS::Error` class in
+case of failures to read or to convert numbers, so that LAMMPS will be
+aborted.
 
 .. code-block:: C++
    :caption: Use of PotentialFileReader class in pair style coul/streitz
@@ -482,10 +490,10 @@ provided, as that is used to determine whether a new page of memory
 must be used.
 
 The :cpp:class:`MyPage <LAMMPS_NS::MyPage>` class offers two ways to
-reserve a chunk: 1) with :cpp:func:`get() <LAMMPS_NS::MyPage::get>` the
-chunk size needs to be known in advance, 2) with :cpp:func:`vget()
+reserve a chunk: 1) with :cpp:func:`MyPage::get() <LAMMPS_NS::MyPage::get>` the
+chunk size needs to be known in advance, 2) with :cpp:func:`MyPage::vget()
 <LAMMPS_NS::MyPage::vget>` a pointer to the next chunk is returned, but
-its size is registered later with :cpp:func:`vgot()
+its size is registered later with :cpp:func:`MyPage::vgot()
 <LAMMPS_NS::MyPage::vgot>`.
 
 .. code-block:: C++
@@ -588,4 +596,3 @@ the communication buffers.
 
 .. doxygenunion:: LAMMPS_NS::ubuf
    :project: progguide
-
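The truncation behavior described above for ``utils::fgets_trunc()`` can be sketched in standalone C++. ``fgets_trunc_sketch()`` is a hypothetical function written for this illustration using ``fgetc()``. It is not the LAMMPS implementation and only assumes the semantics stated in the text: the whole line is consumed, at most ``size-2`` characters are kept, and a newline and the NULL byte are always appended (also for a last line lacking a newline).

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical sketch (not the LAMMPS implementation) of the behavior
// described for utils::fgets_trunc(): consume one whole line from fp, but
// store at most size-2 of its characters, followed by '\n' and '\0'.
// Returns nullptr if the end of file is reached before any character.
char *fgets_trunc_sketch(char *buf, int size, std::FILE *fp)
{
  assert(size >= 3);  // room for one character, '\n', and '\0'
  int c = 0;
  int n = 0;
  while ((c = std::fgetc(fp)) != EOF && c != '\n') {
    // characters that do not fit are still read, but discarded
    if (n < size - 2) buf[n++] = static_cast<char>(c);
  }
  if ((c == EOF) && (n == 0)) return nullptr;
  buf[n++] = '\n';  // every stored line ends with a newline ...
  buf[n] = '\0';    // ... and the terminating NULL byte
  return buf;
}
```

With an 8-byte buffer, the line ``this line is too long`` would be stored as ``"this l\n"``. The rest of the line is skipped, so the next call starts at the following line.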

From 37cd4ba2ea14e194ea74fe2706f176abe0ce2855 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 11:55:09 -0500
Subject: [PATCH 09/11] spelling

---
 doc/utils/sphinx-config/false_positives.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/utils/sphinx-config/false_positives.txt b/doc/utils/sphinx-config/false_positives.txt
index 944f8409da..dfff998cf5 100644
--- a/doc/utils/sphinx-config/false_positives.txt
+++ b/doc/utils/sphinx-config/false_positives.txt
@@ -3015,6 +3015,7 @@ Setmask
 setpoint
 setvel
 sfftw
+sfree
 Sg
 Shan
 Shanno
@@ -3174,6 +3175,7 @@ Streiz
 strerror
 strided
 strietz
+stringstreams
 strmatch
 strncmp
 strstr

From f84790ba623be9256754ad85342826ba8736aca1 Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 15:50:36 -0500
Subject: [PATCH 10/11] add a more specific example to explain how values are
 rejected and how atoi() fails

---
 doc/src/Developer_notes.rst | 22 +++++++++++++---------
 doc/src/Developer_utils.rst |  4 ++--
 src/tokenizer.h             | 10 ++++++++++
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/doc/src/Developer_notes.rst b/doc/src/Developer_notes.rst
index 23344de61b..a15354bb9a 100644
--- a/doc/src/Developer_notes.rst
+++ b/doc/src/Developer_notes.rst
@@ -48,15 +48,19 @@ split into lines, parsed, and applied only to atoms the MPI rank
 For splitting a string (incrementally) into words and optionally
 converting those to numbers, the :cpp:class:`Tokenizer
 <LAMMPS_NS::Tokenizer>` and :cpp:class:`ValueTokenizer
-<LAMMPS_NS::ValueTokenizer>` can be used.  Those provide a superset
-of the functionality of ``strtok()`` from the C-library and the latter
-also includes conversion to different types.  Any errors while processing
-the string in those classes will result in an exception, which can
-be caught and the error processed as needed.  Unlike C-library functions
-like ``atoi()``, ``atof()``, ``strtol()``, or ``strtod()`` the
-conversion to numbers first checks of the string is a valid number
-and thus will not silently return an unexpected or incorrect value.
-
+<LAMMPS_NS::ValueTokenizer>` can be used.  Those provide a superset of
+the functionality of ``strtok()`` from the C library, and the latter
+also includes conversion to different types.  Any errors while
+processing the string in those classes will result in an exception,
+which can be caught and the error processed as needed.  Unlike the C
+library functions ``atoi()``, ``atof()``, ``strtol()``, or ``strtod()``,
+the conversion will check if the converted text is a valid integer or
+floating point number and will not silently return an unexpected or
+incorrect value.  For example, ``atoi()`` will return 12 when converting
+"12.5" while the ValueTokenizer class will throw an
+:cpp:class:`InvalidIntegerException <LAMMPS_NS::InvalidIntegerException>`
+if :cpp:func:`ValueTokenizer::next_int()
+<LAMMPS_NS::ValueTokenizer::next_int>` is called on the same string.
 
 Fix contributions to instantaneous energy, virial, and cumulative energy
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
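The contrast drawn above between ``atoi()`` and ``ValueTokenizer::next_int()`` can be reproduced with a short standalone sketch. ``strict_int()`` is a hypothetical helper, not the ValueTokenizer implementation. It uses ``strtol()`` with its end pointer to require that the whole string is consumed. Where ValueTokenizer would throw ``InvalidIntegerException``, it throws ``std::invalid_argument``.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical strict conversion (not the LAMMPS ValueTokenizer API): the
// whole string must be a valid base-10 integer, otherwise an exception is thrown.
int strict_int(const std::string &text)
{
  const char *start = text.c_str();
  char *end = nullptr;
  errno = 0;
  const long value = std::strtol(start, &end, 10);
  // reject empty input, leftover characters (e.g. ".5" in "12.5"), and overflow
  if ((end == start) || (*end != '\0') || (errno == ERANGE) || (value < INT_MIN) ||
      (value > INT_MAX))
    throw std::invalid_argument("not a valid integer: '" + text + "'");
  return static_cast<int>(value);
}
```

Here ``std::atoi("12.5")`` returns 12, while ``strict_int("12.5")`` throws because the characters ``.5`` remain unconverted. Note that ``strtol()`` skips leading whitespace, so this sketch accepts " 12" but rejects "12 ".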
diff --git a/doc/src/Developer_utils.rst b/doc/src/Developer_utils.rst
index 39ac9c716b..a9df85c899 100644
--- a/doc/src/Developer_utils.rst
+++ b/doc/src/Developer_utils.rst
@@ -343,11 +343,11 @@ This code example should produce the following output:
 
 .. doxygenclass:: LAMMPS_NS::InvalidIntegerException
    :project: progguide
-   :members: what
+   :members:
 
 .. doxygenclass:: LAMMPS_NS::InvalidFloatException
    :project: progguide
-   :members: what
+   :members:
 
 ----------
 
diff --git a/src/tokenizer.h b/src/tokenizer.h
index 03afa59836..b267e89b23 100644
--- a/src/tokenizer.h
+++ b/src/tokenizer.h
@@ -52,10 +52,15 @@ class Tokenizer {
   std::vector<std::string> as_vector();
 };
 
+/** General Tokenizer exception class */
+
 class TokenizerException : public std::exception {
   std::string message;
 
  public:
+  // remove unused default constructor
+  TokenizerException() = delete;
+
   /** Thrown during retrieving or skipping tokens
    *
    * \param  msg    String with error message
@@ -67,7 +72,10 @@ class TokenizerException : public std::exception {
   const char *what() const noexcept override { return message.c_str(); }
 };
 
+/** Exception thrown by ValueTokenizer when trying to convert an invalid integer string */
+
 class InvalidIntegerException : public TokenizerException {
+
  public:
   /** Thrown during converting string to integer number
    *
@@ -78,6 +86,8 @@ class InvalidIntegerException : public TokenizerException {
   }
 };
 
+/** Exception thrown by ValueTokenizer when trying to convert an invalid floating point string */
+
 class InvalidFloatException : public TokenizerException {
  public:
   /** Thrown during converting string to floating point number

From baf443766a0af0ae6dbcba5b4f2500e0d8c4db2c Mon Sep 17 00:00:00 2001
From: Axel Kohlmeyer <akohlmey@gmail.com>
Date: Mon, 14 Feb 2022 16:09:52 -0500
Subject: [PATCH 11/11] fix a few typos or mistyped words and explain some
 details better

---
 doc/src/Developer_cxx_vs_c_style.rst | 78 ++++++++++++++++------------
 1 file changed, 46 insertions(+), 32 deletions(-)

diff --git a/doc/src/Developer_cxx_vs_c_style.rst b/doc/src/Developer_cxx_vs_c_style.rst
index 993a6aa6a5..438e57abc7 100644
--- a/doc/src/Developer_cxx_vs_c_style.rst
+++ b/doc/src/Developer_cxx_vs_c_style.rst
@@ -77,22 +77,24 @@ LAMMPS makes extensive use of the object oriented programming (OOP)
 principles of *compositing* and *inheritance*. Classes like the
 ``LAMMPS`` class are a **composite** containing pointers to instances of
 other classes like ``Atom``, ``Comm``, ``Force``, ``Neighbor``,
-``Modify``, and so on. Each of these classes implement certain
+``Modify``, and so on.  Each of these classes implements certain
 functionality by storing and manipulating data related to the simulation
 and providing member functions that trigger certain actions.  Some of
-those classes like ``Force`` and a composite again containing instances
+those classes, like ``Force``, are themselves composites containing instances
 of classes describing the force interactions or ``Modify`` containing
-and calling fixes and computes. In most cases there is only one instance
-of those member classes allowed, but in a few cases there can also be
-multiple instances and the parent class is maintaining a list of the
-pointers of instantiated classes.
+and calling fixes and computes.  In most cases (e.g. ``AtomVec``, ``Comm``,
+``Pair``, or ``Bond``) only one instance of those member classes is
+allowed, but in a few cases (e.g. ``Region``, ``Fix``, ``Compute``, or
+``Dump``) there can be multiple instances, and the parent class then
+maintains a list of pointers to the instantiated classes instead of a
+single pointer.
 
-Changing behavior or adjusting how LAMMPS handles the simulation is
+Changing behavior or adjusting how LAMMPS handles a simulation is
 implemented via **inheritance** where different variants of the
 functionality are realized by creating *derived* classes that can share
 common functionality in their base class and provide a consistent
 interface where the derived classes replace (dummy or pure) functions in
-the base class. The higher level classes can then call those methods of
+the base class.  The higher level classes can then call those methods of
 the instantiated classes without having to know which specific derived
 class variant was instantiated.  In the LAMMPS documentation those
 derived classes are usually referred to a "styles", e.g.  pair styles,
@@ -108,6 +110,15 @@ classes.  Whenever a new :doc:`pair_style` or :doc:`bond_style` or
 any existing class instance is deleted and a new instance created in
 it place.
 
+Classes derived from ``Fix`` or ``Compute`` represent a different facet
+of LAMMPS' flexibility, as there can be multiple instances of them and
+their member functions will be called at different phases of the time
+integration process (as explained in :doc:`Developer_flow`).  This way
+multiple manipulations of the entire system or parts of it can be
+programmed (with fix styles), or different computations can be performed
+and their results accessed, further processed, or output through a
+common interface (with compute styles).
+
 Further code sharing is possible by creating derived classes from the
 derived classes (for instance to implement an accelerated version of a
 pair style) where then only a subset of the methods are replaced with
@@ -179,19 +190,20 @@ are virtual functions that are initialized to 0 in the class declaration
     virtual void pure() = 0;
    };
 
-This has the effect that it will no longer be possible to create an instance
-of the base class and that derived classes **must** implement these classes.
-Many of the functions listed with the various styles in the section :doc:`Modify`
-are such pure functions. The motivation for this is to define the interface
-or API of functions but defer the implementation of those functionality to
-the derived classes.
+This has the effect that it will no longer be possible to create an
+instance of the base class and that derived classes **must** implement
+these functions.  Many of the functions listed with the various class
+styles in the section :doc:`Modify` are such pure functions.  The
+motivation for this is to define the interface or API of the functions
+but defer the implementation to the derived classes.
 
-However, there are downsides to this. For example, calls to virtual functions
-from within a constructor, will not be in the scope of the derived class and thus
-it is good practice to either avoid calling them or to provide an explicit scope like
-in ``Base::poly()``.  Furthermore, any destructors in classes containing
-virtual functions should be declared virtual, too, so they are processed
-in the expected order before types are removed from dynamic dispatch.
+However, there are downsides to this.  For example, calls to virtual
+functions from within a constructor will not be dispatched to the
+derived class, since its part of the object is not yet constructed, so
+it is good practice to either avoid them or to use an explicit scope
+like in ``Base::poly()``.  Also, destructors in classes with virtual
+functions should be declared virtual, too, so that deleting a derived
+object through a base class pointer also runs the derived class destructor.
 
 .. admonition:: Important Notes
 
@@ -348,20 +360,22 @@ the member functions ``Memory::smalloc()``, ``Memory::srealloc()``, and
 Using those custom memory allocation functions is motivated by the
 following considerations:
 
-- memory allocation failures on *any* MPI rank during a parallel run will trigger
-  an immediate abort of the entire parallel calculation instead of stalling it
-- a failing "new" will trigger an exception which is also captured by LAMMPS and
-  triggers a global abort
-- allocation of multi-dimensional arrays will be done in a C compatible fashion
-  but so that the storage of the actual data is stored in one large consecutive block
-  and thus when MPI communication is needed, only this storage needs to be
-  communicated (similar to Fortran arrays)
-- the "destroy()" and "sfree()" functions may safely be called on NULL pointers
+- memory allocation failures on *any* MPI rank during a parallel run
+  will trigger an immediate abort of the entire parallel calculation
+  instead of stalling it
+- a failing "new" will trigger an exception which is also captured by
+  LAMMPS and triggers a global abort
+- allocation of multi-dimensional arrays will be done in a C-compatible
+  fashion, but with all of the actual data stored in one large
+  contiguous block, so that when MPI communication is needed, only this
+  block needs to be communicated (similar to Fortran arrays)
+- the "destroy()" and "sfree()" functions may safely be called on NULL
+  pointers
 - the "destroy()" functions will nullify the pointer variables making
   "use after free" errors easy to detect
-- it is possible to use a large than default memory alignment (not on all operating
-  systems, since the allocated storage pointers must be compatible with ``free()``
-  for technical reasons)
+- it is possible to use a larger than default memory alignment (not on
+  all operating systems, since the allocated storage pointers must be
+  compatible with ``free()`` for technical reasons)
 
 In the practical implementation of code this means that any pointer variables
 that are class members should be initialized to a ``nullptr`` value in their
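The two virtual function pitfalls described in this patch can be demonstrated with a standalone sketch: calls from within a constructor, and destructors that are not virtual. The classes reuse the ``Base``/``poly()`` names from the example earlier in this section. Their bodies and the ``call_log`` string are hypothetical additions, made only to record the call order.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Records the order in which functions are called (for illustration only).
std::string call_log;

// Hypothetical classes reusing the Base/poly() names from the text.
class Base {
 public:
  Base() { call_log += poly(); }  // resolves to Base::poly() during construction
  virtual ~Base() { call_log += "~Base;"; }
  virtual std::string poly() const { return "Base::poly;"; }
};

class Derived : public Base {
 public:
  Derived() { call_log += poly(); }  // now resolves to Derived::poly()
  ~Derived() override { call_log += "~Derived;"; }
  std::string poly() const override { return "Derived::poly;"; }
};
```

Create a ``Derived`` through ``std::unique_ptr<Base>`` and let it go out of scope. This records ``Base::poly;Derived::poly;~Derived;~Base;``, because inside ``Base()`` the call to ``poly()`` resolves to ``Base::poly()``. Without ``virtual`` on ``~Base()``, deleting through the base pointer would be undefined behavior. With a pure virtual ``poly()``, the call inside the constructor would be undefined as well.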