revise based on suggestions from languagetool.org
@@ -14,8 +14,8 @@ Owned and ghost atoms
 As described on the :doc:`parallel partitioning algorithms
 <Developer_par_part>` page, LAMMPS spatially decomposes the simulation
 domain, either in a *brick* or *tiled* manner. Each processor (MPI
-task) owns atoms within its sub-domain and additionally stores ghost
-atoms within a cutoff distance of its sub-domain.
+task) owns atoms within its subdomain and additionally stores ghost
+atoms within a cutoff distance of its subdomain.

 Forward and reverse communication
 =================================

@@ -139,7 +139,7 @@ Periodic boundary conditions are then applied by the Domain class via
 its ``pbc()`` method to remap particles that have moved outside the
 simulation box back into the box. Note that this is not done every
 timestep, but only when neighbor lists are rebuilt. This is so that
-each processor's sub-domain will have consistent (nearby) atom
+each processor's subdomain will have consistent (nearby) atom
 coordinates for its owned and ghost atoms. It is also why dumped atom
 coordinates may be slightly outside the simulation box if not dumped
 on a step where the neighbor lists are rebuilt.
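To illustrate the remapping idea in the hunk above: a minimal sketch of wrapping one coordinate back into a periodic box. This is an illustration only, not the actual ``Domain::pbc()`` code; the function name and signature are invented for this example.

.. code-block:: c++

   #include <cmath>

   // Remap a coordinate into the periodic box [lo, lo + prd) along one
   // dimension. LAMMPS only needs to do this when neighbor lists are
   // rebuilt, so owned and ghost coordinates stay consistent in between.
   double remap_periodic(double x, double lo, double prd)
   {
     double rel = x - lo;
     rel -= prd * std::floor(rel / prd);   // now 0.0 <= rel < prd
     return lo + rel;
   }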
@@ -153,10 +153,10 @@ method of the Comm class and ``setup_bins()`` method of the Neighbor
 class perform the update.

 The code is now ready to migrate atoms that have left a processor's
-geometric sub-domain to new processors. The ``exchange()`` method of
+geometric subdomain to new processors. The ``exchange()`` method of
 the Comm class performs this operation. The ``borders()`` method of the
 Comm class then identifies ghost atoms surrounding each processor's
-sub-domain and communicates ghost atom information to neighboring
+subdomain and communicates ghost atom information to neighboring
 processors. It does this by looping over all the atoms owned by a
 processor to make lists of those to send to each neighbor processor. On
 subsequent timesteps, the lists are used by the ``Comm::forward_comm()``

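A minimal sketch of the per-atom test that the exchange step conceptually performs along one dimension. This is an illustration only; the names ``sublo``/``subhi`` and the helper are invented here and this is not the LAMMPS ``Comm::exchange()`` source.

.. code-block:: c++

   // Decide whether an owned atom has left the local subdomain along one
   // dimension: -1 = send to the lower neighbor processor, +1 = send to
   // the upper neighbor, 0 = the atom stays on this processor.
   int migrate_direction(double coord, double sublo, double subhi)
   {
     if (coord < sublo) return -1;
     if (coord >= subhi) return 1;
     return 0;
   }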
@@ -28,9 +28,9 @@ grid.

 More specifically, a grid point is defined for each cell (by default
 the center point), and a processor owns a grid cell if its point is
-within the processor's spatial sub-domain. The union of processor
-sub-domains is the global simulation box. If a grid point is on the
-boundary of two sub-domains, the lower processor owns the grid cell. A
+within the processor's spatial subdomain. The union of processor
+subdomains is the global simulation box. If a grid point is on the
+boundary of two subdomains, the lower processor owns the grid cell. A
 processor may also store copies of ghost cells which surround its
 owned cells.

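The ownership rule described above can be sketched as follows. This is an illustration under a simplified convention, not the actual Grid class code; the tie-breaking for points exactly on a subdomain boundary follows the "lower processor owns" rule quoted above, and all names are invented.

.. code-block:: c++

   #include <algorithm>
   #include <cmath>

   // For one dimension: the range of global grid cells owned by this
   // processor, assuming cell i spans [boxlo + i*delta, boxlo + (i+1)*delta)
   // and is owned by the processor whose subdomain (sublo, subhi] contains
   // the cell's representative point boxlo + (i + shift)*delta
   // (shift = 0.5 gives the cell center).
   void owned_cell_range(double sublo, double subhi, double boxlo,
                         double delta, int nglobal, double shift,
                         int &ilo, int &ihi)
   {
     ilo = static_cast<int>(std::floor((sublo - boxlo) / delta - shift)) + 1;
     ihi = static_cast<int>(std::floor((subhi - boxlo) / delta - shift));
     ilo = std::max(ilo, 0);
     ihi = std::min(ihi, nglobal - 1);
   }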
@@ -62,7 +62,7 @@ y-dimension. It is even possible to define a 1x1x1 3d grid, though it
 may be inefficient to use it in a computational sense.

 Note that the choice of grid size is independent of the number of
-processors or their layout in a grid of processor sub-domains which
+processors or their layout in a grid of processor subdomains which
 overlays the simulations domain. Depending on the distributed grid
 size, a single processor may own many 1000s or no grid cells.

@@ -235,7 +235,7 @@ invoked, because they influence its operation.
 void set_zfactor(double factor);

 Processors own a grid cell if a point within the grid cell is inside
-the processor's sub-domain. By default this is the center point of the
+the processor's subdomain. By default this is the center point of the
 grid cell. The *set_shift_grid()* method can change this. The *shift*
 argument is a value from 0.0 to 1.0 (inclusive) which is the offset of
 the point within the grid cell in each dimension. The default is 0.5
@@ -245,9 +245,9 @@ typically no need to change the default as it is optimal for
 minimizing the number of ghost cells needed.

 If a processor maps its particles to grid cells, it needs to allow for
-its particles being outside its sub-domain between reneighboring. The
+its particles being outside its subdomain between reneighboring. The
 *distance* argument of the *set_distance()* method sets the furthest
-distance outside a processor's sub-domain which a particle can move.
+distance outside a processor's subdomain which a particle can move.
 Typically this is half the neighbor skin distance, assuming
 reneighboring is done appropriately. This distance is used in
 determining how many ghost cells a processor needs to store to enable
@@ -295,7 +295,7 @@ to the Grid class via the *set_zfactor()* method (*set_yfactor()* for
 2d grids). The Grid class will then assign ownership of the 1/3 of
 grid cells that overlay the simulation box to the processors which
 also overlay the simulation box. The remaining 2/3 of the grid cells
-are assigned to processors whose sub-domains are adjacent to the upper
+are assigned to processors whose subdomains are adjacent to the upper
 z boundary of the simulation box.

 ----------
@@ -549,13 +549,13 @@ Grid class remap methods for load balancing
 The following methods are used when a load-balancing operation,
 triggered by the :doc:`balance <balance>` or :doc:`fix balance
 <fix_balance>` commands, changes the partitioning of the simulation
-domain into processor sub-domains.
+domain into processor subdomains.

 In order to work with load-balancing, any style command (compute, fix,
 pair, or kspace style) which allocates a grid and stores per-grid data
 should define a *reset_grid()* method; it takes no arguments. It will
 be called by the two balance commands after they have reset processor
-sub-domains and migrated atoms (particles) to new owning processors.
+subdomains and migrated atoms (particles) to new owning processors.
 The *reset_grid()* method will typically perform some or all of the
 following operations. See the src/fix_ave_grid.cpp and
 src/EXTRA_FIX/fix_ttm_grid.cpp files for examples of *reset_grid()*
@@ -564,7 +564,7 @@ functions.

 First, the *reset_grid()* method can instantiate new grid(s) of the
 same global size, then call *setup_grid()* to partition them via the
-new processor sub-domains. At this point, it can invoke the
+new processor subdomains. At this point, it can invoke the
 *identical()* method which compares the owned and ghost grid cell
 index bounds between two grids, the old grid passed as a pointer
 argument, and the new grid whose *identical()* method is being called.

@@ -102,7 +102,7 @@ build is then :doc:`processed in parallel <Developer_par_neigh>`.
 The most commonly required neighbor list is a so-called "half" neighbor
 list, where each pair of atoms is listed only once (except when the
 :doc:`newton command setting <newton>` for pair is off; in that case
-pairs straddling sub-domains or periodic boundaries will be listed twice).
+pairs straddling subdomains or periodic boundaries will be listed twice).
 Thus these are the default settings when a neighbor list request is created in:

 .. code-block:: c++
@@ -361,7 +361,7 @@ allocated as a 1d vector or 3d array. Either way, the ordering of
 values within contiguous memory x fastest, then y, z slowest.

 For the ``3d decomposition`` of the grid, the global grid is
-partitioned into bricks that correspond to the sub-domains of the
+partitioned into bricks that correspond to the subdomains of the
 simulation box that each processor owns. Often, this is a regular 3d
 array (Px by Py by Pz) of bricks, where P = number of processors =
 Px * Py * Pz. More generally it can be a tiled decomposition, where

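As a small illustration of the "x fastest, z slowest" ordering mentioned in the hunk above (a hypothetical helper, not taken from the LAMMPS sources):

.. code-block:: c++

   // Offset of grid point (ix, iy, iz) in a 1d buffer that stores an
   // nx-by-ny-by-nz brick with x varying fastest and z varying slowest.
   inline int grid_offset(int ix, int iy, int iz, int nx, int ny)
   {
     return ix + nx * (iy + ny * iz);
   }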
@@ -7,16 +7,16 @@ large systems provided it uses a correspondingly large number of MPI
 processes. Since The per-atom data (atom IDs, positions, velocities,
 types, etc.) To be able to compute the short-range interactions MPI
 processes need not only access to data of atoms they "own" but also
-information about atoms from neighboring sub-domains, in LAMMPS referred
+information about atoms from neighboring subdomains, in LAMMPS referred
 to as "ghost" atoms. These are copies of atoms storing required
 per-atom data for up to the communication cutoff distance. The green
 dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
-the extended ghost-atom sub-domain for one processor.
+the extended ghost-atom subdomain for one processor.

 This approach is also used to implement periodic boundary
 conditions: atoms that lie within the cutoff distance across a periodic
 boundary are also stored as ghost atoms and taken from the periodic
-replication of the sub-domain, which may be the same sub-domain, e.g. if
+replication of the subdomain, which may be the same subdomain, e.g. if
 running in serial. As a consequence of this, force computation in
 LAMMPS is not subject to minimum image conventions and thus cutoffs may
 be larger than half the simulation domain.
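To make the periodic-replication idea concrete, here is a one-dimensional sketch. It is an illustration only; the names are invented and the real code handles all box faces, both directions, and triclinic cells.

.. code-block:: c++

   // If an owned atom lies within the communication cutoff of the upper
   // periodic boundary, a ghost image of it is stored on the far side of
   // the box with its coordinate shifted by the box length, so force loops
   // never need a minimum-image check.
   bool upper_ghost_image(double x, double boxhi, double prd,
                          double cutghost, double &ghost_x)
   {
     if (x < boxhi - cutghost) return false;   // no ghost copy needed
     ghost_x = x - prd;                        // image near the lower face
     return true;
   }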
@@ -28,10 +28,10 @@ be larger than half the simulation domain.
 ghost atom communication

 This figure shows the ghost atom communication patterns between
-sub-domains for "brick" (left) and "tiled" communication styles for
+subdomains for "brick" (left) and "tiled" communication styles for
 2d simulations. The numbers indicate MPI process ranks. Here the
-sub-domains are drawn spatially separated for clarity. The
-dashed-line box is the extended sub-domain of processor 0 which
+subdomains are drawn spatially separated for clarity. The
+dashed-line box is the extended subdomain of processor 0 which
 includes its ghost atoms. The red- and blue-shaded boxes are the
 regions of communicated ghost atoms.

@@ -42,7 +42,7 @@ atom communication is performed in two stages for a 2d simulation (three
 in 3d) for both a regular and irregular partitioning of the simulation
 box. For the regular case (left) atoms are exchanged first in the
 *x*-direction, then in *y*, with four neighbors in the grid of processor
-sub-domains.
+subdomains.

 In the *x* stage, processor ranks 1 and 2 send owned atoms in their
 red-shaded regions to rank 0 (and vice versa). Then in the *y* stage,
@@ -55,7 +55,7 @@ For the irregular case (right) the two stages are similar, but a
 processor can have more than one neighbor in each direction. In the
 *x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
 rank 0 (and vice versa). These include only atoms between the lower
-and upper *y*-boundary of rank 0's sub-domain. In the *y* stage, ranks
+and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
 4,5,6 send atoms in their blue-shaded regions to rank 0. This may
 include ghost atoms they received in the *x* stage, but only if they
 are needed by rank 0 to fill its extended ghost atom regions in the
@@ -110,11 +110,11 @@ performed in LAMMPS:
 over 3x the length of a stretched bond for dihedral interactions. It
 can also exceed the periodic box size. For the regular communication
 pattern (left), if the cutoff distance extends beyond a neighbor
-processor's sub-domain, then multiple exchanges are performed in the
+processor's subdomain, then multiple exchanges are performed in the
 same direction. Each exchange is with the same neighbor processor,
 but buffers are packed/unpacked using a different list of atoms. For
 forward communication, in the first exchange a processor sends only
 owned atoms. In subsequent exchanges, it sends ghost atoms received
 in previous exchanges. For the irregular pattern (right) overlaps of
-a processor's extended ghost-atom sub-domain with all other processors
+a processor's extended ghost-atom subdomain with all other processors
 in each dimension are detected.

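One plausible way to estimate the number of exchanges needed in one direction when the cutoff exceeds a neighbor's subdomain, stated here as an assumption for illustration rather than the exact bookkeeping LAMMPS uses:

.. code-block:: c++

   #include <cmath>

   // With the regular "brick" pattern, if the ghost cutoff extends past the
   // neighboring subdomain, the same neighbor is exchanged with repeatedly.
   // A simple estimate, assuming equal subdomain widths along the dimension:
   int exchanges_needed(double cutghost, double subdomain_width)
   {
     return static_cast<int>(std::ceil(cutghost / subdomain_width));
   }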
@@ -20,7 +20,7 @@ e) electric field values from grid points near each atom are interpolated to com

 For any of the spatial-decomposition partitioning schemes each processor
 owns the brick-shaped portion of FFT grid points contained within its
-sub-domain. The two interpolation operations use a stencil of grid
+subdomain. The two interpolation operations use a stencil of grid
 points surrounding each atom. To accommodate the stencil size, each
 processor also stores a few layers of ghost grid points surrounding its
 brick. Forward and reverse communication of grid point values is
@@ -64,7 +64,7 @@ direction of the 1d FFTs it has to perform. LAMMPS uses the
 pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.

 Initially (far left), each processor owns a brick of same-color grid
-cells (actually grid points) contained within in its sub-domain. A
+cells (actually grid points) contained within in its subdomain. A
 brick-to-pencil communication operation converts this layout to 1d
 pencils in the *x*-dimension (center left). Again, cells of the same
 color are owned by the same processor. Each processor can then compute
@@ -161,8 +161,8 @@ grid/particle operations that LAMMPS supports:
 <partition>` calculation and then use the :doc:`verlet/split
 integrator <run_style>` to perform the PPPM computation on a
 dedicated, separate partition of MPI processes. This uses an integer
-"1:*p*" mapping of *p* sub-domains of the atom decomposition to one
-sub-domain of the FFT grid decomposition and where pairwise non-bonded
+"1:*p*" mapping of *p* subdomains of the atom decomposition to one
+subdomain of the FFT grid decomposition and where pairwise non-bonded
 and bonded forces and energies are computed on the larger partition
 and the PPPM kspace computation concurrently on the smaller partition.

@@ -172,7 +172,7 @@ grid/particle operations that LAMMPS supports:

 - LAMMPS implements a ``GridComm`` class which overlays the simulation
 domain with a regular grid, partitions it across processors in a
-manner consistent with processor sub-domains, and provides methods for
+manner consistent with processor subdomains, and provides methods for
 forward and reverse communication of owned and ghost grid point
 values. It is used for PPPM as an FFT grid (as outlined above) and
 also for the MSM algorithm which uses a cascade of grid sizes from

@@ -22,7 +22,7 @@ last reneighboring; this and other options of the neighbor list rebuild
 can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.

 On steps when reneighboring is performed, atoms which have moved outside
-their owning processor's sub-domain are first migrated to new processors
+their owning processor's subdomain are first migrated to new processors
 via communication. Periodic boundary conditions are also (only)
 enforced on these steps to ensure each atom is re-assigned to the
 correct processor. After migration, the atoms owned by each processor
@@ -39,12 +39,12 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.

 neighbor list stencils

-A 2d simulation sub-domain (thick black line) and the corresponding
+A 2d simulation subdomain (thick black line) and the corresponding
 ghost atom cutoff region (dashed blue line) for both orthogonal
 (left) and triclinic (right) domains. A regular grid of neighbor
 bins (thin lines) overlays the entire simulation domain and need not
-align with sub-domain boundaries; only the portion overlapping the
-augmented sub-domain is shown. In the triclinic case it overlaps the
+align with subdomain boundaries; only the portion overlapping the
+augmented subdomain is shown. In the triclinic case it overlaps the
 bounding box of the tilted rectangle. The blue- and red-shaded bins
 represent a stencil of bins searched to find neighbors of a particular
 atom (black dot).
@@ -52,8 +52,8 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
 To build a local neighbor list in linear time, the simulation domain is
 overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
 as shown in the :ref:`neighbor-stencil` figure for 2d models and a
-single MPI processor's sub-domain. Each processor stores a set of
-neighbor bins which overlap its sub-domain extended by the neighbor
+single MPI processor's subdomain. Each processor stores a set of
+neighbor bins which overlap its subdomain extended by the neighbor
 cutoff distance :math:`R_n`. As illustrated, the bins need not align
 with processor boundaries; an integer number in each dimension is fit to
 the size of the entire simulation box.
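A minimal sketch of the binning step described in the hunk above (illustrative names, not the Neighbor class code):

.. code-block:: c++

   #include <cmath>

   // Map a coordinate to a neighbor-bin index along one dimension, assuming
   // nbin bins of width binsize starting at binlo cover the subdomain
   // extended by the neighbor cutoff; ghost atoms slightly outside that
   // range are clamped into the outermost bins.
   int coord2bin(double x, double binlo, double binsize, int nbin)
   {
     int ib = static_cast<int>(std::floor((x - binlo) / binsize));
     if (ib < 0) ib = 0;
     if (ib > nbin - 1) ib = nbin - 1;
     return ib;
   }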
@@ -144,7 +144,7 @@ supports:

 - For small and sparse systems and as a fallback method, LAMMPS also
 supports neighbor list construction without binning by using a full
-:math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
+:math:`O(N^2)` loop over all *i,j* atom pairs in a subdomain when
 using the :doc:`neighbor nsq <neighbor>` command.

 - Dependent on the "pair" setting of the :doc:`newton <newton>` command,

@@ -15,8 +15,8 @@ distributed-memory parallelism is set with the :doc:`comm_style command
 for MPI parallelization: "brick" on the left with an orthogonal
 (left) and a triclinic (middle) simulation domain, and a "tiled"
 decomposition (right). The black lines show the division into
-sub-domains and the contained atoms are "owned" by the corresponding
-MPI process. The green dashed lines indicate how sub-domains are
+subdomains and the contained atoms are "owned" by the corresponding
+MPI process. The green dashed lines indicate how subdomains are
 extended with "ghost" atoms up to the communication cutoff distance.

 The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
@@ -32,14 +32,14 @@ means the position of the box face adjusts continuously to enclose all
 the atoms.

 For distributed-memory MPI parallelism, the simulation box is spatially
-decomposed (partitioned) into non-overlapping sub-domains which fill the
+decomposed (partitioned) into non-overlapping subdomains which fill the
 box. The default partitioning, "brick", is most suitable when atom
 density is roughly uniform, as shown in the left-side images of the
-:ref:`domain-decomposition` figure. The sub-domains comprise a regular
-grid and all sub-domains are identical in size and shape. Both the
+:ref:`domain-decomposition` figure. The subdomains comprise a regular
+grid and all subdomains are identical in size and shape. Both the
 orthogonal and triclinic boxes can deform continuously during a
 simulation, e.g. to compress a solid or shear a liquid, in which case
-the processor sub-domains likewise deform.
+the processor subdomains likewise deform.


 For models with non-uniform density, the number of particles per
@@ -50,14 +50,14 @@ load. For such models, LAMMPS supports multiple strategies to reduce
 the load imbalance:

 - The processor grid decomposition is by default based on the simulation
-cell volume and tries to optimize the volume to surface ratio for the sub-domains.
+cell volume and tries to optimize the volume to surface ratio for the subdomains.
 This can be changed with the :doc:`processors command <processors>`.
-- The parallel planes defining the size of the sub-domains can be shifted
+- The parallel planes defining the size of the subdomains can be shifted
 with the :doc:`balance command <balance>`. Which can be done in addition
 to choosing a more optimal processor grid.
 - The recursive bisectioning algorithm in combination with the "tiled"
 communication style can produce a partitioning with equal numbers of
-particles in each sub-domain.
+particles in each subdomain.


 .. |decomp1| image:: img/decomp-regular.png
@@ -76,14 +76,14 @@ the load imbalance:

 The pictures above demonstrate different decompositions for a 2d system
 with 12 MPI ranks. The atom colors indicate the load imbalance of each
-sub-domain with green being optimal and red the least optimal.
+subdomain with green being optimal and red the least optimal.

 Due to the vacuum in the system, the default decomposition is unbalanced
 with several MPI ranks without atoms (left). By forcing a 1x12x1
 processor grid, every MPI rank does computations now, but number of
-atoms per sub-domain is still uneven and the thin slice shape increases
-the amount of communication between sub-domains (center left). With a
-2x6x1 processor grid and shifting the sub-domain divisions, the load
+atoms per subdomain is still uneven and the thin slice shape increases
+the amount of communication between subdomains (center left). With a
+2x6x1 processor grid and shifting the subdomain divisions, the load
 imbalance is further reduced and the amount of communication required
-between sub-domains is less (center right). And using the recursive
+between subdomains is less (center right). And using the recursive
 bisectioning leads to further improved decomposition (right).

@@ -7,7 +7,7 @@ decomposition. The parallelization aims to be efficient, and resulting
 in good strong scaling (= good speedup for the same system) and good
 weak scaling (= the computational cost of enlarging the system is
 proportional to the system size). Additional parallelization using GPUs
-or OpenMP can also be applied within the sub-domain assigned to an MPI
+or OpenMP can also be applied within the subdomain assigned to an MPI
 process. For clarity, most of the following illustrations show the 2d
 simulation case. The underlying algorithms in those cases, however,
 apply to both 2d and 3d cases equally well.

@@ -647,7 +647,7 @@ Communication buffer coding with *ubuf*
 ---------------------------------------

 LAMMPS uses communication buffers where it collects data from various
-class instances and then exchanges the data with neighboring sub-domains.
+class instances and then exchanges the data with neighboring subdomains.
 For simplicity those buffers are defined as ``double`` buffers and
 used for doubles and integer numbers. This presents a unique problem
 when 64-bit integers are used. While the storage needed for a ``double``

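The *ubuf* idea can be sketched with a small stand-alone program. This is a simplified illustration; LAMMPS defines its own ``ubuf`` union in its headers and the exact members there may differ. Casting a large 64-bit integer to ``double`` instead would silently lose precision above 2^53, whereas storing the bits through a union round-trips the value exactly.

.. code-block:: c++

   #include <cstdint>
   #include <cstdio>

   // Store a 64-bit integer bit-for-bit inside a double-typed communication
   // buffer and recover it exactly on the receiving side.
   union ubuf {
     double d;
     int64_t i;
     ubuf(double arg) : d(arg) {}
     ubuf(int64_t arg) : i(arg) {}
   };

   int main()
   {
     double buf[1];
     int64_t tag = 9007199254740995LL;   // 2^53 + 3, not exact as a double
     buf[0] = ubuf(tag).d;               // pack: reinterpret the bits
     int64_t back = ubuf(buf[0]).i;      // unpack: bits restored exactly
     std::printf("%lld\n", (long long) back);
     return 0;
   }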
@@ -5635,7 +5635,7 @@ Doc page with :doc:`WARNING messages <Errors_warnings>`
 Lost atoms are checked for each time thermo output is done. See the
 thermo_modify lost command for options. Lost atoms usually indicate
 bad dynamics, e.g. atoms have been blown far out of the simulation
-box, or moved further than one processor's sub-domain away before
+box, or moved further than one processor's subdomain away before
 reneighboring.

 *MEAM library error %d*
@@ -6266,14 +6266,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a MSM grid point
 that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -6281,14 +6281,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a PPPM grid
 point that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -6296,14 +6296,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a PPPM grid
 point that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -109,9 +109,9 @@ Doc page with :doc:`ERROR messages <Errors_messages>`
 *Communication cutoff is shorter than a bond length based estimate. This may lead to errors.*
 Since LAMMPS stores topology data with individual atoms, all atoms
 comprising a bond, angle, dihedral or improper must be present on any
-sub-domain that "owns" the atom with the information, either as a
+subdomain that "owns" the atom with the information, either as a
 local or a ghost atom. The communication cutoff is what determines up
-to what distance from a sub-domain boundary ghost atoms are created.
+to what distance from a subdomain boundary ghost atoms are created.
 The communication cutoff is by default the largest non-bonded cutoff
 plus the neighbor skin distance, but for short or non-bonded cutoffs
 and/or long bonds, this may not be sufficient. This warning indicates
@@ -398,7 +398,7 @@ This will most likely cause errors in kinetic fluctuations.
 Lost atoms are checked for each time thermo output is done. See the
 thermo_modify lost command for options. Lost atoms usually indicate
 bad dynamics, e.g. atoms have been blown far out of the simulation
-box, or moved further than one processor's sub-domain away before
+box, or moved further than one processor's subdomain away before
 reneighboring.

 *MSM mesh too small, increasing to 2 points in each direction*
@@ -582,13 +582,13 @@ This will most likely cause errors in kinetic fluctuations.
 needed. The requested volume fraction may be too high, or other atoms
 may be in the insertion region.

-*Proc sub-domain size < neighbor skin, could lead to lost atoms*
+*Proc subdomain size < neighbor skin, could lead to lost atoms*
 The decomposition of the physical domain (likely due to load
-balancing) has led to a processor's sub-domain being smaller than the
+balancing) has led to a processor's subdomain being smaller than the
 neighbor skin in one or more dimensions. Since reneighboring is
 triggered by atoms moving the skin distance, this may lead to lost
 atoms, if an atom moves all the way across a neighboring processor's
-sub-domain before reneighboring is triggered.
+subdomain before reneighboring is triggered.

 *Reducing PPPM order b/c stencil extends beyond nearest neighbor processor*
 This may lead to a larger grid than desired. See the kspace_modify overlap

@@ -11,7 +11,7 @@ more values (data).

 The grid cells and data they store are distributed across processors.
 Each processor owns the grid cells (and data) whose center points lie
-within the spatial sub-domain of the processor. If needed for its
+within the spatial subdomain of the processor. If needed for its
 computations, a processor may also store ghost grid cells with their
 data.

@@ -28,7 +28,7 @@ box size, as set by the :doc:`boundary <boundary>` command for fixed
 or shrink-wrapped boundaries.

 If load-balancing is invoked by the :doc:`balance <balance>` or
-:doc:`fix balance <fix_balance>` commands, then the sub-domain owned
+:doc:`fix balance <fix_balance>` commands, then the subdomain owned
 by a processor can change which may also change which grid cells they
 own.

@@ -59,7 +59,7 @@ of bond distances.
 A per-grid datum is one or more values per grid cell, for a grid which
 overlays the simulation domain. The grid cells and the data they
 store are distributed across processors; each processor owns the grid
-cells whose center point falls within its sub-domain.
+cells whose center point falls within its subdomain.

 .. _scalar:

@@ -322,7 +322,7 @@ The chief difference between the :doc:`fix ave/grid <fix_ave_grid>`
 and :doc:`fix ave/chunk <fix_ave_chunk>` commands when used in this
 context is that the former uses a distributed grid, while the latter
 uses a global grid. Distributed means that each processor owns the
-subset of grid cells within its sub-domain. Global means that each
+subset of grid cells within its subdomain. Global means that each
 processor owns a copy of the entire grid. The :doc:`fix ave/grid
 <fix_ave_grid>` command is thus more efficient for large grids.

@@ -783,19 +783,19 @@ Pitfalls
 **Parallel Scalability**

 LAMMPS operates in parallel in a :doc:`spatial-decomposition mode
-<Developer_par_part>`, where each processor owns a spatial sub-domain of
+<Developer_par_part>`, where each processor owns a spatial subdomain of
 the overall simulation domain and communicates with its neighboring
 processors via distributed-memory message passing (MPI) to acquire ghost
 atom information to allow forces on the atoms it owns to be
 computed. LAMMPS also uses Verlet neighbor lists which are recomputed
 every few timesteps as particles move. On these timesteps, particles
 also migrate to new processors as needed. LAMMPS decomposes the overall
-simulation domain so that spatial sub-domains of nearly equal volume are
-assigned to each processor. When each sub-domain contains nearly the
+simulation domain so that spatial subdomains of nearly equal volume are
+assigned to each processor. When each subdomain contains nearly the
 same number of particles, this results in a reasonable load balance
 among all processors. As is more typical with some peridynamic
-simulations, some sub-domains may contain many particles while other
-sub-domains contain few particles, resulting in a load imbalance that
+simulations, some subdomains may contain many particles while other
+subdomains contain few particles, resulting in a load imbalance that
 impacts parallel scalability.

 **Setting the "skin" distance**

@@ -150,7 +150,7 @@ option with either of the commands.

 Note that if a simulation box has a large tilt factor, LAMMPS will run
 less efficiently, due to the large volume of communication needed to
-acquire ghost atoms around a processor's irregular-shaped sub-domain.
+acquire ghost atoms around a processor's irregular-shaped subdomain.
 For extreme values of tilt, LAMMPS may also lose atoms and generate an
 error.

@@ -38,11 +38,11 @@ to create digital object identifiers (DOI) for stable releases of the
 LAMMPS source code. There are two types of DOIs for the LAMMPS source code.

 The canonical DOI for **all** versions of LAMMPS, which will always
-point to the **latest** stable release version is:
+point to the **latest** stable release version, is:

 - DOI: `10.5281/zenodo.3726416 <https://dx.doi.org/10.5281/zenodo.3726416>`_

-In addition there are DOIs for individual stable releases. Currently there are:
+In addition there are DOIs generated for individual stable releases:

 - 3 March 2020 version: `DOI:10.5281/zenodo.3726417 <https://dx.doi.org/10.5281/zenodo.3726417>`_
 - 29 October 2020 version: `DOI:10.5281/zenodo.4157471 <https://dx.doi.org/10.5281/zenodo.4157471>`_
@@ -65,6 +65,6 @@ for optional features used in a specific run is printed to the screen
 and log file. Style and output location can be selected with the
 :ref:`-cite command-line switch <cite>`. Additional references are
 given in the documentation of the :doc:`corresponding commands
-<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. So please
-make certain, that you provide the proper acknowledgments and citations
-in any published works using LAMMPS.
+<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. Please make
+certain, that you provide the proper acknowledgments and citations in
+any published works using LAMMPS.

@@ -27,7 +27,7 @@ General features
 * distributed memory message-passing parallelism (MPI)
 * shared memory multi-threading parallelism (OpenMP)
 * spatial decomposition of simulation domain for MPI parallelism
-* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
+* particle decomposition inside spatial decomposition for OpenMP and GPU parallelism
 * GPLv2 licensed open-source distribution
 * highly portable C++-11
 * modular code with most functionality in optional packages
@@ -113,7 +113,7 @@ Atom creation
 :doc:`create_atoms <create_atoms>`, :doc:`delete_atoms <delete_atoms>`,
 :doc:`displace_atoms <displace_atoms>`, :doc:`replicate <replicate>` commands)

-* read in atom coords from files
+* read in atom coordinates from files
 * create atoms on one or more lattices (e.g. grain boundaries)
 * delete geometric or logical groups of atoms (e.g. voids)
 * replicate existing atoms multiple times
@@ -173,11 +173,11 @@ Output
 (:doc:`dump <dump>`, :doc:`restart <restart>` commands)

 * log file of thermodynamic info
-* text dump files of atom coords, velocities, other per-atom quantities
+* text dump files of atom coordinates, velocities, other per-atom quantities
 * dump output on fixed and variable intervals, based timestep or simulated time
 * binary restart files
 * parallel I/O of dump and restart files
-* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
+* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc.)
 * user-defined system-wide (log file) or per-atom (dump file) calculations
 * custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
 * spatial, time, and per-chunk averaging of per-atom quantities

@@ -20,22 +20,23 @@ that either closely interface with LAMMPS or extend LAMMPS.

 Here are suggestions on how to perform these tasks:

-* **GUI:** LAMMPS can be built as a library and a Python wrapper that wraps
-the library interface is provided. Thus, GUI interfaces can be
-written in Python (or C or C++ if desired) that run LAMMPS and
-visualize or plot its output. Examples of this are provided in the
-python directory and described on the :doc:`Python <Python_head>` doc
-page. Also, there are several external wrappers or GUI front ends.
-* **Builder:** Several pre-processing tools are packaged with LAMMPS. Some
-of them convert input files in formats produced by other MD codes such
-as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of them
-are simple programs that will build simple molecular systems, such as
-linear bead-spring polymer chains. The moltemplate program is a true
-molecular builder that will generate complex molecular models. See
-the :doc:`Tools <Tools>` page for details on tools packaged with
-LAMMPS. The `Pre/post processing page <https:/www.lammps.org/prepost.html>`_ of the LAMMPS website
+* **GUI:** LAMMPS can be built as a library and a Python module that
+wraps the library interface is provided. Thus, GUI interfaces can be
+written in Python or C/C++ that run LAMMPS and visualize or plot its
+output. Examples of this are provided in the python directory and
+described on the :doc:`Python <Python_head>` doc page. Also, there
+are several external wrappers or GUI front ends.
+* **Builder:** Several pre-processing tools are packaged with LAMMPS.
+Some of them convert input files in formats produced by other MD codes
+such as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of
+them are simple programs that will build simple molecular systems,
+such as linear bead-spring polymer chains. The moltemplate program is
+a true molecular builder that will generate complex molecular models.
+See the :doc:`Tools <Tools>` page for details on tools packaged with
+LAMMPS. The `Pre-/post-processing page
+<https:/www.lammps.org/prepost.html>`_ of the LAMMPS homepage
 describes a variety of third party tools for this task. Furthermore,
-some LAMMPS internal commands allow to reconstruct, or selectively add
+some internal LAMMPS commands allow reconstructing, or selectively adding
 topology information, as well as provide the option to insert molecule
 templates instead of atoms for building bulk molecular systems.
 * **Force-field assignment:** The conversion tools described in the previous
@@ -47,33 +48,34 @@ Here are suggestions on how to perform these tasks:
 powerful and flexible in converting force field and topology data
 between various MD simulation programs.
 * **Simulation analysis:** If you want to perform analysis on-the-fly as
-your simulation runs, see the :doc:`compute <compute>` and
-:doc:`fix <fix>` doc pages, which list commands that can be used in a
-LAMMPS input script. Also see the :doc:`Modify <Modify>` page for
-info on how to add your own analysis code or algorithms to LAMMPS.
-For post-processing, LAMMPS output such as :doc:`dump file snapshots <dump>` can be converted into formats used by other MD or
+your simulation runs, see the :doc:`compute <compute>` and :doc:`fix
+<fix>` doc pages, which list commands that can be used in a LAMMPS
+input script. Also see the :doc:`Modify <Modify>` page for info on
+how to add your own analysis code or algorithms to LAMMPS. For
+post-processing, LAMMPS output such as :doc:`dump file snapshots
+<dump>` can be converted into formats used by other MD or
 post-processing codes. To some degree, that conversion can be done
-directly inside of LAMMPS by interfacing to the VMD molfile plugins.
-The :doc:`rerun <rerun>` command also allows to do some post-processing
-of existing trajectories, and through being able to read a variety
-of file formats, this can also be used for analyzing trajectories
-from other MD codes. Some post-processing tools packaged with
-LAMMPS will do these conversions. Scripts provided in the
-tools/python directory can extract and massage data in dump files to
-make it easier to import into other programs. See the
-:doc:`Tools <Tools>` page for details on these various options.
-* **Visualization:** LAMMPS can produce NETPBM, JPG or PNG snapshot images
-on-the-fly via its :doc:`dump image <dump_image>` command and pass
-them to an external program, `FFmpeg <https://www.ffmpeg.org>`_ to generate
-movies from them. For high-quality, interactive visualization there are
-many excellent and free tools available. See the
-`Visualization Tools <https://www.lammps.org/viz.html>`_ page of the
-LAMMPS website for
+directly inside LAMMPS by interfacing to the VMD molfile plugins. The
+:doc:`rerun <rerun>` command also allows post-processing of existing
+trajectories, and through being able to read a variety of file
+formats, this can also be used for analyzing trajectories from other
+MD codes. Some post-processing tools packaged with LAMMPS will do
+these conversions. Scripts provided in the tools/python directory can
+extract and massage data in dump files to make it easier to import
+into other programs. See the :doc:`Tools <Tools>` page for details on
+these various options.
+* **Visualization:** LAMMPS can produce NETPBM, JPG, or PNG format
+snapshot images on-the-fly via its :doc:`dump image <dump_image>`
+command and pass them to an external program, `FFmpeg
+<https://www.ffmpeg.org>`_, to generate movies from them. For
+high-quality, interactive visualization, there are many excellent and
+free tools available. See the `Visualization Tools
+<https://www.lammps.org/viz.html>`_ page of the LAMMPS website for
 visualization packages that can process LAMMPS output data.
 * **Plotting:** See the next bullet about Pizza.py as well as the
 :doc:`Python <Python_head>` page for examples of plotting LAMMPS
-output. Scripts provided with the *python* tool in the tools
-directory will extract and massage data in log and dump files to make
+output. Scripts provided with the *python* tool in the ``tools``
+directory will extract and process data in log and dump files to make
 it easier to analyze and plot. See the :doc:`Tools <Tools>` doc page
 for more discussion of the various tools.
 * **Pizza.py:** Our group has also written a separate toolkit called

@@ -1,20 +1,20 @@
 Overview of LAMMPS
 ------------------

-LAMMPS is a classical molecular dynamics (MD) code that models
-ensembles of particles in a liquid, solid, or gaseous state. It can
-model atomic, polymeric, biological, solid-state (metals, ceramics,
-oxides), granular, coarse-grained, or macroscopic systems using a
-variety of interatomic potentials (force fields) and boundary
-conditions. It can model 2d or 3d systems with only a few particles
-up to millions or billions.
+LAMMPS is a classical molecular dynamics (MD) code that models ensembles
+of particles in a liquid, solid, or gaseous state. It can model atomic,
+polymeric, biological, solid-state (metals, ceramics, oxides), granular,
+coarse-grained, or macroscopic systems using a variety of interatomic
+potentials (force fields) and boundary conditions. It can model 2d or
+3d systems with sizes ranging from only a few particles up to billions.

-LAMMPS can be built and run on a laptop or desktop machine, but is
+LAMMPS can be built and run on single laptop or desktop machines, but is
 designed for parallel computers. It will run in serial and on any
 parallel machine that supports the `MPI <mpi_>`_ message-passing
-library. This includes shared-memory boxes and distributed-memory
-clusters and supercomputers. Parts of LAMMPS also support
-`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.
+library. This includes shared-memory multicore, multi-CPU servers and
+distributed-memory clusters and supercomputers. Parts of LAMMPS also
+support `OpenMP multi-threading <omp_>`_, vectorization, and GPU
+acceleration.

 .. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
 .. _lws: https://www.lammps.org
@@ -42,11 +42,11 @@ LAMMPS uses neighbor lists to keep track of nearby particles. The lists
 are optimized for systems with particles that are repulsive at short
 distances, so that the local density of particles never becomes too
 large. This is in contrast to methods used for modeling plasma or
-gravitational bodies (e.g. galaxy formation).
+gravitational bodies (like galaxy formation).

 On parallel machines, LAMMPS uses spatial-decomposition techniques with
-MPI parallelization to partition the simulation domain into sub-domains
+MPI parallelization to partition the simulation domain into subdomains
 of equal computational cost, one of which is assigned to each processor.
 Processors communicate and store "ghost" atom information for atoms that
-border their sub-domain. Multi-threading parallelization and GPU
-acceleration with with particle-decomposition can be used in addition.
+border their subdomain. Multi-threading parallelization and GPU
+acceleration with particle-decomposition can be used in addition.

@@ -30,17 +30,17 @@ can be created using CMake. CMake must be at least version 3.10.
 Operating systems
 ^^^^^^^^^^^^^^^^^

-The primary development platform for LAMMPS is Linux. Thus the chances
+The primary development platform for LAMMPS is Linux. Thus, the chances
 for LAMMPS to compile without problems on Linux machines are the best.
-Also compilation and correct execution on macOS and Windows (using
+Also, compilation and correct execution on macOS and Windows (using
 Microsoft Visual C++) is checked automatically for largest part of the
 source code. Some (optional) features are not compatible with all
-operating systems either through limitations of the source code or
-source code compatibility or the build system requirements of required
-libraries.
+operating systems, either through limitations of the corresponding
+LAMMPS source code or through source code or build system
+incompatibilities of required libraries.

-Executables for Windows may be created using either Cygwin or Visual
-Studio or a Linux to Windows MinGW cross-compiler.
+Executables for Windows may be created natively using either Cygwin or
+Visual Studio or with a Linux to Windows MinGW cross-compiler.

 Additionally, FreeBSD and Solaris have been tested successfully.

@@ -49,7 +49,7 @@ Compilers

 The most commonly used compilers are the GNU compilers, but also Clang
 and the Intel compilers have been successfully used on Linux, macOS, and
-Windows. Also the Nvidia HPC SDK (formerly PGI compilers) will compile
+Windows. Also, the Nvidia HPC SDK (formerly PGI compilers) will compile
 LAMMPS (tested on Linux).

 CPU architectures
@@ -62,12 +62,14 @@ regularly tested.
 Portability compliance
 ^^^^^^^^^^^^^^^^^^^^^^

-Not all of the LAMMPS source code is fully compliant to all of the above
-mentioned standards. This is rather typical for projects like LAMMPS
-that largely depend on contributions of features from the community.
+Only a subset of the LAMMPS source code is fully compliant to all of the
+above mentioned standards. This is rather typical for projects like
+LAMMPS that largely depend on contributions from the user community.
 Not all contributors are trained as programmers and not all of them have
-access to a variety of platforms. As part of the continuous integration
-process, however, all contributions are automatically tested to compile,
-link, and pass some runtime tests on a selection of Linux flavors,
-macOS, and Windows with different compilers. Other platforms may be
-checked occasionally or when portability bug are reported.
+access to multiple platforms for testing. As part of the continuous
+integration process, however, all contributions are automatically tested
+to compile, link, and pass some runtime tests on a selection of Linux
+flavors, macOS, and Windows, and on Linux with different compilers.
+Thus portability issues are often found before a pull request is merged.
+Other platforms may be checked occasionally or when portability bugs are
+reported.

@@ -30,7 +30,7 @@ course, changing values should be done with care. When accessing per-atom
 data, please note that these data are the per-processor **local** data and are
 indexed accordingly. Per-atom data can change sizes and ordering at
 every neighbor list rebuild or atom sort event as atoms migrate between
-sub-domains and processors.
+subdomains and processors.

 .. code-block:: c

@@ -5,16 +5,17 @@ LAMMPS Documentation (|version| version)
 LAMMPS stands for **L**\ arge-scale **A**\ tomic/**M**\ olecular
 **M**\ assively **P**\ arallel **S**\ imulator.

-LAMMPS is a classical molecular dynamics simulation code with a focus
-on materials modeling. It was designed to run efficiently on parallel
-computers. It was developed originally at Sandia National
-Laboratories, a US Department of Energy facility. The majority of
-funding for LAMMPS has come from the US Department of Energy (DOE).
-LAMMPS is an open-source code, distributed freely under the terms of
-the GNU Public License Version 2 (GPLv2).
+LAMMPS is a classical molecular dynamics simulation code focusing on
+materials modeling. It was designed to run efficiently on parallel
+computers and to be easy to extend and modify. Originally developed at
+Sandia National Laboratories, a US Department of Energy facility, LAMMPS
+now includes contributions from many research groups and individuals
+from many institutions. Most of the funding for LAMMPS has come from
+the US Department of Energy (DOE). LAMMPS is open-source software
+distributed under the terms of the GNU Public License Version 2 (GPLv2).

 The `LAMMPS website <lws_>`_ has a variety of information about the
-code. It includes links to an on-line version of this manual, an
+code. It includes links to an online version of this manual, an
 `online forum <https://www.lammps.org/forum.html>`_ where users can post
 questions and discuss LAMMPS, and a `GitHub site
 <https://github.com/lammps/lammps>`_ where all LAMMPS development is
@@ -26,14 +27,14 @@ The content for this manual is part of the LAMMPS distribution. The
 online version always corresponds to the latest feature release version.
 If needed, you can build a local copy of the manual as HTML pages or a
 PDF file by following the steps on the :doc:`Build_manual` page. If you
-have difficulties viewing the pages please :ref:`see this note
+have difficulties viewing the pages, please :ref:`see this note
 <webbrowser>`.

 -----------

-The manual is organized in three parts:
+The manual is organized into three parts:

-1. the :ref:`User Guide <user_documentation>` with information about how
+1. The :ref:`User Guide <user_documentation>` with information about how
 to obtain, configure, compile, install, and use LAMMPS,
 2. the :ref:`Programmer Guide <programmer_documentation>` with
 information about how to use the LAMMPS library interface from
@@ -47,7 +48,7 @@ The manual is organized in three parts:

 .. only:: html

-Once you are familiar with LAMMPS, you may want to bookmark
+After becoming familiar with LAMMPS, consider bookmarking
 :doc:`this page <Commands_all>`, since it gives quick access to
 tables with links to the documentation for all LAMMPS commands.

@ -2,43 +2,44 @@ What does a LAMMPS version mean
|
||||
-------------------------------
|
||||
|
||||
The LAMMPS "version" is the date when it was released, such as 1 May
|
||||
2014. LAMMPS is updated continuously and we aim to keep it working
|
||||
2014. LAMMPS is updated continuously, and we aim to keep it working
|
||||
correctly and reliably at all times. You can follow its development
|
||||
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
||||
|
||||
Modifications of the LAMMPS source code - like bug fixes, code
|
||||
refactors, updates to existing features, or addition of new features -
|
||||
are organized into pull requests, and will be merged into the *develop*
|
||||
branch of the git repository when they pass automated testing and code
|
||||
Modifications of the LAMMPS source code (like bug fixes, code refactors,
|
||||
updates to existing features, or addition of new features) are organized
|
||||
into pull requests. Pull requests will be merged into the *develop*
|
||||
branch of the git repository after they pass automated testing and code
|
||||
review by the LAMMPS developers. When a sufficient number of changes
|
||||
have accumulated *and* the software passes a set of automated tests, we
|
||||
release it as a *feature release* (or patch release), which are
|
||||
currently made every 4-8 weeks. The *release* branch of the git
|
||||
repository is updated with every such release. A summary of the most
|
||||
important changes of the patch releases are on `this website page
|
||||
<https://www.lammps.org/bug.html>`_. More detailed release notes are
|
||||
`available on GitHub <https://github.com/lammps/lammps/releases/>`_.
|
||||
have accumulated *and* the *develop* branch version passes an extended
|
||||
set of automated tests, we release it as a *feature release* (or patch
|
||||
release), which are currently made every 4 to 8 weeks. The *release*
|
||||
branch of the git repository is updated with every such release. A
|
||||
summary of the most important changes of the patch releases is on `this
website page <https://www.lammps.org/bug.html>`_. More detailed release
notes are `available on GitHub
<https://github.com/lammps/lammps/releases/>`_.

Once or twice a year, we have a "stabilization period" where we apply
only bug fixes and small, non-intrusive changes to the *develop*
branch. At the same time the code is subjected to more detailed and
thorough manual testing than the default automated testing. Also
branch. At the same time, the code is subjected to more detailed and
thorough manual testing than the default automated testing. Also,
several variants of static code analysis are run to improve the overall
code quality, consistency, and compliance with programming standards,
best practices and style conventions.

The latest patch release after such a period is then also labeled as a
*stable* version and the *stable* branch is updated with it. Between
stable releases we occasionally release updates to the stable release
stable releases, we occasionally release updates to the stable release
containing only bug fixes and updates back-ported from the *develop*
branch and update the *stable* branch accordingly.

Each version of LAMMPS contains all the documented features up to and
including its version date. For recently added features we add markers
including its version date. For recently added features, we add markers
to the documentation at which specific LAMMPS version a feature or
keyword was added or significantly changed.

The version date is printed to the screen and logfile every time you run
The version date is printed to the screen and log file every time you run
LAMMPS. It is also in the file src/version.h and in the LAMMPS
directory name created when you unpack a tarball. And it is on the
first page of the :doc:`manual <Manual>`.

@ -23,7 +23,7 @@ against invalid accesses.
When accessing per-atom data,
please note that this data is the per-processor local data and indexed
accordingly. These arrays can change sizes and order at every neighbor list
rebuild and atom sort event as atoms are migrating between sub-domains.
rebuild and atom sort event as atoms are migrating between subdomains.

.. tabs::


@ -23,7 +23,7 @@ against invalid accesses.
When accessing per-atom data,
please note that this data is the per-processor local data and indexed
accordingly. These arrays can change sizes and order at every neighbor list
rebuild and atom sort event as atoms are migrating between sub-domains.
rebuild and atom sort event as atoms are migrating between subdomains.

.. tabs::


@ -9,7 +9,7 @@ There are two thrusts to the discussion that follows. The first is
using code options that implement alternate algorithms that can
speed-up a simulation. The second is to use one of the several
accelerator packages provided with LAMMPS that contain code optimized
for certain kinds of hardware, including multi-core CPUs, GPUs, and
for certain kinds of hardware, including multicore CPUs, GPUs, and
Intel Xeon Phi co-processors.

The `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS

@ -11,7 +11,7 @@ parts of the :doc:`kspace_style pppm <kspace_style>` for long-range
Coulombics. It has the following general features:

* It is designed to exploit common GPU hardware configurations where one
or more GPUs are coupled to many cores of one or more multi-core CPUs,
or more GPUs are coupled to many cores of one or more multicore CPUs,
e.g. within a node of a parallel machine.
* Atom-based data (e.g. coordinates, forces) are moved back-and-forth
between the CPU(s) and GPU every timestep.
@ -28,7 +28,7 @@ Coulombics. It has the following general features:
* LAMMPS-specific code is in the GPU package. It makes calls to a
generic GPU library in the lib/gpu directory. This library provides
either Nvidia support, AMD support, or more general OpenCL support
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multi-core CPUs).
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
so that the same functionality is supported on a variety of hardware.

**Required hardware/software:**
@ -146,7 +146,7 @@ GPUs/node to use, as well as other options.

**Speed-ups to expect:**

The performance of a GPU versus a multi-core CPU is a function of your
The performance of a GPU versus a multicore CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed). Using the GPU package
in OpenCL mode on CPUs (which uses vectorization and multithreading) is
@ -174,7 +174,7 @@ deterministic results.
**Guidelines for best performance:**

* Using multiple MPI tasks per GPU will often give the best performance,
as allowed my most multi-core CPU/GPU configurations.
as allowed by most multicore CPU/GPU configurations.
* If the number of particles per MPI task is small (e.g. 100s of
particles), it can be more efficient to run with fewer MPI tasks per
GPU, even if you do not use all the cores on the compute node.
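
As a rough, illustrative sketch of these guidelines (the task and GPU
counts are invented, not taken from this page), several MPI tasks can
share the GPUs of a node either via the ``-sf gpu -pk gpu`` command-line
switches or with equivalent input-script lines:

.. parsed-literal::

   # hypothetical node: 8 MPI tasks launched by mpirun, sharing 2 GPUs
   package gpu 2
   suffix  gpu
   pair_style lj/cut 2.5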

@ -79,7 +79,7 @@ manner via the ``mpirun`` or ``mpiexec`` commands, and is independent of
Kokkos. E.g. the mpirun command in OpenMPI does this via its ``-np`` and
``-npernode`` switches. Ditto for MPICH via ``-np`` and ``-ppn``.

Running on a multi-core CPU
Running on a multicore CPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is a quick overview of how to use the KOKKOS package
@ -254,7 +254,7 @@ is recommended in this scenario.

Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
multi-core CPUs (and no GPU), if N is the number of physical cores/node,
multicore CPUs (and no GPU), if N is the number of physical cores/node,
then the number of MPI tasks/node should not exceed N.

.. parsed-literal::

@ -12,7 +12,7 @@ Required hardware/software
""""""""""""""""""""""""""

To enable multi-threading, your compiler must support the OpenMP interface.
You should have one or more multi-core CPUs, as multiple threads can only be
You should have one or more multicore CPUs, as multiple threads can only be
launched by each MPI task on the local node (using shared memory).
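
For example, an illustrative fragment (the thread count is invented)
that enables the OPENMP styles with 4 threads per MPI task; the same
effect is available via the ``-sf omp -pk omp 4`` command-line switches:

.. parsed-literal::

   package omp 4
   suffix  omp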

Building LAMMPS with the OPENMP package
@ -157,7 +157,7 @@ Additional performance tips are as follows:
affinity setting that restricts each MPI task to a single CPU core.
Using multi-threading in this mode will force all threads to share the
one core and thus is likely to be counterproductive. Instead, binding
MPI tasks to a (multi-core) socket, should solve this issue.
MPI tasks to a (multicore) socket should solve this issue.

Restrictions
""""""""""""

@ -113,7 +113,7 @@ your input script. LAMMPS does not use the group until a simulation
is run.

The *sort* keyword turns on a spatial sorting or reordering of atoms
within each processor's sub-domain every *Nfreq* timesteps. If
within each processor's subdomain every *Nfreq* timesteps. If
*Nfreq* is set to 0, then sorting is turned off. Sorting can improve
cache performance and thus speed-up a LAMMPS simulation, as discussed
in a paper by :ref:`(Meloni) <Meloni>`. Its efficacy depends on the problem

@ -54,7 +54,7 @@ Syntax
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
name = atom property name (without d\_ prefix)
*out* arg = filename
filename = write each processor's sub-domain to a file
filename = write each processor's subdomain to a file

Examples
""""""""
@ -72,14 +72,14 @@ Examples
Description
"""""""""""

This command adjusts the size and shape of processor sub-domains
This command adjusts the size and shape of processor subdomains
within the simulation box, to attempt to balance the number of atoms
or particles and thus indirectly the computational cost (load) more
evenly across processors. The load balancing is "static" in the sense
that this command performs the balancing once, before or between
simulations. The processor sub-domains will then remain static during
simulations. The processor subdomains will then remain static during
the subsequent run. To perform "dynamic" balancing, see the :doc:`fix
balance <fix_balance>` command, which can adjust processor sub-domain
balance <fix_balance>` command, which can adjust processor subdomain
sizes and shapes on-the-fly during a :doc:`run <run>`.
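
A minimal, illustrative sketch of the two approaches (thresholds and
frequencies are invented values):

.. parsed-literal::

   # static: rebalance once, before the run, if imbalance exceeds 1.1
   balance 1.1 shift xy 20 1.05

   # dynamic: re-check every 1000 steps during the run
   fix 2 all balance 1000 1.05 shift xy 10 1.05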

Load-balancing is typically most useful if the particles in the
@ -90,7 +90,7 @@ an irregular-shaped geometry containing void regions, or :doc:`hybrid
pair style simulations <pair_hybrid>` which combine pair styles with
different computational cost. In these cases, the LAMMPS default of
dividing the simulation box volume into a regular-spaced grid of 3d
bricks, with one equal-volume sub-domain per processor, may assign
bricks, with one equal-volume subdomain per processor, may assign
numbers of particles per processor in a way that the computational
effort varies significantly. This can lead to poor performance when
the simulation is run in parallel.
@ -109,7 +109,7 @@ Specifically, for a Px by Py by Pz grid of processors, it allows
choice of Px, Py, and Pz, subject to the constraint that Px \* Py \*
Pz = P, the total number of processors. This is sufficient to achieve
good load-balance for some problems on some processor counts.
However, all the processor sub-domains will still have the same shape
However, all the processor subdomains will still have the same shape
and same volume.

The requested load-balancing operation is only performed if the
@ -162,7 +162,7 @@ fractions of the box length) are also printed.
simulation could run up to 20% faster if it were perfectly balanced,
versus when imbalanced. However, computational cost is not strictly
proportional to particle count, and changing the relative size and
shape of processor sub-domains may lead to additional computational
shape of processor subdomains may lead to additional computational
and communication overheads, e.g. in the PPPM solver used via the
:doc:`kspace_style <kspace_style>` command. Thus you should benchmark
the run times of a simulation before and after balancing.
@ -177,7 +177,7 @@ The *x*, *y*, *z*, and *shift* styles are "grid" methods which
produce a logical 3d grid of processors. They operate by changing the
cutting planes (or lines) between processors in 3d (or 2d), to adjust
the volume (area in 2d) assigned to each processor, as in the
following 2d diagram where processor sub-domains are shown and
following 2d diagram where processor subdomains are shown and
particles are colored by the processor that owns them.

.. |balance1| image:: img/balance_uniform.jpg
@ -226,7 +226,7 @@ The *x*, *y*, and *z* styles invoke a "grid" method for balancing, as
described above. Note that any or all of these 3 styles can be
specified together, one after the other, but they cannot be used with
any other style. This style adjusts the position of cutting planes
between processor sub-domains in specific dimensions. Only the
between processor subdomains in specific dimensions. Only the
specified dimensions are altered.
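
For instance, an illustrative fragment (cut positions invented) that
spaces the x cuts uniformly while placing three interior y cuts
explicitly:

.. parsed-literal::

   balance 1.0 x uniform y 0.4 0.5 0.6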

The *uniform* argument spaces the planes evenly, as in the left
@ -245,8 +245,8 @@ the cutting place. The left (or lower) edge of the box is 0.0, and
the right (or upper) edge is 1.0. Neither of these values is
specified. Only the interior Ps-1 positions are specified. Thus if
there are 2 processors in the x dimension, you specify a single value
such as 0.75, which would make the left processor's sub-domain 3x
larger than the right processor's sub-domain.
such as 0.75, which would make the left processor's subdomain 3x
larger than the right processor's subdomain.
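
A hedged sketch of exactly that case, assuming two processors along x:

.. parsed-literal::

   processors 2 1 1
   balance 1.0 x 0.75    # left subdomain becomes 3x wider than the right one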

----------

@ -288,10 +288,10 @@ adjacent planes are closer together than the neighbor skin distance
(as specified by the :doc:`neigh_modify <neigh_modify>` command), then
the plane positions are shifted to separate them by at least this
amount. This is to prevent particles being lost when dynamics are run
with processor sub-domains that are too narrow in one or more
with processor subdomains that are too narrow in one or more
dimensions.

Once the re-balancing is complete and final processor sub-domains
Once the re-balancing is complete and final processor subdomains
assigned, particles are migrated to their new owning processor, and
the balance procedure ends.

@ -299,7 +299,7 @@ the balance procedure ends.

At each re-balance operation, the bisectioning for each cutting
plane (line in 2d) typically starts with low and high bounds separated
by the extent of a processor's sub-domain in one dimension. The size
by the extent of a processor's subdomain in one dimension. The size
of this bracketing region shrinks by 1/2 every iteration. Thus if
*Niter* is specified as 10, the cutting plane will typically be
positioned to 1 part in 1000 accuracy (relative to the perfect target
@ -494,7 +494,7 @@ different kinds of custom atom vectors or arrays as arguments.

The *out* keyword writes a text file to the specified *filename* with
the results of the balancing operation. The file contains the bounds
of the sub-domain for each processor after the balancing operation
of the subdomain for each processor after the balancing operation
completes. The format of the file is compatible with the
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
visualizing mesh files. An example is shown here for a balancing by 4
@ -538,7 +538,7 @@ processors for a 2d problem:
4 1 13 14 15 16

The coordinates of all the vertices are listed in the NODES section, 5
per processor. Note that the 4 sub-domains share vertices, so there
per processor. Note that the 4 subdomains share vertices, so there
will be duplicate nodes in the list.

The "SQUARES" section lists the node IDs of the 4 vertices in a

@ -61,7 +61,7 @@ move. Note that when the difference between the current box dimensions
and the shrink-wrap box dimensions is large, this can lead to lost
atoms at the beginning of a run when running in parallel. This is due
to the large change in the (global) box dimensions also causing
significant changes in the individual sub-domain sizes. If these
significant changes in the individual subdomain sizes. If these
changes are farther than the communication cutoff, atoms will be lost.
This is best addressed by setting initial box dimensions to match the
shrink-wrapped dimensions more closely, by using *m* style boundaries

@ -62,7 +62,7 @@ distances are used to determine which atoms to communicate.

The default mode is *single* which means each processor acquires
information for ghost atoms that are within a single distance from its
sub-domain. The distance is by default the maximum of the neighbor
subdomain. The distance is by default the maximum of the neighbor
cutoff across all atom type pairs.
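
As an illustrative fragment (the cutoff value is invented), the ghost
communication distance in *single* mode can be extended with the
*cutoff* keyword described below:

.. parsed-literal::

   comm_modify mode single cutoff 12.0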

For many systems this is an efficient algorithm, but for systems with
@ -81,7 +81,7 @@ with both the *multi* and *multi/old* neighbor styles.

The *cutoff* keyword allows you to extend the ghost cutoff distance
for communication mode *single*, which is the distance from the borders
of a processor's sub-domain at which ghost atoms are acquired from other
of a processor's subdomain at which ghost atoms are acquired from other
processors. By default the ghost cutoff = neighbor cutoff = pairwise
force cutoff + neighbor skin. See the :doc:`neighbor <neighbor>` command
for more information about the skin distance. If the specified Rcut is

@ -54,7 +54,7 @@ per atom, e.g. a list of bond distances. Per-grid quantities are
calculated on a regular 2d or 3d grid which overlays a 2d or 3d
simulation domain. The grid points and the data they store are
distributed across processors; each processor owns the grid points
which fall within its sub-domain.
which fall within its subdomain.

Computes that produce per-atom quantities have the word "atom" at the
end of their style, e.g. *ke/atom*\ . Computes that produce local

@ -48,9 +48,9 @@ the virial, equal to :math:`-dU/dV`, computed for all pairwise as well
as 2-body, 3-body, 4-body, many-body, and long-range interactions, where
:math:`\vec r_i` and :math:`\vec f_i` are the position and force vector
of atom *i*, and the dot indicates the dot product (scalar product).
This is computed in parallel for each sub-domain and then summed over
This is computed in parallel for each subdomain and then summed over
all parallel processes. Thus :math:`N'` necessarily includes atoms from
neighboring sub-domains (so-called ghost atoms) and the position and
neighboring subdomains (so-called ghost atoms) and the position and
force vectors of ghost atoms are thus included in the summation. Only
when running in serial and without periodic boundary conditions is
:math:`N' = N` the number of atoms in the system. :doc:`Fixes <fix>`

@ -39,7 +39,7 @@ Description
Define a computation that stores the specified attributes of a
distributed grid. In LAMMPS, distributed grids are regular 2d or 3d
grids which overlay a 2d or 3d simulation domain. Each processor owns
the grid cells whose center points lie within its sub-domain. See the
the grid cells whose center points lie within its subdomain. See the
:doc:`Howto grid <Howto_grid>` doc page for details of how distributed
grids can be defined by various commands and referenced.
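
For illustration only, assuming this page describes the *property/grid*
compute and that *id*, *ix*, *iy*, and *iz* are among its supported
attributes, a small distributed grid might be defined as:

.. parsed-literal::

   compute 1 all property/grid 10 10 10 id ix iy iz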


@ -259,7 +259,7 @@ layout in the global array.
Compute *sna/grid/local* calculates bispectrum components of a regular
grid of points similarly to compute *sna/grid* described above.
However, because the array is local, it contains only rows for grid points
that are local to the processor sub-domain. The global grid
that are local to the processor subdomain. The global grid
of :math:`nx \times ny \times nz` points is still laid out in space the same as for *sna/grid*,
but grid points are strictly partitioned, so that every grid point appears in
one and only one local array. The array contains one row for each of the

@ -80,9 +80,9 @@ Syntax
axes = *yes* or *no* = do or do not draw xyz axes lines next to simulation box
length = length of axes lines as fraction of respective box lengths
diam = diameter of axes lines as fraction of shortest box length
*subbox* values = lines diam = draw outline of processor sub-domains
lines = *yes* or *no* = do or do not draw sub-domain lines
diam = diameter of sub-domain lines as fraction of shortest box length
*subbox* values = lines diam = draw outline of processor subdomains
lines = *yes* or *no* = do or do not draw subdomain lines
diam = diameter of subdomain lines as fraction of shortest box length
*shiny* value = sfactor = shinyness of spheres and cylinders
sfactor = shinyness of spheres and cylinders from 0.0 to 1.0
*ssao* value = shading seed dfactor = SSAO depth shading
@ -145,7 +145,7 @@ Syntax
*bitrate* arg = rate
rate = target bitrate for movie in kbps
*boxcolor* arg = color
color = name of color for simulation box lines and processor sub-domain lines
color = name of color for simulation box lines and processor subdomain lines
*color* args = name R G B
name = name of color
R,G,B = red/green/blue numeric values from 0.0 to 1.0
@ -581,13 +581,13 @@ respective box lengths. The *diam* setting determines their thickness
as a fraction of the shortest box length in x,y,z (for 3d) or x,y (for
2d).

The *subbox* keyword determines if and how processor sub-domain
The *subbox* keyword determines if and how processor subdomain
boundaries are rendered as thin cylinders in the image. If *no* is
set (default), then the sub-domain boundaries are not drawn and the
set (default), then the subdomain boundaries are not drawn and the
*diam* setting is ignored. If *yes* is set, the 12 edges of each
processor sub-domain are drawn, with a diameter that is a fraction of
processor subdomain are drawn, with a diameter that is a fraction of
the shortest box length in x,y,z (for 3d) or x,y (for 2d). The color
of the sub-domain boundaries can be set with the "dump_modify
of the subdomain boundaries can be set with the "dump_modify
boxcolor" command.

----------
@ -921,8 +921,8 @@ formats.

The *boxcolor* keyword sets the color of the simulation box drawn
around the atoms in each image as well as the color of processor
sub-domain boundaries. See the "dump image box" command for how to
specify that a box be drawn via the *box* keyword, and the sub-domain
subdomain boundaries. See the "dump image box" command for how to
specify that a box be drawn via the *box* keyword, and the subdomain
boundaries via the *subbox* keyword. The color name can be any of the
140 pre-defined colors (see below) or a color name defined by the
dump_modify color option.

@ -89,7 +89,7 @@ owns, but there may be zero or more per atoms. Per-grid quantities
are calculated on a regular 2d or 3d grid which overlays a 2d or 3d
simulation domain. The grid points and the data they store are
distributed across processors; each processor owns the grid points
which fall within its sub-domain.
which fall within its subdomain.

Note that a single fix typically produces either global or per-atom or
local or per-grid values (or none at all). It does not produce both

@ -84,7 +84,7 @@ produced by other computes or fixes. This fix operates in either
per-grid inputs in the same command.

The grid created by this command is distributed; each processor owns
the grid points that are within its sub-domain. This is similar to
the grid points that are within its subdomain. This is similar to
the :doc:`fix ave/chunk <fix_ave_chunk>` command when it uses chunks
from the :doc:`compute chunk/atom <compute_chunk_atom>` command which
are 2d or 3d regular bins. However, the per-bin outputs in that case

@ -44,7 +44,7 @@ Syntax
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
name = atom property name (without d\_ prefix)
*out* arg = filename
filename = write each processor's sub-domain to a file, at each re-balancing
filename = write each processor's subdomain to a file, at each re-balancing

Examples
""""""""
@ -61,7 +61,7 @@ Examples
Description
"""""""""""

This command adjusts the size and shape of processor sub-domains
This command adjusts the size and shape of processor subdomains
within the simulation box, to attempt to balance the number of
particles and thus the computational cost (load) evenly across
processors. The load balancing is "dynamic" in the sense that
@ -77,7 +77,7 @@ an irregular-shaped geometry containing void regions, or
:doc:`hybrid pair style simulations <pair_hybrid>` that combine
pair styles with different computational cost). In these cases, the
LAMMPS default of dividing the simulation box volume into a
regular-spaced grid of 3d bricks, with one equal-volume sub-domain
regular-spaced grid of 3d bricks, with one equal-volume subdomain
per processor, may assign numbers of particles per processor in a
way that the computational effort varies significantly. This can
lead to poor performance when the simulation is run in parallel.
@ -105,7 +105,7 @@ a :math:`P_x \times P_y \times P_z` grid of processors, it allows choices of
:math:`P_x P_y P_z = P`, the total number of processors.
This is sufficient to achieve good load-balance for
some problems on some processor counts. However, all the processor
sub-domains will still have the same shape and the same volume.
subdomains will still have the same shape and the same volume.

On a particular time step, a load-balancing operation is only performed
if the current "imbalance factor" in particles owned by each processor
@ -141,7 +141,7 @@ forced even if the current balance is perfect (1.0) be specifying a
simulation could run up to 20% faster if it were perfectly balanced,
versus when imbalanced. However, computational cost is not strictly
proportional to particle count, and changing the relative size and
shape of processor sub-domains may lead to additional computational
shape of processor subdomains may lead to additional computational
and communication overheads (e.g., in the PPPM solver used via the
:doc:`kspace_style <kspace_style>` command). Thus, you should benchmark
the run times of a simulation before and after balancing.
@ -156,7 +156,7 @@ The *shift* style is a "grid" method which produces a logical 3d grid
of processors. It operates by changing the cutting planes (or lines)
between processors in 3d (or 2d), to adjust the volume (area in 2d)
assigned to each processor, as in the following 2d diagram where
processor sub-domains are shown and atoms are colored by the processor
processor subdomains are shown and atoms are colored by the processor
that owns them.

.. |balance1| image:: img/balance_uniform.jpg
@ -258,7 +258,7 @@ from balanced, and converge more slowly. In this case you probably
want to use the :doc:`balance <balance>` command before starting a run,
so that you begin the run with a balanced system.

Once the re-balancing is complete and final processor sub-domains
Once the re-balancing is complete and final processor subdomains
assigned, particles migrate to their new owning processor as part of
the normal reneighboring procedure.

@ -266,7 +266,7 @@ the normal reneighboring procedure.

At each re-balance operation, the bisectioning for each cutting
plane (line in 2d) typically starts with low and high bounds separated
by the extent of a processor's sub-domain in one dimension. The size
by the extent of a processor's subdomain in one dimension. The size
of this bracketing region shrinks based on the local density, as
described above, which should typically be 1/2 or more every
iteration. Thus if :math:`N_\text{iter}` is specified as 10, the cutting
@ -310,7 +310,7 @@ in that sub-box.

The *out* keyword writes text to the specified *filename* with the
results of each re-balancing operation. The file contains the bounds
of the sub-domain for each processor after the balancing operation
of the subdomain for each processor after the balancing operation
completes. The format of the file is compatible with the
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
visualizing mesh files. An example is shown here for a balancing by four
@ -354,7 +354,7 @@ processors for a 2d problem:
4 1 13 14 15 16

The coordinates of all the vertices are listed in the NODES section, five
per processor. Note that the four sub-domains share vertices, so there
per processor. Note that the four subdomains share vertices, so there
will be duplicate nodes in the list.

The "SQUARES" section lists the node IDs of the four vertices in a

@ -118,7 +118,7 @@ displaced by the same amount, different on each iteration.
all. Also note that if the box shape tilts to an extreme shape,
LAMMPS will run less efficiently, due to the large volume of
communication needed to acquire ghost atoms around a processor's
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
also lose atoms and generate an error.

.. note::

@ -546,7 +546,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
you apply large deformations, this means the box shape can tilt
dramatically LAMMPS will run less efficiently, due to the large volume
of communication needed to acquire ghost atoms around a processor's
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
also lose atoms and generate an error.

The *units* keyword determines the meaning of the distance units used

@ -198,7 +198,7 @@ dt}{\rho dx^2}` is approximately equal to 1.
and a simulation domain size. This fix uses the same subdivision of
the simulation domain among processors as the main LAMMPS program. In
order to uniformly cover the simulation domain with lattice sites, the
lengths of the individual LAMMPS sub-domains must all be evenly
lengths of the individual LAMMPS subdomains must all be evenly
divisible by :math:`dx_{LB}`. If the simulation domain size is cubic,
with equal lengths in all dimensions, and the default value for
:math:`dx_{LB}` is used, this will automatically be satisfied.

@ -371,7 +371,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
applied stress induces large deformations (e.g. in a liquid), this
means the box shape can tilt dramatically and LAMMPS will run less
efficiently, due to the large volume of communication needed to
acquire ghost atoms around a processor's irregular-shaped sub-domain.
acquire ghost atoms around a processor's irregular-shaped subdomain.
For extreme values of tilt, LAMMPS may also lose atoms and generate an
error.


@ -311,7 +311,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
applied stress induces large deformations (e.g. in a liquid), this
means the box shape can tilt dramatically and LAMMPS will run less
efficiently, due to the large volume of communication needed to
acquire ghost atoms around a processor's irregular-shaped sub-domain.
acquire ghost atoms around a processor's irregular-shaped subdomain.
For extreme values of tilt, LAMMPS may also lose atoms and generate an
error.


@ -69,7 +69,7 @@ geometries.
This fix must be used with an additional fix that specifies time
integration, e.g. :doc:`fix nve <fix_nve>` or :doc:`fix nph <fix_nh>`.

The Shardlow splitting algorithm requires the sizes of the sub-domain
The Shardlow splitting algorithm requires the sizes of the subdomain
lengths to be larger than twice the cutoff+skin. Generally, the
domain decomposition is dependent on the number of processors
requested.

@ -90,7 +90,7 @@ The description in this sub-section applies to all 3 fix styles:
*ttm*, *ttm/grid*, and *ttm/mod*.

Fix *ttm/grid* distributes the regular grid across processors consistent
with the sub-domains of atoms owned by each processor, but is otherwise
with the subdomains of atoms owned by each processor, but is otherwise
identical to fix ttm. Note that fix *ttm* stores a copy of the grid on
each processor, which is acceptable when the overall grid is reasonably
small. For larger grids you should use fix *ttm/grid* instead.
@ -170,11 +170,11 @@ ttm/mod.
periodic boundary conditions in all dimensions. They also require
that the size and shape of the simulation box do not vary
dynamically, e.g. due to use of the :doc:`fix npt <fix_nh>` command.
Likewise, the size/shape of processor sub-domains cannot vary due to
Likewise, the size/shape of processor subdomains cannot vary due to
dynamic load-balancing via use of the :doc:`fix balance
<fix_balance>` command. It is possible however to load balance
before the simulation starts using the :doc:`balance <balance>`
command, so that each processor has a different size sub-domain.
command, so that each processor has a different size subdomain.

Periodic boundary conditions are also used in the heat equation solve
for the electronic subsystem. This varies from the approach of

@ -399,7 +399,7 @@ automatically throughout the run. This typically give performance
within 5 to 10 percent of the optimal fixed fraction.

The *ghost* keyword determines whether or not ghost atoms, i.e. atoms
at the boundaries of processor sub-domains, are offloaded for neighbor
at the boundaries of processor subdomains, are offloaded for neighbor
and force calculations. When the value = "no", ghost atoms are not
offloaded. This option can reduce the amount of data transfer with
the co-processor and can also overlap MPI communication of forces with
@ -521,7 +521,7 @@ the comm keywords.
The value options for the keywords are *no* or *host* or *device*\ . A
value of *no* means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of *host* means to
use the host, typically a multi-core CPU, and perform the
use the host, typically a multicore CPU, and perform the
packing/unpacking in parallel with threads. A value of *device* means to
use the device, typically a GPU, to perform the packing/unpacking
operation.
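
For example, an illustrative input-script line that performs the
pack/unpack on the device; the same setting is available via
``-pk kokkos comm device`` on the command line:

.. parsed-literal::

   package kokkos comm device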

@ -56,7 +56,7 @@ commands:
The global DSMC *max_cell_size* determines the maximum cell length
used in the DSMC calculation. A structured mesh is overlayed on the
simulation box such that an integer number of cells are created in
each direction for each processor's sub-domain. Cell lengths are
each direction for each processor's subdomain. Cell lengths are
adjusted up to the user-specified maximum cell size.

----------

@ -31,7 +31,7 @@ and the neighbor skin distance (see the documentation of the
<comm_modify>` command). When you have bonds, angles, dihedrals, or
impropers defined at the same time, you must set the communication
cutoff so that communication cutoff distance is large enough to acquire
and communicate sufficient ghost atoms from neighboring sub-domains as
and communicate sufficient ghost atoms from neighboring subdomains as
needed for computing bonds, angles, etc.

A pair style of *none* will also not request a pairwise neighbor list.
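
A minimal sketch of that situation (the cutoff value is invented): with
no pair style to define a communication distance, an explicit ghost
cutoff keeps enough ghost atoms for the bonded terms:

.. parsed-literal::

   pair_style  none
   comm_modify cutoff 5.0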

@ -66,7 +66,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
will choose the number of processors in that dimension of the grid.
It will do this based on the size and shape of the global simulation
box so as to minimize the surface-to-volume ratio of each processor's
sub-domain.
subdomain.
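
For instance, an illustrative line for a quasi-2d system that fixes one
processor along z and lets LAMMPS choose the other two factors:

.. parsed-literal::

   processors * * 1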

Choosing explicit values for Px or Py or Pz can be used to override
the default manner in which LAMMPS will create the regular 3d grid of
@ -81,7 +81,7 @@ equal 1.
Note that if you run on a prime number of processors P, then a grid
such as 1 x P x 1 will be required, which may incur extra
communication costs due to the high surface area of each processor's
sub-domain.
subdomain.

Also note that if multiple partitions are being used then P is the
number of processors in this partition; see the :doc:`-partition command-line switch <Run_options>` page for details. Also note
@ -113,10 +113,10 @@ will persist for all simulations. If balancing is performed, some of
the methods invoked by those commands retain the logical topology of
the initial 3d grid, and the mapping of processors to the grid
specified by the processors command. However the grid spacings in
different dimensions may change, so that processors own sub-domains of
different dimensions may change, so that processors own subdomains of
different sizes. If the :doc:`comm_style tiled <comm_style>` command is
used, methods invoked by the balancing commands may discard the 3d
grid of processors and tile the simulation domain with sub-domains of
grid of processors and tile the simulation domain with subdomains of
different sizes and shapes which no longer have a logical 3d
connectivity. If that occurs, all the information specified by the
processors command is ignored.
@ -129,7 +129,7 @@ processors.

The *onelevel* style creates a 3d grid that is compatible with the
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
each processor's sub-domain, as described above. The mapping of
each processor's subdomain, as described above. The mapping of
processors to the grid is determined by the *map* keyword setting.

The *twolevel* style can be used on machines with multicore nodes to
@ -145,7 +145,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
will choose the number of cores in that dimension of the node's
sub-grid. As with Px,Py,Pz, it will do this based on the size and
shape of the global simulation box so as to minimize the
surface-to-volume ratio of each processor's sub-domain.
surface-to-volume ratio of each processor's subdomain.

.. note::


@ -16,7 +16,7 @@ nx,ny,nz = replication factors in each dimension

.. parsed-literal::

*bbox* = only check atoms in replicas that overlap with a processor's sub-domain
*bbox* = only check atoms in replicas that overlap with a processor's subdomain

Examples
""""""""
@ -52,7 +52,7 @@ image flags that differ by 1. This will allow the bond to be
unwrapped appropriately.

The optional keyword *bbox* uses a bounding box to only check atoms in
replicas that overlap with a processor's sub-domain when assigning
replicas that overlap with a processor's subdomain when assigning
atoms to processors. It typically results in a substantial speedup
when using the replicate command on a large number of processors. It
does require temporary use of more memory, specifically that each

@ -64,7 +64,7 @@ The *lost* keyword determines whether LAMMPS checks for lost atoms each
time it computes thermodynamics and what it does if atoms are lost. An
atom can be "lost" if it moves across a non-periodic simulation box
:doc:`boundary <boundary>` or if it moves more than a box length outside
the simulation domain (or more than a processor sub-domain length)
the simulation domain (or more than a processor subdomain length)
before reneighboring occurs. The latter case is typically due to bad
dynamics (e.g., too large a time step and/or huge forces and velocities). If
the value is *ignore*, LAMMPS does not check for lost atoms. If the

@ -3432,6 +3432,8 @@ Subclassed
subcutoff
subcycle
subcycling
subdomain
subdomains
subhi
sublo
Subramaniyan