break large file into multiple smaller files by section and add toctree

This commit is contained in:
Axel Kohlmeyer
2021-09-05 21:57:03 -04:00
parent 94f03f169f
commit 4fc9753a69
5 changed files with 376 additions and 363 deletions

View File

@ -0,0 +1,120 @@
Communication
^^^^^^^^^^^^^
Following the partitioning scheme in use, all per-atom data (atom IDs,
positions, velocities, types, etc.) is distributed across the MPI
processes, which allows LAMMPS to handle very large systems, provided a
correspondingly large number of MPI processes is used. To compute the
short-range interactions, MPI processes need access not only to the data
of atoms they "own", but also to information about atoms from
neighboring sub-domains, referred to in LAMMPS as "ghost" atoms. These
are copies of atoms up to the communication cutoff distance away that
store the required per-atom data. The green dashed-line boxes in the
:ref:`domain-decomposition` figure illustrate the extended ghost-atom
sub-domain for one processor.
This approach is also used to implement periodic boundary
conditions: atoms that lie within the cutoff distance across a periodic
boundary are also stored as ghost atoms and taken from the periodic
replication of the sub-domain, which may be the same sub-domain, e.g. if
running in serial. As a consequence of this, force computation in
LAMMPS is not subject to minimum image conventions and thus cutoffs may
be larger than half the simulation domain.
.. _ghost-atom-comm:
.. figure:: img/ghost-comm.png
:align: center
ghost atom communication
This figure shows the ghost atom communication patterns between
sub-domains for "brick" (left) and "tiled" communication styles for
2d simulations. The numbers indicate MPI process ranks. Here the
sub-domains are drawn spatially separated for clarity. The
dashed-line box is the extended sub-domain of processor 0 which
includes its ghost atoms. The red- and blue-shaded boxes are the
regions of communicated ghost atoms.
Efficient communication patterns are needed to update the "ghost" atom
data, since that needs to be done at every MD time step or minimization
step. The diagrams of the :ref:`ghost-atom-comm` figure illustrate how ghost
atom communication is performed in two stages for a 2d simulation (three
in 3d) for both a regular and irregular partitioning of the simulation
box. For the regular case (left) atoms are exchanged first in the
*x*-direction, then in *y*, with four neighbors in the grid of processor
sub-domains.
In the *x* stage, processor ranks 1 and 2 send owned atoms in their
red-shaded regions to rank 0 (and vice versa). Then in the *y* stage,
ranks 3 and 4 send atoms in their blue-shaded regions to rank 0, which
includes ghost atoms they received in the *x* stage. Rank 0 thus
acquires all its ghost atoms; atoms in the solid blue corner regions
are communicated twice before rank 0 receives them.
For the irregular case (right) the two stages are similar, but a
processor can have more than one neighbor in each direction. In the
*x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
rank 0 (and vice versa). These include only atoms between the lower
and upper *y*-boundary of rank 0's sub-domain. In the *y* stage, ranks
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
include ghost atoms they received in the *x* stage, but only if they
are needed by rank 0 to fill its extended ghost atom regions in the
+/-*y* directions (blue rectangles). Thus in this case, ranks 5 and
6 do not include ghost atoms they received from each other (in the *x*
stage) in the atoms they send to rank 0. The key point is that while
the pattern of communication is more complex in the irregular
partitioning case, it can still proceed in two stages (three in 3d)
via atom exchanges with only neighboring processors.
When attributes of owned atoms are sent to neighboring processors to
become attributes of their ghost atoms, LAMMPS calls this a "forward"
communication. On timesteps when atoms migrate to new owning processors
and neighbor lists are rebuilt, each processor creates a list of its
owned atoms which are ghost atoms in each of its neighbor processors.
These lists are used to pack per-atom coordinates (for example) into
message buffers in subsequent steps until the next reneighboring.
A "reverse" communication is when computed ghost atom attributes are
sent back to the processor who owns the atom. This is used (for
example) to sum partial forces on ghost atoms to the complete force on
owned atoms. The order of the two stages described in the
:ref:`ghost-atom-comm` figure is inverted and the same lists of atoms
are used to pack and unpack message buffers with per-atom forces. When
a received buffer is unpacked, the ghost forces are summed to owned atom
forces. As in forward communication, forces on atoms in the four blue
corners of the diagrams are sent, received, and summed twice (once at
each stage) before owning processors have the full force.
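
The following is a minimal sketch of the two operations for a single
exchange partner, assuming ghost-atom storage slots and the per-partner
``sendlist`` were already set up during the last reneighboring; the
struct and function names are invented for this illustration and, for
brevity, a single ``MPI_Sendrecv()`` replaces the
``MPI_Send()``/``MPI_Irecv()`` pattern mentioned in the list below.

.. code-block:: c++

   #include <mpi.h>
   #include <vector>

   // Simplified per-process atom storage: owned atoms first, ghosts appended.
   struct AtomData {
     std::vector<double> x, y, z;      // coordinates (owned + ghost)
     std::vector<double> fx, fy, fz;   // forces (owned + ghost)
   };

   // Forward communication: coordinates of the owned atoms listed in
   // 'sendlist' are packed, exchanged with 'partner', and unpacked into the
   // ghost slots starting at 'first_ghost' (nrecv of them).
   void forward_comm(AtomData &atom, const std::vector<int> &sendlist,
                     int partner, int first_ghost, int nrecv, MPI_Comm world)
   {
     std::vector<double> sendbuf, recvbuf(3 * nrecv);
     for (int i : sendlist) {
       sendbuf.push_back(atom.x[i]);
       sendbuf.push_back(atom.y[i]);
       sendbuf.push_back(atom.z[i]);
     }
     MPI_Sendrecv(sendbuf.data(), (int) sendbuf.size(), MPI_DOUBLE, partner, 0,
                  recvbuf.data(), (int) recvbuf.size(), MPI_DOUBLE, partner, 0,
                  world, MPI_STATUS_IGNORE);
     for (int m = 0; m < nrecv; ++m) {         // copy into ghost atoms
       atom.x[first_ghost + m] = recvbuf[3 * m + 0];
       atom.y[first_ghost + m] = recvbuf[3 * m + 1];
       atom.z[first_ghost + m] = recvbuf[3 * m + 2];
     }
   }

   // Reverse communication: forces accumulated on the same ghost atoms are
   // sent back and *summed* onto the owned atoms, reusing 'sendlist'.
   void reverse_comm(AtomData &atom, const std::vector<int> &sendlist,
                     int partner, int first_ghost, int nrecv, MPI_Comm world)
   {
     std::vector<double> sendbuf, recvbuf(3 * sendlist.size());
     for (int m = 0; m < nrecv; ++m) {
       sendbuf.push_back(atom.fx[first_ghost + m]);
       sendbuf.push_back(atom.fy[first_ghost + m]);
       sendbuf.push_back(atom.fz[first_ghost + m]);
     }
     MPI_Sendrecv(sendbuf.data(), (int) sendbuf.size(), MPI_DOUBLE, partner, 0,
                  recvbuf.data(), (int) recvbuf.size(), MPI_DOUBLE, partner, 0,
                  world, MPI_STATUS_IGNORE);
     int m = 0;
     for (int i : sendlist) {                  // sum onto owned atoms
       atom.fx[i] += recvbuf[3 * m + 0];
       atom.fy[i] += recvbuf[3 * m + 1];
       atom.fz[i] += recvbuf[3 * m + 2];
       ++m;
     }
   }

The essential asymmetry is visible here: forward communication *copies*
owned-atom values into ghost slots, while reverse communication
*accumulates* ghost values back onto the owned atoms.
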
These two operations are used in many places within LAMMPS aside from
exchange of coordinates and forces, for example by manybody potentials
to share intermediate per-atom values, or by rigid-body integrators to
enable each atom in a body to access body properties. Here are
additional details about how these communication operations are
performed in LAMMPS:
- When exchanging data with different processors, forward and reverse
communication is done using ``MPI_Send()`` and ``MPI_Irecv()`` calls.
If a processor is "exchanging" atoms with itself, only the pack and
unpack operations are performed, e.g. to create ghost atoms across
periodic boundaries when running on a single processor.
- For forward communication of owned atom coordinates, periodic box
lengths are added or subtracted when the receiving processor is
across a periodic boundary from the sender. There is then no need to
apply a minimum image convention when calculating distances between
atom pairs while building neighbor lists or computing forces (see the
sketch after this list).
- The cutoff distance for exchanging ghost atoms is typically equal to
the neighbor cutoff. But it can also be chosen to be longer if needed,
e.g. half the diameter of a rigid body composed of multiple atoms or
over 3x the length of a stretched bond for dihedral interactions. It
can also exceed the periodic box size. For the regular communication
pattern (left), if the cutoff distance extends beyond a neighbor
processor's sub-domain, then multiple exchanges are performed in the
same direction. Each exchange is with the same neighbor processor,
but buffers are packed/unpacked using a different list of atoms. For
forward communication, in the first exchange a processor sends only
owned atoms. In subsequent exchanges, it sends ghost atoms received
in previous exchanges. For the irregular pattern (right) overlaps of
a processor's extended ghost-atom sub-domain with all other processors
in each dimension are detected.
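
As a sketch of the periodic image handling from the second bullet above:
when the exchange crosses a periodic boundary, the pack step can simply
apply a pre-computed shift of one box length per affected dimension, so
the ghost coordinates arrive already in the correct periodic image. The
function and argument names here are invented for the illustration.

.. code-block:: c++

   #include <vector>

   // Pack coordinates of the owned atoms in 'sendlist' for a receiver that
   // sits across a periodic boundary.  'shift' holds 0 or +/- one box length
   // per dimension (e.g. {0.0, -yprd, 0.0}), pre-computed for this exchange,
   // so the receiver never needs a minimum image convention later on.
   void pack_forward_pbc(const std::vector<double> &x,   // interleaved x,y,z
                         const std::vector<int> &sendlist,
                         const double shift[3], std::vector<double> &buf)
   {
     buf.clear();
     for (int i : sendlist) {
       buf.push_back(x[3 * i + 0] + shift[0]);
       buf.push_back(x[3 * i + 1] + shift[1]);
       buf.push_back(x[3 * i + 2] + shift[2]);
     }
   }
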

View File

@ -0,0 +1,2 @@
Long-range interactions
^^^^^^^^^^^^^^^^^^^^^^^

View File

@ -0,0 +1,159 @@
Neighbor lists
^^^^^^^^^^^^^^
To compute forces efficiently, each processor creates a Verlet-style
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
*j* = owned or ghost) with separation less than the applicable
neighbor list cutoff distance. In LAMMPS the neighbor lists are stored
in a multiple-page data structure; each page is a contiguous chunk of
memory which stores vectors of neighbor atoms *j* for many *i* atoms.
This allows pages to be incrementally allocated or deallocated in blocks
as needed. Neighbor lists typically consume the most memory of any data
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
every few timesteps, then used repeatedly each step for force or other
computations. The neighbor cutoff distance is :math:`R_n = R_f +
\Delta_s`, where :math:`R_f` is the (largest) force cutoff defined by
the interatomic potential for computing short-range pairwise or manybody
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
be used for multiple steps assuming that atoms do not move very far
between consecutive time steps. Typically the code triggers
reneighboring when any atom has moved half the skin distance since the
last reneighboring; this and other options of the neighbor list rebuild
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
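
A minimal sketch of this half-skin trigger, assuming a copy of the
coordinates from the last rebuild is kept; the names are illustrative
and not the actual LAMMPS implementation.

.. code-block:: c++

   #include <cstddef>

   // Return true if any owned atom has moved more than half the skin distance
   // since the last neighbor list rebuild.  'x' and 'xhold' are interleaved
   // x,y,z coordinates now and at the time of the last rebuild.
   bool need_reneighbor(const double *x, const double *xhold,
                        std::size_t nlocal, double skin)
   {
     const double trigger_sq = 0.25 * skin * skin;   // (skin/2)^2
     for (std::size_t i = 0; i < nlocal; ++i) {
       const double dx = x[3 * i + 0] - xhold[3 * i + 0];
       const double dy = x[3 * i + 1] - xhold[3 * i + 1];
       const double dz = x[3 * i + 2] - xhold[3 * i + 2];
       if (dx * dx + dy * dy + dz * dz > trigger_sq) return true;
     }
     return false;
   }

In a parallel run the per-processor result must additionally be combined
across all MPI ranks (e.g. with an ``MPI_Allreduce()``) so that every
processor rebuilds its lists on the same step.
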
On steps when reneighboring is performed, atoms which have moved outside
their owning processor's sub-domain are first migrated to new processors
via communication. Periodic boundary conditions are also (only)
enforced on these steps to ensure each atom is re-assigned to the
correct processor. After migration, the atoms owned by each processor
are stored in a contiguous vector. Periodically each processor
spatially sorts owned atoms within its vector to reorder it for improved
cache efficiency in force computations and neighbor list building. To
do so, atoms are spatially binned and then reordered so that atoms in
the same bin are adjacent in the vector. Atom sorting can be disabled or
its settings modified with the :doc:`atom_modify <atom_modify>` command.
.. _neighbor-stencil:
.. figure:: img/neigh-stencil.png
:align: center
neighbor list stencils
A 2d simulation sub-domain (thick black line) and the corresponding
ghost atom cutoff region (dashed blue line) for both orthogonal
(left) and triclinic (right) domains. A regular grid of neighbor
bins (thin lines) overlays the entire simulation domain and need not
align with sub-domain boundaries; only the portion overlapping the
augmented sub-domain is shown. In the triclinic case it overlaps the
bounding box of the tilted rectangle. The blue- and red-shaded bins
represent a stencil of bins searched to find neighbors of a particular
atom (black dot).
To build a local neighbor list in linear time, the simulation domain is
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
single MPI processor's sub-domain. Each processor stores a set of
neighbor bins which overlap its sub-domain extended by the neighbor
cutoff distance :math:`R_n`. As illustrated, the bins need not align
with processor boundaries; an integer number of bins in each dimension
is fitted to the size of the entire simulation box.
Most often LAMMPS builds what it calls a "half" neighbor list where
each *i,j* neighbor pair is stored only once, with either atom *i* or
*j* as the central atom. The build can be done efficiently by using a
pre-computed "stencil" of bins around a central origin bin which
contains the atom whose neighbors are being searched for. A stencil
is simply a list of integer offsets in *x,y,z* of nearby bins
surrounding the origin bin which are close enough to contain any
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
origin bin. Note that for a half neighbor list, the stencil can be
asymmetric, since each atom only needs to store half of its nearby neighbors.
These stencils are illustrated in the figure for a half list and a bin
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
63: 13 in the plane of bins that contains the origin bin and 25 in each
of the two planes above it in the *z* direction (75 for triclinic). The
triclinic stencil has extra bins because the bins tile the bounding box
of the entire triclinic domain and thus are not periodic
with respect to the simulation box itself. The stencil and logic for
determining which *i,j* pairs to include in the neighbor list are
altered slightly to account for this.
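
The stencil can be pre-computed once from the bin geometry. The sketch
below builds such a half-list stencil for a 2d orthogonal box: it keeps
only bins that are not "behind" the origin bin and whose closest
approach to the origin bin is less than :math:`R_n`. For a bin size of
:math:`\frac{1}{2} R_n` this yields exactly the 13 bins of the figure.
The function is an illustration only; the stencil creation in LAMMPS
also covers 3d, triclinic, and multi-cutoff cases.

.. code-block:: c++

   #include <cmath>
   #include <utility>
   #include <vector>

   // Build a 2d half-list stencil: integer bin offsets (dx,dy) relative to the
   // origin bin that (a) are not "behind" the origin bin, so that each pair is
   // found only once, and (b) can contain a neighbor within the cutoff rn.
   std::vector<std::pair<int, int>> half_stencil_2d(double rn, double binsize)
   {
     // closest possible approach between the origin bin and a bin offset by n
     auto closest = [binsize](int n) {
       if (n > 0) return (n - 1) * binsize;
       if (n < 0) return (n + 1) * binsize;
       return 0.0;
     };
     const int range = (int) std::ceil(rn / binsize);
     std::vector<std::pair<int, int>> stencil;
     for (int dy = 0; dy <= range; ++dy)
       for (int dx = -range; dx <= range; ++dx) {
         if (dy == 0 && dx < 0) continue;      // skip bins "behind" the origin
         const double cx = closest(dx), cy = closest(dy);
         if (cx * cx + cy * cy < rn * rn) stencil.emplace_back(dx, dy);
       }
     return stencil;
   }
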
To build a neighbor list, a processor first loops over its "owned" plus
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
vector to create a linked list of atom indices within each bin. It then
performs a triply-nested loop over its owned atoms *i*, the stencil of
bins surrounding atom *i*'s bin, and the *j* atoms in each stencil bin
(including ghost atoms). If the distance :math:`r_{ij} < R_n`, then
atom *j* is added to the vector of atom *i*'s neighbors.
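
Conceptually, the bin-and-search procedure looks like the sketch below.
It assumes a flattened bin index, a bin grid that covers the sub-domain
extended by the ghost region (so that adding a stencil offset stays in
range for owned atoms), and a pre-computed stencil of flattened
bin-index offsets; Newton and ghost-pair details are simplified, and the
names are not those of the actual LAMMPS classes.

.. code-block:: c++

   #include <vector>

   // Conceptual binned build of a half neighbor list.  'x' holds interleaved
   // x,y,z coordinates for nlocal owned atoms followed by ghosts (nall total),
   // 'coord2bin' maps a position to a flattened bin index, and 'stencil' holds
   // flattened bin-index offsets relative to an atom's own bin.
   void build_half_list(const std::vector<double> &x, int nlocal, int nall,
                        double rn, int nbins, int (*coord2bin)(const double *),
                        const std::vector<int> &stencil,
                        std::vector<std::vector<int>> &neigh)
   {
     // 1. assign owned + ghost atoms to bins via a linked list per bin
     std::vector<int> binhead(nbins, -1), next(nall, -1);
     for (int i = 0; i < nall; ++i) {
       const int ib = coord2bin(&x[3 * i]);
       next[i] = binhead[ib];
       binhead[ib] = i;
     }
     // 2. for each owned atom, search the stencil of bins around its own bin
     const double cutsq = rn * rn;
     neigh.assign(nlocal, {});
     for (int i = 0; i < nlocal; ++i) {
       const int ib = coord2bin(&x[3 * i]);
       for (int offset : stencil) {
         for (int j = binhead[ib + offset]; j >= 0; j = next[j]) {
           // inside the origin bin only pairs with j > i are taken, so that
           // each owned-owned pair is stored once; the other stencil bins
           // already lie "ahead" of the origin bin
           if (offset == 0 && j <= i) continue;
           const double dx = x[3 * i + 0] - x[3 * j + 0];
           const double dy = x[3 * i + 1] - x[3 * j + 1];
           const double dz = x[3 * i + 2] - x[3 * j + 2];
           if (dx * dx + dy * dy + dz * dz < cutsq) neigh[i].push_back(j);
         }
       }
     }
   }

Because every atom is visited once and only a constant number of nearby
bins is searched per atom, the cost of the build scales linearly with
the number of atoms per processor.
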
Here are additional details about neighbor list build options LAMMPS
supports:
- The choice of bin size is an option; a size half of :math:`R_n` has
been found to be optimal for many typical cases. Smaller bins incur
additional overhead to loop over; larger bins require more distance
calculations. Note that for smaller bin sizes, the 2d stencil in the
figure would be more semi-circular in shape (hemispherical in 3d),
with bins near the corners of the square eliminated due to their
distance from the origin bin.
- Depending on the interatomic potential(s) and other commands used in
an input script, multiple neighbor lists and stencils with different
attributes may be needed. This includes lists with different cutoff
distances, e.g. for force computation versus occasional diagnostic
computations such as a radial distribution function, or for the
r-RESPA time integrator which can partition pairwise forces by
distance into subsets computed at different time intervals. It
includes "full" lists (as opposed to half lists) where each *i,j* pair
appears twice, stored once with *i* and *j*, and which use a larger
symmetric stencil. It also includes lists with partial enumeration of
ghost atom neighbors. The full and ghost-atom lists are used by
various manybody interatomic potentials. Lists may also use different
criteria for inclusion of a pair interaction. Typically this depends
only on the distance between the two atoms and the cutoff
distance. But for finite-size coarse-grained particles with
individual diameters (e.g. polydisperse granular particles), it can
also depend on the diameters of the two particles.
- When using :doc:`pair style hybrid <pair_hybrid>`, multiple sub-lists
of the master neighbor list for the full system need to be generated,
one for each sub-style, each containing only the *i,j* pairs needed to
compute interactions between the subsets of atoms handled by the
corresponding potential. This means not all *i* or *j* atoms owned by a
processor are included in a particular sub-list.
- Some models use different cutoff lengths for pairwise interactions
between different kinds of particles which are stored in a single
neighbor list. One example is a solvated colloidal system with large
colloidal particles where colloid/colloid, colloid/solvent, and
solvent/solvent interaction cutoffs can be dramatically different.
Another is a model of polydisperse finite-size granular particles;
pairs of particles interact only when they are in contact with each
other. Mixtures with particle size ratios as high as 10-100x may be
used to model realistic systems. Efficient neighbor list building
algorithms for these kinds of systems are available in LAMMPS. They
include a method which uses different stencils for different cutoff
lengths and trims the stencil to only include bins that straddle the
cutoff sphere surface. More recently a method which uses both
multiple stencils and multiple bin sizes was developed; it builds
neighbor lists efficiently for systems with particles of any size
ratio, though other considerations (timestep size, force computations)
may limit the ability to model systems with huge polydispersity.
- For small and sparse systems and as a fallback method, LAMMPS also
supports neighbor list construction without binning by using a full
:math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
using the :doc:`neighbor nsq <neighbor>` command.
- Depending on the "pair" setting of the :doc:`newton <newton>` command,
a "half" neighbor list may contain **all** pairs of atoms where atom
*j* is a ghost atom (i.e. when the newton pair setting is *off*).
With the newton pair *on* setting, a ghost atom *j* is only added to
the list if its *z* coordinate is larger than that of atom *i*, or, if
the *z* coordinates are equal, its *y* coordinate is larger, or, if
those are also equal, its *x* coordinate is larger (see the sketch
after this list). For a homogeneously dense system this picks
neighbors from a same-sized sector in always the same direction
relative to the "owned" atom, which should lead to neighbor lists of
similar length and thus reduce the chance of a load imbalance.
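
The coordinate-based tie-breaking for the newton pair *on* case,
referenced in the last bullet, amounts to a small predicate like the
following sketch (illustrative names only):

.. code-block:: c++

   // Decide whether ghost atom j is stored as a neighbor of owned atom i when
   // the newton pair setting is "on": accept j only if it is "ahead" of i,
   // comparing the z coordinate first, then y, then x to break ties.
   bool accept_ghost_neighbor(const double *xi, const double *xj)
   {
     if (xj[2] != xi[2]) return xj[2] > xi[2];   // larger z
     if (xj[1] != xi[1]) return xj[1] > xi[1];   // equal z: larger y
     return xj[0] > xi[0];                       // equal z and y: larger x
   }
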

View File

@ -0,0 +1,89 @@
Partitioning
^^^^^^^^^^^^
The underlying spatial decomposition strategy used by LAMMPS for
distributed-memory parallelism is set with the :doc:`comm_style command
<comm_style>` and can be either "brick" (a regular grid) or "tiled".
.. _domain-decomposition:
.. figure:: img/domain-decomp.png
:align: center
domain decomposition
This figure shows the different kinds of domain decomposition used
for MPI parallelization: "brick" on the left with an orthogonal
(left) and a triclinic (middle) simulation domain, and a "tiled"
decomposition (right). The black lines show the division into
sub-domains and the contained atoms are "owned" by the corresponding
MPI process. The green dashed lines indicate how sub-domains are
extended with "ghost" atoms up to the communication cutoff distance.
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
figure for the 2d case. Orthogonal means the box edges are aligned with
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
rectangular. Triclinic allows for a more general parallelepiped shape
in which edges are aligned with three arbitrary vectors and the box
faces are parallelograms. In each dimension box faces can be periodic,
or non-periodic with fixed or shrink-wrapped boundaries. In the fixed
case, atoms which move outside the face are deleted; shrink-wrapped
means the position of the box face adjusts continuously to enclose all
the atoms.
For distributed-memory MPI parallelism, the simulation box is spatially
decomposed (partitioned) into non-overlapping sub-domains which fill the
box. The default partitioning, "brick", is most suitable when atom
density is roughly uniform, as shown in the left-side images of the
:ref:`domain-decomposition` figure. The sub-domains comprise a regular
grid and all sub-domains are identical in size and shape. Both the
orthogonal and triclinic boxes can deform continuously during a
simulation, e.g. to compress a solid or shear a liquid, in which case
the processor sub-domains likewise deform.
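
For the regular "brick" partitioning the owning process of any position
can be computed directly from the grid of sub-domains; the following
sketch does this for an orthogonal box (the row-major rank ordering and
all names are assumptions of the illustration, not the actual LAMMPS
mapping).

.. code-block:: c++

   // Map a position inside an orthogonal box [boxlo, boxhi) onto the MPI rank
   // that owns it, for a regular px*py*pz grid ("brick") of equal sub-domains.
   int owning_rank(const double pos[3], const double boxlo[3],
                   const double boxhi[3], const int procgrid[3])
   {
     int idx[3];
     for (int d = 0; d < 3; ++d) {
       const double frac = (pos[d] - boxlo[d]) / (boxhi[d] - boxlo[d]);
       idx[d] = (int) (frac * procgrid[d]);                  // slice along d
       if (idx[d] < 0) idx[d] = 0;                           // clamp round-off
       if (idx[d] >= procgrid[d]) idx[d] = procgrid[d] - 1;
     }
     // row-major ordering of the process grid (an assumption of this sketch)
     return (idx[2] * procgrid[1] + idx[1]) * procgrid[0] + idx[0];
   }
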
For models with non-uniform density, the number of particles per
processor can be load-imbalanced with the default partitioning. This
reduces parallel efficiency, as the overall simulation rate is limited
by the slowest processor, i.e. the one with the largest computational
load. For such models, LAMMPS supports multiple strategies to reduce
the load imbalance:
- The processor grid decomposition is by default based on the simulation
cell volume and tries to optimize the volume-to-surface ratio of the
sub-domains. This can be changed with the
:doc:`processors command <processors>`.
- The parallel planes defining the size of the sub-domains can be shifted
with the :doc:`balance command <balance>`, which can be done in addition
to choosing a more optimal processor grid.
- The recursive bisectioning algorithm, in combination with the "tiled"
communication style, can produce a partitioning with equal numbers of
particles in each sub-domain (see the sketch after this list).
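
As a conceptual illustration of the recursive bisectioning mentioned in
the last bullet, the sketch below splits a list of atoms at the median
coordinate and recurses until one group per MPI rank remains; the split
dimension simply alternates with depth here, whereas a real
implementation such as the tiled/RCB decomposition in LAMMPS chooses its
cuts based on the geometry and differs in many other details.

.. code-block:: c++

   #include <algorithm>
   #include <vector>

   struct Atom { double x[3]; };

   // Assign 'nprocs' consecutive MPI ranks (starting at 'rank0') to the atoms
   // referenced by ids[begin:end) so that every rank receives a (nearly) equal
   // number of atoms: split at the median along one dimension and recurse.
   void rcb_assign(const std::vector<Atom> &atoms, std::vector<int> &ids,
                   std::vector<int> &owner, int begin, int end,
                   int rank0, int nprocs, int depth)
   {
     if (nprocs == 1) {
       for (int k = begin; k < end; ++k) owner[ids[k]] = rank0;
       return;
     }
     const int dim = depth % 3;             // alternate x, y, z in this sketch
     const int nleft = nprocs / 2;          // ranks assigned to the lower half
     const int mid = begin + (int) ((long long) (end - begin) * nleft / nprocs);
     // partial sort: the fraction with the smallest coordinates along 'dim'
     // ends up in ids[begin:mid), the rest in ids[mid:end)
     std::nth_element(ids.begin() + begin, ids.begin() + mid, ids.begin() + end,
                      [&](int a, int b) { return atoms[a].x[dim] < atoms[b].x[dim]; });
     rcb_assign(atoms, ids, owner, begin, mid, rank0, nleft, depth + 1);
     rcb_assign(atoms, ids, owner, mid, end, rank0 + nleft, nprocs - nleft, depth + 1);
   }

Called with ``ids`` initialized to ``0 .. N-1`` and ``owner`` sized
``N``, each recursion level is analogous to one cut in a tiled
decomposition like the one shown on the right of the
:ref:`domain-decomposition` figure.
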
.. |decomp1| image:: img/decomp-regular.png
:width: 24%
.. |decomp2| image:: img/decomp-processors.png
:width: 24%
.. |decomp3| image:: img/decomp-balance.png
:width: 24%
.. |decomp4| image:: img/decomp-rcb.png
:width: 24%
|decomp1| |decomp2| |decomp3| |decomp4|
The pictures above demonstrate different decompositions for a 2d system
with 12 MPI ranks. The atom colors indicate the load imbalance of each
sub-domain, with green being optimal and red the least optimal.
Due to the vacuum in the system, the default decomposition is unbalanced,
with several MPI ranks having no atoms at all (left). By forcing a
1x12x1 processor grid, every MPI rank now performs computations, but the
number of atoms per sub-domain is still uneven, and the thin slice shape
increases the amount of communication between sub-domains (center left).
With a 2x6x1 processor grid and shifted sub-domain divisions, the load
imbalance is further reduced and less communication between sub-domains
is required (center right). Using recursive bisectioning leads to a
further improved decomposition (right).

View File

@ -10,367 +10,10 @@ size). Additional parallelization using GPUs or OpenMP can then be
applied within the sub-domain assigned to an MPI process. For clarity,
most of the following illustrations show the 2d simulation case.
.. toctree::
   :maxdepth: 1

   Developer_par_part
   Developer_par_comm
   Developer_par_neigh
   Developer_par_long