add section about neighbor list construction
This commit is contained in:
@ -7,7 +7,8 @@ parallelization has to be efficient to enable good strong scaling (=
|
||||
good speedup for the same system) and good weak scaling (= the
|
||||
computational cost of enlarging the system is linear with the system
|
||||
size). Additional parallelization using GPUs or OpenMP can then be
|
||||
applied within the sub-domain assigned to an MPI process.
|
||||
applied within the sub-domain assigned to an MPI process. For clarity,
|
||||
most of the following illustrations show the 2d simulation case.
|
||||
|
||||
Partitioning
|
||||
^^^^^^^^^^^^
|
||||
@ -86,39 +87,42 @@ the load imbalance:
|
||||
|decomp1| |decomp2| |decomp3| |decomp4|
|
||||
|
||||
The pictures above demonstrate different decompositions for a 2d system
|
||||
with 12 MPI ranks. The atom colors indicate the load imbalance with
|
||||
green being optimal and red the least optimal. Due to the vacuum in the system, the default
|
||||
decomposition is unbalanced with several MPI ranks without atoms
|
||||
(left). By forcing a 1x12x1 processor grid, every MPI rank does
|
||||
computations now, but number of atoms per sub-domain is still uneven and
|
||||
the thin slice shape increases the amount of communication between sub-domains
|
||||
(center left). With a 2x6x1 processor grid and shifting the
|
||||
sub-domain divisions, the load imbalance is further reduced and the amount
|
||||
of communication required between sub-domains is less (center right).
|
||||
And using the recursive bisectioning leads to further improved
|
||||
decomposition (right).
|
||||
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
||||
sub-domain with green being optimal and red the least optimal.
|
||||
|
||||
Due to the vacuum in the system, the default decomposition is unbalanced
|
||||
with several MPI ranks without atoms (left). By forcing a 1x12x1
|
||||
processor grid, every MPI rank does computations now, but number of
|
||||
atoms per sub-domain is still uneven and the thin slice shape increases
|
||||
the amount of communication between sub-domains (center left). With a
|
||||
2x6x1 processor grid and shifting the sub-domain divisions, the load
|
||||
imbalance is further reduced and the amount of communication required
|
||||
between sub-domains is less (center right). And using the recursive
|
||||
bisectioning leads to further improved decomposition (right).
|
||||
|
||||
|
||||
Communication
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
Following the partitioning scheme the data of the system is distributed
|
||||
and each MPI process stores information (positions, velocities, etc.)
|
||||
for the subset of atoms within its sub-domain, called "owned" atoms. It
|
||||
also stores copies of some of that information for "ghost" atoms within
|
||||
the communication cutoff distance of its sub-domain, which are owned by
|
||||
nearby MPI processes. This enables calculating all short-range
|
||||
interactions which involve atoms the MPI process "owns" in parallel.
|
||||
The dashed-line boxes in the :ref:`domain-decomposition` figure
|
||||
illustrate the extended ghost-atom sub-domain for one processor.
|
||||
Following the partitioning scheme in use all per-atom data is
|
||||
distributed across the MPI processes, which allows LAMMPS to handle very
|
||||
large systems provided it uses a correspondingly large number of MPI
|
||||
processes. Since The per-atom data (atom IDs, positions, velocities,
|
||||
types, etc.) To be able to compute the short-range interactions MPI
|
||||
processes need not only access to data of atoms they "own" but also
|
||||
information about atoms from neighboring sub-domains, in LAMMPS referred
|
||||
to as "ghost" atoms. These are copies of atoms storing required
|
||||
per-atom data for up to the communication cutoff distance. The green
|
||||
dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
|
||||
the extended ghost-atom sub-domain for one processor.
|
||||
|
||||
This approach is also used to implement periodic boundary conditions:
|
||||
atoms that lie within the cutoff distance across a periodic boundary are
|
||||
also stored as ghost atoms and taken from the periodic replication of
|
||||
the sub-domain, which may be the same sub-domain, e.g. if running in
|
||||
serial. As a consequence of this, force computation in LAMMPS is not
|
||||
subject to minimum image conventions and thus cutoffs may be larger than
|
||||
half the simulation domain.
|
||||
This approach is also used to implement periodic boundary
|
||||
conditions: atoms that lie within the cutoff distance across a periodic
|
||||
boundary are also stored as ghost atoms and taken from the periodic
|
||||
replication of the sub-domain, which may be the same sub-domain, e.g. if
|
||||
running in serial. As a consequence of this, force computation in
|
||||
LAMMPS is not subject to minimum image conventions and thus cutoffs may
|
||||
be larger than half the simulation domain.
|
||||
|
||||
.. _ghost-atom-comm:
|
||||
.. figure:: img/ghost-comm.png
|
||||
@ -134,10 +138,12 @@ half the simulation domain.
|
||||
includes its ghost atoms. The red- and blue-shaded boxes are the
|
||||
regions of communicated ghost atoms.
|
||||
|
||||
The diagrams of the `ghost-atom-comm` figure illustrate how ghost atom
|
||||
communication is performed in two stages for a 2d simulation (three in
|
||||
3d) for both a regular and irregular partitioning of the simulation box.
|
||||
For the regular case (left) atoms are exchanged first in the
|
||||
Efficient communication patterns are needed to update the "ghost" atom
|
||||
data, since that needs to be done at every MD time step or minimization
|
||||
step. The diagrams of the `ghost-atom-comm` figure illustrate how ghost
|
||||
atom communication is performed in two stages for a 2d simulation (three
|
||||
in 3d) for both a regular and irregular partitioning of the simulation
|
||||
box. For the regular case (left) atoms are exchanged first in the
|
||||
*x*-direction, then in *y*, with four neighbors in the grid of processor
|
||||
sub-domains.
|
||||
|
||||
@ -219,5 +225,152 @@ performed in LAMMPS:
|
||||
Neighbor lists
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
To compute forces efficiently, each processor creates a Verlet-style
|
||||
neighbor list which enumerates all pairs of atoms *i,j* (*i* = owned,
|
||||
*j* = owned or ghost) with separation less than the applicable
|
||||
neighbor list cutoff distance. In LAMMPS the neighbor lists are stored
|
||||
in a multiple-page data structure; each page is a contiguous chunk of
|
||||
memory which stores vectors of neighbor atoms *j* for many *i* atoms.
|
||||
This allows pages to be incrementally allocated or deallocated in blocks
|
||||
as needed. Neighbor lists typically consume the most memory of any data
|
||||
structure in LAMMPS. The neighbor list is rebuilt (from scratch) once
|
||||
every few timesteps, then used repeatedly each step for force or other
|
||||
computations. The neighbor cutoff distance is :math:`R_n = R_f +
|
||||
\Delta_s`, where :math:`R_f` is the (largest) force cutoff defined by
|
||||
the interatomic potential for computing short-range pairwise or manybody
|
||||
forces and :math:`\Delta_s` is a "skin" distance that allows the list to
|
||||
be used for multiple steps assuming that atoms do not move very far
|
||||
between consecutive time steps. Typically the code triggers
|
||||
reneighboring when any atom has moved half the skin distance since the
|
||||
last reneighboring; this and other options of the neighbor list rebuild
|
||||
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
||||
|
||||
On steps when reneighboring is performed, atoms which have moved outside
|
||||
their owning processor's sub-domain are first migrated to new processors
|
||||
via communication. Periodic boundary conditions are also (only)
|
||||
enforced on these steps to ensure each atom is re-assigned to the
|
||||
correct processor. After migration, the atoms owned by each processor
|
||||
are stored in a contiguous vector. Periodically each processor
|
||||
spatially sorts owned atoms within its vector to reorder it for improved
|
||||
cache efficiency in force computations and neighbor list building. For
|
||||
that atoms are spatially binned and then reordered so that atoms in the
|
||||
same bin are adjacent in the vector. Atom sorting can be disabled or
|
||||
its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
||||
|
||||
.. _neighbor-stencil:
|
||||
.. figure:: img/neigh-stencil.png
|
||||
:align: center
|
||||
|
||||
neighbor list stencils
|
||||
|
||||
A 2d simulation sub-domain (thick black line) and the corresponding
|
||||
ghost atom cutoff region (dashed blue line) for both orthogonal
|
||||
(left) and triclinic (right) domains. A regular grid of neighbor
|
||||
bins (thin lines) overlays the entire simulation domain and need not
|
||||
align with sub-domain boundaries; only the portion overlapping the
|
||||
augmented sub-domain is shown. In the triclinic case it overlaps the
|
||||
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
||||
represent a stencil of bins searched to find neighbors of a particular
|
||||
atom (black dot).
|
||||
|
||||
To build a local neighbor list in linear time, the simulation domain is
|
||||
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
||||
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
||||
single MPI processor's sub-domain. Each processor stores a set of
|
||||
neighbor bins which overlap its sub-domain extended by the neighbor
|
||||
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
||||
with processor boundaries; an integer number in each dimension is fit to
|
||||
the size of the entire simulation box.
|
||||
|
||||
Most often LAMMPS builds what it calls a "half" neighbor list where
|
||||
each *i,j* neighbor pair is stored only once, with either atom *i* or
|
||||
*j* as the central atom. The build can be done efficiently by using a
|
||||
pre-computed "stencil" of bins around a central origin bin which
|
||||
contains the atom whose neighbors are being searched for. A stencil
|
||||
is simply a list of integer offsets in *x,y,z* of nearby bins
|
||||
surrounding the origin bin which are close enough to contain any
|
||||
neighbor atom *j* within a distance :math:`R_n` from any atom *i* in the
|
||||
origin bin. Note that for a half neighbor list, the stencil can be
|
||||
asymmetric since each atom only need store half its nearby neighbors.
|
||||
|
||||
These stencils are illustrated in the figure for a half list and a bin
|
||||
size of :math:`\frac{1}{2} R_n`. There are 13 red+blue stencil bins in
|
||||
2d (for the orthogonal case, 15 for triclinic). In 3d there would be
|
||||
63, 13 in the plane of bins that contain the origin bin and 25 in each
|
||||
of the two planes above it in the *z* direction (75 for triclinic). The
|
||||
reason the triclinic stencil has extra bins is because the bins tile the
|
||||
bounding box of the entire triclinic domain and thus are not periodic
|
||||
with respect to the simulation box itself. The stencil and logic for
|
||||
determining which *i,j* pairs to include in the neighbor list are
|
||||
altered slightly to account for this.
|
||||
|
||||
To build a neighbor list, a processor first loops over its "owned" plus
|
||||
"ghost" atoms and assigns each to a neighbor bin. This uses an integer
|
||||
vector to create a linked list of atom indices within each bin. It then
|
||||
performs a triply-nested loop over its owned atoms *i*, the stencil of
|
||||
bins surrounding atom *i*'s bin, and the *j* atoms in each stencil bin
|
||||
(including ghost atoms). If the distance :math:`r_{ij} < R_n`, then
|
||||
atom *j* is added to the vector of atom *i*'s neighbors.
|
||||
|
||||
Here are additional details about neighbor list build options LAMMPS
|
||||
supports:
|
||||
|
||||
- The choice of bin size is an option; a size half of :math:`R_n` has
|
||||
been found to be optimal for many typical cases. Smaller bins incur
|
||||
additional overhead to loop over; larger bins require more distance
|
||||
calculations. Note that for smaller bin sizes, the 2d stencil in the
|
||||
figure would be more semi-circular in shape (hemispherical in 3d),
|
||||
with bins near the corners of the square eliminated due to their
|
||||
distance from the origin bin.
|
||||
|
||||
- Depending on the interatomic potential(s) and other commands used in
|
||||
an input script, multiple neighbor lists and stencils with different
|
||||
attributes may be needed. This includes lists with different cutoff
|
||||
distances, e.g. for force computation versus occasional diagnostic
|
||||
computations such as a radial distribution function, or for the
|
||||
r-RESPA time integrator which can partition pairwise forces by
|
||||
distance into subsets computed at different time intervals. It
|
||||
includes "full" lists (as opposed to half lists) where each *i,j* pair
|
||||
appears twice, stored once with *i* and *j*, and which use a larger
|
||||
symmetric stencil. It also includes lists with partial enumeration of
|
||||
ghost atom neighbors. The full and ghost-atom lists are used by
|
||||
various manybody interatomic potentials. Lists may also use different
|
||||
criteria for inclusion of a pair interaction. Typically this simply
|
||||
depends only on the distance between two atoms and the cutoff
|
||||
distance. But for finite-size coarse-grained particles with
|
||||
individual diameters (e.g. polydisperse granular particles), it can
|
||||
also depend on the diameters of the two particles.
|
||||
|
||||
- When using :doc:`pair style hybrid <pair_hybrid>` multiple sub-lists
|
||||
of the master neighbor list for the full system need to be generated,
|
||||
one for each sub-style, which contains only the *i,j* pairs needed to
|
||||
compute interactions between subsets of atoms for the corresponding
|
||||
potential. This means not all *i* or *j* atoms owned by a processor
|
||||
are included in a particular sub-list.
|
||||
|
||||
- Some models use different cutoff lengths for pairwise interactions
|
||||
between different kinds of particles which are stored in a single
|
||||
neighbor list. One example is a solvated colloidal system with large
|
||||
colloidal particles where colloid/colloid, colloid/solvent, and
|
||||
solvent/solvent interaction cutoffs can be dramatically different.
|
||||
Another is a model of polydisperse finite-size granular particles;
|
||||
pairs of particles interact only when they are in contact with each
|
||||
other. Mixtures with particle size ratios as high as 10-100x may be
|
||||
used to model realistic systems. Efficient neighbor list building
|
||||
algorithms for these kinds of systems are available in LAMMPS. They
|
||||
include a method which uses different stencils for different cutoff
|
||||
lengths and trims the stencil to only include bins that straddle the
|
||||
cutoff sphere surface. More recently a method which uses both
|
||||
multiple stencils and multiple bin sizes was developed; it builds
|
||||
neighbor lists efficiently for systems with particles of any size
|
||||
ratio, though other considerations (timestep size, force computations)
|
||||
may limit the ability to model systems with huge polydispersity.
|
||||
|
||||
- For small and sparse systems and as a fallback method, LAMMPS also
|
||||
supports neighbor list construction without binning by using a full
|
||||
:math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
|
||||
using the :doc:`neighbor nsq <neighbor>` command.
|
||||
|
||||
|
||||
Long-range interactions
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Reference in New Issue
Block a user