adapt section about domain decomposition from paper

Axel Kohlmeyer
2021-09-03 16:59:41 -04:00
parent 6cf2aa4fbb
commit a98ded7722
7 changed files with 107 additions and 0 deletions


@@ -11,6 +11,7 @@ of time and requests from the LAMMPS user community.
:maxdepth: 1
Developer_org
Developer_parallel
Developer_flow
Developer_write
Developer_notes


@@ -0,0 +1,106 @@
Parallel algorithms
-------------------

LAMMPS is designed from the ground up to run in parallel using the MPI
standard, with data distributed across processes via domain
decomposition.  The parallelization has to be efficient to enable good
strong scaling (i.e. a good speedup for a fixed system size) and good
weak scaling (i.e. the computational cost grows only linearly when the
system is enlarged).  Additional parallelization using GPUs or OpenMP
threads can then be applied within the sub-domain assigned to each MPI
process.
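
As a point of reference, the two scaling regimes can be quantified as
follows (these are the generic textbook definitions, not something
specific to the LAMMPS source code):

.. math::

   S_\text{strong}(P) = \frac{T(N,1)}{T(N,P)}, \qquad
   E_\text{weak}(P) = \frac{T(N,1)}{T(P \cdot N,P)}

where :math:`T(N,P)` is the wall-clock time for a system of :math:`N`
atoms run on :math:`P` MPI processes.  Ideal strong scaling corresponds
to :math:`S_\text{strong}(P) = P`, ideal weak scaling to
:math:`E_\text{weak}(P) = 1`.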

Partitioning
^^^^^^^^^^^^

The underlying spatial decomposition strategy used by LAMMPS for
distributed-memory parallelism is set with the
:doc:`comm_style command <comm_style>` and can be either "brick"
(a regular grid of sub-domains) or "tiled".

.. _domain-decomposition:
.. figure:: img/domain-decomp.png

   LAMMPS domain decomposition

   This figure shows the different kinds of domain decomposition used
   for MPI parallelization: "brick" on the left with an orthogonal (top)
   and a triclinic (bottom) simulation domain, and "tiled" on the right.
   The black lines show the division into sub-domains; the atoms within
   each sub-domain are "owned" by the corresponding MPI process.  The
   green dashed lines indicate how sub-domains are extended with "ghost"
   atoms up to the communication cutoff distance.
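
To illustrate the "ghost" atom concept from the figure, here is a
stand-alone C++ sketch.  It is not taken from the LAMMPS sources (the
type and function names are made up for this example), it assumes an
orthogonal sub-domain, and it ignores periodic images; it only shows
which locally owned atoms fall within the communication cutoff of a
neighboring sub-domain and would therefore be sent to that neighbor as
ghost atoms:

.. code-block:: c++

   #include <vector>

   struct Atom { double x, y, z; };

   // axis-aligned bounds of an orthogonal sub-domain
   struct SubDomain { double lo[3], hi[3]; };

   // Return the indices of owned atoms that lie inside the neighbor's
   // sub-domain extended by the communication cutoff in all directions.
   // These are the atoms that would be copied to the neighbor as ghosts.
   // (Periodic boundary images are ignored in this simplified sketch.)
   std::vector<int> ghost_candidates(const std::vector<Atom> &owned,
                                     const SubDomain &neighbor,
                                     double cutoff)
   {
     std::vector<int> send;
     for (int i = 0; i < (int) owned.size(); ++i) {
       const double p[3] = {owned[i].x, owned[i].y, owned[i].z};
       bool inside = true;
       for (int d = 0; d < 3; ++d)
         if (p[d] < neighbor.lo[d] - cutoff || p[d] > neighbor.hi[d] + cutoff)
           inside = false;
       if (inside) send.push_back(i);
     }
     return send;
   }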

The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
figure for the 2d case.  Orthogonal means the box edges are aligned with
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
rectangular.  Triclinic allows for a more general parallelepiped shape
in which the edges are aligned with three arbitrary vectors and the box
faces are parallelograms.  In each dimension, box faces can be periodic,
or non-periodic with fixed or shrink-wrapped boundaries.  In the fixed
case, atoms which move outside the face are deleted; shrink-wrapped
means the position of the box face adjusts continuously to enclose all
the atoms.
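
For reference, a triclinic cell can be expressed through three edge
vectors.  LAMMPS stores these in a restricted form with tilt factors
*xy*, *xz*, and *yz* (the full conventions are described in the user
part of the manual; the form given below is only a reminder):

.. math::

   \mathbf{a} = (x_{hi} - x_{lo}, 0, 0), \quad
   \mathbf{b} = (xy, y_{hi} - y_{lo}, 0), \quad
   \mathbf{c} = (xz, yz, z_{hi} - z_{lo})

Setting all three tilt factors to zero recovers the orthogonal box.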

For distributed-memory MPI parallelism, the simulation box is spatially
decomposed (partitioned) into non-overlapping sub-domains which fill the
box.  The default partitioning, "brick", is most suitable when the atom
density is roughly uniform, as shown in the left-side images of the
:ref:`domain-decomposition` figure.  The sub-domains form a regular grid
and all sub-domains are identical in size and shape.  Both orthogonal
and triclinic boxes can deform continuously during a simulation, e.g. to
compress a solid or shear a liquid, in which case the processor
sub-domains likewise deform.
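
To make the regular-grid idea concrete, the following stand-alone C++
sketch maps an atom position to the sub-domain that owns it in a
Px-by-Py-by-Pz processor grid.  This is not code from the LAMMPS
sources: the function name, the row-major rank ordering, and the
restriction to an orthogonal, non-deforming box are simplifying
assumptions made only for illustration.

.. code-block:: c++

   #include <algorithm>
   #include <cmath>

   // Map a position inside an orthogonal box [boxlo, boxhi) to the rank
   // owning the corresponding sub-domain of a regular px-by-py-by-pz
   // processor grid.  Simplified illustration only: no triclinic boxes,
   // no periodic wrapping, no shrink-wrapped boundaries.
   int owning_rank(const double pos[3], const double boxlo[3],
                   const double boxhi[3], const int pgrid[3])
   {
     int idx[3];
     for (int d = 0; d < 3; ++d) {
       double frac = (pos[d] - boxlo[d]) / (boxhi[d] - boxlo[d]);
       idx[d] = std::min(pgrid[d] - 1,
                         std::max(0, (int) std::floor(frac * pgrid[d])));
     }
     // row-major ordering of the grid: x varies fastest, then y, then z
     return idx[0] + pgrid[0] * (idx[1] + pgrid[1] * idx[2]);
   }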

For models with non-uniform density, the number of particles per
processor can become imbalanced with the default partitioning.  This
reduces parallel efficiency, since the overall simulation rate is
limited by the slowest processor, i.e. the one with the largest
computational load.  For such models, LAMMPS supports several strategies
to reduce the load imbalance:

- The processor grid decomposition is by default chosen based on the
  shape of the simulation cell and tries to optimize the volume to
  surface ratio of the sub-domains.  This choice can be overridden with
  the :doc:`processors command <processors>`.
- The parallel planes defining the boundaries of the sub-domains can be
  shifted with the :doc:`balance command <balance>`, in addition to
  choosing a more optimal processor grid.
- The recursive bisection algorithm, in combination with the "tiled"
  communication style, can produce a partitioning with equal numbers of
  particles in each sub-domain (a simplified sketch of this algorithm is
  shown after this list).
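
The following stand-alone C++ sketch illustrates the basic idea of
recursive coordinate bisection.  It is not the RCB code used by LAMMPS:
it works on 2d points only, assumes the number of partitions is a power
of two, and uses made-up names.

.. code-block:: c++

   #include <algorithm>
   #include <vector>

   struct Point { double x, y; };

   // Simplified 2d recursive coordinate bisection: split the point range
   // [first,last) along its larger spatial extent so that both halves
   // hold equal numbers of points, and recurse until the requested
   // number of partitions (assumed to be a power of two) is reached.
   // The points are reordered in place; afterwards owner[i] holds the
   // partition id of pts[i].  The caller must size owner to pts.size().
   void rcb(std::vector<Point> &pts, int first, int last, int nparts,
            int partid, std::vector<int> &owner)
   {
     if (nparts == 1) {
       for (int i = first; i < last; ++i) owner[i] = partid;
       return;
     }

     // choose the dimension with the larger extent of this point subset
     using Cmp = bool (*)(const Point &, const Point &);
     Cmp xless = [](const Point &a, const Point &b) { return a.x < b.x; };
     Cmp yless = [](const Point &a, const Point &b) { return a.y < b.y; };
     auto xmm = std::minmax_element(pts.begin() + first, pts.begin() + last, xless);
     auto ymm = std::minmax_element(pts.begin() + first, pts.begin() + last, yless);
     const bool splitx =
       (xmm.second->x - xmm.first->x) >= (ymm.second->y - ymm.first->y);

     // bisect at the median so both halves contain equal numbers of points
     const int mid = first + (last - first) / 2;
     std::nth_element(pts.begin() + first, pts.begin() + mid,
                      pts.begin() + last, splitx ? xless : yless);

     rcb(pts, first, mid, nparts / 2, 2 * partid, owner);
     rcb(pts, mid, last, nparts / 2, 2 * partid + 1, owner);
   }

The actual RCB implementation used by LAMMPS is more general (3d
domains, arbitrary processor counts, optional weighting), but it follows
the same divide-at-the-median idea.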

.. |decomp1| image:: img/decomp-regular.png
   :width: 24%

.. |decomp2| image:: img/decomp-processors.png
   :width: 24%

.. |decomp3| image:: img/decomp-balance.png
   :width: 24%

.. |decomp4| image:: img/decomp-rcb.png
   :width: 24%

|decomp1| |decomp2| |decomp3| |decomp4|

The images above demonstrate the differences for a 2d system with 12 MPI
ranks.  Due to the vacuum in the system, the default decomposition is
unbalanced and leaves several MPI ranks without any atoms (left).  By
forcing a 1x12x1 processor grid, every MPI rank now performs
computations, but the amount of communication between sub-domains is
increased (center left).  With a 2x6x1 processor grid and shifted
sub-domain divisions, the load imbalance is reduced while less
communication between sub-domains is required (center right).  Using
recursive bisection with the "tiled" communication style improves the
decomposition even further (right).

Communication
^^^^^^^^^^^^^

Neighbor lists
^^^^^^^^^^^^^^

Long-range interactions
^^^^^^^^^^^^^^^^^^^^^^^

(Five new binary image files added under doc/src/img/, including
doc/src/img/decomp-rcb.png; binary contents not shown.)