adapt section about domain decomposition from paper
This commit is contained in:
@ -11,6 +11,7 @@ of time and requests from the LAMMPS user community.
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
Developer_org
|
Developer_org
|
||||||
|
Developer_parallel
|
||||||
Developer_flow
|
Developer_flow
|
||||||
Developer_write
|
Developer_write
|
||||||
Developer_notes
|
Developer_notes
|
||||||
|
|||||||
106
doc/src/Developer_parallel.rst
Normal file
106
doc/src/Developer_parallel.rst
Normal file
@ -0,0 +1,106 @@
|
|||||||
|
Parallel algorithms
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
LAMMPS is from ground up designed to be running in parallel using the
|
||||||
|
MPI standard with distributed data via domain decomposition. The
|
||||||
|
parallelization has to be efficient to enable good strong scaling (=
|
||||||
|
good speedup for the same system) and good weak scaling (= the
|
||||||
|
computational cost of enlarging the system is linear with the system
|
||||||
|
size). Additional parallelization using GPUs or OpenMP can then be
|
||||||
|
applied within the sub-domain assigned to an MPI process.
|
||||||
|
|
||||||
|
|
||||||
|
Partitioning
|
||||||
|
^^^^^^^^^^^^
|
||||||
|
|
||||||
|
The underlying spatial decomposition strategy used by LAMMPS for
|
||||||
|
distributed-memory parallelism is set with the :doc:`comm_style command <comm_style>`
|
||||||
|
and can be either "brick" (a regular grid) or "tiled".
|
||||||
|
|
||||||
|
.. _domain-decomposition:
|
||||||
|
.. figure:: img/domain-decomp.png
|
||||||
|
|
||||||
|
LAMMPS domain decomposition
|
||||||
|
|
||||||
|
This figure shows the different kinds of domain decomposition used
|
||||||
|
for MPI parallelization: "brick" on the left with an orthogonal (top)
|
||||||
|
and a triclinic (bottom) simulation domain, and "tiled" on the right.
|
||||||
|
The black lines show the division into sub-domains and the contained
|
||||||
|
atoms are "owned" by the corresponding MPI process. The green dashed
|
||||||
|
lines indicate how sub-domains are extended with "ghost" atoms up
|
||||||
|
to the communication cutoff distance.
|
||||||
|
|
||||||
|
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
|
||||||
|
or triclinic in shape, as illustrated in the :ref:`domain-decomposition`
|
||||||
|
figure for the 2d case. Orthogonal means the box edges are aligned with
|
||||||
|
the *x*, *y*, *z* Cartesian axes, and the box faces are thus all
|
||||||
|
rectangular. Triclinic allows for a more general parallelepiped shape
|
||||||
|
in which edges are aligned with three arbitrary vectors and the box
|
||||||
|
faces are parallelograms. In each dimension box faces can be periodic,
|
||||||
|
or non-periodic with fixed or shrink-wrapped boundaries. In the fixed
|
||||||
|
case, atoms which move outside the face are deleted; shrink-wrapped
|
||||||
|
means the position of the box face adjusts continuously to enclose all
|
||||||
|
the atoms.
|
||||||
|
|
||||||
|
For distributed-memory MPI parallelism, the simulation box is spatially
|
||||||
|
decomposed (partitioned) into non-overlapping sub-domains which fill the
|
||||||
|
box. The default partitioning, "brick", is most suitable when atom
|
||||||
|
density is roughly uniform, as shown in the left-side images of the
|
||||||
|
:ref:`domain-decomposition` figure. The sub-domains comprise a regular
|
||||||
|
grid and all sub-domains are identical in size and shape. Both the
|
||||||
|
orthogonal and triclinic boxes can deform continuously during a
|
||||||
|
simulation, e.g. to compress a solid or shear a liquid, in which case
|
||||||
|
the processor sub-domains likewise deform.
|
||||||
|
|
||||||
|
|
||||||
|
For models with non-uniform density, the number of particles per
|
||||||
|
processor can be load-imbalanced with the default partitioning. This
|
||||||
|
reduces parallel efficiency, as the overall simulation rate is limited
|
||||||
|
by the slowest processor, i.e. the one with the largest computational
|
||||||
|
load. For such models, LAMMPS supports multiple strategies to reduce
|
||||||
|
the load imbalance:
|
||||||
|
|
||||||
|
- The processor grid decomposition is by default based on the simulation
|
||||||
|
cell volume and tries to optimize the volume to surface ratio for the sub-domains.
|
||||||
|
This can be changed with the :doc:`processors command <processors>`.
|
||||||
|
- The parallel planes defining the size of the sub-domains can be shifted
|
||||||
|
with the :doc:`balance command <balance>`. Which can be done in addition
|
||||||
|
to choosing a more optimal processor grid.
|
||||||
|
- The recursive bisectioning algorithm in combination with the "tiled"
|
||||||
|
communication style can produce a partitioning with equal numbers of
|
||||||
|
particles in each sub-domain.
|
||||||
|
|
||||||
|
|
||||||
|
.. |decomp1| image:: img/decomp-regular.png
|
||||||
|
:width: 24%
|
||||||
|
|
||||||
|
.. |decomp2| image:: img/decomp-processors.png
|
||||||
|
:width: 24%
|
||||||
|
|
||||||
|
.. |decomp3| image:: img/decomp-balance.png
|
||||||
|
:width: 24%
|
||||||
|
|
||||||
|
.. |decomp4| image:: img/decomp-rcb.png
|
||||||
|
:width: 24%
|
||||||
|
|
||||||
|
|decomp1| |decomp2| |decomp3| |decomp4|
|
||||||
|
|
||||||
|
The pictures above demonstrate the differences for a 2d system with 12 MPI ranks.
|
||||||
|
Due to the vacuum in the system, the default decomposition is unbalanced
|
||||||
|
with several MPI ranks without atoms (left). By forcing a 1x12x1 processor
|
||||||
|
grid, every MPI rank does computations now, but the amount of communication
|
||||||
|
between sub-domains is increased (center left). With a 2x6x1 processor grid and
|
||||||
|
shifting the sub-domain divisions, the load imbalance is also reduced and
|
||||||
|
the amount of communication required between sub-domains is less (center right).
|
||||||
|
And using the recursive bisectioning leads to further improved decomposition (right).
|
||||||
|
|
||||||
|
|
||||||
|
Communication
|
||||||
|
^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Neighbor lists
|
||||||
|
^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Long-range interactions
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
BIN
doc/src/img/decomp-balance.png
Normal file
BIN
doc/src/img/decomp-balance.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 364 KiB |
BIN
doc/src/img/decomp-processors.png
Normal file
BIN
doc/src/img/decomp-processors.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 332 KiB |
BIN
doc/src/img/decomp-rcb.png
Normal file
BIN
doc/src/img/decomp-rcb.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 363 KiB |
BIN
doc/src/img/decomp-regular.png
Normal file
BIN
doc/src/img/decomp-regular.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 374 KiB |
BIN
doc/src/img/domain-decomp.png
Normal file
BIN
doc/src/img/domain-decomp.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.2 MiB |
Reference in New Issue
Block a user