revise based on suggestions from languagetool.org
@@ -14,8 +14,8 @@ Owned and ghost atoms
 As described on the :doc:`parallel partitioning algorithms
 <Developer_par_part>` page, LAMMPS spatially decomposes the simulation
 domain, either in a *brick* or *tiled* manner. Each processor (MPI
-task) owns atoms within its sub-domain and additionally stores ghost
-atoms within a cutoff distance of its sub-domain.
+task) owns atoms within its subdomain and additionally stores ghost
+atoms within a cutoff distance of its subdomain.

 Forward and reverse communication
 =================================

@@ -139,7 +139,7 @@ Periodic boundary conditions are then applied by the Domain class via
 its ``pbc()`` method to remap particles that have moved outside the
 simulation box back into the box. Note that this is not done every
 timestep, but only when neighbor lists are rebuilt. This is so that
-each processor's sub-domain will have consistent (nearby) atom
+each processor's subdomain will have consistent (nearby) atom
 coordinates for its owned and ghost atoms. It is also why dumped atom
 coordinates may be slightly outside the simulation box if not dumped
 on a step where the neighbor lists are rebuilt.
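To illustrate the remapping idea in the hunk above: a minimal sketch of wrapping one coordinate back into a periodic box. This is an illustration only, not the actual ``Domain::pbc()`` code; the function name and signature are invented for this example.

.. code-block:: c++

   #include <cmath>

   // Remap a coordinate into the periodic box [lo, lo + prd) along one
   // dimension. LAMMPS only needs to do this when neighbor lists are
   // rebuilt, so owned and ghost coordinates stay consistent in between.
   double remap_periodic(double x, double lo, double prd)
   {
     double rel = x - lo;
     rel -= prd * std::floor(rel / prd);   // now 0.0 <= rel < prd
     return lo + rel;
   }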
@@ -153,10 +153,10 @@ method of the Comm class and ``setup_bins()`` method of the Neighbor
 class perform the update.

 The code is now ready to migrate atoms that have left a processor's
-geometric sub-domain to new processors. The ``exchange()`` method of
+geometric subdomain to new processors. The ``exchange()`` method of
 the Comm class performs this operation. The ``borders()`` method of the
 Comm class then identifies ghost atoms surrounding each processor's
-sub-domain and communicates ghost atom information to neighboring
+subdomain and communicates ghost atom information to neighboring
 processors. It does this by looping over all the atoms owned by a
 processor to make lists of those to send to each neighbor processor. On
 subsequent timesteps, the lists are used by the ``Comm::forward_comm()``

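A minimal sketch of the per-atom test that the exchange step conceptually performs along one dimension. This is an illustration only; the names ``sublo``/``subhi`` and the helper are invented here and this is not the LAMMPS ``Comm::exchange()`` source.

.. code-block:: c++

   // Decide whether an owned atom has left the local subdomain along one
   // dimension: -1 = send to the lower neighbor processor, +1 = send to
   // the upper neighbor, 0 = the atom stays on this processor.
   int migrate_direction(double coord, double sublo, double subhi)
   {
     if (coord < sublo) return -1;
     if (coord >= subhi) return 1;
     return 0;
   }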
@@ -28,9 +28,9 @@ grid.

 More specifically, a grid point is defined for each cell (by default
 the center point), and a processor owns a grid cell if its point is
-within the processor's spatial sub-domain. The union of processor
-sub-domains is the global simulation box. If a grid point is on the
-boundary of two sub-domains, the lower processor owns the grid cell. A
+within the processor's spatial subdomain. The union of processor
+subdomains is the global simulation box. If a grid point is on the
+boundary of two subdomains, the lower processor owns the grid cell. A
 processor may also store copies of ghost cells which surround its
 owned cells.

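The ownership rule described above can be sketched as follows. This is an illustration under a simplified convention, not the actual Grid class code; the tie-breaking for points exactly on a subdomain boundary follows the "lower processor owns" rule quoted above, and all names are invented.

.. code-block:: c++

   #include <algorithm>
   #include <cmath>

   // For one dimension: the range of global grid cells owned by this
   // processor, assuming cell i spans [boxlo + i*delta, boxlo + (i+1)*delta)
   // and is owned by the processor whose subdomain (sublo, subhi] contains
   // the cell's representative point boxlo + (i + shift)*delta
   // (shift = 0.5 gives the cell center).
   void owned_cell_range(double sublo, double subhi, double boxlo,
                         double delta, int nglobal, double shift,
                         int &ilo, int &ihi)
   {
     ilo = static_cast<int>(std::floor((sublo - boxlo) / delta - shift)) + 1;
     ihi = static_cast<int>(std::floor((subhi - boxlo) / delta - shift));
     ilo = std::max(ilo, 0);
     ihi = std::min(ihi, nglobal - 1);
   }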
@@ -62,7 +62,7 @@ y-dimension. It is even possible to define a 1x1x1 3d grid, though it
 may be inefficient to use it in a computational sense.

 Note that the choice of grid size is independent of the number of
-processors or their layout in a grid of processor sub-domains which
+processors or their layout in a grid of processor subdomains which
 overlays the simulations domain. Depending on the distributed grid
 size, a single processor may own many 1000s or no grid cells.

@@ -235,7 +235,7 @@ invoked, because they influence its operation.
 void set_zfactor(double factor);

 Processors own a grid cell if a point within the grid cell is inside
-the processor's sub-domain. By default this is the center point of the
+the processor's subdomain. By default this is the center point of the
 grid cell. The *set_shift_grid()* method can change this. The *shift*
 argument is a value from 0.0 to 1.0 (inclusive) which is the offset of
 the point within the grid cell in each dimension. The default is 0.5
@@ -245,9 +245,9 @@ typically no need to change the default as it is optimal for
 minimizing the number of ghost cells needed.

 If a processor maps its particles to grid cells, it needs to allow for
-its particles being outside its sub-domain between reneighboring. The
+its particles being outside its subdomain between reneighboring. The
 *distance* argument of the *set_distance()* method sets the furthest
-distance outside a processor's sub-domain which a particle can move.
+distance outside a processor's subdomain which a particle can move.
 Typically this is half the neighbor skin distance, assuming
 reneighboring is done appropriately. This distance is used in
 determining how many ghost cells a processor needs to store to enable
@@ -295,7 +295,7 @@ to the Grid class via the *set_zfactor()* method (*set_yfactor()* for
 2d grids). The Grid class will then assign ownership of the 1/3 of
 grid cells that overlay the simulation box to the processors which
 also overlay the simulation box. The remaining 2/3 of the grid cells
-are assigned to processors whose sub-domains are adjacent to the upper
+are assigned to processors whose subdomains are adjacent to the upper
 z boundary of the simulation box.

 ----------
@@ -549,13 +549,13 @@ Grid class remap methods for load balancing
 The following methods are used when a load-balancing operation,
 triggered by the :doc:`balance <balance>` or :doc:`fix balance
 <fix_balance>` commands, changes the partitioning of the simulation
-domain into processor sub-domains.
+domain into processor subdomains.

 In order to work with load-balancing, any style command (compute, fix,
 pair, or kspace style) which allocates a grid and stores per-grid data
 should define a *reset_grid()* method; it takes no arguments. It will
 be called by the two balance commands after they have reset processor
-sub-domains and migrated atoms (particles) to new owning processors.
+subdomains and migrated atoms (particles) to new owning processors.
 The *reset_grid()* method will typically perform some or all of the
 following operations. See the src/fix_ave_grid.cpp and
 src/EXTRA_FIX/fix_ttm_grid.cpp files for examples of *reset_grid()*
@@ -564,7 +564,7 @@ functions.

 First, the *reset_grid()* method can instantiate new grid(s) of the
 same global size, then call *setup_grid()* to partition them via the
-new processor sub-domains. At this point, it can invoke the
+new processor subdomains. At this point, it can invoke the
 *identical()* method which compares the owned and ghost grid cell
 index bounds between two grids, the old grid passed as a pointer
 argument, and the new grid whose *identical()* method is being called.

@@ -102,7 +102,7 @@ build is then :doc:`processed in parallel <Developer_par_neigh>`.
 The most commonly required neighbor list is a so-called "half" neighbor
 list, where each pair of atoms is listed only once (except when the
 :doc:`newton command setting <newton>` for pair is off; in that case
-pairs straddling sub-domains or periodic boundaries will be listed twice).
+pairs straddling subdomains or periodic boundaries will be listed twice).
 Thus these are the default settings when a neighbor list request is created in:

 .. code-block:: c++
@@ -361,7 +361,7 @@ allocated as a 1d vector or 3d array. Either way, the ordering of
 values within contiguous memory x fastest, then y, z slowest.

 For the ``3d decomposition`` of the grid, the global grid is
-partitioned into bricks that correspond to the sub-domains of the
+partitioned into bricks that correspond to the subdomains of the
 simulation box that each processor owns. Often, this is a regular 3d
 array (Px by Py by Pz) of bricks, where P = number of processors =
 Px * Py * Pz. More generally it can be a tiled decomposition, where

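As a small illustration of the "x fastest, z slowest" ordering mentioned in the hunk above (a hypothetical helper, not taken from the LAMMPS sources):

.. code-block:: c++

   // Offset of grid point (ix, iy, iz) in a 1d buffer that stores an
   // nx-by-ny-by-nz brick with x varying fastest and z varying slowest.
   inline int grid_offset(int ix, int iy, int iz, int nx, int ny)
   {
     return ix + nx * (iy + ny * iz);
   }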
@@ -7,16 +7,16 @@ large systems provided it uses a correspondingly large number of MPI
 processes. Since The per-atom data (atom IDs, positions, velocities,
 types, etc.) To be able to compute the short-range interactions MPI
 processes need not only access to data of atoms they "own" but also
-information about atoms from neighboring sub-domains, in LAMMPS referred
+information about atoms from neighboring subdomains, in LAMMPS referred
 to as "ghost" atoms. These are copies of atoms storing required
 per-atom data for up to the communication cutoff distance. The green
 dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
-the extended ghost-atom sub-domain for one processor.
+the extended ghost-atom subdomain for one processor.

 This approach is also used to implement periodic boundary
 conditions: atoms that lie within the cutoff distance across a periodic
 boundary are also stored as ghost atoms and taken from the periodic
-replication of the sub-domain, which may be the same sub-domain, e.g. if
+replication of the subdomain, which may be the same subdomain, e.g. if
 running in serial. As a consequence of this, force computation in
 LAMMPS is not subject to minimum image conventions and thus cutoffs may
 be larger than half the simulation domain.
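To make the periodic-replication idea concrete, here is a one-dimensional sketch. It is an illustration only; the names are invented and the real code handles all box faces, both directions, and triclinic cells.

.. code-block:: c++

   // If an owned atom lies within the communication cutoff of the upper
   // periodic boundary, a ghost image of it is stored on the far side of
   // the box with its coordinate shifted by the box length, so force loops
   // never need a minimum-image check.
   bool upper_ghost_image(double x, double boxhi, double prd,
                          double cutghost, double &ghost_x)
   {
     if (x < boxhi - cutghost) return false;   // no ghost copy needed
     ghost_x = x - prd;                        // image near the lower face
     return true;
   }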
@@ -28,10 +28,10 @@ be larger than half the simulation domain.
 ghost atom communication

 This figure shows the ghost atom communication patterns between
-sub-domains for "brick" (left) and "tiled" communication styles for
+subdomains for "brick" (left) and "tiled" communication styles for
 2d simulations. The numbers indicate MPI process ranks. Here the
-sub-domains are drawn spatially separated for clarity. The
-dashed-line box is the extended sub-domain of processor 0 which
+subdomains are drawn spatially separated for clarity. The
+dashed-line box is the extended subdomain of processor 0 which
 includes its ghost atoms. The red- and blue-shaded boxes are the
 regions of communicated ghost atoms.

@@ -42,7 +42,7 @@ atom communication is performed in two stages for a 2d simulation (three
 in 3d) for both a regular and irregular partitioning of the simulation
 box. For the regular case (left) atoms are exchanged first in the
 *x*-direction, then in *y*, with four neighbors in the grid of processor
-sub-domains.
+subdomains.

 In the *x* stage, processor ranks 1 and 2 send owned atoms in their
 red-shaded regions to rank 0 (and vice versa). Then in the *y* stage,
@@ -55,7 +55,7 @@ For the irregular case (right) the two stages are similar, but a
 processor can have more than one neighbor in each direction. In the
 *x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
 rank 0 (and vice versa). These include only atoms between the lower
-and upper *y*-boundary of rank 0's sub-domain. In the *y* stage, ranks
+and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
 4,5,6 send atoms in their blue-shaded regions to rank 0. This may
 include ghost atoms they received in the *x* stage, but only if they
 are needed by rank 0 to fill its extended ghost atom regions in the
@@ -110,11 +110,11 @@ performed in LAMMPS:
 over 3x the length of a stretched bond for dihedral interactions. It
 can also exceed the periodic box size. For the regular communication
 pattern (left), if the cutoff distance extends beyond a neighbor
-processor's sub-domain, then multiple exchanges are performed in the
+processor's subdomain, then multiple exchanges are performed in the
 same direction. Each exchange is with the same neighbor processor,
 but buffers are packed/unpacked using a different list of atoms. For
 forward communication, in the first exchange a processor sends only
 owned atoms. In subsequent exchanges, it sends ghost atoms received
 in previous exchanges. For the irregular pattern (right) overlaps of
-a processor's extended ghost-atom sub-domain with all other processors
+a processor's extended ghost-atom subdomain with all other processors
 in each dimension are detected.

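One plausible way to estimate the number of exchanges needed in one direction when the cutoff exceeds a neighbor's subdomain, stated here as an assumption for illustration rather than the exact bookkeeping LAMMPS uses:

.. code-block:: c++

   #include <cmath>

   // With the regular "brick" pattern, if the ghost cutoff extends past the
   // neighboring subdomain, the same neighbor is exchanged with repeatedly.
   // A simple estimate, assuming equal subdomain widths along the dimension:
   int exchanges_needed(double cutghost, double subdomain_width)
   {
     return static_cast<int>(std::ceil(cutghost / subdomain_width));
   }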
@@ -20,7 +20,7 @@ e) electric field values from grid points near each atom are interpolated to com

 For any of the spatial-decomposition partitioning schemes each processor
 owns the brick-shaped portion of FFT grid points contained within its
-sub-domain. The two interpolation operations use a stencil of grid
+subdomain. The two interpolation operations use a stencil of grid
 points surrounding each atom. To accommodate the stencil size, each
 processor also stores a few layers of ghost grid points surrounding its
 brick. Forward and reverse communication of grid point values is
@@ -64,7 +64,7 @@ direction of the 1d FFTs it has to perform. LAMMPS uses the
 pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.

 Initially (far left), each processor owns a brick of same-color grid
-cells (actually grid points) contained within in its sub-domain. A
+cells (actually grid points) contained within in its subdomain. A
 brick-to-pencil communication operation converts this layout to 1d
 pencils in the *x*-dimension (center left). Again, cells of the same
 color are owned by the same processor. Each processor can then compute
@@ -161,8 +161,8 @@ grid/particle operations that LAMMPS supports:
 <partition>` calculation and then use the :doc:`verlet/split
 integrator <run_style>` to perform the PPPM computation on a
 dedicated, separate partition of MPI processes. This uses an integer
-"1:*p*" mapping of *p* sub-domains of the atom decomposition to one
-sub-domain of the FFT grid decomposition and where pairwise non-bonded
+"1:*p*" mapping of *p* subdomains of the atom decomposition to one
+subdomain of the FFT grid decomposition and where pairwise non-bonded
 and bonded forces and energies are computed on the larger partition
 and the PPPM kspace computation concurrently on the smaller partition.

@@ -172,7 +172,7 @@ grid/particle operations that LAMMPS supports:

 - LAMMPS implements a ``GridComm`` class which overlays the simulation
 domain with a regular grid, partitions it across processors in a
-manner consistent with processor sub-domains, and provides methods for
+manner consistent with processor subdomains, and provides methods for
 forward and reverse communication of owned and ghost grid point
 values. It is used for PPPM as an FFT grid (as outlined above) and
 also for the MSM algorithm which uses a cascade of grid sizes from

@@ -22,7 +22,7 @@ last reneighboring; this and other options of the neighbor list rebuild
 can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.

 On steps when reneighboring is performed, atoms which have moved outside
-their owning processor's sub-domain are first migrated to new processors
+their owning processor's subdomain are first migrated to new processors
 via communication. Periodic boundary conditions are also (only)
 enforced on these steps to ensure each atom is re-assigned to the
 correct processor. After migration, the atoms owned by each processor
@@ -39,12 +39,12 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.

 neighbor list stencils

-A 2d simulation sub-domain (thick black line) and the corresponding
+A 2d simulation subdomain (thick black line) and the corresponding
 ghost atom cutoff region (dashed blue line) for both orthogonal
 (left) and triclinic (right) domains. A regular grid of neighbor
 bins (thin lines) overlays the entire simulation domain and need not
-align with sub-domain boundaries; only the portion overlapping the
-augmented sub-domain is shown. In the triclinic case it overlaps the
+align with subdomain boundaries; only the portion overlapping the
+augmented subdomain is shown. In the triclinic case it overlaps the
 bounding box of the tilted rectangle. The blue- and red-shaded bins
 represent a stencil of bins searched to find neighbors of a particular
 atom (black dot).
@@ -52,8 +52,8 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
 To build a local neighbor list in linear time, the simulation domain is
 overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
 as shown in the :ref:`neighbor-stencil` figure for 2d models and a
-single MPI processor's sub-domain. Each processor stores a set of
-neighbor bins which overlap its sub-domain extended by the neighbor
+single MPI processor's subdomain. Each processor stores a set of
+neighbor bins which overlap its subdomain extended by the neighbor
 cutoff distance :math:`R_n`. As illustrated, the bins need not align
 with processor boundaries; an integer number in each dimension is fit to
 the size of the entire simulation box.
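A minimal sketch of the binning step described in the hunk above (illustrative names, not the Neighbor class code):

.. code-block:: c++

   #include <cmath>

   // Map a coordinate to a neighbor-bin index along one dimension, assuming
   // nbin bins of width binsize starting at binlo cover the subdomain
   // extended by the neighbor cutoff; ghost atoms slightly outside that
   // range are clamped into the outermost bins.
   int coord2bin(double x, double binlo, double binsize, int nbin)
   {
     int ib = static_cast<int>(std::floor((x - binlo) / binsize));
     if (ib < 0) ib = 0;
     if (ib > nbin - 1) ib = nbin - 1;
     return ib;
   }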
@@ -144,7 +144,7 @@ supports:

 - For small and sparse systems and as a fallback method, LAMMPS also
 supports neighbor list construction without binning by using a full
-:math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
+:math:`O(N^2)` loop over all *i,j* atom pairs in a subdomain when
 using the :doc:`neighbor nsq <neighbor>` command.

 - Dependent on the "pair" setting of the :doc:`newton <newton>` command,

@@ -15,8 +15,8 @@ distributed-memory parallelism is set with the :doc:`comm_style command
 for MPI parallelization: "brick" on the left with an orthogonal
 (left) and a triclinic (middle) simulation domain, and a "tiled"
 decomposition (right). The black lines show the division into
-sub-domains and the contained atoms are "owned" by the corresponding
-MPI process. The green dashed lines indicate how sub-domains are
+subdomains and the contained atoms are "owned" by the corresponding
+MPI process. The green dashed lines indicate how subdomains are
 extended with "ghost" atoms up to the communication cutoff distance.

 The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
@@ -32,14 +32,14 @@ means the position of the box face adjusts continuously to enclose all
 the atoms.

 For distributed-memory MPI parallelism, the simulation box is spatially
-decomposed (partitioned) into non-overlapping sub-domains which fill the
+decomposed (partitioned) into non-overlapping subdomains which fill the
 box. The default partitioning, "brick", is most suitable when atom
 density is roughly uniform, as shown in the left-side images of the
-:ref:`domain-decomposition` figure. The sub-domains comprise a regular
-grid and all sub-domains are identical in size and shape. Both the
+:ref:`domain-decomposition` figure. The subdomains comprise a regular
+grid and all subdomains are identical in size and shape. Both the
 orthogonal and triclinic boxes can deform continuously during a
 simulation, e.g. to compress a solid or shear a liquid, in which case
-the processor sub-domains likewise deform.
+the processor subdomains likewise deform.


 For models with non-uniform density, the number of particles per
@@ -50,14 +50,14 @@ load. For such models, LAMMPS supports multiple strategies to reduce
 the load imbalance:

 - The processor grid decomposition is by default based on the simulation
-cell volume and tries to optimize the volume to surface ratio for the sub-domains.
+cell volume and tries to optimize the volume to surface ratio for the subdomains.
 This can be changed with the :doc:`processors command <processors>`.
-- The parallel planes defining the size of the sub-domains can be shifted
+- The parallel planes defining the size of the subdomains can be shifted
 with the :doc:`balance command <balance>`. Which can be done in addition
 to choosing a more optimal processor grid.
 - The recursive bisectioning algorithm in combination with the "tiled"
 communication style can produce a partitioning with equal numbers of
-particles in each sub-domain.
+particles in each subdomain.


 .. |decomp1| image:: img/decomp-regular.png
@@ -76,14 +76,14 @@ the load imbalance:

 The pictures above demonstrate different decompositions for a 2d system
 with 12 MPI ranks. The atom colors indicate the load imbalance of each
-sub-domain with green being optimal and red the least optimal.
+subdomain with green being optimal and red the least optimal.

 Due to the vacuum in the system, the default decomposition is unbalanced
 with several MPI ranks without atoms (left). By forcing a 1x12x1
 processor grid, every MPI rank does computations now, but number of
-atoms per sub-domain is still uneven and the thin slice shape increases
-the amount of communication between sub-domains (center left). With a
-2x6x1 processor grid and shifting the sub-domain divisions, the load
+atoms per subdomain is still uneven and the thin slice shape increases
+the amount of communication between subdomains (center left). With a
+2x6x1 processor grid and shifting the subdomain divisions, the load
 imbalance is further reduced and the amount of communication required
-between sub-domains is less (center right). And using the recursive
+between subdomains is less (center right). And using the recursive
 bisectioning leads to further improved decomposition (right).

@@ -7,7 +7,7 @@ decomposition. The parallelization aims to be efficient, and resulting
 in good strong scaling (= good speedup for the same system) and good
 weak scaling (= the computational cost of enlarging the system is
 proportional to the system size). Additional parallelization using GPUs
-or OpenMP can also be applied within the sub-domain assigned to an MPI
+or OpenMP can also be applied within the subdomain assigned to an MPI
 process. For clarity, most of the following illustrations show the 2d
 simulation case. The underlying algorithms in those cases, however,
 apply to both 2d and 3d cases equally well.

@@ -647,7 +647,7 @@ Communication buffer coding with *ubuf*
 ---------------------------------------

 LAMMPS uses communication buffers where it collects data from various
-class instances and then exchanges the data with neighboring sub-domains.
+class instances and then exchanges the data with neighboring subdomains.
 For simplicity those buffers are defined as ``double`` buffers and
 used for doubles and integer numbers. This presents a unique problem
 when 64-bit integers are used. While the storage needed for a ``double``

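The *ubuf* idea can be sketched with a small stand-alone program. This is a simplified illustration; LAMMPS defines its own ``ubuf`` union in its headers and the exact members there may differ. Casting a large 64-bit integer to ``double`` instead would silently lose precision above 2^53, whereas storing the bits through a union round-trips the value exactly.

.. code-block:: c++

   #include <cstdint>
   #include <cstdio>

   // Store a 64-bit integer bit-for-bit inside a double-typed communication
   // buffer and recover it exactly on the receiving side.
   union ubuf {
     double d;
     int64_t i;
     ubuf(double arg) : d(arg) {}
     ubuf(int64_t arg) : i(arg) {}
   };

   int main()
   {
     double buf[1];
     int64_t tag = 9007199254740995LL;   // 2^53 + 3, not exact as a double
     buf[0] = ubuf(tag).d;               // pack: reinterpret the bits
     int64_t back = ubuf(buf[0]).i;      // unpack: bits restored exactly
     std::printf("%lld\n", (long long) back);
     return 0;
   }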
@@ -5635,7 +5635,7 @@ Doc page with :doc:`WARNING messages <Errors_warnings>`
 Lost atoms are checked for each time thermo output is done. See the
 thermo_modify lost command for options. Lost atoms usually indicate
 bad dynamics, e.g. atoms have been blown far out of the simulation
-box, or moved further than one processor's sub-domain away before
+box, or moved further than one processor's subdomain away before
 reneighboring.

 *MEAM library error %d*
@@ -6266,14 +6266,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a MSM grid point
 that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -6281,14 +6281,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a PPPM grid
 point that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -6296,14 +6296,14 @@ keyword to allow for additional bonds to be formed
 One or more atoms are attempting to map their charge to a PPPM grid
 point that is not owned by a processor. This is likely for one of two
 reasons, both of them bad. First, it may mean that an atom near the
-boundary of a processor's sub-domain has moved more than 1/2 the
+boundary of a processor's subdomain has moved more than 1/2 the
 :doc:`neighbor skin distance <neighbor>` without neighbor lists being
 rebuilt and atoms being migrated to new processors. This also means
 you may be missing pairwise interactions that need to be computed.
 The solution is to change the re-neighboring criteria via the
 :doc:`neigh_modify <neigh_modify>` command. The safest settings are
 "delay 0 every 1 check yes". Second, it may mean that an atom has
-moved far outside a processor's sub-domain or even the entire
+moved far outside a processor's subdomain or even the entire
 simulation box. This indicates bad physics, e.g. due to highly
 overlapping atoms, too large a timestep, etc.

@@ -109,9 +109,9 @@ Doc page with :doc:`ERROR messages <Errors_messages>`
 *Communication cutoff is shorter than a bond length based estimate. This may lead to errors.*
 Since LAMMPS stores topology data with individual atoms, all atoms
 comprising a bond, angle, dihedral or improper must be present on any
-sub-domain that "owns" the atom with the information, either as a
+subdomain that "owns" the atom with the information, either as a
 local or a ghost atom. The communication cutoff is what determines up
-to what distance from a sub-domain boundary ghost atoms are created.
+to what distance from a subdomain boundary ghost atoms are created.
 The communication cutoff is by default the largest non-bonded cutoff
 plus the neighbor skin distance, but for short or non-bonded cutoffs
 and/or long bonds, this may not be sufficient. This warning indicates
@@ -398,7 +398,7 @@ This will most likely cause errors in kinetic fluctuations.
 Lost atoms are checked for each time thermo output is done. See the
 thermo_modify lost command for options. Lost atoms usually indicate
 bad dynamics, e.g. atoms have been blown far out of the simulation
-box, or moved further than one processor's sub-domain away before
+box, or moved further than one processor's subdomain away before
 reneighboring.

 *MSM mesh too small, increasing to 2 points in each direction*
@@ -582,13 +582,13 @@ This will most likely cause errors in kinetic fluctuations.
 needed. The requested volume fraction may be too high, or other atoms
 may be in the insertion region.

-*Proc sub-domain size < neighbor skin, could lead to lost atoms*
+*Proc subdomain size < neighbor skin, could lead to lost atoms*
 The decomposition of the physical domain (likely due to load
-balancing) has led to a processor's sub-domain being smaller than the
+balancing) has led to a processor's subdomain being smaller than the
 neighbor skin in one or more dimensions. Since reneighboring is
 triggered by atoms moving the skin distance, this may lead to lost
 atoms, if an atom moves all the way across a neighboring processor's
-sub-domain before reneighboring is triggered.
+subdomain before reneighboring is triggered.

 *Reducing PPPM order b/c stencil extends beyond nearest neighbor processor*
 This may lead to a larger grid than desired. See the kspace_modify overlap

@@ -11,7 +11,7 @@ more values (data).

 The grid cells and data they store are distributed across processors.
 Each processor owns the grid cells (and data) whose center points lie
-within the spatial sub-domain of the processor. If needed for its
+within the spatial subdomain of the processor. If needed for its
 computations, a processor may also store ghost grid cells with their
 data.

@@ -28,7 +28,7 @@ box size, as set by the :doc:`boundary <boundary>` command for fixed
 or shrink-wrapped boundaries.

 If load-balancing is invoked by the :doc:`balance <balance>` or
-:doc:`fix balance <fix_balance>` commands, then the sub-domain owned
+:doc:`fix balance <fix_balance>` commands, then the subdomain owned
 by a processor can change which may also change which grid cells they
 own.

@@ -59,7 +59,7 @@ of bond distances.
 A per-grid datum is one or more values per grid cell, for a grid which
 overlays the simulation domain. The grid cells and the data they
 store are distributed across processors; each processor owns the grid
-cells whose center point falls within its sub-domain.
+cells whose center point falls within its subdomain.

 .. _scalar:

@@ -322,7 +322,7 @@ The chief difference between the :doc:`fix ave/grid <fix_ave_grid>`
 and :doc:`fix ave/chunk <fix_ave_chunk>` commands when used in this
 context is that the former uses a distributed grid, while the latter
 uses a global grid. Distributed means that each processor owns the
-subset of grid cells within its sub-domain. Global means that each
+subset of grid cells within its subdomain. Global means that each
 processor owns a copy of the entire grid. The :doc:`fix ave/grid
 <fix_ave_grid>` command is thus more efficient for large grids.

@@ -783,19 +783,19 @@ Pitfalls
 **Parallel Scalability**

 LAMMPS operates in parallel in a :doc:`spatial-decomposition mode
-<Developer_par_part>`, where each processor owns a spatial sub-domain of
+<Developer_par_part>`, where each processor owns a spatial subdomain of
 the overall simulation domain and communicates with its neighboring
 processors via distributed-memory message passing (MPI) to acquire ghost
 atom information to allow forces on the atoms it owns to be
 computed. LAMMPS also uses Verlet neighbor lists which are recomputed
 every few timesteps as particles move. On these timesteps, particles
 also migrate to new processors as needed. LAMMPS decomposes the overall
-simulation domain so that spatial sub-domains of nearly equal volume are
-assigned to each processor. When each sub-domain contains nearly the
+simulation domain so that spatial subdomains of nearly equal volume are
+assigned to each processor. When each subdomain contains nearly the
 same number of particles, this results in a reasonable load balance
 among all processors. As is more typical with some peridynamic
-simulations, some sub-domains may contain many particles while other
-sub-domains contain few particles, resulting in a load imbalance that
+simulations, some subdomains may contain many particles while other
+subdomains contain few particles, resulting in a load imbalance that
 impacts parallel scalability.

 **Setting the "skin" distance**

@@ -150,7 +150,7 @@ option with either of the commands.

 Note that if a simulation box has a large tilt factor, LAMMPS will run
 less efficiently, due to the large volume of communication needed to
-acquire ghost atoms around a processor's irregular-shaped sub-domain.
+acquire ghost atoms around a processor's irregular-shaped subdomain.
 For extreme values of tilt, LAMMPS may also lose atoms and generate an
 error.

@@ -38,11 +38,11 @@ to create digital object identifiers (DOI) for stable releases of the
 LAMMPS source code. There are two types of DOIs for the LAMMPS source code.

 The canonical DOI for **all** versions of LAMMPS, which will always
-point to the **latest** stable release version is:
+point to the **latest** stable release version, is:

 - DOI: `10.5281/zenodo.3726416 <https://dx.doi.org/10.5281/zenodo.3726416>`_

-In addition there are DOIs for individual stable releases. Currently there are:
+In addition there are DOIs generated for individual stable releases:

 - 3 March 2020 version: `DOI:10.5281/zenodo.3726417 <https://dx.doi.org/10.5281/zenodo.3726417>`_
 - 29 October 2020 version: `DOI:10.5281/zenodo.4157471 <https://dx.doi.org/10.5281/zenodo.4157471>`_
@@ -65,6 +65,6 @@ for optional features used in a specific run is printed to the screen
 and log file. Style and output location can be selected with the
 :ref:`-cite command-line switch <cite>`. Additional references are
 given in the documentation of the :doc:`corresponding commands
-<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. So please
-make certain, that you provide the proper acknowledgments and citations
-in any published works using LAMMPS.
+<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. Please make
+certain, that you provide the proper acknowledgments and citations in
+any published works using LAMMPS.

@@ -27,7 +27,7 @@ General features
 * distributed memory message-passing parallelism (MPI)
 * shared memory multi-threading parallelism (OpenMP)
 * spatial decomposition of simulation domain for MPI parallelism
-* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
+* particle decomposition inside spatial decomposition for OpenMP and GPU parallelism
 * GPLv2 licensed open-source distribution
 * highly portable C++-11
 * modular code with most functionality in optional packages
@@ -113,7 +113,7 @@ Atom creation
 :doc:`create_atoms <create_atoms>`, :doc:`delete_atoms <delete_atoms>`,
 :doc:`displace_atoms <displace_atoms>`, :doc:`replicate <replicate>` commands)

-* read in atom coords from files
+* read in atom coordinates from files
 * create atoms on one or more lattices (e.g. grain boundaries)
 * delete geometric or logical groups of atoms (e.g. voids)
 * replicate existing atoms multiple times
@@ -173,11 +173,11 @@ Output
 (:doc:`dump <dump>`, :doc:`restart <restart>` commands)

 * log file of thermodynamic info
-* text dump files of atom coords, velocities, other per-atom quantities
+* text dump files of atom coordinates, velocities, other per-atom quantities
 * dump output on fixed and variable intervals, based timestep or simulated time
 * binary restart files
 * parallel I/O of dump and restart files
-* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
+* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc.)
 * user-defined system-wide (log file) or per-atom (dump file) calculations
 * custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
 * spatial, time, and per-chunk averaging of per-atom quantities

@@ -20,22 +20,23 @@ that either closely interface with LAMMPS or extend LAMMPS.

 Here are suggestions on how to perform these tasks:

-* **GUI:** LAMMPS can be built as a library and a Python wrapper that wraps
-the library interface is provided. Thus, GUI interfaces can be
-written in Python (or C or C++ if desired) that run LAMMPS and
-visualize or plot its output. Examples of this are provided in the
-python directory and described on the :doc:`Python <Python_head>` doc
-page. Also, there are several external wrappers or GUI front ends.
-* **Builder:** Several pre-processing tools are packaged with LAMMPS. Some
-of them convert input files in formats produced by other MD codes such
-as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of them
-are simple programs that will build simple molecular systems, such as
-linear bead-spring polymer chains. The moltemplate program is a true
-molecular builder that will generate complex molecular models. See
-the :doc:`Tools <Tools>` page for details on tools packaged with
-LAMMPS. The `Pre/post processing page <https:/www.lammps.org/prepost.html>`_ of the LAMMPS website
+* **GUI:** LAMMPS can be built as a library and a Python module that
+wraps the library interface is provided. Thus, GUI interfaces can be
+written in Python or C/C++ that run LAMMPS and visualize or plot its
+output. Examples of this are provided in the python directory and
+described on the :doc:`Python <Python_head>` doc page. Also, there
+are several external wrappers or GUI front ends.
+* **Builder:** Several pre-processing tools are packaged with LAMMPS.
+Some of them convert input files in formats produced by other MD codes
+such as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of
+them are simple programs that will build simple molecular systems,
+such as linear bead-spring polymer chains. The moltemplate program is
+a true molecular builder that will generate complex molecular models.
+See the :doc:`Tools <Tools>` page for details on tools packaged with
+LAMMPS. The `Pre-/post-processing page
+<https:/www.lammps.org/prepost.html>`_ of the LAMMPS homepage
 describes a variety of third party tools for this task. Furthermore,
-some LAMMPS internal commands allow to reconstruct, or selectively add
+some internal LAMMPS commands allow reconstructing, or selectively adding
 topology information, as well as provide the option to insert molecule
 templates instead of atoms for building bulk molecular systems.
 * **Force-field assignment:** The conversion tools described in the previous
@@ -47,33 +48,34 @@ Here are suggestions on how to perform these tasks:
 powerful and flexible in converting force field and topology data
 between various MD simulation programs.
 * **Simulation analysis:** If you want to perform analysis on-the-fly as
-your simulation runs, see the :doc:`compute <compute>` and
-:doc:`fix <fix>` doc pages, which list commands that can be used in a
-LAMMPS input script. Also see the :doc:`Modify <Modify>` page for
-info on how to add your own analysis code or algorithms to LAMMPS.
-For post-processing, LAMMPS output such as :doc:`dump file snapshots <dump>` can be converted into formats used by other MD or
+your simulation runs, see the :doc:`compute <compute>` and :doc:`fix
+<fix>` doc pages, which list commands that can be used in a LAMMPS
+input script. Also see the :doc:`Modify <Modify>` page for info on
+how to add your own analysis code or algorithms to LAMMPS. For
+post-processing, LAMMPS output such as :doc:`dump file snapshots
+<dump>` can be converted into formats used by other MD or
 post-processing codes. To some degree, that conversion can be done
-directly inside of LAMMPS by interfacing to the VMD molfile plugins.
-The :doc:`rerun <rerun>` command also allows to do some post-processing
-of existing trajectories, and through being able to read a variety
-of file formats, this can also be used for analyzing trajectories
-from other MD codes. Some post-processing tools packaged with
-LAMMPS will do these conversions. Scripts provided in the
-tools/python directory can extract and massage data in dump files to
-make it easier to import into other programs. See the
-:doc:`Tools <Tools>` page for details on these various options.
-* **Visualization:** LAMMPS can produce NETPBM, JPG or PNG snapshot images
-on-the-fly via its :doc:`dump image <dump_image>` command and pass
-them to an external program, `FFmpeg <https://www.ffmpeg.org>`_ to generate
-movies from them. For high-quality, interactive visualization there are
-many excellent and free tools available. See the
-`Visualization Tools <https://www.lammps.org/viz.html>`_ page of the
-LAMMPS website for
+directly inside LAMMPS by interfacing to the VMD molfile plugins. The
+:doc:`rerun <rerun>` command also allows post-processing of existing
+trajectories, and through being able to read a variety of file
+formats, this can also be used for analyzing trajectories from other
+MD codes. Some post-processing tools packaged with LAMMPS will do
+these conversions. Scripts provided in the tools/python directory can
+extract and massage data in dump files to make it easier to import
+into other programs. See the :doc:`Tools <Tools>` page for details on
+these various options.
+* **Visualization:** LAMMPS can produce NETPBM, JPG, or PNG format
+snapshot images on-the-fly via its :doc:`dump image <dump_image>`
+command and pass them to an external program, `FFmpeg
+<https://www.ffmpeg.org>`_, to generate movies from them. For
+high-quality, interactive visualization, there are many excellent and
+free tools available. See the `Visualization Tools
+<https://www.lammps.org/viz.html>`_ page of the LAMMPS website for
 visualization packages that can process LAMMPS output data.
 * **Plotting:** See the next bullet about Pizza.py as well as the
 :doc:`Python <Python_head>` page for examples of plotting LAMMPS
-output. Scripts provided with the *python* tool in the tools
-directory will extract and massage data in log and dump files to make
+output. Scripts provided with the *python* tool in the ``tools``
+directory will extract and process data in log and dump files to make
 it easier to analyze and plot. See the :doc:`Tools <Tools>` doc page
 for more discussion of the various tools.
 * **Pizza.py:** Our group has also written a separate toolkit called

@@ -1,20 +1,20 @@
 Overview of LAMMPS
 ------------------

-LAMMPS is a classical molecular dynamics (MD) code that models
-ensembles of particles in a liquid, solid, or gaseous state. It can
-model atomic, polymeric, biological, solid-state (metals, ceramics,
-oxides), granular, coarse-grained, or macroscopic systems using a
-variety of interatomic potentials (force fields) and boundary
-conditions. It can model 2d or 3d systems with only a few particles
-up to millions or billions.
+LAMMPS is a classical molecular dynamics (MD) code that models ensembles
+of particles in a liquid, solid, or gaseous state. It can model atomic,
+polymeric, biological, solid-state (metals, ceramics, oxides), granular,
+coarse-grained, or macroscopic systems using a variety of interatomic
+potentials (force fields) and boundary conditions. It can model 2d or
+3d systems with sizes ranging from only a few particles up to billions.

-LAMMPS can be built and run on a laptop or desktop machine, but is
+LAMMPS can be built and run on single laptop or desktop machines, but is
 designed for parallel computers. It will run in serial and on any
 parallel machine that supports the `MPI <mpi_>`_ message-passing
-library. This includes shared-memory boxes and distributed-memory
-clusters and supercomputers. Parts of LAMMPS also support
-`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.
+library. This includes shared-memory multicore, multi-CPU servers and
+distributed-memory clusters and supercomputers. Parts of LAMMPS also
+support `OpenMP multi-threading <omp_>`_, vectorization, and GPU
+acceleration.

 .. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
 .. _lws: https://www.lammps.org
@@ -42,11 +42,11 @@ LAMMPS uses neighbor lists to keep track of nearby particles. The lists
 are optimized for systems with particles that are repulsive at short
 distances, so that the local density of particles never becomes too
 large. This is in contrast to methods used for modeling plasma or
-gravitational bodies (e.g. galaxy formation).
+gravitational bodies (like galaxy formation).

 On parallel machines, LAMMPS uses spatial-decomposition techniques with
-MPI parallelization to partition the simulation domain into sub-domains
+MPI parallelization to partition the simulation domain into subdomains
 of equal computational cost, one of which is assigned to each processor.
 Processors communicate and store "ghost" atom information for atoms that
-border their sub-domain. Multi-threading parallelization and GPU
-acceleration with with particle-decomposition can be used in addition.
+border their subdomain. Multi-threading parallelization and GPU
+acceleration with particle-decomposition can be used in addition.

@@ -30,17 +30,17 @@ can be created using CMake. CMake must be at least version 3.10.
 Operating systems
 ^^^^^^^^^^^^^^^^^

-The primary development platform for LAMMPS is Linux. Thus the chances
+The primary development platform for LAMMPS is Linux. Thus, the chances
 for LAMMPS to compile without problems on Linux machines are the best.
-Also compilation and correct execution on macOS and Windows (using
+Also, compilation and correct execution on macOS and Windows (using
 Microsoft Visual C++) is checked automatically for largest part of the
 source code. Some (optional) features are not compatible with all
-operating systems either through limitations of the source code or
-source code compatibility or the build system requirements of required
-libraries.
+operating systems, either through limitations of the corresponding
+LAMMPS source code or through source code or build system
+incompatibilities of required libraries.

-Executables for Windows may be created using either Cygwin or Visual
-Studio or a Linux to Windows MinGW cross-compiler.
+Executables for Windows may be created natively using either Cygwin or
+Visual Studio or with a Linux to Windows MinGW cross-compiler.

 Additionally, FreeBSD and Solaris have been tested successfully.

@@ -49,7 +49,7 @@ Compilers

 The most commonly used compilers are the GNU compilers, but also Clang
 and the Intel compilers have been successfully used on Linux, macOS, and
-Windows. Also the Nvidia HPC SDK (formerly PGI compilers) will compile
+Windows. Also, the Nvidia HPC SDK (formerly PGI compilers) will compile
 LAMMPS (tested on Linux).

 CPU architectures
@@ -62,12 +62,14 @@ regularly tested.
 Portability compliance
 ^^^^^^^^^^^^^^^^^^^^^^

-Not all of the LAMMPS source code is fully compliant to all of the above
-mentioned standards. This is rather typical for projects like LAMMPS
-that largely depend on contributions of features from the community.
+Only a subset of the LAMMPS source code is fully compliant to all of the
+above mentioned standards. This is rather typical for projects like
+LAMMPS that largely depend on contributions from the user community.
 Not all contributors are trained as programmers and not all of them have
-access to a variety of platforms. As part of the continuous integration
-process, however, all contributions are automatically tested to compile,
-link, and pass some runtime tests on a selection of Linux flavors,
-macOS, and Windows with different compilers. Other platforms may be
-checked occasionally or when portability bug are reported.
+access to multiple platforms for testing. As part of the continuous
+integration process, however, all contributions are automatically tested
+to compile, link, and pass some runtime tests on a selection of Linux
+flavors, macOS, and Windows, and on Linux with different compilers.
+Thus portability issues are often found before a pull request is merged.
+Other platforms may be checked occasionally or when portability bugs are
+reported.

@@ -30,7 +30,7 @@ course, changing values should be done with care. When accessing per-atom
 data, please note that these data are the per-processor **local** data and are
 indexed accordingly. Per-atom data can change sizes and ordering at
 every neighbor list rebuild or atom sort event as atoms migrate between
-sub-domains and processors.
+subdomains and processors.

 .. code-block:: c

@@ -5,16 +5,17 @@ LAMMPS Documentation (|version| version)
 LAMMPS stands for **L**\ arge-scale **A**\ tomic/**M**\ olecular
 **M**\ assively **P**\ arallel **S**\ imulator.

-LAMMPS is a classical molecular dynamics simulation code with a focus
-on materials modeling. It was designed to run efficiently on parallel
-computers. It was developed originally at Sandia National
-Laboratories, a US Department of Energy facility. The majority of
-funding for LAMMPS has come from the US Department of Energy (DOE).
-LAMMPS is an open-source code, distributed freely under the terms of
-the GNU Public License Version 2 (GPLv2).
+LAMMPS is a classical molecular dynamics simulation code focusing on
+materials modeling. It was designed to run efficiently on parallel
+computers and to be easy to extend and modify. Originally developed at
+Sandia National Laboratories, a US Department of Energy facility, LAMMPS
+now includes contributions from many research groups and individuals
+from many institutions. Most of the funding for LAMMPS has come from
+the US Department of Energy (DOE). LAMMPS is open-source software
+distributed under the terms of the GNU Public License Version 2 (GPLv2).

 The `LAMMPS website <lws_>`_ has a variety of information about the
-code. It includes links to an on-line version of this manual, an
+code. It includes links to an online version of this manual, an
 `online forum <https://www.lammps.org/forum.html>`_ where users can post
 questions and discuss LAMMPS, and a `GitHub site
 <https://github.com/lammps/lammps>`_ where all LAMMPS development is
@@ -26,14 +27,14 @@ The content for this manual is part of the LAMMPS distribution. The
 online version always corresponds to the latest feature release version.
 If needed, you can build a local copy of the manual as HTML pages or a
 PDF file by following the steps on the :doc:`Build_manual` page. If you
-have difficulties viewing the pages please :ref:`see this note
+have difficulties viewing the pages, please :ref:`see this note
 <webbrowser>`.

 -----------

-The manual is organized in three parts:
+The manual is organized into three parts:

-1. the :ref:`User Guide <user_documentation>` with information about how
+1. The :ref:`User Guide <user_documentation>` with information about how
 to obtain, configure, compile, install, and use LAMMPS,
 2. the :ref:`Programmer Guide <programmer_documentation>` with
 information about how to use the LAMMPS library interface from
@@ -47,7 +48,7 @@ The manual is organized in three parts:

 .. only:: html

-Once you are familiar with LAMMPS, you may want to bookmark
+After becoming familiar with LAMMPS, consider bookmarking
 :doc:`this page <Commands_all>`, since it gives quick access to
 tables with links to the documentation for all LAMMPS commands.

@ -2,43 +2,44 @@ What does a LAMMPS version mean
|
||||
-------------------------------
|
||||
|
||||
The LAMMPS "version" is the date when it was released, such as 1 May
|
||||
2014. LAMMPS is updated continuously and we aim to keep it working
|
||||
2014. LAMMPS is updated continuously, and we aim to keep it working
|
||||
correctly and reliably at all times. You can follow its development
|
||||
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
||||
|
||||
Modifications of the LAMMPS source code - like bug fixes, code
|
||||
refactors, updates to existing features, or addition of new features -
|
||||
are organized into pull requests, and will be merged into the *develop*
|
||||
branch of the git repository when they pass automated testing and code
|
||||
Modifications of the LAMMPS source code (like bug fixes, code refactors,
|
||||
updates to existing features, or addition of new features) are organized
|
||||
into pull requests. Pull requests will be merged into the *develop*
|
||||
branch of the git repository after they pass automated testing and code
|
||||
review by the LAMMPS developers. When a sufficient number of changes
|
||||
have accumulated *and* the software passes a set of automated tests, we
|
||||
release it as a *feature release* (or patch release), which are
|
||||
currently made every 4-8 weeks. The *release* branch of the git
|
||||
repository is updated with every such release. A summary of the most
|
||||
important changes of the patch releases are on `this website page
|
||||
<https://www.lammps.org/bug.html>`_. More detailed release notes are
|
||||
`available on GitHub <https://github.com/lammps/lammps/releases/>`_.
|
||||
have accumulated *and* the *develop* branch version passes an extended
|
||||
set of automated tests, we release it as a *feature release* (or patch
|
||||
release), which are currently made every 4 to 8 weeks. The *release*
|
||||
branch of the git repository is updated with every such release. A
|
||||
summary of the most important changes of the patch releases is on `this
website page <https://www.lammps.org/bug.html>`_. More detailed release
notes are `available on GitHub
<https://github.com/lammps/lammps/releases/>`_.

Once or twice a year, we have a "stabilization period" where we apply
only bug fixes and small, non-intrusive changes to the *develop*
branch. At the same time the code is subjected to more detailed and
thorough manual testing than the default automated testing. Also
branch. At the same time, the code is subjected to more detailed and
thorough manual testing than the default automated testing. Also,
several variants of static code analysis are run to improve the overall
code quality, consistency, and compliance with programming standards,
best practices and style conventions.

The latest patch release after such a period is then also labeled as a
*stable* version and the *stable* branch is updated with it. Between
stable releases we occasionally release updates to the stable release
stable releases, we occasionally release updates to the stable release
containing only bug fixes and updates back-ported from the *develop*
branch and update the *stable* branch accordingly.

Each version of LAMMPS contains all the documented features up to and
including its version date. For recently added features we add markers
including its version date. For recently added features, we add markers
to the documentation at which specific LAMMPS version a feature or
keyword was added or significantly changed.

The version date is printed to the screen and logfile every time you run
The version date is printed to the screen and log file every time you run
LAMMPS. It is also in the file src/version.h and in the LAMMPS
directory name created when you unpack a tarball. And it is on the
first page of the :doc:`manual <Manual>`.

@ -23,7 +23,7 @@ against invalid accesses.
When accessing per-atom data,
please note that this data is the per-processor local data and indexed
accordingly. These arrays can change sizes and order at every neighbor list
rebuild and atom sort event as atoms are migrating between sub-domains.
rebuild and atom sort event as atoms are migrating between subdomains.

.. tabs::


@ -23,7 +23,7 @@ against invalid accesses.
When accessing per-atom data,
please note that this data is the per-processor local data and indexed
accordingly. These arrays can change sizes and order at every neighbor list
rebuild and atom sort event as atoms are migrating between sub-domains.
rebuild and atom sort event as atoms are migrating between subdomains.

.. tabs::


@ -9,7 +9,7 @@ There are two thrusts to the discussion that follows. The first is
using code options that implement alternate algorithms that can
speed-up a simulation. The second is to use one of the several
accelerator packages provided with LAMMPS that contain code optimized
for certain kinds of hardware, including multi-core CPUs, GPUs, and
for certain kinds of hardware, including multicore CPUs, GPUs, and
Intel Xeon Phi co-processors.

The `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS

@ -11,7 +11,7 @@ parts of the :doc:`kspace_style pppm <kspace_style>` for long-range
Coulombics. It has the following general features:

* It is designed to exploit common GPU hardware configurations where one
or more GPUs are coupled to many cores of one or more multi-core CPUs,
or more GPUs are coupled to many cores of one or more multicore CPUs,
e.g. within a node of a parallel machine.
* Atom-based data (e.g. coordinates, forces) are moved back-and-forth
between the CPU(s) and GPU every timestep.
@ -28,7 +28,7 @@ Coulombics. It has the following general features:
* LAMMPS-specific code is in the GPU package. It makes calls to a
generic GPU library in the lib/gpu directory. This library provides
either Nvidia support, AMD support, or more general OpenCL support
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multi-core CPUs).
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
so that the same functionality is supported on a variety of hardware.

**Required hardware/software:**
@ -146,7 +146,7 @@ GPUs/node to use, as well as other options.

**Speed-ups to expect:**

The performance of a GPU versus a multi-core CPU is a function of your
The performance of a GPU versus a multicore CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed). Using the GPU package
in OpenCL mode on CPUs (which uses vectorization and multithreading) is
@ -174,7 +174,7 @@ deterministic results.
**Guidelines for best performance:**

* Using multiple MPI tasks per GPU will often give the best performance,
as allowed my most multi-core CPU/GPU configurations.
as allowed by most multicore CPU/GPU configurations.
* If the number of particles per MPI task is small (e.g. 100s of
particles), it can be more efficient to run with fewer MPI tasks per
GPU, even if you do not use all the cores on the compute node.
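
As a rough, illustrative sketch of these guidelines (the task and GPU
counts are invented, not taken from this page), several MPI tasks can
share the GPUs of a node either via the ``-sf gpu -pk gpu`` command-line
switches or with equivalent input-script lines:

.. parsed-literal::

   # hypothetical node: 8 MPI tasks launched by mpirun, sharing 2 GPUs
   package gpu 2
   suffix  gpu
   pair_style lj/cut 2.5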

@ -79,7 +79,7 @@ manner via the ``mpirun`` or ``mpiexec`` commands, and is independent of
Kokkos. E.g. the mpirun command in OpenMPI does this via its ``-np`` and
``-npernode`` switches. Ditto for MPICH via ``-np`` and ``-ppn``.

Running on a multi-core CPU
Running on a multicore CPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is a quick overview of how to use the KOKKOS package
@ -254,7 +254,7 @@ is recommended in this scenario.

Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
multi-core CPUs (and no GPU), if N is the number of physical cores/node,
multicore CPUs (and no GPU), if N is the number of physical cores/node,
then the number of MPI tasks/node should not exceed N.

.. parsed-literal::

@ -12,7 +12,7 @@ Required hardware/software
""""""""""""""""""""""""""

To enable multi-threading, your compiler must support the OpenMP interface.
You should have one or more multi-core CPUs, as multiple threads can only be
You should have one or more multicore CPUs, as multiple threads can only be
launched by each MPI task on the local node (using shared memory).
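
For example, an illustrative fragment (the thread count is invented)
that enables the OPENMP styles with 4 threads per MPI task; the same
effect is available via the ``-sf omp -pk omp 4`` command-line switches:

.. parsed-literal::

   package omp 4
   suffix  omp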

Building LAMMPS with the OPENMP package
@ -157,7 +157,7 @@ Additional performance tips are as follows:
affinity setting that restricts each MPI task to a single CPU core.
Using multi-threading in this mode will force all threads to share the
one core and thus is likely to be counterproductive. Instead, binding
MPI tasks to a (multi-core) socket, should solve this issue.
MPI tasks to a (multicore) socket should solve this issue.

Restrictions
""""""""""""

@ -113,7 +113,7 @@ your input script. LAMMPS does not use the group until a simulation
is run.

The *sort* keyword turns on a spatial sorting or reordering of atoms
within each processor's sub-domain every *Nfreq* timesteps. If
within each processor's subdomain every *Nfreq* timesteps. If
*Nfreq* is set to 0, then sorting is turned off. Sorting can improve
cache performance and thus speed-up a LAMMPS simulation, as discussed
in a paper by :ref:`(Meloni) <Meloni>`. Its efficacy depends on the problem

@ -54,7 +54,7 @@ Syntax
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
name = atom property name (without d\_ prefix)
*out* arg = filename
filename = write each processor's sub-domain to a file
filename = write each processor's subdomain to a file

Examples
""""""""
@ -72,14 +72,14 @@ Examples
Description
"""""""""""

This command adjusts the size and shape of processor sub-domains
This command adjusts the size and shape of processor subdomains
within the simulation box, to attempt to balance the number of atoms
or particles and thus indirectly the computational cost (load) more
evenly across processors. The load balancing is "static" in the sense
that this command performs the balancing once, before or between
simulations. The processor sub-domains will then remain static during
simulations. The processor subdomains will then remain static during
the subsequent run. To perform "dynamic" balancing, see the :doc:`fix
balance <fix_balance>` command, which can adjust processor sub-domain
balance <fix_balance>` command, which can adjust processor subdomain
sizes and shapes on-the-fly during a :doc:`run <run>`.
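
A minimal, illustrative sketch of the two approaches (thresholds and
frequencies are invented values):

.. parsed-literal::

   # static: rebalance once, before the run, if imbalance exceeds 1.1
   balance 1.1 shift xy 20 1.05

   # dynamic: re-check every 1000 steps during the run
   fix 2 all balance 1000 1.05 shift xy 10 1.05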

Load-balancing is typically most useful if the particles in the
@ -90,7 +90,7 @@ an irregular-shaped geometry containing void regions, or :doc:`hybrid
pair style simulations <pair_hybrid>` which combine pair styles with
different computational cost. In these cases, the LAMMPS default of
dividing the simulation box volume into a regular-spaced grid of 3d
bricks, with one equal-volume sub-domain per processor, may assign
bricks, with one equal-volume subdomain per processor, may assign
numbers of particles per processor in a way that the computational
effort varies significantly. This can lead to poor performance when
the simulation is run in parallel.
@ -109,7 +109,7 @@ Specifically, for a Px by Py by Pz grid of processors, it allows
choice of Px, Py, and Pz, subject to the constraint that Px \* Py \*
Pz = P, the total number of processors. This is sufficient to achieve
good load-balance for some problems on some processor counts.
However, all the processor sub-domains will still have the same shape
However, all the processor subdomains will still have the same shape
and same volume.

The requested load-balancing operation is only performed if the
@ -162,7 +162,7 @@ fractions of the box length) are also printed.
simulation could run up to 20% faster if it were perfectly balanced,
versus when imbalanced. However, computational cost is not strictly
proportional to particle count, and changing the relative size and
shape of processor sub-domains may lead to additional computational
shape of processor subdomains may lead to additional computational
and communication overheads, e.g. in the PPPM solver used via the
:doc:`kspace_style <kspace_style>` command. Thus you should benchmark
the run times of a simulation before and after balancing.
@ -177,7 +177,7 @@ The *x*, *y*, *z*, and *shift* styles are "grid" methods which
produce a logical 3d grid of processors. They operate by changing the
cutting planes (or lines) between processors in 3d (or 2d), to adjust
the volume (area in 2d) assigned to each processor, as in the
following 2d diagram where processor sub-domains are shown and
following 2d diagram where processor subdomains are shown and
particles are colored by the processor that owns them.

.. |balance1| image:: img/balance_uniform.jpg
@ -226,7 +226,7 @@ The *x*, *y*, and *z* styles invoke a "grid" method for balancing, as
described above. Note that any or all of these 3 styles can be
specified together, one after the other, but they cannot be used with
any other style. This style adjusts the position of cutting planes
between processor sub-domains in specific dimensions. Only the
between processor subdomains in specific dimensions. Only the
specified dimensions are altered.
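
For instance, an illustrative fragment (cut positions invented) that
spaces the x cuts uniformly while placing three interior y cuts
explicitly:

.. parsed-literal::

   balance 1.0 x uniform y 0.4 0.5 0.6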

The *uniform* argument spaces the planes evenly, as in the left
@ -245,8 +245,8 @@ the cutting place. The left (or lower) edge of the box is 0.0, and
the right (or upper) edge is 1.0. Neither of these values is
specified. Only the interior Ps-1 positions are specified. Thus if
there are 2 processors in the x dimension, you specify a single value
such as 0.75, which would make the left processor's sub-domain 3x
larger than the right processor's sub-domain.
such as 0.75, which would make the left processor's subdomain 3x
larger than the right processor's subdomain.
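
A hedged sketch of exactly that case, assuming two processors along x:

.. parsed-literal::

   processors 2 1 1
   balance 1.0 x 0.75    # left subdomain becomes 3x wider than the right one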

----------

@ -288,10 +288,10 @@ adjacent planes are closer together than the neighbor skin distance
(as specified by the :doc:`neigh_modify <neigh_modify>` command), then
the plane positions are shifted to separate them by at least this
amount. This is to prevent particles being lost when dynamics are run
with processor sub-domains that are too narrow in one or more
with processor subdomains that are too narrow in one or more
dimensions.

Once the re-balancing is complete and final processor sub-domains
Once the re-balancing is complete and final processor subdomains
assigned, particles are migrated to their new owning processor, and
the balance procedure ends.

@ -299,7 +299,7 @@ the balance procedure ends.

At each re-balance operation, the bisectioning for each cutting
plane (line in 2d) typically starts with low and high bounds separated
by the extent of a processor's sub-domain in one dimension. The size
by the extent of a processor's subdomain in one dimension. The size
of this bracketing region shrinks by 1/2 every iteration. Thus if
*Niter* is specified as 10, the cutting plane will typically be
positioned to 1 part in 1000 accuracy (relative to the perfect target
@ -494,7 +494,7 @@ different kinds of custom atom vectors or arrays as arguments.

The *out* keyword writes a text file to the specified *filename* with
the results of the balancing operation. The file contains the bounds
of the sub-domain for each processor after the balancing operation
of the subdomain for each processor after the balancing operation
completes. The format of the file is compatible with the
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
visualizing mesh files. An example is shown here for a balancing by 4
@ -538,7 +538,7 @@ processors for a 2d problem:
4 1 13 14 15 16

The coordinates of all the vertices are listed in the NODES section, 5
per processor. Note that the 4 sub-domains share vertices, so there
per processor. Note that the 4 subdomains share vertices, so there
will be duplicate nodes in the list.

The "SQUARES" section lists the node IDs of the 4 vertices in a

@ -61,7 +61,7 @@ move. Note that when the difference between the current box dimensions
and the shrink-wrap box dimensions is large, this can lead to lost
atoms at the beginning of a run when running in parallel. This is due
to the large change in the (global) box dimensions also causing
significant changes in the individual sub-domain sizes. If these
significant changes in the individual subdomain sizes. If these
changes are farther than the communication cutoff, atoms will be lost.
This is best addressed by setting initial box dimensions to match the
shrink-wrapped dimensions more closely, by using *m* style boundaries

@ -62,7 +62,7 @@ distances are used to determine which atoms to communicate.

The default mode is *single* which means each processor acquires
information for ghost atoms that are within a single distance from its
sub-domain. The distance is by default the maximum of the neighbor
subdomain. The distance is by default the maximum of the neighbor
cutoff across all atom type pairs.
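
As an illustrative fragment (the cutoff value is invented), the ghost
communication distance in *single* mode can be extended with the
*cutoff* keyword described below:

.. parsed-literal::

   comm_modify mode single cutoff 12.0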

For many systems this is an efficient algorithm, but for systems with
@ -81,7 +81,7 @@ with both the *multi* and *multi/old* neighbor styles.

The *cutoff* keyword allows you to extend the ghost cutoff distance
for communication mode *single*, which is the distance from the borders
of a processor's sub-domain at which ghost atoms are acquired from other
of a processor's subdomain at which ghost atoms are acquired from other
processors. By default the ghost cutoff = neighbor cutoff = pairwise
force cutoff + neighbor skin. See the :doc:`neighbor <neighbor>` command
for more information about the skin distance. If the specified Rcut is

@ -54,7 +54,7 @@ per atom, e.g. a list of bond distances. Per-grid quantities are
calculated on a regular 2d or 3d grid which overlays a 2d or 3d
simulation domain. The grid points and the data they store are
distributed across processors; each processor owns the grid points
which fall within its sub-domain.
which fall within its subdomain.

Computes that produce per-atom quantities have the word "atom" at the
end of their style, e.g. *ke/atom*\ . Computes that produce local

@ -48,9 +48,9 @@ the virial, equal to :math:`-dU/dV`, computed for all pairwise as well
as 2-body, 3-body, 4-body, many-body, and long-range interactions, where
:math:`\vec r_i` and :math:`\vec f_i` are the position and force vector
of atom *i*, and the dot indicates the dot product (scalar product).
This is computed in parallel for each sub-domain and then summed over
This is computed in parallel for each subdomain and then summed over
all parallel processes. Thus :math:`N'` necessarily includes atoms from
neighboring sub-domains (so-called ghost atoms) and the position and
neighboring subdomains (so-called ghost atoms) and the position and
force vectors of ghost atoms are thus included in the summation. Only
when running in serial and without periodic boundary conditions is
:math:`N' = N` the number of atoms in the system. :doc:`Fixes <fix>`

@ -39,7 +39,7 @@ Description
Define a computation that stores the specified attributes of a
distributed grid. In LAMMPS, distributed grids are regular 2d or 3d
grids which overlay a 2d or 3d simulation domain. Each processor owns
the grid cells whose center points lie within its sub-domain. See the
the grid cells whose center points lie within its subdomain. See the
:doc:`Howto grid <Howto_grid>` doc page for details of how distributed
grids can be defined by various commands and referenced.
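
For illustration only, assuming this page describes the *property/grid*
compute and that *id*, *ix*, *iy*, and *iz* are among its supported
attributes, a small distributed grid might be defined as:

.. parsed-literal::

   compute 1 all property/grid 10 10 10 id ix iy iz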


@ -259,7 +259,7 @@ layout in the global array.
Compute *sna/grid/local* calculates bispectrum components of a regular
grid of points similarly to compute *sna/grid* described above.
However, because the array is local, it contains only rows for grid points
that are local to the processor sub-domain. The global grid
that are local to the processor subdomain. The global grid
of :math:`nx \times ny \times nz` points is still laid out in space the same as for *sna/grid*,
but grid points are strictly partitioned, so that every grid point appears in
one and only one local array. The array contains one row for each of the

@ -80,9 +80,9 @@ Syntax
axes = *yes* or *no* = do or do not draw xyz axes lines next to simulation box
length = length of axes lines as fraction of respective box lengths
diam = diameter of axes lines as fraction of shortest box length
*subbox* values = lines diam = draw outline of processor sub-domains
lines = *yes* or *no* = do or do not draw sub-domain lines
diam = diameter of sub-domain lines as fraction of shortest box length
*subbox* values = lines diam = draw outline of processor subdomains
lines = *yes* or *no* = do or do not draw subdomain lines
diam = diameter of subdomain lines as fraction of shortest box length
*shiny* value = sfactor = shinyness of spheres and cylinders
sfactor = shinyness of spheres and cylinders from 0.0 to 1.0
*ssao* value = shading seed dfactor = SSAO depth shading
@ -145,7 +145,7 @@ Syntax
*bitrate* arg = rate
rate = target bitrate for movie in kbps
*boxcolor* arg = color
color = name of color for simulation box lines and processor sub-domain lines
color = name of color for simulation box lines and processor subdomain lines
*color* args = name R G B
name = name of color
R,G,B = red/green/blue numeric values from 0.0 to 1.0
@ -581,13 +581,13 @@ respective box lengths. The *diam* setting determines their thickness
as a fraction of the shortest box length in x,y,z (for 3d) or x,y (for
2d).

The *subbox* keyword determines if and how processor sub-domain
The *subbox* keyword determines if and how processor subdomain
boundaries are rendered as thin cylinders in the image. If *no* is
set (default), then the sub-domain boundaries are not drawn and the
set (default), then the subdomain boundaries are not drawn and the
*diam* setting is ignored. If *yes* is set, the 12 edges of each
processor sub-domain are drawn, with a diameter that is a fraction of
processor subdomain are drawn, with a diameter that is a fraction of
the shortest box length in x,y,z (for 3d) or x,y (for 2d). The color
of the sub-domain boundaries can be set with the "dump_modify
of the subdomain boundaries can be set with the "dump_modify
boxcolor" command.

----------
@ -921,8 +921,8 @@ formats.

The *boxcolor* keyword sets the color of the simulation box drawn
around the atoms in each image as well as the color of processor
sub-domain boundaries. See the "dump image box" command for how to
specify that a box be drawn via the *box* keyword, and the sub-domain
subdomain boundaries. See the "dump image box" command for how to
specify that a box be drawn via the *box* keyword, and the subdomain
boundaries via the *subbox* keyword. The color name can be any of the
140 pre-defined colors (see below) or a color name defined by the
dump_modify color option.

@ -89,7 +89,7 @@ owns, but there may be zero or more per atoms. Per-grid quantities
are calculated on a regular 2d or 3d grid which overlays a 2d or 3d
simulation domain. The grid points and the data they store are
distributed across processors; each processor owns the grid points
which fall within its sub-domain.
which fall within its subdomain.

Note that a single fix typically produces either global or per-atom or
local or per-grid values (or none at all). It does not produce both

@ -84,7 +84,7 @@ produced by other computes or fixes. This fix operates in either
per-grid inputs in the same command.

The grid created by this command is distributed; each processor owns
the grid points that are within its sub-domain. This is similar to
the grid points that are within its subdomain. This is similar to
the :doc:`fix ave/chunk <fix_ave_chunk>` command when it uses chunks
from the :doc:`compute chunk/atom <compute_chunk_atom>` command which
are 2d or 3d regular bins. However, the per-bin outputs in that case

@ -44,7 +44,7 @@ Syntax
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
name = atom property name (without d\_ prefix)
*out* arg = filename
filename = write each processor's sub-domain to a file, at each re-balancing
filename = write each processor's subdomain to a file, at each re-balancing

Examples
""""""""
@ -61,7 +61,7 @@ Examples
Description
"""""""""""

This command adjusts the size and shape of processor sub-domains
This command adjusts the size and shape of processor subdomains
within the simulation box, to attempt to balance the number of
particles and thus the computational cost (load) evenly across
processors. The load balancing is "dynamic" in the sense that
@ -77,7 +77,7 @@ an irregular-shaped geometry containing void regions, or
:doc:`hybrid pair style simulations <pair_hybrid>` that combine
pair styles with different computational cost). In these cases, the
LAMMPS default of dividing the simulation box volume into a
regular-spaced grid of 3d bricks, with one equal-volume sub-domain
regular-spaced grid of 3d bricks, with one equal-volume subdomain
per processor, may assign numbers of particles per processor in a
way that the computational effort varies significantly. This can
lead to poor performance when the simulation is run in parallel.
@ -105,7 +105,7 @@ a :math:`P_x \times P_y \times P_z` grid of processors, it allows choices of
:math:`P_x P_y P_z = P`, the total number of processors.
This is sufficient to achieve good load-balance for
some problems on some processor counts. However, all the processor
sub-domains will still have the same shape and the same volume.
subdomains will still have the same shape and the same volume.

On a particular time step, a load-balancing operation is only performed
if the current "imbalance factor" in particles owned by each processor
@ -141,7 +141,7 @@ forced even if the current balance is perfect (1.0) be specifying a
simulation could run up to 20% faster if it were perfectly balanced,
versus when imbalanced. However, computational cost is not strictly
proportional to particle count, and changing the relative size and
shape of processor sub-domains may lead to additional computational
shape of processor subdomains may lead to additional computational
and communication overheads (e.g., in the PPPM solver used via the
:doc:`kspace_style <kspace_style>` command). Thus, you should benchmark
the run times of a simulation before and after balancing.
@ -156,7 +156,7 @@ The *shift* style is a "grid" method which produces a logical 3d grid
of processors. It operates by changing the cutting planes (or lines)
between processors in 3d (or 2d), to adjust the volume (area in 2d)
assigned to each processor, as in the following 2d diagram where
processor sub-domains are shown and atoms are colored by the processor
processor subdomains are shown and atoms are colored by the processor
that owns them.

.. |balance1| image:: img/balance_uniform.jpg
@ -258,7 +258,7 @@ from balanced, and converge more slowly. In this case you probably
want to use the :doc:`balance <balance>` command before starting a run,
so that you begin the run with a balanced system.

Once the re-balancing is complete and final processor sub-domains
Once the re-balancing is complete and final processor subdomains
assigned, particles migrate to their new owning processor as part of
the normal reneighboring procedure.

@ -266,7 +266,7 @@ the normal reneighboring procedure.

At each re-balance operation, the bisectioning for each cutting
plane (line in 2d) typically starts with low and high bounds separated
by the extent of a processor's sub-domain in one dimension. The size
by the extent of a processor's subdomain in one dimension. The size
of this bracketing region shrinks based on the local density, as
described above, which should typically be 1/2 or more every
iteration. Thus if :math:`N_\text{iter}` is specified as 10, the cutting
@ -310,7 +310,7 @@ in that sub-box.

The *out* keyword writes text to the specified *filename* with the
results of each re-balancing operation. The file contains the bounds
of the sub-domain for each processor after the balancing operation
of the subdomain for each processor after the balancing operation
completes. The format of the file is compatible with the
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
visualizing mesh files. An example is shown here for a balancing by four
@ -354,7 +354,7 @@ processors for a 2d problem:
4 1 13 14 15 16

The coordinates of all the vertices are listed in the NODES section, five
per processor. Note that the four sub-domains share vertices, so there
per processor. Note that the four subdomains share vertices, so there
will be duplicate nodes in the list.

The "SQUARES" section lists the node IDs of the four vertices in a

@ -118,7 +118,7 @@ displaced by the same amount, different on each iteration.
all. Also note that if the box shape tilts to an extreme shape,
LAMMPS will run less efficiently, due to the large volume of
communication needed to acquire ghost atoms around a processor's
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
also lose atoms and generate an error.

.. note::

@ -546,7 +546,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
you apply large deformations, this means the box shape can tilt
dramatically LAMMPS will run less efficiently, due to the large volume
of communication needed to acquire ghost atoms around a processor's
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
also lose atoms and generate an error.

The *units* keyword determines the meaning of the distance units used

@ -198,7 +198,7 @@ dt}{\rho dx^2}` is approximately equal to 1.
and a simulation domain size. This fix uses the same subdivision of
the simulation domain among processors as the main LAMMPS program. In
order to uniformly cover the simulation domain with lattice sites, the
lengths of the individual LAMMPS sub-domains must all be evenly
lengths of the individual LAMMPS subdomains must all be evenly
divisible by :math:`dx_{LB}`. If the simulation domain size is cubic,
with equal lengths in all dimensions, and the default value for
:math:`dx_{LB}` is used, this will automatically be satisfied.

@ -371,7 +371,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
applied stress induces large deformations (e.g. in a liquid), this
means the box shape can tilt dramatically and LAMMPS will run less
efficiently, due to the large volume of communication needed to
acquire ghost atoms around a processor's irregular-shaped sub-domain.
acquire ghost atoms around a processor's irregular-shaped subdomain.
For extreme values of tilt, LAMMPS may also lose atoms and generate an
error.


@ -311,7 +311,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
applied stress induces large deformations (e.g. in a liquid), this
means the box shape can tilt dramatically and LAMMPS will run less
efficiently, due to the large volume of communication needed to
acquire ghost atoms around a processor's irregular-shaped sub-domain.
acquire ghost atoms around a processor's irregular-shaped subdomain.
For extreme values of tilt, LAMMPS may also lose atoms and generate an
error.


@ -69,7 +69,7 @@ geometries.
This fix must be used with an additional fix that specifies time
integration, e.g. :doc:`fix nve <fix_nve>` or :doc:`fix nph <fix_nh>`.

The Shardlow splitting algorithm requires the sizes of the sub-domain
The Shardlow splitting algorithm requires the sizes of the subdomain
lengths to be larger than twice the cutoff+skin. Generally, the
domain decomposition is dependent on the number of processors
requested.

@ -90,7 +90,7 @@ The description in this sub-section applies to all 3 fix styles:
*ttm*, *ttm/grid*, and *ttm/mod*.

Fix *ttm/grid* distributes the regular grid across processors consistent
with the sub-domains of atoms owned by each processor, but is otherwise
with the subdomains of atoms owned by each processor, but is otherwise
identical to fix ttm. Note that fix *ttm* stores a copy of the grid on
each processor, which is acceptable when the overall grid is reasonably
small. For larger grids you should use fix *ttm/grid* instead.
@ -170,11 +170,11 @@ ttm/mod.
periodic boundary conditions in all dimensions. They also require
that the size and shape of the simulation box do not vary
dynamically, e.g. due to use of the :doc:`fix npt <fix_nh>` command.
Likewise, the size/shape of processor sub-domains cannot vary due to
Likewise, the size/shape of processor subdomains cannot vary due to
dynamic load-balancing via use of the :doc:`fix balance
<fix_balance>` command. It is possible however to load balance
before the simulation starts using the :doc:`balance <balance>`
command, so that each processor has a different size sub-domain.
command, so that each processor has a different size subdomain.

Periodic boundary conditions are also used in the heat equation solve
for the electronic subsystem. This varies from the approach of

@ -399,7 +399,7 @@ automatically throughout the run. This typically give performance
within 5 to 10 percent of the optimal fixed fraction.

The *ghost* keyword determines whether or not ghost atoms, i.e. atoms
at the boundaries of processor sub-domains, are offloaded for neighbor
at the boundaries of processor subdomains, are offloaded for neighbor
and force calculations. When the value = "no", ghost atoms are not
offloaded. This option can reduce the amount of data transfer with
the co-processor and can also overlap MPI communication of forces with
@ -521,7 +521,7 @@ the comm keywords.
The value options for the keywords are *no* or *host* or *device*\ . A
value of *no* means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of *host* means to
use the host, typically a multi-core CPU, and perform the
use the host, typically a multicore CPU, and perform the
packing/unpacking in parallel with threads. A value of *device* means to
use the device, typically a GPU, to perform the packing/unpacking
operation.
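
For example, an illustrative input-script line that performs the
pack/unpack on the device; the same setting is available via
``-pk kokkos comm device`` on the command line:

.. parsed-literal::

   package kokkos comm device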

@ -56,7 +56,7 @@ commands:
The global DSMC *max_cell_size* determines the maximum cell length
used in the DSMC calculation. A structured mesh is overlayed on the
simulation box such that an integer number of cells are created in
each direction for each processor's sub-domain. Cell lengths are
each direction for each processor's subdomain. Cell lengths are
adjusted up to the user-specified maximum cell size.

----------

@ -31,7 +31,7 @@ and the neighbor skin distance (see the documentation of the
<comm_modify>` command). When you have bonds, angles, dihedrals, or
impropers defined at the same time, you must set the communication
cutoff so that communication cutoff distance is large enough to acquire
and communicate sufficient ghost atoms from neighboring sub-domains as
and communicate sufficient ghost atoms from neighboring subdomains as
needed for computing bonds, angles, etc.

A pair style of *none* will also not request a pairwise neighbor list.
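
A minimal sketch of that situation (the cutoff value is invented): with
no pair style to define a communication distance, an explicit ghost
cutoff keeps enough ghost atoms for the bonded terms:

.. parsed-literal::

   pair_style  none
   comm_modify cutoff 5.0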

@ -66,7 +66,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
will choose the number of processors in that dimension of the grid.
It will do this based on the size and shape of the global simulation
box so as to minimize the surface-to-volume ratio of each processor's
sub-domain.
subdomain.
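
For instance, an illustrative line for a quasi-2d system that fixes one
processor along z and lets LAMMPS choose the other two factors:

.. parsed-literal::

   processors * * 1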

Choosing explicit values for Px or Py or Pz can be used to override
the default manner in which LAMMPS will create the regular 3d grid of
@ -81,7 +81,7 @@ equal 1.
Note that if you run on a prime number of processors P, then a grid
such as 1 x P x 1 will be required, which may incur extra
communication costs due to the high surface area of each processor's
sub-domain.
subdomain.

Also note that if multiple partitions are being used then P is the
number of processors in this partition; see the :doc:`-partition command-line switch <Run_options>` page for details. Also note
@ -113,10 +113,10 @@ will persist for all simulations. If balancing is performed, some of
the methods invoked by those commands retain the logical topology of
the initial 3d grid, and the mapping of processors to the grid
specified by the processors command. However the grid spacings in
different dimensions may change, so that processors own sub-domains of
different dimensions may change, so that processors own subdomains of
different sizes. If the :doc:`comm_style tiled <comm_style>` command is
used, methods invoked by the balancing commands may discard the 3d
grid of processors and tile the simulation domain with sub-domains of
grid of processors and tile the simulation domain with subdomains of
different sizes and shapes which no longer have a logical 3d
connectivity. If that occurs, all the information specified by the
processors command is ignored.
@ -129,7 +129,7 @@ processors.

The *onelevel* style creates a 3d grid that is compatible with the
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
each processor's sub-domain, as described above. The mapping of
each processor's subdomain, as described above. The mapping of
processors to the grid is determined by the *map* keyword setting.

The *twolevel* style can be used on machines with multicore nodes to
@ -145,7 +145,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
will choose the number of cores in that dimension of the node's
sub-grid. As with Px,Py,Pz, it will do this based on the size and
shape of the global simulation box so as to minimize the
surface-to-volume ratio of each processor's sub-domain.
surface-to-volume ratio of each processor's subdomain.

.. note::


@ -16,7 +16,7 @@ nx,ny,nz = replication factors in each dimension

.. parsed-literal::

*bbox* = only check atoms in replicas that overlap with a processor's sub-domain
*bbox* = only check atoms in replicas that overlap with a processor's subdomain

Examples
""""""""
@ -52,7 +52,7 @@ image flags that differ by 1. This will allow the bond to be
unwrapped appropriately.

The optional keyword *bbox* uses a bounding box to only check atoms in
replicas that overlap with a processor's sub-domain when assigning
replicas that overlap with a processor's subdomain when assigning
atoms to processors. It typically results in a substantial speedup
when using the replicate command on a large number of processors. It
does require temporary use of more memory, specifically that each

@ -64,7 +64,7 @@ The *lost* keyword determines whether LAMMPS checks for lost atoms each
time it computes thermodynamics and what it does if atoms are lost. An
atom can be "lost" if it moves across a non-periodic simulation box
:doc:`boundary <boundary>` or if it moves more than a box length outside
the simulation domain (or more than a processor sub-domain length)
the simulation domain (or more than a processor subdomain length)
before reneighboring occurs. The latter case is typically due to bad
dynamics (e.g., too large a time step and/or huge forces and velocities). If
the value is *ignore*, LAMMPS does not check for lost atoms. If the

@ -3432,6 +3432,8 @@ Subclassed
subcutoff
subcycle
subcycling
subdomain
subdomains
subhi
sublo
Subramaniyan