revise based on suggestions from languagetool.org
@ -14,8 +14,8 @@ Owned and ghost atoms
|
|||||||
As described on the :doc:`parallel partitioning algorithms
|
As described on the :doc:`parallel partitioning algorithms
|
||||||
<Developer_par_part>` page, LAMMPS spatially decomposes the simulation
|
<Developer_par_part>` page, LAMMPS spatially decomposes the simulation
|
||||||
domain, either in a *brick* or *tiled* manner. Each processor (MPI
|
domain, either in a *brick* or *tiled* manner. Each processor (MPI
|
||||||
task) owns atoms within its sub-domain and additionally stores ghost
|
task) owns atoms within its subdomain and additionally stores ghost
|
||||||
atoms within a cutoff distance of its sub-domain.
|
atoms within a cutoff distance of its subdomain.
|
||||||
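The owned/ghost distinction described in this hunk can be sketched as a small classification routine. This is only an illustration under stated assumptions: the `Subdomain` struct, the `AtomKind` enum, and the `classify()` function are hypothetical names and not part of LAMMPS, the half-open ownership test is an assumed convention, and periodic images are ignored.

```cpp
#include <array>
#include <cassert>

// Hypothetical axis-aligned 2d subdomain; not a LAMMPS data structure.
struct Subdomain {
  std::array<double, 2> lo;   // lower corner
  std::array<double, 2> hi;   // upper corner
};

enum class AtomKind { Owned, Ghost, Outside };

// Classify a position relative to one subdomain and a ghost cutoff.
// Owned: lo <= x < hi in every dimension (half-open, so an atom on a shared
// face has exactly one owner; this convention is an assumption).
// Ghost: not owned, but inside the subdomain extended by `cutoff` on all sides.
AtomKind classify(const Subdomain &sub, const std::array<double, 2> &x, double cutoff)
{
  assert(cutoff >= 0.0);
  bool owned = true;
  bool in_extended = true;
  for (int d = 0; d < 2; ++d) {
    if (x[d] < sub.lo[d] || x[d] >= sub.hi[d]) owned = false;
    if (x[d] < sub.lo[d] - cutoff || x[d] >= sub.hi[d] + cutoff) in_extended = false;
  }
  if (owned) return AtomKind::Owned;
  return in_extended ? AtomKind::Ghost : AtomKind::Outside;
}
```

Testing against the extended box rather than a true distance keeps the check to a few comparisons per dimension, at the cost of also accepting atoms near the corners that are slightly farther away than the cutoff.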
|
|
||||||
Forward and reverse communication
|
Forward and reverse communication
|
||||||
=================================
|
=================================
|
||||||
|
|||||||
@ -139,7 +139,7 @@ Periodic boundary conditions are then applied by the Domain class via
|
|||||||
its ``pbc()`` method to remap particles that have moved outside the
|
its ``pbc()`` method to remap particles that have moved outside the
|
||||||
simulation box back into the box. Note that this is not done every
|
simulation box back into the box. Note that this is not done every
|
||||||
timestep, but only when neighbor lists are rebuilt. This is so that
|
timestep, but only when neighbor lists are rebuilt. This is so that
|
||||||
each processor's sub-domain will have consistent (nearby) atom
|
each processor's subdomain will have consistent (nearby) atom
|
||||||
coordinates for its owned and ghost atoms. It is also why dumped atom
|
coordinates for its owned and ghost atoms. It is also why dumped atom
|
||||||
coordinates may be slightly outside the simulation box if not dumped
|
coordinates may be slightly outside the simulation box if not dumped
|
||||||
on a step where the neighbor lists are rebuilt.
|
on a step where the neighbor lists are rebuilt.
|
||||||
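The remapping that this paragraph attributes to ``pbc()`` can be illustrated for a single coordinate. The helper below is a minimal sketch, not the actual ``Domain::pbc()`` code: the name `wrap_periodic()` and its signature are assumptions made for the example, and it handles only an orthogonal box in one dimension.

```cpp
#include <cassert>
#include <cmath>

// Remap one coordinate into the half-open interval [lo, hi) of a periodic box.
// Illustrative helper only; it is not the signature of Domain::pbc().
double wrap_periodic(double x, double lo, double hi)
{
  assert(hi > lo);
  const double period = hi - lo;
  // floor() yields the number of whole periods to remove; it is negative
  // for x < lo, so the same expression shifts such atoms upward.
  double wrapped = x - period * std::floor((x - lo) / period);
  // Guard against round-off that could leave the result exactly on hi.
  if (wrapped >= hi) wrapped = lo;
  return wrapped;
}
```

Between reneighboring steps no such remap is applied, which is consistent with the remark that dumped coordinates can lie slightly outside the box.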
@ -153,10 +153,10 @@ method of the Comm class and ``setup_bins()`` method of the Neighbor
|
|||||||
class perform the update.
|
class perform the update.
|
||||||
|
|
||||||
The code is now ready to migrate atoms that have left a processor's
|
The code is now ready to migrate atoms that have left a processor's
|
||||||
geometric sub-domain to new processors. The ``exchange()`` method of
|
geometric subdomain to new processors. The ``exchange()`` method of
|
||||||
the Comm class performs this operation. The ``borders()`` method of the
|
the Comm class performs this operation. The ``borders()`` method of the
|
||||||
Comm class then identifies ghost atoms surrounding each processor's
|
Comm class then identifies ghost atoms surrounding each processor's
|
||||||
sub-domain and communicates ghost atom information to neighboring
|
subdomain and communicates ghost atom information to neighboring
|
||||||
processors. It does this by looping over all the atoms owned by a
|
processors. It does this by looping over all the atoms owned by a
|
||||||
processor to make lists of those to send to each neighbor processor. On
|
processor to make lists of those to send to each neighbor processor. On
|
||||||
subsequent timesteps, the lists are used by the ``Comm::forward_comm()``
|
subsequent timesteps, the lists are used by the ``Comm::forward_comm()``
|
||||||
|
|||||||
@ -28,9 +28,9 @@ grid.
|
|||||||
|
|
||||||
More specifically, a grid point is defined for each cell (by default
|
More specifically, a grid point is defined for each cell (by default
|
||||||
the center point), and a processor owns a grid cell if its point is
|
the center point), and a processor owns a grid cell if its point is
|
||||||
within the processor's spatial sub-domain. The union of processor
|
within the processor's spatial subdomain. The union of processor
|
||||||
sub-domains is the global simulation box. If a grid point is on the
|
subdomains is the global simulation box. If a grid point is on the
|
||||||
boundary of two sub-domains, the lower processor owns the grid cell. A
|
boundary of two subdomains, the lower processor owns the grid cell. A
|
||||||
processor may also store copies of ghost cells which surround its
|
processor may also store copies of ghost cells which surround its
|
||||||
owned cells.
|
owned cells.
|
||||||
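The ownership rule in this paragraph can be sketched in one dimension. The function below follows the rule as worded here (the grid point decides ownership, and a point on a shared boundary goes to the lower processor); `owner_of_cell()` and its parameters are hypothetical and do not reproduce the Grid class implementation.

```cpp
#include <cassert>
#include <vector>

// Return the index of the subdomain that owns grid cell `icell` in 1d.
// `bounds` holds nproc+1 ascending subdomain boundaries, `cell` is the grid
// cell width, and `shift` in [0,1] is the offset of the grid point within the
// cell (0.5 = center point). All names are illustrative assumptions.
int owner_of_cell(int icell, double cell, double shift, const std::vector<double> &bounds)
{
  assert(bounds.size() >= 2 && shift >= 0.0 && shift <= 1.0);
  const double point = bounds.front() + (icell + shift) * cell;
  const int nproc = static_cast<int>(bounds.size()) - 1;
  for (int p = 0; p < nproc; ++p) {
    // Closed upper bound: a point exactly on a shared boundary is assigned
    // to the lower of the two processors.
    if (point <= bounds[p + 1]) return p;
  }
  return nproc - 1;   // round-off slightly past the upper box edge
}
```

With the default center point, a grid point can only fall on a subdomain boundary when that boundary lies exactly at a cell center, so the tie rule matters mostly for non-default offsets.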
|
|
||||||
@ -62,7 +62,7 @@ y-dimension. It is even possible to define a 1x1x1 3d grid, though it
|
|||||||
may be inefficient to use it in a computational sense.
|
may be inefficient to use it in a computational sense.
|
||||||
|
|
||||||
Note that the choice of grid size is independent of the number of
|
Note that the choice of grid size is independent of the number of
|
||||||
processors or their layout in a grid of processor sub-domains which
|
processors or their layout in a grid of processor subdomains which
|
||||||
overlays the simulation domain. Depending on the distributed grid
|
overlays the simulation domain. Depending on the distributed grid
|
||||||
size, a single processor may own many 1000s or no grid cells.
|
size, a single processor may own many 1000s or no grid cells.
|
||||||
|
|
||||||
@ -235,7 +235,7 @@ invoked, because they influence its operation.
|
|||||||
void set_zfactor(double factor);
|
void set_zfactor(double factor);
|
||||||
|
|
||||||
Processors own a grid cell if a point within the grid cell is inside
|
Processors own a grid cell if a point within the grid cell is inside
|
||||||
the processor's sub-domain. By default this is the center point of the
|
the processor's subdomain. By default this is the center point of the
|
||||||
grid cell. The *set_shift_grid()* method can change this. The *shift*
|
grid cell. The *set_shift_grid()* method can change this. The *shift*
|
||||||
argument is a value from 0.0 to 1.0 (inclusive) which is the offset of
|
argument is a value from 0.0 to 1.0 (inclusive) which is the offset of
|
||||||
the point within the grid cell in each dimension. The default is 0.5
|
the point within the grid cell in each dimension. The default is 0.5
|
||||||
@ -245,9 +245,9 @@ typically no need to change the default as it is optimal for
|
|||||||
minimizing the number of ghost cells needed.
|
minimizing the number of ghost cells needed.
|
||||||
|
|
||||||
If a processor maps its particles to grid cells, it needs to allow for
|
If a processor maps its particles to grid cells, it needs to allow for
|
||||||
its particles being outside its sub-domain between reneighboring. The
|
its particles being outside its subdomain between reneighboring. The
|
||||||
*distance* argument of the *set_distance()* method sets the furthest
|
*distance* argument of the *set_distance()* method sets the furthest
|
||||||
distance outside a processor's sub-domain which a particle can move.
|
distance outside a processor's subdomain which a particle can move.
|
||||||
Typically this is half the neighbor skin distance, assuming
|
Typically this is half the neighbor skin distance, assuming
|
||||||
reneighboring is done appropriately. This distance is used in
|
reneighboring is done appropriately. This distance is used in
|
||||||
determining how many ghost cells a processor needs to store to enable
|
determining how many ghost cells a processor needs to store to enable
|
||||||
@ -295,7 +295,7 @@ to the Grid class via the *set_zfactor()* method (*set_yfactor()* for
|
|||||||
2d grids). The Grid class will then assign ownership of the 1/3 of
|
2d grids). The Grid class will then assign ownership of the 1/3 of
|
||||||
grid cells that overlay the simulation box to the processors which
|
grid cells that overlay the simulation box to the processors which
|
||||||
also overlay the simulation box. The remaining 2/3 of the grid cells
|
also overlay the simulation box. The remaining 2/3 of the grid cells
|
||||||
are assigned to processors whose sub-domains are adjacent to the upper
|
are assigned to processors whose subdomains are adjacent to the upper
|
||||||
z boundary of the simulation box.
|
z boundary of the simulation box.
|
||||||
|
|
||||||
----------
|
----------
|
||||||
@ -549,13 +549,13 @@ Grid class remap methods for load balancing
|
|||||||
The following methods are used when a load-balancing operation,
|
The following methods are used when a load-balancing operation,
|
||||||
triggered by the :doc:`balance <balance>` or :doc:`fix balance
|
triggered by the :doc:`balance <balance>` or :doc:`fix balance
|
||||||
<fix_balance>` commands, changes the partitioning of the simulation
|
<fix_balance>` commands, changes the partitioning of the simulation
|
||||||
domain into processor sub-domains.
|
domain into processor subdomains.
|
||||||
|
|
||||||
In order to work with load-balancing, any style command (compute, fix,
|
In order to work with load-balancing, any style command (compute, fix,
|
||||||
pair, or kspace style) which allocates a grid and stores per-grid data
|
pair, or kspace style) which allocates a grid and stores per-grid data
|
||||||
should define a *reset_grid()* method; it takes no arguments. It will
|
should define a *reset_grid()* method; it takes no arguments. It will
|
||||||
be called by the two balance commands after they have reset processor
|
be called by the two balance commands after they have reset processor
|
||||||
sub-domains and migrated atoms (particles) to new owning processors.
|
subdomains and migrated atoms (particles) to new owning processors.
|
||||||
The *reset_grid()* method will typically perform some or all of the
|
The *reset_grid()* method will typically perform some or all of the
|
||||||
following operations. See the src/fix_ave_grid.cpp and
|
following operations. See the src/fix_ave_grid.cpp and
|
||||||
src/EXTRA_FIX/fix_ttm_grid.cpp files for examples of *reset_grid()*
|
src/EXTRA_FIX/fix_ttm_grid.cpp files for examples of *reset_grid()*
|
||||||
@ -564,7 +564,7 @@ functions.
|
|||||||
|
|
||||||
First, the *reset_grid()* method can instantiate new grid(s) of the
|
First, the *reset_grid()* method can instantiate new grid(s) of the
|
||||||
same global size, then call *setup_grid()* to partition them via the
|
same global size, then call *setup_grid()* to partition them via the
|
||||||
new processor sub-domains. At this point, it can invoke the
|
new processor subdomains. At this point, it can invoke the
|
||||||
*identical()* method which compares the owned and ghost grid cell
|
*identical()* method which compares the owned and ghost grid cell
|
||||||
index bounds between two grids, the old grid passed as a pointer
|
index bounds between two grids, the old grid passed as a pointer
|
||||||
argument, and the new grid whose *identical()* method is being called.
|
argument, and the new grid whose *identical()* method is being called.
|
||||||
|
|||||||
@ -102,7 +102,7 @@ build is then :doc:`processed in parallel <Developer_par_neigh>`.
|
|||||||
The most commonly required neighbor list is a so-called "half" neighbor
|
The most commonly required neighbor list is a so-called "half" neighbor
|
||||||
list, where each pair of atoms is listed only once (except when the
|
list, where each pair of atoms is listed only once (except when the
|
||||||
:doc:`newton command setting <newton>` for pair is off; in that case
|
:doc:`newton command setting <newton>` for pair is off; in that case
|
||||||
pairs straddling sub-domains or periodic boundaries will be listed twice).
|
pairs straddling subdomains or periodic boundaries will be listed twice).
|
||||||
Thus these are the default settings when a neighbor list request is created in:
|
Thus these are the default settings when a neighbor list request is created in:
|
||||||
|
|
||||||
.. code-block:: c++
|
.. code-block:: c++
|
||||||
@ -361,7 +361,7 @@ allocated as a 1d vector or 3d array. Either way, the ordering of
|
|||||||
values within contiguous memory is x fastest, then y, z slowest.
|
values within contiguous memory is x fastest, then y, z slowest.
|
||||||
|
|
||||||
For the ``3d decomposition`` of the grid, the global grid is
|
For the ``3d decomposition`` of the grid, the global grid is
|
||||||
partitioned into bricks that correspond to the sub-domains of the
|
partitioned into bricks that correspond to the subdomains of the
|
||||||
simulation box that each processor owns. Often, this is a regular 3d
|
simulation box that each processor owns. Often, this is a regular 3d
|
||||||
array (Px by Py by Pz) of bricks, where P = number of processors =
|
array (Px by Py by Pz) of bricks, where P = number of processors =
|
||||||
Px * Py * Pz. More generally it can be a tiled decomposition, where
|
Px * Py * Pz. More generally it can be a tiled decomposition, where
|
||||||
|
|||||||
@ -7,16 +7,16 @@ large systems provided it uses a correspondingly large number of MPI
|
|||||||
processes. Since The per-atom data (atom IDs, positions, velocities,
|
processes. Since The per-atom data (atom IDs, positions, velocities,
|
||||||
types, etc.) To be able to compute the short-range interactions MPI
|
types, etc.) To be able to compute the short-range interactions MPI
|
||||||
processes need not only access to data of atoms they "own" but also
|
processes need not only access to data of atoms they "own" but also
|
||||||
information about atoms from neighboring sub-domains, in LAMMPS referred
|
information about atoms from neighboring subdomains, in LAMMPS referred
|
||||||
to as "ghost" atoms. These are copies of atoms storing required
|
to as "ghost" atoms. These are copies of atoms storing required
|
||||||
per-atom data for up to the communication cutoff distance. The green
|
per-atom data for up to the communication cutoff distance. The green
|
||||||
dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
|
dashed-line boxes in the :ref:`domain-decomposition` figure illustrate
|
||||||
the extended ghost-atom sub-domain for one processor.
|
the extended ghost-atom subdomain for one processor.
|
||||||
|
|
||||||
This approach is also used to implement periodic boundary
|
This approach is also used to implement periodic boundary
|
||||||
conditions: atoms that lie within the cutoff distance across a periodic
|
conditions: atoms that lie within the cutoff distance across a periodic
|
||||||
boundary are also stored as ghost atoms and taken from the periodic
|
boundary are also stored as ghost atoms and taken from the periodic
|
||||||
replication of the sub-domain, which may be the same sub-domain, e.g. if
|
replication of the subdomain, which may be the same subdomain, e.g. if
|
||||||
running in serial. As a consequence of this, force computation in
|
running in serial. As a consequence of this, force computation in
|
||||||
LAMMPS is not subject to minimum image conventions and thus cutoffs may
|
LAMMPS is not subject to minimum image conventions and thus cutoffs may
|
||||||
be larger than half the simulation domain.
|
be larger than half the simulation domain.
|
||||||
@ -28,10 +28,10 @@ be larger than half the simulation domain.
|
|||||||
ghost atom communication
|
ghost atom communication
|
||||||
|
|
||||||
This figure shows the ghost atom communication patterns between
|
This figure shows the ghost atom communication patterns between
|
||||||
sub-domains for "brick" (left) and "tiled" communication styles for
|
subdomains for "brick" (left) and "tiled" communication styles for
|
||||||
2d simulations. The numbers indicate MPI process ranks. Here the
|
2d simulations. The numbers indicate MPI process ranks. Here the
|
||||||
sub-domains are drawn spatially separated for clarity. The
|
subdomains are drawn spatially separated for clarity. The
|
||||||
dashed-line box is the extended sub-domain of processor 0 which
|
dashed-line box is the extended subdomain of processor 0 which
|
||||||
includes its ghost atoms. The red- and blue-shaded boxes are the
|
includes its ghost atoms. The red- and blue-shaded boxes are the
|
||||||
regions of communicated ghost atoms.
|
regions of communicated ghost atoms.
|
||||||
|
|
||||||
@ -42,7 +42,7 @@ atom communication is performed in two stages for a 2d simulation (three
|
|||||||
in 3d) for both a regular and irregular partitioning of the simulation
|
in 3d) for both a regular and irregular partitioning of the simulation
|
||||||
box. For the regular case (left) atoms are exchanged first in the
|
box. For the regular case (left) atoms are exchanged first in the
|
||||||
*x*-direction, then in *y*, with four neighbors in the grid of processor
|
*x*-direction, then in *y*, with four neighbors in the grid of processor
|
||||||
sub-domains.
|
subdomains.
|
||||||
|
|
||||||
In the *x* stage, processor ranks 1 and 2 send owned atoms in their
|
In the *x* stage, processor ranks 1 and 2 send owned atoms in their
|
||||||
red-shaded regions to rank 0 (and vice versa). Then in the *y* stage,
|
red-shaded regions to rank 0 (and vice versa). Then in the *y* stage,
|
||||||
@ -55,7 +55,7 @@ For the irregular case (right) the two stages are similar, but a
|
|||||||
processor can have more than one neighbor in each direction. In the
|
processor can have more than one neighbor in each direction. In the
|
||||||
*x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
|
*x* stage, MPI ranks 1,2,3 send owned atoms in their red-shaded regions to
|
||||||
rank 0 (and vice versa). These include only atoms between the lower
|
rank 0 (and vice versa). These include only atoms between the lower
|
||||||
and upper *y*-boundary of rank 0's sub-domain. In the *y* stage, ranks
|
and upper *y*-boundary of rank 0's subdomain. In the *y* stage, ranks
|
||||||
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
|
4,5,6 send atoms in their blue-shaded regions to rank 0. This may
|
||||||
include ghost atoms they received in the *x* stage, but only if they
|
include ghost atoms they received in the *x* stage, but only if they
|
||||||
are needed by rank 0 to fill its extended ghost atom regions in the
|
are needed by rank 0 to fill its extended ghost atom regions in the
|
||||||
@ -110,11 +110,11 @@ performed in LAMMPS:
|
|||||||
over 3x the length of a stretched bond for dihedral interactions. It
|
over 3x the length of a stretched bond for dihedral interactions. It
|
||||||
can also exceed the periodic box size. For the regular communication
|
can also exceed the periodic box size. For the regular communication
|
||||||
pattern (left), if the cutoff distance extends beyond a neighbor
|
pattern (left), if the cutoff distance extends beyond a neighbor
|
||||||
processor's sub-domain, then multiple exchanges are performed in the
|
processor's subdomain, then multiple exchanges are performed in the
|
||||||
same direction. Each exchange is with the same neighbor processor,
|
same direction. Each exchange is with the same neighbor processor,
|
||||||
but buffers are packed/unpacked using a different list of atoms. For
|
but buffers are packed/unpacked using a different list of atoms. For
|
||||||
forward communication, in the first exchange a processor sends only
|
forward communication, in the first exchange a processor sends only
|
||||||
owned atoms. In subsequent exchanges, it sends ghost atoms received
|
owned atoms. In subsequent exchanges, it sends ghost atoms received
|
||||||
in previous exchanges. For the irregular pattern (right) overlaps of
|
in previous exchanges. For the irregular pattern (right) overlaps of
|
||||||
a processor's extended ghost-atom sub-domain with all other processors
|
a processor's extended ghost-atom subdomain with all other processors
|
||||||
in each dimension are detected.
|
in each dimension are detected.
|
||||||
|
|||||||
@ -20,7 +20,7 @@ e) electric field values from grid points near each atom are interpolated to com
|
|||||||
|
|
||||||
For any of the spatial-decomposition partitioning schemes each processor
|
For any of the spatial-decomposition partitioning schemes each processor
|
||||||
owns the brick-shaped portion of FFT grid points contained within its
|
owns the brick-shaped portion of FFT grid points contained within its
|
||||||
sub-domain. The two interpolation operations use a stencil of grid
|
subdomain. The two interpolation operations use a stencil of grid
|
||||||
points surrounding each atom. To accommodate the stencil size, each
|
points surrounding each atom. To accommodate the stencil size, each
|
||||||
processor also stores a few layers of ghost grid points surrounding its
|
processor also stores a few layers of ghost grid points surrounding its
|
||||||
brick. Forward and reverse communication of grid point values is
|
brick. Forward and reverse communication of grid point values is
|
||||||
@ -64,7 +64,7 @@ direction of the 1d FFTs it has to perform. LAMMPS uses the
|
|||||||
pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.
|
pencil-decomposition algorithm as shown in the :ref:`fft-parallel` figure.
|
||||||
|
|
||||||
Initially (far left), each processor owns a brick of same-color grid
|
Initially (far left), each processor owns a brick of same-color grid
|
||||||
cells (actually grid points) contained within its sub-domain. A
|
cells (actually grid points) contained within its subdomain. A
|
||||||
brick-to-pencil communication operation converts this layout to 1d
|
brick-to-pencil communication operation converts this layout to 1d
|
||||||
pencils in the *x*-dimension (center left). Again, cells of the same
|
pencils in the *x*-dimension (center left). Again, cells of the same
|
||||||
color are owned by the same processor. Each processor can then compute
|
color are owned by the same processor. Each processor can then compute
|
||||||
@ -161,8 +161,8 @@ grid/particle operations that LAMMPS supports:
|
|||||||
<partition>` calculation and then use the :doc:`verlet/split
|
<partition>` calculation and then use the :doc:`verlet/split
|
||||||
integrator <run_style>` to perform the PPPM computation on a
|
integrator <run_style>` to perform the PPPM computation on a
|
||||||
dedicated, separate partition of MPI processes. This uses an integer
|
dedicated, separate partition of MPI processes. This uses an integer
|
||||||
"1:*p*" mapping of *p* sub-domains of the atom decomposition to one
|
"1:*p*" mapping of *p* subdomains of the atom decomposition to one
|
||||||
sub-domain of the FFT grid decomposition and where pairwise non-bonded
|
subdomain of the FFT grid decomposition and where pairwise non-bonded
|
||||||
and bonded forces and energies are computed on the larger partition
|
and bonded forces and energies are computed on the larger partition
|
||||||
and the PPPM kspace computation concurrently on the smaller partition.
|
and the PPPM kspace computation concurrently on the smaller partition.
|
||||||
|
|
||||||
@ -172,7 +172,7 @@ grid/particle operations that LAMMPS supports:
|
|||||||
|
|
||||||
- LAMMPS implements a ``GridComm`` class which overlays the simulation
|
- LAMMPS implements a ``GridComm`` class which overlays the simulation
|
||||||
domain with a regular grid, partitions it across processors in a
|
domain with a regular grid, partitions it across processors in a
|
||||||
manner consistent with processor sub-domains, and provides methods for
|
manner consistent with processor subdomains, and provides methods for
|
||||||
forward and reverse communication of owned and ghost grid point
|
forward and reverse communication of owned and ghost grid point
|
||||||
values. It is used for PPPM as an FFT grid (as outlined above) and
|
values. It is used for PPPM as an FFT grid (as outlined above) and
|
||||||
also for the MSM algorithm which uses a cascade of grid sizes from
|
also for the MSM algorithm which uses a cascade of grid sizes from
|
||||||
|
|||||||
@ -22,7 +22,7 @@ last reneighboring; this and other options of the neighbor list rebuild
|
|||||||
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
can be adjusted with the :doc:`neigh_modify <neigh_modify>` command.
|
||||||
|
|
||||||
On steps when reneighboring is performed, atoms which have moved outside
|
On steps when reneighboring is performed, atoms which have moved outside
|
||||||
their owning processor's sub-domain are first migrated to new processors
|
their owning processor's subdomain are first migrated to new processors
|
||||||
via communication. Periodic boundary conditions are also (only)
|
via communication. Periodic boundary conditions are also (only)
|
||||||
enforced on these steps to ensure each atom is re-assigned to the
|
enforced on these steps to ensure each atom is re-assigned to the
|
||||||
correct processor. After migration, the atoms owned by each processor
|
correct processor. After migration, the atoms owned by each processor
|
||||||
@ -39,12 +39,12 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
|||||||
|
|
||||||
neighbor list stencils
|
neighbor list stencils
|
||||||
|
|
||||||
A 2d simulation sub-domain (thick black line) and the corresponding
|
A 2d simulation subdomain (thick black line) and the corresponding
|
||||||
ghost atom cutoff region (dashed blue line) for both orthogonal
|
ghost atom cutoff region (dashed blue line) for both orthogonal
|
||||||
(left) and triclinic (right) domains. A regular grid of neighbor
|
(left) and triclinic (right) domains. A regular grid of neighbor
|
||||||
bins (thin lines) overlays the entire simulation domain and need not
|
bins (thin lines) overlays the entire simulation domain and need not
|
||||||
align with sub-domain boundaries; only the portion overlapping the
|
align with subdomain boundaries; only the portion overlapping the
|
||||||
augmented sub-domain is shown. In the triclinic case it overlaps the
|
augmented subdomain is shown. In the triclinic case it overlaps the
|
||||||
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
bounding box of the tilted rectangle. The blue- and red-shaded bins
|
||||||
represent a stencil of bins searched to find neighbors of a particular
|
represent a stencil of bins searched to find neighbors of a particular
|
||||||
atom (black dot).
|
atom (black dot).
|
||||||
@ -52,8 +52,8 @@ its settings modified with the :doc:`atom_modify <atom_modify>` command.
|
|||||||
To build a local neighbor list in linear time, the simulation domain is
|
To build a local neighbor list in linear time, the simulation domain is
|
||||||
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
overlaid (conceptually) with a regular 3d (or 2d) grid of neighbor bins,
|
||||||
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
as shown in the :ref:`neighbor-stencil` figure for 2d models and a
|
||||||
single MPI processor's sub-domain. Each processor stores a set of
|
single MPI processor's subdomain. Each processor stores a set of
|
||||||
neighbor bins which overlap its sub-domain extended by the neighbor
|
neighbor bins which overlap its subdomain extended by the neighbor
|
||||||
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
cutoff distance :math:`R_n`. As illustrated, the bins need not align
|
||||||
with processor boundaries; an integer number in each dimension is fit to
|
with processor boundaries; an integer number in each dimension is fit to
|
||||||
the size of the entire simulation box.
|
the size of the entire simulation box.
|
||||||
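The binning step described in this paragraph can be sketched in one dimension: an integer number of bins is fit to the box, and each coordinate maps to a bin index in constant time, which is what makes the overall build linear in the number of atoms. The `Bins1d` struct, `fit_bins()`, `coord_to_bin()`, and the `target` width are assumptions for illustration and are not LAMMPS names or defaults.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1d binning helper; names are illustrative, not LAMMPS API.
struct Bins1d {
  double lo;        // lower boundary of the simulation box
  double binsize;   // actual bin width after fitting
  int nbins;        // integer number of bins spanning the box
};

// Fit an integer number of bins of roughly `target` width to the box [lo, hi].
Bins1d fit_bins(double lo, double hi, double target)
{
  assert(hi > lo && target > 0.0);
  int n = static_cast<int>((hi - lo) / target);
  if (n < 1) n = 1;
  return Bins1d{lo, (hi - lo) / n, n};
}

// Map a coordinate to its bin index. Indices outside [0, nbins) are allowed
// so that ghost atoms beyond the box edge still land in a well-defined bin.
int coord_to_bin(const Bins1d &b, double x)
{
  return static_cast<int>(std::floor((x - b.lo) / b.binsize));
}
```

Because the bins are fit to the whole box rather than to a subdomain, two processors compute the same bin index for the same coordinate, even though each stores only the bins overlapping its extended subdomain.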
@ -144,7 +144,7 @@ supports:
|
|||||||
|
|
||||||
- For small and sparse systems and as a fallback method, LAMMPS also
|
- For small and sparse systems and as a fallback method, LAMMPS also
|
||||||
supports neighbor list construction without binning by using a full
|
supports neighbor list construction without binning by using a full
|
||||||
:math:`O(N^2)` loop over all *i,j* atom pairs in a sub-domain when
|
:math:`O(N^2)` loop over all *i,j* atom pairs in a subdomain when
|
||||||
using the :doc:`neighbor nsq <neighbor>` command.
|
using the :doc:`neighbor nsq <neighbor>` command.
|
||||||
|
|
||||||
- Dependent on the "pair" setting of the :doc:`newton <newton>` command,
|
- Dependent on the "pair" setting of the :doc:`newton <newton>` command,
|
||||||
|
|||||||
@ -15,8 +15,8 @@ distributed-memory parallelism is set with the :doc:`comm_style command
|
|||||||
for MPI parallelization: "brick" on the left with an orthogonal
|
for MPI parallelization: "brick" on the left with an orthogonal
|
||||||
(left) and a triclinic (middle) simulation domain, and a "tiled"
|
(left) and a triclinic (middle) simulation domain, and a "tiled"
|
||||||
decomposition (right). The black lines show the division into
|
decomposition (right). The black lines show the division into
|
||||||
sub-domains and the contained atoms are "owned" by the corresponding
|
subdomains and the contained atoms are "owned" by the corresponding
|
||||||
MPI process. The green dashed lines indicate how sub-domains are
|
MPI process. The green dashed lines indicate how subdomains are
|
||||||
extended with "ghost" atoms up to the communication cutoff distance.
|
extended with "ghost" atoms up to the communication cutoff distance.
|
||||||
|
|
||||||
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
|
The LAMMPS simulation box is a 3d or 2d volume, which can be orthogonal
|
||||||
@ -32,14 +32,14 @@ means the position of the box face adjusts continuously to enclose all
|
|||||||
the atoms.
|
the atoms.
|
||||||
|
|
||||||
For distributed-memory MPI parallelism, the simulation box is spatially
|
For distributed-memory MPI parallelism, the simulation box is spatially
|
||||||
decomposed (partitioned) into non-overlapping sub-domains which fill the
|
decomposed (partitioned) into non-overlapping subdomains which fill the
|
||||||
box. The default partitioning, "brick", is most suitable when atom
|
box. The default partitioning, "brick", is most suitable when atom
|
||||||
density is roughly uniform, as shown in the left-side images of the
|
density is roughly uniform, as shown in the left-side images of the
|
||||||
:ref:`domain-decomposition` figure. The sub-domains comprise a regular
|
:ref:`domain-decomposition` figure. The subdomains comprise a regular
|
||||||
grid and all sub-domains are identical in size and shape. Both the
|
grid and all subdomains are identical in size and shape. Both the
|
||||||
orthogonal and triclinic boxes can deform continuously during a
|
orthogonal and triclinic boxes can deform continuously during a
|
||||||
simulation, e.g. to compress a solid or shear a liquid, in which case
|
simulation, e.g. to compress a solid or shear a liquid, in which case
|
||||||
the processor sub-domains likewise deform.
|
the processor subdomains likewise deform.
|
||||||
|
|
||||||
|
|
||||||
For models with non-uniform density, the number of particles per
|
For models with non-uniform density, the number of particles per
|
||||||
@ -50,14 +50,14 @@ load. For such models, LAMMPS supports multiple strategies to reduce
|
|||||||
the load imbalance:
|
the load imbalance:
|
||||||
|
|
||||||
- The processor grid decomposition is by default based on the simulation
|
- The processor grid decomposition is by default based on the simulation
|
||||||
cell volume and tries to optimize the volume to surface ratio for the sub-domains.
|
cell volume and tries to optimize the volume to surface ratio for the subdomains.
|
||||||
This can be changed with the :doc:`processors command <processors>`.
|
This can be changed with the :doc:`processors command <processors>`.
|
||||||
- The parallel planes defining the size of the sub-domains can be shifted
|
- The parallel planes defining the size of the subdomains can be shifted
|
||||||
with the :doc:`balance command <balance>`, which can be done in addition
|
with the :doc:`balance command <balance>`, which can be done in addition
|
||||||
to choosing a more optimal processor grid.
|
to choosing a more optimal processor grid.
|
||||||
- The recursive bisectioning algorithm in combination with the "tiled"
|
- The recursive bisectioning algorithm in combination with the "tiled"
|
||||||
communication style can produce a partitioning with equal numbers of
|
communication style can produce a partitioning with equal numbers of
|
||||||
particles in each sub-domain.
|
particles in each subdomain.
|
||||||
|
|
||||||
|
|
||||||
.. |decomp1| image:: img/decomp-regular.png
|
.. |decomp1| image:: img/decomp-regular.png
|
||||||
@ -76,14 +76,14 @@ the load imbalance:
|
|||||||
|
|
||||||
The pictures above demonstrate different decompositions for a 2d system
|
The pictures above demonstrate different decompositions for a 2d system
|
||||||
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
with 12 MPI ranks. The atom colors indicate the load imbalance of each
|
||||||
sub-domain with green being optimal and red the least optimal.
|
subdomain with green being optimal and red the least optimal.
|
||||||
|
|
||||||
Due to the vacuum in the system, the default decomposition is unbalanced
|
Due to the vacuum in the system, the default decomposition is unbalanced
|
||||||
with several MPI ranks without atoms (left). By forcing a 1x12x1
|
with several MPI ranks without atoms (left). By forcing a 1x12x1
|
||||||
processor grid, every MPI rank does computations now, but the number of
|
processor grid, every MPI rank does computations now, but the number of
|
||||||
atoms per sub-domain is still uneven and the thin slice shape increases
|
atoms per subdomain is still uneven and the thin slice shape increases
|
||||||
the amount of communication between sub-domains (center left). With a
|
the amount of communication between subdomains (center left). With a
|
||||||
2x6x1 processor grid and shifting the sub-domain divisions, the load
|
2x6x1 processor grid and shifting the subdomain divisions, the load
|
||||||
imbalance is further reduced and the amount of communication required
|
imbalance is further reduced and the amount of communication required
|
||||||
between sub-domains is less (center right). And using the recursive
|
between subdomains is less (center right). And using the recursive
|
||||||
bisectioning leads to further improved decomposition (right).
|
bisectioning leads to further improved decomposition (right).
|
||||||
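The idea behind recursive bisectioning, as described in the hunk above, can be illustrated with a minimal sketch. This is not the LAMMPS implementation: the function name ``recursive_bisection``, the choice to cut along the longer extent, and the plain Python list of 2d points are all assumptions made for illustration. It only shows why repeated cuts by particle count give every subdomain nearly the same number of particles.

```python
def recursive_bisection(points, nparts):
    """Split 2d points into nparts groups of nearly equal size.

    Hypothetical helper for illustration only; not the LAMMPS code.
    """
    if not points:
        # Nothing to distribute: every part stays empty.
        return [[] for _ in range(nparts)]
    if nparts == 1:
        return [points]
    # Assumption: cut along the dimension with the larger extent.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dim = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[dim])
    # Each side gets a share of points proportional to its share of ranks,
    # so an odd number of ranks (e.g. 3) is split 1:2 rather than 1:1.
    nleft = nparts // 2
    cut = len(ordered) * nleft // nparts
    return (recursive_bisection(ordered[:cut], nleft)
            + recursive_bisection(ordered[cut:], nparts - nleft))


# 120 unevenly placed points distributed over 12 hypothetical MPI ranks.
pts = [((i * 0.37) % 5.0, ((i * i) % 7) * 0.5) for i in range(120)]
parts = recursive_bisection(pts, 12)
```

Because each cut is placed by particle count rather than by volume, empty regions (such as the vacuum in the pictures) do not leave ranks without work.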
|
|||||||
@ -7,7 +7,7 @@ decomposition. The parallelization aims to be efficient, and resulting
|
|||||||
in good strong scaling (= good speedup for the same system) and good
|
in good strong scaling (= good speedup for the same system) and good
|
||||||
weak scaling (= the computational cost of enlarging the system is
|
weak scaling (= the computational cost of enlarging the system is
|
||||||
proportional to the system size). Additional parallelization using GPUs
|
proportional to the system size). Additional parallelization using GPUs
|
||||||
or OpenMP can also be applied within the sub-domain assigned to an MPI
|
or OpenMP can also be applied within the subdomain assigned to an MPI
|
||||||
process. For clarity, most of the following illustrations show the 2d
|
process. For clarity, most of the following illustrations show the 2d
|
||||||
simulation case. The underlying algorithms in those cases, however,
|
simulation case. The underlying algorithms in those cases, however,
|
||||||
apply to both 2d and 3d cases equally well.
|
apply to both 2d and 3d cases equally well.
|
||||||
|
|||||||
@ -647,7 +647,7 @@ Communication buffer coding with *ubuf*
|
|||||||
---------------------------------------
|
---------------------------------------
|
||||||
|
|
||||||
LAMMPS uses communication buffers where it collects data from various
|
LAMMPS uses communication buffers where it collects data from various
|
||||||
class instances and then exchanges the data with neighboring sub-domains.
|
class instances and then exchanges the data with neighboring subdomains.
|
||||||
For simplicity those buffers are defined as ``double`` buffers and
|
For simplicity those buffers are defined as ``double`` buffers and
|
||||||
used for doubles and integer numbers. This presents a unique problem
|
used for doubles and integer numbers. This presents a unique problem
|
||||||
when 64-bit integers are used. While the storage needed for a ``double``
|
when 64-bit integers are used. While the storage needed for a ``double``
|
||||||
|
|||||||
@ -5635,7 +5635,7 @@ Doc page with :doc:`WARNING messages <Errors_warnings>`
|
|||||||
Lost atoms are checked for each time thermo output is done. See the
|
Lost atoms are checked for each time thermo output is done. See the
|
||||||
thermo_modify lost command for options. Lost atoms usually indicate
|
thermo_modify lost command for options. Lost atoms usually indicate
|
||||||
bad dynamics, e.g. atoms have been blown far out of the simulation
|
bad dynamics, e.g. atoms have been blown far out of the simulation
|
||||||
box, or moved further than one processor's sub-domain away before
|
box, or moved further than one processor's subdomain away before
|
||||||
reneighboring.
|
reneighboring.
|
||||||
|
|
||||||
*MEAM library error %d*
|
*MEAM library error %d*
|
||||||
@ -6266,14 +6266,14 @@ keyword to allow for additional bonds to be formed
|
|||||||
One or more atoms are attempting to map their charge to a MSM grid point
|
One or more atoms are attempting to map their charge to a MSM grid point
|
||||||
that is not owned by a processor. This is likely for one of two
|
that is not owned by a processor. This is likely for one of two
|
||||||
reasons, both of them bad. First, it may mean that an atom near the
|
reasons, both of them bad. First, it may mean that an atom near the
|
||||||
boundary of a processor's sub-domain has moved more than 1/2 the
|
boundary of a processor's subdomain has moved more than 1/2 the
|
||||||
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
||||||
rebuilt and atoms being migrated to new processors. This also means
|
rebuilt and atoms being migrated to new processors. This also means
|
||||||
you may be missing pairwise interactions that need to be computed.
|
you may be missing pairwise interactions that need to be computed.
|
||||||
The solution is to change the re-neighboring criteria via the
|
The solution is to change the re-neighboring criteria via the
|
||||||
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
||||||
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
||||||
moved far outside a processor's sub-domain or even the entire
|
moved far outside a processor's subdomain or even the entire
|
||||||
simulation box. This indicates bad physics, e.g. due to highly
|
simulation box. This indicates bad physics, e.g. due to highly
|
||||||
overlapping atoms, too large a timestep, etc.
|
overlapping atoms, too large a timestep, etc.
|
||||||
|
|
||||||
@ -6281,14 +6281,14 @@ keyword to allow for additional bonds to be formed
|
|||||||
One or more atoms are attempting to map their charge to a PPPM grid
|
One or more atoms are attempting to map their charge to a PPPM grid
|
||||||
point that is not owned by a processor. This is likely for one of two
|
point that is not owned by a processor. This is likely for one of two
|
||||||
reasons, both of them bad. First, it may mean that an atom near the
|
reasons, both of them bad. First, it may mean that an atom near the
|
||||||
boundary of a processor's sub-domain has moved more than 1/2 the
|
boundary of a processor's subdomain has moved more than 1/2 the
|
||||||
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
||||||
rebuilt and atoms being migrated to new processors. This also means
|
rebuilt and atoms being migrated to new processors. This also means
|
||||||
you may be missing pairwise interactions that need to be computed.
|
you may be missing pairwise interactions that need to be computed.
|
||||||
The solution is to change the re-neighboring criteria via the
|
The solution is to change the re-neighboring criteria via the
|
||||||
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
||||||
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
||||||
moved far outside a processor's sub-domain or even the entire
|
moved far outside a processor's subdomain or even the entire
|
||||||
simulation box. This indicates bad physics, e.g. due to highly
|
simulation box. This indicates bad physics, e.g. due to highly
|
||||||
overlapping atoms, too large a timestep, etc.
|
overlapping atoms, too large a timestep, etc.
|
||||||
|
|
||||||
@ -6296,14 +6296,14 @@ keyword to allow for additional bonds to be formed
|
|||||||
One or more atoms are attempting to map their charge to a PPPM grid
|
One or more atoms are attempting to map their charge to a PPPM grid
|
||||||
point that is not owned by a processor. This is likely for one of two
|
point that is not owned by a processor. This is likely for one of two
|
||||||
reasons, both of them bad. First, it may mean that an atom near the
|
reasons, both of them bad. First, it may mean that an atom near the
|
||||||
boundary of a processor's sub-domain has moved more than 1/2 the
|
boundary of a processor's subdomain has moved more than 1/2 the
|
||||||
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
:doc:`neighbor skin distance <neighbor>` without neighbor lists being
|
||||||
rebuilt and atoms being migrated to new processors. This also means
|
rebuilt and atoms being migrated to new processors. This also means
|
||||||
you may be missing pairwise interactions that need to be computed.
|
you may be missing pairwise interactions that need to be computed.
|
||||||
The solution is to change the re-neighboring criteria via the
|
The solution is to change the re-neighboring criteria via the
|
||||||
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
:doc:`neigh_modify <neigh_modify>` command. The safest settings are
|
||||||
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
"delay 0 every 1 check yes". Second, it may mean that an atom has
|
||||||
moved far outside a processor's sub-domain or even the entire
|
moved far outside a processor's subdomain or even the entire
|
||||||
simulation box. This indicates bad physics, e.g. due to highly
|
simulation box. This indicates bad physics, e.g. due to highly
|
||||||
overlapping atoms, too large a timestep, etc.
|
overlapping atoms, too large a timestep, etc.
|
||||||
|
|
||||||
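The "more than 1/2 the neighbor skin distance" condition mentioned in the messages above can be written as a small check. This is a sketch under stated assumptions, not LAMMPS source code: ``needs_rebuild`` and its arguments are hypothetical names, and positions are plain coordinate tuples recorded at the last neighbor list build and at the current step.

```python
import math


def needs_rebuild(old_positions, new_positions, skin):
    """Return True if any atom moved more than half the skin distance.

    Hypothetical helper: old_positions are the coordinates stored at the
    last neighbor list build, new_positions the current coordinates.
    """
    limit = 0.5 * skin
    return any(math.dist(a, b) > limit
               for a, b in zip(old_positions, new_positions))


old = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
small_move = [(0.1, 0.0, 0.0), (1.0, 1.0, 1.0)]
large_move = [(0.2, 0.0, 0.0), (1.0, 1.0, 1.0)]
```

With a skin of 0.3 the limit is 0.15, so ``small_move`` does not trigger a rebuild while ``large_move`` does. If such a check runs too rarely, an atom can exceed the limit unnoticed, which is the first failure mode the messages describe.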
|
|||||||
@ -109,9 +109,9 @@ Doc page with :doc:`ERROR messages <Errors_messages>`
|
|||||||
*Communication cutoff is shorter than a bond length based estimate. This may lead to errors.*
|
*Communication cutoff is shorter than a bond length based estimate. This may lead to errors.*
|
||||||
Since LAMMPS stores topology data with individual atoms, all atoms
|
Since LAMMPS stores topology data with individual atoms, all atoms
|
||||||
comprising a bond, angle, dihedral or improper must be present on any
|
comprising a bond, angle, dihedral or improper must be present on any
|
||||||
sub-domain that "owns" the atom with the information, either as a
|
subdomain that "owns" the atom with the information, either as a
|
||||||
local or a ghost atom. The communication cutoff is what determines up
|
local or a ghost atom. The communication cutoff is what determines up
|
||||||
to what distance from a sub-domain boundary ghost atoms are created.
|
to what distance from a subdomain boundary ghost atoms are created.
|
||||||
The communication cutoff is by default the largest non-bonded cutoff
|
The communication cutoff is by default the largest non-bonded cutoff
|
||||||
plus the neighbor skin distance, but for short or non-bonded cutoffs
|
plus the neighbor skin distance, but for short or non-bonded cutoffs
|
||||||
and/or long bonds, this may not be sufficient. This warning indicates
|
and/or long bonds, this may not be sufficient. This warning indicates
|
||||||
@ -398,7 +398,7 @@ This will most likely cause errors in kinetic fluctuations.
|
|||||||
Lost atoms are checked for each time thermo output is done. See the
|
Lost atoms are checked for each time thermo output is done. See the
|
||||||
thermo_modify lost command for options. Lost atoms usually indicate
|
thermo_modify lost command for options. Lost atoms usually indicate
|
||||||
bad dynamics, e.g. atoms have been blown far out of the simulation
|
bad dynamics, e.g. atoms have been blown far out of the simulation
|
||||||
box, or moved further than one processor's sub-domain away before
|
box, or moved further than one processor's subdomain away before
|
||||||
reneighboring.
|
reneighboring.
|
||||||
|
|
||||||
*MSM mesh too small, increasing to 2 points in each direction*
|
*MSM mesh too small, increasing to 2 points in each direction*
|
||||||
@ -582,13 +582,13 @@ This will most likely cause errors in kinetic fluctuations.
|
|||||||
needed. The requested volume fraction may be too high, or other atoms
|
needed. The requested volume fraction may be too high, or other atoms
|
||||||
may be in the insertion region.
|
may be in the insertion region.
|
||||||
|
|
||||||
*Proc sub-domain size < neighbor skin, could lead to lost atoms*
|
*Proc subdomain size < neighbor skin, could lead to lost atoms*
|
||||||
The decomposition of the physical domain (likely due to load
|
The decomposition of the physical domain (likely due to load
|
||||||
balancing) has led to a processor's sub-domain being smaller than the
|
balancing) has led to a processor's subdomain being smaller than the
|
||||||
neighbor skin in one or more dimensions. Since reneighboring is
|
neighbor skin in one or more dimensions. Since reneighboring is
|
||||||
triggered by atoms moving the skin distance, this may lead to lost
|
triggered by atoms moving the skin distance, this may lead to lost
|
||||||
atoms, if an atom moves all the way across a neighboring processor's
|
atoms, if an atom moves all the way across a neighboring processor's
|
||||||
sub-domain before reneighboring is triggered.
|
subdomain before reneighboring is triggered.
|
||||||
|
|
||||||
*Reducing PPPM order b/c stencil extends beyond nearest neighbor processor*
|
*Reducing PPPM order b/c stencil extends beyond nearest neighbor processor*
|
||||||
This may lead to a larger grid than desired. See the kspace_modify overlap
|
This may lead to a larger grid than desired. See the kspace_modify overlap
|
||||||
|
|||||||
@ -11,7 +11,7 @@ more values (data).
|
|||||||
|
|
||||||
The grid cells and data they store are distributed across processors.
|
The grid cells and data they store are distributed across processors.
|
||||||
Each processor owns the grid cells (and data) whose center points lie
|
Each processor owns the grid cells (and data) whose center points lie
|
||||||
within the spatial sub-domain of the processor. If needed for its
|
within the spatial subdomain of the processor. If needed for its
|
||||||
computations, a processor may also store ghost grid cells with their
|
computations, a processor may also store ghost grid cells with their
|
||||||
data.
|
data.
|
||||||
|
|
||||||
@ -28,7 +28,7 @@ box size, as set by the :doc:`boundary <boundary>` command for fixed
|
|||||||
or shrink-wrapped boundaries.
|
or shrink-wrapped boundaries.
|
||||||
|
|
||||||
If load-balancing is invoked by the :doc:`balance <balance>` or
|
If load-balancing is invoked by the :doc:`balance <balance>` or
|
||||||
:doc:`fix balance <fix_balance>` commands, then the sub-domain owned
|
:doc:`fix balance <fix_balance>` commands, then the subdomain owned
|
||||||
by a processor can change which may also change which grid cells they
|
by a processor can change which may also change which grid cells they
|
||||||
own.
|
own.
|
||||||
|
|
||||||
|
|||||||
@ -59,7 +59,7 @@ of bond distances.
|
|||||||
A per-grid datum is one or more values per grid cell, for a grid which
|
A per-grid datum is one or more values per grid cell, for a grid which
|
||||||
overlays the simulation domain. The grid cells and the data they
|
overlays the simulation domain. The grid cells and the data they
|
||||||
store are distributed across processors; each processor owns the grid
|
store are distributed across processors; each processor owns the grid
|
||||||
cells whose center point falls within its sub-domain.
|
cells whose center point falls within its subdomain.
|
||||||
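The ownership rule stated above (a processor owns the grid cells whose center point falls within its subdomain) can be sketched for a 2d orthogonal box. The function ``owned_cells``, the tuple layout ``(xlo, xhi, ylo, yhi)``, and the half-open interval test are assumptions for illustration; they are not taken from the LAMMPS sources.

```python
def owned_cells(nx, ny, box, sub):
    """List the (ix, iy) cells of an nx-by-ny grid whose centers lie in sub.

    box and sub are (xlo, xhi, ylo, yhi) tuples. Hypothetical helper for
    illustration only.
    """
    xlo, xhi, ylo, yhi = box
    dx = (xhi - xlo) / nx
    dy = (yhi - ylo) / ny
    sxlo, sxhi, sylo, syhi = sub
    cells = []
    for ix in range(nx):
        cx = xlo + (ix + 0.5) * dx  # x coordinate of the cell center
        for iy in range(ny):
            cy = ylo + (iy + 0.5) * dy  # y coordinate of the cell center
            # Assumption: half-open intervals, so a center that sits exactly
            # on a shared boundary is owned by exactly one subdomain.
            if sxlo <= cx < sxhi and sylo <= cy < syhi:
                cells.append((ix, iy))
    return cells


box = (0.0, 10.0, 0.0, 10.0)
left = owned_cells(4, 4, box, (0.0, 5.0, 0.0, 10.0))
right = owned_cells(4, 4, box, (5.0, 10.0, 0.0, 10.0))
narrow = owned_cells(4, 4, box, (0.0, 3.0, 0.0, 10.0))
```

Since ownership depends only on the cell centers, moving a subdomain boundary (for example through load balancing) changes which cells a processor owns without changing the grid itself.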
|
|
||||||
.. _scalar:
|
.. _scalar:
|
||||||
|
|
||||||
@ -322,7 +322,7 @@ The chief difference between the :doc:`fix ave/grid <fix_ave_grid>`
|
|||||||
and :doc:`fix ave/chunk <fix_ave_chunk>` commands when used in this
|
and :doc:`fix ave/chunk <fix_ave_chunk>` commands when used in this
|
||||||
context is that the former uses a distributed grid, while the latter
|
context is that the former uses a distributed grid, while the latter
|
||||||
uses a global grid. Distributed means that each processor owns the
|
uses a global grid. Distributed means that each processor owns the
|
||||||
subset of grid cells within its sub-domain. Global means that each
|
subset of grid cells within its subdomain. Global means that each
|
||||||
processor owns a copy of the entire grid. The :doc:`fix ave/grid
|
processor owns a copy of the entire grid. The :doc:`fix ave/grid
|
||||||
<fix_ave_grid>` command is thus more efficient for large grids.
|
<fix_ave_grid>` command is thus more efficient for large grids.
|
||||||
|
|
||||||
|
|||||||
@ -783,19 +783,19 @@ Pitfalls
|
|||||||
**Parallel Scalability**
|
**Parallel Scalability**
|
||||||
|
|
||||||
LAMMPS operates in parallel in a :doc:`spatial-decomposition mode
|
LAMMPS operates in parallel in a :doc:`spatial-decomposition mode
|
||||||
<Developer_par_part>`, where each processor owns a spatial sub-domain of
|
<Developer_par_part>`, where each processor owns a spatial subdomain of
|
||||||
the overall simulation domain and communicates with its neighboring
|
the overall simulation domain and communicates with its neighboring
|
||||||
processors via distributed-memory message passing (MPI) to acquire ghost
|
processors via distributed-memory message passing (MPI) to acquire ghost
|
||||||
atom information to allow forces on the atoms it owns to be
|
atom information to allow forces on the atoms it owns to be
|
||||||
computed. LAMMPS also uses Verlet neighbor lists which are recomputed
|
computed. LAMMPS also uses Verlet neighbor lists which are recomputed
|
||||||
every few timesteps as particles move. On these timesteps, particles
|
every few timesteps as particles move. On these timesteps, particles
|
||||||
also migrate to new processors as needed. LAMMPS decomposes the overall
|
also migrate to new processors as needed. LAMMPS decomposes the overall
|
||||||
simulation domain so that spatial sub-domains of nearly equal volume are
|
simulation domain so that spatial subdomains of nearly equal volume are
|
||||||
assigned to each processor. When each sub-domain contains nearly the
|
assigned to each processor. When each subdomain contains nearly the
|
||||||
same number of particles, this results in a reasonable load balance
|
same number of particles, this results in a reasonable load balance
|
||||||
among all processors. As is more typical with some peridynamic
|
among all processors. As is more typical with some peridynamic
|
||||||
simulations, some sub-domains may contain many particles while other
|
simulations, some subdomains may contain many particles while other
|
||||||
sub-domains contain few particles, resulting in a load imbalance that
|
subdomains contain few particles, resulting in a load imbalance that
|
||||||
impacts parallel scalability.
|
impacts parallel scalability.
|
||||||
|
|
||||||
**Setting the "skin" distance**
|
**Setting the "skin" distance**
|
||||||
|
|||||||
@ -150,7 +150,7 @@ option with either of the commands.
|
|||||||
|
|
||||||
Note that if a simulation box has a large tilt factor, LAMMPS will run
|
Note that if a simulation box has a large tilt factor, LAMMPS will run
|
||||||
less efficiently, due to the large volume of communication needed to
|
less efficiently, due to the large volume of communication needed to
|
||||||
acquire ghost atoms around a processor's irregular-shaped sub-domain.
|
acquire ghost atoms around a processor's irregular-shaped subdomain.
|
||||||
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
||||||
error.
|
error.
|
||||||
|
|
||||||
|
|||||||
@ -38,11 +38,11 @@ to create digital object identifiers (DOI) for stable releases of the
|
|||||||
LAMMPS source code. There are two types of DOIs for the LAMMPS source code.
|
LAMMPS source code. There are two types of DOIs for the LAMMPS source code.
|
||||||
|
|
||||||
The canonical DOI for **all** versions of LAMMPS, which will always
|
The canonical DOI for **all** versions of LAMMPS, which will always
|
||||||
point to the **latest** stable release version is:
|
point to the **latest** stable release version, is:
|
||||||
|
|
||||||
- DOI: `10.5281/zenodo.3726416 <https://dx.doi.org/10.5281/zenodo.3726416>`_
|
- DOI: `10.5281/zenodo.3726416 <https://dx.doi.org/10.5281/zenodo.3726416>`_
|
||||||
|
|
||||||
In addition there are DOIs for individual stable releases. Currently there are:
|
In addition there are DOIs generated for individual stable releases:
|
||||||
|
|
||||||
- 3 March 2020 version: `DOI:10.5281/zenodo.3726417 <https://dx.doi.org/10.5281/zenodo.3726417>`_
|
- 3 March 2020 version: `DOI:10.5281/zenodo.3726417 <https://dx.doi.org/10.5281/zenodo.3726417>`_
|
||||||
- 29 October 2020 version: `DOI:10.5281/zenodo.4157471 <https://dx.doi.org/10.5281/zenodo.4157471>`_
|
- 29 October 2020 version: `DOI:10.5281/zenodo.4157471 <https://dx.doi.org/10.5281/zenodo.4157471>`_
|
||||||
@ -65,6 +65,6 @@ for optional features used in a specific run is printed to the screen
|
|||||||
and log file. Style and output location can be selected with the
|
and log file. Style and output location can be selected with the
|
||||||
:ref:`-cite command-line switch <cite>`. Additional references are
|
:ref:`-cite command-line switch <cite>`. Additional references are
|
||||||
given in the documentation of the :doc:`corresponding commands
|
given in the documentation of the :doc:`corresponding commands
|
||||||
<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. So please
|
<Commands_all>` or in the :doc:`Howto tutorials <Howto>`. Please make
|
||||||
make certain, that you provide the proper acknowledgments and citations
|
certain that you provide the proper acknowledgments and citations in
|
||||||
in any published works using LAMMPS.
|
any published works using LAMMPS.
|
||||||
|
|||||||
@ -27,7 +27,7 @@ General features
|
|||||||
* distributed memory message-passing parallelism (MPI)
|
* distributed memory message-passing parallelism (MPI)
|
||||||
* shared memory multi-threading parallelism (OpenMP)
|
* shared memory multi-threading parallelism (OpenMP)
|
||||||
* spatial decomposition of simulation domain for MPI parallelism
|
* spatial decomposition of simulation domain for MPI parallelism
|
||||||
* particle decomposition inside of spatial decomposition for OpenMP and GPU parallelism
|
* particle decomposition inside spatial decomposition for OpenMP and GPU parallelism
|
||||||
* GPLv2 licensed open-source distribution
|
* GPLv2 licensed open-source distribution
|
||||||
* highly portable C++-11
|
* highly portable C++-11
|
||||||
* modular code with most functionality in optional packages
|
* modular code with most functionality in optional packages
|
||||||
@ -113,7 +113,7 @@ Atom creation
|
|||||||
:doc:`create_atoms <create_atoms>`, :doc:`delete_atoms <delete_atoms>`,
|
:doc:`create_atoms <create_atoms>`, :doc:`delete_atoms <delete_atoms>`,
|
||||||
:doc:`displace_atoms <displace_atoms>`, :doc:`replicate <replicate>` commands)
|
:doc:`displace_atoms <displace_atoms>`, :doc:`replicate <replicate>` commands)
|
||||||
|
|
||||||
* read in atom coords from files
|
* read in atom coordinates from files
|
||||||
* create atoms on one or more lattices (e.g. grain boundaries)
|
* create atoms on one or more lattices (e.g. grain boundaries)
|
||||||
* delete geometric or logical groups of atoms (e.g. voids)
|
* delete geometric or logical groups of atoms (e.g. voids)
|
||||||
* replicate existing atoms multiple times
|
* replicate existing atoms multiple times
|
||||||
@ -173,11 +173,11 @@ Output
|
|||||||
(:doc:`dump <dump>`, :doc:`restart <restart>` commands)
|
(:doc:`dump <dump>`, :doc:`restart <restart>` commands)
|
||||||
|
|
||||||
* log file of thermodynamic info
|
* log file of thermodynamic info
|
||||||
* text dump files of atom coords, velocities, other per-atom quantities
|
* text dump files of atom coordinates, velocities, other per-atom quantities
|
||||||
* dump output on fixed and variable intervals, based on timestep or simulated time
|
* dump output on fixed and variable intervals, based on timestep or simulated time
|
||||||
* binary restart files
|
* binary restart files
|
||||||
* parallel I/O of dump and restart files
|
* parallel I/O of dump and restart files
|
||||||
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
|
* per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc.)
|
||||||
* user-defined system-wide (log file) or per-atom (dump file) calculations
|
* user-defined system-wide (log file) or per-atom (dump file) calculations
|
||||||
* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
|
* custom partitioning (chunks) for binning, and static or dynamic grouping of atoms for analysis
|
||||||
* spatial, time, and per-chunk averaging of per-atom quantities
|
* spatial, time, and per-chunk averaging of per-atom quantities
|
||||||
|
|||||||
@ -20,22 +20,23 @@ that either closely interface with LAMMPS or extend LAMMPS.
|
|||||||
|
|
||||||
Here are suggestions on how to perform these tasks:
|
Here are suggestions on how to perform these tasks:
|
||||||
|
|
||||||
* **GUI:** LAMMPS can be built as a library and a Python wrapper that wraps
|
* **GUI:** LAMMPS can be built as a library and a Python module that
|
||||||
the library interface is provided. Thus, GUI interfaces can be
|
wraps the library interface is provided. Thus, GUI interfaces can be
|
||||||
written in Python (or C or C++ if desired) that run LAMMPS and
|
written in Python or C/C++ that run LAMMPS and visualize or plot its
|
||||||
visualize or plot its output. Examples of this are provided in the
|
output. Examples of this are provided in the python directory and
|
||||||
python directory and described on the :doc:`Python <Python_head>` doc
|
described on the :doc:`Python <Python_head>` doc page. Also, there
|
||||||
page. Also, there are several external wrappers or GUI front ends.
|
are several external wrappers or GUI front ends.
|
||||||
* **Builder:** Several pre-processing tools are packaged with LAMMPS. Some
|
* **Builder:** Several pre-processing tools are packaged with LAMMPS.
|
||||||
of them convert input files in formats produced by other MD codes such
|
Some of them convert input files in formats produced by other MD codes
|
||||||
as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of them
|
such as CHARMM, AMBER, or Insight into LAMMPS input formats. Some of
|
||||||
are simple programs that will build simple molecular systems, such as
|
them are simple programs that will build simple molecular systems,
|
||||||
linear bead-spring polymer chains. The moltemplate program is a true
|
such as linear bead-spring polymer chains. The moltemplate program is
|
||||||
molecular builder that will generate complex molecular models. See
|
a true molecular builder that will generate complex molecular models.
|
||||||
the :doc:`Tools <Tools>` page for details on tools packaged with
|
See the :doc:`Tools <Tools>` page for details on tools packaged with
|
||||||
LAMMPS. The `Pre/post processing page <https:/www.lammps.org/prepost.html>`_ of the LAMMPS website
|
LAMMPS. The `Pre-/post-processing page
|
||||||
|
<https:/www.lammps.org/prepost.html>`_ of the LAMMPS homepage
|
||||||
describes a variety of third party tools for this task. Furthermore,
|
describes a variety of third party tools for this task. Furthermore,
|
||||||
some LAMMPS internal commands allow to reconstruct, or selectively add
|
some internal LAMMPS commands allow reconstructing, or selectively adding
|
||||||
topology information, as well as provide the option to insert molecule
|
topology information, as well as provide the option to insert molecule
|
||||||
templates instead of atoms for building bulk molecular systems.
|
templates instead of atoms for building bulk molecular systems.
|
||||||
* **Force-field assignment:** The conversion tools described in the previous
|
* **Force-field assignment:** The conversion tools described in the previous
|
||||||
@ -47,33 +48,34 @@ Here are suggestions on how to perform these tasks:
|
|||||||
powerful and flexible in converting force field and topology data
|
powerful and flexible in converting force field and topology data
|
||||||
between various MD simulation programs.
|
between various MD simulation programs.
|
||||||
* **Simulation analysis:** If you want to perform analysis on-the-fly as
|
* **Simulation analysis:** If you want to perform analysis on-the-fly as
|
||||||
your simulation runs, see the :doc:`compute <compute>` and
|
your simulation runs, see the :doc:`compute <compute>` and :doc:`fix
|
||||||
:doc:`fix <fix>` doc pages, which list commands that can be used in a
|
<fix>` doc pages, which list commands that can be used in a LAMMPS
|
||||||
LAMMPS input script. Also see the :doc:`Modify <Modify>` page for
|
input script. Also see the :doc:`Modify <Modify>` page for info on
|
||||||
info on how to add your own analysis code or algorithms to LAMMPS.
|
how to add your own analysis code or algorithms to LAMMPS. For
|
||||||
For post-processing, LAMMPS output such as :doc:`dump file snapshots <dump>` can be converted into formats used by other MD or
|
post-processing, LAMMPS output such as :doc:`dump file snapshots
|
||||||
|
<dump>` can be converted into formats used by other MD or
|
||||||
post-processing codes. To some degree, that conversion can be done
|
post-processing codes. To some degree, that conversion can be done
|
||||||
directly inside of LAMMPS by interfacing to the VMD molfile plugins.
|
directly inside LAMMPS by interfacing to the VMD molfile plugins. The
|
||||||
The :doc:`rerun <rerun>` command also allows to do some post-processing
|
:doc:`rerun <rerun>` command also allows post-processing of existing
|
||||||
of existing trajectories, and through being able to read a variety
|
trajectories, and through being able to read a variety of file
|
||||||
of file formats, this can also be used for analyzing trajectories
|
formats, this can also be used for analyzing trajectories from other
|
||||||
from other MD codes. Some post-processing tools packaged with
|
MD codes. Some post-processing tools packaged with LAMMPS will do
|
||||||
LAMMPS will do these conversions. Scripts provided in the
|
these conversions. Scripts provided in the tools/python directory can
|
||||||
tools/python directory can extract and massage data in dump files to
|
extract and massage data in dump files to make it easier to import
|
||||||
make it easier to import into other programs. See the
|
into other programs. See the :doc:`Tools <Tools>` page for details on
|
||||||
:doc:`Tools <Tools>` page for details on these various options.
|
these various options.
|
||||||
* **Visualization:** LAMMPS can produce NETPBM, JPG or PNG snapshot images
|
* **Visualization:** LAMMPS can produce NETPBM, JPG, or PNG format
|
||||||
on-the-fly via its :doc:`dump image <dump_image>` command and pass
|
snapshot images on-the-fly via its :doc:`dump image <dump_image>`
|
||||||
them to an external program, `FFmpeg <https://www.ffmpeg.org>`_ to generate
|
command and pass them to an external program, `FFmpeg
|
||||||
movies from them. For high-quality, interactive visualization there are
|
<https://www.ffmpeg.org>`_, to generate movies from them. For
|
||||||
many excellent and free tools available. See the
|
high-quality, interactive visualization, there are many excellent and
|
||||||
`Visualization Tools <https://www.lammps.org/viz.html>`_ page of the
|
free tools available. See the `Visualization Tools
|
||||||
LAMMPS website for
|
<https://www.lammps.org/viz.html>`_ page of the LAMMPS website for
|
||||||
visualization packages that can process LAMMPS output data.
|
visualization packages that can process LAMMPS output data.
|
||||||
* **Plotting:** See the next bullet about Pizza.py as well as the
|
* **Plotting:** See the next bullet about Pizza.py as well as the
|
||||||
:doc:`Python <Python_head>` page for examples of plotting LAMMPS
|
:doc:`Python <Python_head>` page for examples of plotting LAMMPS
|
||||||
output. Scripts provided with the *python* tool in the tools
|
output. Scripts provided with the *python* tool in the ``tools``
|
||||||
directory will extract and massage data in log and dump files to make
|
directory will extract and process data in log and dump files to make
|
||||||
it easier to analyze and plot. See the :doc:`Tools <Tools>` doc page
|
it easier to analyze and plot. See the :doc:`Tools <Tools>` doc page
|
||||||
for more discussion of the various tools.
|
for more discussion of the various tools.
|
||||||
* **Pizza.py:** Our group has also written a separate toolkit called
|
* **Pizza.py:** Our group has also written a separate toolkit called
|
||||||
|
|||||||
@ -1,20 +1,20 @@
|
|||||||
Overview of LAMMPS
|
Overview of LAMMPS
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
LAMMPS is a classical molecular dynamics (MD) code that models
|
LAMMPS is a classical molecular dynamics (MD) code that models ensembles
|
||||||
ensembles of particles in a liquid, solid, or gaseous state. It can
|
of particles in a liquid, solid, or gaseous state. It can model atomic,
|
||||||
model atomic, polymeric, biological, solid-state (metals, ceramics,
|
polymeric, biological, solid-state (metals, ceramics, oxides), granular,
|
||||||
oxides), granular, coarse-grained, or macroscopic systems using a
|
coarse-grained, or macroscopic systems using a variety of interatomic
|
||||||
variety of interatomic potentials (force fields) and boundary
|
potentials (force fields) and boundary conditions. It can model 2d or
|
||||||
conditions. It can model 2d or 3d systems with only a few particles
|
3d systems with sizes ranging from only a few particles up to billions.
|
||||||
up to millions or billions.
|
|
||||||
|
|
||||||
LAMMPS can be built and run on a laptop or desktop machine, but is
|
LAMMPS can be built and run on single laptop or desktop machines, but is
|
||||||
designed for parallel computers. It will run in serial and on any
|
designed for parallel computers. It will run in serial and on any
|
||||||
parallel machine that supports the `MPI <mpi_>`_ message-passing
|
parallel machine that supports the `MPI <mpi_>`_ message-passing
|
||||||
library. This includes shared-memory boxes and distributed-memory
|
library. This includes shared-memory multicore, multi-CPU servers and
|
||||||
clusters and supercomputers. Parts of LAMMPS also support
|
distributed-memory clusters and supercomputers. Parts of LAMMPS also
|
||||||
`OpenMP multi-threading <omp_>`_, vectorization and GPU acceleration.
|
support `OpenMP multi-threading <omp_>`_, vectorization, and GPU
|
||||||
|
acceleration.
|
||||||
|
|
||||||
.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
|
.. _mpi: https://en.wikipedia.org/wiki/Message_Passing_Interface
|
||||||
.. _lws: https://www.lammps.org
|
.. _lws: https://www.lammps.org
|
||||||
@ -42,11 +42,11 @@ LAMMPS uses neighbor lists to keep track of nearby particles. The lists
|
|||||||
are optimized for systems with particles that are repulsive at short
|
are optimized for systems with particles that are repulsive at short
|
||||||
distances, so that the local density of particles never becomes too
|
distances, so that the local density of particles never becomes too
|
||||||
large. This is in contrast to methods used for modeling plasma or
|
large. This is in contrast to methods used for modeling plasma or
|
||||||
gravitational bodies (e.g. galaxy formation).
|
gravitational bodies (like galaxy formation).
|
||||||
|
|
||||||
On parallel machines, LAMMPS uses spatial-decomposition techniques with
|
On parallel machines, LAMMPS uses spatial-decomposition techniques with
|
||||||
MPI parallelization to partition the simulation domain into sub-domains
|
MPI parallelization to partition the simulation domain into subdomains
|
||||||
of equal computational cost, one of which is assigned to each processor.
|
of equal computational cost, one of which is assigned to each processor.
|
||||||
Processors communicate and store "ghost" atom information for atoms that
|
Processors communicate and store "ghost" atom information for atoms that
|
||||||
border their sub-domain. Multi-threading parallelization and GPU
|
border their subdomain. Multi-threading parallelization and GPU
|
||||||
acceleration with with particle-decomposition can be used in addition.
|
acceleration with particle-decomposition can be used in addition.
|
||||||
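The owned/ghost distinction described above can be illustrated with a minimal one-dimensional sketch. This is not LAMMPS code: the function name ``classify_atoms``, the non-periodic 1d box, and the half-open subdomain interval ``[lo, hi)`` are assumptions made only for illustration.

```python
def classify_atoms(positions, lo, hi, cutoff):
    """Split 1d atom positions into owned and ghost atoms for one subdomain.

    Hypothetical helper for illustration only (not part of LAMMPS).
    Owned atoms lie inside [lo, hi); ghost atoms lie outside the
    subdomain but within `cutoff` of one of its two faces.
    """
    owned = [x for x in positions if lo <= x < hi]
    ghost = [
        x for x in positions
        if (lo - cutoff <= x < lo) or (hi <= x < hi + cutoff)
    ]
    return owned, ghost


# One processor owns the slab [2.0, 4.0) and uses a cutoff of 0.5.
positions = [0.5, 1.9, 2.1, 3.5, 4.2, 6.0]
owned, ghost = classify_atoms(positions, lo=2.0, hi=4.0, cutoff=0.5)
print("owned:", owned)
print("ghost:", ghost)
```

In a real parallel run the ghost copies would be obtained from the neighboring processors that own them; here all positions are simply visible in one list.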
|
|||||||
@ -30,17 +30,17 @@ can be created using CMake. CMake must be at least version 3.10.
|
|||||||
Operating systems
|
Operating systems
|
||||||
^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The primary development platform for LAMMPS is Linux. Thus the chances
|
The primary development platform for LAMMPS is Linux. Thus, the chances
|
||||||
for LAMMPS to compile without problems on Linux machines are the best.
|
for LAMMPS to compile without problems on Linux machines are the best.
|
||||||
Also compilation and correct execution on macOS and Windows (using
|
Also, compilation and correct execution on macOS and Windows (using
|
||||||
Microsoft Visual C++) is checked automatically for the largest part of the
|
Microsoft Visual C++) is checked automatically for the largest part of the
|
||||||
source code. Some (optional) features are not compatible with all
|
source code. Some (optional) features are not compatible with all
|
||||||
operating systems either through limitations of the source code or
|
operating systems, either through limitations of the corresponding
|
||||||
source code compatibility or the build system requirements of required
|
LAMMPS source code or through source code or build system
|
||||||
libraries.
|
incompatibilities of required libraries.
|
||||||
|
|
||||||
Executables for Windows may be created using either Cygwin or Visual
|
Executables for Windows may be created natively using either Cygwin or
|
||||||
Studio or a Linux to Windows MinGW cross-compiler.
|
Visual Studio or with a Linux to Windows MinGW cross-compiler.
|
||||||
|
|
||||||
Additionally, FreeBSD and Solaris have been tested successfully.
|
Additionally, FreeBSD and Solaris have been tested successfully.
|
||||||
|
|
||||||
@ -49,7 +49,7 @@ Compilers
|
|||||||
|
|
||||||
The most commonly used compilers are the GNU compilers, but also Clang
|
The most commonly used compilers are the GNU compilers, but also Clang
|
||||||
and the Intel compilers have been successfully used on Linux, macOS, and
|
and the Intel compilers have been successfully used on Linux, macOS, and
|
||||||
Windows. Also the Nvidia HPC SDK (formerly PGI compilers) will compile
|
Windows. Also, the Nvidia HPC SDK (formerly PGI compilers) will compile
|
||||||
LAMMPS (tested on Linux).
|
LAMMPS (tested on Linux).
|
||||||
|
|
||||||
CPU architectures
|
CPU architectures
|
||||||
@ -62,12 +62,14 @@ regularly tested.
|
|||||||
Portability compliance
|
Portability compliance
|
||||||
^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Not all of the LAMMPS source code is fully compliant to all of the above
|
Only a subset of the LAMMPS source code is fully compliant with all of the
|
||||||
mentioned standards. This is rather typical for projects like LAMMPS
|
above-mentioned standards. This is rather typical for projects like
|
||||||
that largely depend on contributions of features from the community.
|
LAMMPS that largely depend on contributions from the user community.
|
||||||
Not all contributors are trained as programmers and not all of them have
|
Not all contributors are trained as programmers and not all of them have
|
||||||
access to a variety of platforms. As part of the continuous integration
|
access to multiple platforms for testing. As part of the continuous
|
||||||
process, however, all contributions are automatically tested to compile,
|
integration process, however, all contributions are automatically tested
|
||||||
link, and pass some runtime tests on a selection of Linux flavors,
|
to compile, link, and pass some runtime tests on a selection of Linux
|
||||||
macOS, and Windows with different compilers. Other platforms may be
|
flavors, macOS, and Windows, and on Linux with different compilers.
|
||||||
checked occasionally or when portability bug are reported.
|
Thus, portability issues are often found before a pull request is merged.
|
||||||
|
Other platforms may be checked occasionally or when portability bugs are
|
||||||
|
reported.
|
||||||
|
|||||||
@ -30,7 +30,7 @@ course, changing values should be done with care. When accessing per-atom
|
|||||||
data, please note that these data are the per-processor **local** data and are
|
data, please note that these data are the per-processor **local** data and are
|
||||||
indexed accordingly. Per-atom data can change sizes and ordering at
|
indexed accordingly. Per-atom data can change sizes and ordering at
|
||||||
every neighbor list rebuild or atom sort event as atoms migrate between
|
every neighbor list rebuild or atom sort event as atoms migrate between
|
||||||
sub-domains and processors.
|
subdomains and processors.
|
||||||
|
|
||||||
.. code-block:: c
|
.. code-block:: c
|
||||||
|
|
||||||
|
|||||||
@ -5,16 +5,17 @@ LAMMPS Documentation (|version| version)
|
|||||||
LAMMPS stands for **L**\ arge-scale **A**\ tomic/**M**\ olecular
|
LAMMPS stands for **L**\ arge-scale **A**\ tomic/**M**\ olecular
|
||||||
**M**\ assively **P**\ arallel **S**\ imulator.
|
**M**\ assively **P**\ arallel **S**\ imulator.
|
||||||
|
|
||||||
LAMMPS is a classical molecular dynamics simulation code with a focus
|
LAMMPS is a classical molecular dynamics simulation code focusing on
|
||||||
on materials modeling. It was designed to run efficiently on parallel
|
materials modeling. It was designed to run efficiently on parallel
|
||||||
computers. It was developed originally at Sandia National
|
computers and to be easy to extend and modify. Originally developed at
|
||||||
Laboratories, a US Department of Energy facility. The majority of
|
Sandia National Laboratories, a US Department of Energy facility, LAMMPS
|
||||||
funding for LAMMPS has come from the US Department of Energy (DOE).
|
now includes contributions from many research groups and individuals
|
||||||
LAMMPS is an open-source code, distributed freely under the terms of
|
from many institutions. Most of the funding for LAMMPS has come from
|
||||||
the GNU Public License Version 2 (GPLv2).
|
the US Department of Energy (DOE). LAMMPS is open-source software
|
||||||
|
distributed under the terms of the GNU Public License Version 2 (GPLv2).
|
||||||
|
|
||||||
The `LAMMPS website <lws_>`_ has a variety of information about the
|
The `LAMMPS website <lws_>`_ has a variety of information about the
|
||||||
code. It includes links to an on-line version of this manual, an
|
code. It includes links to an online version of this manual, an
|
||||||
`online forum <https://www.lammps.org/forum.html>`_ where users can post
|
`online forum <https://www.lammps.org/forum.html>`_ where users can post
|
||||||
questions and discuss LAMMPS, and a `GitHub site
|
questions and discuss LAMMPS, and a `GitHub site
|
||||||
<https://github.com/lammps/lammps>`_ where all LAMMPS development is
|
<https://github.com/lammps/lammps>`_ where all LAMMPS development is
|
||||||
@ -26,14 +27,14 @@ The content for this manual is part of the LAMMPS distribution. The
|
|||||||
online version always corresponds to the latest feature release version.
|
online version always corresponds to the latest feature release version.
|
||||||
If needed, you can build a local copy of the manual as HTML pages or a
|
If needed, you can build a local copy of the manual as HTML pages or a
|
||||||
PDF file by following the steps on the :doc:`Build_manual` page. If you
|
PDF file by following the steps on the :doc:`Build_manual` page. If you
|
||||||
have difficulties viewing the pages please :ref:`see this note
|
have difficulties viewing the pages, please :ref:`see this note
|
||||||
<webbrowser>`.
|
<webbrowser>`.
|
||||||
|
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
The manual is organized in three parts:
|
The manual is organized into three parts:
|
||||||
|
|
||||||
1. the :ref:`User Guide <user_documentation>` with information about how
|
1. The :ref:`User Guide <user_documentation>` with information about how
|
||||||
to obtain, configure, compile, install, and use LAMMPS,
|
to obtain, configure, compile, install, and use LAMMPS,
|
||||||
2. the :ref:`Programmer Guide <programmer_documentation>` with
|
2. the :ref:`Programmer Guide <programmer_documentation>` with
|
||||||
information about how to use the LAMMPS library interface from
|
information about how to use the LAMMPS library interface from
|
||||||
@ -47,7 +48,7 @@ The manual is organized in three parts:
|
|||||||
|
|
||||||
.. only:: html
|
.. only:: html
|
||||||
|
|
||||||
Once you are familiar with LAMMPS, you may want to bookmark
|
After becoming familiar with LAMMPS, consider bookmarking
|
||||||
:doc:`this page <Commands_all>`, since it gives quick access to
|
:doc:`this page <Commands_all>`, since it gives quick access to
|
||||||
tables with links to the documentation for all LAMMPS commands.
|
tables with links to the documentation for all LAMMPS commands.
|
||||||
|
|
||||||
|
|||||||
@ -2,43 +2,44 @@ What does a LAMMPS version mean
|
|||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
The LAMMPS "version" is the date when it was released, such as 1 May
|
The LAMMPS "version" is the date when it was released, such as 1 May
|
||||||
2014. LAMMPS is updated continuously and we aim to keep it working
|
2014. LAMMPS is updated continuously, and we aim to keep it working
|
||||||
correctly and reliably at all times. You can follow its development
|
correctly and reliably at all times. You can follow its development
|
||||||
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
in a public `git repository on GitHub <https://github.com/lammps/lammps>`_.
|
||||||
|
|
||||||
Modifications of the LAMMPS source code - like bug fixes, code
|
Modifications of the LAMMPS source code (like bug fixes, code refactors,
|
||||||
refactors, updates to existing features, or addition of new features -
|
updates to existing features, or addition of new features) are organized
|
||||||
are organized into pull requests, and will be merged into the *develop*
|
into pull requests. Pull requests will be merged into the *develop*
|
||||||
branch of the git repository when they pass automated testing and code
|
branch of the git repository after they pass automated testing and code
|
||||||
review by the LAMMPS developers. When a sufficient number of changes
|
review by the LAMMPS developers. When a sufficient number of changes
|
||||||
have accumulated *and* the software passes a set of automated tests, we
|
have accumulated *and* the *develop* branch version passes an extended
|
||||||
release it as a *feature release* (or patch release), which are
|
set of automated tests, we release it as a *feature release* (or patch
|
||||||
currently made every 4-8 weeks. The *release* branch of the git
|
release), which are currently made every 4 to 8 weeks. The *release*
|
||||||
repository is updated with every such release. A summary of the most
|
branch of the git repository is updated with every such release. A
|
||||||
important changes of the patch releases is on `this website page
|
summary of the most important changes of the patch releases is on `this
|
||||||
<https://www.lammps.org/bug.html>`_. More detailed release notes are
|
website page <https://www.lammps.org/bug.html>`_. More detailed release
|
||||||
`available on GitHub <https://github.com/lammps/lammps/releases/>`_.
|
notes are `available on GitHub
|
||||||
|
<https://github.com/lammps/lammps/releases/>`_.
|
||||||
|
|
||||||
Once or twice a year, we have a "stabilization period" where we apply
|
Once or twice a year, we have a "stabilization period" where we apply
|
||||||
only bug fixes and small, non-intrusive changes to the *develop*
|
only bug fixes and small, non-intrusive changes to the *develop*
|
||||||
branch. At the same time the code is subjected to more detailed and
|
branch. At the same time, the code is subjected to more detailed and
|
||||||
thorough manual testing than the default automated testing. Also
|
thorough manual testing than the default automated testing. Also,
|
||||||
several variants of static code analysis are run to improve the overall
|
several variants of static code analysis are run to improve the overall
|
||||||
code quality, consistency, and compliance with programming standards,
|
code quality, consistency, and compliance with programming standards,
|
||||||
best practices and style conventions.
|
best practices and style conventions.
|
||||||
|
|
||||||
The latest patch release after such a period is then also labeled as a
|
The latest patch release after such a period is then also labeled as a
|
||||||
*stable* version and the *stable* branch is updated with it. Between
|
*stable* version and the *stable* branch is updated with it. Between
|
||||||
stable releases we occasionally release updates to the stable release
|
stable releases, we occasionally release updates to the stable release
|
||||||
containing only bug fixes and updates back-ported from the *develop*
|
containing only bug fixes and updates back-ported from the *develop*
|
||||||
branch and update the *stable* branch accordingly.
|
branch and update the *stable* branch accordingly.
|
||||||
|
|
||||||
Each version of LAMMPS contains all the documented features up to and
|
Each version of LAMMPS contains all the documented features up to and
|
||||||
including its version date. For recently added features we add markers
|
including its version date. For recently added features, we add markers
|
||||||
to the documentation noting in which LAMMPS version a feature or
|
to the documentation noting in which LAMMPS version a feature or
|
||||||
keyword was added or significantly changed.
|
keyword was added or significantly changed.
|
||||||
|
|
||||||
The version date is printed to the screen and logfile every time you run
|
The version date is printed to the screen and log file every time you run
|
||||||
LAMMPS. It is also in the file src/version.h and in the LAMMPS
|
LAMMPS. It is also in the file src/version.h and in the LAMMPS
|
||||||
directory name created when you unpack a tarball. And it is on the
|
directory name created when you unpack a tarball. And it is on the
|
||||||
first page of the :doc:`manual <Manual>`.
|
first page of the :doc:`manual <Manual>`.
|
||||||
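Because a LAMMPS version is a date, checking whether an installed version is recent enough for a documented feature reduces to a date comparison. The following is a minimal sketch, assuming the version string has the day, month name, year form of the "1 May 2014" example above; the helper names ``parse_version`` and ``has_feature`` are hypothetical and not part of LAMMPS.

```python
from datetime import date

# Month abbreviations are mapped by hand so the result does not
# depend on the locale settings of the machine.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_version(text):
    """Convert a version string such as '1 May 2014' into a date.

    Assumes whitespace-separated day, month name, and year.
    """
    day, month, year = text.split()
    return date(int(year), MONTHS[month[:3].capitalize()], int(day))


def has_feature(installed, added_in):
    """Return True if the installed version is at least as new as the
    version in which a feature was added (hypothetical helper)."""
    return parse_version(installed) >= parse_version(added_in)


print(parse_version("1 May 2014"))
print(has_feature("2 Aug 2023", "1 May 2014"))
```

The same comparison can be applied to the version markers in the documentation, provided they are written in the same date form.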
|
|||||||
@ -23,7 +23,7 @@ against invalid accesses.
|
|||||||
When accessing per-atom data,
|
When accessing per-atom data,
|
||||||
please note that this data is the per-processor local data and indexed
|
please note that this data is the per-processor local data and indexed
|
||||||
accordingly. These arrays can change sizes and order at every neighbor list
|
accordingly. These arrays can change sizes and order at every neighbor list
|
||||||
rebuild and atom sort event as atoms are migrating between sub-domains.
|
rebuild and atom sort event as atoms are migrating between subdomains.
|
||||||
|
|
||||||
.. tabs::
|
.. tabs::
|
||||||
|
|
||||||
|
|||||||
@ -23,7 +23,7 @@ against invalid accesses.
|
|||||||
When accessing per-atom data,
|
When accessing per-atom data,
|
||||||
please note that this data is the per-processor local data and indexed
|
please note that this data is the per-processor local data and indexed
|
||||||
accordingly. These arrays can change sizes and order at every neighbor list
|
accordingly. These arrays can change sizes and order at every neighbor list
|
||||||
rebuild and atom sort event as atoms are migrating between sub-domains.
|
rebuild and atom sort event as atoms are migrating between subdomains.
|
||||||
|
|
||||||
.. tabs::
|
.. tabs::
|
||||||
|
|
||||||
|
|||||||
@ -9,7 +9,7 @@ There are two thrusts to the discussion that follows. The first is
|
|||||||
using code options that implement alternate algorithms that can
|
using code options that implement alternate algorithms that can
|
||||||
speed up a simulation. The second is to use one of the several
|
speed up a simulation. The second is to use one of the several
|
||||||
accelerator packages provided with LAMMPS that contain code optimized
|
accelerator packages provided with LAMMPS that contain code optimized
|
||||||
for certain kinds of hardware, including multi-core CPUs, GPUs, and
|
for certain kinds of hardware, including multicore CPUs, GPUs, and
|
||||||
Intel Xeon Phi co-processors.
|
Intel Xeon Phi co-processors.
|
||||||
|
|
||||||
The `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS
|
The `Benchmark page <https://www.lammps.org/bench.html>`_ of the LAMMPS
|
||||||
|
|||||||
@ -11,7 +11,7 @@ parts of the :doc:`kspace_style pppm <kspace_style>` for long-range
|
|||||||
Coulombics. It has the following general features:
|
Coulombics. It has the following general features:
|
||||||
|
|
||||||
* It is designed to exploit common GPU hardware configurations where one
|
* It is designed to exploit common GPU hardware configurations where one
|
||||||
or more GPUs are coupled to many cores of one or more multi-core CPUs,
|
or more GPUs are coupled to many cores of one or more multicore CPUs,
|
||||||
e.g. within a node of a parallel machine.
|
e.g. within a node of a parallel machine.
|
||||||
* Atom-based data (e.g. coordinates, forces) are moved back-and-forth
|
* Atom-based data (e.g. coordinates, forces) are moved back-and-forth
|
||||||
between the CPU(s) and GPU every timestep.
|
between the CPU(s) and GPU every timestep.
|
||||||
@ -28,7 +28,7 @@ Coulombics. It has the following general features:
|
|||||||
* LAMMPS-specific code is in the GPU package. It makes calls to a
|
* LAMMPS-specific code is in the GPU package. It makes calls to a
|
||||||
generic GPU library in the lib/gpu directory. This library provides
|
generic GPU library in the lib/gpu directory. This library provides
|
||||||
either Nvidia support, AMD support, or more general OpenCL support
|
either Nvidia support, AMD support, or more general OpenCL support
|
||||||
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multi-core CPUs),
|
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs),
|
||||||
so that the same functionality is supported on a variety of hardware.
|
so that the same functionality is supported on a variety of hardware.
|
||||||
|
|
||||||
**Required hardware/software:**
|
**Required hardware/software:**
|
||||||
@ -146,7 +146,7 @@ GPUs/node to use, as well as other options.
|
|||||||
|
|
||||||
**Speed-ups to expect:**
|
**Speed-ups to expect:**
|
||||||
|
|
||||||
The performance of a GPU versus a multi-core CPU is a function of your
|
The performance of a GPU versus a multicore CPU is a function of your
|
||||||
hardware, which pair style is used, the number of atoms/GPU, and the
|
hardware, which pair style is used, the number of atoms/GPU, and the
|
||||||
precision used on the GPU (double, single, mixed). Using the GPU package
|
precision used on the GPU (double, single, mixed). Using the GPU package
|
||||||
in OpenCL mode on CPUs (which uses vectorization and multithreading) is
|
in OpenCL mode on CPUs (which uses vectorization and multithreading) is
|
||||||
@ -174,7 +174,7 @@ deterministic results.
|
|||||||
**Guidelines for best performance:**
|
**Guidelines for best performance:**
|
||||||
|
|
||||||
* Using multiple MPI tasks per GPU will often give the best performance,
|
* Using multiple MPI tasks per GPU will often give the best performance,
|
||||||
as allowed by most multi-core CPU/GPU configurations.
|
as allowed by most multicore CPU/GPU configurations.
|
||||||
* If the number of particles per MPI task is small (e.g. 100s of
|
* If the number of particles per MPI task is small (e.g. 100s of
|
||||||
particles), it can be more efficient to run with fewer MPI tasks per
|
particles), it can be more efficient to run with fewer MPI tasks per
|
||||||
GPU, even if you do not use all the cores on the compute node.
|
GPU, even if you do not use all the cores on the compute node.
|
||||||
|
|||||||
@ -79,7 +79,7 @@ manner via the ``mpirun`` or ``mpiexec`` commands, and is independent of
|
|||||||
Kokkos. E.g. the mpirun command in OpenMPI does this via its ``-np`` and
|
Kokkos. E.g. the mpirun command in OpenMPI does this via its ``-np`` and
|
||||||
``-npernode`` switches. Ditto for MPICH via ``-np`` and ``-ppn``.
|
``-npernode`` switches. Ditto for MPICH via ``-np`` and ``-ppn``.
|
||||||
|
|
||||||
Running on a multi-core CPU
|
Running on a multicore CPU
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Here is a quick overview of how to use the KOKKOS package
|
Here is a quick overview of how to use the KOKKOS package
|
||||||
@ -254,7 +254,7 @@ is recommended in this scenario.
|
|||||||
|
|
||||||
Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
|
Using a GPU-aware MPI library is highly recommended. GPU-aware MPI use can be
|
||||||
avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
|
avoided by using :doc:`-pk kokkos gpu/aware off <package>`. As above for
|
||||||
multi-core CPUs (and no GPU), if N is the number of physical cores/node,
|
multicore CPUs (and no GPU), if N is the number of physical cores/node,
|
||||||
then the number of MPI tasks/node should not exceed N.
|
then the number of MPI tasks/node should not exceed N.
|
||||||
|
|
||||||
.. parsed-literal::
|
.. parsed-literal::
|
||||||
|
|||||||
@ -12,7 +12,7 @@ Required hardware/software
|
|||||||
""""""""""""""""""""""""""
|
""""""""""""""""""""""""""
|
||||||
|
|
||||||
To enable multi-threading, your compiler must support the OpenMP interface.
|
To enable multi-threading, your compiler must support the OpenMP interface.
|
||||||
You should have one or more multi-core CPUs, as multiple threads can only be
|
You should have one or more multicore CPUs, as multiple threads can only be
|
||||||
launched by each MPI task on the local node (using shared memory).
|
launched by each MPI task on the local node (using shared memory).
|
||||||
|
|
||||||
Building LAMMPS with the OPENMP package
|
Building LAMMPS with the OPENMP package
|
||||||
@ -157,7 +157,7 @@ Additional performance tips are as follows:
|
|||||||
affinity setting that restricts each MPI task to a single CPU core.
|
affinity setting that restricts each MPI task to a single CPU core.
|
||||||
Using multi-threading in this mode will force all threads to share the
|
Using multi-threading in this mode will force all threads to share the
|
||||||
one core and thus is likely to be counterproductive. Instead, binding
|
one core and thus is likely to be counterproductive. Instead, binding
|
||||||
MPI tasks to a (multi-core) socket should solve this issue.
|
MPI tasks to a (multicore) socket should solve this issue.
|
||||||
|
|
||||||
Restrictions
|
Restrictions
|
||||||
""""""""""""
|
""""""""""""
|
||||||
|
|||||||
@ -113,7 +113,7 @@ your input script. LAMMPS does not use the group until a simulation
|
|||||||
is run.
|
is run.
|
||||||
|
|
||||||
The *sort* keyword turns on a spatial sorting or reordering of atoms
|
The *sort* keyword turns on a spatial sorting or reordering of atoms
|
||||||
within each processor's sub-domain every *Nfreq* timesteps. If
|
within each processor's subdomain every *Nfreq* timesteps. If
|
||||||
*Nfreq* is set to 0, then sorting is turned off. Sorting can improve
|
*Nfreq* is set to 0, then sorting is turned off. Sorting can improve
|
||||||
cache performance and thus speed up a LAMMPS simulation, as discussed
|
cache performance and thus speed up a LAMMPS simulation, as discussed
|
||||||
in a paper by :ref:`(Meloni) <Meloni>`. Its efficacy depends on the problem
|
in a paper by :ref:`(Meloni) <Meloni>`. Its efficacy depends on the problem
|
||||||
|
|||||||
@ -54,7 +54,7 @@ Syntax
|
|||||||
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
|
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
|
||||||
name = atom property name (without d\_ prefix)
|
name = atom property name (without d\_ prefix)
|
||||||
*out* arg = filename
|
*out* arg = filename
|
||||||
filename = write each processor's sub-domain to a file
|
filename = write each processor's subdomain to a file
|
||||||
|
|
||||||
Examples
|
Examples
|
||||||
""""""""
|
""""""""
|
||||||
@ -72,14 +72,14 @@ Examples
|
|||||||
Description
|
Description
|
||||||
"""""""""""
|
"""""""""""
|
||||||
|
|
||||||
This command adjusts the size and shape of processor sub-domains
|
This command adjusts the size and shape of processor subdomains
|
||||||
within the simulation box, to attempt to balance the number of atoms
|
within the simulation box, to attempt to balance the number of atoms
|
||||||
or particles and thus indirectly the computational cost (load) more
|
or particles and thus indirectly the computational cost (load) more
|
||||||
evenly across processors. The load balancing is "static" in the sense
|
evenly across processors. The load balancing is "static" in the sense
|
||||||
that this command performs the balancing once, before or between
|
that this command performs the balancing once, before or between
|
||||||
simulations. The processor sub-domains will then remain static during
|
simulations. The processor subdomains will then remain static during
|
||||||
the subsequent run. To perform "dynamic" balancing, see the :doc:`fix
|
the subsequent run. To perform "dynamic" balancing, see the :doc:`fix
|
||||||
balance <fix_balance>` command, which can adjust processor sub-domain
|
balance <fix_balance>` command, which can adjust processor subdomain
|
||||||
sizes and shapes on-the-fly during a :doc:`run <run>`.
|
sizes and shapes on-the-fly during a :doc:`run <run>`.
|
||||||
|
|
||||||
Load-balancing is typically most useful if the particles in the
|
Load-balancing is typically most useful if the particles in the
|
||||||
@ -90,7 +90,7 @@ an irregular-shaped geometry containing void regions, or :doc:`hybrid
|
|||||||
pair style simulations <pair_hybrid>` which combine pair styles with
|
pair style simulations <pair_hybrid>` which combine pair styles with
|
||||||
different computational cost. In these cases, the LAMMPS default of
|
different computational cost. In these cases, the LAMMPS default of
|
||||||
dividing the simulation box volume into a regular-spaced grid of 3d
|
dividing the simulation box volume into a regular-spaced grid of 3d
|
||||||
bricks, with one equal-volume sub-domain per processor, may assign
|
bricks, with one equal-volume subdomain per processor, may assign
|
||||||
numbers of particles per processor in a way that the computational
|
numbers of particles per processor in a way that the computational
|
||||||
effort varies significantly. This can lead to poor performance when
|
effort varies significantly. This can lead to poor performance when
|
||||||
the simulation is run in parallel.
|
the simulation is run in parallel.
|
||||||
@ -109,7 +109,7 @@ Specifically, for a Px by Py by Pz grid of processors, it allows
|
|||||||
choice of Px, Py, and Pz, subject to the constraint that Px \* Py \*
|
choice of Px, Py, and Pz, subject to the constraint that Px \* Py \*
|
||||||
Pz = P, the total number of processors. This is sufficient to achieve
|
Pz = P, the total number of processors. This is sufficient to achieve
|
||||||
good load-balance for some problems on some processor counts.
|
good load-balance for some problems on some processor counts.
|
||||||
However, all the processor sub-domains will still have the same shape
|
However, all the processor subdomains will still have the same shape
|
||||||
and same volume.
|
and same volume.
|
||||||
|
|
||||||
The requested load-balancing operation is only performed if the
|
The requested load-balancing operation is only performed if the
|
||||||
@ -162,7 +162,7 @@ fractions of the box length) are also printed.
|
|||||||
simulation could run up to 20% faster if it were perfectly balanced,
|
simulation could run up to 20% faster if it were perfectly balanced,
|
||||||
versus when imbalanced. However, computational cost is not strictly
|
versus when imbalanced. However, computational cost is not strictly
|
||||||
proportional to particle count, and changing the relative size and
|
proportional to particle count, and changing the relative size and
|
||||||
shape of processor sub-domains may lead to additional computational
|
shape of processor subdomains may lead to additional computational
|
||||||
and communication overheads, e.g. in the PPPM solver used via the
|
and communication overheads, e.g. in the PPPM solver used via the
|
||||||
:doc:`kspace_style <kspace_style>` command. Thus you should benchmark
|
:doc:`kspace_style <kspace_style>` command. Thus you should benchmark
|
||||||
the run times of a simulation before and after balancing.
|
the run times of a simulation before and after balancing.
|
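To make the 20% figure in this hunk concrete, here is a minimal sketch. It assumes the imbalance factor is the largest per-processor particle count divided by the average count; that definition comes from the wider balance documentation and is not shown in this excerpt. The function name and the particle counts are hypothetical.

```python
def imbalance_factor(particles_per_proc):
    """Largest per-processor particle count divided by the average count.

    Assumed definition; a perfectly balanced system gives 1.0.
    """
    average = sum(particles_per_proc) / len(particles_per_proc)
    return max(particles_per_proc) / average


# Hypothetical particle counts on 4 processors.
counts = [120, 100, 90, 90]
factor = imbalance_factor(counts)

# The busiest processor holds 20% more particles than the average, which is
# where the "up to 20% faster if perfectly balanced" estimate comes from.
print(f"imbalance factor = {factor:.2f}")
```

As the surrounding text says, this is only an upper bound: cost is not strictly proportional to particle count, so the actual gain should be measured by benchmarking.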
||||||
@ -177,7 +177,7 @@ The *x*, *y*, *z*, and *shift* styles are "grid" methods which
|
|||||||
produce a logical 3d grid of processors. They operate by changing the
|
produce a logical 3d grid of processors. They operate by changing the
|
||||||
cutting planes (or lines) between processors in 3d (or 2d), to adjust
|
cutting planes (or lines) between processors in 3d (or 2d), to adjust
|
||||||
the volume (area in 2d) assigned to each processor, as in the
|
the volume (area in 2d) assigned to each processor, as in the
|
||||||
following 2d diagram where processor sub-domains are shown and
|
following 2d diagram where processor subdomains are shown and
|
||||||
particles are colored by the processor that owns them.
|
particles are colored by the processor that owns them.
|
||||||
|
|
||||||
.. |balance1| image:: img/balance_uniform.jpg
|
.. |balance1| image:: img/balance_uniform.jpg
|
||||||
@ -226,7 +226,7 @@ The *x*, *y*, and *z* styles invoke a "grid" method for balancing, as
|
|||||||
described above. Note that any or all of these 3 styles can be
|
described above. Note that any or all of these 3 styles can be
|
||||||
specified together, one after the other, but they cannot be used with
|
specified together, one after the other, but they cannot be used with
|
||||||
any other style. This style adjusts the position of cutting planes
|
any other style. This style adjusts the position of cutting planes
|
||||||
between processor sub-domains in specific dimensions. Only the
|
between processor subdomains in specific dimensions. Only the
|
||||||
specified dimensions are altered.
|
specified dimensions are altered.
|
||||||
|
|
||||||
The *uniform* argument spaces the planes evenly, as in the left
|
The *uniform* argument spaces the planes evenly, as in the left
|
||||||
@ -245,8 +245,8 @@ the cutting place. The left (or lower) edge of the box is 0.0, and
|
|||||||
the right (or upper) edge is 1.0. Neither of these values is
|
the right (or upper) edge is 1.0. Neither of these values is
|
||||||
specified. Only the interior Ps-1 positions are specified. Thus if
|
specified. Only the interior Ps-1 positions are specified. Thus if
|
||||||
there are 2 processors in the x dimension, you specify a single value
|
there are 2 processors in the x dimension, you specify a single value
|
||||||
such as 0.75, which would make the left processor's sub-domain 3x
|
such as 0.75, which would make the left processor's subdomain 3x
|
||||||
larger than the right processor's sub-domain.
|
larger than the right processor's subdomain.
|
||||||
|
|
||||||
----------
|
----------
|
||||||
|
|
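The 0.75 example in this hunk can be checked with a short sketch. It assumes, as the text describes, that cut positions are fractions of the box length with implicit edges at 0.0 and 1.0. The helper name `subdomain_fractions` is hypothetical and is not part of LAMMPS.

```python
def subdomain_fractions(cuts):
    """Return each subdomain's width as a fraction of the box length.

    cuts: the interior cutting-plane positions (Ps-1 values for Ps
    processors in that dimension), strictly increasing and inside (0, 1).
    """
    edges = [0.0, *cuts, 1.0]  # the box edges 0.0 and 1.0 are implicit
    pairs = list(zip(edges, edges[1:]))
    if any(hi <= lo for lo, hi in pairs):
        raise ValueError("cut positions must be strictly increasing within (0, 1)")
    return [hi - lo for lo, hi in pairs]


# Two processors in x with a single cut at 0.75.
widths = subdomain_fractions([0.75])
ratio = widths[0] / widths[1]
print(widths, ratio)
```

With a single cut at 0.75 the widths are 0.75 and 0.25, so the left subdomain is three times the size of the right one.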
||||||
@ -288,10 +288,10 @@ adjacent planes are closer together than the neighbor skin distance
|
|||||||
(as specified by the :doc:`neigh_modify <neigh_modify>` command), then
|
(as specified by the :doc:`neigh_modify <neigh_modify>` command), then
|
||||||
the plane positions are shifted to separate them by at least this
|
the plane positions are shifted to separate them by at least this
|
||||||
amount. This is to prevent particles being lost when dynamics are run
|
amount. This is to prevent particles being lost when dynamics are run
|
||||||
with processor sub-domains that are too narrow in one or more
|
with processor subdomains that are too narrow in one or more
|
||||||
dimensions.
|
dimensions.
|
||||||
|
|
||||||
Once the re-balancing is complete and final processor sub-domains
|
Once the re-balancing is complete and final processor subdomains
|
||||||
assigned, particles are migrated to their new owning processor, and
|
assigned, particles are migrated to their new owning processor, and
|
||||||
the balance procedure ends.
|
the balance procedure ends.
|
||||||
|
|
||||||
@ -299,7 +299,7 @@ the balance procedure ends.
|
|||||||
|
|
||||||
At each re-balance operation, the bisectioning for each cutting
|
At each re-balance operation, the bisectioning for each cutting
|
||||||
plane (line in 2d) typically starts with low and high bounds separated
|
plane (line in 2d) typically starts with low and high bounds separated
|
||||||
by the extent of a processor's sub-domain in one dimension. The size
|
by the extent of a processor's subdomain in one dimension. The size
|
||||||
of this bracketing region shrinks by 1/2 every iteration. Thus if
|
of this bracketing region shrinks by 1/2 every iteration. Thus if
|
||||||
*Niter* is specified as 10, the cutting plane will typically be
|
*Niter* is specified as 10, the cutting plane will typically be
|
||||||
positioned to 1 part in 1000 accuracy (relative to the perfect target
|
positioned to 1 part in 1000 accuracy (relative to the perfect target
|
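The "1 part in 1000" estimate in this hunk follows from halving the bracketing region on every iteration. A minimal sketch of that arithmetic, using hypothetical helper names and assuming the bracket shrinks by exactly 1/2 each time:

```python
def bracket_fraction(niter):
    """Remaining bracket width after niter halvings, relative to the start."""
    return 0.5 ** niter


def iterations_for(tolerance):
    """Smallest number of halvings that brings the bracket to <= tolerance."""
    n = 0
    while bracket_fraction(n) > tolerance:
        n += 1
    return n


# Ten halvings leave 1/1024 of the original bracket, roughly 1 part in 1000.
print(bracket_fraction(10))
print(iterations_for(1e-3))
```

The same arithmetic shows that each extra factor of 1000 in accuracy costs about ten more iterations.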
||||||
@ -494,7 +494,7 @@ different kinds of custom atom vectors or arrays as arguments.
|
|||||||
|
|
||||||
The *out* keyword writes a text file to the specified *filename* with
|
The *out* keyword writes a text file to the specified *filename* with
|
||||||
the results of the balancing operation. The file contains the bounds
|
the results of the balancing operation. The file contains the bounds
|
||||||
of the sub-domain for each processor after the balancing operation
|
of the subdomain for each processor after the balancing operation
|
||||||
completes. The format of the file is compatible with the
|
completes. The format of the file is compatible with the
|
||||||
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
|
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
|
||||||
visualizing mesh files. An example is shown here for a balancing by 4
|
visualizing mesh files. An example is shown here for a balancing by 4
|
||||||
@ -538,7 +538,7 @@ processors for a 2d problem:
|
|||||||
4 1 13 14 15 16
|
4 1 13 14 15 16
|
||||||
|
|
||||||
The coordinates of all the vertices are listed in the NODES section, 5
|
The coordinates of all the vertices are listed in the NODES section, 5
|
||||||
per processor. Note that the 4 sub-domains share vertices, so there
|
per processor. Note that the 4 subdomains share vertices, so there
|
||||||
will be duplicate nodes in the list.
|
will be duplicate nodes in the list.
|
||||||
|
|
||||||
The "SQUARES" section lists the node IDs of the 4 vertices in a
|
The "SQUARES" section lists the node IDs of the 4 vertices in a
|
||||||
|
|||||||
@ -61,7 +61,7 @@ move. Note that when the difference between the current box dimensions
|
|||||||
and the shrink-wrap box dimensions is large, this can lead to lost
|
and the shrink-wrap box dimensions is large, this can lead to lost
|
||||||
atoms at the beginning of a run when running in parallel. This is due
|
atoms at the beginning of a run when running in parallel. This is due
|
||||||
to the large change in the (global) box dimensions also causing
|
to the large change in the (global) box dimensions also causing
|
||||||
significant changes in the individual sub-domain sizes. If these
|
significant changes in the individual subdomain sizes. If these
|
||||||
changes are farther than the communication cutoff, atoms will be lost.
|
changes are farther than the communication cutoff, atoms will be lost.
|
||||||
This is best addressed by setting initial box dimensions to match the
|
This is best addressed by setting initial box dimensions to match the
|
||||||
shrink-wrapped dimensions more closely, by using *m* style boundaries
|
shrink-wrapped dimensions more closely, by using *m* style boundaries
|
||||||
|
|||||||
@ -62,7 +62,7 @@ distances are used to determine which atoms to communicate.
|
|||||||
|
|
||||||
The default mode is *single* which means each processor acquires
|
The default mode is *single* which means each processor acquires
|
||||||
information for ghost atoms that are within a single distance from its
|
information for ghost atoms that are within a single distance from its
|
||||||
sub-domain. The distance is by default the maximum of the neighbor
|
subdomain. The distance is by default the maximum of the neighbor
|
||||||
cutoff across all atom type pairs.
|
cutoff across all atom type pairs.
|
||||||
|
|
||||||
For many systems this is an efficient algorithm, but for systems with
|
For many systems this is an efficient algorithm, but for systems with
|
||||||
@ -81,7 +81,7 @@ with both the *multi* and *multi/old* neighbor styles.
|
|||||||
|
|
||||||
The *cutoff* keyword allows you to extend the ghost cutoff distance
|
The *cutoff* keyword allows you to extend the ghost cutoff distance
|
||||||
for communication mode *single*, which is the distance from the borders
|
for communication mode *single*, which is the distance from the borders
|
||||||
of a processor's sub-domain at which ghost atoms are acquired from other
|
of a processor's subdomain at which ghost atoms are acquired from other
|
||||||
processors. By default the ghost cutoff = neighbor cutoff = pairwise
|
processors. By default the ghost cutoff = neighbor cutoff = pairwise
|
||||||
force cutoff + neighbor skin. See the :doc:`neighbor <neighbor>` command
|
force cutoff + neighbor skin. See the :doc:`neighbor <neighbor>` command
|
||||||
for more information about the skin distance. If the specified Rcut is
|
for more information about the skin distance. If the specified Rcut is
|
||||||
|
|||||||
@ -54,7 +54,7 @@ per atom, e.g. a list of bond distances. Per-grid quantities are
|
|||||||
calculated on a regular 2d or 3d grid which overlays a 2d or 3d
|
calculated on a regular 2d or 3d grid which overlays a 2d or 3d
|
||||||
simulation domain. The grid points and the data they store are
|
simulation domain. The grid points and the data they store are
|
||||||
distributed across processors; each processor owns the grid points
|
distributed across processors; each processor owns the grid points
|
||||||
which fall within its sub-domain.
|
which fall within its subdomain.
|
||||||
|
|
||||||
Computes that produce per-atom quantities have the word "atom" at the
|
Computes that produce per-atom quantities have the word "atom" at the
|
||||||
end of their style, e.g. *ke/atom*\ . Computes that produce local
|
end of their style, e.g. *ke/atom*\ . Computes that produce local
|
||||||
|
|||||||
@ -48,9 +48,9 @@ the virial, equal to :math:`-dU/dV`, computed for all pairwise as well
|
|||||||
as 2-body, 3-body, 4-body, many-body, and long-range interactions, where
|
as 2-body, 3-body, 4-body, many-body, and long-range interactions, where
|
||||||
:math:`\vec r_i` and :math:`\vec f_i` are the position and force vector
|
:math:`\vec r_i` and :math:`\vec f_i` are the position and force vector
|
||||||
of atom *i*, and the dot indicates the dot product (scalar product).
|
of atom *i*, and the dot indicates the dot product (scalar product).
|
||||||
This is computed in parallel for each sub-domain and then summed over
|
This is computed in parallel for each subdomain and then summed over
|
||||||
all parallel processes. Thus :math:`N'` necessarily includes atoms from
|
all parallel processes. Thus :math:`N'` necessarily includes atoms from
|
||||||
neighboring sub-domains (so-called ghost atoms) and the position and
|
neighboring subdomains (so-called ghost atoms) and the position and
|
||||||
force vectors of ghost atoms are thus included in the summation. Only
|
force vectors of ghost atoms are thus included in the summation. Only
|
||||||
when running in serial and without periodic boundary conditions is
|
when running in serial and without periodic boundary conditions is
|
||||||
:math:`N' = N` the number of atoms in the system. :doc:`Fixes <fix>`
|
:math:`N' = N` the number of atoms in the system. :doc:`Fixes <fix>`
|
||||||
|
|||||||
@ -39,7 +39,7 @@ Description
|
|||||||
Define a computation that stores the specified attributes of a
|
Define a computation that stores the specified attributes of a
|
||||||
distributed grid. In LAMMPS, distributed grids are regular 2d or 3d
|
distributed grid. In LAMMPS, distributed grids are regular 2d or 3d
|
||||||
grids which overlay a 2d or 3d simulation domain. Each processor owns
|
grids which overlay a 2d or 3d simulation domain. Each processor owns
|
||||||
the grid cells whose center points lie within its sub-domain. See the
|
the grid cells whose center points lie within its subdomain. See the
|
||||||
:doc:`Howto grid <Howto_grid>` doc page for details of how distributed
|
:doc:`Howto grid <Howto_grid>` doc page for details of how distributed
|
||||||
grids can be defined by various commands and referenced.
|
grids can be defined by various commands and referenced.
|
||||||
|
|
||||||
|
|||||||
@ -259,7 +259,7 @@ layout in the global array.
|
|||||||
Compute *sna/grid/local* calculates bispectrum components of a regular
|
Compute *sna/grid/local* calculates bispectrum components of a regular
|
||||||
grid of points similarly to compute *sna/grid* described above.
|
grid of points similarly to compute *sna/grid* described above.
|
||||||
However, because the array is local, it contains only rows for grid points
|
However, because the array is local, it contains only rows for grid points
|
||||||
that are local to the processor sub-domain. The global grid
|
that are local to the processor subdomain. The global grid
|
||||||
of :math:`nx \times ny \times nz` points is still laid out in space the same as for *sna/grid*,
|
of :math:`nx \times ny \times nz` points is still laid out in space the same as for *sna/grid*,
|
||||||
but grid points are strictly partitioned, so that every grid point appears in
|
but grid points are strictly partitioned, so that every grid point appears in
|
||||||
one and only one local array. The array contains one row for each of the
|
one and only one local array. The array contains one row for each of the
|
||||||
|
|||||||
@ -80,9 +80,9 @@ Syntax
|
|||||||
axes = *yes* or *no* = do or do not draw xyz axes lines next to simulation box
|
axes = *yes* or *no* = do or do not draw xyz axes lines next to simulation box
|
||||||
length = length of axes lines as fraction of respective box lengths
|
length = length of axes lines as fraction of respective box lengths
|
||||||
diam = diameter of axes lines as fraction of shortest box length
|
diam = diameter of axes lines as fraction of shortest box length
|
||||||
*subbox* values = lines diam = draw outline of processor sub-domains
|
*subbox* values = lines diam = draw outline of processor subdomains
|
||||||
lines = *yes* or *no* = do or do not draw sub-domain lines
|
lines = *yes* or *no* = do or do not draw subdomain lines
|
||||||
diam = diameter of sub-domain lines as fraction of shortest box length
|
diam = diameter of subdomain lines as fraction of shortest box length
|
||||||
*shiny* value = sfactor = shinyness of spheres and cylinders
|
*shiny* value = sfactor = shinyness of spheres and cylinders
|
||||||
sfactor = shinyness of spheres and cylinders from 0.0 to 1.0
|
sfactor = shinyness of spheres and cylinders from 0.0 to 1.0
|
||||||
*ssao* value = shading seed dfactor = SSAO depth shading
|
*ssao* value = shading seed dfactor = SSAO depth shading
|
||||||
@ -145,7 +145,7 @@ Syntax
|
|||||||
*bitrate* arg = rate
|
*bitrate* arg = rate
|
||||||
rate = target bitrate for movie in kbps
|
rate = target bitrate for movie in kbps
|
||||||
*boxcolor* arg = color
|
*boxcolor* arg = color
|
||||||
color = name of color for simulation box lines and processor sub-domain lines
|
color = name of color for simulation box lines and processor subdomain lines
|
||||||
*color* args = name R G B
|
*color* args = name R G B
|
||||||
name = name of color
|
name = name of color
|
||||||
R,G,B = red/green/blue numeric values from 0.0 to 1.0
|
R,G,B = red/green/blue numeric values from 0.0 to 1.0
|
||||||
@ -581,13 +581,13 @@ respective box lengths. The *diam* setting determines their thickness
|
|||||||
as a fraction of the shortest box length in x,y,z (for 3d) or x,y (for
|
as a fraction of the shortest box length in x,y,z (for 3d) or x,y (for
|
||||||
2d).
|
2d).
|
||||||
|
|
||||||
The *subbox* keyword determines if and how processor sub-domain
|
The *subbox* keyword determines if and how processor subdomain
|
||||||
boundaries are rendered as thin cylinders in the image. If *no* is
|
boundaries are rendered as thin cylinders in the image. If *no* is
|
||||||
set (default), then the sub-domain boundaries are not drawn and the
|
set (default), then the subdomain boundaries are not drawn and the
|
||||||
*diam* setting is ignored. If *yes* is set, the 12 edges of each
|
*diam* setting is ignored. If *yes* is set, the 12 edges of each
|
||||||
processor sub-domain are drawn, with a diameter that is a fraction of
|
processor subdomain are drawn, with a diameter that is a fraction of
|
||||||
the shortest box length in x,y,z (for 3d) or x,y (for 2d). The color
|
the shortest box length in x,y,z (for 3d) or x,y (for 2d). The color
|
||||||
of the sub-domain boundaries can be set with the "dump_modify
|
of the subdomain boundaries can be set with the "dump_modify
|
||||||
boxcolor" command.
|
boxcolor" command.
|
||||||
|
|
||||||
----------
|
----------
|
||||||
@ -921,8 +921,8 @@ formats.
|
|||||||
|
|
||||||
The *boxcolor* keyword sets the color of the simulation box drawn
|
The *boxcolor* keyword sets the color of the simulation box drawn
|
||||||
around the atoms in each image as well as the color of processor
|
around the atoms in each image as well as the color of processor
|
||||||
sub-domain boundaries. See the "dump image box" command for how to
|
subdomain boundaries. See the "dump image box" command for how to
|
||||||
specify that a box be drawn via the *box* keyword, and the sub-domain
|
specify that a box be drawn via the *box* keyword, and the subdomain
|
||||||
boundaries via the *subbox* keyword. The color name can be any of the
|
boundaries via the *subbox* keyword. The color name can be any of the
|
||||||
140 pre-defined colors (see below) or a color name defined by the
|
140 pre-defined colors (see below) or a color name defined by the
|
||||||
dump_modify color option.
|
dump_modify color option.
|
||||||
|
|||||||
@ -89,7 +89,7 @@ owns, but there may be zero or more per atoms. Per-grid quantities
|
|||||||
are calculated on a regular 2d or 3d grid which overlays a 2d or 3d
|
are calculated on a regular 2d or 3d grid which overlays a 2d or 3d
|
||||||
simulation domain. The grid points and the data they store are
|
simulation domain. The grid points and the data they store are
|
||||||
distributed across processors; each processor owns the grid points
|
distributed across processors; each processor owns the grid points
|
||||||
which fall within its sub-domain.
|
which fall within its subdomain.
|
||||||
|
|
||||||
Note that a single fix typically produces either global or per-atom or
|
Note that a single fix typically produces either global or per-atom or
|
||||||
local or per-grid values (or none at all). It does not produce both
|
local or per-grid values (or none at all). It does not produce both
|
||||||
|
|||||||
@ -84,7 +84,7 @@ produced by other computes or fixes. This fix operates in either
|
|||||||
per-grid inputs in the same command.
|
per-grid inputs in the same command.
|
||||||
|
|
||||||
The grid created by this command is distributed; each processor owns
|
The grid created by this command is distributed; each processor owns
|
||||||
the grid points that are within its sub-domain. This is similar to
|
the grid points that are within its subdomain. This is similar to
|
||||||
the :doc:`fix ave/chunk <fix_ave_chunk>` command when it uses chunks
|
the :doc:`fix ave/chunk <fix_ave_chunk>` command when it uses chunks
|
||||||
from the :doc:`compute chunk/atom <compute_chunk_atom>` command which
|
from the :doc:`compute chunk/atom <compute_chunk_atom>` command which
|
||||||
are 2d or 3d regular bins. However, the per-bin outputs in that case
|
are 2d or 3d regular bins. However, the per-bin outputs in that case
|
||||||
|
|||||||
@ -44,7 +44,7 @@ Syntax
|
|||||||
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
|
*store* name = store weight in custom atom property defined by :doc:`fix property/atom <fix_property_atom>` command
|
||||||
name = atom property name (without d\_ prefix)
|
name = atom property name (without d\_ prefix)
|
||||||
*out* arg = filename
|
*out* arg = filename
|
||||||
filename = write each processor's sub-domain to a file, at each re-balancing
|
filename = write each processor's subdomain to a file, at each re-balancing
|
||||||
|
|
||||||
Examples
|
Examples
|
||||||
""""""""
|
""""""""
|
||||||
@ -61,7 +61,7 @@ Examples
|
|||||||
Description
|
Description
|
||||||
"""""""""""
|
"""""""""""
|
||||||
|
|
||||||
This command adjusts the size and shape of processor sub-domains
|
This command adjusts the size and shape of processor subdomains
|
||||||
within the simulation box, to attempt to balance the number of
|
within the simulation box, to attempt to balance the number of
|
||||||
particles and thus the computational cost (load) evenly across
|
particles and thus the computational cost (load) evenly across
|
||||||
processors. The load balancing is "dynamic" in the sense that
|
processors. The load balancing is "dynamic" in the sense that
|
||||||
@ -77,7 +77,7 @@ an irregular-shaped geometry containing void regions, or
|
|||||||
:doc:`hybrid pair style simulations <pair_hybrid>` that combine
|
:doc:`hybrid pair style simulations <pair_hybrid>` that combine
|
||||||
pair styles with different computational cost). In these cases, the
|
pair styles with different computational cost). In these cases, the
|
||||||
LAMMPS default of dividing the simulation box volume into a
|
LAMMPS default of dividing the simulation box volume into a
|
||||||
regular-spaced grid of 3d bricks, with one equal-volume sub-domain
|
regular-spaced grid of 3d bricks, with one equal-volume subdomain
|
||||||
per processor, may assign numbers of particles per processor in a
|
per processor, may assign numbers of particles per processor in a
|
||||||
way that the computational effort varies significantly. This can
|
way that the computational effort varies significantly. This can
|
||||||
lead to poor performance when the simulation is run in parallel.
|
lead to poor performance when the simulation is run in parallel.
|
||||||
@ -105,7 +105,7 @@ a :math:`P_x \times P_y \times P_z` grid of processors, it allows choices of
|
|||||||
:math:`P_x P_y P_z = P`, the total number of processors.
|
:math:`P_x P_y P_z = P`, the total number of processors.
|
||||||
This is sufficient to achieve good load-balance for
|
This is sufficient to achieve good load-balance for
|
||||||
some problems on some processor counts. However, all the processor
|
some problems on some processor counts. However, all the processor
|
||||||
sub-domains will still have the same shape and the same volume.
|
subdomains will still have the same shape and the same volume.
|
||||||
|
|
||||||
On a particular time step, a load-balancing operation is only performed
|
On a particular time step, a load-balancing operation is only performed
|
||||||
if the current "imbalance factor" in particles owned by each processor
|
if the current "imbalance factor" in particles owned by each processor
|
||||||
@ -141,7 +141,7 @@ forced even if the current balance is perfect (1.0) be specifying a
|
|||||||
simulation could run up to 20% faster if it were perfectly balanced,
|
simulation could run up to 20% faster if it were perfectly balanced,
|
||||||
versus when imbalanced. However, computational cost is not strictly
|
versus when imbalanced. However, computational cost is not strictly
|
||||||
proportional to particle count, and changing the relative size and
|
proportional to particle count, and changing the relative size and
|
||||||
shape of processor sub-domains may lead to additional computational
|
shape of processor subdomains may lead to additional computational
|
||||||
and communication overheads (e.g., in the PPPM solver used via the
|
and communication overheads (e.g., in the PPPM solver used via the
|
||||||
:doc:`kspace_style <kspace_style>` command). Thus, you should benchmark
|
:doc:`kspace_style <kspace_style>` command). Thus, you should benchmark
|
||||||
the run times of a simulation before and after balancing.
|
the run times of a simulation before and after balancing.
|
||||||
@ -156,7 +156,7 @@ The *shift* style is a "grid" method which produces a logical 3d grid
|
|||||||
of processors. It operates by changing the cutting planes (or lines)
|
of processors. It operates by changing the cutting planes (or lines)
|
||||||
between processors in 3d (or 2d), to adjust the volume (area in 2d)
|
between processors in 3d (or 2d), to adjust the volume (area in 2d)
|
||||||
assigned to each processor, as in the following 2d diagram where
|
assigned to each processor, as in the following 2d diagram where
|
||||||
processor sub-domains are shown and atoms are colored by the processor
|
processor subdomains are shown and atoms are colored by the processor
|
||||||
that owns them.
|
that owns them.
|
||||||
|
|
||||||
.. |balance1| image:: img/balance_uniform.jpg
|
.. |balance1| image:: img/balance_uniform.jpg
|
||||||
@ -258,7 +258,7 @@ from balanced, and converge more slowly. In this case you probably
|
|||||||
want to use the :doc:`balance <balance>` command before starting a run,
|
want to use the :doc:`balance <balance>` command before starting a run,
|
||||||
so that you begin the run with a balanced system.
|
so that you begin the run with a balanced system.
|
||||||
|
|
||||||
Once the re-balancing is complete and final processor sub-domains
|
Once the re-balancing is complete and final processor subdomains
|
||||||
assigned, particles migrate to their new owning processor as part of
|
assigned, particles migrate to their new owning processor as part of
|
||||||
the normal reneighboring procedure.
|
the normal reneighboring procedure.
|
||||||
|
|
||||||
@ -266,7 +266,7 @@ the normal reneighboring procedure.
|
|||||||
|
|
||||||
At each re-balance operation, the bisectioning for each cutting
|
At each re-balance operation, the bisectioning for each cutting
|
||||||
plane (line in 2d) typically starts with low and high bounds separated
|
plane (line in 2d) typically starts with low and high bounds separated
|
||||||
by the extent of a processor's sub-domain in one dimension. The size
|
by the extent of a processor's subdomain in one dimension. The size
|
||||||
of this bracketing region shrinks based on the local density, as
|
of this bracketing region shrinks based on the local density, as
|
||||||
described above, which should typically be 1/2 or more every
|
described above, which should typically be 1/2 or more every
|
||||||
iteration. Thus if :math:`N_\text{iter}` is specified as 10, the cutting
|
iteration. Thus if :math:`N_\text{iter}` is specified as 10, the cutting
|
||||||
@ -310,7 +310,7 @@ in that sub-box.
|
|||||||
|
|
||||||
The *out* keyword writes text to the specified *filename* with the
|
The *out* keyword writes text to the specified *filename* with the
|
||||||
results of each re-balancing operation. The file contains the bounds
|
results of each re-balancing operation. The file contains the bounds
|
||||||
of the sub-domain for each processor after the balancing operation
|
of the subdomain for each processor after the balancing operation
|
||||||
completes. The format of the file is compatible with the
|
completes. The format of the file is compatible with the
|
||||||
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
|
`Pizza.py <pizza_>`_ *mdump* tool which has support for manipulating and
|
||||||
visualizing mesh files. An example is shown here for a balancing by four
|
visualizing mesh files. An example is shown here for a balancing by four
|
||||||
@ -354,7 +354,7 @@ processors for a 2d problem:
|
|||||||
4 1 13 14 15 16
|
4 1 13 14 15 16
|
||||||
|
|
||||||
The coordinates of all the vertices are listed in the NODES section, five
|
The coordinates of all the vertices are listed in the NODES section, five
|
||||||
per processor. Note that the four sub-domains share vertices, so there
|
per processor. Note that the four subdomains share vertices, so there
|
||||||
will be duplicate nodes in the list.
|
will be duplicate nodes in the list.
|
||||||
|
|
||||||
The "SQUARES" section lists the node IDs of the four vertices in a
|
The "SQUARES" section lists the node IDs of the four vertices in a
|
||||||
|
|||||||
@ -118,7 +118,7 @@ displaced by the same amount, different on each iteration.
|
|||||||
all. Also note that if the box shape tilts to an extreme shape,
|
all. Also note that if the box shape tilts to an extreme shape,
|
||||||
LAMMPS will run less efficiently, due to the large volume of
|
LAMMPS will run less efficiently, due to the large volume of
|
||||||
communication needed to acquire ghost atoms around a processor's
|
communication needed to acquire ghost atoms around a processor's
|
||||||
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
|
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
|
||||||
also lose atoms and generate an error.
|
also lose atoms and generate an error.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|||||||
@ -546,7 +546,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
|
|||||||
you apply large deformations, this means the box shape can tilt
|
you apply large deformations, this means the box shape can tilt
|
||||||
dramatically LAMMPS will run less efficiently, due to the large volume
|
dramatically LAMMPS will run less efficiently, due to the large volume
|
||||||
of communication needed to acquire ghost atoms around a processor's
|
of communication needed to acquire ghost atoms around a processor's
|
||||||
irregular-shaped sub-domain. For extreme values of tilt, LAMMPS may
|
irregular-shaped subdomain. For extreme values of tilt, LAMMPS may
|
||||||
also lose atoms and generate an error.
|
also lose atoms and generate an error.
|
||||||
|
|
||||||
The *units* keyword determines the meaning of the distance units used
|
The *units* keyword determines the meaning of the distance units used
|
||||||
|
|||||||
@ -198,7 +198,7 @@ dt}{\rho dx^2}` is approximately equal to 1.
|
|||||||
and a simulation domain size. This fix uses the same subdivision of
|
and a simulation domain size. This fix uses the same subdivision of
|
||||||
the simulation domain among processors as the main LAMMPS program. In
|
the simulation domain among processors as the main LAMMPS program. In
|
||||||
order to uniformly cover the simulation domain with lattice sites, the
|
order to uniformly cover the simulation domain with lattice sites, the
|
||||||
lengths of the individual LAMMPS sub-domains must all be evenly
|
lengths of the individual LAMMPS subdomains must all be evenly
|
||||||
divisible by :math:`dx_{LB}`. If the simulation domain size is cubic,
|
divisible by :math:`dx_{LB}`. If the simulation domain size is cubic,
|
||||||
with equal lengths in all dimensions, and the default value for
|
with equal lengths in all dimensions, and the default value for
|
||||||
:math:`dx_{LB}` is used, this will automatically be satisfied.
|
:math:`dx_{LB}` is used, this will automatically be satisfied.
|
||||||
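The divisibility requirement in this hunk can be checked before a run. The sketch below is only an illustration: it assumes a uniform brick decomposition in which each subdomain length is the box length divided by the processor count in that dimension. The function names and the tolerance are hypothetical and are not part of LAMMPS.

```python
def divides_evenly(length, dx, tol=1e-9):
    """Return True if `length` is an integer multiple of `dx`, within a relative tolerance."""
    n = round(length / dx)
    return n > 0 and abs(n * dx - length) <= tol * max(abs(length), 1.0)


def subdomains_fit_lattice(box, grid, dx):
    """Check every subdomain length (box length / processor count) against dx.

    box  -- (lx, ly, lz) simulation box lengths (assumed orthogonal)
    grid -- (px, py, pz) processors per dimension (assumed uniform brick layout)
    dx   -- lattice spacing, playing the role of dx_LB in the text
    """
    return all(divides_evenly(l / p, dx) for l, p in zip(box, grid))
```

For example, a 20 x 20 x 20 box on a 2 x 2 x 1 grid with a spacing of 1.0 passes, while the same box on a 3 x 1 x 1 grid does not, because 20/3 is not a multiple of 1.0.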
|
|||||||
@ -371,7 +371,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
|
|||||||
applied stress induces large deformations (e.g. in a liquid), this
|
applied stress induces large deformations (e.g. in a liquid), this
|
||||||
means the box shape can tilt dramatically and LAMMPS will run less
|
means the box shape can tilt dramatically and LAMMPS will run less
|
||||||
efficiently, due to the large volume of communication needed to
|
efficiently, due to the large volume of communication needed to
|
||||||
acquire ghost atoms around a processor's irregular-shaped sub-domain.
|
acquire ghost atoms around a processor's irregular-shaped subdomain.
|
||||||
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
||||||
error.
|
error.
|
||||||
|
|
||||||
|
|||||||
@ -311,7 +311,7 @@ flipping the box when it is exceeded. If the *flip* value is set to
|
|||||||
applied stress induces large deformations (e.g. in a liquid), this
|
applied stress induces large deformations (e.g. in a liquid), this
|
||||||
means the box shape can tilt dramatically and LAMMPS will run less
|
means the box shape can tilt dramatically and LAMMPS will run less
|
||||||
efficiently, due to the large volume of communication needed to
|
efficiently, due to the large volume of communication needed to
|
||||||
acquire ghost atoms around a processor's irregular-shaped sub-domain.
|
acquire ghost atoms around a processor's irregular-shaped subdomain.
|
||||||
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
For extreme values of tilt, LAMMPS may also lose atoms and generate an
|
||||||
error.
|
error.
|
||||||
|
|
||||||
|
|||||||
@ -69,7 +69,7 @@ geometries.
|
|||||||
This fix must be used with an additional fix that specifies time
|
This fix must be used with an additional fix that specifies time
|
||||||
integration, e.g. :doc:`fix nve <fix_nve>` or :doc:`fix nph <fix_nh>`.
|
integration, e.g. :doc:`fix nve <fix_nve>` or :doc:`fix nph <fix_nh>`.
|
||||||
|
|
||||||
The Shardlow splitting algorithm requires the sizes of the sub-domain
|
The Shardlow splitting algorithm requires the sizes of the subdomain
|
||||||
lengths to be larger than twice the cutoff+skin. Generally, the
|
lengths to be larger than twice the cutoff+skin. Generally, the
|
||||||
domain decomposition is dependent on the number of processors
|
domain decomposition is dependent on the number of processors
|
||||||
requested.
|
requested.
|
||||||
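The size constraint stated in this hunk depends on the processor count, so it can be useful to test a candidate processor grid ahead of time. The sketch below reads "twice the cutoff+skin" as 2 * (cutoff + skin), which is an assumption, and it also assumes a uniform brick decomposition. The function name is hypothetical and not part of LAMMPS.

```python
def shardlow_subdomains_ok(box, grid, cutoff, skin):
    """Return True if every subdomain length exceeds 2 * (cutoff + skin).

    box    -- (lx, ly, lz) simulation box lengths
    grid   -- (px, py, pz) processors per dimension (assumed uniform brick layout)
    cutoff -- pair interaction cutoff distance
    skin   -- neighbor list skin distance
    """
    min_length = 2.0 * (cutoff + skin)  # assumed reading of "twice the cutoff+skin"
    return all(l / p > min_length for l, p in zip(box, grid))
```

With a 40 x 40 x 40 box, a cutoff of 3.0, and a skin of 0.5, a 2 x 2 x 2 grid gives subdomain lengths of 20 and passes, whereas an 8 x 1 x 1 grid gives a length of 5 in x and fails.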
|
|||||||
@ -90,7 +90,7 @@ The description in this sub-section applies to all 3 fix styles:
|
|||||||
*ttm*, *ttm/grid*, and *ttm/mod*.
|
*ttm*, *ttm/grid*, and *ttm/mod*.
|
||||||
|
|
||||||
Fix *ttm/grid* distributes the regular grid across processors consistent
|
Fix *ttm/grid* distributes the regular grid across processors consistent
|
||||||
with the sub-domains of atoms owned by each processor, but is otherwise
|
with the subdomains of atoms owned by each processor, but is otherwise
|
||||||
identical to fix ttm. Note that fix *ttm* stores a copy of the grid on
|
identical to fix ttm. Note that fix *ttm* stores a copy of the grid on
|
||||||
each processor, which is acceptable when the overall grid is reasonably
|
each processor, which is acceptable when the overall grid is reasonably
|
||||||
small. For larger grids you should use fix *ttm/grid* instead.
|
small. For larger grids you should use fix *ttm/grid* instead.
|
||||||
@ -170,11 +170,11 @@ ttm/mod.
|
|||||||
periodic boundary conditions in all dimensions. They also require
|
periodic boundary conditions in all dimensions. They also require
|
||||||
that the size and shape of the simulation box do not vary
|
that the size and shape of the simulation box do not vary
|
||||||
dynamically, e.g. due to use of the :doc:`fix npt <fix_nh>` command.
|
dynamically, e.g. due to use of the :doc:`fix npt <fix_nh>` command.
|
||||||
Likewise, the size/shape of processor sub-domains cannot vary due to
|
Likewise, the size/shape of processor subdomains cannot vary due to
|
||||||
dynamic load-balancing via use of the :doc:`fix balance
|
dynamic load-balancing via use of the :doc:`fix balance
|
||||||
<fix_balance>` command. It is possible however to load balance
|
<fix_balance>` command. It is possible however to load balance
|
||||||
before the simulation starts using the :doc:`balance <balance>`
|
before the simulation starts using the :doc:`balance <balance>`
|
||||||
command, so that each processor has a different size sub-domain.
|
command, so that each processor has a different size subdomain.
|
||||||
|
|
||||||
Periodic boundary conditions are also used in the heat equation solve
|
Periodic boundary conditions are also used in the heat equation solve
|
||||||
for the electronic subsystem. This varies from the approach of
|
for the electronic subsystem. This varies from the approach of
|
||||||
|
|||||||
@ -399,7 +399,7 @@ automatically throughout the run. This typically give performance
|
|||||||
within 5 to 10 percent of the optimal fixed fraction.
|
within 5 to 10 percent of the optimal fixed fraction.
|
||||||
|
|
||||||
The *ghost* keyword determines whether or not ghost atoms, i.e. atoms
|
The *ghost* keyword determines whether or not ghost atoms, i.e. atoms
|
||||||
at the boundaries of processor sub-domains, are offloaded for neighbor
|
at the boundaries of processor subdomains, are offloaded for neighbor
|
||||||
and force calculations. When the value = "no", ghost atoms are not
|
and force calculations. When the value = "no", ghost atoms are not
|
||||||
offloaded. This option can reduce the amount of data transfer with
|
offloaded. This option can reduce the amount of data transfer with
|
||||||
the co-processor and can also overlap MPI communication of forces with
|
the co-processor and can also overlap MPI communication of forces with
|
||||||
@ -521,7 +521,7 @@ the comm keywords.
|
|||||||
The value options for the keywords are *no* or *host* or *device*\ . A
|
The value options for the keywords are *no* or *host* or *device*\ . A
|
||||||
value of *no* means to use the standard non-KOKKOS method of
|
value of *no* means to use the standard non-KOKKOS method of
|
||||||
packing/unpacking data for the communication. A value of *host* means to
|
packing/unpacking data for the communication. A value of *host* means to
|
||||||
use the host, typically a multi-core CPU, and perform the
|
use the host, typically a multicore CPU, and perform the
|
||||||
packing/unpacking in parallel with threads. A value of *device* means to
|
packing/unpacking in parallel with threads. A value of *device* means to
|
||||||
use the device, typically a GPU, to perform the packing/unpacking
|
use the device, typically a GPU, to perform the packing/unpacking
|
||||||
operation.
|
operation.
|
||||||
|
|||||||
@ -56,7 +56,7 @@ commands:
|
|||||||
The global DSMC *max_cell_size* determines the maximum cell length
|
The global DSMC *max_cell_size* determines the maximum cell length
|
||||||
used in the DSMC calculation. A structured mesh is overlayed on the
|
used in the DSMC calculation. A structured mesh is overlayed on the
|
||||||
simulation box such that an integer number of cells are created in
|
simulation box such that an integer number of cells are created in
|
||||||
each direction for each processor's sub-domain. Cell lengths are
|
each direction for each processor's subdomain. Cell lengths are
|
||||||
adjusted up to the user-specified maximum cell size.
|
adjusted up to the user-specified maximum cell size.
|
||||||
|
|
||||||
----------
|
----------
|
||||||
|
|||||||
@ -31,7 +31,7 @@ and the neighbor skin distance (see the documentation of the
|
|||||||
<comm_modify>` command). When you have bonds, angles, dihedrals, or
|
<comm_modify>` command). When you have bonds, angles, dihedrals, or
|
||||||
impropers defined at the same time, you must set the communication
|
impropers defined at the same time, you must set the communication
|
||||||
cutoff so that communication cutoff distance is large enough to acquire
|
cutoff so that communication cutoff distance is large enough to acquire
|
||||||
and communicate sufficient ghost atoms from neighboring sub-domains as
|
and communicate sufficient ghost atoms from neighboring subdomains as
|
||||||
needed for computing bonds, angles, etc.
|
needed for computing bonds, angles, etc.
|
||||||
|
|
||||||
A pair style of *none* will also not request a pairwise neighbor list.
|
A pair style of *none* will also not request a pairwise neighbor list.
|
||||||
|
|||||||
@ -66,7 +66,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
|
|||||||
will choose the number of processors in that dimension of the grid.
|
will choose the number of processors in that dimension of the grid.
|
||||||
It will do this based on the size and shape of the global simulation
|
It will do this based on the size and shape of the global simulation
|
||||||
box so as to minimize the surface-to-volume ratio of each processor's
|
box so as to minimize the surface-to-volume ratio of each processor's
|
||||||
sub-domain.
|
subdomain.
|
||||||
|
|
||||||
Choosing explicit values for Px or Py or Pz can be used to override
|
Choosing explicit values for Px or Py or Pz can be used to override
|
||||||
the default manner in which LAMMPS will create the regular 3d grid of
|
the default manner in which LAMMPS will create the regular 3d grid of
|
||||||
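To illustrate what minimizing the surface-to-volume ratio of each processor's subdomain means, the sketch below enumerates every factorization of P into Px x Py x Pz and keeps the one with the smallest ratio. It is a brute-force illustration for an orthogonal box, not the procedure LAMMPS itself uses, and the function names are hypothetical.

```python
def factor_triples(p):
    """Yield every (px, py, pz) of positive integers with px * py * pz == p."""
    for px in range(1, p + 1):
        if p % px:
            continue
        rest = p // px
        for py in range(1, rest + 1):
            if rest % py == 0:
                yield px, py, rest // py


def best_grid(p, box):
    """Pick the processor grid whose subdomains have the lowest surface-to-volume ratio.

    p   -- total number of processors
    box -- (lx, ly, lz) lengths of an orthogonal simulation box
    """
    lx, ly, lz = box

    def surface_to_volume(grid):
        px, py, pz = grid
        sx, sy, sz = lx / px, ly / py, lz / pz  # subdomain edge lengths
        area = 2.0 * (sx * sy + sy * sz + sx * sz)
        return area / (sx * sy * sz)

    # On ties, min() keeps the first candidate produced by factor_triples.
    return min(factor_triples(p), key=surface_to_volume)
```

For a cubic box on 8 processors this selects a 2 x 2 x 2 grid. For a prime count such as 7, only slab-like grids (one dimension equal to 7) are available, which matches the remark in the text about extra communication cost on a prime number of processors.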
@ -81,7 +81,7 @@ equal 1.
|
|||||||
Note that if you run on a prime number of processors P, then a grid
|
Note that if you run on a prime number of processors P, then a grid
|
||||||
such as 1 x P x 1 will be required, which may incur extra
|
such as 1 x P x 1 will be required, which may incur extra
|
||||||
communication costs due to the high surface area of each processor's
|
communication costs due to the high surface area of each processor's
|
||||||
sub-domain.
|
subdomain.
|
||||||
|
|
||||||
Also note that if multiple partitions are being used then P is the
|
Also note that if multiple partitions are being used then P is the
|
||||||
number of processors in this partition; see the :doc:`-partition command-line switch <Run_options>` page for details. Also note
|
number of processors in this partition; see the :doc:`-partition command-line switch <Run_options>` page for details. Also note
|
||||||
@ -113,10 +113,10 @@ will persist for all simulations. If balancing is performed, some of
|
|||||||
the methods invoked by those commands retain the logical topology of
|
the methods invoked by those commands retain the logical topology of
|
||||||
the initial 3d grid, and the mapping of processors to the grid
|
the initial 3d grid, and the mapping of processors to the grid
|
||||||
specified by the processors command. However the grid spacings in
|
specified by the processors command. However the grid spacings in
|
||||||
different dimensions may change, so that processors own sub-domains of
|
different dimensions may change, so that processors own subdomains of
|
||||||
different sizes. If the :doc:`comm_style tiled <comm_style>` command is
|
different sizes. If the :doc:`comm_style tiled <comm_style>` command is
|
||||||
used, methods invoked by the balancing commands may discard the 3d
|
used, methods invoked by the balancing commands may discard the 3d
|
||||||
grid of processors and tile the simulation domain with sub-domains of
|
grid of processors and tile the simulation domain with subdomains of
|
||||||
different sizes and shapes which no longer have a logical 3d
|
different sizes and shapes which no longer have a logical 3d
|
||||||
connectivity. If that occurs, all the information specified by the
|
connectivity. If that occurs, all the information specified by the
|
||||||
processors command is ignored.
|
processors command is ignored.
|
||||||
@ -129,7 +129,7 @@ processors.
|
|||||||
|
|
||||||
The *onelevel* style creates a 3d grid that is compatible with the
|
The *onelevel* style creates a 3d grid that is compatible with the
|
||||||
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
|
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
|
||||||
each processor's sub-domain, as described above. The mapping of
|
each processor's subdomain, as described above. The mapping of
|
||||||
processors to the grid is determined by the *map* keyword setting.
|
processors to the grid is determined by the *map* keyword setting.
|
||||||
|
|
||||||
The *twolevel* style can be used on machines with multicore nodes to
|
The *twolevel* style can be used on machines with multicore nodes to
|
||||||
@ -145,7 +145,7 @@ parameters can be specified with an asterisk "\*", which means LAMMPS
|
|||||||
will choose the number of cores in that dimension of the node's
|
will choose the number of cores in that dimension of the node's
|
||||||
sub-grid. As with Px,Py,Pz, it will do this based on the size and
|
sub-grid. As with Px,Py,Pz, it will do this based on the size and
|
||||||
shape of the global simulation box so as to minimize the
|
shape of the global simulation box so as to minimize the
|
||||||
surface-to-volume ratio of each processor's sub-domain.
|
surface-to-volume ratio of each processor's subdomain.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
|
|||||||
@ -16,7 +16,7 @@ nx,ny,nz = replication factors in each dimension
|
|||||||
|
|
||||||
.. parsed-literal::
|
.. parsed-literal::
|
||||||
|
|
||||||
*bbox* = only check atoms in replicas that overlap with a processor's sub-domain
|
*bbox* = only check atoms in replicas that overlap with a processor's subdomain
|
||||||
|
|
||||||
Examples
|
Examples
|
||||||
""""""""
|
""""""""
|
||||||
@ -52,7 +52,7 @@ image flags that differ by 1. This will allow the bond to be
|
|||||||
unwrapped appropriately.
|
unwrapped appropriately.
|
||||||
|
|
||||||
The optional keyword *bbox* uses a bounding box to only check atoms in
|
The optional keyword *bbox* uses a bounding box to only check atoms in
|
||||||
replicas that overlap with a processor's sub-domain when assigning
|
replicas that overlap with a processor's subdomain when assigning
|
||||||
atoms to processors. It typically results in a substantial speedup
|
atoms to processors. It typically results in a substantial speedup
|
||||||
when using the replicate command on a large number of processors. It
|
when using the replicate command on a large number of processors. It
|
||||||
does require temporary use of more memory, specifically that each
|
does require temporary use of more memory, specifically that each
|
||||||
|
|||||||
@ -64,7 +64,7 @@ The *lost* keyword determines whether LAMMPS checks for lost atoms each
|
|||||||
time it computes thermodynamics and what it does if atoms are lost. An
|
time it computes thermodynamics and what it does if atoms are lost. An
|
||||||
atom can be "lost" if it moves across a non-periodic simulation box
|
atom can be "lost" if it moves across a non-periodic simulation box
|
||||||
:doc:`boundary <boundary>` or if it moves more than a box length outside
|
:doc:`boundary <boundary>` or if it moves more than a box length outside
|
||||||
the simulation domain (or more than a processor sub-domain length)
|
the simulation domain (or more than a processor subdomain length)
|
||||||
before reneighboring occurs. The latter case is typically due to bad
|
before reneighboring occurs. The latter case is typically due to bad
|
||||||
dynamics (e.g., too large a time step and/or huge forces and velocities). If
|
dynamics (e.g., too large a time step and/or huge forces and velocities). If
|
||||||
the value is *ignore*, LAMMPS does not check for lost atoms. If the
|
the value is *ignore*, LAMMPS does not check for lost atoms. If the
|
||||||
|
|||||||
@ -3432,6 +3432,8 @@ Subclassed
|
|||||||
subcutoff
|
subcutoff
|
||||||
subcycle
|
subcycle
|
||||||
subcycling
|
subcycling
|
||||||
|
subdomain
|
||||||
|
subdomains
|
||||||
subhi
|
subhi
|
||||||
sublo
|
sublo
|
||||||
Subramaniyan
|
Subramaniyan
|
||||||
|
|||||||