git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@15650 f3b2605a-c512-4ea7-a41b-209d697bcdaa
@ -10,7 +10,7 @@ balance command :h3

[Syntax:]

balance thresh style args ... keyword value ... :pre
balance thresh style args ... keyword args ... :pre

thresh = imbalance threshold that must be exceeded to perform a re-balance :ulb,l
one style/arg pair can be used (or multiple for {x},{y},{z}) :l
@ -32,9 +32,23 @@ style = {x} or {y} or {z} or {shift} or {rcb} :l
Niter = # of times to iterate within each dimension of dimstr sequence
stopthresh = stop balancing when this imbalance threshold is reached
{rcb} args = none :pre
zero or more keyword/value pairs may be appended :l
keyword = {out} :l
{out} value = filename
zero or more keyword/arg pairs may be appended :l
keyword = {weight} or {out} :l
{weight} style args = use weighted particle counts for the balancing
{style} = {group} or {neigh} or {time} or {var} or {store}
{group} args = Ngroup group1 weight1 group2 weight2 ...
Ngroup = number of groups with assigned weights
group1, group2, ... = group IDs
weight1, weight2, ... = corresponding weight factors
{neigh} factor = compute weight based on number of neighbors
factor = scaling factor (> 0)
{time} factor = compute weight based on time spent computing
factor = scaling factor (> 0)
{var} name = take weight from atom-style variable
name = name of the atom-style variable
{store} name = store weight in custom atom property defined by "fix property/atom"_fix_property_atom.html command
name = atom property name (without d_ prefix)
{out} arg = filename
filename = write each processor's sub-domain to a file :pre
:ule

@ -44,28 +58,41 @@ balance 0.9 x uniform y 0.4 0.5 0.6
balance 1.2 shift xz 5 1.1
balance 1.0 shift xz 5 1.1
balance 1.1 rcb
balance 1.0 shift x 10 1.1 weight group 2 fast 0.5 slow 2.0
balance 1.0 shift x 10 1.1 weight time 0.8 weight neigh 0.5 weight store balance
balance 1.0 shift x 20 1.0 out tmp.balance :pre

[Description:]

This command adjusts the size and shape of processor sub-domains
within the simulation box, to attempt to balance the number of
particles and thus the computational cost (load) evenly across
processors. The load balancing is "static" in the sense that this
command performs the balancing once, before or between simulations.
The processor sub-domains will then remain static during the
subsequent run. To perform "dynamic" balancing, see the "fix
within the simulation box, to attempt to balance the number of atoms
or particles and thus indirectly the computational cost (load) more
evenly across processors. The load balancing is "static" in the sense
that this command performs the balancing once, before or between
simulations. The processor sub-domains will then remain static during
the subsequent run. To perform "dynamic" balancing, see the "fix
balance"_fix_balance.html command, which can adjust processor
sub-domain sizes and shapes on-the-fly during a "run"_run.html.

Load-balancing is typically only useful if the particles in the
simulation box have a spatially-varying density distribution. E.g. a
model of a vapor/liquid interface, or a solid with an irregular-shaped
geometry containing void regions. In this case, the LAMMPS default of
Load-balancing is typically most useful if the particles in the
simulation box have a spatially-varying density distribution or when
the computational cost varies significantly between different
particles. E.g. a model of a vapor/liquid interface, or a solid with
an irregular-shaped geometry containing void regions, or "hybrid pair
style simulations"_pair_hybrid.html which combine pair styles with
different computational cost. In these cases, the LAMMPS default of
dividing the simulation box volume into a regular-spaced grid of 3d
bricks, with one equal-volume sub-domain per processor, may assign very
different numbers of particles per processor. This can lead to poor
performance when the simulation is run in parallel.
bricks, with one equal-volume sub-domain per processor, may assign
numbers of particles per processor in a way that the computational
effort varies significantly. This can lead to poor performance when
the simulation is run in parallel.

The balancing can be performed with or without per-particle weighting.
With no weighting, the balancing attempts to assign an equal number of
particles to each processor. With weighting, the balancing attempts
to assign an equal aggregate weight to each processor, which typically
means a different number of particles per processor. Details on the
various weighting options are "given below"_#weighted_balance.

Note that the "processors"_processors.html command allows some control
over how the box volume is split across processors. Specifically, for
@ -78,9 +105,9 @@ sub-domains will still have the same shape and same volume.
The requested load-balancing operation is only performed if the
current "imbalance factor" in particles owned by each processor
exceeds the specified {thresh} parameter. The imbalance factor is
defined as the maximum number of particles owned by any processor,
divided by the average number of particles per processor. Thus an
imbalance factor of 1.0 is perfect balance.
defined as the maximum number of particles (or weight) owned by any
processor, divided by the average number of particles (or weight) per
processor. Thus an imbalance factor of 1.0 is perfect balance.

As an example, for 10000 particles running on 10 processors, if the
most heavily loaded processor has 1200 particles, then the factor is
@ -108,7 +135,7 @@ defined above. But depending on the method a perfect balance (1.0)
may not be achieved. For example, "grid" methods (defined below) that
create a logical 3d grid cannot achieve perfect balance for many
irregular distributions of particles. Likewise, if a portion of the
system is a perfect lattice, e.g. the intiial system is generated by
system is a perfect lattice, e.g. the initial system is generated by
the "create_atoms"_create_atoms.html command, then "grid" methods may
be unable to achieve exact balance. This is because entire lattice
planes will be owned or not owned by a single processor.
@ -134,11 +161,11 @@ The {x}, {y}, {z}, and {shift} styles are "grid" methods which produce
a logical 3d grid of processors. They operate by changing the cutting
planes (or lines) between processors in 3d (or 2d), to adjust the
volume (area in 2d) assigned to each processor, as in the following 2d
diagram where processor sub-domains are shown and atoms are colored by
the processor that owns them. The leftmost diagram is the default
partitioning of the simulation box across processors (one sub-box for
each of 16 processors); the middle diagram is after a "grid" method
has been applied.
diagram where processor sub-domains are shown and particles are
colored by the processor that owns them. The leftmost diagram is the
default partitioning of the simulation box across processors (one
sub-box for each of 16 processors); the middle diagram is after a
"grid" method has been applied.

:image(JPG/balance_uniform_small.jpg,JPG/balance_uniform.jpg),image(JPG/balance_nonuniform_small.jpg,JPG/balance_nonuniform.jpg),image(JPG/balance_rcb_small.jpg,JPG/balance_rcb.jpg)
:c
@ -146,8 +173,8 @@ has been applied.
The {rcb} style is a "tiling" method which does not produce a logical
3d grid of processors. Rather it tiles the simulation domain with
rectangular sub-boxes of varying size and shape in an irregular
fashion so as to have equal numbers of particles in each sub-box, as
in the rightmost diagram above.
fashion so as to have equal numbers of particles (or weight) in each
sub-box, as in the rightmost diagram above.

The "grid" methods can be used with either of the
"comm_style"_comm_style.html command options, {brick} or {tiled}. The
@ -230,7 +257,7 @@ counts do not match the target value for the plane, the position of
the cut is adjusted to be halfway between a low and high bound. The
low and high bounds are adjusted on each iteration, using new count
information, so that they become closer together over time. Thus as
the recustion progresses, the count of particles on either side of the
the recursion progresses, the count of particles on either side of the
plane gets closer to the target value.

Once the rebalancing is complete and final processor sub-domains
@ -262,21 +289,129 @@ the longest dimension, leaving one new box on either side of the cut.
All the processors are also partitioned into 2 groups, half assigned
to the box on the lower side of the cut, and half to the box on the
upper side. (If the processor count is odd, one side gets an extra
processor.) The cut is positioned so that the number of atoms in the
lower box is exactly the number that the processors assigned to that
box should own for load balance to be perfect. This also makes load
balance for the upper box perfect. The positioning is done
iteratively, by a bisectioning method. Note that counting atoms on
either side of the cut requires communication between all processors
at each iteration.
processor.) The cut is positioned so that the number of particles in
the lower box is exactly the number that the processors assigned to
that box should own for load balance to be perfect. This also makes
load balance for the upper box perfect. The positioning is done
iteratively, by a bisectioning method. Note that counting particles
on either side of the cut requires communication between all
processors at each iteration.

That is the procedure for the first cut. Subsequent cuts are made
recursively, in exactly the same manner. The subset of processors
assigned to each box makes a new cut in the longest dimension of that
box, splitting the box, the subset of processors, and the atoms in
the box in two. The recursion continues until every processor is
assigned a sub-box of the entire simulation domain, and owns the atoms
in that sub-box.
box, splitting the box, the subset of processors, and the particles
in the box in two. The recursion continues until every processor is
assigned a sub-box of the entire simulation domain, and owns the
particles in that sub-box.

:line

This sub-section describes how to perform weighted load balancing
using the {weight} keyword. :link(weighted_balance)

By default, all particles have a weight of 1.0, which means each
particle is assumed to require the same amount of computation during a
timestep. There are, however, scenarios where this is not a good
assumption. Measuring the computational cost for each particle
accurately would be impractical and slow down the computation.
Instead the {weight} keyword implements several ways to influence the
per-particle weights empirically by properties readily available or
using the user's knowledge of the system. Note that the absolute
values of the weights are not important; their ratio is what is used
to assign particles to processors. A particle with a weight of 2.5 is
assumed to require 5x more computational effort than a particle with
a weight of 0.5.

Below is a list of possible weight options with a short description of
their usage and some example scenarios where they might be applicable.
It is possible to apply multiple weight flags and the weightings they
induce will be combined through multiplication. Most of the time,
however, it is sufficient to use just one method.

The {group} weight style assigns weight factors to specified
"groups"_group.html of particles. The {group} style keyword is
followed by the number of groups, then pairs of group IDs and the
corresponding weight factor. If a particle belongs to none of the
specified groups, its weight is not changed. If it belongs to
multiple groups, its weight is the product of the weight factors.

This weight style is useful in combination with pair style
"hybrid"_pair_hybrid.html, e.g. when combining a more costly manybody
potential with a fast pair-wise potential. It is also useful when
using "run_style respa"_run_style.html where some portions of the
system have many bonded interactions and others none. It assumes that
the computational cost for each group remains constant over time.
This is a purely empirical weighting, so a series of test runs to
tune the assigned weight factors for optimal performance is
recommended.

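As a purely illustrative sketch, the group names and weight factors
below are hypothetical placeholders that would need to be tuned for a
real model:

group expensive type 1 2    # atoms using a costly manybody potential
group cheap type 3          # atoms using a fast pair-wise potential
balance 1.0 shift x 10 1.1 weight group 2 expensive 3.0 cheap 1.0 :pre
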
The {neigh} weight style assigns a weight to each particle equal to
its number of neighbors divided by the average number of neighbors
for all particles. The {factor} setting is then applied as an overall
scale factor to all the {neigh} weights, which allows tuning of the
impact of this style. A {factor} smaller than 1.0 (e.g. 0.8) often
results in the best performance, since the number of neighbors is
likely to overestimate the ideal weight.

This weight style is useful for systems where there are different
cutoffs used for different pairs of interactions, or the density
fluctuates, or a large number of particles are in the vicinity of a
wall, or a combination of these effects. If a simulation uses
multiple neighbor lists, this weight style will use the first suitable
neighbor list it finds. It will not request or compute a new list. A
warning will be issued if there is no suitable neighbor list available
or if it is not current, e.g. if the balance command is used before a
"run"_run.html or "minimize"_minimize.html command, in which case the
neighbor list may not yet have been built. In this case no weights
are computed. Inserting a "run 0 post no"_run.html command before
issuing the {balance} command may be a workaround for this case, as
it will induce the neighbor list to be built.

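A minimal sketch of this workaround; the weight factor of 0.8 is only
an illustrative choice:

run 0 post no    # build the neighbor list without advancing the run
balance 1.0 shift x 10 1.1 weight neigh 0.8 :pre
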
The {time} weight style uses "timer data"_timer.html to estimate a
weight for each particle. It uses the same information as is used for
the "MPI task timing breakdown"_Section_start.html#start_8, namely,
the timings for sections {Pair}, {Bond}, {Kspace}, and {Neigh}. The
time spent in these sections of the timestep is measured for each MPI
rank, summed up, then converted into a cost for each MPI rank relative
to the average cost over all MPI ranks for the same sections. That
cost is then evenly distributed over all the particles owned by that
rank. Finally, the {factor} setting is applied as an overall
scale factor to all the {time} weights as a way to fine-tune the
impact of this weight style. Good {factor} values to use are
typically between 0.5 and 1.2.

For the {balance} command the timing data is taken from the preceding
run command, i.e. the timings are for the entire previous run. For
the {fix balance} command the timing data is for only the timesteps
since the last balancing operation was performed. If timing
information for the required sections is not available, e.g. at the
beginning of a run, or when the "timer"_timer.html command is set to
either {loop} or {off}, a warning is issued. In this case no weights
are computed.

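A minimal sketch of this usage for the {balance} command; the run
length and factor are illustrative only:

timer normal    # any setting except "loop" or "off" collects timings
run 1000        # the time weights are derived from this run
balance 1.0 shift x 10 1.1 weight time 0.8 :pre
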
This weight style is the most generic one, and should be tried first
if neither the {group} nor {neigh} styles are easily applicable.
However, since the computed cost function is averaged over all local
particles, this weight style may not be highly accurate. This style
can also be effective as a secondary weight in combination with either
{group} or {neigh} to offset some of the inaccuracies in either of
those heuristics.

The {var} weight style assigns per-particle weights by evaluating an
"atom-style variable"_variable.html specified by {name}. This is
provided as a more flexible alternative to the {group} weight style,
allowing the definition of more complex heuristics based on
information (global and per atom) available inside of LAMMPS. For
example, atom-style variables can reference the position of a
particle, its velocity, the volume of its Voronoi cell, etc.

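A minimal, hypothetical sketch that doubles the weight of particles
in an assumed group {solute}:

group solute type 2
variable myweight atom 1.0+gmask(solute)
balance 1.0 shift xy 10 1.1 weight var myweight :pre
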
The {store} weight style does not compute a weight factor. Instead it
stores the current accumulated weights in a custom per-atom property
specified by {name}. This must be a property defined as {d_name} via
the "fix property/atom"_fix_property_atom.html command. Note that
these custom per-atom properties can be output in a "dump"_dump.html
file, so this is a way to examine, debug, or visualize the
per-particle weights computed during the load-balancing operation.

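A minimal sketch of this workflow; the property name {balance} and
the dump settings are arbitrary choices:

fix p all property/atom d_balance
balance 1.0 shift x 10 1.1 weight neigh 0.8 weight store balance
dump 1 all custom 100 tmp.dump id x y z d_balance :pre
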
:line

@ -342,6 +477,7 @@ appear in {dimstr} for the {shift} style.

[Related commands:]

"processors"_processors.html, "fix balance"_fix_balance.html
"group"_group.html, "processors"_processors.html,
"fix balance"_fix_balance.html

[Default:] none

@ -10,7 +10,7 @@ fix balance command :h3

[Syntax:]

fix ID group-ID balance Nfreq thresh style args keyword value ... :pre
fix ID group-ID balance Nfreq thresh style args keyword args ... :pre

ID, group-ID are documented in "fix"_fix.html command :ulb,l
balance = style name of this fix command :l
@ -21,10 +21,24 @@ style = {shift} or {rcb} :l
dimstr = sequence of letters containing "x" or "y" or "z", each not more than once
Niter = # of times to iterate within each dimension of dimstr sequence
stopthresh = stop balancing when this imbalance threshold is reached
rcb args = none :pre
zero or more keyword/value pairs may be appended :l
keyword = {out} :l
{out} value = filename
{rcb} args = none :pre
zero or more keyword/arg pairs may be appended :l
keyword = {weight} or {out} :l
{weight} style args = use weighted particle counts for the balancing
{style} = {group} or {neigh} or {time} or {var} or {store}
{group} args = Ngroup group1 weight1 group2 weight2 ...
Ngroup = number of groups with assigned weights
group1, group2, ... = group IDs
weight1, weight2, ... = corresponding weight factors
{neigh} factor = compute weight based on number of neighbors
factor = scaling factor (> 0)
{time} factor = compute weight based on time spent computing
factor = scaling factor (> 0)
{var} name = take weight from atom-style variable
name = name of the atom-style variable
{store} name = store weight in custom atom property defined by "fix property/atom"_fix_property_atom.html command
name = atom property name (without d_ prefix)
{out} arg = filename
filename = write each processor's sub-domain to a file, at each re-balancing :pre
:ule

@ -32,6 +46,9 @@ keyword = {out} :l

fix 2 all balance 1000 1.05 shift x 10 1.05
fix 2 all balance 100 0.9 shift xy 20 1.1 out tmp.balance
fix 2 all balance 100 0.9 shift xy 20 1.1 weight group 3 substrate 3.0 solvent 1.0 solute 0.8 out tmp.balance
fix 2 all balance 100 1.0 shift x 10 1.1 weight time 0.8
fix 2 all balance 100 1.0 shift xy 5 1.1 weight var myweight weight neigh 0.6 weight store allweight
fix 2 all balance 1000 1.1 rcb :pre

[Description:]
@ -44,14 +61,31 @@ rebalancing is performed periodically during the simulation. To
perform "static" balancing, before or between runs, see the
"balance"_balance.html command.

Load-balancing is typically only useful if the particles in the
simulation box have a spatially-varying density distribution. E.g. a
model of a vapor/liquid interface, or a solid with an irregular-shaped
geometry containing void regions. In this case, the LAMMPS default of
dividing the simulation box volume into a regular-spaced grid of 3d
bricks, with one equal-volume sub-domain per processor, may assign
very different numbers of particles per processor. This can lead to
poor performance when the simulation is run in parallel.
Load-balancing is typically most useful if the particles in the
simulation box have a spatially-varying density distribution or
where the computational cost varies significantly between different
atoms. E.g. a model of a vapor/liquid interface, or a solid with
an irregular-shaped geometry containing void regions, or
"hybrid pair style simulations"_pair_hybrid.html which combine
pair styles with different computational cost. In these cases, the
LAMMPS default of dividing the simulation box volume into a
regular-spaced grid of 3d bricks, with one equal-volume sub-domain
per processor, may assign numbers of particles per processor in a
way that the computational effort varies significantly. This can
lead to poor performance when the simulation is run in parallel.

The balancing can be performed with or without per-particle weighting.
With no weighting, the balancing attempts to assign an equal number of
particles to each processor. With weighting, the balancing attempts
to assign an equal weight to each processor, which typically means a
different number of atoms per processor.

NOTE: The weighting options listed above are documented in "this
section"_balance.html#weighted_balance of the "balance"_balance.html
command doc page. The section describes the various weighting options
and gives a few examples of how they can be used. The weighting
options are the same for both the fix balance and
"balance"_balance.html commands.

Note that the "processors"_processors.html command allows some control
over how the box volume is split across processors. Specifically, for
@ -64,9 +98,9 @@ sub-domains will still have the same shape and same volume.
On a particular timestep, a load-balancing operation is only performed
if the current "imbalance factor" in particles owned by each processor
exceeds the specified {thresh} parameter. The imbalance factor is
defined as the maximum number of particles owned by any processor,
divided by the average number of particles per processor. Thus an
imbalance factor of 1.0 is perfect balance.
defined as the maximum number of particles (or weight) owned by any
processor, divided by the average number of particles (or weight) per
processor. Thus an imbalance factor of 1.0 is perfect balance.

As an example, for 10000 particles running on 10 processors, if the
most heavily loaded processor has 1200 particles, then the factor is
@ -117,8 +151,8 @@ applied.
The {rcb} style is a "tiling" method which does not produce a logical
3d grid of processors. Rather it tiles the simulation domain with
rectangular sub-boxes of varying size and shape in an irregular
fashion so as to have equal numbers of particles in each sub-box, as
in the rightmost diagram above.
fashion so as to have equal numbers of particles (or weight) in each
sub-box, as in the rightmost diagram above.

The "grid" methods can be used with either of the
"comm_style"_comm_style.html command options, {brick} or {tiled}. The
@ -139,12 +173,9 @@ from scratch.

:line

The {group-ID} is currently ignored. In the future it may be used to
determine what particles are considered for balancing. Normally it
would only make sense to use the {all} group. But in some cases it
may be useful to balance on a subset of the particles, e.g. when
modeling large nanoparticles in a background of small solvent
particles.
The {group-ID} is ignored. However, the impact of balancing on
different groups of atoms can be affected by using the {group} weight
style as described below.

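A hypothetical sketch; the group {nano} is assumed to collect the
expensive nanoparticle atoms, with an empirically chosen factor:

fix 2 all balance 1000 1.1 shift x 10 1.05 weight group 1 nano 5.0 :pre
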
The {Nfreq} setting determines how often a rebalance is performed. If
{Nfreq} > 0, then rebalancing will occur every {Nfreq} steps. Each
@ -225,7 +256,7 @@ than {Niter} and exit early.

The {rcb} style invokes a "tiled" method for balancing, as described
above. It performs a recursive coordinate bisectioning (RCB) of the
simulation domain. The basic idea is as follows.

The simulation domain is cut into 2 boxes by an axis-aligned cut in
the longest dimension, leaving one new box on either side of the cut.
@ -250,10 +281,10 @@ in that sub-box.

:line

The {out} keyword writes a text file to the specified {filename} with
the results of each rebalancing operation. The file contains the
bounds of the sub-domain for each processor after the balancing
operation completes. The format of the file is compatible with the
The {out} keyword writes text to the specified {filename} with the
results of each rebalancing operation. The file contains the bounds
of the sub-domain for each processor after the balancing operation
completes. The format of the file is compatible with the
"Pizza.py"_pizza {mdump} tool which has support for manipulating and
visualizing mesh files. An example is shown here for a balancing by 4
processors for a 2d problem:
@ -321,8 +352,8 @@ values in the vector are as follows:
3 = imbalance factor right before the last rebalance was performed :ul

As explained above, the imbalance factor is the ratio of the maximum
number of particles on any processor to the average number of
particles per processor.
number of particles (or total weight) on any processor to the average
number of particles (or total weight) per processor.

These quantities can be accessed by various "output
commands"_Section_howto.html#howto_15. The scalar and vector values
@ -336,11 +367,11 @@ minimization"_minimize.html.

[Restrictions:]

For 2d simulations, a "z" cannot appear in {dimstr} for the {shift}
style.
For 2d simulations, the {z} style cannot be used. Nor can a "z"
appear in {dimstr} for the {shift} style.

[Related commands:]

"processors"_processors.html, "balance"_balance.html
"group"_group.html, "processors"_processors.html, "balance"_balance.html

[Default:] none

@ -48,14 +48,14 @@ follows the discussion in these 3 papers: "(HenkelmanA)"_#HenkelmanA,

Each replica runs on a partition of one or more processors. Processor
partitions are defined at run-time using the -partition command-line
switch; see "Section 2.7"_Section_start.html#start_7 of the
manual. Note that if you have MPI installed, you can run a
multi-replica simulation with more replicas (partitions) than you have
physical processors, e.g. you can run a 10-replica simulation on just
one or two processors. You will simply not get the performance
speed-up you would see with one or more physical processors per
replica. See "this section"_Section_howto.html#howto_5 of the manual
for further discussion.
switch; see "Section 2.7"_Section_start.html#start_7 of the manual.
Note that if you have MPI installed, you can run a multi-replica
simulation with more replicas (partitions) than you have physical
processors, e.g. you can run a 10-replica simulation on just one or
two processors. You will simply not get the performance speed-up you
would see with one or more physical processors per replica. See
"Section 6.5"_Section_howto.html#howto_5 of the manual for further
discussion.

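As a sketch, such an oversubscribed 10-replica run might be launched
as follows; the executable name and input script are placeholders:

mpirun -np 10 lmp_mpi -partition 10x1 -in in.neb :pre
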
NOTE: The current NEB implementation in LAMMPS only allows there to be
one processor per replica.

@ -63,14 +63,14 @@ event to occur.

Each replica runs on a partition of one or more processors. Processor
partitions are defined at run-time using the -partition command-line
switch; see "Section 2.7"_Section_start.html#start_7 of the
manual. Note that if you have MPI installed, you can run a
multi-replica simulation with more replicas (partitions) than you have
physical processors, e.g. you can run a 10-replica simulation on one or
two processors. For PRD, this makes little sense, since this offers
no effective parallel speed-up in searching for infrequent events. See
"Section 6.5"_Section_howto.html#howto_5 of the manual for further
discussion.
switch; see "Section 2.7"_Section_start.html#start_7 of the manual.
Note that if you have MPI installed, you can run a multi-replica
simulation with more replicas (partitions) than you have physical
processors, e.g. you can run a 10-replica simulation on one or two
processors. However, for PRD this makes little sense, since running a
replica on virtual instead of physical processors offers no effective
parallel speed-up in searching for infrequent events. See "Section
6.5"_Section_howto.html#howto_5 of the manual for further discussion.

When a PRD simulation is performed, it is assumed that each replica is
running the same model, though LAMMPS does not check for this.
@ -163,7 +163,7 @@ runs for {N} timesteps. If the {time} value is {clock}, then the
simulation runs until {N} aggregate timesteps across all replicas have
elapsed. This aggregate time is the "clock" time defined below, which
typically advances nearly M times faster than the timestepping on a
single replica.
single replica, where M is the number of replicas.

:line

@ -183,25 +183,26 @@ coincident events, and the replica number of the chosen event.

The timestep is the usual LAMMPS timestep, except that time does not
advance during dephasing or quenches, but only during dynamics. Note
that there are two kinds of dynamics in the PRD loop listed above. The
first is when all replicas are performing independent dynamics,
waiting for an event to occur. The second is when correlated events
are being searched for and only one replica is running dynamics.
that there are two kinds of dynamics in the PRD loop listed above that
contribute to this timestepping. The first is when all replicas are
performing independent dynamics, waiting for an event to occur. The
second is when correlated events are being searched for, but only one
replica is running dynamics.

The CPU time is the total processor time since the start of the PRD
run.
The CPU time is the total elapsed time on each processor since the
start of the PRD run.

The clock is the same as the timestep except that it advances by M
steps every timestep during the first kind of dynamics when the M
steps per timestep during the first kind of dynamics when the M
replicas are running independently. The clock advances by only 1 step
per timestep during the second kind of dynamics, since only a single
per timestep during the second kind of dynamics, when only a single
replica is checking for a correlated event. Thus "clock" time
represents the aggregate time (in steps) that effectively elapses
represents the aggregate time (in steps) that has effectively elapsed
during a PRD simulation on M replicas. If most of the PRD run is
spent in the second stage of the loop above, searching for infrequent
events, then the clock will advance nearly M times faster than it
would if a single replica was running. Note the clock time between
events will be drawn from p(t).
successive events should be drawn from p(t).

The event number is a counter that increments with each event, whether
it is uncorrelated or correlated.
@ -212,14 +213,15 @@ replicas are running independently. The correlation flag will be 1
when a correlated event occurs during the third stage of the loop
listed above, i.e. when only one replica is running dynamics.

When more than one replica detects an event at the end of the second
stage, then one of them is chosen at random. The number of coincident
events is the number of replicas that detected an event. Normally, we
expect this value to be 1. If it is often greater than 1, then either
the number of replicas is too large, or {t_event} is too large.
When more than one replica detects an event at the end of the same
event check (every {t_event} steps) during the second stage, then
one of them is chosen at random. The number of coincident events is
the number of replicas that detected an event. Normally, this value
should be 1. If it is often greater than 1, then either the number of
replicas is too large, or {t_event} is too large.

The replica number is the ID of the replica (from 0 to M-1) that
found the event.
The replica number is the ID of the replica (from 0 to M-1) in which
the event occurred.

:line

@ -286,7 +288,7 @@ This command can only be used if LAMMPS was built with the REPLICA
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info on packages.

{N} and {t_correlate} settings must be integer multiples of
The {N} and {t_correlate} settings must be integer multiples of
{t_event}.

Runs restarted from a restart file written during a PRD run will not