diff --git a/doc/JPG/balance.jpg b/doc/JPG/balance.jpg new file mode 100644 index 0000000000..ea8bdf1466 Binary files /dev/null and b/doc/JPG/balance.jpg differ diff --git a/doc/balance.html b/doc/balance.html new file mode 100644 index 0000000000..02336da31e --- /dev/null +++ b/doc/balance.html @@ -0,0 +1,206 @@ + +
NOTE: the fix balance command referred to here for dynamic load +balancing has not yet been released. +
+Syntax: +
+balance group-ID keyword args ... ++
group-ID = ID of group of atoms to balance across processors
+one or more keyword/arg pairs may be appended
+keyword = x or y or z or dynamic
+  x args = uniform or Px-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in x dimension
+    numbers = Px-1 ascending values between 0 and 1, Px = # of processors in x dimension
+  y args = uniform or Py-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in y dimension
+    numbers = Py-1 ascending values between 0 and 1, Py = # of processors in y dimension
+  z args = uniform or Pz-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in z dimension
+    numbers = Pz-1 ascending values between 0 and 1, Pz = # of processors in z dimension
+  dynamic args = Nrepeat Niter dimstr thresh
+    Nrepeat = # of times to repeat dimstr sequence
+    Niter = # of times to iterate within each dimension of dimstr sequence
+    dimstr = sequence of letters containing "x" or "y" or "z"
+    thresh = stop balancing when this imbalance threshold is reached
+
Examples: +
+balance x uniform y 0.4 0.5 0.6 +balance dynamic 1 5 xzx 1.1 +balance dynamic 5 10 x 1.0 ++
Description: +
+This command adjusts the size of processor sub-domains within the
+simulation box, to attempt to balance the number of particles and thus
+the computational cost (load) evenly across processors. The load
+balancing is "static" in the sense that this command performs the
+balancing once, before or between simulations. The processor
+sub-domains will then remain static during the subsequent run. To
+perform "dynamic" balancing, see the fix balance
+command, which can adjust processor sub-domain sizes on-the-fly during
+a run.
+Load-balancing is only useful if the particles in the simulation box
+have a spatially-varying density distribution, e.g. a model of a
+vapor/liquid interface, or a solid with an irregular-shaped geometry
+containing void regions. In this case, the LAMMPS default of dividing
+the simulation box volume into a regular-spaced grid of processor
+sub-domains, with one equal-volume sub-domain per processor, may assign
+very different numbers of particles per processor. This can lead to
+poor performance in a scalability sense, when the simulation is run in
+parallel.
+Note that the processors command gives you some +control over how the box volume is split across processors. +Specifically, for a Px by Py by Pz grid of processors, it lets you +choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz = +P, the total number of processors. This can be sufficient to achieve +good load-balance for some models on some processor counts. However, +all the processor sub-domains will still be the same shape and have +the same volume. +
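+For example, the constraint can be enumerated directly. A minimal
+Python sketch (illustrative only; P = 16 is an assumed processor
+count, and LAMMPS makes this choice itself):
+
+P = 16   # assumed total processor count
+grids = [(px, py, P // (px * py))
+         for px in range(1, P + 1) if P % px == 0
+         for py in range(1, P // px + 1) if (P // px) % py == 0]
+# grids holds every (Px, Py, Pz) with Px * Py * Pz == 16,
+# e.g. (1, 1, 16), (2, 2, 4), (4, 4, 1), (16, 1, 1)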
+This command does not alter the topology of the Px by Py by Pz grid of
+processors. But it shifts the cutting planes between processors (in
+3d, or lines in 2d), which adjusts the volume (area in 2d) assigned to
+each processor, as in the following 2d diagram. The left diagram is
+the default partitioning of the simulation box across processors (one
+sub-box for each of 16 processors); the right diagram is after
+balancing.
+<CENTER><IMG SRC = "JPG/balance.jpg"></CENTER>
+When the balance command completes, it prints out the change in +"imbalance factor". The imbalance factor is defined as the maximum +number of particles owned by any processor, divided by the average +number of particles per processor. Thus an imbalance factor of 1.0 is +perfect balance. For 10000 particles running on 10 processors, if the +most heavily loaded processor has 1200 particles, then the factor is +1.2, meaning there is a 20% imbalance. The change in the maximum +number of particles (on any processor) is also printed. +
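+That arithmetic, as a minimal Python sketch (the per-processor counts
+below are hypothetical values, chosen to total 10000 particles with a
+maximum of 1200 on one processor):
+
+counts = [1200, 1100, 1050, 1000, 1000, 975, 950, 925, 900, 900]
+average = sum(counts) / len(counts)   # 10000 / 10 = 1000
+factor = max(counts) / average        # 1200 / 1000 = 1.2, i.e. 20% imbalance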
+IMPORTANT NOTE: This command attempts to minimize the imbalance
+factor, as defined above. But because of the topology constraint that
+only the cutting planes (lines) between processors are moved, there
+are many irregular distributions of particles for which this factor
+cannot be shrunk to 1.0, particularly in 3d. Also, computational cost
+is not strictly proportional to particle count, and changing the
+relative size and shape of processor sub-domains may lead to
+additional computational and communication overheads, e.g. in the PPPM
+solver used via the kspace_style command. Thus
+you should benchmark the run times of your simulation before and after
+balancing.
+The x, y, and z keywords adjust the position of cutting planes +between processor sub-domains in a specific dimension. The uniform +argument spaces the planes evenly, as in the left diagram above. The +numeric argument requires you to list Ps-1 numbers that specify the +position of the cutting planes. This requires that you know Ps = Px +or Py or Pz = the number of processors assigned by LAMMPS to the +relevant dimension. This assignment is made (and the Px, Py, Pz +values printed out) when the simulation box is created by the +"create_box" or "read_data" or "read_restart" command and is +influenced by the settings of the "processors" command. +
+Each of the numeric values must be between 0 and 1, and they must be
+listed in ascending order. They represent the fractional position of
+the cutting plane. The left (or lower) edge of the box is 0.0, and
+the right (or upper) edge is 1.0. Neither of these values is
+specified. Only the interior Ps-1 positions are specified. Thus if
+there are 2 processors in the x dimension, you specify a single value
+such as 0.75, which would make the left processor's sub-domain 3x
+larger than the right processor's sub-domain.
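+As an illustration of that 2-processor case, a short Python sketch
+(the box length is an assumed value; LAMMPS works with the fractional
+positions directly):
+
+cuts = [0.0, 0.75, 1.0]   # fixed box edges plus the one interior cut
+box_length = 40.0         # assumed box length in the x dimension
+widths = [(cuts[i + 1] - cuts[i]) * box_length
+          for i in range(len(cuts) - 1)]
+# widths == [30.0, 10.0]: the left sub-domain is 3x larger than the right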
+The dynamic keyword changes the cutting planes between processors in +an iterative fashion, seeking to reduce the imbalance factor, similar +to how the fix balance command operates. Note that +this keyword begins its operation from the current processor +partitioning, which could be uniform or the result of a previous +balance command. +
+The dimstr argument is a string of characters, each of which must be +an "x" or "y" or "z". The characters can appear in any order, and can +be repeated as many times as desired. These are all valid dimstr +arguments: "x" or "xyzyx" or "yyyzzz". +
+Balancing proceeds by adjusting the cutting planes in each of the
+dimensions listed in dimstr, one dimension at a time. The entire
+sequence of dimensions is repeated Nrepeat times. For a single
+dimension, the balancing operation (described below) is iterated
+Niter times. After each dimension finishes, the imbalance factor is
+re-computed, and the balancing operation halts if the thresh
+criterion is met.
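+The control flow this implies can be sketched in Python (a simplified
+illustration of the documented loop structure, not the LAMMPS source;
+diffusive_pass and imbalance_factor are hypothetical caller-supplied
+functions):
+
+def balance_dynamic(nrepeat, niter, dimstr, thresh,
+                    diffusive_pass, imbalance_factor):
+    # dimstr is e.g. "xzx"; diffusive_pass(dim) adjusts the cuts once
+    # in dimension dim; imbalance_factor() returns the current factor
+    for _ in range(nrepeat):
+        for dim in dimstr:
+            for _ in range(niter):
+                diffusive_pass(dim)
+            # the factor is re-checked after each dimension finishes
+            if imbalance_factor() <= thresh:
+                return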
+The interplay between Nrepeat, Niter, and dimstr means that
+these commands do essentially the same thing, the only difference
+being how often the imbalance factor is computed and checked against
+the threshold:
+balance dynamic 5 10 x 1.2
+balance dynamic 1 10 xxxxx 1.2
+balance dynamic 50 1 x 1.2
+
A rebalance operation in a single dimension is performed using an
+iterative "diffusive" load-balancing algorithm. One iteration (which
+is repeated Niter times) works as follows. Assume there are Px
+processors in the x dimension. This defines Px slices of the
+simulation box, each of which contains Py*Pz processors. The task is to
+adjust the position of the Px-1 cuts between slices, leaving the end
+cuts unchanged (left and right edges of the simulation box).
+The iteration begins by calculating the number of atoms within each of
+the Px slices. Then for each slice, its atom count is compared to its
+neighbors. If a slice has more atoms than its left (or right)
+neighbor, the cut is moved towards the center of the slice,
+effectively shrinking the width of the slice and migrating atoms to
+the other slice. The distance to move the cut is a function of the
+"density" of atoms in the donor slice and the difference in counts
+between the 2 slices. A damping factor is also applied to avoid
+oscillations in the position of the cutting plane as iterations
+proceed. Hence the "diffusive" nature of the algorithm, as work
+(atoms) effectively diffuses from highly loaded processors to
+less-loaded processors.
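+One such pass can be sketched in Python as follows (a simplified
+illustration under assumed move and damping rules, not the actual
+LAMMPS implementation):
+
+def diffuse_cuts(cuts, counts, damping=0.5):
+    # cuts   = fractional cut positions, including the fixed ends 0.0, 1.0
+    # counts = atom count in each of the len(cuts) - 1 slices
+    new_cuts = list(cuts)
+    for i in range(1, len(cuts) - 1):          # interior cuts only
+        left, right = counts[i - 1], counts[i]
+        if left == right:
+            continue
+        donor = i - 1 if left > right else i   # more heavily loaded slice
+        density = counts[donor] / (cuts[donor + 1] - cuts[donor])
+        # move the cut into the donor slice far enough to transfer a
+        # damped half of the count difference, shrinking the donor slice
+        shift = damping * (abs(left - right) / 2.0) / density
+        new_cuts[i] += -shift if left > right else shift
+    return new_cuts
+
+Repeated passes let atoms "diffuse" from heavily to lightly loaded
+slices, while the damping keeps the cut positions from overshooting
+and oscillating.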
+Restrictions: +
+The dynamic keyword cannot be used with the x, y, or z +arguments. +
+For 2d simulations, the z keyword cannot be used. Nor can a "z" +appear in dimstr for the dynamic keyword. +
+Related commands: +
+processors, fix balance
+
+Default: none
+add a citation and image +
+
diff --git a/doc/balance.txt b/doc/balance.txt new file mode 100644 index 0000000000..84c598d21c --- /dev/null +++ b/doc/balance.txt @@ -0,0 +1,197 @@
+"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+balance command :h3
+
+NOTE: the fix balance command referred to here for dynamic load
+balancing has not yet been released.
+
+[Syntax:]
+
+balance group-ID keyword args ... :pre
+
+group-ID = ID of group of atoms to balance across processors :ulb,l
+one or more keyword/arg pairs may be appended :l
+keyword = {x} or {y} or {z} or {dynamic} :l
+  {x} args = {uniform} or Px-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in x dimension
+    numbers = Px-1 ascending values between 0 and 1, Px = # of processors in x dimension
+  {y} args = {uniform} or Py-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in y dimension
+    numbers = Py-1 ascending values between 0 and 1, Py = # of processors in y dimension
+  {z} args = {uniform} or Pz-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in z dimension
+    numbers = Pz-1 ascending values between 0 and 1, Pz = # of processors in z dimension
+  {dynamic} args = Nrepeat Niter dimstr thresh
+    Nrepeat = # of times to repeat dimstr sequence
+    Niter = # of times to iterate within each dimension of dimstr sequence
+    dimstr = sequence of letters containing "x" or "y" or "z"
+    thresh = stop balancing when this imbalance threshold is reached :pre
+:ule
+
+[Examples:]
+
+balance x uniform y 0.4 0.5 0.6
+balance dynamic 1 5 xzx 1.1
+balance dynamic 5 10 x 1.0 :pre
+
+[Description:]
+
+This command adjusts the size of processor sub-domains within the
+simulation box, to attempt to balance the number of particles and thus
+the computational cost (load) evenly across processors. The load
+balancing is "static" in the sense that this command performs the
+balancing once, before or between simulations. The processor
+sub-domains will then remain static during the subsequent run. To
+perform "dynamic" balancing, see the "fix balance"_fix_balance.html
+command, which can adjust processor sub-domain sizes on-the-fly during
+a "run"_run.html.
+
+Load-balancing is only useful if the particles in the simulation box
+have a spatially-varying density distribution, e.g. a model of a
+vapor/liquid interface, or a solid with an irregular-shaped geometry
+containing void regions. In this case, the LAMMPS default of dividing
+the simulation box volume into a regular-spaced grid of processor
+sub-domains, with one equal-volume sub-domain per processor, may assign
+very different numbers of particles per processor. This can lead to
+poor performance in a scalability sense, when the simulation is run in
+parallel.
+
+Note that the "processors"_processors.html command gives you some
+control over how the box volume is split across processors.
+Specifically, for a Px by Py by Pz grid of processors, it lets you
+choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz =
+P, the total number of processors. This can be sufficient to achieve
+good load-balance for some models on some processor counts. However,
+all the processor sub-domains will still be the same shape and have
+the same volume.
+
+This command does not alter the topology of the Px by Py by Pz grid of
+processors.
But it shifts the cutting planes between processors (in
+3d, or lines in 2d), which adjusts the volume (area in 2d) assigned to
+each processor, as in the following 2d diagram. The left diagram is
+the default partitioning of the simulation box across processors (one
+sub-box for each of 16 processors); the right diagram is after
+balancing.
+
+:c,image(JPG/balance.jpg)
+
+When the balance command completes, it prints out the change in
+"imbalance factor". The imbalance factor is defined as the maximum
+number of particles owned by any processor, divided by the average
+number of particles per processor. Thus an imbalance factor of 1.0 is
+perfect balance. For 10000 particles running on 10 processors, if the
+most heavily loaded processor has 1200 particles, then the factor is
+1.2, meaning there is a 20% imbalance. The change in the maximum
+number of particles (on any processor) is also printed.
+
+IMPORTANT NOTE: This command attempts to minimize the imbalance
+factor, as defined above. But because of the topology constraint that
+only the cutting planes (lines) between processors are moved, there
+are many irregular distributions of particles for which this factor
+cannot be shrunk to 1.0, particularly in 3d. Also, computational cost
+is not strictly proportional to particle count, and changing the
+relative size and shape of processor sub-domains may lead to
+additional computational and communication overheads, e.g. in the PPPM
+solver used via the "kspace_style"_kspace_style.html command. Thus
+you should benchmark the run times of your simulation before and after
+balancing.
+
+:line
+
+The {x}, {y}, and {z} keywords adjust the position of cutting planes
+between processor sub-domains in a specific dimension. The {uniform}
+argument spaces the planes evenly, as in the left diagram above. The
+{numeric} argument requires you to list Ps-1 numbers that specify the
+position of the cutting planes. This requires that you know Ps = Px
+or Py or Pz = the number of processors assigned by LAMMPS to the
+relevant dimension. This assignment is made (and the Px, Py, Pz
+values printed out) when the simulation box is created by the
+"create_box" or "read_data" or "read_restart" command and is
+influenced by the settings of the "processors" command.
+
+Each of the numeric values must be between 0 and 1, and they must be
+listed in ascending order. They represent the fractional position of
+the cutting plane. The left (or lower) edge of the box is 0.0, and
+the right (or upper) edge is 1.0. Neither of these values is
+specified. Only the interior Ps-1 positions are specified. Thus if
+there are 2 processors in the x dimension, you specify a single value
+such as 0.75, which would make the left processor's sub-domain 3x
+larger than the right processor's sub-domain.
+
+:line
+
+The {dynamic} keyword changes the cutting planes between processors in
+an iterative fashion, seeking to reduce the imbalance factor, similar
+to how the "fix balance"_fix_balance.html command operates. Note that
+this keyword begins its operation from the current processor
+partitioning, which could be uniform or the result of a previous
+balance command.
+
+The {dimstr} argument is a string of characters, each of which must be
+an "x" or "y" or "z". The characters can appear in any order, and can
+be repeated as many times as desired. These are all valid {dimstr}
+arguments: "x" or "xyzyx" or "yyyzzz".
+
+Balancing proceeds by adjusting the cutting planes in each of the
+dimensions listed in {dimstr}, one dimension at a time.
The entire
+sequence of dimensions is repeated {Nrepeat} times. For a single
+dimension, the balancing operation (described below) is iterated
+{Niter} times. After each dimension finishes, the imbalance factor is
+re-computed, and the balancing operation halts if the {thresh}
+criterion is met.
+
+The interplay between {Nrepeat}, {Niter}, and {dimstr} means that
+these commands do essentially the same thing, the only difference
+being how often the imbalance factor is computed and checked against
+the threshold:
+
+balance dynamic 5 10 x 1.2
+balance dynamic 1 10 xxxxx 1.2
+balance dynamic 50 1 x 1.2 :pre
+
+A rebalance operation in a single dimension is performed using an
+iterative "diffusive" load-balancing algorithm. One iteration (which
+is repeated {Niter} times) works as follows. Assume there are Px
+processors in the x dimension. This defines Px slices of the
+simulation box, each of which contains Py*Pz processors. The task is to
+adjust the position of the Px-1 cuts between slices, leaving the end
+cuts unchanged (left and right edges of the simulation box).
+
+The iteration begins by calculating the number of atoms within each of
+the Px slices. Then for each slice, its atom count is compared to its
+neighbors. If a slice has more atoms than its left (or right)
+neighbor, the cut is moved towards the center of the slice,
+effectively shrinking the width of the slice and migrating atoms to
+the other slice. The distance to move the cut is a function of the
+"density" of atoms in the donor slice and the difference in counts
+between the 2 slices. A damping factor is also applied to avoid
+oscillations in the position of the cutting plane as iterations
+proceed. Hence the "diffusive" nature of the algorithm, as work
+(atoms) effectively diffuses from highly loaded processors to
+less-loaded processors.
+
+:line
+
+[Restrictions:]
+
+The {dynamic} keyword cannot be used with the {x}, {y}, or {z}
+arguments.
+
+For 2d simulations, the {z} keyword cannot be used. Nor can a "z"
+appear in {dimstr} for the {dynamic} keyword.
+
+[Related commands:]
+
+"processors"_processors.html, "fix balance"_fix_balance.html
+
+[Default:] none
+
+:line
+
+add a citation and image