diff --git a/doc/JPG/balance.jpg b/doc/JPG/balance.jpg new file mode 100644 index 0000000000..ea8bdf1466 Binary files /dev/null and b/doc/JPG/balance.jpg differ diff --git a/doc/balance.html b/doc/balance.html new file mode 100644 index 0000000000..02336da31e --- /dev/null +++ b/doc/balance.html @@ -0,0 +1,206 @@ + +
NOTE: the fix balance command referred to here for dynamic load +balancing has not yet been released. +
+Syntax: +
+balance group-ID keyword args ... ++
group-ID = ID of group of atoms to balance across processors
+one or more keyword/arg pairs may be appended
+keyword = x or y or z or dynamic
+  x args = uniform or Px-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in x dimension
+    numbers = Px-1 ascending values between 0 and 1, Px = # of processors in x dimension
+  y args = uniform or Py-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in y dimension
+    numbers = Py-1 ascending values between 0 and 1, Py = # of processors in y dimension
+  z args = uniform or Pz-1 numbers between 0 and 1
+    uniform = evenly spaced cuts between processors in z dimension
+    numbers = Pz-1 ascending values between 0 and 1, Pz = # of processors in z dimension
+  dynamic args = Nrepeat Niter dimstr thresh
+    Nrepeat = # of times to repeat dimstr sequence
+    Niter = # of times to iterate within each dimension of dimstr sequence
+    dimstr = sequence of letters containing "x" or "y" or "z"
+    thresh = stop balancing when this imbalance threshold is reached
+
Examples: +
+balance x uniform y 0.4 0.5 0.6 +balance dynamic 1 5 xzx 1.1 +balance dynamic 5 10 x 1.0 ++
Description: +
+This command adjusts the size of processor sub-domains within the
+simulation box, to attempt to balance the number of particles and thus
+the computational cost (load) evenly across processors. The load
+balancing is "static" in the sense that this command performs the
+balancing once, before or between simulations. The processor
+sub-domains will then remain static during the subsequent run. To
+perform "dynamic" balancing, see the fix balance
+command, which can adjust processor sub-domain sizes on-the-fly during
+a run.
+Load-balancing is only useful if the particles in the simulation box
+have a spatially-varying density distribution, e.g. a model of a
+vapor/liquid interface, or a solid with an irregular-shaped geometry
+containing void regions. In this case, the LAMMPS default of dividing
+the simulation box volume into a regular-spaced grid of processor
+sub-domains, with one equal-volume sub-domain per processor, may assign
+very different numbers of particles per processor. This can lead to
+poor performance in a scalability sense, when the simulation is run in
+parallel.
+Note that the processors command gives you some +control over how the box volume is split across processors. +Specifically, for a Px by Py by Pz grid of processors, it lets you +choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz = +P, the total number of processors. This can be sufficient to achieve +good load-balance for some models on some processor counts. However, +all the processor sub-domains will still be the same shape and have +the same volume. +
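+For example, the constraint can be enumerated directly. A minimal
+Python sketch (illustrative only; P = 16 is an assumed processor
+count, and LAMMPS makes this choice itself):
+
+P = 16   # assumed total processor count
+grids = [(px, py, P // (px * py))
+         for px in range(1, P + 1) if P % px == 0
+         for py in range(1, P // px + 1) if (P // px) % py == 0]
+# grids holds every (Px, Py, Pz) with Px * Py * Pz == 16,
+# e.g. (1, 1, 16), (2, 2, 4), (4, 4, 1), (16, 1, 1)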
+This command does not alter the topology of the Px by Py by Pz grid of
+processors. But it shifts the cutting planes between processors (in
+3d, or lines in 2d), which adjusts the volume (area in 2d) assigned to
+each processor, as in the following 2d diagram. The left diagram is
+the default partitioning of the simulation box across processors (one
+sub-box for each of 16 processors); the right diagram is after
+balancing.
+<CENTER><IMG SRC = "JPG/balance.jpg"></CENTER>
+When the balance command completes, it prints out the change in +"imbalance factor". The imbalance factor is defined as the maximum +number of particles owned by any processor, divided by the average +number of particles per processor. Thus an imbalance factor of 1.0 is +perfect balance. For 10000 particles running on 10 processors, if the +most heavily loaded processor has 1200 particles, then the factor is +1.2, meaning there is a 20% imbalance. The change in the maximum +number of particles (on any processor) is also printed. +
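+That arithmetic, as a minimal Python sketch (the per-processor counts
+below are hypothetical values, chosen to total 10000 particles with a
+maximum of 1200 on one processor):
+
+counts = [1200, 1100, 1050, 1000, 1000, 975, 950, 925, 900, 900]
+average = sum(counts) / len(counts)   # 10000 / 10 = 1000
+factor = max(counts) / average        # 1200 / 1000 = 1.2, i.e. 20% imbalance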
+IMPORTANT NOTE: This command attempts to minimize the imbalance
+factor, as defined above. But because of the topology constraint that
+only the cutting planes (lines) between processors are moved, there
+are many irregular distributions of particles for which this factor
+cannot be shrunk to 1.0, particularly in 3d. Also, computational cost
+is not strictly proportional to particle count, and changing the
+relative size and shape of processor sub-domains may lead to
+additional computational and communication overheads, e.g. in the PPPM
+solver used via the kspace_style command. Thus
+you should benchmark the run times of your simulation before and after
+balancing.
+The x, y, and z keywords adjust the position of cutting planes +between processor sub-domains in a specific dimension. The uniform +argument spaces the planes evenly, as in the left diagram above. The +numeric argument requires you to list Ps-1 numbers that specify the +position of the cutting planes. This requires that you know Ps = Px +or Py or Pz = the number of processors assigned by LAMMPS to the +relevant dimension. This assignment is made (and the Px, Py, Pz +values printed out) when the simulation box is created by the +"create_box" or "read_data" or "read_restart" command and is +influenced by the settings of the "processors" command. +
+Each of the numeric values must be between 0 and 1, and they must be
+listed in ascending order. They represent the fractional position of
+the cutting plane. The left (or lower) edge of the box is 0.0, and
+the right (or upper) edge is 1.0. Neither of these values is
+specified. Only the interior Ps-1 positions are specified. Thus if
+there are 2 processors in the x dimension, you specify a single value
+such as 0.75, which would make the left processor's sub-domain 3x
+larger than the right processor's sub-domain.
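+As an illustration of that 2-processor case, a short Python sketch
+(the box length is an assumed value; LAMMPS works with the fractional
+positions directly):
+
+cuts = [0.0, 0.75, 1.0]   # fixed box edges plus the one interior cut
+box_length = 40.0         # assumed box length in the x dimension
+widths = [(cuts[i + 1] - cuts[i]) * box_length
+          for i in range(len(cuts) - 1)]
+# widths == [30.0, 10.0]: the left sub-domain is 3x larger than the right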
+The dynamic keyword changes the cutting planes between processors in +an iterative fashion, seeking to reduce the imbalance factor, similar +to how the fix balance command operates. Note that +this keyword begins its operation from the current processor +partitioning, which could be uniform or the result of a previous +balance command. +
+The dimstr argument is a string of characters, each of which must be +an "x" or "y" or "z". The characters can appear in any order, and can +be repeated as many times as desired. These are all valid dimstr +arguments: "x" or "xyzyx" or "yyyzzz". +
+Balancing proceeds by adjusting the cutting planes in each of the
+dimensions listed in dimstr, one dimension at a time. The entire
+sequence of dimensions is repeated Nrepeat times. For a single
+dimension, the balancing operation (described below) is iterated
+Niter times. After each dimension finishes, the imbalance factor is
+re-computed, and the balancing operation halts if the thresh
+criterion is met.
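+The control flow this implies can be sketched in Python (a simplified
+illustration of the documented loop structure, not the LAMMPS source;
+diffusive_pass and imbalance_factor are hypothetical caller-supplied
+functions):
+
+def balance_dynamic(nrepeat, niter, dimstr, thresh,
+                    diffusive_pass, imbalance_factor):
+    # dimstr is e.g. "xzx"; diffusive_pass(dim) adjusts the cuts once
+    # in dimension dim; imbalance_factor() returns the current factor
+    for _ in range(nrepeat):
+        for dim in dimstr:
+            for _ in range(niter):
+                diffusive_pass(dim)
+            # the factor is re-checked after each dimension finishes
+            if imbalance_factor() <= thresh:
+                return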
+The interplay between Nrepeat, Niter, and dimstr means that
+these commands do essentially the same thing, the only difference
+being how often the imbalance factor is computed and checked against
+the threshold:
+balance dynamic 5 10 x 1.2
+balance dynamic 1 10 xxxxx 1.2
+balance dynamic 50 1 x 1.2
+
A rebalance operation in a single dimension is performed using an
+iterative "diffusive" load-balancing algorithm. One iteration (which
+is repeated Niter times) works as follows. Assume there are Px
+processors in the x dimension. This defines Px slices of the
+simulation box, each of which contains Py*Pz processors. The task is to
+adjust the position of the Px-1 cuts between slices, leaving the end
+cuts unchanged (left and right edges of the simulation box).
+The iteration begins by calculating the number of atoms within each of
+the Px slices. Then for each slice, its atom count is compared to its
+neighbors. If a slice has more atoms than its left (or right)
+neighbor, the cut is moved towards the center of the slice,
+effectively shrinking the width of the slice and migrating atoms to
+the other slice. The distance to move the cut is a function of the
+"density" of atoms in the donor slice and the difference in counts
+between the 2 slices. A damping factor is also applied to avoid
+oscillations in the position of the cutting plane as iterations
+proceed. Hence the "diffusive" nature of the algorithm, as work
+(atoms) effectively diffuses from highly loaded processors to
+less-loaded processors.
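+One such pass can be sketched in Python as follows (a simplified
+illustration under assumed move and damping rules, not the actual
+LAMMPS implementation):
+
+def diffuse_cuts(cuts, counts, damping=0.5):
+    # cuts   = fractional cut positions, including the fixed ends 0.0, 1.0
+    # counts = atom count in each of the len(cuts) - 1 slices
+    new_cuts = list(cuts)
+    for i in range(1, len(cuts) - 1):          # interior cuts only
+        left, right = counts[i - 1], counts[i]
+        if left == right:
+            continue
+        donor = i - 1 if left > right else i   # more heavily loaded slice
+        density = counts[donor] / (cuts[donor + 1] - cuts[donor])
+        # move the cut into the donor slice far enough to transfer a
+        # damped half of the count difference, shrinking the donor slice
+        shift = damping * (abs(left - right) / 2.0) / density
+        new_cuts[i] += -shift if left > right else shift
+    return new_cuts
+
+Repeated passes let atoms "diffuse" from heavily to lightly loaded
+slices, while the damping keeps the cut positions from overshooting
+and oscillating.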
+Restrictions: +
+The dynamic keyword cannot be used with the x, y, or z +arguments. +
+For 2d simulations, the z keyword cannot be used. Nor can a "z" +appear in dimstr for the dynamic keyword. +
+Related commands: +
+processors, fix balance
+
+Default: none
+add a citation and image +
+
diff --git a/doc/balance.txt b/doc/balance.txt new file mode 100644 index 0000000000..84c598d21c --- /dev/null +++ b/doc/balance.txt @@ -0,0 +1,197 @@
+"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+balance command :h3
+
+NOTE: the fix balance command referred to here for dynamic load
+balancing has not yet been released.
+
+[Syntax:]
+
+balance group-ID keyword args ... :pre
+
+group-ID = ID of group of atoms to balance across processors :ulb,l
+one or more keyword/arg pairs may be appended :l
+keyword = {x} or {y} or {z} or {dynamic} :l
+  {x} args = {uniform} or Px-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in x dimension
+    numbers = Px-1 ascending values between 0 and 1, Px = # of processors in x dimension
+  {y} args = {uniform} or Py-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in y dimension
+    numbers = Py-1 ascending values between 0 and 1, Py = # of processors in y dimension
+  {z} args = {uniform} or Pz-1 numbers between 0 and 1
+    {uniform} = evenly spaced cuts between processors in z dimension
+    numbers = Pz-1 ascending values between 0 and 1, Pz = # of processors in z dimension
+  {dynamic} args = Nrepeat Niter dimstr thresh
+    Nrepeat = # of times to repeat dimstr sequence
+    Niter = # of times to iterate within each dimension of dimstr sequence
+    dimstr = sequence of letters containing "x" or "y" or "z"
+    thresh = stop balancing when this imbalance threshold is reached :pre
+:ule
+
+[Examples:]
+
+balance x uniform y 0.4 0.5 0.6
+balance dynamic 1 5 xzx 1.1
+balance dynamic 5 10 x 1.0 :pre
+
+[Description:]
+
+This command adjusts the size of processor sub-domains within the
+simulation box, to attempt to balance the number of particles and thus
+the computational cost (load) evenly across processors. The load
+balancing is "static" in the sense that this command performs the
+balancing once, before or between simulations. The processor
+sub-domains will then remain static during the subsequent run. To
+perform "dynamic" balancing, see the "fix balance"_fix_balance.html
+command, which can adjust processor sub-domain sizes on-the-fly during
+a "run"_run.html.
+
+Load-balancing is only useful if the particles in the simulation box
+have a spatially-varying density distribution, e.g. a model of a
+vapor/liquid interface, or a solid with an irregular-shaped geometry
+containing void regions. In this case, the LAMMPS default of dividing
+the simulation box volume into a regular-spaced grid of processor
+sub-domains, with one equal-volume sub-domain per processor, may assign
+very different numbers of particles per processor. This can lead to
+poor performance in a scalability sense, when the simulation is run in
+parallel.
+
+Note that the "processors"_processors.html command gives you some
+control over how the box volume is split across processors.
+Specifically, for a Px by Py by Pz grid of processors, it lets you
+choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz =
+P, the total number of processors. This can be sufficient to achieve
+good load-balance for some models on some processor counts. However,
+all the processor sub-domains will still be the same shape and have
+the same volume.
+
+This command does not alter the topology of the Px by Py by Pz grid of
+processors.
But it shifts the cutting planes between processors (in
+3d, or lines in 2d), which adjusts the volume (area in 2d) assigned to
+each processor, as in the following 2d diagram. The left diagram is
+the default partitioning of the simulation box across processors (one
+sub-box for each of 16 processors); the right diagram is after
+balancing.
+
+:c,image(JPG/balance.jpg)
+
+When the balance command completes, it prints out the change in
+"imbalance factor". The imbalance factor is defined as the maximum
+number of particles owned by any processor, divided by the average
+number of particles per processor. Thus an imbalance factor of 1.0 is
+perfect balance. For 10000 particles running on 10 processors, if the
+most heavily loaded processor has 1200 particles, then the factor is
+1.2, meaning there is a 20% imbalance. The change in the maximum
+number of particles (on any processor) is also printed.
+
+IMPORTANT NOTE: This command attempts to minimize the imbalance
+factor, as defined above. But because of the topology constraint that
+only the cutting planes (lines) between processors are moved, there
+are many irregular distributions of particles for which this factor
+cannot be shrunk to 1.0, particularly in 3d. Also, computational cost
+is not strictly proportional to particle count, and changing the
+relative size and shape of processor sub-domains may lead to
+additional computational and communication overheads, e.g. in the PPPM
+solver used via the "kspace_style"_kspace_style.html command. Thus
+you should benchmark the run times of your simulation before and after
+balancing.
+
+:line
+
+The {x}, {y}, and {z} keywords adjust the position of cutting planes
+between processor sub-domains in a specific dimension. The {uniform}
+argument spaces the planes evenly, as in the left diagram above. The
+{numeric} argument requires you to list Ps-1 numbers that specify the
+position of the cutting planes. This requires that you know Ps = Px
+or Py or Pz = the number of processors assigned by LAMMPS to the
+relevant dimension. This assignment is made (and the Px, Py, Pz
+values printed out) when the simulation box is created by the
+"create_box" or "read_data" or "read_restart" command and is
+influenced by the settings of the "processors" command.
+
+Each of the numeric values must be between 0 and 1, and they must be
+listed in ascending order. They represent the fractional position of
+the cutting plane. The left (or lower) edge of the box is 0.0, and
+the right (or upper) edge is 1.0. Neither of these values is
+specified. Only the interior Ps-1 positions are specified. Thus if
+there are 2 processors in the x dimension, you specify a single value
+such as 0.75, which would make the left processor's sub-domain 3x
+larger than the right processor's sub-domain.
+
+:line
+
+The {dynamic} keyword changes the cutting planes between processors in
+an iterative fashion, seeking to reduce the imbalance factor, similar
+to how the "fix balance"_fix_balance.html command operates. Note that
+this keyword begins its operation from the current processor
+partitioning, which could be uniform or the result of a previous
+balance command.
+
+The {dimstr} argument is a string of characters, each of which must be
+an "x" or "y" or "z". The characters can appear in any order, and can
+be repeated as many times as desired. These are all valid {dimstr}
+arguments: "x" or "xyzyx" or "yyyzzz".
+
+Balancing proceeds by adjusting the cutting planes in each of the
+dimensions listed in {dimstr}, one dimension at a time.
The entire
+sequence of dimensions is repeated {Nrepeat} times. For a single
+dimension, the balancing operation (described below) is iterated
+{Niter} times. After each dimension finishes, the imbalance factor is
+re-computed, and the balancing operation halts if the {thresh}
+criterion is met.
+
+The interplay between {Nrepeat}, {Niter}, and {dimstr} means that
+these commands do essentially the same thing, the only difference
+being how often the imbalance factor is computed and checked against
+the threshold:
+
+balance dynamic 5 10 x 1.2
+balance dynamic 1 10 xxxxx 1.2
+balance dynamic 50 1 x 1.2 :pre
+
+A rebalance operation in a single dimension is performed using an
+iterative "diffusive" load-balancing algorithm. One iteration (which
+is repeated {Niter} times) works as follows. Assume there are Px
+processors in the x dimension. This defines Px slices of the
+simulation box, each of which contains Py*Pz processors. The task is to
+adjust the position of the Px-1 cuts between slices, leaving the end
+cuts unchanged (left and right edges of the simulation box).
+
+The iteration begins by calculating the number of atoms within each of
+the Px slices. Then for each slice, its atom count is compared to its
+neighbors. If a slice has more atoms than its left (or right)
+neighbor, the cut is moved towards the center of the slice,
+effectively shrinking the width of the slice and migrating atoms to
+the other slice. The distance to move the cut is a function of the
+"density" of atoms in the donor slice and the difference in counts
+between the 2 slices. A damping factor is also applied to avoid
+oscillations in the position of the cutting plane as iterations
+proceed. Hence the "diffusive" nature of the algorithm, as work
+(atoms) effectively diffuses from highly loaded processors to
+less-loaded processors.
+
+:line
+
+[Restrictions:]
+
+The {dynamic} keyword cannot be used with the {x}, {y}, or {z}
+arguments.
+
+For 2d simulations, the {z} keyword cannot be used. Nor can a "z"
+appear in {dimstr} for the {dynamic} keyword.
+
+[Related commands:]
+
+"processors"_processors.html, "fix balance"_fix_balance.html
+
+[Default:] none
+
+:line
+
+add a citation and image