git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@6380 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2011-06-13 23:18:49 +00:00
parent c7e91e5418
commit ec3f68bed2
12 changed files with 698 additions and 33 deletions
--- a/doc/Section_accelerate.html
+++ b/doc/Section_accelerate.html
@ -106,6 +106,8 @@ to 20% savings.

 <H4><A NAME = "10_2"></A>10.2 GPU package 
 </H4>
+<P>The GPU package was developed by Mike Brown at ORNL.
+</P>
 <P>Additional requirements in your input script to run the styles with a
 <I>gpu</I> suffix are as follows:
 </P>
@ -113,8 +115,6 @@ to 20% savings.
 gpu</A> command must be used.  The fix controls the GPU
 selection and initialization steps.
 </P>
-<P>The GPU package was developed by Mike Brown at ORNL.
-</P>
 <P>A few LAMMPS <A HREF = "pair_style.html">pair styles</A> can be run on graphical
 processing units (GPUs).  We plan to add more over time.  Currently,
 they only support NVIDIA GPU cards.  To use them you need to install
@ -130,7 +130,7 @@ certain NVIDIA CUDA software on your system:
 <H4>GPU configuration 
 </H4>
 <P>When using GPUs, you are restricted to one physical GPU per LAMMPS
-process. Multiple processes can share a single GPU and in many cases
+process.  Multiple processes can share a single GPU and in many cases
 it will be more efficient to run with multiple processes per GPU. Any
 GPU accelerated style requires that <A HREF = "fix_gpu.html">fix gpu</A> be used in
 the input script to select and initialize the GPUs. The format for the
@ -252,8 +252,195 @@ latter requires that your GPU card supports double precision.
 <P>The USER-CUDA package was developed by Christian Trott at U Technology
 Ilmenau in Germany.
 </P>
+<P>This package is only useful if you have a CUDA(tm)-enabled NVIDIA(tm)
+graphics card.  Your GPU needs to support Compute Capability 1.3 or
+higher.  This list may help you find out the Compute Capability of
+your card:
+</P>
+<P>http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
+</P>
+<P>Install the Nvidia Cuda Toolkit in version 3.2 or higher and the
+corresponding GPU drivers. The Nvidia Cuda SDK is not required for
+LAMMPSCUDA, but we recommend installing it and making sure that the
+sample projects can be compiled without problems.
+</P>
+<P>You should also be able to compile LAMMPS by typing
+</P>
+<P><I>make YourMachine</I>
+</P>
+<P>inside the src directory of the LAMMPS root path. If not, you should
+consult the LAMMPS documentation.
+</P>
+<H4>Compilation 
+</H4>
+<P>If your <I>CUDA</I> toolkit is not installed in the default directory
+<I>/usr/local/cuda</I>, edit the file <I>lib/cuda/Makefile.common</I>
+accordingly.
+</P>
+<P>Go to  <I>lib/cuda/</I> and type 
+</P>
+<P><I>make OPTIONS</I>
+</P>
+<P>where <I>OPTIONS</I> are one or more of the following:
+</P>
+<UL><LI><I>precision = 2</I> set precision level: 1 .. single precision, 2
+.. double precision, 3 .. positions in double precision, 4
+.. positions and velocities in double precision 
+
+<LI><I>arch = 20</I> set GPU compute capability: 20 .. CC2.0 (GF100/110
+e.g. C2050,GTX580,GTX470), 21 .. CC2.1 (GF104/114 e.g. GTX560, GTX460,
+GTX450), 13 .. CC1.3 (GT200 e.g. C1060, GTX285) 
+
+<LI><I>prec_timer = 1</I> do not use precision timers if set to 0. This is
+usually only useful for compiling on Mac machines. 
+
+<LI><I>dbg = 0</I> activate debug mode when set to 1. Only useful for
+developers. 
+
+<LI><I>cufft = 1</I> set CUDA FFT library. Currently the only use of this
+option is to compile without cufft support (set to 0). In the future
+other CUDA enabled FFT libraries might be supported. 
+</UL>
+<P>The settings are written to <I>lib/cuda/Makefile.defaults</I>. When you
+then compile with a plain <I>make</I>, only those settings are used.
+</P>
+<P>Go to <I>src</I>, install the USER-CUDA package with <I>make yes-USER-CUDA</I>
+and compile the binary with <I>make YourMachine</I>. You might need to
+delete old object files if you compiled without the USER-CUDA package
+before, using the same Machine file (<I>rm Obj_YourMachine/*</I>).
+</P>
+<P>CUDA versions of classes are only installed if the corresponding CPU
+versions are installed as well. E.g. you need to install the KSPACE
+package to use <I>pppm/cuda</I>.
+</P>
+<H4>Usage 
+</H4>
+<P>In order to make use of the GPU acceleration provided by the USER-CUDA
+package, you only have to add
+</P>
+<P><I>accelerator cuda</I>
+</P>
+<P>at the top of your input script. See the <A HREF = "accelerator.html">accelerator</A> command for details of additional options.
+</P>
+<P>When compiling with USER-CUDA support the <A HREF = "Section_start.html#2_6">-accelerator command-line
+switch</A> is effectively set to "cuda" by default
+and does not have to be given.
+</P>
+<P>If you want to run simulations without using the "cuda" styles with
+the same binary, you need to turn it off explicitly by giving "-a
+none", "-a opt" or "-a gpu" as a command-line argument.
+</P>
+<P>The kspace style <I>pppm/cuda</I> has to be requested explicitly.
+</P>
 <HR>

 <H4><A NAME = "10_4"></A>10.4 Comparison of GPU and USER-CUDA packages 
 </H4>
+<P>The USER-CUDA package is an alternative package for GPU acceleration 
+that runs as much of the simulation as possible on the GPU. Depending on 
+the simulation, this can provide a significant speedup when the number 
+of atoms per GPU is large.
+</P>
+<P>The styles available for GPU acceleration 
+will be different in each package.
+</P>
+<P>The main difference between the "GPU" and the "USER-CUDA" package is
+that the latter aims at calculating everything on the device, while
+the GPU package uses the device as an accelerator for the pair force,
+neighbor list and pppm calculations only. As a consequence, either
+package can be faster depending on the scenario. Generally the GPU
+package is faster than the USER-CUDA package if the number of atoms
+per device is small. The GPU package also profits from
+oversubscribing devices, so one usually wants to launch two (or more)
+MPI processes per device.
+</P>
+<P>The exact crossover where the USER-CUDA package becomes faster depends
+strongly on the pair style. For example, for a simple Lennard-Jones
+system the crossover (in single precision) can often be found between
+50,000 and 100,000 atoms per device. When performing double precision
+calculations this threshold can be significantly smaller. As a result
+the GPU package can show better "strong scaling" behaviour in
+comparison with the USER-CUDA package as long as this limit of atoms
+per GPU is not reached.
+</P>
+<P>Another scenario where the GPU package can be faster is when a lot of
+bonded interactions are calculated. Both packages handle those on the
+host while the device simultaneously calculates the pair forces.
+Since, when using the GPU package, one launches several
+MPI processes per device, this work is spread over more CPU cores as
+compared to running the same simulation with the USER-CUDA package.
+</P>
+<P>As a side note: the GPU package performance depends to some extent on
+optimal bandwidth between host and device. Hence its performance is
+reduced if fewer than the full 16 PCIe lanes are available for each
+device. In HPC environments this can be the case if S2050/70 servers
+are used, where two devices generally share one PCIe 2.0 16x slot.
+Also, many multi-GPU mainboards do not provide the full 16 lanes to
+each of the PCIe 2.0 16x slots.
+</P>
+<P>While the GPU package uses considerably more device memory than the
+USER-CUDA package, this is generally not much of a problem. Typically,
+run times become longer than desired before the memory is exhausted.
+</P>
+<P>Currently the USER-CUDA package supports a wider range of
+force-fields. On the other hand its performance is considerably
+reduced if one has to use a fix at every timestep, which is not yet
+available as a "CUDA"-accelerated version.
+</P>
+<P>In the end, for each simulation it is best to just try both packages
+and see which one performs better in the particular situation.
+</P>
+<H4>Benchmark 
+</H4>
+<P>The following 4 benchmark systems are supported by both the GPU and
+the USER-CUDA package:
+</P>
+<P>1. Lennard Jones, 2.5A
+256,000 atoms
+2.5 A cutoff
+0.844 density
+</P>
+<P>2. Lennard Jones, 5.0A
+256,000 atoms
+5.0 A cutoff
+0.844 density
+</P>
+<P>3. Rhodopsin model
+256,000 atoms
+10A cutoff
+Coulomb via PPPM
+</P>
+<P>4. Lithium-Phosphate
+295,650 atoms
+15A cutoff
+Coulomb via PPPM
+</P>
+<P>Hardware:
+Workstation:
+2x GTX470
+i7 950@3GHz
+24GB DDR3 @ 1066MHz
+CentOS 5.5
+CUDA 3.2
+Driver 260.19.12
+</P>
+<P>eStella:
+6 Nodes
+2xC2050
+2xQDR InfiniBand interconnect (aggregate bandwidth 80 Gbit/s)
+Intel X5650 HexCore @ 2.67GHz
+SL 5.5
+CUDA 3.2
+Driver 260.19.26 
+</P>
+<P>Keeneland:
+HP SL-390 (Ariston) cluster
+120 nodes
+2x Intel Westmere hex-core CPUs
+3xC2070s
+QDR InfiniBand interconnect
+</P>
 </HTML>
--- a/doc/Section_accelerate.txt
+++ b/doc/Section_accelerate.txt
@ -102,6 +102,8 @@ to 20% savings.

 10.2 GPU package :h4,link(10_2)

+The GPU package was developed by Mike Brown at ORNL.
+
 Additional requirements in your input script to run the styles with a
 {gpu} suffix are as follows:

@ -109,10 +111,6 @@ The "newton pair"_newton.html setting must be {off} and the "fix
 gpu"_fix_gpu.html command must be used.  The fix controls the GPU
 selection and initialization steps.

-
-
-The GPU package was developed by Mike Brown at ORNL.
-
 A few LAMMPS "pair styles"_pair_style.html can be run on graphical
 processing units (GPUs).  We plan to add more over time.  Currently,
 they only support NVIDIA GPU cards.  To use them you need to install
@ -128,7 +126,7 @@ properties :ul
 GPU configuration :h4

 When using GPUs, you are restricted to one physical GPU per LAMMPS
-process. Multiple processes can share a single GPU and in many cases
+process.  Multiple processes can share a single GPU and in many cases
 it will be more efficient to run with multiple processes per GPU. Any
 GPU accelerated style requires that "fix gpu"_fix_gpu.html be used in
 the input script to select and initialize the GPUs. The format for the
@ -250,6 +248,195 @@ latter requires that your GPU card supports double precision.
 The USER-CUDA package was developed by Christian Trott at U Technology
 Ilmenau in Germany.

+This package is only useful if you have a CUDA(tm)-enabled NVIDIA(tm)
+graphics card.  Your GPU needs to support Compute Capability 1.3 or
+higher.  This list may help you find out the Compute Capability of
+your card:
+
+http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
+
+Install the Nvidia Cuda Toolkit in version 3.2 or higher and the
+corresponding GPU drivers. The Nvidia Cuda SDK is not required for
+LAMMPSCUDA, but we recommend installing it and making sure that the
+sample projects can be compiled without problems.
+
+You should also be able to compile LAMMPS by typing
+
+{make YourMachine}
+
+inside the src directory of the LAMMPS root path. If not, you should
+consult the LAMMPS documentation.
+
+Compilation :h4
+
+If your {CUDA} toolkit is not installed in the default directory
+{/usr/local/cuda}, edit the file {lib/cuda/Makefile.common}
+accordingly.
+
+Go to  {lib/cuda/} and type 
+
+{make OPTIONS}
+
+where {OPTIONS} are one or more of the following:
+
+{precision = 2} set precision level: 1 .. single precision, 2
+.. double precision, 3 .. positions in double precision, 4
+.. positions and velocities in double precision :ulb,l
+
+{arch = 20} set GPU compute capability: 20 .. CC2.0 (GF100/110
+e.g. C2050,GTX580,GTX470), 21 .. CC2.1 (GF104/114 e.g. GTX560, GTX460,
+GTX450), 13 .. CC1.3 (GT200 e.g. C1060, GTX285) :l
+
+{prec_timer = 1} do not use precision timers if set to 0. This is
+usually only useful for compiling on Mac machines. :l
+
+{dbg = 0} activate debug mode when set to 1. Only useful for
+developers. :l
+
+{cufft = 1} set CUDA FFT library. Currently the only use of this
+option is to compile without cufft support (set to 0). In the future
+other CUDA enabled FFT libraries might be supported. :l,ule
+
+The settings are written to {lib/cuda/Makefile.defaults}. When you
+then compile with a plain {make}, only those settings are used.
+
+Go to {src}, install the USER-CUDA package with {make yes-USER-CUDA}
+and compile the binary with {make YourMachine}. You might need to
+delete old object files if you compiled without the USER-CUDA package
+before, using the same Machine file ({rm Obj_YourMachine/*}).
+
+CUDA versions of classes are only installed if the corresponding CPU
+versions are installed as well. E.g. you need to install the KSPACE
+package to use {pppm/cuda}.
+
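+As a sketch of the steps above (assuming double precision on a CC2.0
+card, that options are passed to make as name=value pairs without
+spaces, and with {YourMachine} standing in for your own machine
+Makefile), a complete build might look like this:
+
+# build the CUDA library; settings are saved in Makefile.defaults
+cd lib/cuda
+make precision=2 arch=20
+# install packages and build LAMMPS; KSPACE is only needed for pppm/cuda
+cd ../../src
+make yes-kspace
+make yes-USER-CUDA
+make YourMachine :pre
+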
+Usage :h4
+
+In order to make use of the GPU acceleration provided by the USER-CUDA
+package, you only have to add
+
+{accelerator cuda}
+
+at the top of your input script. See the "accelerator"_accelerator.html command for details of additional options.
+
+When compiling with USER-CUDA support the "-accelerator command-line
+switch"_Section_start.html#2_6 is effectively set to "cuda" by default
+and does not have to be given.
+
+If you want to run simulations without using the "cuda" styles with
+the same binary, you need to turn it off explicitly by giving "-a
+none", "-a opt" or "-a gpu" as a command-line argument.
+
+The kspace style {pppm/cuda} has to be requested explicitly.
+
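+As an illustration of the command-line switch described above (the
+executable name {lmp_YourMachine} and the input file {in.script} are
+placeholders, not names defined by the package), the same binary
+could be run with and without the "cuda" styles:
+
+# USER-CUDA build: "cuda" styles are used by default
+lmp_YourMachine < in.script
+# same binary with the "cuda" styles turned off
+lmp_YourMachine -a none < in.script :pre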
+
 :line

 10.4 Comparison of GPU and USER-CUDA packages :h4,link(10_4)
+
+The USER-CUDA package is an alternative package for GPU acceleration 
+that runs as much of the simulation as possible on the GPU. Depending on 
+the simulation, this can provide a significant speedup when the number 
+of atoms per GPU is large.
+
+The styles available for GPU acceleration 
+will be different in each package.
+
+The main difference between the "GPU" and the "USER-CUDA" package is
+that the latter aims at calculating everything on the device, while
+the GPU package uses the device as an accelerator for the pair force,
+neighbor list and pppm calculations only. As a consequence, either
+package can be faster depending on the scenario. Generally the GPU
+package is faster than the USER-CUDA package if the number of atoms
+per device is small. The GPU package also profits from
+oversubscribing devices, so one usually wants to launch two (or more)
+MPI processes per device.
+
+The exact crossover where the USER-CUDA package becomes faster depends
+strongly on the pair style. For example, for a simple Lennard-Jones
+system the crossover (in single precision) can often be found between
+50,000 and 100,000 atoms per device. When performing double precision
+calculations this threshold can be significantly smaller. As a result
+the GPU package can show better "strong scaling" behaviour in
+comparison with the USER-CUDA package as long as this limit of atoms
+per GPU is not reached.
+
+Another scenario where the GPU package can be faster is when a lot of
+bonded interactions are calculated. Both packages handle those on the
+host while the device simultaneously calculates the pair forces.
+Since, when using the GPU package, one launches several
+MPI processes per device, this work is spread over more CPU cores as
+compared to running the same simulation with the USER-CUDA package.
+
+As a side note: the GPU package performance depends to some extent on
+optimal bandwidth between host and device. Hence its performance is
+reduced if fewer than the full 16 PCIe lanes are available for each
+device. In HPC environments this can be the case if S2050/70 servers
+are used, where two devices generally share one PCIe 2.0 16x slot.
+Also, many multi-GPU mainboards do not provide the full 16 lanes to
+each of the PCIe 2.0 16x slots.
+
+While the GPU package uses considerably more device memory than the
+USER-CUDA package, this is generally not much of a problem. Typically,
+run times become longer than desired before the memory is exhausted.
+
+Currently the USER-CUDA package supports a wider range of
+force-fields. On the other hand its performance is considerably
+reduced if one has to use a fix at every timestep, which is not yet
+available as a "CUDA"-accelerated version.
+
+In the end, for each simulation it is best to just try both packages
+and see which one performs better in the particular situation.
+
+Benchmark :h4
+
+The following 4 benchmark systems are supported by both the GPU and
+the USER-CUDA package:
+
+1. Lennard Jones, 2.5A
+256,000 atoms
+2.5 A cutoff
+0.844 density
+
+2. Lennard Jones, 5.0A
+256,000 atoms
+5.0 A cutoff
+0.844 density
+ 
+3. Rhodopsin model
+256,000 atoms
+10A cutoff
+Coulomb via PPPM
+
+4. Lithium-Phosphate
+295,650 atoms
+15A cutoff
+Coulomb via PPPM
+
+Hardware:
+Workstation:
+2x GTX470
+i7 950@3GHz
+24GB DDR3 @ 1066MHz
+CentOS 5.5
+CUDA 3.2
+Driver 260.19.12
+
+eStella:
+6 Nodes
+2xC2050
+2xQDR InfiniBand interconnect (aggregate bandwidth 80 Gbit/s)
+Intel X5650 HexCore @ 2.67GHz
+SL 5.5
+CUDA 3.2
+Driver 260.19.26 
+
+Keeneland:
+HP SL-390 (Ariston) cluster
+120 nodes
+2x Intel Westmere hex-core CPUs
+3xC2070s
+QDR InfiniBand interconnect
+
--- a/doc/Section_commands.html
+++ b/doc/Section_commands.html
@ -429,8 +429,9 @@ potentials.  Click on the style itself for a full description:
 <A HREF = "Section_start.html#2_3">LAMMPS is built with the appropriate package</A>.
 </P>
 <DIV ALIGN=center><TABLE  BORDER=1 >
-<TR ALIGN="center"><TD ><A HREF = "pair_buck_coul.html">buck/coul</A></TD><TD ><A HREF = "pair_cmm.html">cg/cmm</A></TD><TD ><A HREF = "pair_cmm.html">cg/cmm/coul/cut</A></TD><TD ><A HREF = "pair_cmm.html">cg/cmm/coul/long</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "pair_eam.html">eam/cd</A></TD><TD ><A HREF = "pair_eff.html">eff/cut</A></TD><TD ><A HREF = "pair_lj_coul.html">lj/coul</A></TD><TD ><A HREF = "pair_reax_c.html">reax/c</A> 
+<TR ALIGN="center"><TD ><A HREF = "pair_awpmd.html">awpmd/cut</A></TD><TD ><A HREF = "pair_buck_coul.html">buck/coul</A></TD><TD ><A HREF = "pair_cmm.html">cg/cmm</A></TD><TD ><A HREF = "pair_cmm.html">cg/cmm/coul/cut</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "pair_cmm.html">cg/cmm/coul/long</A></TD><TD ><A HREF = "pair_eam.html">eam/cd</A></TD><TD ><A HREF = "pair_eff.html">eff/cut</A></TD><TD ><A HREF = "pair_lj_coul.html">lj/coul</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "pair_reax_c.html">reax/c</A> 
 </TD></TR></TABLE></DIV>

 <P>These are accelerated pair styles, which can be used if LAMMPS is
--- a/doc/Section_commands.txt
+++ b/doc/Section_commands.txt
@ -657,6 +657,7 @@ potentials.  Click on the style itself for a full description:
 These are pair styles contributed by users, which can be used if
 "LAMMPS is built with the appropriate package"_Section_start.html#2_3.

+"awpmd/cut"_pair_awpmd.html,
 "buck/coul"_pair_buck_coul.html,
 "cg/cmm"_pair_cmm.html,
 "cg/cmm/coul/cut"_pair_cmm.html,
--- a/doc/atom_style.html
+++ b/doc/atom_style.html
@ -63,7 +63,8 @@ quantities.
 <TR><TD ><I>full</I> </TD><TD > molecular + charge </TD><TD > bio-molecules </TD></TR>
 <TR><TD ><I>molecular</I> </TD><TD > bonds, angles, dihedrals, impropers </TD><TD > uncharged molecules </TD></TR>
 <TR><TD ><I>peri</I> </TD><TD > mass, volume </TD><TD > mesocopic Peridynamic models </TD></TR>
-<TR><TD ><I>sphere</I> </TD><TD > diameter, mass, angular velocity </TD><TD > granular models 
+<TR><TD ><I>sphere</I> </TD><TD > diameter, mass, angular velocity </TD><TD > granular models </TD></TR>
+<TR><TD ><I>wavepacket</I> </TD><TD > charge, spin, eradius, etag, cs_re, cs_im </TD><TD > AWPMD 
 </TD></TR></TABLE></DIV>

 <P>All of the styles assign mass to particles on a per-type basis, using
@ -71,8 +72,8 @@ the <A HREF = "mass.html">mass</A> command, except for the finite-size particle
 styles discussed below.  They assign mass on a per-atom basis.
 </P>
 <P>All of the styles define point particles, except the <I>sphere</I>,
-<I>ellipsoid</I>, <I>electron</I>, and <I>peri</I> styles, which define finite-size
-particles.
+<I>ellipsoid</I>, <I>electron</I>, <I>peri</I>, and <I>wavepacket</I> styles, which define
+finite-size particles.
 </P>
 <P>For the <I>sphere</I> style, the particles are spheres and each stores a
 per-particle diameter and mass.  If the diameter > 0.0, the particle
@ -92,6 +93,12 @@ position, which is represented by the eradius = electron size.
 <P>For the <I>peri</I> style, the particles are spherical and each stores a
 per-particle mass and volume.
 </P>
+<P>The <I>wavepacket</I> style is similar to <I>electron</I>, but the electrons may
+consist of several Gaussian wave packets, summed up with coefficients
+cs = (cs_re, cs_im).  Each of the wave packets is treated as a separate
+particle in LAMMPS; wave packets belonging to the same electron must
+have identical <I>etag</I> values.
+</P>
 <HR>

 <P>Typically, simulations require only a single (non-hybrid) atom style.
@ -121,9 +128,11 @@ section</A>.
 package.  The <I>ellipsoid</I> style is part of the "asphere" package.  The
 <I>peri</I> style is part of the "peri" package for Peridynamics.  The
 <I>electron</I> style is part of the "user-eff" package for <A HREF = "pair_eff.html">electronic
-force fields</A>.  They are only enabled if LAMMPS was
-built with that package.  See the <A HREF = "Section_start.html#2_3">Making
-LAMMPS</A> section for more info.
+force fields</A>.  The <I>wavepacket</I> style is part of the
+"user-awpmd" package for the <A HREF = "pair_awpmd.html">antisymmetrized wave packet MD
+method</A>.  They are only enabled if LAMMPS was built
+with that package.  See the <A HREF = "Section_start.html#2_3">Making LAMMPS</A>
+section for more info.
 </P>
 <P><B>Related commands:</B>
 </P>
--- a/doc/atom_style.txt
+++ b/doc/atom_style.txt
@ -60,15 +60,16 @@ quantities.
 {full} | molecular + charge | bio-molecules |
 {molecular} | bonds, angles, dihedrals, impropers | uncharged molecules |
 {peri} | mass, volume | mesocopic Peridynamic models |
-{sphere} | diameter, mass, angular velocity | granular models :tb(c=3,s=|)
+{sphere} | diameter, mass, angular velocity | granular models |
+{wavepacket} | charge, spin, eradius, etag, cs_re, cs_im | AWPMD :tb(c=3,s=|)

 All of the styles assign mass to particles on a per-type basis, using
 the "mass"_mass.html command, except for the finite-size particle
 styles discussed below.  They assign mass on a per-atom basis.

 All of the styles define point particles, except the {sphere},
-{ellipsoid}, {electron}, and {peri} styles, which define finite-size
-particles.
+{ellipsoid}, {electron}, {peri}, and {wavepacket} styles, which define
+finite-size particles.

 For the {sphere} style, the particles are spheres and each stores a
 per-particle diameter and mass.  If the diameter > 0.0, the particle
@ -88,6 +89,12 @@ position, which is represented by the eradius = electron size.
 For the {peri} style, the particles are spherical and each stores a
 per-particle mass and volume.

+The {wavepacket} style is similar to {electron}, but the electrons may
+consist of several Gaussian wave packets, summed up with coefficients
+cs = (cs_re, cs_im).  Each of the wave packets is treated as a separate
+particle in LAMMPS; wave packets belonging to the same electron must
+have identical {etag} values.
+
 :line

 Typically, simulations require only a single (non-hybrid) atom style.
@ -117,9 +124,11 @@ The {angle}, {bond}, {full}, and {molecular} styles are part of the
 package.  The {ellipsoid} style is part of the "asphere" package.  The
 {peri} style is part of the "peri" package for Peridynamics.  The
 {electron} style is part of the "user-eff" package for "electronic
-force fields"_pair_eff.html.  They are only enabled if LAMMPS was
-built with that package.  See the "Making
-LAMMPS"_Section_start.html#2_3 section for more info.
+force fields"_pair_eff.html.  The {wavepacket} style is part of the
+"user-awpmd" package for the "antisymmetrized wave packet MD
+method"_pair_awpmd.html.  They are only enabled if LAMMPS was built
+with that package.  See the "Making LAMMPS"_Section_start.html#2_3
+section for more info.

 [Related commands:]

--- a/doc/pair_awpmd.html
+++ b/doc/pair_awpmd.html
@ -0,0 +1,130 @@
+<HTML>
+<CENTER><A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> - <A HREF = "Manual.html">LAMMPS Documentation</A> - <A HREF = "Section_commands.html#comm">LAMMPS Commands</A> 
+</CENTER>
+
+
+
+
+
+
+<HR>
+
+<H3>pair_style awpmd/cut command 
+</H3>
+<P><B>Syntax:</B>
+</P>
+<PRE>pair_style awpmd/cut Rc keyword value ... 
+</PRE>
+<UL><LI>Rc = global cutoff, -1 means cutoff of half the shortest box length 
+
+<LI>zero or more keyword/value pairs may be appended 
+
+<LI>keyword = <I>hartree</I> or <I>dproduct</I> or <I>uhf</I> or <I>free</I> or <I>pbc</I> or <I>fix</I> or <I>harm</I> or <I>ermscale</I> or <I>flex_press</I> 
+
+<PRE>  <I>hartree</I> value = none
+  <I>dproduct</I> value = none
+  <I>uhf</I> value = none
+  <I>free</I> value = none
+  <I>pbc</I> value = Plen
+    Plen = periodic width of electron = -1 or positive value (distance units)
+  <I>fix</I> value = Flen
+    Flen = fixed width of electron = -1 or positive value (distance units)
+  <I>harm</I> value = width
+    width = harmonic width constraint
+  <I>ermscale</I> value = factor
+    factor = scaling between electron mass and width variable mass
+  <I>flex_press</I> value = none 
+</PRE>
+
+</UL>
+<P><B>Examples:</B>
+</P>
+<PRE>pair_style awpmd/cut -1
+pair_style awpmd/cut 40.0 uhf free
+pair_coeff * *
+pair_coeff 2 2 20.0 
+</PRE>
+<P><B>Description:</B>
+</P>
+<P>This pair style contains an implementation of the Antisymmetrized Wave
+Packet Molecular Dynamics (AWPMD) method.  Need citation here.  Need
+basic formulas here.  Could be links to other documents.
+</P>
+<P>Rc is the cutoff.
+</P>
+<P>The pair_style command allows for several optional keywords
+to be specified.
+</P>
+<P>The <I>hartree</I>, <I>dproduct</I>, and <I>uhf</I> keywords specify the form of the
+initial trial wave function for the system.  If the <I>hartree</I> keyword
+is used, then a Hartree multielectron trial wave function is used.  If
+the <I>dproduct</I> keyword is used, then a trial function which is a
+product of two determinants for each spin type is used.  If the <I>uhf</I>
+keyword is used, then an unrestricted Hartree-Fock trial wave function
+is used.
+</P>
+<P>The <I>free</I>, <I>pbc</I>, and <I>fix</I> keywords specify a width constraint on
+the electron wavepackets.  If the <I>free</I> keyword is specified, then there is no
+constraint.  If the <I>pbc</I> keyword is used and <I>Plen</I> is specified as
+-1, then the maximum width is half the shortest box length.  If <I>Plen</I>
+is a positive value, then the value is the maximum width.  If the
+<I>fix</I> keyword is used and <I>Flen</I> is specified as -1, then electrons
+have a constant width that is read from the data file.  If <I>Flen</I> is a
+positive value, then the constant width for all electrons is set to
+<I>Flen</I>.
+</P>
+<P>The <I>harm</I> keyword allows oscillations in the width of the
+electron wavepackets.  More details are needed.
+</P>
+<P>The <I>ermscale</I> keyword specifies a unitless scaling factor
+between the electron masses and the width variable mass.  More
+details needed.
+</P>
+<P>If the <I>flex_press</I> keyword is used, then a contribution from the
+electrons is added to the total virial and pressure of the system.
+</P>
+<P>This potential is designed to be used with <A HREF = "atom_style.html">atom_style
+wavepacket</A> definitions, in order to handle the
+description of systems with interacting nuclei and explicit electrons.
+</P>
+<P>The following coefficients must be defined for each pair of atom
+types via the <A HREF = "pair_coeff.html">pair_coeff</A> command as in the examples
+above, or in the data file or restart files read by the
+<A HREF = "read_data.html">read_data</A> or <A HREF = "read_restart.html">read_restart</A>
+commands, or by mixing as described below:
+</P>
+<UL><LI>cutoff (distance units) 
+</UL>
+<P>For <I>awpmd/cut</I>, the cutoff coefficient is optional.  If it is not
+used (as in some of the examples above), the default global value
+specified in the pair_style command is used.
+</P>
+<HR>
+
+<P><B>Mixing, shift, table, tail correction, restart, rRESPA info</B>:
+</P>
+<P>The <A HREF = "pair_modify.html">pair_modify</A> mix, shift, table, and tail options
+are not relevant for this pair style.
+</P>
+<P>This pair style writes its information to <A HREF = "restart.html">binary restart
+files</A>, so pair_style and pair_coeff commands do not need
+to be specified in an input script that reads a restart file.
+</P>
+<P>This pair style can only be used via the <I>pair</I> keyword of the
+<A HREF = "run_style.html">run_style respa</A> command.  It does not support the
+<I>inner</I>, <I>middle</I>, <I>outer</I> keywords.
+</P>
+<HR>
+
+<P><B>Restrictions:</B> none
+</P>
+<P><B>Related commands:</B>
+</P>
+<P><A HREF = "pair_coeff.html">pair_coeff</A>
+</P>
+<P><B>Default:</B>
+</P>
+<P>These are the defaults for the pair_style keywords: <I>hartree</I> for the
+initial wavefunction, <I>free</I> for the wavepacket width.
+</P>
+</HTML>
--- a/doc/pair_awpmd.txt
+++ b/doc/pair_awpmd.txt
@ -0,0 +1,123 @@
+"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+pair_style awpmd/cut command :h3
+
+[Syntax:]
+
+pair_style awpmd/cut Rc keyword value ... :pre
+
+Rc = global cutoff, -1 means cutoff of half the shortest box length :ulb,l
+zero or more keyword/value pairs may be appended :l
+keyword = {hartree} or {dproduct} or {uhf} or {free} or {pbc} or {fix} or {harm} or {ermscale} or {flex_press} :l
+  {hartree} value = none
+  {dproduct} value = none
+  {uhf} value = none
+  {free} value = none
+  {pbc} value = Plen
+    Plen = periodic width of electron = -1 or positive value (distance units)
+  {fix} value = Flen
+    Flen = fixed width of electron = -1 or positive value (distance units)
+  {harm} value = width
+    width = harmonic width constraint
+  {ermscale} value = factor
+    factor = scaling between electron mass and width variable mass
+  {flex_press} value = none :pre
+:ule
+
+
+[Examples:]
+
+pair_style awpmd/cut -1
+pair_style awpmd/cut 40.0 uhf free
+pair_coeff * *
+pair_coeff 2 2 20.0 :pre
+
+[Description:]
+
+This pair style contains an implementation of the Antisymmetrized Wave
+Packet Molecular Dynamics (AWPMD) method.  Need citation here.  Need
+basic formulas here.  Could be links to other documents.
+
+Rc is the cutoff.
+
+The pair_style command allows for several optional keywords
+to be specified.
+
+The {hartree}, {dproduct}, and {uhf} keywords specify the form of the
+initial trial wave function for the system.  If the {hartree} keyword
+is used, then a Hartree multielectron trial wave function is used.  If
+the {dproduct} keyword is used, then a trial function which is a
+product of two determinants for each spin type is used.  If the {uhf}
+keyword is used, then an unrestricted Hartree-Fock trial wave function
+is used.
+
+The {free}, {pbc}, and {fix} keywords specify a width constraint on
+the electron wavepackets.  If the {free} keyword is specified, then there is no
+constraint.  If the {pbc} keyword is used and {Plen} is specified as
+-1, then the maximum width is half the shortest box length.  If {Plen}
+is a positive value, then the value is the maximum width.  If the
+{fix} keyword is used and {Flen} is specified as -1, then electrons
+have a constant width that is read from the data file.  If {Flen} is a
+positive value, then the constant width for all electrons is set to
+{Flen}.
+
+The {harm} keyword allows oscillations in the width of the
+electron wavepackets.  More details are needed.
+
+The {ermscale} keyword specifies a unitless scaling factor
+between the electron masses and the width variable mass.  More
+details needed.
+
+If the {flex_press} keyword is used, then a contribution from the
+electrons is added to the total virial and pressure of the system.
+
+This potential is designed to be used with "atom_style
+wavepacket"_atom_style.html definitions, in order to handle the
+description of systems with interacting nuclei and explicit electrons.
+
+The following coefficients must be defined for each pair of atom
+types via the "pair_coeff"_pair_coeff.html command as in the examples
+above, or in the data file or restart files read by the
+"read_data"_read_data.html or "read_restart"_read_restart.html
+commands, or by mixing as described below:
+
+cutoff (distance units) :ul
+
+For {awpmd/cut}, the cutoff coefficient is optional.  If it is not
+used (as in some of the examples above), the default global value
+specified in the pair_style command is used.
+
+:line
+
+[Mixing, shift, table, tail correction, restart, rRESPA info]:
+
+The "pair_modify"_pair_modify.html mix, shift, table, and tail options
+are not relevant for this pair style.
+
+This pair style writes its information to "binary restart
+files"_restart.html, so pair_style and pair_coeff commands do not need
+to be specified in an input script that reads a restart file.
+
+This pair style can only be used via the {pair} keyword of the
+"run_style respa"_run_style.html command.  It does not support the
+{inner}, {middle}, {outer} keywords.
+
+:line
+
+[Restrictions:] none
+
+[Related commands:]
+
+"pair_coeff"_pair_coeff.html
+
+[Default:]
+
+These are the defaults for the pair_style keywords: {hartree} for the
+initial trial wave function, {free} for the wavepacket width constraint.
+
--- a/doc/pair_eff.html
+++ b/doc/pair_eff.html
@ -28,10 +28,10 @@ pair_coeff 2 2 20.0
 </PRE>
 <P><B>Description:</B>
 </P>
-<P>Contains a LAMMPS implementation of the electron Force Field (eFF)
-potential currently under development at Caltech, as described in
-<A HREF = "#Jaramillo-Botero">(Jaramillo-Botero)</A>.  The eFF was first introduced
-by <A HREF = "#Su">(Su)</A> in 2007.
+<P>This pair style contains a LAMMPS implementation of the electron Force
+Field (eFF) potential currently under development at Caltech, as
+described in <A HREF = "#Jaramillo-Botero">(Jaramillo-Botero)</A>.  The eFF was
+first introduced by <A HREF = "#Su">(Su)</A> in 2007.
 </P>
 <P>eFF can be viewed as an approximation to QM wave packet dynamics and
 Fermionic molecular dynamics, combining the ability of electronic
@ -117,7 +117,7 @@ option.  The Coulombic cutoff specified for this style means that
 pairwise interactions within this distance are computed directly;
 interactions outside that distance are computed in reciprocal space.
 </P>
-<P>These potentials are designed to be used with <A HREF = "atom_electron.html">atom_style
+<P>This potential is designed to be used with <A HREF = "atom_style.html">atom_style
 electron</A> definitions, in order to handle the
 description of systems with interacting nuclei and explicit electrons.
 </P>
--- a/doc/pair_eff.txt
+++ b/doc/pair_eff.txt
@ -25,10 +25,10 @@ pair_coeff 2 2 20.0 :pre

 [Description:]

-Contains a LAMMPS implementation of the electron Force Field (eFF)
-potential currently under development at Caltech, as described in
-"(Jaramillo-Botero)"_#Jaramillo-Botero.  The eFF was first introduced
-by "(Su)"_#Su in 2007.
+This pair style contains a LAMMPS implementation of the electron Force
+Field (eFF) potential currently under development at Caltech, as
+described in "(Jaramillo-Botero)"_#Jaramillo-Botero.  The eFF was
+first introduced by "(Su)"_#Su in 2007.

 eFF can be viewed as an approximation to QM wave packet dynamics and
 Fermionic molecular dynamics, combining the ability of electronic
@ -114,8 +114,8 @@ option.  The Coulombic cutoff specified for this style means that
 pairwise interactions within this distance are computed directly;
 interactions outside that distance are computed in reciprocal space.

-These potentials are designed to be used with "atom_style
-electron"_atom_electron.html definitions, in order to handle the
+This potential is designed to be used with "atom_style
+electron"_atom_style.html definitions, in order to handle the
 description of systems with interacting nuclei and explicit electrons.

 The following coefficients must be defined for each pair of atoms
--- a/doc/region.html
+++ b/doc/region.html
@ -90,6 +90,15 @@ deleted via the <A HREF = "delete_atoms.html">delete_atoms</A> command.  Or the
 surface of the region can be used as a boundary wall via the <A HREF = "fix_wall_region.html">fix
 wall/region</A> command.
 </P>
+<P>Commands which use regions typically test whether an atom's position
+is contained in the region or not.  For this purpose, coordinates
+exactly on the region boundary are considered to be interior to the
+region.  For example, an atom exactly on the surface of a spherical
+region would be part of the region if the sphere were defined with
+the <I>side in</I> keyword, but would not be part of the region if it
+were defined using the <I>side out</I> keyword.  See more details on
+the <I>side</I> keyword below.
+</P>
 <P>Normally, regions in LAMMPS are "static", meaning their geometric
 extent does not change with time.  If the <I>move</I> or <I>rotate</I> keyword
 is used, as described below, the region becomes "dynamic", meaning
--- a/doc/region.txt
+++ b/doc/region.txt
@ -81,6 +81,15 @@ deleted via the "delete_atoms"_delete_atoms.html command.  Or the
 surface of the region can be used as a boundary wall via the "fix
 wall/region"_fix_wall_region.html command.

+Commands which use regions typically test whether an atom's position
+is contained in the region or not.  For this purpose, coordinates
+exactly on the region boundary are considered to be interior to the
+region.  For example, an atom exactly on the surface of a spherical
+region would be part of the region if the sphere were defined with
+the {side in} keyword, but would not be part of the region if it
+were defined using the {side out} keyword.  See more details on the
+{side} keyword below.
+
 Normally, regions in LAMMPS are "static", meaning their geometric
 extent does not change with time.  If the {move} or {rotate} keyword
 is used, as described below, the region becomes "dynamic", meaning