git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@8544 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2012-08-06 22:50:52 +00:00
parent df7994356a
commit 671c90bcca
10 changed files with 250 additions and 188 deletions
--- a/doc/Manual.html
+++ b/doc/Manual.html
@ -132,15 +132,21 @@ it gives quick access to documentation for all LAMMPS commands.
 <BR></UL>
 <LI><A HREF = "Section_accelerate.html">Accelerating LAMMPS performance</A> 
-<UL>  5.1 <A HREF = "Section_accelerate.html#acc_1">OPT package</A> 
+<UL>  5.1 <A HREF = "Section_accelerate.html#acc_1">Measuring performance</A> 
 <BR>
-  5.2 <A HREF = "Section_accelerate.html#acc_2">USER-OMP package</A> 
+  5.2 <A HREF = "Section_accelerate.html#acc_2">General strategies</A> 
 <BR>
-  5.3 <A HREF = "Section_accelerate.html#acc_3">GPU package</A> 
+  5.3 <A HREF = "Section_accelerate.html#acc_3">Packages with optimized styles</A> 
 <BR>
-  5.4 <A HREF = "Section_accelerate.html#acc_4">USER-CUDA package</A> 
+  5.4 <A HREF = "Section_accelerate.html#acc_4">OPT package</A> 
 <BR>
-  5.5 <A HREF = "Section_accelerate.html#acc_5">Comparison of GPU and USER-CUDA packages</A> 
+  5.5 <A HREF = "Section_accelerate.html#acc_5">USER-OMP package</A> 
 <BR>
  5.6 <A HREF = "Section_accelerate.html#acc_6">GPU package</A> 
 <BR>
  5.7 <A HREF = "Section_accelerate.html#acc_7">USER-CUDA package</A> 
 <BR>
  5.8 <A HREF = "Section_accelerate.html#acc_8">Comparison of GPU and USER-CUDA packages</A> 
 <BR></UL>
 <LI><A HREF = "Section_howto.html">How-to discussions</A> 
@ -393,6 +399,12 @@ it gives quick access to documentation for all LAMMPS commands.
--- a/doc/Manual.txt
+++ b/doc/Manual.txt
@ -102,11 +102,14 @@ it gives quick access to documentation for all LAMMPS commands.
  4.1 "Standard packages"_pkg_1 :ulb,b
  4.2 "User packages"_pkg_2 :ule,b
 "Accelerating LAMMPS performance"_Section_accelerate.html :l
-  5.1 "OPT package"_acc_1 :ulb,b
+  5.1 "Measuring performance"_acc_1 :ulb,b
-  5.2 "USER-OMP package"_acc_2 :b
+  5.2 "General strategies"_acc_2 :b
-  5.3 "GPU package"_acc_3 :b
+  5.3 "Packages with optimized styles"_acc_3 :b
-  5.4 "USER-CUDA package"_acc_4 :b
+  5.4 "OPT package"_acc_4 :b
-  5.5 "Comparison of GPU and USER-CUDA packages"_acc_5 :ule,b
+  5.5 "USER-OMP package"_acc_5 :b
  5.6 "GPU package"_acc_6 :b
  5.7 "USER-CUDA package"_acc_7 :b
  5.8 "Comparison of GPU and USER-CUDA packages"_acc_8 :ule,b
 "How-to discussions"_Section_howto.html :l
  6.1 "Restarting a simulation"_howto_1 :ulb,b
  6.2 "2d simulations"_howto_2 :b
@ -194,6 +197,9 @@ it gives quick access to documentation for all LAMMPS commands.
 :link(acc_3,Section_accelerate.html#acc_3)
 :link(acc_4,Section_accelerate.html#acc_4)
 :link(acc_5,Section_accelerate.html#acc_5)
 :link(acc_6,Section_accelerate.html#acc_6)
 :link(acc_7,Section_accelerate.html#acc_7)
 :link(acc_8,Section_accelerate.html#acc_8)
 :link(howto_1,Section_howto.html#howto_1)
 :link(howto_2,Section_howto.html#howto_2)
--- a/doc/Section_accelerate.html
+++ b/doc/Section_accelerate.html
@ -14,15 +14,94 @@ Section</A>
 <H3>5. Accelerating LAMMPS performance 
 </H3>
 <P>This section describes various methods for improving LAMMPS
-performance for different classes of problems running
+performance for different classes of problems running on different
-on different kinds of machines.
+kinds of machines.
 </P>
-5.1 <A HREF = "#acc_1">OPT package</A><BR>
+5.1 <A HREF = "#acc_1">Measuring performance</A><BR>
-5.2 <A HREF = "#acc_2">USER-OMP package</A><BR>
+5.2 <A HREF = "#acc_2">General strategies</A><BR>
-5.3 <A HREF = "#acc_3">GPU package</A><BR>
+5.3 <A HREF = "#acc_3">Packages with optimized styles</A><BR>
-5.4 <A HREF = "#acc_4">USER-CUDA package</A><BR>
+5.4 <A HREF = "#acc_4">OPT package</A><BR>
-5.5 <A HREF = "#acc_5">Comparison of GPU and USER-CUDA packages</A> <BR>
+5.5 <A HREF = "#acc_5">USER-OMP package</A><BR>
 5.6 <A HREF = "#acc_6">GPU package</A><BR>
 5.7 <A HREF = "#acc_7">USER-CUDA package</A><BR>
 5.8 <A HREF = "#acc_8">Comparison of GPU and USER-CUDA packages</A> <BR>
 <HR>
 <HR>
 <H4><A NAME = "acc_1"></A>5.1 Measuring performance 
 </H4>
 <P>Before trying to make your simulation run faster, you should
 understand how it currently performs and where the bottlenecks are.
 </P>
 <P>The best way to do this is to run your system (actual number of
 atoms) for a modest number of timesteps (say 100, or a few 100 at
 most) on several different processor counts, including a single
 processor if possible.  Do this for an equilibrium version of your
 system, so that the 100-step timings are representative of a much
 longer run.  There is typically no need to run for 1000s of timesteps
 to get accurate timings; you can simply extrapolate from short runs.
 </P>
 <P>For the set of runs, look at the timing data printed to the screen and
 log file at the end of each LAMMPS run.  <A HREF = "Section_start.html#start_8">This
 section</A> of the manual has an overview.
 </P>
 <P>Running on one (or a few) processors should give a good estimate of
 the serial performance and what portions of the timestep are taking
 the most time.  Running the same problem on a few different processor
 counts should give an estimate of parallel scalability.  For example, if the
 simulation runs 16x faster on 16 processors, it is 100% parallel
 efficient; if it runs 8x faster on 16 processors, it is 50% efficient.
 </P>
 <P>The most important data to look at in the timing info is the timing
 breakdown and relative percentages.  For example, trying different
 options for speeding up the long-range solvers will have little impact
 if they only consume 10% of the run time.  If the pairwise time is
 dominating, you may want to look at GPU or OMP versions of the pair
 style, as discussed below.  Comparing how the percentages change as
 you increase the processor count gives you a sense of how different
 operations within the timestep are scaling.  Note that if you are
 running with a Kspace solver, there is additional output on the
 breakdown of the Kspace time.  For PPPM, this includes the fraction
 spent on FFTs, which can be communication intensive.
 </P>
 <P>Other important details in the timing info are the histograms of
 atom counts and neighbor counts.  If these vary widely across
 processors, you have a load-imbalance issue.  This often results in
 inaccurate relative timing data, because processors have to wait when
 communication occurs for other processors to catch up.  Thus the
 reported times for "Communication" or "Other" may be higher than they
 really are, due to load-imbalance.  If this is an issue, you can
 uncomment the MPI_Barrier() lines in src/timer.cpp, and recompile
 LAMMPS, to obtain synchronized timings.
 </P>
 <HR>
 <H4><A NAME = "acc_2"></A>5.2 General strategies 
 </H4>
 <P>Here is a list of general ideas for improving simulation performance.
 Most of them are only applicable to certain models and certain
 bottlenecks in the current performance, so let the timing data you
 initially generate be your guide.  It is hard, if not impossible, to
 predict how much difference these options will make, since it is a
 function of your problem and your machine.  There is no substitute for
 simply trying them out.
 </P>
 <UL><LI>rRESPA
 <LI>2-FFT PPPM
 <LI>single vs double PPPM
 <LI>partial charge PPPM
 <LI>verlet/split
 <LI>processor mapping via processors numa command
 <LI>load-balancing: balance and fix balance
 <LI>processor command for layout
 <LI>OMP when lots of cores 
 </UL>
 <HR>
 <H4><A NAME = "acc_3"></A>5.3 Packages with optimized styles 
 </H4>
 <P>Accelerated versions of various <A HREF = "pair_style.html">pair_style</A>,
 <A HREF = "fix.html">fixes</A>, <A HREF = "compute.html">computes</A>, and other commands have
 been added to LAMMPS, which will typically run faster than the
@ -86,9 +165,7 @@ packages, since they are both designed to use NVIDIA GPU hardware.
 </P>
 <HR>
-<HR>
+<H4><A NAME = "acc_4"></A>5.4 OPT package 
 <H4><A NAME = "acc_1"></A>5.1 OPT package 
 </H4>
 <P>The OPT package was developed by James Fischer (High Performance
 Technologies), David Richie, and Vincent Natoli (Stone Ridge
@ -115,9 +192,7 @@ to 20% savings.
 </P>
 <HR>
-<HR>
+<H4><A NAME = "acc_5"></A>5.5 USER-OMP package 
 <H4><A NAME = "acc_2"></A>5.2 USER-OMP package 
 </H4>
 <P>The USER-OMP package was developed by Axel Kohlmeyer at Temple University.
 It provides multi-threaded versions of most pair styles, all dihedral
@ -236,9 +311,7 @@ examples are <A HREF = "http://sites.google.com/site/akohlmey/software/lammps-ic
 </P>
 <HR>
-<HR>
+<H4><A NAME = "acc_6"></A>5.6 GPU package 
 <H4><A NAME = "acc_3"></A>5.3 GPU package 
 </H4>
 <P>The GPU package was developed by Mike Brown at ORNL.  It provides GPU
 versions of several pair styles and for long-range Coulombics via the
@ -266,6 +339,13 @@ NVIDIA support as well as more general OpenCL support, so that the
 same functionality can eventually be supported on a variety of GPU
 hardware. 
 </UL>
 <P>NOTE:
  discuss 3 precisions
    if change, also have to re-link with LAMMPS
  always use newton off
  expt with differing numbers of CPUs vs GPU - can't tell what is fastest
  give command line switches in examples
 </P>
 <P><B>Hardware and software requirements:</B>
 </P>
 <P>To use this package, you currently need to have specific NVIDIA
@ -378,9 +458,7 @@ requires that your GPU card support double precision.
 </P>
 <HR>
-<HR>
+<H4><A NAME = "acc_7"></A>5.7 USER-CUDA package 
 <H4><A NAME = "acc_4"></A>5.4 USER-CUDA package 
 </H4>
 <P>The USER-CUDA package was developed by Christian Trott at U Technology
 Ilmenau in Germany.  It provides NVIDIA GPU versions of many pair
@ -516,7 +594,7 @@ occurs, the faster your simulation will run.
 <HR>
-<H4><A NAME = "acc_5"></A>5.5 Comparison of GPU and USER-CUDA packages 
+<H4><A NAME = "acc_8"></A>5.8 Comparison of GPU and USER-CUDA packages 
 </H4>
 <P>Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation
 using NVIDIA hardware, but they do it in different ways.
@ -602,66 +680,4 @@ for the GPU and USER-CUDA packages.
 <P>These contain input scripts for identical systems, so they can be used
 to benchmark the performance of both packages on your system.
 </P>
 <HR>
 <P><B>Benchmark data:</B>
 </P>
 <P>NOTE: We plan to add some benchmark results and plots here for the
 examples described in the previous section.
 </P>
 <P>Simulations:
 </P>
 <P>1. Lennard Jones
 </P>
 <UL><LI>256,000 atoms
 <LI>2.5 A cutoff
 <LI>0.844 density 
 </UL>
 <P>2. Lennard Jones
 </P>
 <UL><LI>256,000 atoms
 <LI>5.0 A cutoff
 <LI>0.844 density 
 </UL>
 <P>3. Rhodopsin model
 </P>
 <UL><LI>256,000 atoms
 <LI>10A cutoff
 <LI>Coulomb via PPPM 
 </UL>
 <P>4. Lithium-Phosphate
 </P>
 <UL><LI>295650 atoms
 <LI>15A cutoff
 <LI>Coulomb via PPPM 
 </UL>
 <P>Hardware:
 </P>
 <P>Workstation:
 </P>
 <UL><LI>2x GTX470
 <LI>i7 950@3GHz
 <LI>24Gb DDR3 @ 1066Mhz
 <LI>CentOS 5.5
 <LI>CUDA 3.2
 <LI>Driver 260.19.12 
 </UL>
 <P>eStella:
 </P>
 <UL><LI>6 Nodes
 <LI>2xC2050
 <LI>2xQDR Infiniband interconnect(aggregate bandwidth 80GBps)
 <LI>Intel X5650 HexCore @ 2.67GHz
 <LI>SL 5.5
 <LI>CUDA 3.2
 <LI>Driver 260.19.26 
 </UL>
 <P>Keeneland:
 </P>
 <UL><LI>HP SL-390 (Ariston) cluster
 <LI>120 nodes
 <LI>2x Intel Westmere hex-core CPUs
 <LI>3xC2070s
 <LI>QDR InfiniBand interconnect 
 </UL>
 </HTML>
--- a/doc/Section_accelerate.txt
+++ b/doc/Section_accelerate.txt
@ -11,14 +11,92 @@ Section"_Section_howto.html :c
 5. Accelerating LAMMPS performance :h3
 This section describes various methods for improving LAMMPS
-performance for different classes of problems running
+performance for different classes of problems running on different
-on different kinds of machines.
+kinds of machines.
-5.1 "OPT package"_#acc_1
+5.1 "Measuring performance"_#acc_1
-5.2 "USER-OMP package"_#acc_2
+5.2 "General strategies"_#acc_2
-5.3 "GPU package"_#acc_3
+5.3 "Packages with optimized styles"_#acc_3
-5.4 "USER-CUDA package"_#acc_4
+5.4 "OPT package"_#acc_4
-5.5 "Comparison of GPU and USER-CUDA packages"_#acc_5 :all(b)
+5.5 "USER-OMP package"_#acc_5
 5.6 "GPU package"_#acc_6
 5.7 "USER-CUDA package"_#acc_7
 5.8 "Comparison of GPU and USER-CUDA packages"_#acc_8 :all(b)
 :line
 :line
 5.1 Measuring performance :h4,link(acc_1)
 Before trying to make your simulation run faster, you should
 understand how it currently performs and where the bottlenecks are.
 The best way to do this is to run your system (actual number of
 atoms) for a modest number of timesteps (say 100, or a few 100 at
 most) on several different processor counts, including a single
 processor if possible.  Do this for an equilibrium version of your
 system, so that the 100-step timings are representative of a much
 longer run.  There is typically no need to run for 1000s of timesteps
 to get accurate timings; you can simply extrapolate from short runs.
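The extrapolation from a short run is simple proportional arithmetic. A minimal sketch, assuming the cost per timestep stays roughly constant; the function name and the timings are hypothetical, so substitute the loop time reported at the end of your own run:

```python
def extrapolate_walltime(loop_time, steps_run, steps_target):
    """Estimate the wall time in seconds for steps_target timesteps,
    given that steps_run timesteps took loop_time seconds."""
    return loop_time / steps_run * steps_target

# Hypothetical short run: 100 timesteps took 12.5 seconds.
seconds = extrapolate_walltime(12.5, 100, 1_000_000)
print(f"estimated wall time: {seconds / 3600:.1f} hours")
```

The estimate only holds if the short run is representative, which is why the text recommends timing an equilibrated system.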
 For the set of runs, look at the timing data printed to the screen and
 log file at the end of each LAMMPS run.  "This
 section"_Section_start.html#start_8 of the manual has an overview.
 Running on one (or a few) processors should give a good estimate of
 the serial performance and what portions of the timestep are taking
 the most time.  Running the same problem on a few different processor
 counts should give an estimate of parallel scalability.  For example, if the
 simulation runs 16x faster on 16 processors, it is 100% parallel
 efficient; if it runs 8x faster on 16 processors, it is 50% efficient.
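The efficiency arithmetic above can be written out as a short Python sketch. The function name and the loop times are hypothetical, not LAMMPS output; use the loop times from your own single-processor and multi-processor runs:

```python
def parallel_efficiency(t_serial, t_parallel, nprocs):
    """Return (speedup, efficiency) for a run on nprocs processors.

    t_serial is the loop time on one processor, and t_parallel is the
    loop time for the same problem on nprocs processors.
    """
    speedup = t_serial / t_parallel
    return speedup, speedup / nprocs

# Hypothetical loop times in seconds for the same short run.
speedup, efficiency = parallel_efficiency(160.0, 20.0, 16)
print(f"speedup = {speedup:.1f}x, efficiency = {efficiency:.0%}")
```

If a single-processor run is not feasible, the smallest processor count you can run on may serve as the baseline, in which case the result is a relative efficiency.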
 The most important data to look at in the timing info is the timing
 breakdown and relative percentages.  For example, trying different
 options for speeding up the long-range solvers will have little impact
 if they only consume 10% of the run time.  If the pairwise time is
 dominating, you may want to look at GPU or OMP versions of the pair
 style, as discussed below.  Comparing how the percentages change as
 you increase the processor count gives you a sense of how different
 operations within the timestep are scaling.  Note that if you are
 running with a Kspace solver, there is additional output on the
 breakdown of the Kspace time.  For PPPM, this includes the fraction
 spent on FFTs, which can be communication intensive.
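To see why speeding up a component that takes 10% of the run time has little effect, the sketch below applies Amdahl's law to the timing percentages. The function name and the 2x factor are illustrative assumptions, not measured LAMMPS results:

```python
def overall_speedup(fraction, factor):
    """Overall speedup when the part of the run taking `fraction` of the
    total time is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Long-range solver at 10% of the run time, made 2x faster.
kspace_gain = overall_speedup(0.10, 2.0)
# Pairwise computation at 80% of the run time, made 2x faster.
pair_gain = overall_speedup(0.80, 2.0)
print(f"Kspace 2x faster: {kspace_gain:.2f}x overall")
print(f"Pair 2x faster:   {pair_gain:.2f}x overall")
```

Even an infinitely fast solver at 10% of the run time caps the overall gain at about 1.11x, so the dominant category in the breakdown is the place to start.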
 Other important details in the timing info are the histograms of
 atom counts and neighbor counts.  If these vary widely across
 processors, you have a load-imbalance issue.  This often results in
 inaccurate relative timing data, because processors have to wait when
 communication occurs for other processors to catch up.  Thus the
 reported times for "Communication" or "Other" may be higher than they
 really are, due to load-imbalance.  If this is an issue, you can
 uncomment the MPI_Barrier() lines in src/timer.cpp, and recompile
 LAMMPS, to obtain synchronized timings.
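One common way to put a number on load imbalance is the ratio of the largest per-processor atom count to the average. A minimal sketch, assuming the per-processor counts have been collected into a list by hand; the function name and the sample values are hypothetical:

```python
def imbalance_factor(counts):
    """Ratio of the maximum to the mean per-processor count.

    A value of 1.0 means perfect balance; larger values mean the most
    heavily loaded processor holds that many times the average."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean

# Hypothetical atom counts on 4 processors.
balanced = imbalance_factor([8000, 8000, 8000, 8000])
skewed = imbalance_factor([14000, 6000, 6000, 6000])
print(f"balanced: {balanced:.2f}, skewed: {skewed:.2f}")
```

In the skewed case, the other processors would sit idle for part of each timestep while the most loaded one finishes, which is the waiting time that inflates the reported "Communication" or "Other" categories.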
 :line
 5.2 General strategies :h4,link(acc_2)
 Here is a list of general ideas for improving simulation performance.
 Most of them are only applicable to certain models and certain
 bottlenecks in the current performance, so let the timing data you
 initially generate be your guide.  It is hard, if not impossible, to
 predict how much difference these options will make, since it is a
 function of your problem and your machine.  There is no substitute for
 simply trying them out.
 rRESPA
 2-FFT PPPM
 single vs double PPPM
 partial charge PPPM
 verlet/split
 processor mapping via processors numa command
 load-balancing: balance and fix balance
 processor command for layout
 OMP when lots of cores :ul
 :line
 5.3 Packages with optimized styles :h4,link(acc_3)
 Accelerated versions of various "pair_style"_pair_style.html,
 "fixes"_fix.html, "computes"_compute.html, and other commands have
@ -81,10 +159,9 @@ speed-ups you can expect :ul
 The final section compares and contrasts the GPU and USER-CUDA
 packages, since they are both designed to use NVIDIA GPU hardware.
 :line
 :line
-5.1 OPT package :h4,link(acc_1)
+5.4 OPT package :h4,link(acc_4)
 The OPT package was developed by James Fischer (High Performance
 Technologies), David Richie, and Vincent Natoli (Stone Ridge
@ -109,10 +186,9 @@ You should see a reduction in the "Pair time" printed out at the end
 of the run.  On most machines and problems, this will typically be a 5
 to 20% savings.
 :line
 :line
-5.2 USER-OMP package :h4,link(acc_2)
+5.5 USER-OMP package :h4,link(acc_5)
 The USER-OMP package was developed by Axel Kohlmeyer at Temple University.
 It provides multi-threaded versions of most pair styles, all dihedral
@ -229,10 +305,9 @@ through hyper-threading.
 A description of the multi-threading strategy and some performance
 examples are "presented here"_http://sites.google.com/site/akohlmey/software/lammps-icms/lammps-icms-tms2011-talk.pdf?attredirects=0&d=1
 :line
 :line
-5.3 GPU package :h4,link(acc_3)
+5.6 GPU package :h4,link(acc_6)
 The GPU package was developed by Mike Brown at ORNL.  It provides GPU
 versions of several pair styles and for long-range Coulombics via the
@ -260,6 +335,19 @@ NVIDIA support as well as more general OpenCL support, so that the
 same functionality can eventually be supported on a variety of GPU
 hardware. :l,ule
 NOTE:
  discuss 3 precisions
    if change, also have to re-link with LAMMPS
  always use newton off
  expt with differing numbers of CPUs vs GPU - can't tell what is fastest
  give command line switches in examples
 [Hardware and software requirements:]
 To use this package, you currently need to have specific NVIDIA
@ -370,10 +458,9 @@ See the lammps/lib/gpu/README file for instructions on how to build
 the GPU library for single, mixed, or double precision.  The latter
 requires that your GPU card support double precision.
 :line
 :line
-5.4 USER-CUDA package :h4,link(acc_4)
+5.7 USER-CUDA package :h4,link(acc_7)
 The USER-CUDA package was developed by Christian Trott at U Technology
 Ilmenau in Germany.  It provides NVIDIA GPU versions of many pair
@ -508,7 +595,7 @@ occurs, the faster your simulation will run.
 :line
 :line
-5.5 Comparison of GPU and USER-CUDA packages :h4,link(acc_5)
+5.8 Comparison of GPU and USER-CUDA packages :h4,link(acc_8)
 Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation
 using NVIDIA hardware, but they do it in different ways.
@ -593,65 +680,3 @@ lammps/examples/USER/cuda = USER-CUDA package files :ul
 These contain input scripts for identical systems, so they can be used
 to benchmark the performance of both packages on your system.
 :line
 [Benchmark data:]
 NOTE: We plan to add some benchmark results and plots here for the
 examples described in the previous section.
 Simulations:
 1. Lennard Jones
 256,000 atoms
 2.5 A cutoff
 0.844 density :ul
 2. Lennard Jones
 256,000 atoms
 5.0 A cutoff
 0.844 density :ul
 3. Rhodopsin model
 256,000 atoms
 10A cutoff
 Coulomb via PPPM :ul
 4. Lithium-Phosphate
 295650 atoms
 15A cutoff
 Coulomb via PPPM :ul
 Hardware:
 Workstation:
 2x GTX470
 i7 950@3GHz
 24Gb DDR3 @ 1066Mhz
 CentOS 5.5
 CUDA 3.2
 Driver 260.19.12 :ul
 eStella:
 6 Nodes
 2xC2050
 2xQDR Infiniband interconnect(aggregate bandwidth 80GBps)
 Intel X5650 HexCore @ 2.67GHz
 SL 5.5
 CUDA 3.2
 Driver 260.19.26 :ul
 Keeneland:
 HP SL-390 (Ariston) cluster
 120 nodes
 2x Intel Westmere hex-core CPUs
 3xC2070s
 QDR InfiniBand interconnect :ul
--- a/doc/Section_commands.html
+++ b/doc/Section_commands.html
@ -383,12 +383,12 @@ each style or click on the style itself for a full description:
 <DIV ALIGN=center><TABLE  BORDER=1 >
 <TR ALIGN="center"><TD ><A HREF = "compute_angle_local.html">angle/local</A></TD><TD ><A HREF = "compute_atom_molecule.html">atom/molecule</A></TD><TD ><A HREF = "compute_bond_local.html">bond/local</A></TD><TD ><A HREF = "compute_centro_atom.html">centro/atom</A></TD><TD ><A HREF = "compute_cluster_atom.html">cluster/atom</A></TD><TD ><A HREF = "compute_cna_atom.html">cna/atom</A></TD></TR>
 <TR ALIGN="center"><TD ><A HREF = "compute_com.html">com</A></TD><TD ><A HREF = "compute_com_molecule.html">com/molecule</A></TD><TD ><A HREF = "compute_coord_atom.html">coord/atom</A></TD><TD ><A HREF = "compute_damage_atom.html">damage/atom</A></TD><TD ><A HREF = "compute_dihedral_local.html">dihedral/local</A></TD><TD ><A HREF = "compute_displace_atom.html">displace/atom</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_erotate_asphere.html">erotate/asphere</A></TD><TD ><A HREF = "compute_erotate_sphere.html">erotate/sphere</A></TD><TD ><A HREF = "compute_event_displace.html">event/displace</A></TD><TD ><A HREF = "compute_group_group.html">group/group</A></TD><TD ><A HREF = "compute_gyration.html">gyration</A></TD><TD ><A HREF = "compute_gyration_molecule.html">gyration/molecule</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "compute_erotate_asphere.html">erotate/asphere</A></TD><TD ><A HREF = "compute_erotate_sphere.html">erotate/sphere</A></TD><TD ><A HREF = "compute_erotate_sphere_atom.html">erotate/sphere/atom</A></TD><TD ><A HREF = "compute_event_displace.html">event/displace</A></TD><TD ><A HREF = "compute_group_group.html">group/group</A></TD><TD ><A HREF = "compute_gyration.html">gyration</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_heat_flux.html">heat/flux</A></TD><TD ><A HREF = "compute_improper_local.html">improper/local</A></TD><TD ><A HREF = "compute_ke.html">ke</A></TD><TD ><A HREF = "compute_ke_atom.html">ke/atom</A></TD><TD ><A HREF = "compute_msd.html">msd</A></TD><TD ><A HREF = "compute_msd_molecule.html">msd/molecule</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "compute_gyration_molecule.html">gyration/molecule</A></TD><TD ><A HREF = "compute_heat_flux.html">heat/flux</A></TD><TD ><A HREF = "compute_improper_local.html">improper/local</A></TD><TD ><A HREF = "compute_ke.html">ke</A></TD><TD ><A HREF = "compute_ke_atom.html">ke/atom</A></TD><TD ><A HREF = "compute_msd.html">msd</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_pair.html">pair</A></TD><TD ><A HREF = "compute_pair_local.html">pair/local</A></TD><TD ><A HREF = "compute_pe.html">pe</A></TD><TD ><A HREF = "compute_pe_atom.html">pe/atom</A></TD><TD ><A HREF = "compute_pressure.html">pressure</A></TD><TD ><A HREF = "compute_property_atom.html">property/atom</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "compute_msd_molecule.html">msd/molecule</A></TD><TD ><A HREF = "compute_pair.html">pair</A></TD><TD ><A HREF = "compute_pair_local.html">pair/local</A></TD><TD ><A HREF = "compute_pe.html">pe</A></TD><TD ><A HREF = "compute_pe_atom.html">pe/atom</A></TD><TD ><A HREF = "compute_pressure.html">pressure</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_property_local.html">property/local</A></TD><TD ><A HREF = "compute_property_molecule.html">property/molecule</A></TD><TD ><A HREF = "compute_rdf.html">rdf</A></TD><TD ><A HREF = "compute_reduce.html">reduce</A></TD><TD ><A HREF = "compute_reduce.html">reduce/region</A></TD><TD ><A HREF = "compute_slice.html">slice</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "compute_property_atom.html">property/atom</A></TD><TD ><A HREF = "compute_property_local.html">property/local</A></TD><TD ><A HREF = "compute_property_molecule.html">property/molecule</A></TD><TD ><A HREF = "compute_rdf.html">rdf</A></TD><TD ><A HREF = "compute_reduce.html">reduce</A></TD><TD ><A HREF = "compute_reduce.html">reduce/region</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_stress_atom.html">stress/atom</A></TD><TD ><A HREF = "compute_temp.html">temp</A></TD><TD ><A HREF = "compute_temp_asphere.html">temp/asphere</A></TD><TD ><A HREF = "compute_temp_com.html">temp/com</A></TD><TD ><A HREF = "compute_temp_deform.html">temp/deform</A></TD><TD ><A HREF = "compute_temp_partial.html">temp/partial</A></TD></TR>
+<TR ALIGN="center"><TD ><A HREF = "compute_slice.html">slice</A></TD><TD ><A HREF = "compute_stress_atom.html">stress/atom</A></TD><TD ><A HREF = "compute_temp.html">temp</A></TD><TD ><A HREF = "compute_temp_asphere.html">temp/asphere</A></TD><TD ><A HREF = "compute_temp_com.html">temp/com</A></TD><TD ><A HREF = "compute_temp_deform.html">temp/deform</A></TD></TR>
-<TR ALIGN="center"><TD ><A HREF = "compute_temp_profile.html">temp/profile</A></TD><TD ><A HREF = "compute_temp_ramp.html">temp/ramp</A></TD><TD ><A HREF = "compute_temp_region.html">temp/region</A></TD><TD ><A HREF = "compute_temp_sphere.html">temp/sphere</A></TD><TD ><A HREF = "compute_ti.html">ti</A> 
+<TR ALIGN="center"><TD ><A HREF = "compute_temp_partial.html">temp/partial</A></TD><TD ><A HREF = "compute_temp_profile.html">temp/profile</A></TD><TD ><A HREF = "compute_temp_ramp.html">temp/ramp</A></TD><TD ><A HREF = "compute_temp_region.html">temp/region</A></TD><TD ><A HREF = "compute_temp_sphere.html">temp/sphere</A></TD><TD ><A HREF = "compute_ti.html">ti</A> 
 </TD></TR></TABLE></DIV>
 <P>These are compute styles contributed by users, which can be used if
--- a/doc/Section_commands.txt
+++ b/doc/Section_commands.txt
@ -556,6 +556,7 @@ each style or click on the style itself for a full description:
 "displace/atom"_compute_displace_atom.html,
 "erotate/asphere"_compute_erotate_asphere.html,
 "erotate/sphere"_compute_erotate_sphere.html,
 "erotate/sphere/atom"_compute_erotate_sphere_atom.html,
 "event/displace"_compute_event_displace.html,
 "group/group"_compute_group_group.html,
 "gyration"_compute_gyration.html,
--- a/doc/Section_packages.html
+++ b/doc/Section_packages.html
@ -49,7 +49,7 @@ packages, more details are provided.
 <TR ALIGN="center"><TD >COLLOID</TD><TD > colloidal particles</TD><TD > -</TD><TD > <A HREF = "atom_style.html">atom_style colloid</A></TD><TD > colloid</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >DIPOLE</TD><TD > point dipole particles</TD><TD > -</TD><TD > <A HREF = "pair_dipole.html">pair_style dipole/cut</A></TD><TD > dipole</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >FLD</TD><TD > Fast Lubrication Dynamics</TD><TD > Kumar & Bybee & Higdon (1)</TD><TD > <A HREF = "pair_lubricateU.html">pair_style lubricateU</A></TD><TD > -</TD><TD > -</TD></TR>
-<TR ALIGN="center"><TD >GPU</TD><TD > GPU-enabled potentials</TD><TD > Mike Brown (ORNL)</TD><TD > <A HREF = "Section_accelerate.html#acc_3">Section accelerate</A></TD><TD > gpu</TD><TD > lib/gpu</TD></TR>
+<TR ALIGN="center"><TD >GPU</TD><TD > GPU-enabled potentials</TD><TD > Mike Brown (ORNL)</TD><TD > <A HREF = "Section_accelerate.html#acc_6">Section accelerate</A></TD><TD > gpu</TD><TD > lib/gpu</TD></TR>
 <TR ALIGN="center"><TD >GRANULAR</TD><TD > granular systems</TD><TD > -</TD><TD > <A HREF = "Section_howto.html#howto_6">Section_howto</A></TD><TD > pour</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >KIM</TD><TD > openKIM potentials</TD><TD > Smirichinski & Elliot & Tadmor (3)</TD><TD > <A HREF = "pair_kim.html">pair_style kim</A></TD><TD > kim</TD><TD > lib/kim</TD></TR>
 <TR ALIGN="center"><TD >KSPACE</TD><TD > long-range Coulombic solvers</TD><TD > -</TD><TD > <A HREF = "kspace_style.html">kspace_style</A></TD><TD > peptide</TD><TD > -</TD></TR>
@ -57,7 +57,7 @@ packages, more details are provided.
 <TR ALIGN="center"><TD >MEAM</TD><TD > modified EAM potential</TD><TD > Greg Wagner (Sandia)</TD><TD > <A HREF = "pair_meam.html">pair_style meam</A></TD><TD > meam</TD><TD > lib/meam</TD></TR>
 <TR ALIGN="center"><TD >MC</TD><TD > Monte Carlo options</TD><TD > -</TD><TD > <A HREF = "fix_gcmc.html">fix gcmc</A></TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >MOLECULE</TD><TD > molecular system force fields</TD><TD > -</TD><TD > <A HREF = "Section_howto.html#howto_3">Section_howto</A></TD><TD > peptide</TD><TD > -</TD></TR>
-<TR ALIGN="center"><TD >OPT</TD><TD > optimized pair potentials</TD><TD > Fischer & Richie & Natoli (2)</TD><TD > <A HREF = "Section_accelerate.html#acc_1">Section accelerate</A></TD><TD > -</TD><TD > -</TD></TR>
+<TR ALIGN="center"><TD >OPT</TD><TD > optimized pair potentials</TD><TD > Fischer & Richie & Natoli (2)</TD><TD > <A HREF = "Section_accelerate.html#acc_4">Section accelerate</A></TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >PERI</TD><TD > Peridynamics models</TD><TD > Mike Parks (Sandia)</TD><TD > <A HREF = "pair_peri.html">pair_style peri</A></TD><TD > peri</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >POEMS</TD><TD > coupled rigid body motion</TD><TD > Rudra Mukherjee (JPL)</TD><TD > <A HREF = "fix_poems.html">fix poems</A></TD><TD > rigid</TD><TD > lib/poems</TD></TR>
 <TR ALIGN="center"><TD >REAX</TD><TD > ReaxFF potential</TD><TD > Aidan Thompson (Sandia)</TD><TD > <A HREF = "pair_reax.html">pair_style reax</A></TD><TD > reax</TD><TD >  lib/reax</TD></TR>
@ -106,11 +106,11 @@ E.g. "peptide" refers to the examples/peptide directory.
 <TR ALIGN="center"><TD >USER-AWPMD</TD><TD > wave-packet MD</TD><TD > Ilya Valuev (JIHT)</TD><TD > <A HREF = "pair_awpmd.html">pair_style awpmd/cut</A></TD><TD > USER/awpmd</TD><TD > -</TD><TD > lib/awpmd</TD></TR>
 <TR ALIGN="center"><TD >USER-CG-CMM</TD><TD > coarse-graining model</TD><TD > Axel Kohlmeyer (Temple U)</TD><TD > <A HREF = "pair_sdk.html">pair_style lj/sdk</A></TD><TD > USER/cg-cmm</TD><TD > <A HREF = "http://lammps.sandia.gov/pictures.html#cg">cg</A></TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-COLVARS</TD><TD > collective variables</TD><TD > Fiorin & Henin & Kohlmeyer (3)</TD><TD > <A HREF = "fix_colvars.html">fix colvars</A></TD><TD > USER/colvars</TD><TD > <A HREF = "colvars">colvars</A></TD><TD > lib/colvars</TD></TR>
-<TR ALIGN="center"><TD >USER-CUDA</TD><TD > NVIDIA GPU styles</TD><TD > Christian Trott (U Tech Ilmenau)</TD><TD > <A HREF = "Section_accelerate.html#acc_4">Section accelerate</A></TD><TD > USER/cuda</TD><TD > -</TD><TD > lib/cuda</TD></TR>
+<TR ALIGN="center"><TD >USER-CUDA</TD><TD > NVIDIA GPU styles</TD><TD > Christian Trott (U Tech Ilmenau)</TD><TD > <A HREF = "Section_accelerate.html#acc_7">Section accelerate</A></TD><TD > USER/cuda</TD><TD > -</TD><TD > lib/cuda</TD></TR>
 <TR ALIGN="center"><TD >USER-EFF</TD><TD > electron force field</TD><TD > Andres Jaramillo-Botero (Caltech)</TD><TD > <A HREF = "pair_eff.html">pair_style eff/cut</A></TD><TD > USER/eff</TD><TD > <A HREF = "http://lammps.sandia.gov/movies.html#eff">eff</A></TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-EWALDN</TD><TD > Ewald for 1/R^n</TD><TD > Pieter in' t Veld (BASF)</TD><TD > <A HREF = "kspace_style.html">kspace_style</A></TD><TD > -</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-MOLFILE</TD><TD > <A HREF = "http://www.ks.uiuc.edu/Research/vmd">VMD</A> molfile plug-ins</TD><TD > Axel Kohlmeyer (Temple U)</TD><TD > <A HREF = "dump_molfile.html">dump molfile</A></TD><TD > -</TD><TD > -</TD><TD > lib/molfile</TD></TR>
-<TR ALIGN="center"><TD >USER-OMP</TD><TD > OpenMP threaded styles</TD><TD > Axel Kohlmeyer (Temple U)</TD><TD > <A HREF = "Section_accelerate.html#acc_2">Section accelerate</A></TD><TD > -</TD><TD > -</TD><TD > -</TD></TR>
+<TR ALIGN="center"><TD >USER-OMP</TD><TD > OpenMP threaded styles</TD><TD > Axel Kohlmeyer (Temple U)</TD><TD > <A HREF = "Section_accelerate.html#acc_5">Section accelerate</A></TD><TD > -</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-REAXC</TD><TD > C version of ReaxFF</TD><TD > Metin Aktulga (LBNL)</TD><TD > <A HREF = "pair_reax_c.html">pair_style reaxc</A></TD><TD > reax</TD><TD > -</TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >USER-SPH</TD><TD > smoothed particle hydrodynamics</TD><TD > Georg Ganzenmuller (EMI)</TD><TD > <A HREF = "USER/sph/SPH_LAMMPS_userguide.pdf">userguide.pdf</A></TD><TD > USER/sph</TD><TD > <A HREF = "http://lammps.sandia.gov/movies.html#sph">sph</A></TD><TD > -</TD></TR>
 <TR ALIGN="center"><TD >
@ -304,7 +304,7 @@ GPUs.
 </P>
 <P>See this section of the manual to get started:
 </P>
-<P><A HREF = "Section_accelerate.html#acc_4">Section_accelerate</A>
+<P><A HREF = "Section_accelerate.html#acc_7">Section_accelerate</A>
 </P>
 <P>There are example scripts for using this package in
 examples/USER/cuda.
@ -403,7 +403,7 @@ styles, and fix styles.
 </P>
 <P>See this section of the manual to get started:
 </P>
-<P><A HREF = "Section_accelerate.html#acc_2">Section_accelerate</A>
+<P><A HREF = "Section_accelerate.html#acc_5">Section_accelerate</A>
 </P>
 <P>The person who created this package is Axel Kohlmeyer at Temple U
 (akohlmey at gmail.com).  Contact him directly if you have questions.
--- a/doc/Section_packages.txt
+++ b/doc/Section_packages.txt
@ -44,7 +44,7 @@ CLASS2, class 2 force fields, -, "pair_style lj/class2"_pair_class2.html, -, -
 COLLOID, colloidal particles, -, "atom_style colloid"_atom_style.html, colloid, -
 DIPOLE, point dipole particles, -, "pair_style dipole/cut"_pair_dipole.html, dipole, -
 FLD, Fast Lubrication Dynamics, Kumar & Bybee & Higdon (1), "pair_style lubricateU"_pair_lubricateU.html, -, -
-GPU, GPU-enabled potentials, Mike Brown (ORNL), "Section accelerate"_Section_accelerate.html#acc_3, gpu, lib/gpu
+GPU, GPU-enabled potentials, Mike Brown (ORNL), "Section accelerate"_Section_accelerate.html#acc_6, gpu, lib/gpu
 GRANULAR, granular systems, -, "Section_howto"_Section_howto.html#howto_6, pour, -
 KIM, openKIM potentials, Smirichinski & Elliot & Tadmor (3), "pair_style kim"_pair_kim.html, kim, lib/kim
 KSPACE, long-range Coulombic solvers, -, "kspace_style"_kspace_style.html, peptide, -
@ -52,7 +52,7 @@ MANYBODY, many-body potentials, -, "pair_style tersoff"_pair_tersoff.html, shear
 MEAM, modified EAM potential, Greg Wagner (Sandia), "pair_style meam"_pair_meam.html, meam, lib/meam
 MC, Monte Carlo options, -, "fix gcmc"_fix_gcmc.html, -, -
 MOLECULE, molecular system force fields, -, "Section_howto"_Section_howto.html#howto_3, peptide, -
-OPT, optimized pair potentials, Fischer & Richie & Natoli (2), "Section accelerate"_Section_accelerate.html#acc_1, -, -
+OPT, optimized pair potentials, Fischer & Richie & Natoli (2), "Section accelerate"_Section_accelerate.html#acc_4, -, -
 PERI, Peridynamics models, Mike Parks (Sandia), "pair_style peri"_pair_peri.html, peri, -
 POEMS, coupled rigid body motion, Rudra Mukherjee (JPL), "fix poems"_fix_poems.html, rigid, lib/poems
 REAX, ReaxFF potential, Aidan Thompson (Sandia), "pair_style reax"_pair_reax.html, reax,  lib/reax
@ -98,11 +98,11 @@ USER-ATC, atom-to-continuum coupling, Jones & Templeton & Zimmerman (2), "fix at
 USER-AWPMD, wave-packet MD, Ilya Valuev (JIHT), "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, -, lib/awpmd
 USER-CG-CMM, coarse-graining model, Axel Kohlmeyer (Temple U), "pair_style lj/sdk"_pair_sdk.html, USER/cg-cmm, "cg"_cg, -
 USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (3), "fix colvars"_fix_colvars.html, USER/colvars, "colvars"_colvars, lib/colvars
-USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_4, USER/cuda, -, lib/cuda
+USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_7, USER/cuda, -, lib/cuda
 USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, -
 USER-EWALDN, Ewald for 1/R^n, Pieter in' t Veld (BASF), "kspace_style"_kspace_style.html, -, -, -
 USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, lib/molfile
-USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section accelerate"_Section_accelerate.html#acc_2, -, -, -
+USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section accelerate"_Section_accelerate.html#acc_5, -, -, -
 USER-REAXC, C version of ReaxFF, Metin Aktulga (LBNL), "pair_style reaxc"_pair_reax_c.html, reax, -, -
 USER-SPH, smoothed particle hydrodynamics, Georg Ganzenmuller (EMI), "userguide.pdf"_USER/sph/SPH_LAMMPS_userguide.pdf, USER/sph, "sph"_sph, -
 :tb(ea=c)
@ -291,7 +291,7 @@ GPUs.
 See this section of the manual to get started:
-"Section_accelerate"_Section_accelerate.html#acc_4
+"Section_accelerate"_Section_accelerate.html#acc_7
 There are example scripts for using this package in
 examples/USER/cuda.
@ -390,7 +390,7 @@ styles, and fix styles.
 See this section of the manual to get started:
-"Section_accelerate"_Section_accelerate.html#acc_2
+"Section_accelerate"_Section_accelerate.html#acc_5
 The person who created this package is Axel Kohlmeyer at Temple U
 (akohlmey at gmail.com).  Contact him directly if you have questions.
--- a/doc/compute.html
+++ b/doc/compute.html
@ -181,6 +181,7 @@ available in LAMMPS:
 <LI><A HREF = "compute_displace_atom.html">displace/atom</A> - displacement of each atom
 <LI><A HREF = "compute_erotate_asphere.html">erotate/asphere</A> - rotational energy of aspherical particles
 <LI><A HREF = "compute_erotate_sphere.html">erotate/sphere</A> - rotational energy of spherical particles
+<LI><A HREF = "compute_erotate_sphere.html">erotate/sphere/atom</A> - rotational energy for each spherical particle
 <LI><A HREF = "compute_event_displace.html">event/displace</A> - detect event on atom displacement
 <LI><A HREF = "compute_group_group.html">group/group</A> - energy/force between two groups of atoms
 <LI><A HREF = "compute_gyration.html">gyration</A> - radius of gyration of group of atoms
--- a/doc/compute.txt
+++ b/doc/compute.txt
@ -176,6 +176,7 @@ available in LAMMPS:
 "displace/atom"_compute_displace_atom.html - displacement of each atom
 "erotate/asphere"_compute_erotate_asphere.html - rotational energy of aspherical particles
 "erotate/sphere"_compute_erotate_sphere.html - rotational energy of spherical particles
+"erotate/sphere/atom"_compute_erotate_sphere.html - rotational energy for each spherical particle
 "event/displace"_compute_event_displace.html - detect event on atom displacement
 "group/group"_compute_group_group.html - energy/force between two groups of atoms
 "gyration"_compute_gyration.html - radius of gyration of group of atoms