Merge branch 'master' into lammps-icms

Resolved Conflicts:
	lib/meam/Makefile.gfortran
	lib/poems/Makefile.g++
	lib/reax/Makefile.gfortran
	python/lammps.py
	src/USER-CUDA/cuda.cpp
Author: Axel Kohlmeyer
Date:   2012-08-15 08:38:04 -04:00
67 changed files with 3370 additions and 1354 deletions


@ -14,8 +14,8 @@
<P>This section describes how to build and use LAMMPS via a Python <P>This section describes how to build and use LAMMPS via a Python
interface. interface.
</P> </P>
<UL><LI>11.1 <A HREF = "#py_1">Setting necessary environment variables</A> <UL><LI>11.1 <A HREF = "#py_1">Building LAMMPS as a shared library</A>
<LI>11.2 <A HREF = "#py_2">Building LAMMPS as a shared library</A> <LI>11.2 <A HREF = "#py_2">Installing the Python wrapper into Python</A>
<LI>11.3 <A HREF = "#py_3">Extending Python with MPI to run in parallel</A> <LI>11.3 <A HREF = "#py_3">Extending Python with MPI to run in parallel</A>
<LI>11.4 <A HREF = "#py_4">Testing the Python-LAMMPS interface</A> <LI>11.4 <A HREF = "#py_4">Testing the Python-LAMMPS interface</A>
<LI>11.5 <A HREF = "#py_5">Using LAMMPS from Python</A> <LI>11.5 <A HREF = "#py_5">Using LAMMPS from Python</A>
@ -76,109 +76,97 @@ check which version of Python you have installed, by simply typing
<HR> <HR>
<A NAME = "py_1"></A><H4>11.1 Setting necessary environment variables <A NAME = "py_1"></A><H4>11.1 Building LAMMPS as a shared library
</H4>
<P>For Python to use the LAMMPS interface, it needs to find two files.
The paths to these files need to be added to two environment variables
that Python checks.
</P>
<P>The first is the environment variable PYTHONPATH. It needs
to include the directory where the python/lammps.py file is.
</P>
<P>For the csh or tcsh shells, you could add something like this to your
~/.cshrc file:
</P>
<PRE>setenv PYTHONPATH $<I>PYTHONPATH</I>:/home/sjplimp/lammps/python
</PRE>
<P>The second is the environment variable LD_LIBRARY_PATH, which is used
by the operating system to find dynamic shared libraries when it loads
them. It needs to include the directory where the shared LAMMPS
library will be. Normally this is the LAMMPS src dir, as explained in
the following section.
</P>
<P>For the csh or tcsh shells, you could add something like this to your
~/.cshrc file:
</P>
<PRE>setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/lammps/src
</PRE>
<P>As discussed below, if your LAMMPS build includes auxiliary libraries,
they must also be available as shared libraries for Python to
successfully load LAMMPS. If they are not in default places where the
operating system can find them, then you also have to add their paths
to the LD_LIBRARY_PATH environment variable.
</P>
<P>For example, if you are using the dummy MPI library provided in
src/STUBS, you need to add something like this to your ~/.cshrc file:
</P>
<PRE>setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/lammps/src/STUBS
</PRE>
<P>If you are using the LAMMPS USER-ATC package, you need to add
something like this to your ~/.cshrc file:
</P>
<PRE>setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/lammps/lib/atc
</PRE>
<HR>
<A NAME = "py_2"></A><H4>11.2 Building LAMMPS as a shared library
</H4> </H4>
<P>Instructions on how to build LAMMPS as a shared library are given in <P>Instructions on how to build LAMMPS as a shared library are given in
<A HREF = "Section_start.html#start_5">Section_start 5</A>. A shared library is one <A HREF = "Section_start.html#start_5">Section_start 5</A>. A shared library is one
that is dynamically loadable, which is what Python requires. On Linux that is dynamically loadable, which is what Python requires. On Linux
this is a library file that ends in ".so", not ".a". this is a library file that ends in ".so", not ".a".
</P> </P>
<P>>From the src directory, type <P>From the src directory, type
</P> </P>
<P>make makeshlib <PRE>make makeshlib
make -f Makefile.shlib foo make -f Makefile.shlib foo
</P>
<P>where foo is the machine target name, such as linux or g++ or serial.
This should create the file liblmp_foo.so in the src directory, as
well as a soft link liblmp.so which is what the Python wrapper will
load by default. If you are building multiple machine versions of the
shared library, the soft link is always set to the most recently built
version.
</P>
<P>Note that as discussed in below, a LAMMPS build may depend on several
auxiliary libraries, which are specified in your low-level
src/Makefile.foo file. For example, an MPI library, the FFTW library,
a JPEG library, etc. Depending on what LAMMPS packages you have
installed, the build may also require additional libraries from the
lib directories, such as lib/atc/libatc.so or lib/reax/libreax.so.
</P>
<P>You must insure that each of these libraries exist in shared library
form (*.so file for Linux systems), or either the LAMMPS shared
library build or the Python load of the library will fail. For the
load to be successful all the shared libraries must also be in
directories that the operating system checks. See the discussion in
the preceding section about the LD_LIBRARY_PATH environment variable
for how to insure this.
</P>
<P>Note that some system libraries, such as MPI, if you installed it
yourself, may not be built by default as shared libraries. The build
instructions for the library should tell you how to do this.
</P>
<P>For example, here is how to build and install the <A HREF = "http://www-unix.mcs.anl.gov/mpi">MPICH
library</A>, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
</P>
<PRE>./configure --enable-shared
make
make install
</PRE> </PRE>
<P>You may need to use "sudo make install" in place of the last line if <P>where foo is the machine target name, such as linux or g++ or serial.
you do not have write priveleges for /usr/local/lib. The end result This should create the file liblammps_foo.so in the src directory, as
should be the file /usr/local/lib/libmpich.so. well as a soft link liblammps.so, which is what the Python wrapper will
load by default. Note that if you are building multiple machine
versions of the shared library, the soft link is always set to the
most recently built version.
</P> </P>
<P>Note that not all of the auxiliary libraries provided with LAMMPS have <P>If this fails, see <A HREF = "Section_start.html#start_5">Section_start 5</A> for
shared-library Makefiles in their lib directories. Typically this more details, especially if your LAMMPS build uses auxiliary libraries
simply requires a Makefile.foo that adds a -fPIC switch when files are like MPI or FFTW which may not be built as shared libraries on your
compiled and a "-fPIC -shared" switches when the library is linked system.
with a C++ (or Fortran) compiler, as well as an output target that </P>
ends in ".so", like libatc.o. As we or others create and contribute <HR>
these Makefiles, we will add them to the LAMMPS distribution.
<A NAME = "py_2"></A><H4>11.2 Installing the Python wrapper into Python
</H4>
<P>For Python to invoke LAMMPS, there are 2 files it needs to know about:
</P>
<UL><LI>python/lammps.py
<LI>src/liblammps.so
</UL>
<P>Lammps.py is the Python wrapper on the LAMMPS library interface.
Liblammps.so is the shared LAMMPS library that Python loads, as
described above.
</P>
<P>You can insure Python can find these files in one of two ways:
</P>
<UL><LI>set two environment variables
<LI>run the python/install.py script
</UL>
<P>If you set the paths to these files as environment variables, you only
have to do it once. For the csh or tcsh shells, add something like
this to your ~/.cshrc file, one line for each of the two files:
</P>
<PRE>setenv PYTHONPATH $<I>PYTHONPATH</I>:/home/sjplimp/lammps/python
setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/lammps/src
</PRE>
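<P>As a quick check that these settings took effect, you can ask Python
whether it can locate both files. This is a minimal sketch, not part of
the LAMMPS distribution; it only verifies that the wrapper module and
the shared library can be found:
</P>
<PRE>import ctypes

# lammps.py must be in a directory listed in PYTHONPATH
import lammps
print("found wrapper: %s" % lammps.__file__)

# liblammps.so must be in a directory listed in LD_LIBRARY_PATH
ctypes.CDLL("liblammps.so")
print("loaded liblammps.so")
</PRE>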
<P>If you use the python/install.py script, you need to invoke it every
time you rebuild LAMMPS (as a shared library) or make changes to the
python/lammps.py file.
</P>
<P>You can invoke install.py from the python directory as
</P>
<PRE>% python install.py <B>libdir</B> <B>pydir</B>
</PRE>
<P>The optional libdir is where to copy the LAMMPS shared library to; the
default is /usr/local/lib. The optional pydir is where to copy the
lammps.py file to; the default is the site-packages directory of the
version of Python that is running the install script.
</P>
<P>Note that libdir must be a location that is in your default
LD_LIBRARY_PATH, like /usr/local/lib or /usr/lib. And pydir must be a
location that Python looks in by default for imported modules, like
its site-packages dir. If you want to copy these files to
non-standard locations, such as within your own user space, you will
need to set your PYTHONPATH and LD_LIBRARY_PATH environment variables
accordingly, as above.
</P>
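<P>If you are unsure which site-packages directory a particular Python
interpreter treats as its default (i.e. the default pydir above), you
can query it directly. This is only an illustrative check; install.py
may determine the location in its own way:
</P>
<PRE>from distutils import sysconfig

# default third-party module directory of the running interpreter
print(sysconfig.get_python_lib())
</PRE>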
<P>If the install.py script does not allow you to copy files into system
directories, prefix the python command with "sudo". If you do this,
make sure that the Python that root runs is the same as the Python you
run. E.g. you may need to do something like
</P>
<PRE>% sudo /usr/local/bin/python install.py <B>libdir</B> <B>pydir</B>
</PRE>
<P>You can also invoke install.py from the make command in the src
directory as
</P>
<PRE>% make install-python
</PRE>
<P>In this mode you cannot append optional arguments. Again, you may
need to prefix this with "sudo". In this mode you cannot control
which Python is invoked by root.
</P>
<P>Note that if you want Python to be able to load different versions of
the LAMMPS shared library (see <A HREF = "#py_5">this section</A> below), you will
need to manually copy files like liblammps_g++.so into the appropriate
system directory. This is not needed if you set the LD_LIBRARY_PATH
environment variable as described above.
</P> </P>
<HR> <HR>
@ -197,13 +185,12 @@ as a library and allow MPI functions to be called from Python.
<LI><A HREF = "http://code.google.com/p/maroonmpi/">maroonmpi</A> <LI><A HREF = "http://code.google.com/p/maroonmpi/">maroonmpi</A>
<LI><A HREF = "http://code.google.com/p/mpi4py/">mpi4py</A> <LI><A HREF = "http://code.google.com/p/mpi4py/">mpi4py</A>
<LI><A HREF = "http://nbcr.sdsc.edu/forum/viewtopic.php?t=89&sid=c997fefc3933bd66204875b436940f16">myMPI</A> <LI><A HREF = "http://nbcr.sdsc.edu/forum/viewtopic.php?t=89&sid=c997fefc3933bd66204875b436940f16">myMPI</A>
<LI><A HREF = "http://datamining.anu.edu.au/~ole/pypar">Pypar</A> <LI><A HREF = "http://code.google.com/p/pypar">Pypar</A>
</UL> </UL>
<P>All of these except pyMPI work by wrapping the MPI library (which must <P>All of these except pyMPI work by wrapping the MPI library and
be available on your system as a shared library, as discussed above), exposing (some portion of) its interface to your Python script. This
and exposing (some portion of) its interface to your Python script. means Python cannot be used interactively in parallel, since they do
This means Python cannot be used interactively in parallel, since they not address the issue of interactive input to multiple instances of
do not address the issue of interactive input to multiple instances of
Python running on different processors. The one exception is pyMPI, Python running on different processors. The one exception is pyMPI,
which alters the Python interpreter to address this issue, and (I which alters the Python interpreter to address this issue, and (I
believe) creates a new alternate executable (in place of "python" believe) creates a new alternate executable (in place of "python"
@ -233,17 +220,17 @@ sudo python setup.py install
<P>The "sudo" is only needed if required to copy Numpy files into your <P>The "sudo" is only needed if required to copy Numpy files into your
Python distribution's site-packages directory. Python distribution's site-packages directory.
</P> </P>
<P>To install Pypar (version pypar-2.1.0_66 as of April 2009), unpack it <P>To install Pypar (version pypar-2.1.4_94 as of Aug 2012), unpack it
and from its "source" directory, type and from its "source" directory, type
</P> </P>
<PRE>python setup.py build <PRE>python setup.py build
sudo python setup.py install sudo python setup.py install
</PRE> </PRE>
<P>Again, the "sudo" is only needed if required to copy PyPar files into <P>Again, the "sudo" is only needed if required to copy Pypar files into
your Python distribution's site-packages directory. your Python distribution's site-packages directory.
</P> </P>
<P>If you have successfully installed Pypar, you should be able to run <P>If you have successfully installed Pypar, you should be able to run
python serially and type Python and type
</P> </P>
<PRE>import pypar <PRE>import pypar
</PRE> </PRE>
@ -259,6 +246,19 @@ print "Proc %d out of %d procs" % (pypar.rank(),pypar.size())
</PRE> </PRE>
<P>and see one line of output for each processor you run on. <P>and see one line of output for each processor you run on.
</P> </P>
<P>IMPORTANT NOTE: To use Pypar and LAMMPS in parallel from Python, you
must insure both are using the same version of MPI. If you only have
one MPI installed on your system, this is not an issue, but it can be
if you have multiple MPIs. Your LAMMPS build is explicit about which
MPI it is using, since you specify the details in your low-level
src/MAKE/Makefile.foo file. Pypar uses the "mpicc" command to find
information about the MPI it uses to build against. And it tries to
load "libmpi.so" from the LD_LIBRARY_PATH. This may or may not find
the MPI library that LAMMPS is using. If you have problems running
both Pypar and LAMMPS together, this is an issue you may need to
address, e.g. by moving other MPI installations so that Pypar finds
the right one.
</P>
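<P>One simple way to see whether Pypar and the LAMMPS shared library
agree on an MPI installation is to load both in a single script and run
it under mpirun. This is a hedged sketch, assuming both packages are
already installed; any trivial LAMMPS command will do:
</P>
<PRE>import pypar
from lammps import lammps

lmp = lammps()            # loads liblammps.so, which pulls in LAMMPS's MPI
lmp.command("units lj")   # issue a trivial LAMMPS command
print("Proc %d out of %d procs" % (pypar.rank(),pypar.size()))

lmp.close()
pypar.finalize()
</PRE>
<P>If the two packages were built against different MPI libraries, a
script like this will often fail when the libraries are loaded or when
MPI is initialized, rather than printing one line per processor.
</P>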
<HR> <HR>
<A NAME = "py_4"></A><H4>11.4 Testing the Python-LAMMPS interface <A NAME = "py_4"></A><H4>11.4 Testing the Python-LAMMPS interface
@ -272,24 +272,17 @@ and type:
<P>If you get no errors, you're ready to use LAMMPS from Python. <P>If you get no errors, you're ready to use LAMMPS from Python.
If the load fails, the most common error to see is If the load fails, the most common error to see is
</P> </P>
<P>"CDLL: asdfasdfasdf" <PRE>OSError: Could not load LAMMPS dynamic library
</P> </PRE>
<P>which means Python was unable to load the LAMMPS shared library. This <P>which means Python was unable to load the LAMMPS shared library. This
can occur if it can't find the LAMMPS library; see the environment typically occurs if the system can't find the LAMMPS shared library
variable discussion <A HREF = "#python_1">above</A>. Or if it can't find one of the or one of the auxiliary shared libraries it depends on.
auxiliary libraries that was specified in the LAMMPS build, in a
shared dynamic library format. This includes all libraries needed by
main LAMMPS (e.g. MPI or FFTW or JPEG), system libraries needed by
main LAMMPS (e.g. extra libs needed by MPI), or packages you have
installed that require libraries provided with LAMMPS (e.g. the
USER-ATC package require lib/atc/libatc.so) or system libraries
(e.g. BLAS or Fortran-to-C libraries) listed in the
lib/package/Makefile.lammps file. Again, all of these must be
available as shared libraries, or the Python load will fail.
</P> </P>
<P>Python (actually the operating system) isn't verbose about telling you <P>Python (actually the operating system) isn't verbose about telling you
why the load failed, so go through the steps above and in why the load failed, so carefully go through the steps above regarding
<A HREF = "Section_start.html#start_5">Section_start 5</A> carefully. environment variables, and the instructions in <A HREF = "Section_start.html#start_5">Section_start
5</A> about building a shared library and
about setting the LD_LIBRARY_PATH environment variable.
</P> </P>
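<P>When the plain import gives little detail, loading the library by an
explicit path sometimes produces a more informative message from the
dynamic linker, e.g. naming the auxiliary shared library that cannot be
found. This is only a diagnostic sketch; adjust the path to your own
build:
</P>
<PRE>from ctypes import CDLL

# loading by full path sidesteps the search-path question and reports
# which dependent shared library, if any, is missing
CDLL("/home/sjplimp/lammps/src/liblammps.so")
</PRE>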
<H5><B>Test LAMMPS and Python in serial:</B> <H5><B>Test LAMMPS and Python in serial:</B>
</H5> </H5>
@ -334,10 +327,10 @@ pypar.finalize()
<P>Note that if you leave out the 3 lines from test.py that specify Pypar <P>Note that if you leave out the 3 lines from test.py that specify Pypar
commands you will instantiate and run LAMMPS independently on each of commands you will instantiate and run LAMMPS independently on each of
the P processors specified in the mpirun command. In this case you the P processors specified in the mpirun command. In this case you
should get 4 sets of output, each showing that a run was made on a should get 4 sets of output, each showing that a LAMMPS run was made
single processor, instead of one set of output showing that it ran on on a single processor, instead of one set of output showing that
4 processors. If the 1-processor outputs occur, it means that Pypar LAMMPS ran on 4 processors. If the 1-processor outputs occur, it
is not working correctly. means that Pypar is not working correctly.
</P> </P>
<P>Also note that once you import the PyPar module, Pypar initializes MPI <P>Also note that once you import the PyPar module, Pypar initializes MPI
for you, and you can use MPI calls directly in your Python script, as for you, and you can use MPI calls directly in your Python script, as
@ -345,6 +338,8 @@ described in the Pypar documentation. The last line of your Python
script should be pypar.finalize(), to insure MPI is shut down script should be pypar.finalize(), to insure MPI is shut down
correctly. correctly.
</P> </P>
<H5><B>Running Python scripts:</B>
</H5>
<P>Note that any Python script (not just for LAMMPS) can be invoked in <P>Note that any Python script (not just for LAMMPS) can be invoked in
one of several ways: one of several ways:
</P> </P>
@ -379,25 +374,18 @@ Python on a single processor, not in parallel.
the source code for which is in python/lammps.py, which creates a the source code for which is in python/lammps.py, which creates a
"lammps" object, with a set of methods that can be invoked on that "lammps" object, with a set of methods that can be invoked on that
object. The sample Python code below assumes you have first imported object. The sample Python code below assumes you have first imported
the "lammps" module in your Python script. You can also include its the "lammps" module in your Python script, as follows:
settings as follows, which are useful in test return values from some
of the methods described below:
</P> </P>
<PRE>from lammps import lammps <PRE>from lammps import lammps
from lammps import LMPINT as INT
from lammps import LMPDOUBLE as DOUBLE
from lammps import LMPIPTR as IPTR
from lammps import LMPDPTR as DPTR
from lammps import LMPDPTRPTR as DPTRPTR
</PRE> </PRE>
<P>These are the methods defined by the lammps module. If you look <P>These are the methods defined by the lammps module. If you look
at the file src/library.cpp you will see that they correspond at the file src/library.cpp you will see that they correspond
one-to-one with calls you can make to the LAMMPS library from a C++ or one-to-one with calls you can make to the LAMMPS library from a C++ or
C or Fortran program. C or Fortran program.
</P> </P>
<PRE>lmp = lammps() # create a LAMMPS object using the default liblmp.so library <PRE>lmp = lammps() # create a LAMMPS object using the default liblammps.so library
lmp = lammps("g++") # create a LAMMPS object using the liblmp_g++.so library lmp = lammps("g++") # create a LAMMPS object using the liblammps_g++.so library
lmp = lammps("",list) # ditto, with command-line args, list = ["-echo","screen"] lmp = lammps("",list) # ditto, with command-line args, e.g. list = ["-echo","screen"]
lmp = lammps("g++",list) lmp = lammps("g++",list)
</PRE> </PRE>
<PRE>lmp.close() # destroy a LAMMPS object <PRE>lmp.close() # destroy a LAMMPS object
@ -407,11 +395,15 @@ lmp.command(cmd) # invoke a single LAMMPS command, cmd = "run 100"
</PRE> </PRE>
<PRE>xlo = lmp.extract_global(name,type) # extract a global quantity <PRE>xlo = lmp.extract_global(name,type) # extract a global quantity
# name = "boxxlo", "nlocal", etc # name = "boxxlo", "nlocal", etc
# type = INT or DOUBLE # type = 0 = int
# 1 = double
</PRE> </PRE>
<PRE>coords = lmp.extract_atom(name,type) # extract a per-atom quantity <PRE>coords = lmp.extract_atom(name,type) # extract a per-atom quantity
# name = "x", "type", etc # name = "x", "type", etc
# type = IPTR or DPTR or DPTRPTR # type = 0 = vector of ints
# 1 = array of ints
# 2 = vector of doubles
# 3 = array of doubles
</PRE> </PRE>
<PRE>eng = lmp.extract_compute(id,style,type) # extract value(s) from a compute <PRE>eng = lmp.extract_compute(id,style,type) # extract value(s) from a compute
v3 = lmp.extract_fix(id,style,type,i,j) # extract value(s) from a fix v3 = lmp.extract_fix(id,style,type,i,j) # extract value(s) from a fix
@ -431,18 +423,23 @@ v3 = lmp.extract_fix(id,style,type,i,j) # extract value(s) from a fix
# 1 = atom-style variable # 1 = atom-style variable
</PRE> </PRE>
<PRE>natoms = lmp.get_natoms() # total # of atoms as int <PRE>natoms = lmp.get_natoms() # total # of atoms as int
x = lmp.get_coords() # return coords of all atoms in x data = lmp.gather_atoms(name,type,count) # return atom attribute of all atoms gathered into data, ordered by atom ID
lmp.put_coords(x) # set all atom coords via x # name = "x", "charge", "type", etc
# count = # of per-atom values, 1 or 3, etc
lmp.scatter_atoms(name,type,count,data) # scatter atom attribute of all atoms from data, ordered by atom ID
# name = "x", "charge", "type", etc
# count = # of per-atom values, 1 or 3, etc
</PRE> </PRE>
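<P>Taken together, a minimal driver script might look like the
following. This is only a sketch; "in.demo" stands for whatever LAMMPS
input script you have available:
</P>
<PRE>from lammps import lammps

lmp = lammps()           # uses the default liblammps.so
lmp.file("in.demo")      # run an entire LAMMPS input script
lmp.command("run 100")   # then issue further commands one at a time

natoms = lmp.get_natoms()
print("total atoms: %d" % natoms)

lmp.close()
</PRE>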
<HR> <HR>
<P>IMPORTANT NOTE: Currently, the creation of a LAMMPS object does not <P>IMPORTANT NOTE: Currently, the creation of a LAMMPS object from within
take an MPI communicator as an argument. There should be a way to do lammps.py does not take an MPI communicator as an argument. There
this, so that the LAMMPS instance runs on a subset of processors if should be a way to do this, so that the LAMMPS instance runs on a
desired, but I don't know how to do it from Pypar. So for now, it subset of processors if desired, but I don't know how to do it from
runs on MPI_COMM_WORLD, which is all the processors. If someone Pypar. So for now, it runs with MPI_COMM_WORLD, which is all the
figures out how to do this with one or more of the Python wrappers for processors. If someone figures out how to do this with one or more of
MPI, like Pypar, please let us know and we will amend these doc pages. the Python wrappers for MPI, like Pypar, please let us know and we
will amend these doc pages.
</P> </P>
<P>Note that you can create multiple LAMMPS objects in your Python <P>Note that you can create multiple LAMMPS objects in your Python
script, and coordinate and run multiple simulations, e.g. script, and coordinate and run multiple simulations, e.g.
@ -470,8 +467,8 @@ returned, which you can use via normal Python subscripting. See the
extract() method in the src/atom.cpp file for a list of valid names. extract() method in the src/atom.cpp file for a list of valid names.
Again, new names could easily be added. A pointer to a vector of Again, new names could easily be added. A pointer to a vector of
doubles or integers, or a pointer to an array of doubles (double **) doubles or integers, or a pointer to an array of doubles (double **)
is returned. You need to specify the appropriate data type via the or integers (int **) is returned. You need to specify the appropriate
type argument. data type via the type argument.
</P> </P>
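<P>For instance, using the numeric type codes listed above (a brief
sketch, which assumes a system has already been defined in the lmp
object and that this processor owns at least one atom):
</P>
<PRE>boxxlo = lmp.extract_global("boxxlo",1)   # 1 = double
nlocal = lmp.extract_global("nlocal",0)   # 0 = int

x = lmp.extract_atom("x",3)       # 3 = array of doubles, index as x[i][j]
t = lmp.extract_atom("type",0)    # 0 = vector of ints, index as t[i]

print("first local atom: type %d at x = %g" % (t[0],x[0][0]))
</PRE>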
<P>For extract_compute() and extract_fix(), the global, per-atom, or <P>For extract_compute() and extract_fix(), the global, per-atom, or
local data calculated by the compute or fix can be accessed. What is local data calculated by the compute or fix can be accessed. What is
@ -499,58 +496,57 @@ Python subscripting. The values will be zero for atoms not in the
specified group. specified group.
</P> </P>
<P>The get_natoms() method returns the total number of atoms in the <P>The get_natoms() method returns the total number of atoms in the
simulation, as an int. Note that extract_global("natoms") returns the simulation, as an int.
same value, but as a double, which is the way LAMMPS stores it to
allow for systems with more atoms than can be stored in an int (> 2
billion).
</P> </P>
<P>The get_coords() method returns an ctypes vector of doubles of length <P>The gather_atoms() method returns a ctypes vector of ints or doubles
3*natoms, for the coordinates of all the atoms in the simulation, as specified by type, of length count*natoms, for the property of all
ordered by x,y,z and then by atom ID (see code for put_coords() the atoms in the simulation specified by name, ordered by count and
below). The array can be used via normal Python subscripting. If then by atom ID. The vector can be used via normal Python
atom IDs are not consecutively ordered within LAMMPS, a None is subscripting. If atom IDs are not consecutively ordered within
returned as indication of an error. LAMMPS, a None is returned as indication of an error.
</P> </P>
<P>Note that the data structure get_coords() returns is different from <P>Note that the data structure gather_atoms("x") returns is different
the data structure returned by extract_atom("x") in four ways. (1) from the data structure returned by extract_atom("x") in four ways.
Get_coords() returns a vector which you index as x[i]; (1) Gather_atoms() returns a vector which you index as x[i];
extract_atom() returns an array which you index as x[i][j]. (2) extract_atom() returns an array which you index as x[i][j]. (2)
Get_coords() orders the atoms by atom ID while extract_atom() does Gather_atoms() orders the atoms by atom ID while extract_atom() does
not. (3) Get_coords() returns a list of all atoms in the simulation; not. (3) Gather_atoms() returns a list of all atoms in the
extract_atoms() returns just the atoms local to each processor. (4) simulation; extract_atoms() returns just the atoms local to each
Finally, the get_coords() data structure is a copy of the atom coords processor. (4) Finally, the gather_atoms() data structure is a copy
stored internally in LAMMPS, whereas extract_atom returns an array of the atom coords stored internally in LAMMPS, whereas extract_atom()
that points directly to the internal data. This means you can change returns an array that effectively points directly to the internal
values inside LAMMPS from Python by assigning new values to the data. This means you can change values inside LAMMPS from Python by
extract_atom() array. To do this with the get_atoms() vector, you assigning new values to the extract_atom() array. To do this with
need to change values in the vector, then invoke the put_coords() the gather_atoms() vector, you need to change values in the vector,
method. then invoke the scatter_atoms() method.
</P> </P>
<P>The put_coords() method takes a vector of coordinates for all atoms in <P>The scatter_atoms() method takes a vector of ints or doubles as
the simulation, assumed to be ordered by x,y,z and then by atom ID, specified by type, of length count*natoms, for the property of all the
and uses the values to overwrite the corresponding coordinates for atoms in the simulation specified by name, ordered by count and then
each atom inside LAMMPS. This requires LAMMPS to have its "map" by atom ID. It uses the vector of data to overwrite the corresponding
option enabled; see the <A HREF = "atom_modify.html">atom_modify</A> command for properties for each atom inside LAMMPS. This requires LAMMPS to have
details. If it is not or if atom IDs are not consecutively ordered, its "map" option enabled; see the <A HREF = "atom_modify.html">atom_modify</A>
no coordinates are reset, command for details. If it is not, or if atom IDs are not
consecutively ordered, no coordinates are reset.
</P> </P>
<P>The array of coordinates passed to put_coords() must be a ctypes <P>The array of coordinates passed to scatter_atoms() must be a ctypes
vector of doubles, allocated and initialized something like this: vector of ints or doubles, allocated and initialized something like
this:
</P> </P>
<PRE>from ctypes import * <PRE>from ctypes import *
natoms = lmp.get_atoms() natoms = lmp.get_natoms()
n3 = 3*natoms n3 = 3*natoms
x = (c_double*n3)() x = (n3*c_double)()
x<B>0</B> = x coord of atom with ID 1 x<B>0</B> = x coord of atom with ID 1
x<B>1</B> = y coord of atom with ID 1 x<B>1</B> = y coord of atom with ID 1
x<B>2</B> = z coord of atom with ID 1 x<B>2</B> = z coord of atom with ID 1
x<B>3</B> = x coord of atom with ID 2 x<B>3</B> = x coord of atom with ID 2
... ...
x<B>n3-1</B> = z coord of atom with ID natoms x<B>n3-1</B> = z coord of atom with ID natoms
lmp.put_coords(x) lmp.scatter_atoms("x",1,3,x)
</PRE> </PRE>
<P>Alternatively, you can just change values in the vector returned by <P>Alternatively, you can just change values in the vector returned by
get_coords(), since it is a ctypes vector of doubles. gather_atoms("x",1,3), since it is a ctypes vector of doubles.
</P> </P>
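<P>For example, to shift every atom slightly in x and push the new
coordinates back into LAMMPS, following the calls listed above (a
sketch that assumes consecutively ordered atom IDs and the atom_modify
map option, as discussed above):
</P>
<PRE>x = lmp.gather_atoms("x",1,3)     # type 1 = doubles, 3 values per atom
for i in range(lmp.get_natoms()):
    x[3*i] += 0.1                 # displace the x coordinate of each atom
lmp.scatter_atoms("x",1,3,x)      # write the modified coordinates back
</PRE>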
<HR> <HR>


@ -11,8 +11,8 @@
This section describes how to build and use LAMMPS via a Python This section describes how to build and use LAMMPS via a Python
interface. interface.
11.1 "Setting necessary environment variables"_#py_1 11.1 "Building LAMMPS as a shared library"_#py_1
11.2 "Building LAMMPS as a shared library"_#py_2 11.2 "Installing the Python wrapper into Python"_#py_2
11.3 "Extending Python with MPI to run in parallel"_#py_3 11.3 "Extending Python with MPI to run in parallel"_#py_3
11.4 "Testing the Python-LAMMPS interface"_#py_4 11.4 "Testing the Python-LAMMPS interface"_#py_4
11.5 "Using LAMMPS from Python"_#py_5 11.5 "Using LAMMPS from Python"_#py_5
@ -72,109 +72,97 @@ check which version of Python you have installed, by simply typing
:line :line
:line :line
11.1 Setting necessary environment variables :link(py_1),h4 11.1 Building LAMMPS as a shared library :link(py_1),h4
For Python to use the LAMMPS interface, it needs to find two files.
The paths to these files need to be added to two environment variables
that Python checks.
The first is the environment variable PYTHONPATH. It needs
to include the directory where the python/lammps.py file is.
For the csh or tcsh shells, you could add something like this to your
~/.cshrc file:
setenv PYTHONPATH ${PYTHONPATH}:/home/sjplimp/lammps/python :pre
The second is the environment variable LD_LIBRARY_PATH, which is used
by the operating system to find dynamic shared libraries when it loads
them. It needs to include the directory where the shared LAMMPS
library will be. Normally this is the LAMMPS src dir, as explained in
the following section.
For the csh or tcsh shells, you could add something like this to your
~/.cshrc file:
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src :pre
As discussed below, if your LAMMPS build includes auxiliary libraries,
they must also be available as shared libraries for Python to
successfully load LAMMPS. If they are not in default places where the
operating system can find them, then you also have to add their paths
to the LD_LIBRARY_PATH environment variable.
For example, if you are using the dummy MPI library provided in
src/STUBS, you need to add something like this to your ~/.cshrc file:
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src/STUBS :pre
If you are using the LAMMPS USER-ATC package, you need to add
something like this to your ~/.cshrc file:
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/lib/atc :pre
:line
11.2 Building LAMMPS as a shared library :link(py_2),h4
Instructions on how to build LAMMPS as a shared library are given in Instructions on how to build LAMMPS as a shared library are given in
"Section_start 5"_Section_start.html#start_5. A shared library is one "Section_start 5"_Section_start.html#start_5. A shared library is one
that is dynamically loadable, which is what Python requires. On Linux that is dynamically loadable, which is what Python requires. On Linux
this is a library file that ends in ".so", not ".a". this is a library file that ends in ".so", not ".a".
>From the src directory, type From the src directory, type
make makeshlib make makeshlib
make -f Makefile.shlib foo make -f Makefile.shlib foo :pre
where foo is the machine target name, such as linux or g++ or serial. where foo is the machine target name, such as linux or g++ or serial.
This should create the file liblmp_foo.so in the src directory, as This should create the file liblammps_foo.so in the src directory, as
well as a soft link liblmp.so which is what the Python wrapper will well as a soft link liblammps.so, which is what the Python wrapper will
load by default. If you are building multiple machine versions of the load by default. Note that if you are building multiple machine
shared library, the soft link is always set to the most recently built versions of the shared library, the soft link is always set to the
version. most recently built version.
Note that as discussed in below, a LAMMPS build may depend on several If this fails, see "Section_start 5"_Section_start.html#start_5 for
auxiliary libraries, which are specified in your low-level more details, especially if your LAMMPS build uses auxiliary libraries
src/Makefile.foo file. For example, an MPI library, the FFTW library, like MPI or FFTW which may not be built as shared libraries on your
a JPEG library, etc. Depending on what LAMMPS packages you have system.
installed, the build may also require additional libraries from the
lib directories, such as lib/atc/libatc.so or lib/reax/libreax.so.
You must insure that each of these libraries exist in shared library :line
form (*.so file for Linux systems), or either the LAMMPS shared
library build or the Python load of the library will fail. For the
load to be successful all the shared libraries must also be in
directories that the operating system checks. See the discussion in
the preceding section about the LD_LIBRARY_PATH environment variable
for how to insure this.
Note that some system libraries, such as MPI, if you installed it 11.2 Installing the Python wrapper into Python :link(py_2),h4
yourself, may not be built by default as shared libraries. The build
instructions for the library should tell you how to do this.
For example, here is how to build and install the "MPICH For Python to invoke LAMMPS, there are 2 files it needs to know about:
library"_mpich, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
:link(mpich,http://www-unix.mcs.anl.gov/mpi) python/lammps.py
src/liblammps.so :ul
./configure --enable-shared Lammps.py is the Python wrapper on the LAMMPS library interface.
make Liblammps.so is the shared LAMMPS library that Python loads, as
make install :pre described above.
You may need to use "sudo make install" in place of the last line if You can insure Python can find these files in one of two ways:
you do not have write priveleges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
Note that not all of the auxiliary libraries provided with LAMMPS have set two environment variables
shared-library Makefiles in their lib directories. Typically this run the python/install.py script :ul
simply requires a Makefile.foo that adds a -fPIC switch when files are
compiled and a "-fPIC -shared" switches when the library is linked If you set the paths to these files as environment variables, you only
with a C++ (or Fortran) compiler, as well as an output target that have to do it once. For the csh or tcsh shells, add something like
ends in ".so", like libatc.o. As we or others create and contribute this to your ~/.cshrc file, one line for each of the two files:
these Makefiles, we will add them to the LAMMPS distribution.
setenv PYTHONPATH ${PYTHONPATH}:/home/sjplimp/lammps/python
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src :pre
If you use the python/install.py script, you need to invoke it every
time you rebuild LAMMPS (as a shared library) or make changes to the
python/lammps.py file.
You can invoke install.py from the python directory as
% python install.py [libdir] [pydir] :pre
The optional libdir is where to copy the LAMMPS shared library to; the
default is /usr/local/lib. The optional pydir is where to copy the
lammps.py file to; the default is the site-packages directory of the
version of Python that is running the install script.
Note that libdir must be a location that is in your default
LD_LIBRARY_PATH, like /usr/local/lib or /usr/lib. And pydir must be a
location that Python looks in by default for imported modules, like
its site-packages dir. If you want to copy these files to
non-standard locations, such as within your own user space, you will
need to set your PYTHONPATH and LD_LIBRARY_PATH environment variables
accordingly, as above.
If the install.py script does not allow you to copy files into system
directories, prefix the python command with "sudo". If you do this,
make sure that the Python that root runs is the same as the Python you
run. E.g. you may need to do something like
% sudo /usr/local/bin/python install.py [libdir] [pydir] :pre
You can also invoke install.py from the make command in the src
directory as
% make install-python :pre
In this mode you cannot append optional arguments. Again, you may
need to prefix this with "sudo". In this mode you cannot control
which Python is invoked by root.
Note that if you want Python to be able to load different versions of
the LAMMPS shared library (see "this section"_#py_5 below), you will
need to manually copy files like liblammps_g++.so into the appropriate
system directory. This is not needed if you set the LD_LIBRARY_PATH
environment variable as described above.
:line :line
@ -193,13 +181,12 @@ These include
"maroonmpi"_http://code.google.com/p/maroonmpi/ "maroonmpi"_http://code.google.com/p/maroonmpi/
"mpi4py"_http://code.google.com/p/mpi4py/ "mpi4py"_http://code.google.com/p/mpi4py/
"myMPI"_http://nbcr.sdsc.edu/forum/viewtopic.php?t=89&sid=c997fefc3933bd66204875b436940f16 "myMPI"_http://nbcr.sdsc.edu/forum/viewtopic.php?t=89&sid=c997fefc3933bd66204875b436940f16
"Pypar"_http://datamining.anu.edu.au/~ole/pypar :ul "Pypar"_http://code.google.com/p/pypar :ul
All of these except pyMPI work by wrapping the MPI library (which must All of these except pyMPI work by wrapping the MPI library and
be available on your system as a shared library, as discussed above), exposing (some portion of) its interface to your Python script. This
and exposing (some portion of) its interface to your Python script. means Python cannot be used interactively in parallel, since they do
This means Python cannot be used interactively in parallel, since they not address the issue of interactive input to multiple instances of
do not address the issue of interactive input to multiple instances of
Python running on different processors. The one exception is pyMPI, Python running on different processors. The one exception is pyMPI,
which alters the Python interpreter to address this issue, and (I which alters the Python interpreter to address this issue, and (I
believe) creates a new alternate executable (in place of "python" believe) creates a new alternate executable (in place of "python"
@ -229,17 +216,17 @@ sudo python setup.py install :pre
The "sudo" is only needed if required to copy Numpy files into your The "sudo" is only needed if required to copy Numpy files into your
Python distribution's site-packages directory. Python distribution's site-packages directory.
To install Pypar (version pypar-2.1.0_66 as of April 2009), unpack it To install Pypar (version pypar-2.1.4_94 as of Aug 2012), unpack it
and from its "source" directory, type and from its "source" directory, type
python setup.py build python setup.py build
sudo python setup.py install :pre sudo python setup.py install :pre
Again, the "sudo" is only needed if required to copy PyPar files into Again, the "sudo" is only needed if required to copy Pypar files into
your Python distribution's site-packages directory. your Python distribution's site-packages directory.
If you have successfully installed Pypar, you should be able to run If you have successfully installed Pypar, you should be able to run
python serially and type Python and type
import pypar :pre import pypar :pre
@ -255,6 +242,19 @@ print "Proc %d out of %d procs" % (pypar.rank(),pypar.size()) :pre
and see one line of output for each processor you run on. and see one line of output for each processor you run on.
IMPORTANT NOTE: To use Pypar and LAMMPS in parallel from Python, you
must insure both are using the same version of MPI. If you only have
one MPI installed on your system, this is not an issue, but it can be
if you have multiple MPIs. Your LAMMPS build is explicit about which
MPI it is using, since you specify the details in your low-level
src/MAKE/Makefile.foo file. Pypar uses the "mpicc" command to find
information about the MPI it uses to build against. And it tries to
load "libmpi.so" from the LD_LIBRARY_PATH. This may or may not find
the MPI library that LAMMPS is using. If you have problems running
both Pypar and LAMMPS together, this is an issue you may need to
address, e.g. by moving other MPI installations so that Pypar finds
the right one.
:line :line
11.4 Testing the Python-LAMMPS interface :link(py_4),h4 11.4 Testing the Python-LAMMPS interface :link(py_4),h4
@ -268,24 +268,17 @@ and type:
If you get no errors, you're ready to use LAMMPS from Python. If you get no errors, you're ready to use LAMMPS from Python.
If the load fails, the most common error to see is If the load fails, the most common error to see is
"CDLL: asdfasdfasdf" OSError: Could not load LAMMPS dynamic library :pre
which means Python was unable to load the LAMMPS shared library. This which means Python was unable to load the LAMMPS shared library. This
can occur if it can't find the LAMMPS library; see the environment typically occurs if the system can't find the LAMMPS shared library
variable discussion "above"_#python_1. Or if it can't find one of the or one of the auxiliary shared libraries it depends on.
auxiliary libraries that was specified in the LAMMPS build, in a
shared dynamic library format. This includes all libraries needed by
main LAMMPS (e.g. MPI or FFTW or JPEG), system libraries needed by
main LAMMPS (e.g. extra libs needed by MPI), or packages you have
installed that require libraries provided with LAMMPS (e.g. the
USER-ATC package require lib/atc/libatc.so) or system libraries
(e.g. BLAS or Fortran-to-C libraries) listed in the
lib/package/Makefile.lammps file. Again, all of these must be
available as shared libraries, or the Python load will fail.
Python (actually the operating system) isn't verbose about telling you Python (actually the operating system) isn't verbose about telling you
why the load failed, so go through the steps above and in why the load failed, so carefully go through the steps above regarding
"Section_start 5"_Section_start.html#start_5 carefully. environment variables, and the instructions in "Section_start
5"_Section_start.html#start_5 about building a shared library and
about setting the LD_LIBRARY_PATH environment variable.
[Test LAMMPS and Python in serial:] :h5 [Test LAMMPS and Python in serial:] :h5
@ -330,10 +323,10 @@ and you should see the same output as if you had typed
Note that if you leave out the 3 lines from test.py that specify Pypar Note that if you leave out the 3 lines from test.py that specify Pypar
commands you will instantiate and run LAMMPS independently on each of commands you will instantiate and run LAMMPS independently on each of
the P processors specified in the mpirun command. In this case you the P processors specified in the mpirun command. In this case you
should get 4 sets of output, each showing that a run was made on a should get 4 sets of output, each showing that a LAMMPS run was made
single processor, instead of one set of output showing that it ran on on a single processor, instead of one set of output showing that
4 processors. If the 1-processor outputs occur, it means that Pypar LAMMPS ran on 4 processors. If the 1-processor outputs occur, it
is not working correctly. means that Pypar is not working correctly.
Also note that once you import the PyPar module, Pypar initializes MPI Also note that once you import the PyPar module, Pypar initializes MPI
for you, and you can use MPI calls directly in your Python script, as for you, and you can use MPI calls directly in your Python script, as
@ -341,6 +334,8 @@ described in the Pypar documentation. The last line of your Python
script should be pypar.finalize(), to insure MPI is shut down script should be pypar.finalize(), to insure MPI is shut down
correctly. correctly.
[Running Python scripts:] :h5
Note that any Python script (not just for LAMMPS) can be invoked in Note that any Python script (not just for LAMMPS) can be invoked in
one of several ways: one of several ways:
@ -374,25 +369,18 @@ The Python interface to LAMMPS consists of a Python "lammps" module,
the source code for which is in python/lammps.py, which creates a the source code for which is in python/lammps.py, which creates a
"lammps" object, with a set of methods that can be invoked on that "lammps" object, with a set of methods that can be invoked on that
object. The sample Python code below assumes you have first imported object. The sample Python code below assumes you have first imported
the "lammps" module in your Python script. You can also include its the "lammps" module in your Python script, as follows:
settings as follows, which are useful in test return values from some
of the methods described below:
from lammps import lammps from lammps import lammps :pre
from lammps import LMPINT as INT
from lammps import LMPDOUBLE as DOUBLE
from lammps import LMPIPTR as IPTR
from lammps import LMPDPTR as DPTR
from lammps import LMPDPTRPTR as DPTRPTR :pre
These are the methods defined by the lammps module. If you look These are the methods defined by the lammps module. If you look
at the file src/library.cpp you will see that they correspond at the file src/library.cpp you will see that they correspond
one-to-one with calls you can make to the LAMMPS library from a C++ or one-to-one with calls you can make to the LAMMPS library from a C++ or
C or Fortran program. C or Fortran program.
lmp = lammps() # create a LAMMPS object using the default liblmp.so library lmp = lammps() # create a LAMMPS object using the default liblammps.so library
lmp = lammps("g++") # create a LAMMPS object using the liblmp_g++.so library lmp = lammps("g++") # create a LAMMPS object using the liblammps_g++.so library
lmp = lammps("",list) # ditto, with command-line args, list = \["-echo","screen"\] lmp = lammps("",list) # ditto, with command-line args, e.g. list = \["-echo","screen"\]
lmp = lammps("g++",list) :pre lmp = lammps("g++",list) :pre
lmp.close() # destroy a LAMMPS object :pre lmp.close() # destroy a LAMMPS object :pre
@ -402,11 +390,15 @@ lmp.command(cmd) # invoke a single LAMMPS command, cmd = "run 100" :pre
xlo = lmp.extract_global(name,type) # extract a global quantity xlo = lmp.extract_global(name,type) # extract a global quantity
# name = "boxxlo", "nlocal", etc # name = "boxxlo", "nlocal", etc
# type = INT or DOUBLE :pre # type = 0 = int
# 1 = double :pre
coords = lmp.extract_atom(name,type) # extract a per-atom quantity coords = lmp.extract_atom(name,type) # extract a per-atom quantity
# name = "x", "type", etc # name = "x", "type", etc
# type = IPTR or DPTR or DPTRPTR :pre # type = 0 = vector of ints
# 1 = array of ints
# 2 = vector of doubles
# 3 = array of doubles :pre
eng = lmp.extract_compute(id,style,type) # extract value(s) from a compute eng = lmp.extract_compute(id,style,type) # extract value(s) from a compute
v3 = lmp.extract_fix(id,style,type,i,j) # extract value(s) from a fix v3 = lmp.extract_fix(id,style,type,i,j) # extract value(s) from a fix
@ -426,18 +418,23 @@ var = lmp.extract_variable(name,group,flag) # extract value(s) from a variable
# 1 = atom-style variable :pre # 1 = atom-style variable :pre
natoms = lmp.get_natoms() # total # of atoms as int natoms = lmp.get_natoms() # total # of atoms as int
x = lmp.get_coords() # return coords of all atoms in x data = lmp.gather_atoms(name,type,count) # return atom attribute of all atoms gathered into data, ordered by atom ID
lmp.put_coords(x) # set all atom coords via x :pre # name = "x", "charge", "type", etc
# count = # of per-atom values, 1 or 3, etc
lmp.scatter_atoms(name,type,count,data) # scatter atom attribute of all atoms from data, ordered by atom ID
# name = "x", "charge", "type", etc
# count = # of per-atom values, 1 or 3, etc :pre
:line :line
IMPORTANT NOTE: Currently, the creation of a LAMMPS object does not IMPORTANT NOTE: Currently, the creation of a LAMMPS object from within
take an MPI communicator as an argument. There should be a way to do lammps.py does not take an MPI communicator as an argument. There
this, so that the LAMMPS instance runs on a subset of processors if should be a way to do this, so that the LAMMPS instance runs on a
desired, but I don't know how to do it from Pypar. So for now, it subset of processors if desired, but I don't know how to do it from
runs on MPI_COMM_WORLD, which is all the processors. If someone Pypar. So for now, it runs with MPI_COMM_WORLD, which is all the
figures out how to do this with one or more of the Python wrappers for processors. If someone figures out how to do this with one or more of
MPI, like Pypar, please let us know and we will amend these doc pages. the Python wrappers for MPI, like Pypar, please let us know and we
will amend these doc pages.
Note that you can create multiple LAMMPS objects in your Python Note that you can create multiple LAMMPS objects in your Python
script, and coordinate and run multiple simulations, e.g. script, and coordinate and run multiple simulations, e.g.
@ -465,8 +462,8 @@ returned, which you can use via normal Python subscripting. See the
extract() method in the src/atom.cpp file for a list of valid names. extract() method in the src/atom.cpp file for a list of valid names.
Again, new names could easily be added. A pointer to a vector of Again, new names could easily be added. A pointer to a vector of
doubles or integers, or a pointer to an array of doubles (double **) doubles or integers, or a pointer to an array of doubles (double **)
is returned. You need to specify the appropriate data type via the or integers (int **) is returned. You need to specify the appropriate
type argument. data type via the type argument.
For extract_compute() and extract_fix(), the global, per-atom, or For extract_compute() and extract_fix(), the global, per-atom, or
local data calculated by the compute or fix can be accessed. What is local data calculated by the compute or fix can be accessed. What is
@ -494,58 +491,57 @@ Python subscripting. The values will be zero for atoms not in the
specified group. specified group.
The get_natoms() method returns the total number of atoms in the The get_natoms() method returns the total number of atoms in the
simulation, as an int. Note that extract_global("natoms") returns the simulation, as an int.
same value, but as a double, which is the way LAMMPS stores it to
allow for systems with more atoms than can be stored in an int (> 2
billion).
The get_coords() method returns an ctypes vector of doubles of length The gather_atoms() method returns a ctypes vector of ints or doubles
3*natoms, for the coordinates of all the atoms in the simulation, as specified by type, of length count*natoms, for the property of all
ordered by x,y,z and then by atom ID (see code for put_coords() the atoms in the simulation specified by name, ordered by count and
below). The array can be used via normal Python subscripting. If then by atom ID. The vector can be used via normal Python
atom IDs are not consecutively ordered within LAMMPS, a None is subscripting. If atom IDs are not consecutively ordered within
returned as indication of an error. LAMMPS, a None is returned as indication of an error.
Note that the data structure get_coords() returns is different from Note that the data structure gather_atoms("x") returns is different
the data structure returned by extract_atom("x") in four ways. (1) from the data structure returned by extract_atom("x") in four ways.
Get_coords() returns a vector which you index as x\[i\]; (1) Gather_atoms() returns a vector which you index as x\[i\];
extract_atom() returns an array which you index as x\[i\]\[j\]. (2) extract_atom() returns an array which you index as x\[i\]\[j\]. (2)
Get_coords() orders the atoms by atom ID while extract_atom() does Gather_atoms() orders the atoms by atom ID while extract_atom() does
not. (3) Get_coords() returns a list of all atoms in the simulation; not. (3) Gather_atoms() returns a list of all atoms in the
extract_atoms() returns just the atoms local to each processor. (4) simulation; extract_atoms() returns just the atoms local to each
Finally, the get_coords() data structure is a copy of the atom coords processor. (4) Finally, the gather_atoms() data structure is a copy
stored internally in LAMMPS, whereas extract_atom returns an array of the atom coords stored internally in LAMMPS, whereas extract_atom()
that points directly to the internal data. This means you can change returns an array that effectively points directly to the internal
values inside LAMMPS from Python by assigning new values to the data. This means you can change values inside LAMMPS from Python by
extract_atom() array. To do this with the get_atoms() vector, you assigning new values to the extract_atom() array. To do this with
need to change values in the vector, then invoke the put_coords() the gather_atoms() vector, you need to change values in the vector,
method. then invoke the scatter_atoms() method.
The put_coords() method takes a vector of coordinates for all atoms in The scatter_atoms() method takes a vector of ints or doubles as
the simulation, assumed to be ordered by x,y,z and then by atom ID, specified by type, of length count*natoms, for the property of all the
and uses the values to overwrite the corresponding coordinates for atoms in the simulation specified by name, ordered by count and then
each atom inside LAMMPS. This requires LAMMPS to have its "map" by atom ID. It uses the vector of data to overwrite the corresponding
option enabled; see the "atom_modify"_atom_modify.html command for properties for each atom inside LAMMPS. This requires LAMMPS to have
details. If it is not or if atom IDs are not consecutively ordered, its "map" option enabled; see the "atom_modify"_atom_modify.html
no coordinates are reset, command for details. If it is not, or if atom IDs are not
consecutively ordered, no coordinates are reset.
The array of coordinates passed to put_coords() must be a ctypes The array of coordinates passed to scatter_atoms() must be a ctypes
vector of doubles, allocated and initialized something like this: vector of ints or doubles, allocated and initialized something like
this:
from ctypes import * from ctypes import *
natoms = lmp.get_atoms() natoms = lmp.get_natoms()
n3 = 3*natoms n3 = 3*natoms
x = (c_double*n3)() x = (n3*c_double)()
x[0] = x coord of atom with ID 1 x[0] = x coord of atom with ID 1
x[1] = y coord of atom with ID 1 x[1] = y coord of atom with ID 1
x[2] = z coord of atom with ID 1 x[2] = z coord of atom with ID 1
x[3] = x coord of atom with ID 2 x[3] = x coord of atom with ID 2
... ...
x[n3-1] = z coord of atom with ID natoms x[n3-1] = z coord of atom with ID natoms
lmp.put_coords(x) :pre lmp.scatter_atoms("x",1,3,x) :pre
Alternatively, you can just change values in the vector returned by Alternatively, you can just change values in the vector returned by
get_coords(), since it is a ctypes vector of doubles. gather_atoms("x",1,3), since it is a ctypes vector of doubles.
:line :line


@ -281,10 +281,11 @@ dummy MPI library provided in src/STUBS, since you don't need a true
MPI library installed on your system. See the MPI library installed on your system. See the
src/MAKE/Makefile.serial file for how to specify the 3 MPI variables src/MAKE/Makefile.serial file for how to specify the 3 MPI variables
in this case. You will also need to build the STUBS library for your in this case. You will also need to build the STUBS library for your
platform before making LAMMPS itself. From the src directory, type platform before making LAMMPS itself. To build from the src
"make stubs", or from the STUBS dir, type "make" and it should create directory, type "make stubs", or from the STUBS dir, type "make".
a libmpi.a suitable for linking to LAMMPS. If this build fails, you This should create a libmpi_stubs.a file suitable for linking to
will need to edit the STUBS/Makefile for your platform. LAMMPS. If the build fails, you will need to edit the STUBS/Makefile
for your platform.
</P> </P>
<P>The file STUBS/mpi.cpp provides a CPU timer function called <P>The file STUBS/mpi.cpp provides a CPU timer function called
MPI_Wtime() that calls gettimeofday() . If your system doesn't MPI_Wtime() that calls gettimeofday() . If your system doesn't
@ -779,24 +780,28 @@ then be called from another application or a scripting language. See
LAMMPS to other codes. See <A HREF = "Section_python.html">this section</A> for LAMMPS to other codes. See <A HREF = "Section_python.html">this section</A> for
more info on wrapping and running LAMMPS from Python. more info on wrapping and running LAMMPS from Python.
</P> </P>
<H5><B>Static library:</B>
</H5>
<P>To build LAMMPS as a static library (*.a file on Linux), type <P>To build LAMMPS as a static library (*.a file on Linux), type
</P> </P>
<PRE>make makelib <PRE>make makelib
make -f Makefile.lib foo make -f Makefile.lib foo
</PRE> </PRE>
<P>where foo is the machine name. This kind of library is typically used <P>where foo is the machine name. This kind of library is typically used
to statically link a driver application to all of LAMMPS, so that you to statically link a driver application to LAMMPS, so that you can
can insure all dependencies are satisfied at compile time. Note that insure all dependencies are satisfied at compile time. Note that
inclusion or exclusion of any desired optional packages should be done inclusion or exclusion of any desired optional packages should be done
before typing "make makelib". The first "make" command will create a before typing "make makelib". The first "make" command will create a
current Makefile.lib with all the file names in your src dir. The 2nd current Makefile.lib with all the file names in your src dir. The
"make" command will use it to build LAMMPS as a static library, using second "make" command will use it to build LAMMPS as a static library,
the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build using the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The
will create the file liblmp_foo.a which another application can link build will create the file liblammps_foo.a which another application can
to. link to.
</P> </P>
<H5><B>Shared library:</B>
</H5>
<P>To build LAMMPS as a shared library (*.so file on Linux), which can be <P>To build LAMMPS as a shared library (*.so file on Linux), which can be
dynamically loaded, type dynamically loaded, e.g. from Python, type
</P> </P>
<PRE>make makeshlib <PRE>make makeshlib
make -f Makefile.shlib foo make -f Makefile.shlib foo
@ -806,31 +811,58 @@ wrapping LAMMPS with Python; see <A HREF = "Section_python.html">Section_python<
for details. Again, note that inclusion or exclusion of any desired for details. Again, note that inclusion or exclusion of any desired
optional packages should be done before typing "make makeshlib". The optional packages should be done before typing "make makeshlib". The
first "make" command will create a current Makefile.shlib with all the first "make" command will create a current Makefile.shlib with all the
file names in your src dir. The 2nd "make" command will use it to file names in your src dir. The second "make" command will use it to
build LAMMPS as a shared library, using the SHFLAGS and SHLIBFLAGS build LAMMPS as a shared library, using the SHFLAGS and SHLIBFLAGS
settings in src/MAKE/Makefile.foo. The build will create the file settings in src/MAKE/Makefile.foo. The build will create the file
liblmp_foo.so which another application can link to dynamically, as liblammps_foo.so which another application can link to dynamically. It
well as a soft link liblmp.so, which the Python wrapper uses by will also create a soft link liblammps.so, which the Python wrapper uses
default. by default.
</P> </P>
<P>Note that for a shared library to be usable by a calling program, all <P>Note that for a shared library to be usable by a calling program, all
the auxiliary libraries it depends on must also exist as shared the auxiliary libraries it depends on must also exist as shared
libraries, and be find-able by the operating system. Else you will libraries. This will be the case for libraries included with LAMMPS,
get a run-time error when the shared library is loaded. For LAMMPS, such as the dummy MPI library in src/STUBS or any package libraries in
this includes all libraries needed by main LAMMPS (e.g. MPI or FFTW or lib/packages, since they are always built as shared libraries with the
JPEG), system libraries needed by main LAMMPS (e.g. extra libs needed -fPIC switch. However, if a library like MPI or FFTW does not exist
by MPI), or packages you have installed that require libraries as a shared library, the second make command will generate an error.
provided with LAMMPS (e.g. the USER-ATC package require This means you will need to install a shared library version of the
lib/atc/libatc.so) or system libraries (e.g. BLAS or Fortran-to-C package. The build instructions for the library should tell you how
libraries) listed in the lib/package/Makefile.lammps file. See the to do this.
discussion about the LAMMPS shared library in
<A HREF = "Section_python.html">Section_python</A> for details about how to build
shared versions of these libraries, and how to insure the operating
system can find them, by setting the LD_LIBRARY_PATH environment
variable correctly.
</P> </P>
<P>Either flavor of library allows one or more LAMMPS objects to be <P>As an example, here is how to build and install the <A HREF = "http://www-unix.mcs.anl.gov/mpi">MPICH
instantiated from the calling program. library</A>, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
</P>
<PRE>./configure --enable-shared
make
make install
</PRE>
<P>You may need to use "sudo make install" in place of the last line if
you do not have write privileges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
</P>
<H5><B>Additional requirement for using a shared library:</B>
</H5>
<P>The operating system finds shared libraries to load at run-time using
the environment variable LD_LIBRARY_PATH. So you may wish to copy the
file src/liblammps.so or src/liblammps_g++.so (for example) to a place
the system can find it by default, such as /usr/local/lib, or you may
wish to add the lammps src directory to LD_LIBRARY_PATH, so that the
current version of the shared library is always available to programs
that use it.
</P>
<P>For the csh or tcsh shells, you would add something like this to your
~/.cshrc file:
</P>
<PRE>setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/lammps/src
</PRE>
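<P>A quick way to verify that the operating system can actually locate
the shared library, before trying a full calling program, is to attempt
to load it directly from Python with ctypes. This is only a sketch, not
part of LAMMPS; adjust the library name to whichever file or soft link
you built:
</P>
<PRE>from ctypes import CDLL
try:
    CDLL("liblammps_g++.so")   # or "liblammps.so" via the soft link
    print("found the LAMMPS shared library")
except OSError:
    print("not found: check LD_LIBRARY_PATH")
</PRE>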
<H5><B>Calling the LAMMPS library:</B>
</H5>
<P>Either flavor of library (static or shared) allows one or more LAMMPS
objects to be instantiated from the calling program.
</P> </P>
<P>When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS <P>When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS
namespace; you can safely use any of its classes and methods from namespace; you can safely use any of its classes and methods from
@ -841,17 +873,17 @@ Python, the library has a simple function-style interface, provided in
src/library.cpp and src/library.h. src/library.cpp and src/library.h.
</P> </P>
<P>See the sample codes in examples/COUPLE/simple for examples of C++ and <P>See the sample codes in examples/COUPLE/simple for examples of C++ and
C codes that invoke LAMMPS thru its library interface. There are C and Fortran codes that invoke LAMMPS thru its library interface.
other examples as well in the COUPLE directory which are discussed in There are other examples as well in the COUPLE directory which are
<A HREF = "Section_howto.html#howto_10">Section_howto 10</A> of the manual. See discussed in <A HREF = "Section_howto.html#howto_10">Section_howto 10</A> of the
<A HREF = "Section_python.html">Section_python</A> of the manual for a description manual. See <A HREF = "Section_python.html">Section_python</A> of the manual for a
of the Python wrapper provided with LAMMPS that operates through the description of the Python wrapper provided with LAMMPS that operates
LAMMPS library interface. through the LAMMPS library interface.
</P> </P>
<P>The files src/library.cpp and library.h contain the C-style interface <P>The files src/library.cpp and library.h define the C-style API for
to LAMMPS. See <A HREF = "Section_howto.html#howto_19">Section_howto 19</A> of the using LAMMPS as a library. See <A HREF = "Section_howto.html#howto_19">Section_howto
manual for a description of the interface and how to extend it for 19</A> of the manual for a description of the
your needs. interface and how to extend it for your needs.
</P> </P>
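<P>To illustrate how thin the C-style interface is, the functions in
src/library.h can also be called directly through Python's ctypes
module, without the provided wrapper. The following is only a sketch:
it assumes the shared library liblammps_g++.so is loadable and that an
input script named in.simple exists.
</P>
<PRE>from ctypes import CDLL, c_void_p, byref
lib = CDLL("liblammps_g++.so")
ptr = c_void_p()                            # opaque handle to a LAMMPS instance
lib.lammps_open_no_mpi(0,None,byref(ptr))   # instance runs on MPI_COMM_WORLD
lib.lammps_file(ptr,"in.simple")
lib.lammps_command(ptr,"run 10")
print(lib.lammps_get_natoms(ptr))
lib.lammps_close(ptr)
</PRE>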
<HR> <HR>


@ -275,10 +275,11 @@ dummy MPI library provided in src/STUBS, since you don't need a true
MPI library installed on your system. See the MPI library installed on your system. See the
src/MAKE/Makefile.serial file for how to specify the 3 MPI variables src/MAKE/Makefile.serial file for how to specify the 3 MPI variables
in this case. You will also need to build the STUBS library for your in this case. You will also need to build the STUBS library for your
platform before making LAMMPS itself. From the src directory, type platform before making LAMMPS itself. To build from the src
"make stubs", or from the STUBS dir, type "make" and it should create directory, type "make stubs", or from the STUBS dir, type "make".
a libmpi.a suitable for linking to LAMMPS. If this build fails, you This should create a libmpi_stubs.a file suitable for linking to
will need to edit the STUBS/Makefile for your platform. LAMMPS. If the build fails, you will need to edit the STUBS/Makefile
for your platform.
The file STUBS/mpi.c provides a CPU timer function called The file STUBS/mpi.c provides a CPU timer function called
MPI_Wtime() that calls gettimeofday() . If your system doesn't MPI_Wtime() that calls gettimeofday() . If your system doesn't
@ -773,24 +774,28 @@ then be called from another application or a scripting language. See
LAMMPS to other codes. See "this section"_Section_python.html for LAMMPS to other codes. See "this section"_Section_python.html for
more info on wrapping and running LAMMPS from Python. more info on wrapping and running LAMMPS from Python.
[Static library:] :h5
To build LAMMPS as a static library (*.a file on Linux), type To build LAMMPS as a static library (*.a file on Linux), type
make makelib make makelib
make -f Makefile.lib foo :pre make -f Makefile.lib foo :pre
where foo is the machine name. This kind of library is typically used where foo is the machine name. This kind of library is typically used
to statically link a driver application to all of LAMMPS, so that you to statically link a driver application to LAMMPS, so that you can
can insure all dependencies are satisfied at compile time. Note that insure all dependencies are satisfied at compile time. Note that
inclusion or exclusion of any desired optional packages should be done inclusion or exclusion of any desired optional packages should be done
before typing "make makelib". The first "make" command will create a before typing "make makelib". The first "make" command will create a
current Makefile.lib with all the file names in your src dir. The 2nd current Makefile.lib with all the file names in your src dir. The
"make" command will use it to build LAMMPS as a static library, using second "make" command will use it to build LAMMPS as a static library,
the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build using the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The
will create the file liblmp_foo.a which another application can link build will create the file liblammps_foo.a which another application can
to. link to.
[Shared library:] :h5
To build LAMMPS as a shared library (*.so file on Linux), which can be To build LAMMPS as a shared library (*.so file on Linux), which can be
dynamically loaded, type dynamically loaded, e.g. from Python, type
make makeshlib make makeshlib
make -f Makefile.shlib foo :pre make -f Makefile.shlib foo :pre
@ -800,31 +805,58 @@ wrapping LAMMPS with Python; see "Section_python"_Section_python.html
for details. Again, note that inclusion or exclusion of any desired for details. Again, note that inclusion or exclusion of any desired
optional packages should be done before typing "make makeshlib". The optional packages should be done before typing "make makeshlib". The
first "make" command will create a current Makefile.shlib with all the first "make" command will create a current Makefile.shlib with all the
file names in your src dir. The 2nd "make" command will use it to file names in your src dir. The second "make" command will use it to
build LAMMPS as a shared library, using the SHFLAGS and SHLIBFLAGS build LAMMPS as a shared library, using the SHFLAGS and SHLIBFLAGS
settings in src/MAKE/Makefile.foo. The build will create the file settings in src/MAKE/Makefile.foo. The build will create the file
liblmp_foo.so which another application can link to dynamically, as liblammps_foo.so which another application can link to dynamically. It
well as a soft link liblmp.so, which the Python wrapper uses by will also create a soft link liblammps.so, which the Python wrapper uses
default. by default.
Note that for a shared library to be usable by a calling program, all Note that for a shared library to be usable by a calling program, all
the auxiliary libraries it depends on must also exist as shared the auxiliary libraries it depends on must also exist as shared
libraries, and be find-able by the operating system. Else you will libraries. This will be the case for libraries included with LAMMPS,
get a run-time error when the shared library is loaded. For LAMMPS, such as the dummy MPI library in src/STUBS or any package libraries in
this includes all libraries needed by main LAMMPS (e.g. MPI or FFTW or lib/packages, since they are always built as shared libraries with the
JPEG), system libraries needed by main LAMMPS (e.g. extra libs needed -fPIC switch. However, if a library like MPI or FFTW does not exist
by MPI), or packages you have installed that require libraries as a shared library, the second make command will generate an error.
provided with LAMMPS (e.g. the USER-ATC package require This means you will need to install a shared library version of the
lib/atc/libatc.so) or system libraries (e.g. BLAS or Fortran-to-C package. The build instructions for the library should tell you how
libraries) listed in the lib/package/Makefile.lammps file. See the to do this.
discussion about the LAMMPS shared library in
"Section_python"_Section_python.html for details about how to build
shared versions of these libraries, and how to insure the operating
system can find them, by setting the LD_LIBRARY_PATH environment
variable correctly.
Either flavor of library allows one or more LAMMPS objects to be As an example, here is how to build and install the "MPICH
instantiated from the calling program. library"_mpich, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
:link(mpich,http://www-unix.mcs.anl.gov/mpi)
./configure --enable-shared
make
make install :pre
You may need to use "sudo make install" in place of the last line if
you do not have write privileges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
[Additional requirement for using a shared library:] :h5
The operating system finds shared libraries to load at run-time using
the environment variable LD_LIBRARY_PATH. So you may wish to copy the
file src/liblammps.so or src/liblammps_g++.so (for example) to a place
the system can find it by default, such as /usr/local/lib, or you may
wish to add the lammps src directory to LD_LIBRARY_PATH, so that the
current version of the shared library is always available to programs
that use it.
For the csh or tcsh shells, you would add something like this to your
~/.cshrc file:
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src :pre
[Calling the LAMMPS library:] :h5
Either flavor of library (static or shared) allows one or more LAMMPS
objects to be instantiated from the calling program.
When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS
namespace; you can safely use any of its classes and methods from namespace; you can safely use any of its classes and methods from
@ -835,17 +867,17 @@ Python, the library has a simple function-style interface, provided in
src/library.cpp and src/library.h. src/library.cpp and src/library.h.
See the sample codes in examples/COUPLE/simple for examples of C++ and See the sample codes in examples/COUPLE/simple for examples of C++ and
C codes that invoke LAMMPS thru its library interface. There are C and Fortran codes that invoke LAMMPS thru its library interface.
other examples as well in the COUPLE directory which are discussed in There are other examples as well in the COUPLE directory which are
"Section_howto 10"_Section_howto.html#howto_10 of the manual. See discussed in "Section_howto 10"_Section_howto.html#howto_10 of the
"Section_python"_Section_python.html of the manual for a description manual. See "Section_python"_Section_python.html of the manual for a
of the Python wrapper provided with LAMMPS that operates through the description of the Python wrapper provided with LAMMPS that operates
LAMMPS library interface. through the LAMMPS library interface.
The files src/library.cpp and library.h contain the C-style interface The files src/library.cpp and library.h define the C-style API for
to LAMMPS. See "Section_howto 19"_Section_howto.html#howto_19 of the using LAMMPS as a library. See "Section_howto
manual for a description of the interface and how to extend it for 19"_Section_howto.html#howto_19 of the manual for a description of the
your needs. interface and how to extend it for your needs.
:line :line

View File

@ -327,17 +327,19 @@ direction for xy deformation) from the unstrained orientation.
</P> </P>
<P>The tilt factor T as a function of time will change as <P>The tilt factor T as a function of time will change as
</P> </P>
<PRE>T(t) = T0 + erate*dt <PRE>T(t) = T0 + L0*erate*dt
</PRE> </PRE>
<P>where T0 is the initial tilt factor and dt is the elapsed time (in <P>where T0 is the initial tilt factor, L0 is the original length of the
time units). Thus if <I>erate</I> R is specified as 0.1 and time units are box perpendicular to the shear direction (e.g. y box length for xy
picoseconds, this means the shear strain will increase by 0.1 every deformation), and dt is the elapsed time (in time units). Thus if
picosecond. I.e. if the xy shear strain was initially 0.0, then <I>erate</I> R is specified as 0.1 and time units are picoseconds, this
strain after 1 psec = 0.1, strain after 2 psec = 0.2, etc. Thus the means the shear strain will increase by 0.1 every picosecond. I.e. if
tilt factor would be 0.0 at time 0, 0.1*ybox at 1 psec, 0.2*ybox at 2 the xy shear strain was initially 0.0, then strain after 1 psec = 0.1,
psec, etc, where ybox is the original y box length. R = 1 or 2 means strain after 2 psec = 0.2, etc. Thus the tilt factor would be 0.0 at
the tilt factor will increase by 1 or 2 every picosecond. R = -0.01 time 0, 0.1*ybox at 1 psec, 0.2*ybox at 2 psec, etc, where ybox is the
means a decrease in shear strain by 0.01 every picosecond. original y box length. R = 1 or 2 means the tilt factor will increase
by 1 or 2 every picosecond. R = -0.01 means a decrease in shear
strain by 0.01 every picosecond.
</P> </P>
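<P>As a quick check of the arithmetic above, the formula can be
evaluated directly. This small Python sketch (not LAMMPS input) assumes
T0 = 0.0, an original y box length of 40 distance units, and erate =
0.1 per picosecond:
</P>
<PRE>T0 = 0.0      # initial xy tilt factor
L0 = 40.0     # assumed original y box length (ybox)
erate = 0.1   # engineering shear strain rate per picosecond
for dt in (0.0, 1.0, 2.0):   # elapsed time in picoseconds
    print("t = %g ps  strain = %g  tilt = %g" % (dt, erate*dt, T0 + L0*erate*dt))
# tilt = 0.0 at t = 0, 0.1*ybox = 4.0 at 1 psec, 0.2*ybox = 8.0 at 2 psec
</PRE>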
<P>The <I>trate</I> style changes a tilt factor at a "constant true shear <P>The <I>trate</I> style changes a tilt factor at a "constant true shear
strain rate". Note that this is not an "engineering shear strain strain rate". Note that this is not an "engineering shear strain


@ -317,17 +317,19 @@ direction for xy deformation) from the unstrained orientation.
The tilt factor T as a function of time will change as The tilt factor T as a function of time will change as
T(t) = T0 + erate*dt :pre T(t) = T0 + L0*erate*dt :pre
where T0 is the initial tilt factor and dt is the elapsed time (in where T0 is the initial tilt factor, L0 is the original length of the
time units). Thus if {erate} R is specified as 0.1 and time units are box perpendicular to the shear direction (e.g. y box length for xy
picoseconds, this means the shear strain will increase by 0.1 every deformation), and dt is the elapsed time (in time units). Thus if
picosecond. I.e. if the xy shear strain was initially 0.0, then {erate} R is specified as 0.1 and time units are picoseconds, this
strain after 1 psec = 0.1, strain after 2 psec = 0.2, etc. Thus the means the shear strain will increase by 0.1 every picosecond. I.e. if
tilt factor would be 0.0 at time 0, 0.1*ybox at 1 psec, 0.2*ybox at 2 the xy shear strain was initially 0.0, then strain after 1 psec = 0.1,
psec, etc, where ybox is the original y box length. R = 1 or 2 means strain after 2 psec = 0.2, etc. Thus the tilt factor would be 0.0 at
the tilt factor will increase by 1 or 2 every picosecond. R = -0.01 time 0, 0.1*ybox at 1 psec, 0.2*ybox at 2 psec, etc, where ybox is the
means a decrease in shear strain by 0.01 every picosecond. original y box length. R = 1 or 2 means the tilt factor will increase
by 1 or 2 every picosecond. R = -0.01 means a decrease in shear
strain by 0.01 every picosecond.
The {trate} style changes a tilt factor at a "constant true shear The {trate} style changes a tilt factor at a "constant true shear
strain rate". Note that this is not an "engineering shear strain strain rate". Note that this is not an "engineering shear strain


@ -58,6 +58,11 @@ results from a unitless LJ simulation into physical quantities.
<LI>electric field = force/charge, where E* = E (4 pi perm0 sigma epsilon)^1/2 sigma / epsilon <LI>electric field = force/charge, where E* = E (4 pi perm0 sigma epsilon)^1/2 sigma / epsilon
<LI>density = mass/volume, where rho* = rho sigma^dim <LI>density = mass/volume, where rho* = rho sigma^dim
</UL> </UL>
<P>Note that for LJ units, the default mode of thermodynamic output via
the <A HREF = "thermo_style.html">thermo_style</A> command is to normalize energies
by the number of atoms, i.e. energy/atom. This can be changed via the
<A HREF = "thermo_modify.html">thermo_modify norm</A> command.
</P>
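<P>For example, if you drive LAMMPS through the Python wrapper described
in <A HREF = "Section_python.html">Section_python</A>, switching from the
normalized default to extensive output takes one extra command (a sketch;
lmp is an already-created LAMMPS instance):
</P>
<PRE>lmp.command("units lj")
lmp.command("thermo_modify norm no")   # total energies instead of energy/atom
</PRE>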
<P>For style <I>real</I>, these are the units: <P>For style <I>real</I>, these are the units:
</P> </P>
<UL><LI>mass = grams/mole <UL><LI>mass = grams/mole


@ -55,6 +55,11 @@ dipole = reduced LJ dipole, moment where *mu = mu / (4 pi perm0 sigma^3 epsilon)
electric field = force/charge, where E* = E (4 pi perm0 sigma epsilon)^1/2 sigma / epsilon electric field = force/charge, where E* = E (4 pi perm0 sigma epsilon)^1/2 sigma / epsilon
density = mass/volume, where rho* = rho sigma^dim :ul density = mass/volume, where rho* = rho sigma^dim :ul
Note that for LJ units, the default mode of thermodynamic output via
the "thermo_style"_thermo_style.html command is to normalize energies
by the number of atoms, i.e. energy/atom. This can be changed via the
"thermo_modify norm"_thermo_modify.html command.
For style {real}, these are the units: For style {real}, these are the units:
mass = grams/mole mass = grams/mole


@ -17,7 +17,7 @@ library. Basically, you type something like
make makelib make makelib
make -f Makefile.lib g++ make -f Makefile.lib g++
in the LAMMPS src directory to create liblmp_g++.a in the LAMMPS src directory to create liblammps_g++.a
The library interface to LAMMPS is in src/library.cpp. Routines can The library interface to LAMMPS is in src/library.cpp. Routines can
be easily added to this file so an external program can perform the be easily added to this file so an external program can perform the
@ -34,5 +34,7 @@ library collection of useful inter-code communication routines
simple simple example of driver code calling LAMMPS as library simple simple example of driver code calling LAMMPS as library
fortran a wrapper on the LAMMPS library API that fortran a wrapper on the LAMMPS library API that
can be called from Fortran can be called from Fortran
fortran2 a more sophisticated wrapper on the LAMMPS library API that
can be called from Fortran
Each sub-directory has its own README. Each sub-directory has its own README.


@ -1,9 +1,8 @@
libfwrapper.c is a C file that wraps the LAMMPS library API libfwrapper.c is a C file that wraps the LAMMPS library API
in src/library.h so that it can be called from Fortran. in src/library.h so that it can be called from Fortran.
See the couple/simple/simple.f90 program for an example See the couple/simple/simple.f90 program for an example of a Fortran
of a Fortran code that does this. code that does this.
See the README file in that dir for instructions See the README file in that dir for instructions on how to build a
on how to build a Fortran code that uses this Fortran code that uses this wrapper and links to the LAMMPS library.
wrapper and links to the LAMMPS library.


@ -22,7 +22,7 @@
#include "library.h" /* this is a LAMMPS include file */ #include "library.h" /* this is a LAMMPS include file */
/* wrapper for creating a lammps instance from fortran. /* wrapper for creating a lammps instance from fortran.
since fortran has no simple way to emit a c-compatible since fortran has no simple way to emit a C-compatible
argument array, we don't support it. for simplicity, argument array, we don't support it. for simplicity,
the address of the pointer to the lammps object is the address of the pointer to the lammps object is
stored in a 64-bit integer on all platforms. */ stored in a 64-bit integer on all platforms. */
@ -109,6 +109,8 @@ void lammps_get_natoms_(int64_t *ptr, MPI_Fint *natoms)
/* wrapper to copy coordinates from lammps to fortran */ /* wrapper to copy coordinates from lammps to fortran */
/* NOTE: this is now out-of-date, needs to be updated to lammps_gather_atoms()
void lammps_get_coords_(int64_t *ptr, double *coords) void lammps_get_coords_(int64_t *ptr, double *coords)
{ {
void *obj; void *obj;
@ -117,8 +119,12 @@ void lammps_get_coords_(int64_t *ptr, double *coords)
lammps_get_coords(obj,coords); lammps_get_coords(obj,coords);
} }
*/
/* wrapper to copy coordinates from fortran to lammps */ /* wrapper to copy coordinates from fortran to lammps */
/* NOTE: this is now out-of-date, needs to be updated to lammps_scatter_atoms()
void lammps_put_coords_(int64_t *ptr, double *coords) void lammps_put_coords_(int64_t *ptr, double *coords)
{ {
void *obj; void *obj;
@ -127,3 +133,4 @@ void lammps_put_coords_(int64_t *ptr, double *coords)
lammps_put_coords(obj,coords); lammps_put_coords(obj,coords);
} }
*/


@ -0,0 +1,235 @@
/* -----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
www.cs.sandia.gov/~sjplimp/lammps.html
Steve Plimpton, sjplimp@sandia.gov, Sandia National Laboratories
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ------------------------------------------------------------------------
Contributing author: Karl D. Hammond <karlh@ugcs.caltech.edu>
University of Tennessee, Knoxville (USA), 2012
------------------------------------------------------------------------- */
/* This is set of "wrapper" functions to assist LAMMPS.F90, which itself
provides a (I hope) robust Fortran interface to library.cpp and
library.h. All functions herein COULD be added to library.cpp instead of
including this as a separate file. See the README for instructions. */
#include <mpi.h>
#include "LAMMPS-wrapper.h"
#include <library.h>
#include <lammps.h>
#include <atom.h>
#include <fix.h>
#include <compute.h>
#include <modify.h>
#include <error.h>
using namespace LAMMPS_NS;
void lammps_open_fortran_wrapper (int argc, char **argv,
MPI_Fint communicator, void **ptr)
{
MPI_Comm C_communicator = MPI_Comm_f2c (communicator);
lammps_open (argc, argv, C_communicator, ptr);
}
int lammps_get_ntypes (void *ptr)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
int ntypes = lmp->atom->ntypes;
return ntypes;
}
void lammps_error_all (void *ptr, const char *file, int line, const char *str)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
lmp->error->all (file, line, str);
}
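/* NOTE: in the functions below, the "style" argument follows the same
   convention as lammps_extract_compute() and lammps_extract_fix() in
   library.cpp: 0 = global data, 1 = per-atom data, 2 = local data. */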
int lammps_extract_compute_vectorsize (void *ptr, char *id, int style)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
int icompute = lmp->modify->find_compute(id);
if ( icompute < 0 ) return 0;
class Compute *compute = lmp->modify->compute[icompute];
if ( style == 0 )
{
if ( !compute->vector_flag )
return 0;
else
return compute->size_vector;
}
else if ( style == 1 )
{
return lammps_get_natoms (ptr);
}
else if ( style == 2 )
{
if ( !compute->local_flag )
return 0;
else
return compute->size_local_rows;
}
else
return 0;
}
void lammps_extract_compute_arraysize (void *ptr, char *id, int style,
int *nrows, int *ncols)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
int icompute = lmp->modify->find_compute(id);
if ( icompute < 0 )
{
*nrows = 0;
*ncols = 0;
return;
}
class Compute *compute = lmp->modify->compute[icompute];
if ( style == 0 )
{
if ( !compute->array_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = compute->size_array_rows;
*ncols = compute->size_array_cols;
}
}
else if ( style == 1 )
{
if ( !compute->peratom_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = lammps_get_natoms (ptr);
*ncols = compute->size_peratom_cols;
}
}
else if ( style == 2 )
{
if ( !compute->local_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = compute->size_local_rows;
*ncols = compute->size_local_cols;
}
}
else
{
*nrows = 0;
*ncols = 0;
}
return;
}
int lammps_extract_fix_vectorsize (void *ptr, char *id, int style)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
int ifix = lmp->modify->find_fix(id);
if ( ifix < 0 ) return 0;
class Fix *fix = lmp->modify->fix[ifix];
if ( style == 0 )
{
if ( !fix->vector_flag )
return 0;
else
return fix->size_vector;
}
else if ( style == 1 )
{
return lammps_get_natoms (ptr);
}
else if ( style == 2 )
{
if ( !fix->local_flag )
return 0;
else
return fix->size_local_rows;
}
else
return 0;
}
void lammps_extract_fix_arraysize (void *ptr, char *id, int style,
int *nrows, int *ncols)
{
class LAMMPS *lmp = (class LAMMPS *) ptr;
int ifix = lmp->modify->find_fix(id);
if ( ifix < 0 )
{
*nrows = 0;
*ncols = 0;
return;
}
class Fix *fix = lmp->modify->fix[ifix];
if ( style == 0 )
{
if ( !fix->array_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = fix->size_array_rows;
*ncols = fix->size_array_cols;
}
}
else if ( style == 1 )
{
if ( !fix->peratom_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = lammps_get_natoms (ptr);
*ncols = fix->size_peratom_cols;
}
}
else if ( style == 2 )
{
if ( !fix->local_flag )
{
*nrows = 0;
*ncols = 0;
}
else
{
*nrows = fix->size_local_rows;
*ncols = fix->size_local_cols;
}
}
else
{
*nrows = 0;
*ncols = 0;
}
return;
}
/* vim: set ts=3 sts=3 expandtab: */


@ -0,0 +1,47 @@
/* -----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
www.cs.sandia.gov/~sjplimp/lammps.html
Steve Plimpton, sjplimp@sandia.gov, Sandia National Laboratories
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ------------------------------------------------------------------------
Contributing author: Karl D. Hammond <karlh@ugcs.caltech.edu>
University of Tennessee, Knoxville (USA), 2012
------------------------------------------------------------------------- */
/* This is set of "wrapper" functions to assist LAMMPS.F90, which itself
provides a (I hope) robust Fortran interface to library.cpp and
library.h. All prototypes herein COULD be added to library.h instead of
including this as a separate file. See the README for instructions. */
/* These prototypes probably belong in mpi.h in the src/STUBS directory. */
#ifndef OPEN_MPI
#define MPI_Comm_f2c(a) a
#define MPI_Fint int
#endif
#ifdef __cplusplus
extern "C" {
#endif
/* Prototypes for auxiliary functions */
void lammps_open_fortran_wrapper (int, char**, MPI_Fint, void**);
int lammps_get_ntypes (void*);
int lammps_extract_compute_vectorsize (void*, char*, int);
void lammps_extract_compute_arraysize (void*, char*, int, int*, int*);
int lammps_extract_fix_vectorsize (void*, char*, int);
void lammps_extract_fix_arraysize (void*, char*, int, int*, int*);
void lammps_error_all (void *ptr, const char*, int, const char*);
#ifdef __cplusplus
}
#endif
/* vim: set ts=3 sts=3 expandtab: */

File diff suppressed because it is too large.


@ -0,0 +1,221 @@
LAMMPS.F90 defines a Fortran 2003 module, LAMMPS, which wraps all functions in
src/library.h so they can be used directly from Fortran-encoded programs.
All functions in src/library.h that use and/or return C-style pointers have
Fortran wrapper functions that use Fortran-style arrays, pointers, and
strings; all C-style memory management is handled internally with no user
intervention.
This interface was created by Karl Hammond who you can contact with
questions:
Karl D. Hammond
University of Tennessee, Knoxville
karlh at ugcs.caltech.edu
karlh at utk.edu
-------------------------------------
--COMPILATION--
First, be advised that mixed-language programming is not trivial. It requires
you to link in the required libraries of all languages you use (in this case,
those for Fortran, C, and C++), as well as any other libraries required.
You are also advised to read the --USE-- section below before trying to
compile.
The following steps will work to compile this module (replace ${LAMMPS_SRC}
with the path to your LAMMPS source directory):
(1) Compile LAMMPS as a static library. Call the resulting file ${LAMMPS_LIB},
which will have an actual name like liblammps_openmpi.a. If compiling
using the MPI stubs in ${LAMMPS_SRC}/STUBS, you will need to know where
libmpi.a is as well (I'll call it ${MPI_STUBS} hereafter)
(2) Copy said library to your Fortran program's source directory or include
its location in a -L${LAMMPS_SRC} flag to your compiler.
(3) Compile (but don't link!) LAMMPS.F90. Example:
mpif90 -c LAMMPS.f90
OR
gfortran -c LAMMPS.F90
Copy the LAMMPS.o and lammps.mod (or whatever your compiler calls module
files) to your Fortran program's source directory.
NOTE: you may get a warning such as,
subroutine lammps_open_wrapper (argc, argv, communicator, ptr) &
Variable 'communicator' at (1) is a parameter to the BIND(C)
procedure 'lammps_open_wrapper' but may not be C interoperable
This is normal (see --IMPLEMENTATION NOTES--).
(4) Compile (but don't link) LAMMPS-wrapper.cpp. You will need its header
file as well. You will have to provide the locations of LAMMPS's
header files. For example,
mpicxx -c -I${LAMMPS_SRC} LAMMPS-wrapper.cpp
OR
g++ -c -I${LAMMPS_SRC} -I${LAMMPS_SRC}/STUBS LAMMPS-wrapper.cpp
OR
icpc -c -I${LAMMPS_SRC} -I${LAMMPS_SRC}/STUBS LAMMPS-wrapper.cpp
Copy the resulting object file LAMMPS-wrapper.o to your Fortran program's
source directory.
(4b) OPTIONAL: Make a library so you can carry around two files instead of
three. Example:
ar rs liblammps_fortran.a LAMMPS.o LAMMPS-wrapper.o
This will create the file liblammps_fortran.a that you can use in place
of "LAMMPS.o LAMMPS-wrapper.o" in part (6). Note that you will still
need to have the .mod file from part (3).
It is also possible to add LAMMPS.o and LAMMPS-wrapper.o into the
LAMMPS library (e.g., liblammps_openmpi.a) instead of creating a separate
library, like so:
ar rs ${LAMMPS_LIB} LAMMPS.o LAMMPS-wrapper.o
In this case, you can now use the Fortran wrapper functions as if they
were part of the usual LAMMPS library interface (if you have the module
file visible to the compiler, that is).
(5) Compile your Fortran program. Example:
mpif90 -c myfreeformatfile.f90
mpif90 -c myfixedformatfile.f
OR
gfortran -c myfreeformatfile.f90
gfortran -c myfixedformatfile.f
The object files generated by these steps are collectively referred to
as ${my_object_files} in the next step(s).
IMPORTANT: If the Fortran module from part (3) is not in the current
directory or in one searched by the compiler for module files, you will
need to include that location via the -I flag to the compiler.
(6) Link everything together, including any libraries needed by LAMMPS (such
as the C++ standard library, the C math library, the JPEG library, fftw,
etc.) For example,
mpif90 LAMMPS.o LAMMPS-wrapper.o ${my_object_files} \
${LAMMPS_LIB} -lstdc++ -lm
OR
gfortran LAMMPS.o LAMMPS-wrapper.o ${my_object_files} \
${LAMMPS_LIB} ${MPI_STUBS} -lstdc++ -lm
OR
ifort LAMMPS.o LAMMPS-wrapper.o ${my_object_files} \
${LAMMPS_LIB} ${MPI_STUBS} -cxxlib -limf -lm
Any other required libraries (e.g. -ljpeg, -lfftw) should be added to
the end of this line.
You should now have a working executable.
Steps 3 and 4 above can be accomplished by running make with the attached
makefile, possibly after modifying it for your compilers and paths.
-------------------------------------
--USAGE--
To use this API, your program unit (PROGRAM/SUBROUTINE/FUNCTION/MODULE/etc.)
should look something like this:
program call_lammps
use LAMMPS
! Other modules, etc.
implicit none
type (lammps_instance) :: lmp ! This is a pointer to your LAMMPS instance
double precision :: fix
double precision, dimension(:), allocatable :: fix2
! Rest of declarations
call lammps_open_no_mpi ('lmp -in /dev/null -screen out.lammps',lmp)
! Set up rest of program here
call lammps_file (lmp, 'in.example')
call lammps_extract_fix (fix, lmp, '2', 0, 1, 1, 1)
call lammps_extract_fix (fix2, lmp, '4', 0, 2, 1, 1)
call lammps_close (lmp)
end program call_lammps
Important notes:
* All arguments which are char* variables in library.cpp are character (len=*)
variables here. For example,
call lammps_command (lmp, 'units metal')
will work as expected.
* The public functions (the only ones you can use) have interfaces as
described in the comments at the top of LAMMPS.F90. They are not always
the same as those in library.h, since C strings are replaced by Fortran
strings and the like.
* The module attempts to check whether you have done something stupid (such
as assign a 2D array to a scalar), but it's not perfect. For example, the
command
call lammps_extract_global (nlocal, ptr, 'nlocal')
will give nlocal correctly if nlocal is of type INTEGER, but it will give
the wrong answer if nlocal is of type REAL or DOUBLE PRECISION. This is a
feature of the (void*) type cast in library.cpp. There is no way I can
check this for you!
* You are allowed to use REAL or DOUBLE PRECISION floating-point numbers.
All LAMMPS data (which are of type REAL(C_double)) are rounded off if
placed in single precision variables. It is tacitly assumed that NO C++
variables are of type float; everything is int or double (since this is
all library.cpp currently handles).
* An example of a complete program is offered at the end of this file.
-------------------------------------
--TROUBLESHOOTING--
Compile-time errors probably indicate that your compiler is not new enough to
support Fortran 2003 features. For example, GCC 4.1.2 will not compile this
module, but GCC 4.4.0 will.
If your compiler balks at 'use, intrinsic :: ISO_C_binding,' try removing the
intrinsic part so it looks like an ordinary module. However, it is likely
that such a compiler will also have problems with everything else in the
file as well.
If you get a segfault as soon as the lammps_open call is made, check that you
compiled your program AND LAMMPS-wrapper.cpp using the same MPI headers. Using
the stubs for one and the actual MPI library for the other will cause major
problems.
If you find run-time errors, please pass them along via the LAMMPS Users
mailing list. Please provide a minimal working example along with the names
and versions of the compilers you are using. Please make sure the error is
repeatable and is in MY code, not yours (generating a minimal working example
will usually ensure this anyway).
-------------------------------------
--IMPLEMENTATION NOTES--
The Fortran procedures have the same names as the C procedures, and
their purpose is the same, but they may take different arguments. Here are
some of the important differences:
* lammps_open and lammps_open_no_mpi take a string instead of argc and
argv. This is necessary because C and C++ have a very different way
of treating strings than Fortran.
* All C++ functions that accept char* pointers now accept Fortran-style
strings within this interface instead.
* All of the lammps_extract_[something] functions, which return void*
C-style pointers, have been replaced by generic subroutines that return
Fortran variables (which may be arrays). The first argument houses the
variable to be returned; all other arguments are identical except as
stipulated above. Note that it is not possible to declare generic
functions that are selected based solely on the type/kind/rank (TKR)
signature of the return value, only based on the TKR of the arguments.
* The SHAPE of the first argument to lammps_extract_[something] is checked
against the "shape" of the C array (e.g., double vs. double* vs. double**).
Calling a subroutine with arguments of inappropriate rank will result in an
error at run time.
* All arrays passed to subroutines must be ALLOCATABLE and are REALLOCATED
to fit the shape of the array LAMMPS will be returning.
* The indices i and j in lammps_extract_fix are used the same way they
are in f_ID[i][j] references in LAMMPS (i.e., starting from 1). This is
different than the way library.cpp uses these numbers, but is more
consistent with the way arrays are accessed in LAMMPS and in Fortran.
* The char* pointer normally returned by lammps_command is thrown away
in this version; note also that lammps_command is now a subroutine
instead of a function.
* The pointer to LAMMPS itself is of type(lammps_instance), which is itself
a synonym for type(C_ptr), part of ISO_C_BINDING. Type (C_ptr) is
C's void* data type. This should be the only C data type that needs to
be used by the end user.
* This module will almost certainly generate a compile-time warning,
such as,
subroutine lammps_open_wrapper (argc, argv, communicator, ptr) &
Variable 'communicator' at (1) is a parameter to the BIND(C)
procedure 'lammps_open_wrapper' but may not be C interoperable
This happens because lammps_open_wrapper actually takes a Fortran
INTEGER argument, whose type is defined by the MPI library itself. The
Fortran integer is converted to a C integer by the MPI library (if such
conversion is actually necessary).
* Unlike library.cpp, this module returns COPIES of the data LAMMPS actually
uses. This is done for safety reasons, as you should, in general, not be
overwriting LAMMPS data directly from Fortran. If you require this
functionality, it is possible to write another function that, for example,
returns a Fortran pointer that resolves to the C/C++ data instead of
copying the contents of that pointer to the original array as is done now.


@ -0,0 +1,15 @@
units metal
lattice bcc 3.1656
region simbox block 0 10 0 10 0 10
create_box 2 simbox
create_atoms 1 region simbox
pair_style eam/fs
pair_coeff * * path/to/my_potential.eam.fs A1 A2
mass 1 58.2 # These are made-up numbers
mass 2 28.3
velocity all create 1200.0 7474848 dist gaussian
fix 1 all nve
fix 2 all dt/reset 1 1E-5 1E-3 0.01 units box
fix 4 all ave/histo 10 5 100 0.5 1.5 50 f_2 file temp.histo ave running
thermo_style custom step dt temp press etotal f_4[1][1]
thermo 100


@ -0,0 +1,33 @@
SHELL = /bin/sh
# Path to LAMMPS extraction directory
LAMMPS_ROOT = ../svn-dist
LAMMPS_SRC = $(LAMMPS_ROOT)/src
# Remove the line below if using mpicxx/mpic++ as your C++ compiler
MPI_STUBS = $(LAMMPS_SRC)/STUBS
FC = gfortran # replace with your Fortran compiler
CXX = g++ # replace with your C++ compiler
# Flags for Fortran compiler, C++ compiler, and C preprocessor, respectively
FFLAGS = -O2
CXXFLAGS = -O2
CPPFLAGS =
all : liblammps_fortran.a
liblammps_fortran.a : LAMMPS.o LAMMPS-wrapper.o
$(AR) rs $@ $^
LAMMPS.o lammps.mod : LAMMPS.F90
$(FC) $(CPPFLAGS) $(FFLAGS) -c $<
LAMMPS-wrapper.o : LAMMPS-wrapper.cpp LAMMPS-wrapper.h
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -I$(LAMMPS_SRC) -I$(MPI_STUBS)
clean :
$(RM) *.o *.mod liblammps_fortran.a
dist :
tar -czf Fortran-interface.tar.gz LAMMPS-wrapper.h LAMMPS-wrapper.cpp LAMMPS.F90 makefile README


@ -0,0 +1,44 @@
program simple
use LAMMPS
implicit none
type (lammps_instance) :: lmp
double precision :: compute, fix, fix2
double precision, dimension(:), allocatable :: compute_v, mass, r
double precision, dimension(:,:), allocatable :: x
real, dimension(:,:), allocatable :: x_r
call lammps_open_no_mpi ('',lmp)
call lammps_file (lmp, 'in.simple')
call lammps_command (lmp, 'run 500')
call lammps_extract_fix (fix, lmp, '2', 0, 1, 1, 1)
print *, 'Fix is ', fix
call lammps_extract_fix (fix2, lmp, '4', 0, 2, 1, 1)
print *, 'Fix 2 is ', fix2
call lammps_extract_compute (compute, lmp, 'thermo_temp', 0, 0)
print *, 'Compute is ', compute
call lammps_extract_compute (compute_v, lmp, 'thermo_temp', 0, 1)
print *, 'Vector is ', compute_v
call lammps_extract_atom (mass, lmp, 'mass')
print *, 'Mass is ', mass
call lammps_extract_atom (x, lmp, 'x')
if ( .not. allocated (x) ) print *, 'x is not allocated'
print *, 'x is ', x(1,:)
call lammps_extract_atom (x_r, lmp, 'x')
if ( .not. allocated (x_r) ) print *, 'x is not allocated'
print *, 'x_r is ', x_r(1,:)
call lammps_get_coords (lmp, r)
print *, 'r is ', r(1:3)
call lammps_close (lmp)
end program simple


@ -35,7 +35,8 @@ gcc -L/home/sjplimp/lammps/src simple.o \
-llmp_g++ -lfftw -lmpich -lmpl -lpthread -lstdc++ -o simpleC -llmp_g++ -lfftw -lmpich -lmpl -lpthread -lstdc++ -o simpleC
This builds the Fortran wrapper and driver with the LAMMPS library This builds the Fortran wrapper and driver with the LAMMPS library
using a Fortran and C compiler: using a Fortran and C compiler, using the wrapper in the fortran
directory:
cp ../fortran/libfwrapper.c . cp ../fortran/libfwrapper.c .
gcc -I/home/sjplimp/lammps/src -c libfwrapper.c gcc -I/home/sjplimp/lammps/src -c libfwrapper.c


@ -99,10 +99,10 @@ int main(int narg, char **arg)
int natoms = lammps_get_natoms(ptr); int natoms = lammps_get_natoms(ptr);
double *x = (double *) malloc(3*natoms*sizeof(double)); double *x = (double *) malloc(3*natoms*sizeof(double));
lammps_get_coords(ptr,x); lammps_gather_atoms(lmp,"x",1,3,x);
double epsilon = 0.1; double epsilon = 0.1;
x[0] += epsilon; x[0] += epsilon;
lammps_put_coords(ptr,x); lammps_scatter_atoms(lmp,"x",1,3,x);
free(x); free(x);
lammps_command(ptr,"run 1"); lammps_command(ptr,"run 1");


@ -23,6 +23,7 @@
#include "stdlib.h" #include "stdlib.h"
#include "string.h" #include "string.h"
#include "mpi.h" #include "mpi.h"
#include "lammps.h" // these are LAMMPS include files #include "lammps.h" // these are LAMMPS include files
#include "input.h" #include "input.h"
#include "atom.h" #include "atom.h"
@ -104,10 +105,10 @@ int main(int narg, char **arg)
int natoms = static_cast<int> (lmp->atom->natoms); int natoms = static_cast<int> (lmp->atom->natoms);
double *x = new double[3*natoms]; double *x = new double[3*natoms];
lammps_get_coords(lmp,x); // no LAMMPS class function for this lammps_gather_atoms(lmp,"x",1,3,x);
double epsilon = 0.1; double epsilon = 0.1;
x[0] += epsilon; x[0] += epsilon;
lammps_put_coords(lmp,x); // no LAMMPS class function for this lammps_scatter_atoms(lmp,"x",1,3,x);
delete [] x; delete [] x;
lmp->input->one("run 1"); lmp->input->one("run 1");


@ -115,9 +115,9 @@ PROGRAM f_driver
CALL lammps_get_natoms(ptr,natoms) CALL lammps_get_natoms(ptr,natoms)
ALLOCATE(x(3*natoms)) ALLOCATE(x(3*natoms))
CALL lammps_get_coords(ptr,x) CALL lammps_gather_atoms(ptr,'x',1,3,x);
x(1) = x(1) + epsilon x(1) = x(1) + epsilon
CALL lammps_put_coords(ptr,x) CALL lammps_scatter_atoms(ptr,'x',1,3,x);
DEALLOCATE(x) DEALLOCATE(x)


@ -98,7 +98,7 @@ OBJ = $(SRC:.cpp=.o)
# the same MPI library that LAMMPS is built with # the same MPI library that LAMMPS is built with
CC = g++ CC = g++
CCFLAGS = -O -g -I../../src -DMPICH_IGNORE_CXX_SEEK CCFLAGS = -O -g -fPIC -I../../src -DMPICH_IGNORE_CXX_SEEK
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -98,7 +98,7 @@ OBJ = $(SRC:.cpp=.o)
# the same MPI library that LAMMPS is built with # the same MPI library that LAMMPS is built with
CC = icc CC = icc
CCFLAGS = -O -g -I../../src -DMPICH_IGNORE_CXX_SEEK CCFLAGS = -O -g -fPIC -I../../src -DMPICH_IGNORE_CXX_SEEK
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -98,7 +98,7 @@ OBJ = $(SRC:.cpp=.o)
# the same MPI library that LAMMPS is built with # the same MPI library that LAMMPS is built with
CC = g++ CC = g++
CCFLAGS = -O -g -I../../src -I../../src/STUBS CCFLAGS = -O -g -fPIC -I../../src -I../../src/STUBS
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -33,7 +33,7 @@ OBJ = $(SRC:.cpp=.o)
# the same MPI library that LAMMPS is built with # the same MPI library that LAMMPS is built with
CC = mpic++ CC = mpic++
CCFLAGS = -O -Isystems/interact/TCP/ -Isystems/interact -Iivutils/include CCFLAGS = -O -fPIC -Isystems/interact/TCP/ -Isystems/interact -Iivutils/include
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -3,7 +3,7 @@
# ------ SETTINGS ------ # ------ SETTINGS ------
CXX = g++ CXX = g++
CXXFLAGS = -O2 -g -funroll-loops # -DCOLVARS_DEBUG CXXFLAGS = -O2 -g -fPIC -funroll-loops # -DCOLVARS_DEBUG
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rscv ARCHFLAG = -rscv
SHELL = /bin/sh SHELL = /bin/sh


@ -27,7 +27,7 @@ OBJ = $(SRC:.f=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
FC = gfortran FC = gfortran
FFLAGS = -O3 -march=native -mpc64 \ FFLAGS = -O3 -fPIC -march=native -mpc64 \
-ffast-math -funroll-loops -fstrict-aliasing -Wall -W -Wno-uninitialized -fno-second-underscore -ffast-math -funroll-loops -fstrict-aliasing -Wall -W -Wno-uninitialized -fno-second-underscore
FFLAGS0 = -O0 -march=native -mpc64 \ FFLAGS0 = -O0 -march=native -mpc64 \
-Wall -W -Wno-uninitialized -fno-second-underscore -Wall -W -Wno-uninitialized -fno-second-underscore


@ -23,7 +23,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = g95 F90 = g95
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -29,7 +29,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = gfortran F90 = gfortran
F90FLAGS = -O2 -ffast-math -ftree-vectorize -fexpensive-optimizations -fno-second-underscore F90FLAGS = -O2 -fPIC -ffast-math -ftree-vectorize -fexpensive-optimizations -fno-second-underscore
#F90FLAGS = -O #F90FLAGS = -O
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc


@ -23,7 +23,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = ifort F90 = ifort
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -23,7 +23,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = pgf90 F90 = pgf90
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -32,7 +32,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = mpif90 F90 = mpif90
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
LINK = g++ LINK = g++


@ -67,7 +67,7 @@ OBJ = $(SRC:.cpp=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
CC = g++ CC = g++
CCFLAGS = -O2 -Wall -W -funroll-loops -ffast-math -fexpensive-optimizations -finline-functions -fno-rtti -fno-exceptions -Wall #-Wno-deprecated CCFLAGS = -O2 -fPIC -Wall -W -funroll-loops -ffast-math -fexpensive-optimizations -finline-functions -fno-rtti -fno-exceptions -Wall #-Wno-deprecated
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -67,7 +67,7 @@ OBJ = $(SRC:.cpp=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
CC = icc CC = icc
CCFLAGS = -O -Wall -Wcheck -wd869,981,1572 CCFLAGS = -O -fPIC -Wall -Wcheck -wd869,981,1572
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -67,7 +67,7 @@ OBJ = $(SRC:.cpp=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
CC = CC CC = CC
CCFLAGS = -O -g -Wall #-Wno-deprecated CCFLAGS = -O -fPIC -g -Wall #-Wno-deprecated
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
DEPFLAGS = -M DEPFLAGS = -M


@ -39,7 +39,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = g77 F90 = g77
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -39,7 +39,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = g95 F90 = g95
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -43,7 +43,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = gfortran F90 = gfortran
F90FLAGS = -O3 -Wall -march=native -mpc64 -ffast-math -funroll-loops -fno-second-underscore F90FLAGS = -O3 -Wall -march=native -mpc64 -ffast-math -funroll-loops -fno-second-underscore -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -43,7 +43,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = ifort F90 = ifort
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -39,7 +39,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = pgf90 F90 = pgf90
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -44,7 +44,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = mpif90 F90 = mpif90
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -43,7 +43,7 @@ OBJ = $(SRC:.F=.o)
# ------ SETTINGS ------ # ------ SETTINGS ------
F90 = mpif90 F90 = mpif90
F90FLAGS = -O F90FLAGS = -O -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = -rc ARCHFLAG = -rc
USRLIB = USRLIB =


@ -3,41 +3,29 @@ and allows the LAMMPS library interface to be invoked from Python,
either from a script or interactively. either from a script or interactively.
Details on the Python interface to LAMMPS and how to build LAMMPS as a Details on the Python interface to LAMMPS and how to build LAMMPS as a
shared library for use with Python are given in shared library, for use with Python, are given in
doc/Section_python.html. doc/Section_python.html and in doc/Section_start.html#start_5.
Basically you need to follow these 3 steps: Basically you need to follow these steps in the src directory:
a) Add paths to environment variables in your shell script % make makeshlib # creates Makefile.shlib
% make -f Makefile.shlib g++ # or whatever machine target you wish
% make install-python # may need to do this via sudo
For example, for csh or tcsh, add something like this to ~/.cshrc: You can replace the last step with running the python/install.py
script directly to give you more control over where two relevant files
are installed, or by setting environment variables in your shell
script. See doc/Section_python.html for details.
setenv PYTHONPATH ${PYTHONPATH}:/home/sjplimp/lammps/python You can then launch Python and instantiate an instance of LAMMPS:
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/home/sjplimp/lammps/src/STUBS
The latter is only necessary if you will use the MPI stubs library
instead of an MPI installed on your machine.
b) Build LAMMPS as a dynamic library, including dynamic versions of
any libraries it includes for the packages you have installed,
e.g. STUBS, MPI, FFTW, JPEG, package libs.
From the src directory:
% make makeshlib
% make -f Makefile.shlib g++
If successful, this results in the file src/liblmp_g++.so
c) Launch Python and import the LAMMPS wrapper
% python % python
>>> from lammps import lammps >>> from lammps import lammps
>>> lmp = lammps() >>> lmp = lammps()
If that gives no errors, you have successfully wrapped LAMMPS with If that gives no errors, you have successfully wrapped LAMMPS with
Python. Python. See doc/Section_python.html#py_5 for tests you can then use
to run LAMMPS in serial or parallel through Python.
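As a minimal, hedged illustration of driving LAMMPS from a script rather than interactively (mirroring python/example.py further below), assuming the shared library and wrapper are installed as above; the input file name in.demo is a placeholder:
from lammps import lammps
lmp = lammps()                          # loads liblammps.so and creates a LAMMPS instance
lines = open("in.demo","r").readlines() # feed an existing input script line by line
for line in lines: lmp.command(line)
lmp.command("run 100")                  # issue further commands directly
print("natoms = %d" % lmp.get_natoms()) # query the wrapped library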
------------------------------------------------------------------- -------------------------------------------------------------------

View File

@ -18,6 +18,7 @@ if len(argv) != 2:
infile = sys.argv[1] infile = sys.argv[1]
me = 0 me = 0
# uncomment if running in parallel via Pypar # uncomment if running in parallel via Pypar
#import pypar #import pypar
#me = pypar.rank() #me = pypar.rank()
@ -38,12 +39,11 @@ for line in lines: lmp.command(line)
# run a single step with changed coords # run a single step with changed coords
lmp.command("run 10") lmp.command("run 10")
x = lmp.get_coords() x = lmp.gather_atoms("x",1,3)
epsilon = 0.1 epsilon = 0.1
x[0] += epsilon x[0] += epsilon
lmp.put_coords(x) lmp.scatter_atoms("x",1,3,x)
lmp.command("run 1"); lmp.command("run 1");
lmp.command("run 1")
# uncomment if running in parallel via Pypar # uncomment if running in parallel via Pypar
#print "Proc %d out of %d procs has" % (me,nprocs), lmp #print "Proc %d out of %d procs has" % (me,nprocs), lmp

python/install.py Normal file
View File

@ -0,0 +1,35 @@
#!/usr/local/bin/python
# copy LAMMPS shared library src/liblammps.so and lammps.py to system dirs
# Syntax: python install.py [libdir] [pydir]
# libdir = target dir for src/liblammps.so, default = /usr/local/lib
# pydir = target dir for lammps.py, default = Python site-packages dir
import sys,commands
if len(sys.argv) > 3:
print "Syntax: python install.py [libdir] [pydir]"
sys.exit()
if len(sys.argv) >= 2: libdir = sys.argv[1]
else: libdir = "/usr/local/lib"
if len(sys.argv) == 3: pydir = sys.argv[2]
else:
paths = sys.path
for i,path in enumerate(paths):
index = path.rfind("site-packages")
if index < 0: continue
if index == len(path) - len("site-packages"): break
pydir = paths[i]
str = "cp ../src/liblammps.so %s" % libdir
print str
outstr = commands.getoutput(str)
if len(outstr.strip()): print outstr
str = "cp ../python/lammps.py %s" % pydir
print str
outstr = commands.getoutput(str)
if len(outstr.strip()): print outstr
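Once install.py (or "make install-python") has been run, a quick optional sanity check is the sketch below; it is not part of this patch, and ctypes.util.find_library may miss libraries outside the standard linker paths, in which case LD_LIBRARY_PATH still needs to include the install dir:
import ctypes.util
try:
    import lammps                              # lammps.py found via site-packages or PYTHONPATH
    print("found lammps.py at %s" % lammps.__file__)
except ImportError:
    print("lammps.py is not on the Python path")
lib = ctypes.util.find_library("lammps")       # looks for liblammps.so on the system library path
if lib: print("found shared library %s" % lib)
else: print("liblammps.so not found; check LD_LIBRARY_PATH or the libdir argument")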

View File

@ -17,23 +17,15 @@ import types
from ctypes import * from ctypes import *
import os.path import os.path
LMPINT = 0
LMPDOUBLE = 1
LMPIPTR = 2
LMPDPTR = 3
LMPDPTRPTR = 4
LOCATION = os.path.dirname(__file__)
class lammps: class lammps:
def __init__(self,name="",cmdlineargs=None): def __init__(self,name="",cmdargs=None):
# load liblmp.so by default # load liblammps.so by default
# if name = "g++", load liblmp_g++.so # if name = "g++", load liblammps_g++.so
try: try:
if not name: self.lib = CDLL("liblmp.so") if not name: self.lib = CDLL("liblammps.so")
else: self.lib = CDLL("liblmp_%s.so" % name) else: self.lib = CDLL("liblammps_%s.so" % name)
except: except:
raise OSError,"Could not load LAMMPS dynamic library" raise OSError,"Could not load LAMMPS dynamic library"
@ -42,10 +34,10 @@ class lammps:
# no_mpi call lets LAMMPS use MPI_COMM_WORLD # no_mpi call lets LAMMPS use MPI_COMM_WORLD
# cargs = array of C strings from args # cargs = array of C strings from args
if cmdlineargs: if cmdargs:
cmdlineargs.insert(0,"lammps.py") cmdargs.insert(0,"lammps.py")
narg = len(cmdlineargs) narg = len(cmdargs)
cargs = (c_char_p*narg)(*cmdlineargs) cargs = (c_char_p*narg)(*cmdargs)
self.lmp = c_void_p() self.lmp = c_void_p()
self.lib.lammps_open_no_mpi(narg,cargs,byref(self.lmp)) self.lib.lammps_open_no_mpi(narg,cargs,byref(self.lmp))
else: else:
@ -68,30 +60,26 @@ class lammps:
self.lib.lammps_command(self.lmp,cmd) self.lib.lammps_command(self.lmp,cmd)
def extract_global(self,name,type): def extract_global(self,name,type):
if type == LMPDOUBLE: if type == 0:
self.lib.lammps_extract_global.restype = POINTER(c_double)
ptr = self.lib.lammps_extract_global(self.lmp,name)
return ptr[0]
if type == LMPINT:
self.lib.lammps_extract_global.restype = POINTER(c_int) self.lib.lammps_extract_global.restype = POINTER(c_int)
elif type == 1:
self.lib.lammps_extract_global.restype = POINTER(c_double)
else: return None
ptr = self.lib.lammps_extract_global(self.lmp,name) ptr = self.lib.lammps_extract_global(self.lmp,name)
return ptr[0] return ptr[0]
return None
def extract_atom(self,name,type): def extract_atom(self,name,type):
if type == LMPDPTRPTR: if type == 0:
self.lib.lammps_extract_atom.restype = POINTER(POINTER(c_double))
ptr = self.lib.lammps_extract_atom(self.lmp,name)
return ptr
if type == LMPDPTR:
self.lib.lammps_extract_atom.restype = POINTER(c_double)
ptr = self.lib.lammps_extract_atom(self.lmp,name)
return ptr
if type == LMPIPTR:
self.lib.lammps_extract_atom.restype = POINTER(c_int) self.lib.lammps_extract_atom.restype = POINTER(c_int)
elif type == 1:
self.lib.lammps_extract_atom.restype = POINTER(POINTER(c_int))
elif type == 2:
self.lib.lammps_extract_atom.restype = POINTER(c_double)
elif type == 3:
self.lib.lammps_extract_atom.restype = POINTER(POINTER(c_double))
else: return None
ptr = self.lib.lammps_extract_atom(self.lmp,name) ptr = self.lib.lammps_extract_atom(self.lmp,name)
return ptr return ptr
return None
def extract_compute(self,id,style,type): def extract_compute(self,id,style,type):
if type == 0: if type == 0:
@ -153,18 +141,26 @@ class lammps:
return result return result
return None return None
# return total number of atoms in system
def get_natoms(self): def get_natoms(self):
return self.lib.lammps_get_natoms(self.lmp) return self.lib.lammps_get_natoms(self.lmp)
def get_coords(self): # return vector of atom properties gathered across procs, ordered by atom ID
nlen = 3 * self.lib.lammps_get_natoms(self.lmp)
coords = (c_double*nlen)()
self.lib.lammps_get_coords(self.lmp,coords)
return coords
# assume coords is an array of c_double, as created by get_coords() def gather_atoms(self,name,type,count):
# could check if it is some other Python object and create c_double array? natoms = self.lib.lammps_get_natoms(self.lmp)
# constructor for c_double array can take an arg to use to fill it? if type == 0:
data = ((count*natoms)*c_int)()
self.lib.lammps_gather_atoms(self.lmp,name,type,count,data)
elif type == 1:
data = ((count*natoms)*c_double)()
self.lib.lammps_gather_atoms(self.lmp,name,type,count,data)
else: return None
return data
def put_coords(self,coords): # scatter vector of atom properties across procs, ordered by atom ID
self.lib.lammps_put_coords(self.lmp,coords) # assume vector is of correct type and length, as created by gather_atoms()
def scatter_atoms(self,name,type,count,data):
self.lib.lammps_scatter_atoms(self.lmp,name,type,count,data)
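With the named LMP* constants removed, callers pass the small integer type codes directly. A usage sketch, assuming lmp is an existing lammps instance and that the names "dt", "x", and "type" are among those recognized by the C library interface:
dt    = lmp.extract_global("dt",1)   # type 1 = double scalar (timestep size)
x     = lmp.extract_atom("x",3)      # type 3 = double** (per-atom positions, local to this proc)
atype = lmp.extract_atom("type",0)   # type 0 = int* (per-atom types)
print("dt = %g; atom 0 is type %d at %g %g %g" % (dt,atype[0],x[0][0],x[0][1],x[0][2]))
Note that extract_atom returns pointers into the local per-processor arrays, while gather_atoms/scatter_atoms above work with a full copy ordered by atom ID.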

View File

@ -85,6 +85,9 @@ $(EXE): $(OBJ)
lib: $(OBJ) lib: $(OBJ)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ) $(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
#shlib: $(OBJ)
# $(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)
shlib: $(OBJ) shlib: $(OBJ)
$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \ $(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
$(OBJ) $(EXTRA_LIB) $(LIB) $(OBJ) $(EXTRA_LIB) $(LIB)

View File

@ -2,25 +2,17 @@
SHELL = /bin/sh SHELL = /bin/sh
# this Makefile builds LAMMPS for RedSky with OpenMPI # This Makefile builds LAMMPS for RedSky with OpenMPI.
# to invoke this Makefile, you need these modules loaded: # To use this Makefile, you need appropriate modules loaded.
# mpi/openmpi-1.4.1_oobpr_intel-11.1-f064-c064 # You can determine which modules are loaded by typing:
# misc/env-openmpi-1.4-oobpr
# compilers/intel-11.1-f064-c064
# libraries/intel-mkl-11.1.064
# libraries/fftw-2.1.5_openmpi-1.4.1_oobpr_intel-11.1-f064-c064
# you can determine which modules are loaded by typing:
# module list # module list
# these modules are not the default ones, but can be enabled by # These modules can be enabled by lines like this in your .cshrc or
# lines like this in your .cshrc or other start-up shell file # other start-up shell file or by typing them before you build LAMMPS:
# or by typing them before you build LAMMPS: # module load mpi/openmpi-1.4.2_oobpr_intel-11.1-f064-c064
# module load mpi/openmpi-1.4.3_oobpr_intel-11.1-f064-c064
# module load misc/env-openmpi-1.4-oobpr
# module load compilers/intel-11.1-f064-c064
# module load libraries/intel-mkl-11.1.064 # module load libraries/intel-mkl-11.1.064
# module load libraries/fftw-2.1.5_openmpi-1.4.3_oobpr_intel-11.1-f064-c064 # module load libraries/fftw-2.1.5_openmpi-1.4.2_oobpr_intel-11.1-f064-c064
# these same modules need to be loaded to submit a LAMMPS job, # These same modules need to be loaded to submit a LAMMPS job,
# either interactively or via a batch script # either interactively or via a batch script.
# IMPORTANT NOTE: # IMPORTANT NOTE:
# to run efficiently on RedSky, use the "numa_wrapper" mpiexec option, # to run efficiently on RedSky, use the "numa_wrapper" mpiexec option,

View File

@ -38,10 +38,15 @@ help:
@echo '' @echo ''
@echo 'make clean-all delete all object files' @echo 'make clean-all delete all object files'
@echo 'make clean-machine delete object files for one machine' @echo 'make clean-machine delete object files for one machine'
@echo 'make tar lmp_src.tar.gz of src dir and packages' @echo 'make tar create lmp_src.tar.gz of src dir and packages'
@echo 'make makelib update Makefile.lib for static library build' @echo 'make makelib create Makefile.lib for static library build'
@echo 'make makeshlib update Makefile.shlib for shared library build' @echo 'make makeshlib create Makefile.shlib for shared library build'
@echo 'make makelist update Makefile.list used by old makes' @echo 'make makelist create Makefile.list used by old makes'
@echo 'make -f Makefile.lib machine build LAMMPS as static library for machine'
@echo 'make -f Makefile.shlib machine build LAMMPS as shared library for machine'
@echo 'make -f Makefile.list machine build LAMMPS from explicit list of files'
@echo 'make stubs build dummy MPI library in STUBS'
@echo 'make install-python install LAMMPS wrapper in Python'
@echo '' @echo ''
@echo 'make package list available packages' @echo 'make package list available packages'
@echo 'make package-status status of all packages' @echo 'make package-status status of all packages'
@ -106,12 +111,12 @@ tar:
@cd STUBS; make @cd STUBS; make
@echo "Created $(ROOT)_src.tar.gz" @echo "Created $(ROOT)_src.tar.gz"
# Make MPI STUBS lib # Make MPI STUBS library
stubs: stubs:
@cd STUBS; make clean; make @cd STUBS; make clean; make
# Update Makefile.lib and Makefile.list # Create Makefile.lib, Makefile.shlib, and Makefile.list
makelib: makelib:
@$(SHELL) Make.sh style @$(SHELL) Make.sh style
@ -125,6 +130,11 @@ makelist:
@$(SHELL) Make.sh style @$(SHELL) Make.sh style
@$(SHELL) Make.sh Makefile.list @$(SHELL) Make.sh Makefile.list
# install LAMMPS shared lib and Python wrapper in Python
install-python:
@python ../python/install.py
# Package management # Package management
package: package:

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -482,7 +482,7 @@ void PRD::dynamics()
update->integrate->setup(); update->integrate->setup();
// this may be needed if don't do full init // this may be needed if don't do full init
//modify->addstep_compute_all(update->ntimestep); //modify->addstep_compute_all(update->ntimestep);
int ncalls = neighbor->ncalls; bigint ncalls = neighbor->ncalls;
timer->barrier_start(Timer::LOOP); timer->barrier_start(Timer::LOOP);
update->integrate->run(t_event); update->integrate->run(t_event);

View File

@ -39,8 +39,8 @@ class PRD : protected Pointers {
int equal_size_replicas,natoms; int equal_size_replicas,natoms;
int neigh_every,neigh_delay,neigh_dist_check; int neigh_every,neigh_delay,neigh_dist_check;
int nbuild,ndanger;
int quench_reneighbor; int quench_reneighbor;
bigint nbuild,ndanger;
double time_dephase,time_dynamics,time_quench,time_comm,time_output; double time_dephase,time_dynamics,time_quench,time_comm,time_output;
double time_start; double time_start;

View File

@ -42,8 +42,8 @@ class TAD : protected Pointers {
int event_first; int event_first;
int neigh_every,neigh_delay,neigh_dist_check; int neigh_every,neigh_delay,neigh_dist_check;
int nbuild,ndanger;
int quench_reneighbor; int quench_reneighbor;
bigint nbuild,ndanger;
double time_dynamics,time_quench,time_neb,time_comm,time_output; double time_dynamics,time_quench,time_neb,time_comm,time_output;
double time_start; double time_start;

View File

@ -1,8 +1,7 @@
# Makefile for MPI stubs library # Makefile for MPI stubs library
# Syntax: # Syntax:
# make # build static lib as libmpi_stubs.a # make # build lib as libmpi_stubs.a
# make shlib # build shared lib as libmpi_stubs.so
# make clean # remove *.o and lib files # make clean # remove *.o and lib files
# edit System-specific settings as needed for your platform # edit System-specific settings as needed for your platform
@ -18,34 +17,27 @@ INC = mpi.h
# Definitions # Definitions
EXE = libmpi_stubs.a EXE = libmpi_stubs.a
SHLIB = libmpi_stubs.so
OBJ = $(SRC:.c=.o) OBJ = $(SRC:.c=.o)
# System-specific settings # System-specific settings
CC = g++ CC = g++
CCFLAGS = -O CCFLAGS = -O -fPIC
SHFLAGS = -fPIC
ARCHIVE = ar ARCHIVE = ar
ARCHFLAG = rs ARCHFLAG = rs
SHLIBFLAGS = -shared
# Targets # Targets
lib: $(OBJ) lib: $(OBJ)
$(ARCHIVE) $(ARCHFLAG) $(EXE) $(OBJ) $(ARCHIVE) $(ARCHFLAG) $(EXE) $(OBJ)
shlib: $(OBJ)
$(CC) $(CFLAGS) $(SHFLAGS) $(SHLIBFLAGS) -o $(SHLIB) $(OBJ)
clean: clean:
rm -f *.o libmpi_stubs.a libmpi_stubs.so rm -f *.o libmpi_stubs.a
# Compilation rules # Compilation rules
.c.o: .c.o:
$(CC) $(CCFLAGS) $(SHFLAGS) -c $< $(CC) $(CCFLAGS) -c $<
# Individual dependencies # Individual dependencies

View File

@ -48,30 +48,42 @@ using namespace LAMMPS_NS;
Cuda::Cuda(LAMMPS *lmp) : Pointers(lmp) Cuda::Cuda(LAMMPS* lmp) : Pointers(lmp)
{ {
cuda_exists=true; cuda_exists = true;
lmp->cuda=this; lmp->cuda = this;
if(universe->me==0)
if(universe->me == 0)
printf("# Using LAMMPS_CUDA \n"); printf("# Using LAMMPS_CUDA \n");
shared_data.me=universe->me;
device_set=false; shared_data.me = universe->me;
device_set = false;
Cuda_Cuda_GetCompileSettings(&shared_data); Cuda_Cuda_GetCompileSettings(&shared_data);
if(shared_data.compile_settings.prec_glob!=static_cast<int>(sizeof(CUDA_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: Global Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_glob, static_cast<int>(sizeof(CUDA_FLOAT))/4); if(shared_data.compile_settings.prec_glob != sizeof(CUDA_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: Global Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_glob, sizeof(CUDA_FLOAT) / 4);
if(shared_data.compile_settings.prec_x!=static_cast<int>(sizeof(X_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: X Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_x, static_cast<int>(sizeof(X_FLOAT))/4);
if(shared_data.compile_settings.prec_v!=static_cast<int>(sizeof(V_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: V Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_v, static_cast<int>(sizeof(V_FLOAT))/4);
if(shared_data.compile_settings.prec_f!=static_cast<int>(sizeof(F_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: F Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_f, static_cast<int>(sizeof(F_FLOAT))/4);
if(shared_data.compile_settings.prec_pppm!=static_cast<int>(sizeof(PPPM_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: PPPM Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_pppm, static_cast<int>(sizeof(PPPM_FLOAT))/4);
if(shared_data.compile_settings.prec_fft!=static_cast<int>(sizeof(FFT_FLOAT))/4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: FFT Precision: cuda %i cpp %i\n\n",shared_data.compile_settings.prec_fft, static_cast<int>(sizeof(FFT_FLOAT))/4);
#ifdef FFT_CUFFT
if(shared_data.compile_settings.cufft!=1) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: cufft: cuda %i cpp %i\n\n",shared_data.compile_settings.cufft, 1);
#else
if(shared_data.compile_settings.cufft!=0) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: cufft: cuda %i cpp %i\n\n",shared_data.compile_settings.cufft, 0);
#endif
if(shared_data.compile_settings.arch!=CUDA_ARCH) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: arch: cuda %i cpp %i\n\n",shared_data.compile_settings.cufft, CUDA_ARCH); if(shared_data.compile_settings.prec_x != sizeof(X_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: X Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_x, sizeof(X_FLOAT) / 4);
if(shared_data.compile_settings.prec_v != sizeof(V_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: V Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_v, sizeof(V_FLOAT) / 4);
if(shared_data.compile_settings.prec_f != sizeof(F_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: F Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_f, sizeof(F_FLOAT) / 4);
if(shared_data.compile_settings.prec_pppm != sizeof(PPPM_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: PPPM Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_pppm, sizeof(PPPM_FLOAT) / 4);
if(shared_data.compile_settings.prec_fft != sizeof(FFT_FLOAT) / 4) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: FFT Precision: cuda %i cpp %i\n\n", shared_data.compile_settings.prec_fft, sizeof(FFT_FLOAT) / 4);
#ifdef FFT_CUFFT
if(shared_data.compile_settings.cufft != 1) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: cufft: cuda %i cpp %i\n\n", shared_data.compile_settings.cufft, 1);
#else
if(shared_data.compile_settings.cufft != 0) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: cufft: cuda %i cpp %i\n\n", shared_data.compile_settings.cufft, 0);
#endif
if(shared_data.compile_settings.arch != CUDA_ARCH) printf("\n\n # CUDA WARNING: Compile Settings of cuda and cpp code differ! \n # CUDA WARNING: arch: cuda %i cpp %i\n\n", shared_data.compile_settings.cufft, CUDA_ARCH);
cu_x = 0; cu_x = 0;
cu_v = 0; cu_v = 0;
@ -111,14 +123,14 @@ Cuda::Cuda(LAMMPS *lmp) : Pointers(lmp)
cu_map_array = 0; cu_map_array = 0;
copy_buffer=0; copy_buffer = 0;
copy_buffersize=0; copy_buffersize = 0;
neighbor_decide_by_integrator=0; neighbor_decide_by_integrator = 0;
pinned=true; pinned = true;
debugdata=0; debugdata = 0;
new int[2*CUDA_MAX_DEBUG_SIZE]; new int[2 * CUDA_MAX_DEBUG_SIZE];
finished_setup = false; finished_setup = false;
begin_setup = false; begin_setup = false;
@ -126,16 +138,16 @@ Cuda::Cuda(LAMMPS *lmp) : Pointers(lmp)
setSharedDataZero(); setSharedDataZero();
uploadtime=0; uploadtime = 0;
downloadtime=0; downloadtime = 0;
dotiming=false; dotiming = false;
dotestatom = false; dotestatom = false;
testatom = 0; testatom = 0;
oncpu = true; oncpu = true;
self_comm = 0; self_comm = 0;
MYDBG( printf("# CUDA: Cuda::Cuda Done...\n");) MYDBG(printf("# CUDA: Cuda::Cuda Done...\n");)
//cCudaData<double, float, yx > //cCudaData<double, float, yx >
} }
@ -144,7 +156,7 @@ Cuda::~Cuda()
print_timings(); print_timings();
if(universe->me==0) printf("# CUDA: Free memory...\n"); if(universe->me == 0) printf("# CUDA: Free memory...\n");
delete cu_q; delete cu_q;
delete cu_x; delete cu_x;
@ -178,8 +190,8 @@ Cuda::~Cuda()
delete cu_map_array; delete cu_map_array;
std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin(); std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin();
while(p != neigh_lists.end())
{ while(p != neigh_lists.end()) {
delete p->second; delete p->second;
++p; ++p;
} }
@ -188,75 +200,81 @@ Cuda::~Cuda()
void Cuda::accelerator(int narg, char** arg) void Cuda::accelerator(int narg, char** arg)
{ {
if(device_set) return; if(device_set) return;
if(universe->me==0)
if(universe->me == 0)
printf("# CUDA: Activate GPU \n"); printf("# CUDA: Activate GPU \n");
int* devicelist=NULL; int* devicelist = NULL;
int pppn=2; int pppn = 2;
for(int i=0;i<narg;i++)
{ for(int i = 0; i < narg; i++) {
if(strcmp(arg[i],"gpu/node")==0) if(strcmp(arg[i], "gpu/node") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting a number after 'gpu/node' option.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting a number after 'gpu/node' option.");
pppn=atoi(arg[i]); pppn = atoi(arg[i]);
} }
if(strcmp(arg[i],"gpu/node/special")==0) if(strcmp(arg[i], "gpu/node/special") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting number of GPUs to be used per node after keyword 'gpu/node/special'.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting number of GPUs to be used per node after keyword 'gpu/node/special'.");
pppn=atoi(arg[i]); pppn = atoi(arg[i]);
if(pppn<1) error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting number of GPUs to be used per node after keyword 'gpu/node special'.");
if(i+pppn==narg) if(pppn < 1) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting number of GPUs to be used per node after keyword 'gpu/node special'.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting list of device ids after keyword 'gpu/node special'.");
devicelist=new int[pppn]; if(i + pppn == narg)
for(int k=0;k<pppn;k++) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting list of device ids after keyword 'gpu/node special'.");
{i++;devicelist[k]=atoi(arg[i]);}
devicelist = new int[pppn];
for(int k = 0; k < pppn; k++) {
i++;
devicelist[k] = atoi(arg[i]);
}
} }
if(strcmp(arg[i],"pinned")==0) if(strcmp(arg[i], "pinned") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting a number after 'pinned' option.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting a number after 'pinned' option.");
pinned=atoi(arg[i])==0?false:true; pinned = atoi(arg[i]) == 0 ? false : true;
if((pinned==false)&&(universe->me==0)) printf(" #CUDA: Pinned memory is not used for communication\n");
if((pinned == false) && (universe->me == 0)) printf(" #CUDA: Pinned memory is not used for communication\n");
} }
if(strcmp(arg[i],"timing")==0) if(strcmp(arg[i], "timing") == 0) {
{ dotiming = true;
dotiming=true;
} }
if(strcmp(arg[i],"suffix")==0) if(strcmp(arg[i], "suffix") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting a string after 'suffix' option.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting a string after 'suffix' option.");
strcpy(lmp->suffix,arg[i]); strcpy(lmp->suffix, arg[i]);
} }
if(strcmp(arg[i],"overlap_comm")==0) if(strcmp(arg[i], "overlap_comm") == 0) {
{ shared_data.overlap_comm = 1;
shared_data.overlap_comm=1;
} }
if(strcmp(arg[i],"test")==0) if(strcmp(arg[i], "test") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting a number after 'test' option.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting a number after 'test' option.");
testatom=atof(arg[i]); testatom = atof(arg[i]);
dotestatom=true; dotestatom = true;
} }
if(strcmp(arg[i],"override/bpa")==0) if(strcmp(arg[i], "override/bpa") == 0) {
{ if(++i == narg)
if(++i==narg) error->all(FLERR, "Invalid Options for 'accelerator' command. Expecting a number after 'override/bpa' option.");
error->all(FLERR,"Invalid Options for 'accelerator' command. Expecting a number after 'override/bpa' option.");
shared_data.pair.override_block_per_atom = atoi(arg[i]); shared_data.pair.override_block_per_atom = atoi(arg[i]);
} }
} }
CudaWrapper_Init(0, (char**)0,universe->me,pppn,devicelist); CudaWrapper_Init(0, (char**)0, universe->me, pppn, devicelist);
//if(shared_data.overlap_comm) //if(shared_data.overlap_comm)
CudaWrapper_AddStreams(3); CudaWrapper_AddStreams(3);
cu_x = 0; cu_x = 0;
@ -289,7 +307,7 @@ void Cuda::accelerator(int narg, char** arg)
cu_binned_id = 0; cu_binned_id = 0;
cu_binned_idnew = 0; cu_binned_idnew = 0;
device_set=true; device_set = true;
allocate(); allocate();
delete devicelist; delete devicelist;
} }
@ -328,31 +346,32 @@ void Cuda::setSharedDataZero()
shared_data.buffer_new = 1; shared_data.buffer_new = 1;
shared_data.buffer = NULL; shared_data.buffer = NULL;
shared_data.comm.comm_phase=0; shared_data.comm.comm_phase = 0;
shared_data.overlap_comm=0; shared_data.overlap_comm = 0;
shared_data.comm.buffer = NULL; shared_data.comm.buffer = NULL;
shared_data.comm.buffer_size=0; shared_data.comm.buffer_size = 0;
shared_data.comm.overlap_split_ratio=0; shared_data.comm.overlap_split_ratio = 0;
// setTimingsZero(); // setTimingsZero();
} }
void Cuda::allocate() void Cuda::allocate()
{ {
accelerator(0,NULL); accelerator(0, NULL);
MYDBG(printf("# CUDA: Cuda::allocate ...\n");) MYDBG(printf("# CUDA: Cuda::allocate ...\n");)
if(not cu_virial)
{ if(not cu_virial) {
cu_virial = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.virial , 6); cu_virial = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.virial , 6);
cu_eng_vdwl = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.eng_vdwl ,1); cu_eng_vdwl = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.eng_vdwl , 1);
cu_eng_coul = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.eng_coul ,1); cu_eng_coul = new cCudaData<double, ENERGY_FLOAT, x > (NULL, & shared_data.pair.eng_coul , 1);
cu_extent = new cCudaData<double, double, x> (extent, 6); cu_extent = new cCudaData<double, double, x> (extent, 6);
shared_data.flag = CudaWrapper_AllocCudaData(sizeof(int)); shared_data.flag = CudaWrapper_AllocCudaData(sizeof(int));
int size=2*CUDA_MAX_DEBUG_SIZE; int size = 2 * CUDA_MAX_DEBUG_SIZE;
debugdata = new int[size]; debugdata = new int[size];
cu_debugdata = new cCudaData<int, int, x > (debugdata , size); cu_debugdata = new cCudaData<int, int, x > (debugdata , size);
shared_data.debugdata=cu_debugdata->dev_data(); shared_data.debugdata = cu_debugdata->dev_data();
} }
checkResize(); checkResize();
setSystemParams(); setSystemParams();
MYDBG(printf("# CUDA: Cuda::allocate done...\n");) MYDBG(printf("# CUDA: Cuda::allocate done...\n");)
@ -376,8 +395,8 @@ void Cuda::setDomainParams()
cuda_shared_domain* cu_domain = &shared_data.domain; cuda_shared_domain* cu_domain = &shared_data.domain;
cu_domain->triclinic = domain->triclinic; cu_domain->triclinic = domain->triclinic;
for(short i=0; i<3; ++i)
{ for(short i = 0; i < 3; ++i) {
cu_domain->periodicity[i] = domain->periodicity[i]; cu_domain->periodicity[i] = domain->periodicity[i];
cu_domain->sublo[i] = domain->sublo[i]; cu_domain->sublo[i] = domain->sublo[i];
cu_domain->subhi[i] = domain->subhi[i]; cu_domain->subhi[i] = domain->subhi[i];
@ -385,34 +404,33 @@ void Cuda::setDomainParams()
cu_domain->boxhi[i] = domain->boxhi[i]; cu_domain->boxhi[i] = domain->boxhi[i];
cu_domain->prd[i] = domain->prd[i]; cu_domain->prd[i] = domain->prd[i];
} }
if(domain->triclinic)
{ if(domain->triclinic) {
for(short i=0; i<3; ++i) for(short i = 0; i < 3; ++i) {
{
cu_domain->boxlo_lamda[i] = domain->boxlo_lamda[i]; cu_domain->boxlo_lamda[i] = domain->boxlo_lamda[i];
cu_domain->boxhi_lamda[i] = domain->boxhi_lamda[i]; cu_domain->boxhi_lamda[i] = domain->boxhi_lamda[i];
cu_domain->prd_lamda[i] = domain->prd_lamda[i]; cu_domain->prd_lamda[i] = domain->prd_lamda[i];
} }
cu_domain->xy = domain->xy; cu_domain->xy = domain->xy;
cu_domain->xz = domain->xz; cu_domain->xz = domain->xz;
cu_domain->yz = domain->yz; cu_domain->yz = domain->yz;
} }
for(int i=0;i<6;i++) for(int i = 0; i < 6; i++) {
{ cu_domain->h[i] = domain->h[i];
cu_domain->h[i]=domain->h[i]; cu_domain->h_inv[i] = domain->h_inv[i];
cu_domain->h_inv[i]=domain->h_inv[i]; cu_domain->h_rate[i] = domain->h_rate[i];
cu_domain->h_rate[i]=domain->h_rate[i];
} }
cu_domain->update=2; cu_domain->update = 2;
MYDBG(printf("# CUDA: Cuda::setDomainParams done ...\n");) MYDBG(printf("# CUDA: Cuda::setDomainParams done ...\n");)
} }
void Cuda::checkResize() void Cuda::checkResize()
{ {
MYDBG(printf("# CUDA: Cuda::checkResize ...\n");) MYDBG(printf("# CUDA: Cuda::checkResize ...\n");)
accelerator(0,NULL); accelerator(0, NULL);
cuda_shared_atom* cu_atom = & shared_data.atom; cuda_shared_atom* cu_atom = & shared_data.atom;
cuda_shared_pair* cu_pair = & shared_data.pair; cuda_shared_pair* cu_pair = & shared_data.pair;
cu_atom->q_flag = atom->q_flag; cu_atom->q_flag = atom->q_flag;
@ -422,116 +440,151 @@ void Cuda::checkResize()
cu_atom->nghost = atom->nghost; cu_atom->nghost = atom->nghost;
// do we have more atoms to upload than currently allocated memory on device? (also true if nothing yet allocated) // do we have more atoms to upload than currently allocated memory on device? (also true if nothing yet allocated)
if(atom->nmax > cu_atom->nmax || cu_tag == NULL) if(atom->nmax > cu_atom->nmax || cu_tag == NULL) {
{ delete cu_x;
delete cu_x; cu_x = new cCudaData<double, X_FLOAT, yx> ((double*)atom->x , & cu_atom->x , atom->nmax, 3,0,true); //cu_x->set_buffer(&(shared_data.buffer),&(shared_data.buffersize),true); cu_x = new cCudaData<double, X_FLOAT, yx> ((double*)atom->x , & cu_atom->x , atom->nmax, 3, 0, true); //cu_x->set_buffer(&(shared_data.buffer),&(shared_data.buffersize),true);
delete cu_v; cu_v = new cCudaData<double, V_FLOAT, yx> ((double*)atom->v, & cu_atom->v , atom->nmax, 3); delete cu_v;
delete cu_f; cu_f = new cCudaData<double, F_FLOAT, yx> ((double*)atom->f, & cu_atom->f , atom->nmax, 3,0,true); cu_v = new cCudaData<double, V_FLOAT, yx> ((double*)atom->v, & cu_atom->v , atom->nmax, 3);
delete cu_tag; cu_tag = new cCudaData<int , int , x > (atom->tag , & cu_atom->tag , atom->nmax ); delete cu_f;
delete cu_type; cu_type = new cCudaData<int , int , x > (atom->type , & cu_atom->type , atom->nmax ); cu_f = new cCudaData<double, F_FLOAT, yx> ((double*)atom->f, & cu_atom->f , atom->nmax, 3, 0, true);
delete cu_mask; cu_mask = new cCudaData<int , int , x > (atom->mask , & cu_atom->mask , atom->nmax ); delete cu_tag;
delete cu_image; cu_image = new cCudaData<int , int , x > (atom->image , & cu_atom->image , atom->nmax ); cu_tag = new cCudaData<int , int , x > (atom->tag , & cu_atom->tag , atom->nmax, 0, true);
delete cu_type;
cu_type = new cCudaData<int , int , x > (atom->type , & cu_atom->type , atom->nmax, 0, true);
delete cu_mask;
cu_mask = new cCudaData<int , int , x > (atom->mask , & cu_atom->mask , atom->nmax, 0, true);
delete cu_image;
cu_image = new cCudaData<int , int , x > (atom->image , & cu_atom->image , atom->nmax, 0, true);
if(atom->rmass) if(atom->rmass) {
{delete cu_rmass; cu_rmass = new cCudaData<double, V_FLOAT, x > (atom->rmass , & cu_atom->rmass , atom->nmax );} delete cu_rmass;
cu_rmass = new cCudaData<double, V_FLOAT, x > (atom->rmass , & cu_atom->rmass , atom->nmax);
if(cu_atom->q_flag)
{delete cu_q; cu_q = new cCudaData<double, F_FLOAT, x > ((double*)atom->q, & cu_atom->q , atom->nmax );}// cu_q->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
if(atom->radius)
{
delete cu_radius; cu_radius = new cCudaData<double, X_FLOAT, x > (atom->radius , & cu_atom->radius , atom->nmax );
delete cu_v_radius; cu_v_radius = new cCudaData<V_FLOAT, V_FLOAT, x> (v_radius , & cu_atom->v_radius , atom->nmax*4);
delete cu_omega_rmass; cu_omega_rmass = new cCudaData<V_FLOAT, V_FLOAT, x> (omega_rmass , & cu_atom->omega_rmass , atom->nmax*4);
} }
if(atom->omega) if(cu_atom->q_flag) {
{delete cu_omega; cu_omega = new cCudaData<double, V_FLOAT, yx > (((double*) atom->omega) , & cu_atom->omega , atom->nmax,3 );} delete cu_q;
cu_q = new cCudaData<double, F_FLOAT, x > ((double*)atom->q, & cu_atom->q , atom->nmax, 0 , true);
}// cu_q->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
if(atom->torque) if(atom->radius) {
{delete cu_torque; cu_torque = new cCudaData<double, F_FLOAT, yx > (((double*) atom->torque) , & cu_atom->torque , atom->nmax,3 );} delete cu_radius;
cu_radius = new cCudaData<double, X_FLOAT, x > (atom->radius , & cu_atom->radius , atom->nmax);
delete cu_v_radius;
cu_v_radius = new cCudaData<V_FLOAT, V_FLOAT, x> (v_radius , & cu_atom->v_radius , atom->nmax * 4);
delete cu_omega_rmass;
cu_omega_rmass = new cCudaData<V_FLOAT, V_FLOAT, x> (omega_rmass , & cu_atom->omega_rmass , atom->nmax * 4);
}
if(atom->omega) {
delete cu_omega;
cu_omega = new cCudaData<double, V_FLOAT, yx > (((double*) atom->omega) , & cu_atom->omega , atom->nmax, 3);
}
if(atom->torque) {
delete cu_torque;
cu_torque = new cCudaData<double, F_FLOAT, yx > (((double*) atom->torque) , & cu_atom->torque , atom->nmax, 3);
}
if(atom->special) {
delete cu_special;
cu_special = new cCudaData<int, int, yx > (((int*) & (atom->special[0][0])) , & cu_atom->special , atom->nmax, atom->maxspecial, 0 , true);
shared_data.atom.maxspecial = atom->maxspecial;
}
if(atom->nspecial) {
delete cu_nspecial;
cu_nspecial = new cCudaData<int, int, yx > (((int*) atom->nspecial) , & cu_atom->nspecial , atom->nmax, 3, 0, true);
}
if(atom->molecule) {
delete cu_molecule;
cu_molecule = new cCudaData<int, int, x > (((int*) atom->molecule) , & cu_atom->molecule , atom->nmax, 0 , true);
}
if(atom->special)
{delete cu_special; cu_special = new cCudaData<int, int, yx > (((int*) &(atom->special[0][0])) , & cu_atom->special , atom->nmax,atom->maxspecial ); shared_data.atom.maxspecial=atom->maxspecial;}
if(atom->nspecial)
{delete cu_nspecial; cu_nspecial = new cCudaData<int, int, yx > (((int*) atom->nspecial) , & cu_atom->nspecial , atom->nmax,3 );}
if(atom->molecule)
{delete cu_molecule; cu_molecule = new cCudaData<int, int, x > (((int*) atom->molecule) , & cu_atom->molecule , atom->nmax );}
shared_data.atom.special_flag = neighbor->special_flag; shared_data.atom.special_flag = neighbor->special_flag;
shared_data.atom.molecular = atom->molecular; shared_data.atom.molecular = atom->molecular;
cu_atom->update_nmax = 2; cu_atom->update_nmax = 2;
cu_atom->nmax = atom->nmax; cu_atom->nmax = atom->nmax;
delete cu_x_type; cu_x_type = new cCudaData<X_FLOAT, X_FLOAT, x> (x_type , & cu_atom->x_type , atom->nmax*4); delete cu_x_type;
cu_x_type = new cCudaData<X_FLOAT, X_FLOAT, x> (x_type , & cu_atom->x_type , atom->nmax * 4);
} }
if(((cu_xhold==NULL)||(cu_xhold->get_dim()[0]<neighbor->maxhold))&&neighbor->xhold) if(((cu_xhold == NULL) || (cu_xhold->get_dim()[0] < neighbor->maxhold)) && neighbor->xhold) {
{ delete cu_xhold;
delete cu_xhold; cu_xhold = new cCudaData<double, X_FLOAT, yx> ((double*)neighbor->xhold, & cu_atom->xhold , neighbor->maxhold, 3); cu_xhold = new cCudaData<double, X_FLOAT, yx> ((double*)neighbor->xhold, & cu_atom->xhold , neighbor->maxhold, 3);
shared_data.atom.maxhold=neighbor->maxhold; shared_data.atom.maxhold = neighbor->maxhold;
}
if(atom->mass && !cu_mass) {
cu_mass = new cCudaData<double, V_FLOAT, x > (atom->mass , & cu_atom->mass , atom->ntypes + 1);
} }
if(atom->mass && !cu_mass)
{cu_mass = new cCudaData<double, V_FLOAT, x > (atom->mass , & cu_atom->mass , atom->ntypes+1);}
cu_atom->mass_host = atom->mass; cu_atom->mass_host = atom->mass;
if(atom->map_style==1) if(atom->map_style == 1) {
{ if((cu_map_array == NULL)) {
if((cu_map_array==NULL)) cu_map_array = new cCudaData<int, int, x > (atom->get_map_array() , & cu_atom->map_array , atom->get_map_size());
{ } else if(cu_map_array->dev_size() / sizeof(int) < atom->get_map_size()) {
cu_map_array = new cCudaData<int, int, x > (atom->get_map_array() , & cu_atom->map_array , atom->get_map_size() );
}
else
if(cu_map_array->dev_size()/sizeof(int)<atom->get_map_size())
{
delete cu_map_array; delete cu_map_array;
cu_map_array = new cCudaData<int, int, x > (atom->get_map_array() , & cu_atom->map_array , atom->get_map_size() ); cu_map_array = new cCudaData<int, int, x > (atom->get_map_array() , & cu_atom->map_array , atom->get_map_size());
} }
} }
// if any of the host pointers have changed (e.g. re-allocated somewhere else), set to correct pointer // if any of the host pointers have changed (e.g. re-allocated somewhere else), set to correct pointer
if(cu_x ->get_host_data() != atom->x) cu_x ->set_host_data((double*) (atom->x)); if(cu_x ->get_host_data() != atom->x) cu_x ->set_host_data((double*)(atom->x));
if(cu_v ->get_host_data() != atom->v) cu_v ->set_host_data((double*) (atom->v));
if(cu_f ->get_host_data() != atom->f) cu_f ->set_host_data((double*) (atom->f)); if(cu_v ->get_host_data() != atom->v) cu_v ->set_host_data((double*)(atom->v));
if(cu_f ->get_host_data() != atom->f) cu_f ->set_host_data((double*)(atom->f));
if(cu_tag ->get_host_data() != atom->tag) cu_tag ->set_host_data(atom->tag); if(cu_tag ->get_host_data() != atom->tag) cu_tag ->set_host_data(atom->tag);
if(cu_type->get_host_data() != atom->type) cu_type->set_host_data(atom->type); if(cu_type->get_host_data() != atom->type) cu_type->set_host_data(atom->type);
if(cu_mask->get_host_data() != atom->mask) cu_mask->set_host_data(atom->mask); if(cu_mask->get_host_data() != atom->mask) cu_mask->set_host_data(atom->mask);
if(cu_image->get_host_data() != atom->image) cu_mask->set_host_data(atom->image); if(cu_image->get_host_data() != atom->image) cu_mask->set_host_data(atom->image);
if(cu_xhold) if(cu_xhold)
if(cu_xhold->get_host_data()!= neighbor->xhold) cu_xhold->set_host_data((double*)(neighbor->xhold)); if(cu_xhold->get_host_data() != neighbor->xhold) cu_xhold->set_host_data((double*)(neighbor->xhold));
if(atom->rmass) if(atom->rmass)
if(cu_rmass->get_host_data() != atom->rmass) cu_rmass->set_host_data((double*) (atom->rmass)); if(cu_rmass->get_host_data() != atom->rmass) cu_rmass->set_host_data((double*)(atom->rmass));
if(cu_atom->q_flag) if(cu_atom->q_flag)
if(cu_q->get_host_data() != atom->q) cu_q->set_host_data((double*) (atom->q)); if(cu_q->get_host_data() != atom->q) cu_q->set_host_data((double*)(atom->q));
if(atom->radius) if(atom->radius)
if(cu_radius->get_host_data() != atom->radius) cu_radius->set_host_data((double*) (atom->radius)); if(cu_radius->get_host_data() != atom->radius) cu_radius->set_host_data((double*)(atom->radius));
if(atom->omega) if(atom->omega)
if(cu_omega->get_host_data() != atom->omega) cu_omega->set_host_data((double*) (atom->omega)); if(cu_omega->get_host_data() != atom->omega) cu_omega->set_host_data((double*)(atom->omega));
if(atom->torque) if(atom->torque)
if(cu_torque->get_host_data() != atom->torque) cu_torque->set_host_data((double*) (atom->torque)); if(cu_torque->get_host_data() != atom->torque) cu_torque->set_host_data((double*)(atom->torque));
if(atom->special) if(atom->special)
if(cu_special->get_host_data() != atom->special) if(cu_special->get_host_data() != atom->special) {
{delete cu_special; cu_special = new cCudaData<int, int, yx > (((int*) atom->special) , & cu_atom->special , atom->nmax,atom->maxspecial ); shared_data.atom.maxspecial=atom->maxspecial;} delete cu_special;
cu_special = new cCudaData<int, int, yx > (((int*) atom->special) , & cu_atom->special , atom->nmax, atom->maxspecial);
shared_data.atom.maxspecial = atom->maxspecial;
}
if(atom->nspecial) if(atom->nspecial)
if(cu_nspecial->get_host_data() != atom->nspecial) cu_nspecial->set_host_data((int*) (atom->nspecial)); if(cu_nspecial->get_host_data() != atom->nspecial) cu_nspecial->set_host_data((int*)(atom->nspecial));
if(atom->molecule) if(atom->molecule)
if(cu_molecule->get_host_data() != atom->molecule) cu_molecule->set_host_data((int*) (atom->molecule)); if(cu_molecule->get_host_data() != atom->molecule) cu_molecule->set_host_data((int*)(atom->molecule));
if(force) if(force)
if(cu_virial ->get_host_data() != force->pair->virial) cu_virial ->set_host_data(force->pair->virial); if(cu_virial ->get_host_data() != force->pair->virial) cu_virial ->set_host_data(force->pair->virial);
if(force) if(force)
if(cu_eng_vdwl ->get_host_data() != &force->pair->eng_vdwl) cu_eng_vdwl ->set_host_data(&force->pair->eng_vdwl); if(cu_eng_vdwl ->get_host_data() != &force->pair->eng_vdwl) cu_eng_vdwl ->set_host_data(&force->pair->eng_vdwl);
if(force) if(force)
if(cu_eng_coul ->get_host_data() != &force->pair->eng_coul) cu_eng_coul ->set_host_data(&force->pair->eng_coul); if(cu_eng_coul ->get_host_data() != &force->pair->eng_coul) cu_eng_coul ->set_host_data(&force->pair->eng_coul);
@ -539,32 +592,32 @@ void Cuda::checkResize()
MYDBG(printf("# CUDA: Cuda::checkResize done...\n");) MYDBG(printf("# CUDA: Cuda::checkResize done...\n");)
} }
void Cuda::evsetup_eatom_vatom(int eflag_atom,int vflag_atom) void Cuda::evsetup_eatom_vatom(int eflag_atom, int vflag_atom)
{ {
if(eflag_atom) if(eflag_atom) {
{
if(not cu_eatom) if(not cu_eatom)
cu_eatom = new cCudaData<double, ENERGY_FLOAT, x > (force->pair->eatom, & (shared_data.atom.eatom) , atom->nmax );// cu_eatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);} cu_eatom = new cCudaData<double, ENERGY_FLOAT, x > (force->pair->eatom, & (shared_data.atom.eatom) , atom->nmax); // cu_eatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
if(cu_eatom->get_dim()[0]!=atom->nmax)
{ if(cu_eatom->get_dim()[0] != atom->nmax) {
//delete cu_eatom; //delete cu_eatom;
//cu_eatom = new cCudaData<double, ENERGY_FLOAT, x > (force->pair->eatom, & (shared_data.atom.eatom) , atom->nmax );// cu_eatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);} //cu_eatom = new cCudaData<double, ENERGY_FLOAT, x > (force->pair->eatom, & (shared_data.atom.eatom) , atom->nmax );// cu_eatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
shared_data.atom.update_nmax=2; shared_data.atom.update_nmax = 2;
} }
cu_eatom->set_host_data(force->pair->eatom); cu_eatom->set_host_data(force->pair->eatom);
cu_eatom->memset_device(0); cu_eatom->memset_device(0);
} }
if(vflag_atom)
{ if(vflag_atom) {
if(not cu_vatom) if(not cu_vatom)
cu_vatom = new cCudaData<double, ENERGY_FLOAT, yx > ((double*)force->pair->vatom, & (shared_data.atom.vatom) , atom->nmax ,6 );// cu_vatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);} cu_vatom = new cCudaData<double, ENERGY_FLOAT, yx > ((double*)force->pair->vatom, & (shared_data.atom.vatom) , atom->nmax , 6);// cu_vatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
if(cu_vatom->get_dim()[0]!=atom->nmax)
{ if(cu_vatom->get_dim()[0] != atom->nmax) {
//delete cu_vatom; //delete cu_vatom;
//cu_vatom = new cCudaData<double, ENERGY_FLOAT, yx > ((double*)force->pair->vatom, & (shared_data.atom.vatom) , atom->nmax ,6 );// cu_vatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);} //cu_vatom = new cCudaData<double, ENERGY_FLOAT, yx > ((double*)force->pair->vatom, & (shared_data.atom.vatom) , atom->nmax ,6 );// cu_vatom->set_buffer(&(copy_buffer),&(copy_buffersize),true);}
shared_data.atom.update_nmax=2; shared_data.atom.update_nmax = 2;
} }
cu_vatom->set_host_data((double*)force->pair->vatom); cu_vatom->set_host_data((double*)force->pair->vatom);
cu_vatom->memset_device(0); cu_vatom->memset_device(0);
} }
@ -576,8 +629,9 @@ void Cuda::uploadAll()
timespec starttime; timespec starttime;
timespec endtime; timespec endtime;
if(atom->nmax!=shared_data.atom.nmax) checkResize(); if(atom->nmax != shared_data.atom.nmax) checkResize();
clock_gettime(CLOCK_REALTIME,&starttime);
clock_gettime(CLOCK_REALTIME, &starttime);
cu_x ->upload(); cu_x ->upload();
cu_v ->upload(); cu_v ->upload();
cu_f ->upload(); cu_f ->upload();
@ -585,25 +639,33 @@ void Cuda::uploadAll()
cu_type->upload(); cu_type->upload();
cu_mask->upload(); cu_mask->upload();
cu_image->upload(); cu_image->upload();
if(shared_data.atom.q_flag) cu_q ->upload(); if(shared_data.atom.q_flag) cu_q ->upload();
if(atom->rmass) cu_rmass->upload(); if(atom->rmass) cu_rmass->upload();
if(atom->radius) cu_radius->upload(); if(atom->radius) cu_radius->upload();
if(atom->omega) cu_omega->upload(); if(atom->omega) cu_omega->upload();
if(atom->torque) cu_torque->upload(); if(atom->torque) cu_torque->upload();
if(atom->special) cu_special->upload(); if(atom->special) cu_special->upload();
if(atom->nspecial) cu_nspecial->upload(); if(atom->nspecial) cu_nspecial->upload();
if(atom->molecule) cu_molecule->upload(); if(atom->molecule) cu_molecule->upload();
if(cu_eatom) cu_eatom->upload(); if(cu_eatom) cu_eatom->upload();
if(cu_vatom) cu_vatom->upload(); if(cu_vatom) cu_vatom->upload();
clock_gettime(CLOCK_REALTIME,&endtime); clock_gettime(CLOCK_REALTIME, &endtime);
uploadtime+=(endtime.tv_sec-starttime.tv_sec+1.0*(endtime.tv_nsec-starttime.tv_nsec)/1000000000); uploadtime += (endtime.tv_sec - starttime.tv_sec + 1.0 * (endtime.tv_nsec - starttime.tv_nsec) / 1000000000);
CUDA_IF_BINNING(Cuda_PreBinning(& shared_data);) CUDA_IF_BINNING(Cuda_PreBinning(& shared_data);)
CUDA_IF_BINNING(Cuda_Binning (& shared_data);) CUDA_IF_BINNING(Cuda_Binning(& shared_data);)
shared_data.atom.triggerneighsq=neighbor->triggersq; shared_data.atom.triggerneighsq = neighbor->triggersq;
MYDBG(printf("# CUDA: Cuda::uploadAll() ... end\n");) MYDBG(printf("# CUDA: Cuda::uploadAll() ... end\n");)
} }
@ -613,10 +675,10 @@ void Cuda::downloadAll()
timespec starttime; timespec starttime;
timespec endtime; timespec endtime;
if(atom->nmax!=shared_data.atom.nmax) checkResize(); if(atom->nmax != shared_data.atom.nmax) checkResize();
CUDA_IF_BINNING( Cuda_ReverseBinning(& shared_data); ) CUDA_IF_BINNING(Cuda_ReverseBinning(& shared_data);)
clock_gettime(CLOCK_REALTIME,&starttime); clock_gettime(CLOCK_REALTIME, &starttime);
cu_x ->download(); cu_x ->download();
cu_v ->download(); cu_v ->download();
cu_f ->download(); cu_f ->download();
@ -629,19 +691,27 @@ void Cuda::downloadAll()
//if(shared_data.atom.need_vatom) cu_vatom->download(); //if(shared_data.atom.need_vatom) cu_vatom->download();
if(shared_data.atom.q_flag) cu_q ->download(); if(shared_data.atom.q_flag) cu_q ->download();
if(atom->rmass) cu_rmass->download(); if(atom->rmass) cu_rmass->download();
if(atom->radius) cu_radius->download(); if(atom->radius) cu_radius->download();
if(atom->omega) cu_omega->download(); if(atom->omega) cu_omega->download();
if(atom->torque) cu_torque->download(); if(atom->torque) cu_torque->download();
if(atom->special) cu_special->download(); if(atom->special) cu_special->download();
if(atom->nspecial) cu_nspecial->download(); if(atom->nspecial) cu_nspecial->download();
if(atom->molecule) cu_molecule->download(); if(atom->molecule) cu_molecule->download();
if(cu_eatom) cu_eatom->download(); if(cu_eatom) cu_eatom->download();
if(cu_vatom) cu_vatom->download(); if(cu_vatom) cu_vatom->download();
clock_gettime(CLOCK_REALTIME,&endtime); clock_gettime(CLOCK_REALTIME, &endtime);
downloadtime+=(endtime.tv_sec-starttime.tv_sec+1.0*(endtime.tv_nsec-starttime.tv_nsec)/1000000000); downloadtime += (endtime.tv_sec - starttime.tv_sec + 1.0 * (endtime.tv_nsec - starttime.tv_nsec) / 1000000000);
MYDBG(printf("# CUDA: Cuda::downloadAll() ... end\n");) MYDBG(printf("# CUDA: Cuda::downloadAll() ... end\n");)
} }
@ -657,12 +727,12 @@ CudaNeighList* Cuda::registerNeighborList(class NeighList* neigh_list)
std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.find(neigh_list); std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.find(neigh_list);
if(p != neigh_lists.end()) return p->second; if(p != neigh_lists.end()) return p->second;
else else {
{
CudaNeighList* neigh_list_cuda = new CudaNeighList(lmp, neigh_list); CudaNeighList* neigh_list_cuda = new CudaNeighList(lmp, neigh_list);
neigh_lists.insert(std::pair<NeighList*, CudaNeighList*>(neigh_list, neigh_list_cuda)); neigh_lists.insert(std::pair<NeighList*, CudaNeighList*>(neigh_list, neigh_list_cuda));
return neigh_list_cuda; return neigh_list_cuda;
} }
MYDBG(printf("# CUDA: Cuda::registerNeighborList() ... end b\n");) MYDBG(printf("# CUDA: Cuda::registerNeighborList() ... end b\n");)
} }
@ -670,14 +740,17 @@ void Cuda::uploadAllNeighborLists()
{ {
MYDBG(printf("# CUDA: Cuda::uploadAllNeighborList() ... start\n");) MYDBG(printf("# CUDA: Cuda::uploadAllNeighborList() ... start\n");)
std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin(); std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin();
while(p != neigh_lists.end())
{ while(p != neigh_lists.end()) {
p->second->nl_upload(); p->second->nl_upload();
if(not (p->second->neigh_list->cuda_list->build_cuda))
for(int i=0;i<atom->nlocal;i++) if(not(p->second->neigh_list->cuda_list->build_cuda))
p->second->sneighlist.maxneighbors=MAX(p->second->neigh_list->numneigh[i],p->second->sneighlist.maxneighbors) ; for(int i = 0; i < atom->nlocal; i++)
p->second->sneighlist.maxneighbors = MAX(p->second->neigh_list->numneigh[i], p->second->sneighlist.maxneighbors) ;
++p; ++p;
} }
MYDBG(printf("# CUDA: Cuda::uploadAllNeighborList() ... done\n");) MYDBG(printf("# CUDA: Cuda::uploadAllNeighborList() ... done\n");)
} }
@ -685,28 +758,29 @@ void Cuda::downloadAllNeighborLists()
{ {
MYDBG(printf("# CUDA: Cuda::downloadAllNeighborList() ... start\n");) MYDBG(printf("# CUDA: Cuda::downloadAllNeighborList() ... start\n");)
std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin(); std::map<NeighList*, CudaNeighList*>::iterator p = neigh_lists.begin();
while(p != neigh_lists.end())
{ while(p != neigh_lists.end()) {
p->second->nl_download(); p->second->nl_download();
++p; ++p;
} }
} }
void Cuda::update_xhold(int &maxhold,double* xhold) void Cuda::update_xhold(int &maxhold, double* xhold)
{ {
if(this->shared_data.atom.maxhold<atom->nmax) if(this->shared_data.atom.maxhold < atom->nmax) {
{
maxhold = atom->nmax; maxhold = atom->nmax;
delete this->cu_xhold; this->cu_xhold = new cCudaData<double, X_FLOAT, yx> ((double*)xhold, & this->shared_data.atom.xhold , maxhold, 3); delete this->cu_xhold;
this->cu_xhold = new cCudaData<double, X_FLOAT, yx> ((double*)xhold, & this->shared_data.atom.xhold , maxhold, 3);
} }
this->shared_data.atom.maxhold=maxhold;
CudaWrapper_CopyData(this->cu_xhold->dev_data(),this->cu_x->dev_data(),3*atom->nmax*sizeof(X_FLOAT)); this->shared_data.atom.maxhold = maxhold;
CudaWrapper_CopyData(this->cu_xhold->dev_data(), this->cu_x->dev_data(), 3 * atom->nmax * sizeof(X_FLOAT));
} }
void Cuda::setTimingsZero() void Cuda::setTimingsZero()
{ {
shared_data.cuda_timings.test1=0; shared_data.cuda_timings.test1 = 0;
shared_data.cuda_timings.test2=0; shared_data.cuda_timings.test2 = 0;
//communication //communication
shared_data.cuda_timings.comm_forward_total = 0; shared_data.cuda_timings.comm_forward_total = 0;
@ -722,7 +796,7 @@ void Cuda::setTimingsZero()
shared_data.cuda_timings.comm_exchange_kernel_pack = 0; shared_data.cuda_timings.comm_exchange_kernel_pack = 0;
shared_data.cuda_timings.comm_exchange_kernel_unpack = 0; shared_data.cuda_timings.comm_exchange_kernel_unpack = 0;
shared_data.cuda_timings.comm_exchange_kernel_fill = 0; shared_data.cuda_timings.comm_exchange_kernel_fill = 0;
shared_data.cuda_timings.comm_exchange_cpu_pack= 0; shared_data.cuda_timings.comm_exchange_cpu_pack = 0;
shared_data.cuda_timings.comm_exchange_upload = 0; shared_data.cuda_timings.comm_exchange_upload = 0;
shared_data.cuda_timings.comm_exchange_download = 0; shared_data.cuda_timings.comm_exchange_download = 0;
@ -763,76 +837,77 @@ void Cuda::setTimingsZero()
void Cuda::print_timings() void Cuda::print_timings()
{ {
if(universe->me!=0) return; if(universe->me != 0) return;
if(not dotiming) return; if(not dotiming) return;
printf("\n # CUDA: Special timings\n\n"); printf("\n # CUDA: Special timings\n\n");
printf("\n Transfer Times\n"); printf("\n Transfer Times\n");
printf(" PCIe Upload: \t %lf s\n",CudaWrapper_CheckUploadTime()); printf(" PCIe Upload: \t %lf s\n", CudaWrapper_CheckUploadTime());
printf(" PCIe Download:\t %lf s\n",CudaWrapper_CheckDownloadTime()); printf(" PCIe Download:\t %lf s\n", CudaWrapper_CheckDownloadTime());
printf(" CPU Tempbbuf Upload: \t %lf \n",CudaWrapper_CheckCPUBufUploadTime()); printf(" CPU Tempbbuf Upload: \t %lf \n", CudaWrapper_CheckCPUBufUploadTime());
printf(" CPU Tempbbuf Download: \t %lf \n",CudaWrapper_CheckCPUBufDownloadTime()); printf(" CPU Tempbbuf Download: \t %lf \n", CudaWrapper_CheckCPUBufDownloadTime());
printf("\n Communication \n"); printf("\n Communication \n");
printf(" Forward Total \t %lf \n",shared_data.cuda_timings.comm_forward_total); printf(" Forward Total \t %lf \n", shared_data.cuda_timings.comm_forward_total);
printf(" Forward MPI Upper Bound \t %lf \n",shared_data.cuda_timings.comm_forward_mpi_upper); printf(" Forward MPI Upper Bound \t %lf \n", shared_data.cuda_timings.comm_forward_mpi_upper);
printf(" Forward MPI Lower Bound \t %lf \n",shared_data.cuda_timings.comm_forward_mpi_lower); printf(" Forward MPI Lower Bound \t %lf \n", shared_data.cuda_timings.comm_forward_mpi_lower);
printf(" Forward Kernel Pack \t %lf \n",shared_data.cuda_timings.comm_forward_kernel_pack); printf(" Forward Kernel Pack \t %lf \n", shared_data.cuda_timings.comm_forward_kernel_pack);
printf(" Forward Kernel Unpack \t %lf \n",shared_data.cuda_timings.comm_forward_kernel_unpack); printf(" Forward Kernel Unpack \t %lf \n", shared_data.cuda_timings.comm_forward_kernel_unpack);
printf(" Forward Kernel Self \t %lf \n",shared_data.cuda_timings.comm_forward_kernel_self); printf(" Forward Kernel Self \t %lf \n", shared_data.cuda_timings.comm_forward_kernel_self);
printf(" Forward Upload \t %lf \n",shared_data.cuda_timings.comm_forward_upload); printf(" Forward Upload \t %lf \n", shared_data.cuda_timings.comm_forward_upload);
printf(" Forward Download \t %lf \n",shared_data.cuda_timings.comm_forward_download); printf(" Forward Download \t %lf \n", shared_data.cuda_timings.comm_forward_download);
printf(" Forward Overlap Split Ratio\t %lf \n",shared_data.comm.overlap_split_ratio); printf(" Forward Overlap Split Ratio\t %lf \n", shared_data.comm.overlap_split_ratio);
printf("\n"); printf("\n");
printf(" Exchange Total \t %lf \n",shared_data.cuda_timings.comm_exchange_total); printf(" Exchange Total \t %lf \n", shared_data.cuda_timings.comm_exchange_total);
printf(" Exchange MPI \t %lf \n",shared_data.cuda_timings.comm_exchange_mpi); printf(" Exchange MPI \t %lf \n", shared_data.cuda_timings.comm_exchange_mpi);
printf(" Exchange Kernel Pack \t %lf \n",shared_data.cuda_timings.comm_exchange_kernel_pack); printf(" Exchange Kernel Pack \t %lf \n", shared_data.cuda_timings.comm_exchange_kernel_pack);
printf(" Exchange Kernel Unpack \t %lf \n",shared_data.cuda_timings.comm_exchange_kernel_unpack); printf(" Exchange Kernel Unpack \t %lf \n", shared_data.cuda_timings.comm_exchange_kernel_unpack);
printf(" Exchange Kernel Fill \t %lf \n",shared_data.cuda_timings.comm_exchange_kernel_fill); printf(" Exchange Kernel Fill \t %lf \n", shared_data.cuda_timings.comm_exchange_kernel_fill);
printf(" Exchange CPU Pack \t %lf \n",shared_data.cuda_timings.comm_exchange_cpu_pack); printf(" Exchange CPU Pack \t %lf \n", shared_data.cuda_timings.comm_exchange_cpu_pack);
printf(" Exchange Upload \t %lf \n",shared_data.cuda_timings.comm_exchange_upload); printf(" Exchange Upload \t %lf \n", shared_data.cuda_timings.comm_exchange_upload);
printf(" Exchange Download \t %lf \n",shared_data.cuda_timings.comm_exchange_download); printf(" Exchange Download \t %lf \n", shared_data.cuda_timings.comm_exchange_download);
printf("\n"); printf("\n");
printf(" Border Total \t %lf \n",shared_data.cuda_timings.comm_border_total); printf(" Border Total \t %lf \n", shared_data.cuda_timings.comm_border_total);
printf(" Border MPI \t %lf \n",shared_data.cuda_timings.comm_border_mpi); printf(" Border MPI \t %lf \n", shared_data.cuda_timings.comm_border_mpi);
printf(" Border Kernel Pack \t %lf \n",shared_data.cuda_timings.comm_border_kernel_pack); printf(" Border Kernel Pack \t %lf \n", shared_data.cuda_timings.comm_border_kernel_pack);
printf(" Border Kernel Unpack \t %lf \n",shared_data.cuda_timings.comm_border_kernel_unpack); printf(" Border Kernel Unpack \t %lf \n", shared_data.cuda_timings.comm_border_kernel_unpack);
printf(" Border Kernel Self \t %lf \n",shared_data.cuda_timings.comm_border_kernel_self); printf(" Border Kernel Self \t %lf \n", shared_data.cuda_timings.comm_border_kernel_self);
printf(" Border Kernel BuildList \t %lf \n",shared_data.cuda_timings.comm_border_kernel_buildlist); printf(" Border Kernel BuildList \t %lf \n", shared_data.cuda_timings.comm_border_kernel_buildlist);
printf(" Border Upload \t %lf \n",shared_data.cuda_timings.comm_border_upload); printf(" Border Upload \t %lf \n", shared_data.cuda_timings.comm_border_upload);
printf(" Border Download \t %lf \n",shared_data.cuda_timings.comm_border_download); printf(" Border Download \t %lf \n", shared_data.cuda_timings.comm_border_download);
printf("\n"); printf("\n");
//pair forces //pair forces
printf(" Pair XType Conversion \t %lf \n",shared_data.cuda_timings.pair_xtype_conversion ); printf(" Pair XType Conversion \t %lf \n", shared_data.cuda_timings.pair_xtype_conversion);
printf(" Pair Kernel \t %lf \n",shared_data.cuda_timings.pair_kernel ); printf(" Pair Kernel \t %lf \n", shared_data.cuda_timings.pair_kernel);
printf(" Pair Virial \t %lf \n",shared_data.cuda_timings.pair_virial ); printf(" Pair Virial \t %lf \n", shared_data.cuda_timings.pair_virial);
printf(" Pair Force Collection \t %lf \n",shared_data.cuda_timings.pair_force_collection ); printf(" Pair Force Collection \t %lf \n", shared_data.cuda_timings.pair_force_collection);
printf("\n"); printf("\n");
//neighbor //neighbor
printf(" Neighbor Binning \t %lf \n",shared_data.cuda_timings.neigh_bin ); printf(" Neighbor Binning \t %lf \n", shared_data.cuda_timings.neigh_bin);
printf(" Neighbor Build \t %lf \n",shared_data.cuda_timings.neigh_build ); printf(" Neighbor Build \t %lf \n", shared_data.cuda_timings.neigh_build);
printf(" Neighbor Special \t %lf \n",shared_data.cuda_timings.neigh_special ); printf(" Neighbor Special \t %lf \n", shared_data.cuda_timings.neigh_special);
printf("\n"); printf("\n");
//pppm //pppm
if(force->kspace) if(force->kspace) {
{ printf(" PPPM Total \t %lf \n", shared_data.cuda_timings.pppm_compute);
printf(" PPPM Total \t %lf \n",shared_data.cuda_timings.pppm_compute ); printf(" PPPM Particle Map \t %lf \n", shared_data.cuda_timings.pppm_particle_map);
printf(" PPPM Particle Map \t %lf \n",shared_data.cuda_timings.pppm_particle_map ); printf(" PPPM Make Rho \t %lf \n", shared_data.cuda_timings.pppm_make_rho);
printf(" PPPM Make Rho \t %lf \n",shared_data.cuda_timings.pppm_make_rho ); printf(" PPPM Brick2fft \t %lf \n", shared_data.cuda_timings.pppm_brick2fft);
printf(" PPPM Brick2fft \t %lf \n",shared_data.cuda_timings.pppm_brick2fft ); printf(" PPPM Poisson \t %lf \n", shared_data.cuda_timings.pppm_poisson);
printf(" PPPM Poisson \t %lf \n",shared_data.cuda_timings.pppm_poisson ); printf(" PPPM Fillbrick \t %lf \n", shared_data.cuda_timings.pppm_fillbrick);
printf(" PPPM Fillbrick \t %lf \n",shared_data.cuda_timings.pppm_fillbrick ); printf(" PPPM Fieldforce \t %lf \n", shared_data.cuda_timings.pppm_fieldforce);
printf(" PPPM Fieldforce \t %lf \n",shared_data.cuda_timings.pppm_fieldforce );
printf("\n"); printf("\n");
} }
printf(" Debug Test 1 \t %lf \n",shared_data.cuda_timings.test1); printf(" Debug Test 1 \t %lf \n", shared_data.cuda_timings.test1);
printf(" Debug Test 2 \t %lf \n",shared_data.cuda_timings.test2); printf(" Debug Test 2 \t %lf \n", shared_data.cuda_timings.test2);
printf("\n"); printf("\n");
} }
View File
@ -23,28 +23,30 @@
#include "group.h" #include "group.h"
#include "memory.h" #include "memory.h"
#include "error.h" #include "error.h"
#include "update.h"
using namespace LAMMPS_NS; using namespace LAMMPS_NS;
enum{NSQ,BIN,MULTI}; // also in neigh_list.cpp enum {NSQ, BIN, MULTI}; // also in neigh_list.cpp
/* ---------------------------------------------------------------------- */ /* ---------------------------------------------------------------------- */
NeighborCuda::NeighborCuda(LAMMPS *lmp) : Neighbor(lmp) NeighborCuda::NeighborCuda(LAMMPS* lmp) : Neighbor(lmp)
{ {
cuda = lmp->cuda; cuda = lmp->cuda;
if(cuda == NULL) if(cuda == NULL)
error->all(FLERR,"You cannot use a /cuda class, without activating 'cuda' acceleration. Provide '-c on' as command-line argument to LAMMPS.."); error->all(FLERR, "You cannot use a /cuda class, without activating 'cuda' acceleration. Provide '-c on' as command-line argument to LAMMPS..");
} }
/* ---------------------------------------------------------------------- */ /* ---------------------------------------------------------------------- */
void NeighborCuda::init() void NeighborCuda::init()
{ {
cuda->set_neighinit(dist_check,0.25*skin*skin); cuda->set_neighinit(dist_check, 0.25 * skin * skin);
cudable = 1; cudable = 1;
Neighbor::init(); Neighbor::init();
@ -55,13 +57,13 @@ void NeighborCuda::init()
any other neighbor build method is unchanged any other neighbor build method is unchanged
------------------------------------------------------------------------- */ ------------------------------------------------------------------------- */
void NeighborCuda::choose_build(int index, NeighRequest *rq) void NeighborCuda::choose_build(int index, NeighRequest* rq)
{ {
Neighbor::choose_build(index,rq); Neighbor::choose_build(index, rq);
if (rq->full && style == NSQ && rq->cudable) if(rq->full && style == NSQ && rq->cudable)
pair_build[index] = (Neighbor::PairPtr) &NeighborCuda::full_nsq_cuda; pair_build[index] = (Neighbor::PairPtr) &NeighborCuda::full_nsq_cuda;
else if (rq->full && style == BIN && rq->cudable) else if(rq->full && style == BIN && rq->cudable)
pair_build[index] = (Neighbor::PairPtr) &NeighborCuda::full_bin_cuda; pair_build[index] = (Neighbor::PairPtr) &NeighborCuda::full_bin_cuda;
} }
@ -69,93 +71,104 @@ void NeighborCuda::choose_build(int index, NeighRequest *rq)
int NeighborCuda::check_distance() int NeighborCuda::check_distance()
{ {
double delx,dely,delz,rsq; double delx, dely, delz, rsq;
double delta,deltasq,delta1,delta2; double delta, deltasq, delta1, delta2;
if (boxcheck) { if(boxcheck) {
if (triclinic == 0) { if(triclinic == 0) {
delx = bboxlo[0] - boxlo_hold[0]; delx = bboxlo[0] - boxlo_hold[0];
dely = bboxlo[1] - boxlo_hold[1]; dely = bboxlo[1] - boxlo_hold[1];
delz = bboxlo[2] - boxlo_hold[2]; delz = bboxlo[2] - boxlo_hold[2];
delta1 = sqrt(delx*delx + dely*dely + delz*delz); delta1 = sqrt(delx * delx + dely * dely + delz * delz);
delx = bboxhi[0] - boxhi_hold[0]; delx = bboxhi[0] - boxhi_hold[0];
dely = bboxhi[1] - boxhi_hold[1]; dely = bboxhi[1] - boxhi_hold[1];
delz = bboxhi[2] - boxhi_hold[2]; delz = bboxhi[2] - boxhi_hold[2];
delta2 = sqrt(delx*delx + dely*dely + delz*delz); delta2 = sqrt(delx * delx + dely * dely + delz * delz);
delta = 0.5 * (skin - (delta1+delta2)); delta = 0.5 * (skin - (delta1 + delta2));
deltasq = delta*delta; deltasq = delta * delta;
} else { } else {
domain->box_corners(); domain->box_corners();
delta1 = delta2 = 0.0; delta1 = delta2 = 0.0;
for (int i = 0; i < 8; i++) {
for(int i = 0; i < 8; i++) {
delx = corners[i][0] - corners_hold[i][0]; delx = corners[i][0] - corners_hold[i][0];
dely = corners[i][1] - corners_hold[i][1]; dely = corners[i][1] - corners_hold[i][1];
delz = corners[i][2] - corners_hold[i][2]; delz = corners[i][2] - corners_hold[i][2];
delta = sqrt(delx*delx + dely*dely + delz*delz); delta = sqrt(delx * delx + dely * dely + delz * delz);
if (delta > delta1) delta1 = delta;
else if (delta > delta2) delta2 = delta; if(delta > delta1) delta1 = delta;
else if(delta > delta2) delta2 = delta;
} }
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta; delta = 0.5 * (skin - (delta1 + delta2));
deltasq = delta * delta;
} }
} else deltasq = triggersq; } else deltasq = triggersq;
double **x = atom->x; double** x = atom->x;
int nlocal = atom->nlocal; int nlocal = atom->nlocal;
if (includegroup) nlocal = atom->nfirst;
if(includegroup) nlocal = atom->nfirst;
int flag = 0; int flag = 0;
if (not cuda->neighbor_decide_by_integrator) { if(not cuda->neighbor_decide_by_integrator) {
cuda->cu_x_download(); cuda->cu_x_download();
for (int i = 0; i < nlocal; i++) {
for(int i = 0; i < nlocal; i++) {
delx = x[i][0] - xhold[i][0]; delx = x[i][0] - xhold[i][0];
dely = x[i][1] - xhold[i][1]; dely = x[i][1] - xhold[i][1];
delz = x[i][2] - xhold[i][2]; delz = x[i][2] - xhold[i][2];
rsq = delx*delx + dely*dely + delz*delz; rsq = delx * delx + dely * dely + delz * delz;
if (rsq > deltasq) flag = 1;
if(rsq > deltasq) flag = 1;
} }
} } else flag = cuda->shared_data.atom.reneigh_flag;
else flag = cuda->shared_data.atom.reneigh_flag;
int flagall; int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world); MPI_Allreduce(&flag, &flagall, 1, MPI_INT, MPI_MAX, world);
if (flagall && ago == MAX(every,delay)) ndanger++;
if(flagall && ago == MAX(every, delay)) ndanger++;
return flagall; return flagall;
} }
/* ---------------------------------------------------------------------- */ /* ---------------------------------------------------------------------- */
void NeighborCuda::build() void NeighborCuda::build(int topoflag)
{ {
int i; int i;
ago = 0; ago = 0;
ncalls++; ncalls++;
lastcall = update->ntimestep;
// store current atom positions and box size if needed // store current atom positions and box size if needed
if (dist_check) { if(dist_check) {
if (cuda->decide_by_integrator()) if(cuda->decide_by_integrator())
cuda->update_xhold(maxhold, &xhold[0][0]); cuda->update_xhold(maxhold, &xhold[0][0]);
else { else {
if (cuda->finished_setup) cuda->cu_x_download(); if(cuda->finished_setup) cuda->cu_x_download();
double **x = atom->x; double** x = atom->x;
int nlocal = atom->nlocal; int nlocal = atom->nlocal;
if (includegroup) nlocal = atom->nfirst;
if (nlocal > maxhold) { if(includegroup) nlocal = atom->nfirst;
if(nlocal > maxhold) {
maxhold = atom->nmax; maxhold = atom->nmax;
memory->destroy(xhold); memory->destroy(xhold);
memory->create(xhold,maxhold,3,"neigh:xhold"); memory->create(xhold, maxhold, 3, "neigh:xhold");
} }
for (i = 0; i < nlocal; i++) {
for(i = 0; i < nlocal; i++) {
xhold[i][0] = x[i][0]; xhold[i][0] = x[i][0];
xhold[i][1] = x[i][1]; xhold[i][1] = x[i][1];
xhold[i][2] = x[i][2]; xhold[i][2] = x[i][2];
} }
if (boxcheck) {
if (triclinic == 0) { if(boxcheck) {
if(triclinic == 0) {
boxlo_hold[0] = bboxlo[0]; boxlo_hold[0] = bboxlo[0];
boxlo_hold[1] = bboxlo[1]; boxlo_hold[1] = bboxlo[1];
boxlo_hold[2] = bboxlo[2]; boxlo_hold[2] = bboxlo[2];
@ -165,7 +178,8 @@ void NeighborCuda::build()
} else { } else {
domain->box_corners(); domain->box_corners();
corners = domain->corners; corners = domain->corners;
for (i = 0; i < 8; i++) {
for(i = 0; i < 8; i++) {
corners_hold[i][0] = corners[i][0]; corners_hold[i][0] = corners[i][0];
corners_hold[i][1] = corners[i][1]; corners_hold[i][1] = corners[i][1];
corners_hold[i][2] = corners[i][2]; corners_hold[i][2] = corners[i][2];
@ -175,9 +189,10 @@ void NeighborCuda::build()
} }
} }
if (not cudable && cuda->finished_setup && atom->avec->cudable) if(not cudable && cuda->finished_setup && atom->avec->cudable)
cuda->downloadAll(); cuda->downloadAll();
if (cudable && (not cuda->finished_setup)) {
if(cudable && (not cuda->finished_setup)) {
cuda->checkResize(); cuda->checkResize();
cuda->uploadAll(); cuda->uploadAll();
} }
@ -187,37 +202,39 @@ void NeighborCuda::build()
// else only invoke grow() if nlocal exceeds previous list size // else only invoke grow() if nlocal exceeds previous list size
// only done for lists with growflag set and which are perpetual // only done for lists with growflag set and which are perpetual
if (anyghostlist && atom->nlocal+atom->nghost > maxatom) { if(anyghostlist && atom->nlocal + atom->nghost > maxatom) {
maxatom = atom->nmax; maxatom = atom->nmax;
for (i = 0; i < nglist; i++) lists[glist[i]]->grow(maxatom);
} else if (atom->nlocal > maxatom) { for(i = 0; i < nglist; i++) lists[glist[i]]->grow(maxatom);
} else if(atom->nlocal > maxatom) {
maxatom = atom->nmax; maxatom = atom->nmax;
for (i = 0; i < nglist; i++) lists[glist[i]]->grow(maxatom);
for(i = 0; i < nglist; i++) lists[glist[i]]->grow(maxatom);
} }
// extend atom bin list if necessary // extend atom bin list if necessary
if (style != NSQ && atom->nmax > maxbin) { if(style != NSQ && atom->nmax > maxbin) {
maxbin = atom->nmax; maxbin = atom->nmax;
memory->destroy(bins); memory->destroy(bins);
memory->create(bins,maxbin,"bins"); memory->create(bins, maxbin, "bins");
} }
// check that neighbor list with special bond flags will not overflow // check that neighbor list with special bond flags will not overflow
if (atom->nlocal+atom->nghost > NEIGHMASK) if(atom->nlocal + atom->nghost > NEIGHMASK)
error->one(FLERR,"Too many local+ghost atoms for neighbor list"); error->one(FLERR, "Too many local+ghost atoms for neighbor list");
// invoke building of pair and molecular neighbor lists // invoke building of pair and molecular neighbor lists
// only for pairwise lists with buildflag set // only for pairwise lists with buildflag set
for (i = 0; i < nblist; i++) for(i = 0; i < nblist; i++)
(this->*pair_build[blist[i]])(lists[blist[i]]); (this->*pair_build[blist[i]])(lists[blist[i]]);
if (atom->molecular) { if(atom->molecular && topoflag) {
if (force->bond) (this->*bond_build)(); if(force->bond)(this->*bond_build)();
if (force->angle) (this->*angle_build)(); if(force->angle)(this->*angle_build)();
if (force->dihedral) (this->*dihedral_build)(); if(force->dihedral)(this->*dihedral_build)();
if (force->improper) (this->*improper_build)(); if(force->improper)(this->*improper_build)();
} }
} }
View File
@ -23,7 +23,7 @@ class NeighborCuda : public Neighbor {
NeighborCuda(class LAMMPS *); NeighborCuda(class LAMMPS *);
void init(); void init();
int check_distance(); int check_distance();
void build(); void build(int do_build_bonded=1);
private: private:
class Cuda *cuda; class Cuda *cuda;
View File
@ -52,6 +52,9 @@
#include "cuda.h" #include "cuda.h"
#include <ctime> #include <ctime>
#include <cmath> #include <cmath>
#ifdef _OPENMP
#include "omp.h"
#endif
using namespace LAMMPS_NS; using namespace LAMMPS_NS;
@ -834,11 +837,6 @@ void VerletCuda::run(int n)
cuda->shared_data.buffer_new = 2; cuda->shared_data.buffer_new = 2;
if(atom->molecular) {
cuda->cu_molecule->download();
cuda->cu_x->download();
}
MYDBG(printf("# CUDA VerletCuda::iterate: neighbor build\n");) MYDBG(printf("# CUDA VerletCuda::iterate: neighbor build\n");)
timer->stamp(TIME_COMM); timer->stamp(TIME_COMM);
clock_gettime(CLOCK_REALTIME, &endtime); clock_gettime(CLOCK_REALTIME, &endtime);
@ -847,21 +845,19 @@ void VerletCuda::run(int n)
//rebuild neighbor list //rebuild neighbor list
test_atom(testatom, "Pre Neighbor"); test_atom(testatom, "Pre Neighbor");
neighbor->build(); neighbor->build(0);
timer->stamp(TIME_NEIGHBOR); timer->stamp(TIME_NEIGHBOR);
MYDBG(printf("# CUDA VerletCuda::iterate: neighbor done\n");) MYDBG(printf("# CUDA VerletCuda::iterate: neighbor done\n");)
//if bonded interactions are used (in this case collect_forces_later is true), transfer data which only changes upon exchange/border routines from GPU to CPU //if bonded interactions are used (in this case collect_forces_later is true), transfer data which only changes upon exchange/border routines from GPU to CPU
if(cuda->shared_data.pair.collect_forces_later) { if(cuda->shared_data.pair.collect_forces_later) {
if(cuda->cu_molecule) cuda->cu_molecule->download(); if(cuda->cu_molecule) cuda->cu_molecule->downloadAsync(2);
cuda->cu_tag->download(); cuda->cu_tag->downloadAsync(2);
cuda->cu_type->download(); cuda->cu_type->downloadAsync(2);
cuda->cu_mask->download(); cuda->cu_mask->downloadAsync(2);
if(cuda->cu_q) cuda->cu_q->download(); if(cuda->cu_q) cuda->cu_q->downloadAsync(2);
} }
cuda->shared_data.comm.comm_phase = 3; cuda->shared_data.comm.comm_phase = 3;
} }
@ -949,6 +945,11 @@ void VerletCuda::run(int n)
timer->stamp(TIME_PAIR); timer->stamp(TIME_PAIR);
if(neighbor->lastcall == update->ntimestep) {
neighbor->build_topology();
timer->stamp(TIME_NEIGHBOR);
}
test_atom(testatom, "pre bond force"); test_atom(testatom, "pre bond force");
if(force->bond) force->bond->compute(eflag, vflag); if(force->bond) force->bond->compute(eflag, vflag);
View File
@ -1384,14 +1384,25 @@ void Atom::update_callback(int ifix)
void *Atom::extract(char *name) void *Atom::extract(char *name)
{ {
if (strcmp(name,"mass") == 0) return (void *) mass;
if (strcmp(name,"id") == 0) return (void *) tag; if (strcmp(name,"id") == 0) return (void *) tag;
if (strcmp(name,"type") == 0) return (void *) type; if (strcmp(name,"type") == 0) return (void *) type;
if (strcmp(name,"mask") == 0) return (void *) mask; if (strcmp(name,"mask") == 0) return (void *) mask;
if (strcmp(name,"image") == 0) return (void *) image;
if (strcmp(name,"x") == 0) return (void *) x; if (strcmp(name,"x") == 0) return (void *) x;
if (strcmp(name,"v") == 0) return (void *) v; if (strcmp(name,"v") == 0) return (void *) v;
if (strcmp(name,"f") == 0) return (void *) f; if (strcmp(name,"f") == 0) return (void *) f;
if (strcmp(name,"mass") == 0) return (void *) mass; if (strcmp(name,"molecule") == 0) return (void *) molecule;
if (strcmp(name,"q") == 0) return (void *) q;
if (strcmp(name,"mu") == 0) return (void *) mu;
if (strcmp(name,"omega") == 0) return (void *) omega;
if (strcmp(name,"amgmom") == 0) return (void *) angmom;
if (strcmp(name,"torque") == 0) return (void *) torque;
if (strcmp(name,"radius") == 0) return (void *) radius;
if (strcmp(name,"rmass") == 0) return (void *) rmass; if (strcmp(name,"rmass") == 0) return (void *) rmass;
if (strcmp(name,"vfrac") == 0) return (void *) vfrac;
if (strcmp(name,"s0") == 0) return (void *) s0;
return NULL; return NULL;
} }
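The extended Atom::extract() above simply maps a per-atom quantity name to a raw pointer into the corresponding atom array, or NULL for names it does not know. A minimal, hedged sketch of calling code, assuming access to a LAMMPS instance lmp; the casts mirror the table above:

    // cast the returned void* according to the quantity that was requested
    double **x   = (double **) lmp->atom->extract((char *) "x");     // per-atom coords (3 per atom)
    int    *type = (int *)     lmp->atom->extract((char *) "type");  // per-atom type
    double *q    = (double *)  lmp->atom->extract((char *) "q");     // NULL if the atom style has no charge
    if (x && type) {
      // first local atom: type and position
      printf("atom 0: type %d at (%g,%g,%g)\n", type[0], x[0][0], x[0][1], x[0][2]);
    }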
View File
@ -649,8 +649,10 @@ void Finish::end(int flag)
if (atom->molecular && atom->natoms > 0) if (atom->molecular && atom->natoms > 0)
fprintf(screen,"Ave special neighs/atom = %g\n", fprintf(screen,"Ave special neighs/atom = %g\n",
nspec_all/atom->natoms); nspec_all/atom->natoms);
fprintf(screen,"Neighbor list builds = %d\n",neighbor->ncalls); fprintf(screen,"Neighbor list builds = " BIGINT_FORMAT "\n",
fprintf(screen,"Dangerous builds = %d\n",neighbor->ndanger); neighbor->ncalls);
fprintf(screen,"Dangerous builds = " BIGINT_FORMAT "\n",
neighbor->ndanger);
} }
if (logfile) { if (logfile) {
if (nall < 2.0e9) if (nall < 2.0e9)
@ -662,8 +664,10 @@ void Finish::end(int flag)
if (atom->molecular && atom->natoms > 0) if (atom->molecular && atom->natoms > 0)
fprintf(logfile,"Ave special neighs/atom = %g\n", fprintf(logfile,"Ave special neighs/atom = %g\n",
nspec_all/atom->natoms); nspec_all/atom->natoms);
fprintf(logfile,"Neighbor list builds = %d\n",neighbor->ncalls); fprintf(logfile,"Neighbor list builds = " BIGINT_FORMAT "\n",
fprintf(logfile,"Dangerous builds = %d\n",neighbor->ndanger); neighbor->ncalls);
fprintf(logfile,"Dangerous builds = " BIGINT_FORMAT "\n",
neighbor->ndanger);
} }
} }
} }
View File
@ -30,6 +30,7 @@
#include "modify.h" #include "modify.h"
#include "compute.h" #include "compute.h"
#include "fix.h" #include "fix.h"
#include "memory.h"
using namespace LAMMPS_NS; using namespace LAMMPS_NS;
@ -157,11 +158,19 @@ void *lammps_extract_atom(void *ptr, char *name)
id = compute ID id = compute ID
style = 0 for global data, 1 for per-atom data, 2 for local data style = 0 for global data, 1 for per-atom data, 2 for local data
type = 0 for scalar, 1 for vector, 2 for array type = 0 for scalar, 1 for vector, 2 for array
for global data, returns a pointer to the
compute's internal data structure for the entity
caller should cast it to (double *) for a scalar or vector
caller should cast it to (double **) for an array
for per-atom or local data, returns a pointer to the
compute's internal data structure for the entity
caller should cast it to (double *) for a vector
caller should cast it to (double **) for an array
returns a void pointer to the compute's internal data structure returns a void pointer to the compute's internal data structure
for the entity which the caller can cast to the proper data type for the entity which the caller can cast to the proper data type
returns a NULL if id is not recognized or style/type not supported returns a NULL if id is not recognized or style/type not supported
IMPORTANT: if the compute is not current it will be invoked IMPORTANT: if the compute is not current it will be invoked
LAMMPS cannot easily check if it is valid to invoke the compute, LAMMPS cannot easily check here if it is valid to invoke the compute,
so caller must ensure that it is OK so caller must ensure that it is OK
------------------------------------------------------------------------- */ ------------------------------------------------------------------------- */
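A hedged usage sketch of the casting rules spelled out above; the compute ID "thermo_temp" (the default temperature compute) is an assumption, and, per the IMPORTANT note, the caller must make sure it is valid to invoke the compute at this point:

    // global scalar: style = 0 (global data), type = 0 (scalar)
    double *temp = (double *) lammps_extract_compute(lmp, (char *) "thermo_temp", 0, 0);
    if (temp) printf("T = %g\n", *temp);
    // per-atom vector: style = 1 (per-atom data), type = 1 (vector)
    // double *peratom = (double *) lammps_extract_compute(lmp, (char *) "my_compute", 1, 1);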
@ -236,7 +245,8 @@ void *lammps_extract_compute(void *ptr, char *id, int style, int type)
which the caller can cast to a (double *) which points to the value which the caller can cast to a (double *) which points to the value
for per-atom or local data, returns a pointer to the for per-atom or local data, returns a pointer to the
fix's internal data structure for the entity fix's internal data structure for the entity
which the caller can cast to the proper data type caller should cast it to (double *) for a vector
caller should cast it to (double **) for an array
returns a NULL if id is not recognized or style/type not supported returns a NULL if id is not recognized or style/type not supported
IMPORTANT: for global data, IMPORTANT: for global data,
this function allocates a double to store the value in, this function allocates a double to store the value in,
@ -244,7 +254,7 @@ void *lammps_extract_compute(void *ptr, char *id, int style, int type)
double *dptr = (double *) lammps_extract_fix(); double *dptr = (double *) lammps_extract_fix();
double value = *dptr; double value = *dptr;
free(dptr); free(dptr);
IMPORTANT: LAMMPS cannot easily check when info extracted from IMPORTANT: LAMMPS cannot easily check here when info extracted from
the fix is valid, so caller must ensure that it is OK the fix is valid, so caller must ensure that it is OK
------------------------------------------------------------------------- */ ------------------------------------------------------------------------- */
@ -300,7 +310,7 @@ void *lammps_extract_fix(void *ptr, char *id, int style, int type,
which the caller can cast to a (double *) which points to the value which the caller can cast to a (double *) which points to the value
for atom-style variable, returns a pointer to the for atom-style variable, returns a pointer to the
vector of per-atom values on each processor, vector of per-atom values on each processor,
which the caller can cast to the proper data type which the caller can cast to a (double *) which points to the values
returns a NULL if name is not recognized or not equal-style or atom-style returns a NULL if name is not recognized or not equal-style or atom-style
IMPORTANT: for both equal-style and atom-style variables, IMPORTANT: for both equal-style and atom-style variables,
this function allocates memory to store the variable data in this function allocates memory to store the variable data in
@ -313,7 +323,7 @@ void *lammps_extract_fix(void *ptr, char *id, int style, int type,
double *vector = (double *) lammps_extract_variable(); double *vector = (double *) lammps_extract_variable();
use the vector values use the vector values
free(vector); free(vector);
IMPORTANT: LAMMPS cannot easily check when it is valid to evaluate IMPORTANT: LAMMPS cannot easily check here when it is valid to evaluate
the variable or any fixes or computes or thermodynamic info it references, the variable or any fixes or computes or thermodynamic info it references,
so caller must ensure that it is OK so caller must ensure that it is OK
------------------------------------------------------------------------- */ ------------------------------------------------------------------------- */
@ -343,7 +353,10 @@ void *lammps_extract_variable(void *ptr, char *name, char *group)
return NULL; return NULL;
} }
/* ---------------------------------------------------------------------- */ /* ----------------------------------------------------------------------
return the total number of atoms in the system
useful before a call to lammps_gather_atoms() so the caller can pre-allocate the data vector
------------------------------------------------------------------------- */
int lammps_get_natoms(void *ptr) int lammps_get_natoms(void *ptr)
{ {
@ -353,9 +366,18 @@ int lammps_get_natoms(void *ptr)
return natoms; return natoms;
} }
/* ---------------------------------------------------------------------- */ /* ----------------------------------------------------------------------
gather the named atom-based entity across all processors
name = desired quantity, e.g. x or charge
type = 0 for integer values, 1 for double values
count = # of per-atom values, e.g. 1 for type or charge, 3 for x or f
return atom-based values in data, ordered by count, then by atom ID
e.g. x[0][0],x[0][1],x[0][2],x[1][0],x[1][1],x[1][2],x[2][0],...
data must be pre-allocated by caller to correct length
------------------------------------------------------------------------- */
void lammps_get_coords(void *ptr, double *coords) void lammps_gather_atoms(void *ptr, char *name,
int type, int count, void *data)
{ {
LAMMPS *lmp = (LAMMPS *) ptr; LAMMPS *lmp = (LAMMPS *) ptr;
@ -365,47 +387,135 @@ void lammps_get_coords(void *ptr, double *coords)
if (lmp->atom->natoms > MAXSMALLINT) return; if (lmp->atom->natoms > MAXSMALLINT) return;
int natoms = static_cast<int> (lmp->atom->natoms); int natoms = static_cast<int> (lmp->atom->natoms);
double *copy = new double[3*natoms];
for (int i = 0; i < 3*natoms; i++) copy[i] = 0.0;
double **x = lmp->atom->x; int i,j,offset;
void *vptr = lmp->atom->extract(name);
// copy = Natom length vector of per-atom values
// use atom ID to insert each atom's values into copy
// MPI_Allreduce with MPI_SUM to merge into data, ordered by atom ID
if (type == 0) {
int *vector = NULL;
int **array = NULL;
if (count == 1) vector = (int *) vptr;
else array = (int **) vptr;
int *copy;
lmp->memory->create(copy,count*natoms,"lib/gather:copy");
for (i = 0; i < count*natoms; i++) copy[i] = 0;
int *tag = lmp->atom->tag; int *tag = lmp->atom->tag;
int nlocal = lmp->atom->nlocal; int nlocal = lmp->atom->nlocal;
int id,offset; if (count == 1)
for (int i = 0; i < nlocal; i++) { for (i = 0; i < nlocal; i++)
id = tag[i]; copy[tag[i]-1] = vector[i];
offset = 3*(id-1); else
copy[offset+0] = x[i][0]; for (i = 0; i < nlocal; i++) {
copy[offset+1] = x[i][1]; offset = count*(tag[i]-1);
copy[offset+2] = x[i][2]; for (j = 0; j < count; j++)
        copy[offset++] = array[i][j];
} }
MPI_Allreduce(copy,coords,3*natoms,MPI_DOUBLE,MPI_SUM,lmp->world); MPI_Allreduce(copy,data,count*natoms,MPI_INT,MPI_SUM,lmp->world);
delete [] copy; lmp->memory->destroy(copy);
} else {
double *vector = NULL;
double **array = NULL;
if (count == 1) vector = (double *) vptr;
else array = (double **) vptr;
double *copy;
lmp->memory->create(copy,count*natoms,"lib/gather:copy");
for (i = 0; i < count*natoms; i++) copy[i] = 0.0;
int *tag = lmp->atom->tag;
int nlocal = lmp->atom->nlocal;
if (count == 1) {
for (i = 0; i < nlocal; i++)
copy[tag[i]-1] = vector[i];
} else {
for (i = 0; i < nlocal; i++) {
offset = count*(tag[i]-1);
for (j = 0; j < count; j++)
copy[offset++] = array[i][j];
}
}
MPI_Allreduce(copy,data,count*natoms,MPI_DOUBLE,MPI_SUM,lmp->world);
lmp->memory->destroy(copy);
}
} }
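A short, hedged example of the new call, which replaces the old lammps_get_coords(); the buffer is sized with lammps_get_natoms() as the comment block above suggests, and the LAMMPS handle lmp is assumed to exist already:

    // gather coordinates of all atoms, ordered by atom ID:
    // name = "x", type = 1 (double values), count = 3 values per atom
    int natoms = lammps_get_natoms(lmp);
    double *x = (double *) malloc(3 * natoms * sizeof(double));
    lammps_gather_atoms(lmp, (char *) "x", 1, 3, x);
    // x[3*(id-1)+0 .. 3*(id-1)+2] now hold the coords of the atom with tag id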
/* ---------------------------------------------------------------------- */ /* ----------------------------------------------------------------------
scatter the named atom-based entity across all processors
name = desired quantity, e.g. x or charge
type = 0 for integer values, 1 for double values
count = # of per-atom values, e.g. 1 for type or charge, 3 for x or f
data = atom-based values in data, ordered by count, then by atom ID
e.g. x[0][0],x[0][1],x[0][2],x[1][0],x[1][1],x[1][2],x[2][0],...
------------------------------------------------------------------------- */
void lammps_put_coords(void *ptr, double *coords) void lammps_scatter_atoms(void *ptr, char *name,
int type, int count, void *data)
{ {
LAMMPS *lmp = (LAMMPS *) ptr; LAMMPS *lmp = (LAMMPS *) ptr;
// error if no map defined by LAMMPS // error if tags are not defined or not consecutive
if (lmp->atom->map_style == 0) return; if (lmp->atom->tag_enable == 0 || lmp->atom->tag_consecutive() == 0) return;
if (lmp->atom->natoms > MAXSMALLINT) return; if (lmp->atom->natoms > MAXSMALLINT) return;
int natoms = static_cast<int> (lmp->atom->natoms); int natoms = static_cast<int> (lmp->atom->natoms);
double **x = lmp->atom->x;
int m,offset; int i,j,m,offset;
for (int i = 0; i < natoms; i++) { void *vptr = lmp->atom->extract(name);
  // data = count*Natom values, ordered by atom ID
  // use atom->map() on each ID to find the local index, if this proc owns the atom
  // copy that atom's values from data into the local per-atom arrays
if (type == 0) {
int *vector = NULL;
int **array = NULL;
if (count == 1) vector = (int *) vptr;
else array = (int **) vptr;
int *dptr = (int *) data;
if (count == 1)
for (i = 0; i < natoms; i++)
if ((m = lmp->atom->map(i+1)) >= 0)
vector[m] = dptr[i];
else
for (i = 0; i < natoms; i++)
if ((m = lmp->atom->map(i+1)) >= 0) { if ((m = lmp->atom->map(i+1)) >= 0) {
offset = 3*i; offset = count*i;
x[m][0] = coords[offset+0]; for (j = 0; j < count; j++)
x[m][1] = coords[offset+1]; array[m][j] = dptr[offset++];
x[m][2] = coords[offset+2]; }
} else {
double *vector = NULL;
double **array = NULL;
if (count == 1) vector = (double *) vptr;
else array = (double **) vptr;
double *dptr = (double *) data;
if (count == 1) {
for (i = 0; i < natoms; i++)
if ((m = lmp->atom->map(i+1)) >= 0)
vector[m] = dptr[i];
} else {
for (i = 0; i < natoms; i++) {
if ((m = lmp->atom->map(i+1)) >= 0) {
offset = count*i;
for (j = 0; j < count; j++)
array[m][j] = dptr[offset++];
}
}
} }
} }
} }
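Continuing the sketch above, lammps_scatter_atoms() pushes modified values back into the per-atom arrays; as the checks at the top of the function show, atom IDs must be defined and consecutive, otherwise the call silently returns:

    // shift every atom by 0.1 in x and write the coordinates back
    for (int i = 0; i < natoms; i++) x[3*i] += 0.1;
    lammps_scatter_atoms(lmp, (char *) "x", 1, 3, x);
    free(x);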
View File
@ -38,12 +38,13 @@ void *lammps_extract_fix(void *, char *, int, int, int, int);
void *lammps_extract_variable(void *, char *, char *); void *lammps_extract_variable(void *, char *, char *);
int lammps_get_natoms(void *); int lammps_get_natoms(void *);
void lammps_get_coords(void *, double *); void lammps_gather_atoms(void *, char *, int, int, void *);
void lammps_put_coords(void *, double *); void lammps_scatter_atoms(void *, char *, int, int, void *);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif
/* ERROR/WARNING messages: /* ERROR/WARNING messages:
*/ */
View File
@ -1259,14 +1259,16 @@ int Neighbor::check_distance()
/* ---------------------------------------------------------------------- /* ----------------------------------------------------------------------
build all perpetual neighbor lists every few timesteps build all perpetual neighbor lists every few timesteps
pairwise & topology lists are created as needed pairwise & topology lists are created as needed
topology lists only built if topoflag = 1
------------------------------------------------------------------------- */ ------------------------------------------------------------------------- */
void Neighbor::build() void Neighbor::build(int topoflag)
{ {
int i; int i;
ago = 0; ago = 0;
ncalls++; ncalls++;
lastcall = update->ntimestep;
// store current atom positions and box size if needed // store current atom positions and box size if needed
@ -1336,12 +1338,20 @@ void Neighbor::build()
for (i = 0; i < nblist; i++) for (i = 0; i < nblist; i++)
(this->*pair_build[blist[i]])(lists[blist[i]]); (this->*pair_build[blist[i]])(lists[blist[i]]);
if (atom->molecular) { if (atom->molecular && topoflag) build_topology();
}
/* ----------------------------------------------------------------------
build all topology neighbor lists every few timesteps
normally built with pair lists, but USER-CUDA separates them
------------------------------------------------------------------------- */
void Neighbor::build_topology()
{
if (force->bond) (this->*bond_build)(); if (force->bond) (this->*bond_build)();
if (force->angle) (this->*angle_build)(); if (force->angle) (this->*angle_build)();
if (force->dihedral) (this->*dihedral_build)(); if (force->dihedral) (this->*dihedral_build)();
if (force->improper) (this->*improper_build)(); if (force->improper) (this->*improper_build)();
}
} }
/* ---------------------------------------------------------------------- /* ----------------------------------------------------------------------
View File
@ -38,8 +38,9 @@ class Neighbor : protected Pointers {
double cutneighmax; // max neighbor cutoff for all type pairs double cutneighmax; // max neighbor cutoff for all type pairs
double *cuttype; // for each type, max neigh cut w/ others double *cuttype; // for each type, max neigh cut w/ others
int ncalls; // # of times build has been called bigint ncalls; // # of times build has been called
int ndanger; // # of dangerous builds bigint ndanger; // # of dangerous builds
bigint lastcall; // timestep of last neighbor::build() call
int nrequest; // requests for pairwise neighbor lists int nrequest; // requests for pairwise neighbor lists
class NeighRequest **requests; // from Pair, Fix, Compute, Command classes class NeighRequest **requests; // from Pair, Fix, Compute, Command classes
@ -70,7 +71,8 @@ class Neighbor : protected Pointers {
int decide(); // decide whether to build or not int decide(); // decide whether to build or not
virtual int check_distance(); // check max distance moved since last build virtual int check_distance(); // check max distance moved since last build
void setup_bins(); // setup bins based on box and cutoff void setup_bins(); // setup bins based on box and cutoff
virtual void build(); // create all neighbor lists (pair,bond) virtual void build(int topoflag=1); // create all neighbor lists (pair,bond)
virtual void build_topology(); // create all topology neighbor lists
void build_one(int); // create a single neighbor list void build_one(int); // create a single neighbor list
void set(int, char **); // set neighbor style and skin distance void set(int, char **); // set neighbor style and skin distance
void modify_params(int, char**); // modify parameters that control builds void modify_params(int, char**); // modify parameters that control builds
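Together with the Neighbor::build()/build_topology() split in neighbor.cpp, these declarations let a run style build the pairwise lists first and defer the topology lists, which is exactly what VerletCuda::run() above does. A condensed sketch of that call pattern (not a complete run loop):

    neighbor->build(0);                  // pairwise lists only; records lastcall
    // ... compute pair forces, possibly overlapped with GPU work ...
    if (neighbor->lastcall == update->ntimestep) {
      neighbor->build_topology();        // bond/angle/dihedral/improper lists
      timer->stamp(TIME_NEIGHBOR);
    }
    if (force->bond) force->bond->compute(eflag, vflag);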
View File
@ -1 +1 @@
#define LAMMPS_VERSION "16 Aug 2012" #define LAMMPS_VERSION "21 Aug 2012"