git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12592 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2014-10-07 15:08:33 +00:00
parent fcad656d92
commit 16dcead2d2
16 changed files with 1085 additions and 583 deletions
--- a/doc/Section_accelerate.html
+++ b/doc/Section_accelerate.html
@ -165,8 +165,8 @@ coprocessors.
 </P>
 <P>All of these commands are in packages provided with LAMMPS.  An
 overview of packages is given in <A HREF = "Section_packages.html">Section
-packages</A>.  Currently, there are 6 accelerator
-packages in LAMMPS, either as standard or user packages:
+packages</A>.  These are the accelerator packages
+currently in LAMMPS, either as standard or user packages:
 </P>
 <DIV ALIGN=center><TABLE  BORDER=1 >
 <TR><TD ><A HREF = "accelerate_cuda.html">USER-CUDA</A> </TD><TD > for NVIDIA GPUs</TD></TR>
@ -201,8 +201,8 @@ for that style.
 </P>
 <P>To use an accelerator package in LAMMPS, and one or more of the styles
 it provides, follow these general steps.  Details vary from package to
-package and are explained in the individual accelerator sub-section
-doc pages, listed above:
+package and are explained in the individual accelerator doc pages,
+listed above:
 </P>
 <DIV ALIGN=center><TABLE  BORDER=1 >
 <TR><TD >build the accelerator library </TD><TD >  only for USER-CUDA and GPU packages </TD></TR>
@ -215,26 +215,46 @@ doc pages, listed above:
 <TR><TD >use accelerated styles in your input script </TD><TD >  via "-sf" <A HREF = "Section_start.html#start_7">command-line switch</A> or  <A HREF = "suffix.html">suffix</A> command 
 </TD></TR></TABLE></DIV>

-<P>The first 4 steps typically only need to be done once, to create an
-executable that uses one or more accelerator packages.  We are working
-to create a "make" tool that will perform all these 4 steps in a
-single command.
+<P>The first 4 steps can be done as a single command, using the
+src/Make.py tool.  The Make.py tool is discussed in <A HREF = "Section_start.html#start_4">Section
+2.4</A> of the manual, and its use is
+illustrated in the individual accelerator sections.  Typically these
+steps only need to be done once, to create an executable that uses one
+or more accelerator packages.
 </P>
 <P>The last 4 steps can all be done from the command-line when LAMMPS is
-launched, without changing your input script.  Or you can add
+launched, without changing your input script, as illustrated in the
+individual accelerator sections.  Or you can add
 <A HREF = "package.html">package</A> and <A HREF = "suffix.html">suffix</A> commands to your input
 script.
 </P>
-<P>The examples directory has several sub-directories with scripts and
-README files for how to use the following accelerator packages:
+<P>IMPORTANT NOTE: With a few exceptions, you can build a single LAMMPS
+executable with all its accelerator packages installed.  Note that the
+USER-INTEL and KOKKOS packages require you to choose one of their
+options when building: CPU or Phi for USER-INTEL, and OpenMP, Cuda,
+or Phi for KOKKOS.  Here are the exceptions; you cannot build a single
+executable with:
 </P>
-<UL><LI>examples/cuda for USER-CUDA package
-<LI>examples/gpu for GPU package
-<LI>examples/intel for USER-INTEL package
-<LI>examples/kokkos for KOKKOS package 
+<UL><LI>both the USER-INTEL Phi and KOKKOS Phi options
+<LI>the USER-INTEL Phi or KOKKOS Phi option, and either the USER-CUDA or GPU package 
 </UL>
-<P>Likewise, the bench directory has FERMI and KEPLER sub-directories
-with scripts and README files for using all the accelerator packages.
+<P>See the examples/accelerate/README and make.list files for sample
+Make.py commands that build LAMMPS with any or all of the accelerator
+packages.  As an example, here is a command that builds with all the
+GPU related packages installed (USER-CUDA, GPU, KOKKOS with Cuda),
+including settings to build the needed auxiliary USER-CUDA and GPU
+libraries for Kepler GPUs:
+</P>
+<PRE>Make.py -j 16 -p omp gpu cuda kokkos -cc nvcc wrap=mpi   -cuda mode=double arch=35 -gpu mode=double arch=35 \  -kokkos cuda arch=35 lib-all file mpi 
+</PRE>
+<P>The examples/accelerate directory also has input scripts that can be
+used with all of the accelerator packages.  See its README file for
+details.
+</P>
+<P>Likewise, the bench directory has FERMI, KEPLER, and PHI
+sub-directories with Make.py commands and input scripts for using all
+the accelerator packages on various machines.  See the README files in
+those dirs.
 </P>
 <P>As mentioned above, the <A HREF = "http://lammps.sandia.gov/bench.html">Benchmark
 page</A> of the LAMMPS web site gives
@ -243,23 +263,25 @@ of the standard LAMMPS benchmark problems, as a function of problem
 size and number of compute nodes, on different hardware platforms.
 </P>
 <P>Here is a brief summary of what the various packages provide.  Details
-are in the individual package sub-sections listed above.
+are in the individual accelerator sections.
 </P>
 <UL><LI>Styles with a "cuda" or "gpu" suffix are part of the USER-CUDA or GPU
 packages, and can be run on NVIDIA GPUs.  The speed-up on a GPU
-depends on a variety of factors, as discussed below. 
+depends on a variety of factors, discussed in the accelerator
+sections. 

 <LI>Styles with an "intel" suffix are part of the USER-INTEL
 package. These styles support vectorized single and mixed precision
 calculations, in addition to full double precision.  In extreme cases,
 this can provide speedups over 3.5x on CPUs.  The package also
-supports acceleration with offload to Intel(R) Xeon Phi(TM)
+supports acceleration in "offload" mode to Intel(R) Xeon Phi(TM)
 coprocessors.  This can result in additional speedup over 2x depending
 on the hardware configuration. 

 <LI>Styles with a "kk" suffix are part of the KOKKOS package, and can be
-run using OpenMP, on an NVIDIA GPU, or on an Intel Xeon Phi.  The
-speed-up depends on a variety of factors, as discussed below. 
+run using OpenMP on multicore CPUs, on an NVIDIA GPU, or on an Intel
+Xeon Phi in "native" mode.  The speed-up depends on a variety of
+factors, as discussed on the KOKKOS accelerator page. 

 <LI>Styles with an "omp" suffix are part of the USER-OMP package and allow
 a pair-style to be run in multi-threaded mode using OpenMP.  This can
@ -272,7 +294,7 @@ overload the available bandwidth for communication.
 speed-up the pairwise calculations of your simulation by 5-25% on a
 CPU. 
 </UL>
-<P>The individual accelerator package sub-sections explain:
+<P>The individual accelerator package doc pages explain:
 </P>
 <UL><LI>what hardware and software the accelerated package requires
 <LI>how to build LAMMPS with the accelerated package
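
The IMPORTANT NOTE added in this file lists which accelerator options cannot share one executable. As a rough illustration only, the sketch below encodes those two exceptions plus the one-option-per-package requirement. The option labels (such as "kokkos:phi") are hypothetical names invented for this example; they are not Make.py or LAMMPS syntax, and the function is not part of LAMMPS.

```python
# Sketch of the build-compatibility rules stated in the IMPORTANT NOTE.
# The labels below are hypothetical; they are not LAMMPS or Make.py syntax.

PHI_OPTIONS = {"user-intel:phi", "kokkos:phi"}
NVIDIA_PACKAGES = {"user-cuda", "gpu"}
MULTI_OPTION_PACKAGES = ("user-intel", "kokkos")


def can_build_together(selected):
    """Return (ok, reason) for a collection of hypothetical option labels."""
    selected = set(selected)

    # USER-INTEL and KOKKOS each require choosing one of their options.
    for pkg in MULTI_OPTION_PACKAGES:
        chosen = [s for s in selected if s.startswith(pkg + ":")]
        if len(chosen) > 1:
            return False, pkg + " allows only one option per build"

    phi = selected & PHI_OPTIONS
    # Exception 1: both Phi options in one executable.
    if len(phi) == 2:
        return False, "USER-INTEL Phi and KOKKOS Phi cannot be combined"
    # Exception 2: either Phi option together with USER-CUDA or GPU.
    if phi and selected & NVIDIA_PACKAGES:
        return False, "a Phi option cannot be combined with USER-CUDA or GPU"

    return True, "no documented conflict"


if __name__ == "__main__":
    # The all-GPU build from the Make.py example in the text.
    print(can_build_together({"user-omp", "gpu", "user-cuda", "kokkos:cuda"}))
    print(can_build_together({"kokkos:phi", "gpu"}))
```

The check only covers the conflicts named in the text; a real build can still fail for other reasons, such as a missing compiler or library.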
--- a/doc/Section_accelerate.txt
+++ b/doc/Section_accelerate.txt
@ -152,8 +152,8 @@ coprocessors.

 All of these commands are in packages provided with LAMMPS.  An
 overview of packages is given in "Section
-packages"_Section_packages.html.  Currently, there are 6 accelerator
-packages in LAMMPS, either as standard or user packages:
+packages"_Section_packages.html.  These are the accelerator packages
+currently in LAMMPS, either as standard or user packages:

 "USER-CUDA"_accelerate_cuda.html : for NVIDIA GPUs
 "GPU"_accelerate_gpu.html : for NVIDIA GPUs as well as OpenCL support
@ -186,8 +186,8 @@ for that style.

 To use an accelerator package in LAMMPS, and one or more of the styles
 it provides, follow these general steps.  Details vary from package to
-package and are explained in the individual accelerator sub-section
-doc pages, listed above:
+package and are explained in the individual accelerator doc pages,
+listed above:

 build the accelerator library |
  only for USER-CUDA and GPU packages |
@ -211,26 +211,48 @@ use accelerated styles in your input script |
  via "-sf" "command-line switch"_Section_start.html#start_7 or
  "suffix"_suffix.html command :tb(c=2,s=|)

-The first 4 steps typically only need to be done once, to create an
-executable that uses one or more accelerator packages.  We are working
-to create a "make" tool that will perform all these 4 steps in a
-single command.
+The first 4 steps can be done as a single command, using the
+src/Make.py tool.  The Make.py tool is discussed in "Section
+2.4"_Section_start.html#start_4 of the manual, and its use is
+illustrated in the individual accelerator sections.  Typically these
+steps only need to be done once, to create an executable that uses one
+or more accelerator packages.

 The last 4 steps can all be done from the command-line when LAMMPS is
-launched, without changing your input script.  Or you can add
+launched, without changing your input script, as illustrated in the
+individual accelerator sections.  Or you can add
 "package"_package.html and "suffix"_suffix.html commands to your input
 script.

-The examples directory has several sub-directories with scripts and
-README files for how to use the following accelerator packages:
+IMPORTANT NOTE: With a few exceptions, you can build a single LAMMPS
+executable with all its accelerator packages installed.  Note that the
+USER-INTEL and KOKKOS packages require you to choose one of their
+options when building: CPU or Phi for USER-INTEL, and OpenMP, Cuda,
+or Phi for KOKKOS.  Here are the exceptions; you cannot build a single
+executable with:

-examples/cuda for USER-CUDA package
-examples/gpu for GPU package
-examples/intel for USER-INTEL package
-examples/kokkos for KOKKOS package :ul
+both the USER-INTEL Phi and KOKKOS Phi options
+the USER-INTEL Phi or KOKKOS Phi option, and either the USER-CUDA or GPU package :ul

-Likewise, the bench directory has FERMI and KEPLER sub-directories
-with scripts and README files for using all the accelerator packages.
+See the examples/accelerate/README and make.list files for sample
+Make.py commands that build LAMMPS with any or all of the accelerator
+packages.  As an example, here is a command that builds with all the
+GPU related packages installed (USER-CUDA, GPU, KOKKOS with Cuda),
+including settings to build the needed auxiliary USER-CUDA and GPU
+libraries for Kepler GPUs:
+
+Make.py -j 16 -p omp gpu cuda kokkos -cc nvcc wrap=mpi \
+  -cuda mode=double arch=35 -gpu mode=double arch=35 \\
+  -kokkos cuda arch=35 lib-all file mpi :pre
+
+The examples/accelerate directory also has input scripts that can be
+used with all of the accelerator packages.  See its README file for
+details.
+
+Likewise, the bench directory has FERMI, KEPLER, and PHI
+sub-directories with Make.py commands and input scripts for using all
+the accelerator packages on various machines.  See the README files in
+those dirs.

 As mentioned above, the "Benchmark
 page"_http://lammps.sandia.gov/bench.html of the LAMMPS web site gives
@ -239,23 +261,25 @@ of the standard LAMMPS benchmark problems, as a function of problem
 size and number of compute nodes, on different hardware platforms.

 Here is a brief summary of what the various packages provide.  Details
-are in the individual package sub-sections listed above.
+are in the individual accelerator sections.

 Styles with a "cuda" or "gpu" suffix are part of the USER-CUDA or GPU
 packages, and can be run on NVIDIA GPUs.  The speed-up on a GPU
-depends on a variety of factors, as discussed below. :ulb,l
+depends on a variety of factors, discussed in the accelerator
+sections. :ulb,l

 Styles with an "intel" suffix are part of the USER-INTEL
 package. These styles support vectorized single and mixed precision
 calculations, in addition to full double precision.  In extreme cases,
 this can provide speedups over 3.5x on CPUs.  The package also
-supports acceleration with offload to Intel(R) Xeon Phi(TM)
+supports acceleration in "offload" mode to Intel(R) Xeon Phi(TM)
 coprocessors.  This can result in additional speedup over 2x depending
 on the hardware configuration. :l

 Styles with a "kk" suffix are part of the KOKKOS package, and can be
-run using OpenMP, on an NVIDIA GPU, or on an Intel Xeon Phi.  The
-speed-up depends on a variety of factors, as discussed below. :l
+run using OpenMP on multicore CPUs, on an NVIDIA GPU, or on an Intel
+Xeon Phi in "native" mode.  The speed-up depends on a variety of
+factors, as discussed on the KOKKOS accelerator page. :l

 Styles with an "omp" suffix are part of the USER-OMP package and allow
 a pair-style to be run in multi-threaded mode using OpenMP.  This can
@ -268,7 +292,7 @@ Styles with an "opt" suffix are part of the OPT package and typically
 speed-up the pairwise calculations of your simulation by 5-25% on a
 CPU. :l,ule

-The individual accelerator package sub-sections explain:
+The individual accelerator package doc pages explain:

 what hardware and software the accelerated package requires
 how to build LAMMPS with the accelerated package
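
The package summary in this file ties each style suffix to one accelerator package. The sketch below restates that mapping as a small lookup. It assumes the suffix is the last slash-separated component of a style name (as in lj/cut/gpu); the helper name package_for_style is hypothetical and is not a LAMMPS function.

```python
# Suffix-to-package mapping, taken from the summary list in the text.
SUFFIX_TO_PACKAGE = {
    "cuda": "USER-CUDA",
    "gpu": "GPU",
    "intel": "USER-INTEL",
    "kk": "KOKKOS",
    "omp": "USER-OMP",
    "opt": "OPT",
}


def package_for_style(style):
    """Return the accelerator package implied by a style's suffix, or None.

    Assumes the suffix is the final slash-separated component of the
    style name; a name without a slash is treated as having no suffix.
    """
    _base, sep, suffix = style.rpartition("/")
    if sep and suffix in SUFFIX_TO_PACKAGE:
        return SUFFIX_TO_PACKAGE[suffix]
    return None


if __name__ == "__main__":
    for name in ("lj/cut/gpu", "lj/cut/kk", "lj/cut"):
        print(name, "->", package_for_style(name))
```

A lookup like this can help when reading an input script to see which packages the executable must include.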
--- a/doc/Section_start.html
+++ b/doc/Section_start.html
@ -85,14 +85,27 @@ launch a LAMMPS Windows executable on a Windows box.

 <A NAME = "start_2_1"></A><B><I>Read this first:</I></B> 

-<P>If you want to avoid building LAMMPS, read the preceeding section
-about options available for downloading and installing executables.
-Details are discussed on the <A HREF = "download">download</A> page.
+<P>If you want to avoid building LAMMPS yourself, read the preceding
+section about options available for downloading and installing
+executables.  Details are discussed on the <A HREF = "download">download</A> page.
 </P>
-<P>Building LAMMPS can be simple or not-so-simple.  If MPI is already
-installed on your machine (or you just want to run LAMMPS in serial)
-and you can use one of the provided machine Makefiles and the build
-works on your platform, then it's simple.
+<P>Building LAMMPS can be simple or not-so-simple.  If all you need are
+the default packages installed in LAMMPS, and MPI is already installed
+on your machine, or you just want to run LAMMPS in serial, then you
+can typically use the Makefile.mpi or Makefile.serial files in
+src/MAKE and type one of these lines (from the src dir):
+</P>
+<PRE>make mpi
+make serial 
+</PRE>
+<P>Or if one of the other Makefile.machine files in the src/MAKE
+sub-directories matches your system (type "make" to see a list), you
+can use it as-is by typing (for example):
+</P>
+<PRE>make stampede 
+</PRE>
+<P>If any of these builds with an existing Makefile.machine works on your
+system, then you're done!
 </P>
 <P>If you want to do one of these:
 </P>
@ -105,7 +118,14 @@ works on your platform, then it's simple.
 auxiliary libraries exist on your machine or install them if they
 don't.  You may need to build additional libraries that are part of
 the LAMMPS package, before building LAMMPS.  You may need to edit a
-machine Makefile to make it compatible with your system.
+Makefile.machine file to make it compatible with your system.
+</P>
+<P>Note that there is a Make.py tool in the src directory that automates
+several of these steps, but you still have to know what you are doing.
+<A HREF = "#start_4">Section 2.4</A> below describes the tool.  It is a convenient
+way to work with installing/un-installing various packages, the
+Makefile.machine changes required by some packages, and the auxiliary
+libraries some of them use.
 </P>
 <P>Please read the following sections carefully.  If you are not
 comfortable with makefiles, or building codes on a Unix platform, or
@ -121,7 +141,7 @@ please post the issue to the <A HREF = "http://lammps.sandia.gov/mail.html">LAMM
 list</A>.
 </P>
 <P>If you succeed in building LAMMPS on a new kind of machine, for which
-there isn't a similar machine Makefile included in the src/MAKE
+there isn't a similar machine Makefile included in the src/MAKE/MORE
 directory, then send it to the developers and we can include it in the
 LAMMPS distribution.
 </P>
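
The Section_start.html changes state that a Makefile.machine may live in src/MAKE or any of its sub-directories, and that duplicates are resolved in the order src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES. The sketch below mimics that lookup order so the precedence is easy to see. It is an illustration only, not the logic of the LAMMPS top-level Makefile, and find_machine_makefile is a hypothetical helper name.

```python
from pathlib import Path

# Search order quoted from the text; earlier entries take precedence.
SEARCH_ORDER = ("MAKE/MINE", "MAKE", "MAKE/OPTIONS", "MAKE/MACHINES")


def find_machine_makefile(src_dir, machine):
    """Return the first Makefile.<machine> found in the documented order.

    Returns None if no matching file exists in any of the directories.
    """
    for sub in SEARCH_ORDER:
        candidate = Path(src_dir) / sub / ("Makefile." + machine)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Run from the LAMMPS src directory to see which file "make mpi" would
    # correspond to under the documented order.
    print(find_machine_makefile(".", "mpi"))
```

Because src/MAKE/MINE is checked first, an edited copy placed there shadows a same-named file shipped with LAMMPS, which matches the stated intent of that directory.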
@ -133,21 +153,43 @@ LAMMPS distribution.
 </P>
 <P>The src directory contains the C++ source and header files for LAMMPS.
 It also contains a top-level Makefile and a MAKE sub-directory with
-low-level Makefile.* files for many machines.  From within the src
-directory, type "make" or "gmake".  You should see a list of available
-choices.  If one of those is the machine and options you want, you can
-type a command like:
+low-level Makefile.* files for many systems and machines.  See the
+src/MAKE/README file for a quick overview of what files are available
+and what sub-directories they are in.
 </P>
-<PRE>make linux
+<P>The src/MAKE dir has a few files that should work as-is on many
+platforms.  The src/MAKE/OPTIONS dir has more that invoke additional
+compiler, MPI, and other setting options commonly used by LAMMPS, to
+illustrate their syntax.  The src/MAKE/MACHINES dir has many more that
+have been tweaked or optimized for specific machines.  These files are
+all good starting points if you find you need to change them for your
+machine.  Put any file you edit into the src/MAKE/MINE directory and
+it will never be touched by any LAMMPS updates.
+</P>
+<P>From within the src directory, type "make" or "gmake".  You should see
+a list of available choices from src/MAKE and all of its
+sub-directories.  If one of those has the options you want or is the
+machine you want, you can type a command like:
+</P>
+<PRE>make mpi
+or
+make serial_icc
 or
 gmake mac 
 </PRE>
+<P>Note that the corresponding Makefile.machine can exist in src/MAKE or
+any of its sub-directories.  If a file with the same name appears in
+multiple places (not a good idea), the order in which they are used is as
+follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES.
+This gives preference to a file you have created/edited and put in
+src/MAKE/MINE.
+</P>
 <P>Note that on a multi-processor or multi-core platform you can launch a
 parallel make, by using the "-j" switch with the make command, which
 will build LAMMPS more quickly.
 </P>
-<P>If you get no errors and an executable like lmp_linux or lmp_mac is
-produced, you're done; it's your lucky day.
+<P>If you get no errors and an executable like lmp_mpi or lmp_g++_serial
+or lmp_mac is produced, then you're done; it's your lucky day.
 </P>
 <P>Note that by default only a few of LAMMPS optional packages are
 installed.  To build LAMMPS with optional packages, see <A HREF = "#start_3">this
@ -157,43 +199,47 @@ section</A> below.
 </P>
 <P>If Step 0 did not work, you will need to create a low-level Makefile
 for your machine, like Makefile.foo.  You should make a copy of an
-existing src/MAKE/Makefile.* as a starting point.  The only portions
-of the file you need to edit are the first line, the "compiler/linker
-settings" section, and the "LAMMPS-specific settings" section.
+existing Makefile.* in src/MAKE or one of its sub-directories as a
+starting point.  The only portions of the file you need to edit are
+the first line, the "compiler/linker settings" section, and the
+"LAMMPS-specific settings" section.  When it works, put the edited
+file in src/MAKE/MINE and it will not be altered by any future LAMMPS
+updates.
 </P>
 <P><B>Step 2</B>
 </P>
-<P>Change the first line of src/MAKE/Makefile.foo to list the word "foo"
-after the "#", and whatever other options it will set.  This is the
-line you will see if you just type "make".
+<P>Change the first line of Makefile.foo to list the word "foo" after the
+"#", and whatever other options it will set.  This is the line you
+will see if you just type "make".
 </P>
 <P><B>Step 3</B>
 </P>
 <P>The "compiler/linker settings" section lists compiler and linker
 settings for your C++ compiler, including optimization flags.  You can
 use g++, the open-source GNU compiler, which is available on all Unix
-systems.  You can also use mpicc which will typically be available if
+systems.  You can also use mpicxx which will typically be available if
 MPI is installed on your system, though you should check which actual
 compiler it wraps.  Vendor compilers often produce faster code.  On
-boxes with Intel CPUs, we suggest using the commercial Intel icc
-compiler, which can be downloaded from <A HREF = "http://www.intel.com/software/products/noncom">Intel's compiler site</A>.
+boxes with Intel CPUs, we suggest using the Intel icc compiler, which
+can be downloaded from <A HREF = "http://www.intel.com/software/products/noncom">Intel's compiler site</A>.
 </P>


 <P>If building a C++ code on your machine requires additional libraries,
-then you should list them as part of the LIB variable.
+then you should list them as part of the LIB variable.  You should
+not need to do this if you use mpicxx.
 </P>
 <P>The DEPFLAGS setting is what triggers the C++ compiler to create a
 dependency list for a source file.  This speeds re-compilation when
 source (*.cpp) or header (*.h) files are edited.  Some compilers do
 not support dependency file creation, or may use a different switch
-than -D.  GNU g++ works with -D.  If your compiler can't create
-dependency files, then you'll need to create a Makefile.foo patterned
-after Makefile.storm, which uses different rules that do not involve
-dependency files.  Note that when you build LAMMPS for the first time
-on a new platform, a long list of *.d files will be printed out
-rapidly.  This is not an error; it is the Makefile doing its normal
-creation of dependencies.
+than -D.  GNU g++ and Intel icc work with -D.  If your compiler can't
+create dependency files, then you'll need to create a Makefile.foo
+patterned after Makefile.storm, which uses different rules that do not
+involve dependency files.  Note that when you build LAMMPS for the
+first time on a new platform, a long list of *.d files will be printed
+out rapidly.  This is not an error; it is the Makefile doing its
+normal creation of dependencies.
 </P>
 <P><B>Step 4</B>
 </P>
@ -277,20 +323,21 @@ Step 6 below for info about building LAMMPS with an FFT library.
 <P><B>Step 5</B>
 </P>
 <P>The 3 MPI variables are used to specify an MPI library to build LAMMPS
-with. 
+with.  Note that you do not need to set these if you use the MPI
+compiler mpicxx for your CC and LINK setting in the section above.
+The MPI wrapper knows where to find the needed files.
 </P>
 <P>If you want LAMMPS to run in parallel, you must have an MPI library
-installed on your platform.  If you use an MPI-wrapped compiler, such
-as "mpicc" to build LAMMPS, you should be able to leave these 3
-variables blank; the MPI wrapper knows where to find the needed files.
-If not, and MPI is installed on your system in the usual place (under
-/usr/local), you also may not need to specify these 3 variables.  On
-some large parallel machines which use "modules" for their
-compile/link environements, you may simply need to include the correct
-module in your build environment.  Or the parallel machine may have a
-vendor-provided MPI which the compiler has no trouble finding.
+installed on your platform.  If MPI is installed on your system in the
+usual place (under /usr/local), you also may not need to specify these
+3 variables, assuming /usr/local is in your path.  On some large
+parallel machines which use "modules" for their compile/link
+environments, you may simply need to include the correct module in
+your build environment, before building LAMMPS.  Or the parallel
+machine may have a vendor-provided MPI which the compiler has no
+trouble finding.
 </P>
-<P>Failing this, with these 3 variables you can specify where the mpi.h
+<P>Failing this, these 3 variables can be used to specify where the mpi.h
 file (MPI_INC) and the MPI library file (MPI_PATH) are found and the
 name of the library file (MPI_LIB).
 </P>
@ -310,20 +357,22 @@ arise when linking LAMMPS to the MPI library.
 </P>
 <P>If you just want to run LAMMPS on a single processor, you can use the
 dummy MPI library provided in src/STUBS, since you don't need a true
-MPI library installed on your system.  See the
-src/MAKE/Makefile.serial file for how to specify the 3 MPI variables
-in this case.  You will also need to build the STUBS library for your
-platform before making LAMMPS itself.  To build from the src
-directory, type "make stubs", or from the STUBS dir, type "make".
-This should create a libmpi_stubs.a file suitable for linking to
-LAMMPS.  If the build fails, you will need to edit the STUBS/Makefile
-for your platform.
+MPI library installed on your system.  See src/MAKE/Makefile.serial
+for how to specify the 3 MPI variables in this case.  You will also
+need to build the STUBS library for your platform before making LAMMPS
+itself.  Note that if you are building with src/MAKE/Makefile.serial,
+e.g. by typing "make serial", then the STUBS library is built for you.
 </P>
-<P>The file STUBS/mpi.c provides a CPU timer function called
-MPI_Wtime() that calls gettimeofday() .  If your system doesn't
-support gettimeofday() , you'll need to insert code to call another
-timer.  Note that the ANSI-standard function clock() rolls over after
-an hour or so, and is therefore insufficient for timing long LAMMPS
+<P>To build the STUBS library from the src directory, type "make stubs",
+or from the src/STUBS dir, type "make".  This should create a
+libmpi_stubs.a file suitable for linking to LAMMPS.  If the build
+fails, you will need to edit the STUBS/Makefile for your platform.
+</P>
+<P>The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime()
+that calls gettimeofday().  If your system doesn't support
+gettimeofday(), you'll need to insert code to call another timer.
+Note that the ANSI-standard function clock() rolls over after an hour
+or so, and is therefore insufficient for timing long LAMMPS
 simulations.
 </P>
 <P><B>Step 6</B>
@ -410,11 +459,9 @@ section</A> below, before proceeding to Step 9.
 </P>
 <P><B>Step 9</B>
 </P>
-<P>That's it.  Once you have a correct Makefile.foo, you have installed
-the optional LAMMPS packages you want to include in your build, and
-you have pre-built any other needed libraries (e.g. MPI, FFT, package
-libraries), all you need to do from the src directory is type
-something like this:
+<P>That's it.  Once you have a correct Makefile.foo, and you have
+pre-built any other needed libraries (e.g. MPI, FFT), all you need
+to do from the src directory is type something like this:
 </P>
 <PRE>make foo
 or
@ -530,7 +577,7 @@ neighbor lists and would run very slowly in terms of CPU secs/timestep.
 <A NAME = "start_2_5"></A><B><I>Building for a Mac:</I></B> 

 <P>OS X is BSD Unix, so it should just work.  See the
-src/MAKE/Makefile.mac file.
+src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.
 </P>
 <HR>

@ -559,7 +606,7 @@ excluded, you can build it yourself.
 </P>
 <P>One way to do this is install and use cygwin to build LAMMPS with a
 standard unix style make program, just as you would on a Linux box;
-see src/MAKE/Makefile.cygwin.
+see src/MAKE/MACHINES/Makefile.cygwin.
 </P>
 <P>The other way to do this is using Visual Studio and project files.
 See the src/WINDOWS directory and its README.txt file for instructions
@ -574,8 +621,13 @@ on both a basic build and a customized build with pacakges you select.
 <UL><LI><A HREF = "#start_3_1">Package basics</A>
 <LI><A HREF = "#start_3_2">Including/excluding packages</A>
 <LI><A HREF = "#start_3_3">Packages that require extra libraries</A>
-<LI><A HREF = "#start_3_4">Packages that use make variable settings</A> 
+<LI><A HREF = "#start_3_4">Packages that require Makefile.machine settings</A> 
 </UL>
+<P>Note that the following <A HREF = "#start_4">Section 2.4</A> describes the Make.py
+tool which can be used to install/un-install packages and build the
+auxiliary libraries which some of them use.  It can also auto-edit a
+Makefile.machine to add settings needed by some packages.
+</P>
 <HR>

 <A NAME = "start_3_1"></A><B><I>Package basics:</I></B> 
@ -583,9 +635,11 @@ on both a basic build and a customized build with pacakges you select.
 <P>The source code for LAMMPS is structured as a set of core files which
 are always included, plus optional packages.  Packages are groups of
 files that enable a specific set of features.  For example, force
-fields for molecular systems or granular systems are in packages.  You
-can see the list of all packages by typing "make package" from within
-the src directory of the LAMMPS distribution.
+fields for molecular systems or granular systems are in packages.
+</P>
+<P>You can see the list of all packages by typing "make package" from
+within the src directory of the LAMMPS distribution.  This also lists
+various make commands that can be used to manipulate packages.
 </P>
 <P>If you use a command in a LAMMPS input script that is specific to a
 particular package, you must have built LAMMPS with that package, else
@ -652,10 +706,11 @@ I.e. individual files are only included if their dependencies are
 already included.  Likewise, if a package is excluded, other files
 dependent on that package are also excluded.
 </P>
-<P>The reason to exclude packages is if you will never run certain kinds
-of simulations.  For some packages, this will keep you from having to
-build auxiliary libraries (see below), and will also produce a smaller
-executable which may run a bit faster.
+<P>If you will never run simulations that use the features in a
+particular package, there is no reason to include it in your build.
+For some packages, this will keep you from having to build auxiliary
+libraries (see below), and will also produce a smaller executable
+which may run a bit faster.
 </P>
 <P>When you download a LAMMPS tarball, these packages are pre-installed
 in the src directory: KSPACE, MANYBODY, MOLECULE.  When you download
@ -666,9 +721,10 @@ pre-installed.
 no-name", where "name" is the name of the package in lower-case, e.g.
 name = kspace for the KSPACE package or name = user-atc for the
 USER-ATC package.  You can also type "make yes-standard", "make
-no-standard", "make yes-user", "make no-user", "make yes-all" or "make
-no-all" to include/exclude various sets of packages.  Type "make
-package" to see the all of the package-related make options.
+no-standard", "make yes-std", "make no-std", "make yes-user", "make
+no-user", "make yes-all" or "make no-all" to include/exclude various
+sets of packages.  Type "make package" to see all of the
+package-related make options.
 </P>
 <P>IMPORTANT NOTE: Inclusion/exclusion of a package works by simply
 moving files back and forth between the main src directory and
@ -682,18 +738,19 @@ sub-directories.  You do not normally need to use these commands
 unless you are editing LAMMPS files or have downloaded a patch from
 the LAMMPS WWW site.
 </P>
-<P>Typing "make package-update" will overwrite src files with files from
-the package sub-directories if the package has been included.  It
-should be used after a patch is installed, since patches only update
-the files in the package sub-directory, but not the src files.  Typing
-"make package-overwrite" will overwrite files in the package
-sub-directories with src files.
+<P>Typing "make package-update" or "make pu" will overwrite src files
+with files from the package sub-directories if the package has been
+included.  It should be used after a patch is installed, since patches
+only update the files in the package sub-directory, but not the src
+files.  Typing "make package-overwrite" will overwrite files in the
+package sub-directories with src files.
 </P>
-<P>Typing "make package-status" will show which packages are currently
-included. Of those that are included, it will list files that are
-different in the src directory and package sub-directory.  Typing
-"make package-diff" lists all differences between these files.  Again,
-type "make package" to see all of the package-related make options.
+<P>Typing "make package-status" or "make ps" will show which packages are
+currently included. Of those that are included, it will list files
+that are different in the src directory and package sub-directory.
+Typing "make package-diff" lists all differences between these files.
+Again, type "make package" to see all of the package-related make
+options.
 </P>
 <HR>

@ -705,16 +762,16 @@ you get a LAMMPS build error about a missing library, this is likely
 the reason.  See the <A HREF = "Section_packages.html">Section_packages</A> doc page
 for a list of packages that have auxiliary libraries.
 </P>
-<P>Code for some of these auxiliary libraries is included in the LAMMPS
+<P>Code for most of these auxiliary libraries is included in the LAMMPS
 distribution under the lib directory.  Examples are the USER-ATC and
-MEAM packages.  Some auxiliary libraries are NOT included with LAMMPS;
-to use the associated package you must download and install the
-auxiliary library yourself.  Examples are the KIM and VORONOI and
+MEAM packages.  A few auxiliary libraries are NOT included with
+LAMMPS; to use the associated package you must download and install
+the auxiliary library yourself.  Examples are the KIM and VORONOI and
 USER-MOLFILE packages.
 </P>
-<P>For libraries with provided source code, each lib directory has a
-README file (e.g. lib/reax/README) with instructions on how to build
-that library.  Typically this is done by typing something like:
+<P>For provided libraries, each lib directory has a README file
+(e.g. lib/reax/README) with instructions on how to build that library.
+Typically this is done by typing something like:
 </P>
 <PRE>make -f Makefile.g++ 
 </PRE>
@ -746,168 +803,205 @@ is built with, typically requires additional Fortran-to-C libraries be
 included in the link.  Another example are the BLAS and LAPACK
 libraries needed to use the USER-ATC or USER-AWPMD packages.
 </P>
-<P>For libraries without provided source code, see the
-src/package/Makefile.lammps file for information on where to find the
-library and how to build it.  E.g. the file src/KIM/Makefile.lammps or
-src/VORONOI/Makefile.lammps or src/UESR-MOLFILE/Makefile.lammps.
-These files serve the same purpose as the lib/package/Makefile.lammps
-files described above.  The files have settings needed when LAMMPS is
-built to link with the corresponding auxiliary library.
+<P>For libraries without provided source code, the file
+src/package/README has information on where to find the library and
+how to build it, e.g. src/VORONOI/README.  There is also a
+Makefile.lammps file in the src/package directory, e.g.
+src/KIM/Makefile.lammps, src/VORONOI/Makefile.lammps, or
+src/USER-MOLFILE/Makefile.lammps.  These files serve the same purpose
+as the lib/package/Makefile.lammps files described above: they
+have settings needed when LAMMPS is built to link with the
+corresponding auxiliary library.
 </P>
 <P>Again, you must insure that the settings in
 src/package/Makefile.lammps are appropriate for your system and where
 you installed the auxiliary library.  If they are not, the LAMMPS
-build will fail.
+build will typically fail.
 </P>
 <HR>

-<A NAME = "start_3_4"></A><B><I>Packages that use make variable settings</I></B> 
+<A NAME = "start_3_4"></A><B><I>Packages that require Makefile.machine settings</I></B> 

-<P>One package, the KOKKOS package, allows its build options to be
-specified by setting variables via the "make" command, rather than by
-first building an auxiliary library and editing a Makefile.lammps
-file, as discussed in the previous sub-section for other packages.
-This is for convenience since it is common to want to experiment with
-different Kokkos library options.  Using variables enables a direct
-re-build of LAMMPS and its Kokkos dependencies, so that a benchmark
-test with different Kokkos options can be quickly performed.
+<P>A few packages require specific settings in Makefile.machine, to
+either build or use the package effectively.  These are the
+USER-INTEL, KOKKOS, USER-OMP, and OPT packages.  The details of what
+flags to add or what variables to define are given on the doc pages
+that describe each of these accelerator packages in detail:
 </P>
-<P>The syntax for setting make variables is as follows.  You must
-use a GNU-compatible make command for this to work.  Try "gmake"
-if your system's standard make complains.
-</P>
-<PRE>make yes-kokkos
-make g++ VAR1=value VAR2=value ... 
-</PRE>
-<P>The first line installs the KOKKOS package, which only needs to be
-done once.  The second line builds LAMMPS with src/MAKE/Makefile.g++
-and optionally sets one or more variables that affect the build.  Each
-variable is specified in upper-case; its value follows an equal sign
-with no spaces.  The second line can be repeated with different
-variable settings, though a "clean" must be done before the rebuild.
-Type "make clean" to see options for this operation.
-</P>
-<P>These are the variables that can be specified.  Each takes a value of
-<I>yes</I> or <I>no</I>.  The default value is listed, which is set in the
-lib/kokkos/Makefile.lammps file.  See <A HREF = "Section_accelerate.html#acc_8">this
-section</A> for a discussion of what is
-meant by "host" and "device" in the Kokkos context.
-</P>
-<UL><LI>OMP, default = <I>yes</I>
-<LI>CUDA, default = <I>no</I>
-<LI>HWLOC, default = <I>no</I>
-<LI>AVX, default = <I>no</I>
-<LI>MIC, default = <I>no</I>
-<LI>LIBRT, default = <I>no</I>
-<LI>DEBUG, default = <I>no</I> 
+<UL><LI><A HREF = "accelerate_intel.html">USER-INTEL package</A>
+<LI><A HREF = "accelerate_kokkos.html">KOKKOS package</A>
+<LI><A HREF = "accelerate_omp.html">USER-OMP package</A>
+<LI><A HREF = "accelerate_opt.html">OPT package</A> 
 </UL>
-<P>OMP sets the parallelization method used for Kokkos code (within
-LAMMPS) that runs on the host.  OMP=yes means that OpenMP will be
-used.  OMP=no means that pthreads will be used.
+<P>Here is a brief summary of what Makefile.machine changes are needed.
+Note that the Make.py tool, described next in <A HREF = "#start_4">Section
+2.4</A>, can automatically add the needed info to an existing
+machine Makefile, using simple command-line arguments.
 </P>
-<P>CUDA sets the parallelization method used for Kokkos code (within
-LAMMPS) that runs on the device.  CUDA=yes means an NVIDIA GPU running
-CUDA will be used.  CUDA=no means that the OMP=yes or OMP=no setting
-will be used for the device as well as the host.
+<P>In src/MAKE/OPTIONS see the following Makefiles for examples of the
+changes described below:
 </P>
-<P>If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
-use "nvcc" as its compiler, via its CC setting.  For best performance
-its CCFLAGS setting should use -O3 and have an -arch setting that
-matches the compute capability of your NVIDIA hardware and software
-installation, e.g. -arch=sm_20.  Generally Fermi Generation GPUs are
-sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
-cards are sm_50.  A complete list can be found on
-<A HREF = "http://en.wikipedia.org/wiki/CUDA#Supported_GPUs">wikipedia</A>. You can
-also use the deviceQuery tool that comes with the CUDA samples.  Note
-the minimal required compute capability is 2.0, but this will give
-signicantly reduced performance compared to Kepler generation GPUs
-with compute capability 3.x.  For the LINK setting, "nvcc" should not
-be used; instead use g++ or another compiler suitable for linking C++
-applications.  Often you will want to use your MPI compiler wrapper
-for this setting (i.e. mpicxx).  Finally, the lo-level Makefile must
-also have a "Compilation rule" for creating *.o files from *.cu files.
-See src/Makefile.cuda for an example of a lo-level Makefile with all
-of these settings.
+<UL><LI>Makefile.intel_cpu
+<LI>Makefile.intel_phi
+<LI>Makefile.kokkos_omp
+<LI>Makefile.kokkos_cuda
+<LI>Makefile.kokkos_phi
+<LI>Makefile.omp 
+</UL>
+<P>For the USER-INTEL package, you have 2 choices when building.  You can
+build with CPU or Phi support.  The latter uses Xeon Phi chips in
+"offload" mode.  Each of these modes requires additional settings in
+your Makefile.machine for CCFLAGS and LINKFLAGS.
 </P>
-<P>HWLOC binds threads to hardware cores, so they do not migrate during a
-simulation.  HWLOC=yes should always be used if running with OMP=no
-for pthreads.  It is not necessary for OMP=yes for OpenMP, because
-OpenMP provides alternative methods via environment variables for
-binding threads to hardware cores.  More info on binding threads to
-cores is given in <A HREF = "Section_accelerate.html#acc_8">this section</A>.
+<P>For CPU mode (if using an Intel compiler):
 </P>
-<P>AVX enables Intel advanced vector extensions when compiling for an
-Intel-compatible chip.  AVX=yes should only be set if your host
-hardware supports AVX.  If it does not support it, this will cause a
-run-time crash.
+<UL><LI>CCFLAGS: add -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
+<LI>LINKFLAGS: add -fopenmp 
+</UL>
+<P>For Phi mode add the following in addition to the CPU mode flags:
 </P>
-<P>MIC enables compiler switches needed when compling for an Intel Phi
-processor.
+<UL><LI>CCFLAGS: add -DLMP_INTEL_OFFLOAD
+<LI>LINKFLAGS: add -offload 
+</UL>
+<P>And also add this to CCFLAGS:
 </P>
-<P>LIBRT enables use of a more accurate timer mechanism on most Unix
-platforms.  This library is not available on all platforms.
+<PRE>-offload-option,mic,compiler,"-fp-model fast=2 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=4\"" 
+</PRE>
+<P>For the KOKKOS package, you have 3 choices when building.  You can
+build with OMP or Cuda or Phi support.  Phi support uses Xeon Phi
+chips in "native" mode.  This can be done by setting the following
+variables in your Makefile.machine:
 </P>
-<P>DEBUG is only useful when developing a Kokkos-enabled style within
-LAMMPS.  DEBUG=yes enables printing of run-time debugging information
-that can be useful.  It also enables runtime bounds checking on Kokkos
-data structures.
+<UL><LI>for OMP support, set OMP = yes
+<LI>for Cuda support, set OMP = yes and CUDA = yes
+<LI>for Phi support, set OMP = yes and MIC = yes 
+</UL>
+<P>These can also be set as additional arguments to the make command, e.g.
 </P>
+<PRE>make g++ OMP=yes MIC=yes 
+</PRE>
+<P>Building the KOKKOS package with CUDA support requires a
+Makefile.machine that uses the NVIDIA "nvcc" compiler, as well as an
+"arch" setting appropriate to the GPU hardware and NVIDIA
+software you have on your machine.  See
+src/MAKE/OPTIONS/Makefile.kokkos_cuda for an example of such a machine
+Makefile.
+</P>
+<P>For the USER-OMP package, your Makefile.machine needs additional
+settings for CCFLAGS and LINKFLAGS.
+</P>
+<UL><LI>CCFLAGS: add -fopenmp and -restrict
+<LI>LINKFLAGS: add -fopenmp 
+</UL>
+<P>For the OPT package, your Makefile.machine needs an additional
+setting for CCFLAGS.
+</P>
+<UL><LI>CCFLAGS: add -restrict 
+</UL>
 <HR>

 <H4><A NAME = "start_4"></A>2.4 Building LAMMPS via the Make.py script 
 </H4>
-<P>The src directory includes a Make.py script, written
-in Python, which can be used to automate various steps
-of the build process.
+<P>The src directory includes a Make.py script, written in Python, which
+can be used to automate various steps of the build process.  It is
+particularly useful for working with the accelerator packages, as well
+as other packages which require auxiliary libraries to be built.
 </P>
-<P>You can run the script from the src directory by typing either:
+<P>You can run Make.py from the src directory by typing either:
 </P>
-<PRE>Make.py
-python Make.py 
+<PRE>Make.py -h
+python Make.py -h 
 </PRE>
-<P>which will give you info about the tool.  For the former to work, you
-may need to edit the 1st line of the script to point to your local
+<P>which will give you help info about the tool.  For the former to work,
+you may need to edit the first line of Make.py to point to your local
 Python.  And you may need to insure the script is executable:
 </P>
 <PRE>chmod +x Make.py 
 </PRE>
-<P>The following options are supported as switches:
+<P>Here are examples of build tasks you can perform with Make.py:
 </P>
-<UL><LI>-i file1 file2 ...
-<LI>-p package1 package2 ...
-<LI>-u package1 package2 ...
-<LI>-e package1 arg1 arg2 package2 ...
-<LI>-o dir
-<LI>-b machine
-<LI>-s suffix1 suffix2 ...
-<LI>-l dir
-<LI>-j N
-<LI>-h switch1 switch2 ... 
+<DIV ALIGN=center><TABLE  BORDER=1 >
+<TR><TD >Install/uninstall packages</TD><TD > Make.py -p no-lib kokkos omp intel</TD></TR>
+<TR><TD >Build specific auxiliary libs</TD><TD > Make.py lib-atc lib-meam</TD></TR>
+<TR><TD >Build libs for all installed packages</TD><TD > Make.py -p cuda gpu -gpu mode=double arch=31 lib-all</TD></TR>
+<TR><TD >Create a Makefile from scratch with a compiler and MPI</TD><TD > Make.py -m none -cc g++ -mpi mpich file</TD></TR>
+<TR><TD >Augment Makefile.serial with settings for installed packages</TD><TD > Make.py -p intel -intel cpu -m serial file</TD></TR>
+<TR><TD >Add JPG and FFTW support to Makefile.mpi</TD><TD > Make.py -m mpi -jpg -fft fftw file</TD></TR>
+<TR><TD >Build LAMMPS with a parallel make using Makefile.mpi</TD><TD > Make.py -j 16 -m mpi exe</TD></TR>
+<TR><TD >Build LAMMPS and libs it needs using Makefile.serial with accelerator settings</TD><TD > Make.py -p gpu intel -intel cpu lib-all file serial 
+</TD></TR></TABLE></DIV>
+
+<P>The bench and examples directories give Make.py commands that can be
+used to build LAMMPS with the various packages and options needed to
+run all the benchmark and example input scripts.  See these files for
+more details:
+</P>
+<UL><LI>bench/README
+<LI>bench/FERMI/README
+<LI>bench/KEPLER/README
+<LI>bench/PHI/README
+<LI>examples/README
+<LI>examples/accelerate/README
+<LI>examples/accelerate/make.list 
 </UL>
-<P>Help on any switch can be listed by using -h, e.g.
+<P>All of the Make.py options and syntax help can be accessed by using
+the "-h" switch.
 </P>
-<PRE>Make.py -h -i -p 
+<P>E.g. typing "Make.py -h" gives
+</P>
+<PRE>Syntax: Make.py switch args ... <I>action1</I> <I>action2</I> ...
+  actions:
+    lib-all, lib-dir, clean, file, exe or machine
+    zero or more actions, in any order (machine must be last)
+  switches:
+    -d (dir), -j (jmake), -m (makefile), -o (output),
+    -p (packages), -r (redo), -s (settings), -v (verbose)
+  switches for libs:
+    -atc, -awpmd, -colvars, -cuda
+    -gpu, -meam, -poems, -qmmm, -reax
+  switches for build and makefile options:
+    -intel, -kokkos, -cc, -mpi, -fft, -jpg, -png 
 </PRE>
-<P>At a hi-level, these are the kinds of package management
-and build tasks that can be performed easily, using
-the Make.py tool:
+<P>Using the "-h" switch with other switches and actions gives additional
+info on all the other specified switches or actions.  The "-h" can be
+anywhere in the command-line and the other switches do not need their
+arguments.  E.g. typing "Make.py -h -d -atc -intel" will print:
 </P>
-<UL><LI>install/uninstall packages and build the associated external libs (use -p and -u and -e)
-<LI>install packages needed for one or more input scripts (use -i and -p)
-<LI>build LAMMPS, either in the src dir or new dir (use -b)
-<LI>create a new dir with only the source code needed for one or more input scripts (use -i and -o) 
-</UL>
-<P>The last bullet can be useful when you wish to build a stripped-down
-version of LAMMPS to run a specific script(s).  Or when you wish to
-move the minimal amount of files to another platform for a remote
-LAMMPS build.
+<PRE>-d dir
+  dir = LAMMPS home dir
+  if -d not specified, working dir must be lammps/src 
+</PRE>
+<PRE>-atc make=suffix lammps=suffix2
+  all args are optional and can be in any order
+  make = use Makefile.suffix (def = g++)
+  lammps = use Makefile.lammps.suffix2 (def = EXTRAMAKE in makefile) 
+</PRE>
+<PRE>-intel mode
+  mode = cpu or phi (def = cpu)
+    build Intel package for CPU or Xeon Phi 
+</PRE>
+<P>Note that Make.py never overwrites an existing Makefile.machine.
+Instead, it creates src/MAKE/MINE/Makefile.auto, which you can save or
+rename if desired.  Likewise it creates an executable named
+src/lmp_auto, which you can rename using the -o switch if desired.
 </P>
-<P>Note that using Make.py is not a substitute for insuring you have a
-valid src/MAKE/Makefile.foo for your system, or that external library
-Makefiles in any lib/* directories you use are also valid for your
-system.  But once you have done that, you can use Make.py to quickly
-include/exclude the packages and external libraries needed by your
-input scripts.
+<P>The most recently executed Make.py command is saved in
+src/Make.py.last.  You can use the "-r" switch (for redo) to re-invoke
+the last command, or you can save a sequence of one or more Make.py
+commands to a file and invoke the file of commands using "-r".  You
+can also label the commands in the file and invoke one or more of them
+by name.
+</P>
+<P>A typical use of Make.py is to start with a valid Makefile.machine for
+your system, that works for a vanilla LAMMPS build, i.e. when optional
+packages are not installed.  You can then use Make.py to add various
+settings (FFT, JPG, PNG) to the Makefile.machine as well as change its
+compiler and MPI options.  You can also add additional packages to the
+build, as well as build the needed supporting libraries.
+</P>
+<P>You can also use Make.py to create a new Makefile.machine from
+scratch, using the "-m none" switch, if you also specify what compiler
+and MPI options to use, via the "-cc" and "-mpi" switches.
 </P>
 <HR>

--- a/doc/Section_start.txt
+++ b/doc/Section_start.txt
@ -79,14 +79,27 @@ This section has the following sub-sections:

 [{Read this first:}] :link(start_2_1)

-If you want to avoid building LAMMPS, read the preceeding section
-about options available for downloading and installing executables.
-Details are discussed on the "download"_download page.
+If you want to avoid building LAMMPS yourself, read the preceding
+section about options available for downloading and installing
+executables.  Details are discussed on the "download"_download page.

-Building LAMMPS can be simple or not-so-simple.  If MPI is already
-installed on your machine (or you just want to run LAMMPS in serial)
-and you can use one of the provided machine Makefiles and the build
-works on your platform, then it's simple.
+Building LAMMPS can be simple or not-so-simple.  If all you need are
+the default packages installed in LAMMPS, and MPI is already installed
+on your machine, or you just want to run LAMMPS in serial, then you
+can typically use the Makefile.mpi or Makefile.serial files in
+src/MAKE and type one of these lines (from the src dir):
+
+make mpi
+make serial :pre
+
+Or if one of the other Makefile.machine files in the src/MAKE
+sub-directories matches your system (type "make" to see a list), you
+can use it as-is by typing (for example):
+
+make stampede :pre
+
+If any of these builds with an existing Makefile.machine works on your
+system, then you're done!

 If you want to do one of these:

@ -99,7 +112,14 @@ then building LAMMPS is more complicated.  You may need to find where
 auxiliary libraries exist on your machine or install them if they
 don't.  You may need to build additional libraries that are part of
 the LAMMPS package, before building LAMMPS.  You may need to edit a
-machine Makefile to make it compatible with your system.
+Makefile.machine file to make it compatible with your system.
+
+Note that there is a Make.py tool in the src directory that automates
+several of these steps, but you still have to know what you are doing.
+"Section 2.4"_#start_4 below describes the tool.  It is a convenient
+way to work with installing/un-installing various packages, the
+Makefile.machine changes required by some packages, and the auxiliary
+libraries some of them use.

 Please read the following sections carefully.  If you are not
 comfortable with makefiles, or building codes on a Unix platform, or
@ -115,7 +135,7 @@ please post the issue to the "LAMMPS mail
 list"_http://lammps.sandia.gov/mail.html.

 If you succeed in building LAMMPS on a new kind of machine, for which
-there isn't a similar machine Makefile included in the src/MAKE
+there isn't a similar machine Makefile included in the src/MAKE/MORE
 directory, then send it to the developers and we can include it in the
 LAMMPS distribution.

@ -127,21 +147,43 @@ LAMMPS distribution.

 The src directory contains the C++ source and header files for LAMMPS.
 It also contains a top-level Makefile and a MAKE sub-directory with
-low-level Makefile.* files for many machines.  From within the src
-directory, type "make" or "gmake".  You should see a list of available
-choices.  If one of those is the machine and options you want, you can
-type a command like:
+low-level Makefile.* files for many systems and machines.  See the
+src/MAKE/README file for a quick overview of what files are available
+and what sub-directories they are in.

-make linux
+The src/MAKE dir has a few files that should work as-is on many
+platforms.  The src/MAKE/OPTIONS dir has more that invoke additional
+compiler, MPI, and other setting options commonly used by LAMMPS, to
+illustrate their syntax.  The src/MAKE/MACHINES dir has many more that
+have been tweaked or optimized for specific machines.  These files are
+all good starting points if you find you need to change them for your
+machine.  Put any file you edit into the src/MAKE/MINE directory and
+it will never be touched by any LAMMPS updates.
+
+From within the src directory, type "make" or "gmake".  You should see
+a list of available choices from src/MAKE and all of its
+sub-directories.  If one of those has the options you want or is the
+machine you want, you can type a command like:
+
+make mpi
+or
+make serial_icc
 or
 gmake mac :pre

+Note that the corresponding Makefile.machine can exist in src/MAKE or
+any of its sub-directories.  If a file with the same name appears in
+multiple places (not a good idea), the order they are used is as
+follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES.
+This gives preference to a file you have created/edited and put in
+src/MAKE/MINE.
+
 Note that on a multi-processor or multi-core platform you can launch a
 parallel make, by using the "-j" switch with the make command, which
 will build LAMMPS more quickly.

-If you get no errors and an executable like lmp_linux or lmp_mac is
-produced, you're done; it's your lucky day.
+If you get no errors and an executable like lmp_mpi or lmp_g++_serial
+or lmp_mac is produced, then you're done; it's your lucky day.

 Note that by default only a few of LAMMPS optional packages are
 installed.  To build LAMMPS with optional packages, see "this
@ -151,43 +193,47 @@ section"_#start_3 below.

 If Step 0 did not work, you will need to create a low-level Makefile
 for your machine, like Makefile.foo.  You should make a copy of an
-existing src/MAKE/Makefile.* as a starting point.  The only portions
-of the file you need to edit are the first line, the "compiler/linker
-settings" section, and the "LAMMPS-specific settings" section.
+existing Makefile.* in src/MAKE or one of its sub-directories as a
+starting point.  The only portions of the file you need to edit are
+the first line, the "compiler/linker settings" section, and the
+"LAMMPS-specific settings" section.  When it works, put the edited
+file in src/MAKE/MINE and it will not be altered by any future LAMMPS
+updates.

 [Step 2]

-Change the first line of src/MAKE/Makefile.foo to list the word "foo"
-after the "#", and whatever other options it will set.  This is the
-line you will see if you just type "make".
+Change the first line of Makefile.foo to list the word "foo" after the
+"#", and whatever other options it will set.  This is the line you
+will see if you just type "make".

 [Step 3]

 The "compiler/linker settings" section lists compiler and linker
 settings for your C++ compiler, including optimization flags.  You can
 use g++, the open-source GNU compiler, which is available on all Unix
-systems.  You can also use mpicc which will typically be available if
+systems.  You can also use mpicxx which will typically be available if
 MPI is installed on your system, though you should check which actual
 compiler it wraps.  Vendor compilers often produce faster code.  On
-boxes with Intel CPUs, we suggest using the commercial Intel icc
-compiler, which can be downloaded from "Intel's compiler site"_intel.
+boxes with Intel CPUs, we suggest using the Intel icc compiler, which
+can be downloaded from "Intel's compiler site"_intel.

 :link(intel,http://www.intel.com/software/products/noncom)

 If building a C++ code on your machine requires additional libraries,
-then you should list them as part of the LIB variable.
+then you should list them as part of the LIB variable.  You should
+not need to do this if you use mpicxx.

 The DEPFLAGS setting is what triggers the C++ compiler to create a
 dependency list for a source file.  This speeds re-compilation when
 source (*.cpp) or header (*.h) files are edited.  Some compilers do
 not support dependency file creation, or may use a different switch
-than -D.  GNU g++ works with -D.  If your compiler can't create
-dependency files, then you'll need to create a Makefile.foo patterned
-after Makefile.storm, which uses different rules that do not involve
-dependency files.  Note that when you build LAMMPS for the first time
-on a new platform, a long list of *.d files will be printed out
-rapidly.  This is not an error; it is the Makefile doing its normal
-creation of dependencies.
+than -D.  GNU g++ and Intel icc work with -D.  If your compiler can't
+create dependency files, then you'll need to create a Makefile.foo
+patterned after Makefile.storm, which uses different rules that do not
+involve dependency files.  Note that when you build LAMMPS for the
+first time on a new platform, a long list of *.d files will be printed
+out rapidly.  This is not an error; it is the Makefile doing its
+normal creation of dependencies.

 [Step 4]

@ -271,20 +317,21 @@ Step 6 below for info about building LAMMPS with an FFT library.
 [Step 5]

 The 3 MPI variables are used to specify an MPI library to build LAMMPS
-with. 
+with.  Note that you do not need to set these if you use the MPI
+compiler wrapper mpicxx for your CC and LINK settings in the section
+above.  The MPI wrapper knows where to find the needed files.

 If you want LAMMPS to run in parallel, you must have an MPI library
-installed on your platform.  If you use an MPI-wrapped compiler, such
-as "mpicc" to build LAMMPS, you should be able to leave these 3
-variables blank; the MPI wrapper knows where to find the needed files.
-If not, and MPI is installed on your system in the usual place (under
-/usr/local), you also may not need to specify these 3 variables.  On
-some large parallel machines which use "modules" for their
-compile/link environements, you may simply need to include the correct
-module in your build environment.  Or the parallel machine may have a
-vendor-provided MPI which the compiler has no trouble finding.
+installed on your platform.  If MPI is installed on your system in the
+usual place (under /usr/local), you also may not need to specify these
+3 variables, assuming /usr/local is in your path.  On some large
+parallel machines which use "modules" for their compile/link
+environments, you may simply need to include the correct module in
+your build environment, before building LAMMPS.  Or the parallel
+machine may have a vendor-provided MPI which the compiler has no
+trouble finding.

-Failing this, with these 3 variables you can specify where the mpi.h
+Failing this, these 3 variables can be used to specify where the mpi.h
 file (MPI_INC) and the MPI library file (MPI_PATH) are found and the
 name of the library file (MPI_LIB).

@ -304,20 +351,22 @@ arise when linking LAMMPS to the MPI library.

 If you just want to run LAMMPS on a single processor, you can use the
 dummy MPI library provided in src/STUBS, since you don't need a true
-MPI library installed on your system.  See the
-src/MAKE/Makefile.serial file for how to specify the 3 MPI variables
-in this case.  You will also need to build the STUBS library for your
-platform before making LAMMPS itself.  To build from the src
-directory, type "make stubs", or from the STUBS dir, type "make".
-This should create a libmpi_stubs.a file suitable for linking to
-LAMMPS.  If the build fails, you will need to edit the STUBS/Makefile
-for your platform.
+MPI library installed on your system.  See src/MAKE/Makefile.serial
+for how to specify the 3 MPI variables in this case.  You will also
+need to build the STUBS library for your platform before making LAMMPS
+itself.  Note that if you are building with src/MAKE/Makefile.serial,
+e.g. by typing "make serial", then the STUBS library is built for you.

-The file STUBS/mpi.c provides a CPU timer function called
-MPI_Wtime() that calls gettimeofday() .  If your system doesn't
-support gettimeofday() , you'll need to insert code to call another
-timer.  Note that the ANSI-standard function clock() rolls over after
-an hour or so, and is therefore insufficient for timing long LAMMPS
+To build the STUBS library from the src directory, type "make stubs",
+or from the src/STUBS dir, type "make".  This should create a
+libmpi_stubs.a file suitable for linking to LAMMPS.  If the build
+fails, you will need to edit the STUBS/Makefile for your platform.
+
+The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime()
+that calls gettimeofday() .  If your system doesn't support
+gettimeofday() , you'll need to insert code to call another timer.
+Note that the ANSI-standard function clock() rolls over after an hour
+or so, and is therefore insufficient for timing long LAMMPS
 simulations.

 [Step 6]
@ -404,11 +453,9 @@ section"_#start_3 below, before proceeding to Step 9.

 [Step 9]

-That's it.  Once you have a correct Makefile.foo, you have installed
-the optional LAMMPS packages you want to include in your build, and
-you have pre-built any other needed libraries (e.g. MPI, FFT, package
-libraries), all you need to do from the src directory is type
-something like this:
+That's it.  Once you have a correct Makefile.foo, and you have
+pre-built any other needed libraries (e.g. MPI, FFT, etc.), all you
+need to do from the src directory is type something like this:

 make foo
 or
@ -524,7 +571,7 @@ neighbor lists and would run very slowly in terms of CPU secs/timestep.
 [{Building for a Mac:}] :link(start_2_5)

 OS X is BSD Unix, so it should just work.  See the
-src/MAKE/Makefile.mac file.
+src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.

 :line

@ -553,7 +600,7 @@ excluded, you can build it yourself.

 One way to do this is install and use cygwin to build LAMMPS with a
 standard unix style make program, just as you would on a Linux box;
-see src/MAKE/Makefile.cygwin.
+see src/MAKE/MACHINES/Makefile.cygwin.

 The other way to do this is using Visual Studio and project files.
 See the src/WINDOWS directory and its README.txt file for instructions
@ -568,7 +615,12 @@ This section has the following sub-sections:
 "Package basics"_#start_3_1
 "Including/excluding packages"_#start_3_2
 "Packages that require extra libraries"_#start_3_3
-"Packages that use make variable settings"_#start_3_4 :ul
+"Packages that require Makefile.machine settings"_#start_3_4 :ul
+
+Note that the following "Section 2.4"_#start_4 describes the Make.py
+tool which can be used to install/un-install packages and build the
+auxiliary libraries which some of them use.  It can also auto-edit a
+Makefile.machine to add settings needed by some packages.

 :line

@ -577,9 +629,11 @@ This section has the following sub-sections:
 The source code for LAMMPS is structured as a set of core files which
 are always included, plus optional packages.  Packages are groups of
 files that enable a specific set of features.  For example, force
-fields for molecular systems or granular systems are in packages.  You
-can see the list of all packages by typing "make package" from within
-the src directory of the LAMMPS distribution.
+fields for molecular systems or granular systems are in packages.
+
+You can see the list of all packages by typing "make package" from
+within the src directory of the LAMMPS distribution.  This also lists
+various make commands that can be used to manipulate packages.

 If you use a command in a LAMMPS input script that is specific to a
 particular package, you must have built LAMMPS with that package, else
@ -646,10 +700,11 @@ I.e. individual files are only included if their dependencies are
 already included.  Likewise, if a package is excluded, other files
 dependent on that package are also excluded.

-The reason to exclude packages is if you will never run certain kinds
-of simulations.  For some packages, this will keep you from having to
-build auxiliary libraries (see below), and will also produce a smaller
-executable which may run a bit faster.
+If you will never run simulations that use the features in a
+particular package, there is no reason to include it in your build.
+For some packages, this will keep you from having to build auxiliary
+libraries (see below), and will also produce a smaller executable
+which may run a bit faster.

 When you download a LAMMPS tarball, these packages are pre-installed
 in the src directory: KSPACE, MANYBODY,MOLECULE.  When you download
@ -660,9 +715,10 @@ Packages are included or excluded by typing "make yes-name" or "make
 no-name", where "name" is the name of the package in lower-case, e.g.
 name = kspace for the KSPACE package or name = user-atc for the
 USER-ATC package.  You can also type "make yes-standard", "make
-no-standard", "make yes-user", "make no-user", "make yes-all" or "make
-no-all" to include/exclude various sets of packages.  Type "make
-package" to see the all of the package-related make options.
+no-standard", "make yes-std", "make no-std", "make yes-user", "make
+no-user", "make yes-all" or "make no-all" to include/exclude various
+sets of packages.  Type "make package" to see all of the
+package-related make options.

 IMPORTANT NOTE: Inclusion/exclusion of a package works by simply
 moving files back and forth between the main src directory and
@ -676,18 +732,19 @@ sub-directories.  You do not normally need to use these commands
 unless you are editing LAMMPS files or have downloaded a patch from
 the LAMMPS WWW site.

-Typing "make package-update" will overwrite src files with files from
-the package sub-directories if the package has been included.  It
-should be used after a patch is installed, since patches only update
-the files in the package sub-directory, but not the src files.  Typing
-"make package-overwrite" will overwrite files in the package
-sub-directories with src files.
+Typing "make package-update" or "make pu" will overwrite src files
+with files from the package sub-directories if the package has been
+included.  It should be used after a patch is installed, since patches
+only update the files in the package sub-directory, but not the src
+files.  Typing "make package-overwrite" will overwrite files in the
+package sub-directories with src files.

-Typing "make package-status" will show which packages are currently
-included. Of those that are included, it will list files that are
-different in the src directory and package sub-directory.  Typing
-"make package-diff" lists all differences between these files.  Again,
-type "make package" to see all of the package-related make options.
+Typing "make package-status" or "make ps" will show which packages are
+currently included. Of those that are included, it will list files
+that are different in the src directory and package sub-directory.
+Typing "make package-diff" lists all differences between these files.
+Again, type "make package" to see all of the package-related make
+options.

 :line

@ -699,16 +756,16 @@ you get a LAMMPS build error about a missing library, this is likely
 the reason.  See the "Section_packages"_Section_packages.html doc page
 for a list of packages that have auxiliary libraries.

-Code for some of these auxiliary libraries is included in the LAMMPS
+Code for most of these auxiliary libraries is included in the LAMMPS
 distribution under the lib directory.  Examples are the USER-ATC and
-MEAM packages.  Some auxiliary libraries are NOT included with LAMMPS;
-to use the associated package you must download and install the
-auxiliary library yourself.  Examples are the KIM and VORONOI and
+MEAM packages.  A few auxiliary libraries are NOT included with
+LAMMPS; to use the associated package you must download and install
+the auxiliary library yourself.  Examples are the KIM and VORONOI and
 USER-MOLFILE packages.

-For libraries with provided source code, each lib directory has a
-README file (e.g. lib/reax/README) with instructions on how to build
-that library.  Typically this is done by typing something like:
+For provided libraries, each lib directory has a README file
+(e.g. lib/reax/README) with instructions on how to build that library.
+Typically this is done by typing something like:

 make -f Makefile.g++ :pre

@ -740,168 +797,203 @@ is built with, typically requires additional Fortran-to-C libraries be
 included in the link.  Another example are the BLAS and LAPACK
 libraries needed to use the USER-ATC or USER-AWPMD packages.

-For libraries without provided source code, see the
-src/package/Makefile.lammps file for information on where to find the
-library and how to build it.  E.g. the file src/KIM/Makefile.lammps or
-src/VORONOI/Makefile.lammps or src/UESR-MOLFILE/Makefile.lammps.
-These files serve the same purpose as the lib/package/Makefile.lammps
-files described above.  The files have settings needed when LAMMPS is
-built to link with the corresponding auxiliary library.
+For libraries without provided source code, the file
+src/package/README has information on where to find the library and
+how to build it, e.g. src/VORONOI/README.  There is also a
+Makefile.lammps file in the src/package directory, e.g.
+src/KIM/Makefile.lammps or src/VORONOI/Makefile.lammps or
+src/USER-MOLFILE/Makefile.lammps.  These files serve the same purpose
+as the lib/package/Makefile.lammps files described above.  The files
+have settings needed when LAMMPS is built to link with the
+corresponding auxiliary library.

 Again, you must insure that the settings in
 src/package/Makefile.lammps are appropriate for your system and where
 you installed the auxiliary library.  If they are not, the LAMMPS
-build will fail.
+build will typically fail.

 :line

-[{Packages that use make variable settings}] :link(start_3_4)
+[{Packages that require Makefile.machine settings}] :link(start_3_4)

-One package, the KOKKOS package, allows its build options to be
-specified by setting variables via the "make" command, rather than by
-first building an auxiliary library and editing a Makefile.lammps
-file, as discussed in the previous sub-section for other packages.
-This is for convenience since it is common to want to experiment with
-different Kokkos library options.  Using variables enables a direct
-re-build of LAMMPS and its Kokkos dependencies, so that a benchmark
-test with different Kokkos options can be quickly performed.
+A few packages require specific settings in Makefile.machine, to
+either build or use the package effectively.  These are the
+USER-INTEL, KOKKOS, USER-OMP, and OPT packages.  The details of what
+flags to add or what variables to define are given on the doc pages
+that describe each of these accelerator packages in detail:

-The syntax for setting make variables is as follows.  You must
-use a GNU-compatible make command for this to work.  Try "gmake"
-if your system's standard make complains.
+"USER-INTEL package"_accelerate_intel.html
+"KOKKOS package"_accelerate_kokkos.html
+"USER-OMP package"_accelerate_omp.html
+"OPT package"_accelerate_opt.html :ul

-make yes-kokkos
-make g++ VAR1=value VAR2=value ... :pre
+Here is a brief summary of what Makefile.machine changes are needed.
+Note that the Make.py tool, described in the next "Section
+2.4"_#start_4, can automatically add the needed info to an existing
+machine Makefile, using simple command-line arguments.

-The first line installs the KOKKOS package, which only needs to be
-done once.  The second line builds LAMMPS with src/MAKE/Makefile.g++
-and optionally sets one or more variables that affect the build.  Each
-variable is specified in upper-case; its value follows an equal sign
-with no spaces.  The second line can be repeated with different
-variable settings, though a "clean" must be done before the rebuild.
-Type "make clean" to see options for this operation.
+In src/MAKE/OPTIONS see the following Makefiles for examples of the
+changes described below:

-These are the variables that can be specified.  Each takes a value of
-{yes} or {no}.  The default value is listed, which is set in the
-lib/kokkos/Makefile.lammps file.  See "this
-section"_Section_accelerate.html#acc_8 for a discussion of what is
-meant by "host" and "device" in the Kokkos context.
+Makefile.intel_cpu
+Makefile.intel_phi
+Makefile.kokkos_omp
+Makefile.kokkos_cuda
+Makefile.kokkos_phi
+Makefile.omp :ul

-OMP, default = {yes}
-CUDA, default = {no}
-HWLOC, default = {no}
-AVX, default = {no}
-MIC, default = {no}
-LIBRT, default = {no}
-DEBUG, default = {no} :ul
+For the USER-INTEL package, you have 2 choices when building.  You can
+build with CPU or Phi support.  The latter uses Xeon Phi chips in
+"offload" mode.  Each of these modes requires additional settings in
+your Makefile.machine for CCFLAGS and LINKFLAGS.

-OMP sets the parallelization method used for Kokkos code (within
-LAMMPS) that runs on the host.  OMP=yes means that OpenMP will be
-used.  OMP=no means that pthreads will be used.
+For CPU mode (if using an Intel compiler):

-CUDA sets the parallelization method used for Kokkos code (within
-LAMMPS) that runs on the device.  CUDA=yes means an NVIDIA GPU running
-CUDA will be used.  CUDA=no means that the OMP=yes or OMP=no setting
-will be used for the device as well as the host.
+CCFLAGS: add -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
+LINKFLAGS: add -fopenmp :ul

-If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
-use "nvcc" as its compiler, via its CC setting.  For best performance
-its CCFLAGS setting should use -O3 and have an -arch setting that
-matches the compute capability of your NVIDIA hardware and software
-installation, e.g. -arch=sm_20.  Generally Fermi Generation GPUs are
-sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
-cards are sm_50.  A complete list can be found on
-"wikipedia"_http://en.wikipedia.org/wiki/CUDA#Supported_GPUs. You can
-also use the deviceQuery tool that comes with the CUDA samples.  Note
-the minimal required compute capability is 2.0, but this will give
-signicantly reduced performance compared to Kepler generation GPUs
-with compute capability 3.x.  For the LINK setting, "nvcc" should not
-be used; instead use g++ or another compiler suitable for linking C++
-applications.  Often you will want to use your MPI compiler wrapper
-for this setting (i.e. mpicxx).  Finally, the lo-level Makefile must
-also have a "Compilation rule" for creating *.o files from *.cu files.
-See src/Makefile.cuda for an example of a lo-level Makefile with all
-of these settings.
+For Phi mode add the following in addition to the CPU mode flags:

-HWLOC binds threads to hardware cores, so they do not migrate during a
-simulation.  HWLOC=yes should always be used if running with OMP=no
-for pthreads.  It is not necessary for OMP=yes for OpenMP, because
-OpenMP provides alternative methods via environment variables for
-binding threads to hardware cores.  More info on binding threads to
-cores is given in "this section"_Section_accelerate.html#acc_8.
+CCFLAGS: add -DLMP_INTEL_OFFLOAD
+LINKFLAGS: add -offload :ul

-AVX enables Intel advanced vector extensions when compiling for an
-Intel-compatible chip.  AVX=yes should only be set if your host
-hardware supports AVX.  If it does not support it, this will cause a
-run-time crash.
+And also add this to CCFLAGS:

-MIC enables compiler switches needed when compling for an Intel Phi
-processor.
+-offload-option,mic,compiler,"-fp-model fast=2 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=4\"" :pre

-LIBRT enables use of a more accurate timer mechanism on most Unix
-platforms.  This library is not available on all platforms.
+For the KOKKOS package, you have 3 choices when building.  You can
+build with OMP or Cuda or Phi support.  Phi support uses Xeon Phi
+chips in "native" mode.  This can be done by setting the following
+variables in your Makefile.machine:

-DEBUG is only useful when developing a Kokkos-enabled style within
-LAMMPS.  DEBUG=yes enables printing of run-time debugging information
-that can be useful.  It also enables runtime bounds checking on Kokkos
-data structures.
+for OMP support, set OMP = yes
+for Cuda support, set OMP = yes and CUDA = yes
+for Phi support, set OMP = yes and MIC = yes :ul
+
+These can also be set as additional arguments to the make command, e.g.
+
+make g++ OMP=yes MIC=yes :pre
+
+Building the KOKKOS package with CUDA support requires a
+Makefile.machine that uses the NVIDIA "nvcc" compiler, as well as an
+"arch" setting appropriate to the GPU hardware and NVIDIA
+software you have on your machine.  See
+src/MAKE/OPTIONS/Makefile.kokkos_cuda for an example of such a machine
+Makefile.
+
+For the USER-OMP package, your Makefile.machine needs additional
+settings for CCFLAGS and LINKFLAGS.
+
+CCFLAGS: add -fopenmp and -restrict
+LINKFLAGS: add -fopenmp :ul
+
+For the OPT package, your Makefile.machine needs an additional
+setting for CCFLAGS.
+
+CCFLAGS: add -restrict :ul

 :line

 2.4 Building LAMMPS via the Make.py script :h4,link(start_4)

-The src directory includes a Make.py script, written
-in Python, which can be used to automate various steps
-of the build process.
+The src directory includes a Make.py script, written in Python, which
+can be used to automate various steps of the build process.  It is
+particularly useful for working with the accelerator packages, as well
+as other packages which require auxiliary libraries to be built.

-You can run the script from the src directory by typing either:
+You can run Make.py from the src directory by typing either:

-Make.py
-python Make.py :pre
+Make.py -h
+python Make.py -h :pre

-which will give you info about the tool.  For the former to work, you
-may need to edit the 1st line of the script to point to your local
+which will give you help info about the tool.  For the former to work,
+you may need to edit the first line of Make.py to point to your local
 Python.  And you may need to insure the script is executable:

 chmod +x Make.py :pre

-The following options are supported as switches:
+Here are examples of build tasks you can perform with Make.py:

-i file1 file2 ...
-p package1 package2 ...
-u package1 package2 ...
-e package1 arg1 arg2 package2 ...
-o dir
-b machine
-s suffix1 suffix2 ...
-l dir
-j N
-h switch1 switch2 ... :ul
+Install/uninstall packages: Make.py -p no-lib kokkos omp intel
+Build specific auxiliary libs: Make.py lib-atc lib-meam
+Build libs for all installed packages: Make.py -p cuda gpu -gpu mode=double arch=31 lib-all
+Create a Makefile from scratch with a compiler and MPI: Make.py -m none -cc g++ -mpi mpich file
+Augment Makefile.serial with settings for installed packages: Make.py -p intel -intel cpu -m serial file
+Add JPG and FFTW support to Makefile.mpi: Make.py -m mpi -jpg -fft fftw file
+Build LAMMPS with a parallel make using Makefile.mpi: Make.py -j 16 -m mpi exe
+Build LAMMPS and libs it needs using Makefile.serial with accelerator settings: Make.py -p gpu intel -intel cpu lib-all file serial :tb(s=:)

-Help on any switch can be listed by using -h, e.g.
+The bench and examples directories give Make.py commands that can be
+used to build LAMMPS with the various packages and options needed to
+run all the benchmark and example input scripts.  See these files for
+more details:

-Make.py -h -i -p :pre
+bench/README
+bench/FERMI/README
+bench/KEPLER/README
+bench/PHI/README
+examples/README
+examples/accelerate/README
+examples/accelerate/make.list :ul

-At a hi-level, these are the kinds of package management
-and build tasks that can be performed easily, using
-the Make.py tool:
+All of the Make.py options and syntax help can be accessed by using
+the "-h" switch.

-install/uninstall packages and build the associated external libs (use -p and -u and -e)
-install packages needed for one or more input scripts (use -i and -p)
-build LAMMPS, either in the src dir or new dir (use -b)
-create a new dir with only the source code needed for one or more input scripts (use -i and -o) :ul
+E.g. typing "Make.py -h" gives

-The last bullet can be useful when you wish to build a stripped-down
-version of LAMMPS to run a specific script(s).  Or when you wish to
-move the minimal amount of files to another platform for a remote
-LAMMPS build.
+Syntax: Make.py switch args ... {action1} {action2} ...
+  actions:
+    lib-all, lib-dir, clean, file, exe or machine
+    zero or more actions, in any order (machine must be last)
+  switches:
+    -d (dir), -j (jmake), -m (makefile), -o (output),
+    -p (packages), -r (redo), -s (settings), -v (verbose)
+  switches for libs:
+    -atc, -awpmd, -colvars, -cuda
+    -gpu, -meam, -poems, -qmmm, -reax
+  switches for build and makefile options:
+    -intel, -kokkos, -cc, -mpi, -fft, -jpg, -png :pre

-Note that using Make.py is not a substitute for insuring you have a
-valid src/MAKE/Makefile.foo for your system, or that external library
-Makefiles in any lib/* directories you use are also valid for your
-system.  But once you have done that, you can use Make.py to quickly
-include/exclude the packages and external libraries needed by your
-input scripts.
+Using the "-h" switch with other switches and actions gives additional
+info on all the other specified switches or actions.  The "-h" can be
+anywhere in the command-line and the other switches do not need their
+arguments.  E.g. typing "Make.py -h -d -atc -intel" will print:
+
+-d dir
+  dir = LAMMPS home dir
+  if -d not specified, working dir must be lammps/src :pre
+
+-atc make=suffix lammps=suffix2
+  all args are optional and can be in any order
+  make = use Makefile.suffix (def = g++)
+  lammps = use Makefile.lammps.suffix2 (def = EXTRAMAKE in makefile) :pre
+
+-intel mode
+  mode = cpu or phi (def = cpu)
+    build Intel package for CPU or Xeon Phi :pre
+
+Note that Make.py never overwrites an existing Makefile.machine.
+Instead, it creates src/MAKE/MINE/Makefile.auto, which you can save or
+rename if desired.  Likewise it creates an executable named
+src/lmp_auto, which you can rename using the -o switch if desired.
+
+The most recently executed Make.py command is saved in
+src/Make.py.last.  You can use the "-r" switch (for redo) to re-invoke
+the last command, or you can save a sequence of one or more Make.py
+commands to a file and invoke the file of commands using "-r".  You
+can also label the commands in the file and invoke one or more of them
+by name.
+
+A typical use of Make.py is to start with a valid Makefile.machine for
+your system, that works for a vanilla LAMMPS build, i.e. when optional
+packages are not installed.  You can then use Make.py to add various
+settings (FFT, JPG, PNG) to the Makefile.machine as well as change its
+compiler and MPI options.  You can also add additional packages to the
+build, as well as build the needed supporting libraries.
+
+You can also use Make.py to create a new Makefile.machine from
+scratch, using the "-m none" switch, if you also specify what compiler
+and MPI options to use, via the "-cc" and "-mpi" switches.

 :line

--- a/doc/accelerate_cuda.html
+++ b/doc/accelerate_cuda.html
@ -74,16 +74,30 @@ projects can be compiled without problems.
 <P>This requires two steps (a,b): build the USER-CUDA library, then build
 LAMMPS with the USER-CUDA package.
 </P>
+<P>You can do both these steps in one line, using the src/Make.py script,
+described in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.
+Type "Make.py -h" for help.  If run from the src directory, this
+command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
+starting Makefile.machine:
+</P>
+<PRE>Make.py -p cuda -cuda mode=single arch=20 -o cuda lib-cuda file mpi 
+</PRE>
+<P>Or you can follow these two (a,b) steps:
+</P>
 <P>(a) Build the USER-CUDA library
 </P>
 <P>The USER-CUDA library is in lammps/lib/cuda.  If your <I>CUDA</I> toolkit
 is not installed in the default system directoy <I>/usr/local/cuda</I> edit
 the file <I>lib/cuda/Makefile.common</I> accordingly.
 </P>
-<P>To set options for the library build, type "make OPTIONS", where
+<P>To build the library with the settings in lib/cuda/Makefile.defaults,
+simply type:
+</P>
+<PRE>make 
+</PRE>
+<P>To set options when the library is built, type "make OPTIONS", where
 <I>OPTIONS</I> are one or more of the following. The settings will be
-written to the <I>lib/cuda/Makefile.defaults</I> and used when
-the library is built.
+written to the <I>lib/cuda/Makefile.defaults</I> before the build.
 </P>
 <PRE><I>precision=N</I> to set the precision level
  N = 1 for single precision (default)
@ -107,11 +121,8 @@ the library is built.
  0 = no CUFFT support (default)
  in the future other CUDA-enabled FFT libraries might be supported 
 </PRE>
-<P>To build the library, simply type:
-</P>
-<PRE>make 
-</PRE>
-<P>If successful, it will produce the files libcuda.a and Makefile.lammps.
+<P>If the build is successful, it will produce the files liblammpscuda.a and
+Makefile.lammps.
 </P>
 <P>Note that if you change any of the options (like precision), you need
 to re-build the entire library.  Do a "make clean" first, followed by
@ -123,8 +134,7 @@ to re-build the entire library.  Do a "make clean" first, followed by
 make yes-user-cuda
 make machine 
 </PRE>
-<P>No additional compile/link flags are needed in your Makefile.machine
-in src/MAKE.
+<P>No additional compile/link flags are needed in Makefile.machine.
 </P>
 <P>Note that if you change the USER-CUDA library precision (discussed
 above) and rebuild the USER-CUDA library, then you also need to
--- a/doc/accelerate_cuda.txt
+++ b/doc/accelerate_cuda.txt
@ -71,16 +71,30 @@ projects can be compiled without problems.
 This requires two steps (a,b): build the USER-CUDA library, then build
 LAMMPS with the USER-CUDA package.

+You can do both these steps in one line, using the src/Make.py script,
+described in "Section 2.4"_Section_start.html#start_4 of the manual.
+Type "Make.py -h" for help.  If run from the src directory, this
+command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
+starting Makefile.machine:
+
+Make.py -p cuda -cuda mode=single arch=20 -o cuda lib-cuda file mpi :pre
+
+Or you can follow these two (a,b) steps:
+
 (a) Build the USER-CUDA library

 The USER-CUDA library is in lammps/lib/cuda.  If your {CUDA} toolkit
 is not installed in the default system directoy {/usr/local/cuda} edit
 the file {lib/cuda/Makefile.common} accordingly.

-To set options for the library build, type "make OPTIONS", where
+To build the library with the settings in lib/cuda/Makefile.defaults,
+simply type:
+
+make :pre
+
+To set options when the library is built, type "make OPTIONS", where
 {OPTIONS} are one or more of the following. The settings will be
-written to the {lib/cuda/Makefile.defaults} and used when
-the library is built.
+written to the {lib/cuda/Makefile.defaults} before the build.

 {precision=N} to set the precision level
  N = 1 for single precision (default)
@ -104,11 +118,8 @@ the library is built.
  0 = no CUFFT support (default)
  in the future other CUDA-enabled FFT libraries might be supported :pre

-To build the library, simply type:
-
-make :pre
-
-If successful, it will produce the files libcuda.a and Makefile.lammps.
+If the build is successful, it will produce the files liblammpscuda.a and
+Makefile.lammps.

 Note that if you change any of the options (like precision), you need
 to re-build the entire library.  Do a "make clean" first, followed by
@ -120,8 +131,7 @@ cd lammps/src
 make yes-user-cuda
 make machine :pre

-No additional compile/link flags are needed in your Makefile.machine
-in src/MAKE.
+No additional compile/link flags are needed in Makefile.machine.

 Note that if you change the USER-CUDA library precision (discussed
 above) and rebuild the USER-CUDA library, then you also need to
--- a/doc/accelerate_gpu.html
+++ b/doc/accelerate_gpu.html
@ -76,6 +76,16 @@ install the NVIDIA Cuda software on your system:
 <P>This requires two steps (a,b): build the GPU library, then build
 LAMMPS with the GPU package.
 </P>
+<P>You can do both these steps in one line, using the src/Make.py script,
+described in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.
+Type "Make.py -h" for help.  If run from the src directory, this
+command will create src/lmp_gpu using src/MAKE/Makefile.mpi as the
+starting Makefile.machine:
+</P>
+<PRE>Make.py -p gpu -gpu mode=single arch=31 -o gpu lib-gpu file mpi 
+</PRE>
+<P>Or you can follow these two (a,b) steps:
+</P>
 <P>(a) Build the GPU library
 </P>
 <P>The GPU library is in lammps/lib/gpu.  Select a Makefile.machine (in
@ -120,8 +130,7 @@ Makefile.linux clean", followed by the make command above.
 make yes-gpu
 make machine 
 </PRE>
-<P>No additional compile/link flags are needed in your Makefile.machine
-in src/MAKE.
+<P>No additional compile/link flags are needed in Makefile.machine.
 </P>
 <P>Note that if you change the GPU library precision (discussed above)
 and rebuild the GPU library, then you also need to re-install the GPU
--- a/doc/accelerate_gpu.txt
+++ b/doc/accelerate_gpu.txt
@ -73,6 +73,16 @@ Run lammps/lib/gpu/nvc_get_devices (after building the GPU library, see below) t
 This requires two steps (a,b): build the GPU library, then build
 LAMMPS with the GPU package.

+You can do both these steps in one line, using the src/Make.py script,
+described in "Section 2.4"_Section_start.html#start_4 of the manual.
+Type "Make.py -h" for help.  If run from the src directory, this
+command will create src/lmp_gpu using src/MAKE/Makefile.mpi as the
+starting Makefile.machine:
+
+Make.py -p gpu -gpu mode=single arch=31 -o gpu lib-gpu file mpi :pre
+
+Or you can follow these two (a,b) steps:
+
 (a) Build the GPU library

 The GPU library is in lammps/lib/gpu.  Select a Makefile.machine (in
@ -117,8 +127,7 @@ cd lammps/src
 make yes-gpu
 make machine :pre

-No additional compile/link flags are needed in your Makefile.machine
-in src/MAKE.
+No additional compile/link flags are needed in Makefile.machine.

 Note that if you change the GPU library precision (discussed above)
 and rebuild the GPU library, then you also need to re-install the GPU
--- a/doc/accelerate_intel.html
+++ b/doc/accelerate_intel.html
@ -41,6 +41,10 @@ suffix to "omp" so that styles from the USER-OMP package will be used
 if available, after first testing if a style from the USER-INTEL
 package is available.
 </P>
+<P>When using the USER-INTEL package, you must choose at build time
+whether you are building for CPU-only acceleration or for using the
+Xeon Phi in offload mode.
+</P>
 <P>Here is a quick overview of how to use the USER-INTEL package
 for CPU-only acceleration:
 </P>
@ -50,6 +54,9 @@ for CPU-only acceleration:
 <LI>specify how many OpenMP threads per MPI task to use
 <LI>use USER-INTEL and (optionally) USER-OMP styles in your input script 
 </UL>
+<P>Note that many of these settings can only be used with the Intel
+compiler, as discussed below.
+</P>
 <P>Using the USER-INTEL package to offload work to the Intel(R)
 Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
@ -74,25 +81,41 @@ Phi(TM) coprocessors.
 Intel(R) compiler.  Use of other compilers may not result in
 vectorization or give poor performance.
 </P>
-<P>Use of an Intel C++ compiler is reccommended, but not required.  The
-compiler must support the OpenMP interface.
+<P>Use of an Intel C++ compiler is recommended, but not required (though
+g++ will not recognize some of the settings, so they cannot be used).
+The compiler must support the OpenMP interface.
 </P>
 <P><B>Building LAMMPS with the USER-INTEL package:</B>
 </P>
-<P>Include the package(s) and build LAMMPS:  
+<P>You must choose at build time whether to build for CPU acceleration or
+to use the Xeon Phi in offload mode.
+</P>
+<P>You can do either in one line, using the src/Make.py script, described
+in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.  Type
+"Make.py -h" for help.  If run from the src directory, these commands
+will create src/lmp_intel_cpu and src/lmp_intel_phi using
+src/MAKE/Makefile.mpi as the starting Makefile.machine:
+</P>
+<PRE>Make.py -p intel omp -intel cpu -o intel_cpu -cc icc file mpi 
+Make.py -p intel omp -intel phi -o intel_phi -cc icc file mpi 
+</PRE>
+<P>Note that this assumes that your MPI and its mpicxx wrapper
+are using the Intel compiler.  If they are not, you should
+leave off the "-cc icc" switch.
+</P>
+<P>Or you can follow these steps:
 </P>
 <PRE>cd lammps/src
 make yes-user-intel
 make yes-user-omp (if desired)
 make machine 
 </PRE>
-<P>If the USER-OMP package is also installed, you can use styles from
-both packages, as described below.
+<P>Note that if the USER-OMP package is also installed, you can use
+styles from both packages, as described below.
 </P>
-<P>The lo-level src/MAKE/Makefile.machine needs a flag for OpenMP support
-in both the CCFLAGS and LINKFLAGS variables, which is <I>-openmp</I> for
-Intel compilers.  You also need to add -DLAMMPS_MEMALIGN=64 and
-restrict to CCFLAGS.
+<P>The Makefile.machine needs a "-fopenmp" flag for OpenMP support in
+both the CCFLAGS and LINKFLAGS variables.  You also need to add
+-DLAMMPS_MEMALIGN=64 and -restrict to CCFLAGS.
 </P>
 <P>If you are compiling on the same architecture that will be used for
 the runs, adding the flag <I>-xHost</I> to CCFLAGS will enable
@ -102,10 +125,10 @@ vectorization with the Intel(R) compiler.
 coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
 and the flag -DLMP_INTEL_OFFLOAD should be added to the CCFLAGS line.
 </P>
-<P>Note that the machine makefiles Makefile.intel and
-Makefile.intel_offload are included in the src/MAKE directory with
-options that perform well with the Intel(R) compiler. The latter file
-has support for offload to coprocessors; the former does not.
+<P>Example makefiles Makefile.intel_cpu and Makefile.intel_phi are
+included in the src/MAKE/OPTIONS directory with settings that perform
+well with the Intel(R) compiler. The latter file has support for
+offload to coprocessors; the former does not.
 </P>
 <P>If using an Intel compiler, it is recommended that Intel(R) Compiler
 2013 SP1 update 1 be used.  Newer versions have some performance
--- a/doc/accelerate_intel.txt
+++ b/doc/accelerate_intel.txt
@ -38,6 +38,10 @@ suffix to "omp" so that styles from the USER-OMP package will be used
 if available, after first testing if a style from the USER-INTEL
 package is available.

+When using the USER-INTEL package, you must choose at build time
+whether you are building for CPU-only acceleration or for using the
+Xeon Phi in offload mode.
+
 Here is a quick overview of how to use the USER-INTEL package
 for CPU-only acceleration:

@ -47,6 +51,9 @@ include the USER-INTEL package and (optionally) USER-OMP package and build LAMMP
 specify how many OpenMP threads per MPI task to use
 use USER-INTEL and (optionally) USER-OMP styles in your input script :ul

+Note that many of these settings can only be used with the Intel
+compiler, as discussed below.
+
 Using the USER-INTEL package to offload work to the Intel(R)
 Xeon Phi(TM) coprocessor is the same except for these additional
 steps:
@ -71,25 +78,41 @@ Optimizations for vectorization have only been tested with the
 Intel(R) compiler.  Use of other compilers may not result in
 vectorization or give poor performance.

-Use of an Intel C++ compiler is reccommended, but not required.  The
-compiler must support the OpenMP interface.
+Use of an Intel C++ compiler is recommended, but not required (though
+g++ will not recognize some of the settings, so they cannot be used).
+The compiler must support the OpenMP interface.

 [Building LAMMPS with the USER-INTEL package:]

-Include the package(s) and build LAMMPS:  
+You must choose at build time whether to build for CPU acceleration or
+to use the Xeon Phi in offload mode.
+
+You can do either in one line, using the src/Make.py script, described
+in "Section 2.4"_Section_start.html#start_4 of the manual.  Type
+"Make.py -h" for help.  If run from the src directory, these commands
+will create src/lmp_intel_cpu and lmp_intel_phi using
+src/MAKE/Makefile.mpi as the starting Makefile.machine:
+
+Make.py -p intel omp -intel cpu -o intel_cpu -cc icc file mpi 
+Make.py -p intel omp -intel phi -o intel_phi -cc icc file mpi :pre
+
+Note that this assumes that your MPI installation and its mpicxx
+wrapper use the Intel compiler.  If they do not, you should
+leave off the "-cc icc" switch.
+
+Or you can follow these steps:

 cd lammps/src
 make yes-user-intel
 make yes-user-omp (if desired)
 make machine :pre

-If the USER-OMP package is also installed, you can use styles from
-both packages, as described below.
+Note that if the USER-OMP package is also installed, you can use
+styles from both packages, as described below.

-The lo-level src/MAKE/Makefile.machine needs a flag for OpenMP support
-in both the CCFLAGS and LINKFLAGS variables, which is {-openmp} for
-Intel compilers.  You also need to add -DLAMMPS_MEMALIGN=64 and
-restrict to CCFLAGS.
+The Makefile.machine needs a "-fopenmp" flag for OpenMP support in
+both the CCFLAGS and LINKFLAGS variables.  You also need to add
+-DLAMMPS_MEMALIGN=64 and -restrict to CCFLAGS.

 If you are compiling on the same architecture that will be used for
 the runs, adding the flag {-xHost} to CCFLAGS will enable
@ -99,10 +122,10 @@ In order to build with support for an Intel(R) Xeon Phi(TM)
 coprocessor, the flag {-offload} should be added to the LINKFLAGS line
 and the flag -DLMP_INTEL_OFFLOAD should be added to the CCFLAGS line.

-Note that the machine makefiles Makefile.intel and
-Makefile.intel_offload are included in the src/MAKE directory with
-options that perform well with the Intel(R) compiler. The latter file
-has support for offload to coprocessors; the former does not.
+Example makefiles Makefile.intel_cpu and Makefile.intel_phi are
+included in the src/MAKE/OPTIONS directory with settings that perform
+well with the Intel(R) compiler. The latter file has support for
+offload to coprocessors; the former does not.

 If using an Intel compiler, it is recommended that Intel(R) Compiler
 2013 SP1 update 1 be used.  Newer versions have some performance
--- a/doc/accelerate_kokkos.html
+++ b/doc/accelerate_kokkos.html
@ -61,14 +61,17 @@ one or the other of the two modes.  The first mode is called the
 processor (running in native mode, not offload mode like the
 USER-INTEL package) are supported.  The second mode is called the
 "device" and is an accelerator chip of some kind.  Currently only an
-NVIDIA GPU is supported.  If your compute node does not have a GPU,
-then there is only one mode of execution, i.e. the host and device are
-the same.
+NVIDIA GPU is supported via Cuda.  If your compute node does not have
+a GPU, then there is only one mode of execution, i.e. the host and
+device are the same.
 </P>
-<P>Here is a quick overview of how to use the KOKKOS package
-for GPU acceleration:
+<P>When using the KOKKOS package, you must choose at build time whether
+you are building for OpenMP, GPU, or for using the Xeon Phi in native
+mode.
 </P>
-<UL><LI>specify variables and settings in your Makefile.machine that enable GPU, Phi, or OpenMP support
+<P>Here is a quick overview of how to use the KOKKOS package:
+</P>
+<UL><LI>specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
 <LI>include the KOKKOS package and build LAMMPS
 <LI>enable the KOKKOS package and its hardware options via the "-k on" command-line switch
 <LI>use KOKKOS styles in your input script 
@ -105,14 +108,23 @@ and GPU packages for details of how to check and do this.
 </P>
 <P><B>Building LAMMPS with the KOKKOS package:</B>
 </P>
-<P>Unlike other acceleration packages discussed in this section, the
-Kokkos library in lib/kokkos does not have to be pre-built before
-building LAMMPS itself.  Instead, options for the Kokkos library are
-specified at compile time, when LAMMPS itself is built.  This can be
-done in one of two ways, as discussed below.
+<P>You must choose at build time whether to build for OpenMP, Cuda, or
+Phi.
 </P>
-<P>Here are examples of how to build LAMMPS for the different compute-node
-configurations listed above.
+<P>You can do any of these in one line, using the src/Make.py script,
+described in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.
+Type "Make.py -h" for help.  If run from the src directory, these
+commands will create src/lmp_kokkos_omp, lmp_kokkos_cuda, and
+lmp_kokkos_phi.  The OMP and PHI options use src/MAKE/Makefile.mpi as
+the starting Makefile.machine.  The CUDA option uses
+src/MAKE/OPTIONS/Makefile.cuda since the NVIDIA nvcc compiler is
+required.
+</P>
+<PRE>Make.py -p kokkos -kokkos omp -o kokkos_omp file mpi 
+Make.py -p kokkos -kokkos cuda arch=31 -o kokkos_cuda file kokkos_cuda
+Make.py -p kokkos -kokkos phi -o kokkos_phi file mpi 
+</PRE>
+<P>Or you can follow these steps:
 </P>
 <P>CPU-only (run all-MPI or with OpenMP threading):
 </P>
@ -164,15 +176,76 @@ in <A HREF = "Section_start.html#start_3_4">Section 2.3.4</A> of the manual, as
 as other settings that must be included in the machine makefile, if
 you create your own.
 </P>
-<P>There are other allowed options when building with the KOKKOS package.
-As above, They can be set either as variables on the make command line
-or in the machine makefile in the src/MAKE directory.  See <A HREF = "Section_start.html#start_3_4">Section
-2.3.4</A> of the manual for details.
-</P>
 <P>IMPORTANT NOTE: Currently, there are no precision options with the
 KOKKOS package.  All compilation and computation is performed in
 double precision.
 </P>
+<P>There are other allowed options when building with the KOKKOS package.
+As above, they can be set either as variables on the make command line
+or in Makefile.machine.  This is the full list of options, including
+those discussed above.  Each takes a value of <I>yes</I> or <I>no</I>.  The
+default value is listed, which is set in the
+lib/kokkos/Makefile.lammps file.
+</P>
+<UL><LI>OMP, default = <I>yes</I>
+<LI>CUDA, default = <I>no</I>
+<LI>HWLOC, default = <I>no</I>
+<LI>AVX, default = <I>no</I>
+<LI>MIC, default = <I>no</I>
+<LI>LIBRT, default = <I>no</I>
+<LI>DEBUG, default = <I>no</I> 
+</UL>
+<P>OMP sets the parallelization method used for Kokkos code (within
+LAMMPS) that runs on the host.  OMP=yes means that OpenMP will be
+used.  OMP=no means that pthreads will be used.
+</P>
+<P>CUDA sets the parallelization method used for Kokkos code (within
+LAMMPS) that runs on the device.  CUDA=yes means an NVIDIA GPU running
+CUDA will be used.  CUDA=no means that the OMP=yes or OMP=no setting
+will be used for the device as well as the host.
+</P>
+<P>If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
+use "nvcc" as its compiler, via its CC setting.  For best performance
+its CCFLAGS setting should use -O3 and have an -arch setting that
+matches the compute capability of your NVIDIA hardware and software
+installation, e.g. -arch=sm_20.  Generally Fermi Generation GPUs are
+sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
+cards are sm_50.  A complete list can be found on
+<A HREF = "http://en.wikipedia.org/wiki/CUDA#Supported_GPUs">wikipedia</A>. You can
+also use the deviceQuery tool that comes with the CUDA samples.  Note
+the minimal required compute capability is 2.0, but this will give
+significantly reduced performance compared to Kepler generation GPUs
+with compute capability 3.x.  For the LINK setting, "nvcc" should not
+be used; instead use g++ or another compiler suitable for linking C++
+applications.  Often you will want to use your MPI compiler wrapper
+for this setting (i.e. mpicxx).  Finally, the lo-level Makefile must
+also have a "Compilation rule" for creating *.o files from *.cu files.
+See src/Makefile.cuda for an example of a lo-level Makefile with all
+of these settings.
+</P>
+<P>HWLOC binds threads to hardware cores, so they do not migrate during a
+simulation.  HWLOC=yes should always be used if running with OMP=no
+for pthreads.  It is not necessary for OMP=yes for OpenMP, because
+OpenMP provides alternative methods via environment variables for
+binding threads to hardware cores.  More info on binding threads to
+cores is given in <A HREF = "Section_accelerate.html#acc_8">this section</A>.
+</P>
+<P>AVX enables Intel advanced vector extensions when compiling for an
+Intel-compatible chip.  AVX=yes should only be set if your host
+hardware supports AVX.  If it does not support it, this will cause a
+run-time crash.
+</P>
+<P>MIC enables compiler switches needed when compiling for an Intel Phi
+processor.
+</P>
+<P>LIBRT enables use of a more accurate timer mechanism on most Unix
+platforms.  This library is not available on all platforms.
+</P>
+<P>DEBUG is only useful when developing a Kokkos-enabled style within
+LAMMPS.  DEBUG=yes enables printing of run-time debugging information
+that can be useful.  It also enables runtime bounds checking on Kokkos
+data structures.
+</P>
 <P><B>Run with the KOKKOS package from the command line:</B>
 </P>
 <P>The mpirun or mpiexec command sets the total number of MPI tasks used
--- a/doc/accelerate_kokkos.txt
+++ b/doc/accelerate_kokkos.txt
@ -58,14 +58,17 @@ one or the other of the two modes.  The first mode is called the
 processor (running in native mode, not offload mode like the
 USER-INTEL package) are supported.  The second mode is called the
 "device" and is an accelerator chip of some kind.  Currently only an
-NVIDIA GPU is supported.  If your compute node does not have a GPU,
-then there is only one mode of execution, i.e. the host and device are
-the same.
+NVIDIA GPU is supported via Cuda.  If your compute node does not have
+a GPU, then there is only one mode of execution, i.e. the host and
+device are the same.

-Here is a quick overview of how to use the KOKKOS package
-for GPU acceleration:
+When using the KOKKOS package, you must choose at build time whether
+you are building for OpenMP, GPU, or for using the Xeon Phi in native
+mode.

-specify variables and settings in your Makefile.machine that enable GPU, Phi, or OpenMP support
+Here is a quick overview of how to use the KOKKOS package:
+
+specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
 include the KOKKOS package and build LAMMPS
 enable the KOKKOS package and its hardware options via the "-k on" command-line switch
 use KOKKOS styles in your input script :ul
@ -102,14 +105,23 @@ and GPU packages for details of how to check and do this.

 [Building LAMMPS with the KOKKOS package:]

-Unlike other acceleration packages discussed in this section, the
-Kokkos library in lib/kokkos does not have to be pre-built before
-building LAMMPS itself.  Instead, options for the Kokkos library are
-specified at compile time, when LAMMPS itself is built.  This can be
-done in one of two ways, as discussed below.
+You must choose at build time whether to build for OpenMP, Cuda, or
+Phi.

-Here are examples of how to build LAMMPS for the different compute-node
-configurations listed above.
+You can do any of these in one line, using the src/Make.py script,
+described in "Section 2.4"_Section_start.html#start_4 of the manual.
+Type "Make.py -h" for help.  If run from the src directory, these
+commands will create src/lmp_kokkos_omp, lmp_kokkos_cuda, and
+lmp_kokkos_phi.  The OMP and PHI options use src/MAKE/Makefile.mpi as
+the starting Makefile.machine.  The CUDA option uses
+src/MAKE/OPTIONS/Makefile.cuda since the NVIDIA nvcc compiler is
+required.
+
+Make.py -p kokkos -kokkos omp -o kokkos_omp file mpi 
+Make.py -p kokkos -kokkos cuda arch=31 -o kokkos_cuda file kokkos_cuda
+Make.py -p kokkos -kokkos phi -o kokkos_phi file mpi :pre
+
+Or you can follow these steps:

 CPU-only (run all-MPI or with OpenMP threading):

@ -161,15 +173,76 @@ in "Section 2.3.4"_Section_start.html#start_3_4 of the manual, as well
 as other settings that must be included in the machine makefile, if
 you create your own.

-There are other allowed options when building with the KOKKOS package.
-As above, They can be set either as variables on the make command line
-or in the machine makefile in the src/MAKE directory.  See "Section
-2.3.4"_Section_start.html#start_3_4 of the manual for details.
-
 IMPORTANT NOTE: Currently, there are no precision options with the
 KOKKOS package.  All compilation and computation is performed in
 double precision.

+There are other allowed options when building with the KOKKOS package.
+As above, they can be set either as variables on the make command line
+or in Makefile.machine.  This is the full list of options, including
+those discussed above.  Each takes a value of {yes} or {no}.  The
+default value is listed, which is set in the
+lib/kokkos/Makefile.lammps file.
+
+OMP, default = {yes}
+CUDA, default = {no}
+HWLOC, default = {no}
+AVX, default = {no}
+MIC, default = {no}
+LIBRT, default = {no}
+DEBUG, default = {no} :ul
+
+OMP sets the parallelization method used for Kokkos code (within
+LAMMPS) that runs on the host.  OMP=yes means that OpenMP will be
+used.  OMP=no means that pthreads will be used.
+
+CUDA sets the parallelization method used for Kokkos code (within
+LAMMPS) that runs on the device.  CUDA=yes means an NVIDIA GPU running
+CUDA will be used.  CUDA=no means that the OMP=yes or OMP=no setting
+will be used for the device as well as the host.
+
+If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
+use "nvcc" as its compiler, via its CC setting.  For best performance
+its CCFLAGS setting should use -O3 and have an -arch setting that
+matches the compute capability of your NVIDIA hardware and software
+installation, e.g. -arch=sm_20.  Generally Fermi Generation GPUs are
+sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
+cards are sm_50.  A complete list can be found on
+"wikipedia"_http://en.wikipedia.org/wiki/CUDA#Supported_GPUs. You can
+also use the deviceQuery tool that comes with the CUDA samples.  Note
+the minimal required compute capability is 2.0, but this will give
+significantly reduced performance compared to Kepler generation GPUs
+with compute capability 3.x.  For the LINK setting, "nvcc" should not
+be used; instead use g++ or another compiler suitable for linking C++
+applications.  Often you will want to use your MPI compiler wrapper
+for this setting (i.e. mpicxx).  Finally, the lo-level Makefile must
+also have a "Compilation rule" for creating *.o files from *.cu files.
+See src/Makefile.cuda for an example of a lo-level Makefile with all
+of these settings.
+
+HWLOC binds threads to hardware cores, so they do not migrate during a
+simulation.  HWLOC=yes should always be used if running with OMP=no
+for pthreads.  It is not necessary for OMP=yes for OpenMP, because
+OpenMP provides alternative methods via environment variables for
+binding threads to hardware cores.  More info on binding threads to
+cores is given in "this section"_Section_accelerate.html#acc_8.
+
+AVX enables Intel advanced vector extensions when compiling for an
+Intel-compatible chip.  AVX=yes should only be set if your host
+hardware supports AVX.  If it does not support it, this will cause a
+run-time crash.
+
+MIC enables compiler switches needed when compiling for an Intel Phi
+processor.
+
+LIBRT enables use of a more accurate timer mechanism on most Unix
+platforms.  This library is not available on all platforms.
+
+DEBUG is only useful when developing a Kokkos-enabled style within
+LAMMPS.  DEBUG=yes enables printing of run-time debugging information
+that can be useful.  It also enables runtime bounds checking on Kokkos
+data structures.
+
 [Run with the KOKKOS package from the command line:]

 The mpirun or mpiexec command sets the total number of MPI tasks used
--- a/doc/accelerate_omp.html
+++ b/doc/accelerate_omp.html
@ -42,17 +42,27 @@ MPI task running on a CPU.
 </P>
 <P><B>Building LAMMPS with the USER-OMP package:</B>
 </P>
-<P>Include the package and build LAMMPS:
+<P>To do this in one line, use the src/Make.py script, described in
+<A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.  Type "Make.py
+-h" for help.  If run from the src directory, this command will create
+src/lmp_omp using src/MAKE/Makefile.mpi as the starting
+Makefile.machine:
+</P>
+<PRE>Make.py -p omp -o omp file mpi 
+</PRE>
+<P>Or you can follow these steps:
 </P>
 <PRE>cd lammps/src
 make yes-user-omp
 make machine 
 </PRE>
-<P>The CCFLAGS setting in your src/MAKE/Makefile.machine needs "-fopenmp"
-to add OpenMP support.  This works for both the GNU and Intel
-compilers.  Without this flag the USER-OMP styles will still be
-compiled and work, but will not support multi-threading.  For the
-Intel compilers the CCFLAGS setting also needs to include "-restrict".
+<P>The CCFLAGS setting in Makefile.machine needs "-fopenmp" to add OpenMP
+support.  This works for both the GNU and Intel compilers.  Without
+this flag the USER-OMP styles will still be compiled and work, but
+will not support multi-threading.  For the Intel compilers the CCFLAGS
+setting also needs to include "-restrict".
+</P>
+<P><B>Run with the USER-OMP package from the command line:</B>
 </P>
 <P>The mpirun or mpiexec command sets the total number of MPI tasks used
 by LAMMPS (one or multiple per compute node) and the number of MPI
--- a/doc/accelerate_omp.txt
+++ b/doc/accelerate_omp.txt
@ -39,17 +39,27 @@ MPI task running on a CPU.

 [Building LAMMPS with the USER-OMP package:]

-Include the package and build LAMMPS:
+To do this in one line, use the src/Make.py script, described in
+"Section 2.4"_Section_start.html#start_4 of the manual.  Type "Make.py
+-h" for help.  If run from the src directory, this command will create
+src/lmp_omp using src/MAKE/Makefile.mpi as the starting
+Makefile.machine:
+
+Make.py -p omp -o omp file mpi :pre
+
+Or you can follow these steps:

 cd lammps/src
 make yes-user-omp
 make machine :pre

-The CCFLAGS setting in your src/MAKE/Makefile.machine needs "-fopenmp"
-to add OpenMP support.  This works for both the GNU and Intel
-compilers.  Without this flag the USER-OMP styles will still be
-compiled and work, but will not support multi-threading.  For the
-Intel compilers the CCFLAGS setting also needs to include "-restrict".
+The CCFLAGS setting in Makefile.machine needs "-fopenmp" to add OpenMP
+support.  This works for both the GNU and Intel compilers.  Without
+this flag the USER-OMP styles will still be compiled and work, but
+will not support multi-threading.  For the Intel compilers the CCFLAGS
+setting also needs to include "-restrict".
+
+[Run with the USER-OMP package from the command line:]

 The mpirun or mpiexec command sets the total number of MPI tasks used
 by LAMMPS (one or multiple per compute node) and the number of MPI
--- a/doc/accelerate_opt.html
+++ b/doc/accelerate_opt.html
@ -38,12 +38,22 @@ input script.
 </P>
 <P>Include the package and build LAMMPS:
 </P>
+<P>To do this in one line, use the src/Make.py script, described in
+<A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.  Type "Make.py
+-h" for help.  If run from the src directory, this command will create
+src/lmp_opt using src/MAKE/Makefile.mpi as the starting
+Makefile.machine:
+</P>
+<PRE>Make.py -p opt -o opt file mpi 
+</PRE>
+<P>Or you can follow these steps:
+</P>
 <PRE>cd lammps/src
 make yes-opt
 make machine 
 </PRE>
-<P>If you are using Intel compilers, then the CCFLAGS setting in your
-src/MAKE/Makefile.machine needs to include "-restrict".
+<P>If you are using Intel compilers, then the CCFLAGS setting in
+Makefile.machine needs to include "-restrict".
 </P>
 <P><B>Run with the OPT package from the command line:</B>
 </P>
--- a/doc/accelerate_opt.txt
+++ b/doc/accelerate_opt.txt
@ -35,12 +35,22 @@ None.

 Include the package and build LAMMPS:

+To do this in one line, use the src/Make.py script, described in
+"Section 2.4"_Section_start.html#start_4 of the manual.  Type "Make.py
+-h" for help.  If run from the src directory, this command will create
+src/lmp_opt using src/MAKE/Makefile.mpi as the starting
+Makefile.machine:
+
+Make.py -p opt -o opt file mpi :pre
+
+Or you can follow these steps:
+
 cd lammps/src
 make yes-opt
 make machine :pre

-If you are using Intel compilers, then the CCFLAGS setting in your
-src/MAKE/Makefile.machine needs to include "-restrict".
+If you are using Intel compilers, then the CCFLAGS setting in
+Makefile.machine needs to include "-restrict".

 [Run with the OPT package from the command line:]