diff --git a/bench/FERMI/README b/bench/FERMI/README
index db3f527bdc..b66e560775 100644
--- a/bench/FERMI/README
+++ b/bench/FERMI/README
@@ -1,55 +1,21 @@
 These are input scripts used to run versions of several of the
-benchmarks in the top-level bench directory using the GPU and
-USER-CUDA accelerator packages. The results of running these scripts
-on two different machines (a desktop with 2 Tesla GPUs and the ORNL
-Titan supercomputer) are shown on the "GPU (Fermi)" section of the
-Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
+benchmarks in the top-level bench directory using the GPU accelerator
+package. The results of running these scripts on two different machines
+(a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown
+on the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW
+site: lammps.sandia.gov/bench.
 
 Examples are shown below of how to run these scripts. This assumes
-you have built 3 executables with both the GPU and USER-CUDA packages
+you have built 3 executables with the GPU package
 installed, e.g.
 
 lmp_linux_single
 lmp_linux_mixed
 lmp_linux_double
 
-The precision (single, mixed, double) refers to the GPU and USER-CUDA
-package precision. See the README files in the lib/gpu and lib/cuda
-directories for instructions on how to build the packages with
-different precisions. The GPU and USER-CUDA sub-sections of the
-doc/Section_accelerate.html file also describes this process.
-
-Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
-Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=double arch=20 -o gpu_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=single arch=20 -o gpu_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=double arch=20 -o cuda_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=single arch=20 -o cuda_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-        -m cuda -o kokkos_cuda -a exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-        -o all -a libs exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -kokkos cuda arch=20 -gpu mode=double arch=20 \
-        -cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
-
 ------------------------------------------------------------------------
 
-To run on just CPUs (without using the GPU or USER-CUDA styles),
+To run on just CPUs (without using the GPU styles),
 do something like the following:
 
 mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
@@ -81,23 +47,5 @@ node via a "-ppn" setting.
 
 ------------------------------------------------------------------------
 
-To run with the USER-CUDA package, do something like the following:
-
-mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
-mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
-
-The "xyz" settings determine the problem size. The "t" setting
-determines the number of timesteps. The "np" setting determines how
-many MPI tasks (per node) the problem will run on. The numeric
-argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
-is the default. Note that the number of MPI tasks must equal the
-number of GPUs (both per node) with the USER-CUDA package.
-
-These mpirun commands run on a single node. To run on multiple nodes,
-scale up the "-np" setting, and control the number of MPI tasks per
-node via a "-ppn" setting.
-
-------------------------------------------------------------------------
-
 If the script has "titan" in its name, it was run on the Titan
 supercomputer at ORNL.
diff --git a/bench/README b/bench/README
index 85d71cbb5d..0806fcded6 100644
--- a/bench/README
+++ b/bench/README
@@ -71,49 +71,33 @@ integration
 
 ----------------------------------------------------------------------
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples. This assumes you have an MPI installed on your machine so
-that "mpicxx" can be used as the wrapper compiler. It also assumes
-you have an Intel compiler to use as the base compiler. You can leave
-off the "-cc mpi wrap=icc" switch if that is not the case. You can
-also leave off the "-fft fftw3" switch if you do not have the FFTW
-(v3) installed as an FFT package, in which case the default KISS FFT
-library will be used.
-
-cd src
-Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-        -cc mpi wrap=icc -fft fftw3 -a file mpi
-
-----------------------------------------------------------------------
-
 Here is how to run each problem, assuming the LAMMPS executable is
 named lmp_mpi, and you are using the mpirun command to launch parallel
 runs:
 
 Serial (one processor runs):
 
-lmp_mpi < in.lj
-lmp_mpi < in.chain
-lmp_mpi < in.eam
-lmp_mpi < in.chute
-lmp_mpi < in.rhodo
+lmp_mpi -in in.lj
+lmp_mpi -in in.chain
+lmp_mpi -in in.eam
+lmp_mpi -in in.chute
+lmp_mpi -in in.rhodo
 
 Parallel fixed-size runs (on 8 procs in this case):
 
-mpirun -np 8 lmp_mpi < in.lj
-mpirun -np 8 lmp_mpi < in.chain
-mpirun -np 8 lmp_mpi < in.eam
-mpirun -np 8 lmp_mpi < in.chute
-mpirun -np 8 lmp_mpi < in.rhodo
+mpirun -np 8 lmp_mpi -in in.lj
+mpirun -np 8 lmp_mpi -in in.chain
+mpirun -np 8 lmp_mpi -in in.eam
+mpirun -np 8 lmp_mpi -in in.chute
+mpirun -np 8 lmp_mpi -in in.rhodo
 
 Parallel scaled-size runs (on 16 procs in this case):
 
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
-mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.lj
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.chain.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.eam
+mpirun -np 16 lmp_mpi -var x 4 -var y 4 -in in.chute.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled
 
 For each of the scaled-size runs you must set 3 variables as -var
 command line switches. The variables x,y,z are used in the input
diff --git a/examples/README b/examples/README
index e4312e2598..702ada790b 100644
--- a/examples/README
+++ b/examples/README
@@ -105,20 +105,11 @@ tad: temperature-accelerated dynamics of vacancy diffusion in bulk Si
 vashishta: models using the Vashishta potential
 voronoi: Voronoi tesselation via compute voronoi/atom command
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples, with the exception of the accelerate sub-directory. See the
-accelerate/README for Make.py commands suitable for its example
-scripts.
-
-cd src
-Make.py -j 16 -p none std no-lib reax meam poems reaxc orig -a lib-all mpi
-
 Here is how you might run and visualize one of the sample problems:
 
 cd indent
 cp ../../src/lmp_mpi .        # copy LAMMPS executable to this dir
-lmp_mpi < in.indent           # run the problem
+lmp_mpi -in in.indent         # run the problem
 
 Running the simulation produces the files {dump.indent} and
 {log.lammps}. You can visualize the dump file as follows:
diff --git a/examples/accelerate/README b/examples/accelerate/README
index 1fab296a53..c4eb5dcc8d 100644
--- a/examples/accelerate/README
+++ b/examples/accelerate/README
@@ -1,14 +1,11 @@
 These are example scripts that can be run with any of the acclerator
 packages in LAMMPS:
 
-USER-CUDA, GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
+GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
 
 The easiest way to build LAMMPS with these packages
-is via the src/Make.py tool described in Section 2.4
-of the manual. You can also type "Make.py -h" to see
-its options. The easiest way to run these scripts
-is by using the appropriate
-
+is via the flags described in Section 4 of the manual.
+The easiest way to run these scripts is by using the appropriate command-line switches, as shown below.
 
 Details on the individual accelerator packages
 can be found in doc/Section_accelerate.html.
@@ -16,21 +13,6 @@ can be found in doc/Section_accelerate.html.
 
 Build LAMMPS with one or more of the accelerator packages
 
-The following command will invoke the src/Make.py tool with one of the
-command-lines from the Make.list file:
-
-../../src/Make.py -r Make.list target
-
-target = one or more of the following:
-  cpu, omp, opt
-  cuda_double, cuda_mixed, cuda_single
-  gpu_double, gpu_mixed, gpu_single
-  intel_cpu, intel_phi
-  kokkos_omp, kokkos_cuda, kokkos_phi
-
-If successful, the build will produce the file lmp_target in this
-directory.
-
 Note that in addition to any accelerator packages, these packages
 also need to be installed to run all of the example scripts: ASPHERE,
 MOLECULE, KSPACE, RIGID.
@@ -38,39 +20,11 @@ MOLECULE, KSPACE, RIGID.
 These two targets will build a single LAMMPS executable with all the
 CPU accelerator packages installed (USER-INTEL for CPU, KOKKOS for
 OMP, USER-OMP, OPT) or all the GPU accelerator packages installed
-(USER-CUDA, GPU, KOKKOS for CUDA):
+(GPU, KOKKOS for CUDA):
 
-target = all_cpu, all_gpu
-
-Note that the Make.py commands in Make.list assume an MPI environment
-exists on your machine and use mpicxx as the wrapper compiler with
-whatever underlying compiler it wraps by default. If you add "-cc mpi
-wrap=g++" or "-cc mpi wrap=icc" after the target, you can choose the
-underlying compiler for mpicxx to invoke. E.g.
-
-../../src/Make.py -r Make.list intel_cpu -cc mpi wrap=icc
-
-You should do this for any build that includes the USER-INTEL
-package, since it will perform best with the Intel compilers.
-
-Note that for kokkos_cuda, it needs to be "-cc nvcc" instead of "mpi",
-since a KOKKOS for CUDA build requires NVIDIA nvcc as the wrapper
-compiler.
-
-Also note that the Make.py commands in Make.list use the default
-FFT support which is via the KISS library. If you want to
-build with another FFT library, e.g. FFTW3, then you can add
-"-fft fftw3" after the target, e.g.
-
-../../src/Make.py -r Make.list gpu -fft fftw3
-
-For any build with USER-CUDA, GPU, or KOKKOS for CUDA, be sure to set
+For any build with GPU, or KOKKOS for CUDA, be sure to set
 the arch=XX setting to the appropriate value for the GPUs and Cuda
-environment on your system. What is defined in the Make.list file is
-arch=21 for older Fermi GPUs. This can be overridden as follows,
-e.g. for Kepler GPUs:
-
-../../src/Make.py -r Make.list gpu_double -gpu mode=double arch=35
+environment on your system.
 
 ---------------------
 
@@ -118,12 +72,6 @@ Note that when running in.lj.5.0 (which has a long cutoff) with the
 GPU package, the "-pk tpa" setting should be > 1 (e.g. 8) for best
 performance.
 
-** USER-CUDA package
-
-lmp_machine -c on -sf cuda < in.lj
-mpirun -np 1 lmp_machine -c on -sf cuda < in.lj            # 1 MPI, 1 MPI/GPU
-mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 < in.lj # 2 MPI, 1 MPI/GPU
-
 ** KOKKOS package for OMP
 
 lmp_kokkos_omp -k on t 1 -sf kk -pk kokkos neigh half < in.lj
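
For reference, the USER-CUDA run commands removed above have GPU-package
counterparts. The lines below are only a sketch: they assume an
executable named lmp_machine built with the GPU package (reusing the
executable name from the removed examples) and the in.lj script from
examples/accelerate, and they rely on the standard -sf, -pk, and -in
command-line switches.

lmp_machine -sf gpu -pk gpu 1 -in in.lj                  # 1 MPI task, 1 GPU
mpirun -np 2 lmp_machine -sf gpu -pk gpu 2 -in in.lj     # 2 MPI tasks, 2 GPUs

Unlike USER-CUDA, the GPU package does not require the number of MPI
tasks per node to equal the number of GPUs; several MPI tasks can share
one GPU. See the GPU package section earlier in
examples/accelerate/README for the authoritative forms of these
commands, including the "-pk gpu Ng tpa Nt" setting mentioned above for
long-cutoff runs.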