remove references to Make.py and USER-CUDA
@@ -1,55 +1,21 @@
 These are input scripts used to run versions of several of the
-benchmarks in the top-level bench directory using the GPU and
-USER-CUDA accelerator packages. The results of running these scripts
-on two different machines (a desktop with 2 Tesla GPUs and the ORNL
-Titan supercomputer) are shown on the "GPU (Fermi)" section of the
-Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
+benchmarks in the top-level bench directory using the GPU accelerator
+package. The results of running these scripts on two different machines
+(a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown
+on the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW
+site: lammps.sandia.gov/bench.
 
 Examples are shown below of how to run these scripts. This assumes
-you have built 3 executables with both the GPU and USER-CUDA packages
+you have built 3 executables with the GPU package
 installed, e.g.
 
 lmp_linux_single
 lmp_linux_mixed
 lmp_linux_double
 
-The precision (single, mixed, double) refers to the GPU and USER-CUDA
-package precision. See the README files in the lib/gpu and lib/cuda
-directories for instructions on how to build the packages with
-different precisions. The GPU and USER-CUDA sub-sections of the
-doc/Section_accelerate.html file also describes this process.
-
-Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
-Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-  -gpu mode=double arch=20 -o gpu_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-  -gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-  -gpu mode=single arch=20 -o gpu_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-  -cuda mode=double arch=20 -o cuda_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-  -cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-  -cuda mode=single arch=20 -o cuda_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-  -m cuda -o kokkos_cuda -a exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-  -gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-  -o all -a libs exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-  -kokkos cuda arch=20 -gpu mode=double arch=20 \
-  -cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
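
With Make.py gone, an equivalent GPU-package build goes through the
traditional makefiles. The outline below is a sketch, not the verbatim
procedure: makefile names are system-specific, and the precision is
chosen via the CUDA_PRECISION variable in the lib/gpu makefile, as
described in lib/gpu/README:

cd lib/gpu; make -f Makefile.linux
cd ../../src; make yes-gpu; make -j 16 linux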
 
 ------------------------------------------------------------------------
 
-To run on just CPUs (without using the GPU or USER-CUDA styles),
+To run on just CPUs (without using the GPU styles),
 do something like the following:
 
 mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
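
A larger CPU-only run follows the same pattern; the core count and
problem size below are illustrative only:

mpirun -np 8 lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.lj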
@@ -81,23 +47,5 @@ node via a "-ppn" setting.
-
-------------------------------------------------------------------------
-
-To run with the USER-CUDA package, do something like the following:
-
-mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
-mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
-
-The "xyz" settings determine the problem size. The "t" setting
-determines the number of timesteps. The "np" setting determines how
-many MPI tasks (per node) the problem will run on. The numeric
-argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
-is the default. Note that the number of MPI tasks must equal the
-number of GPUs (both per node) with the USER-CUDA package.
-
-These mpirun commands run on a single node. To run on multiple nodes,
-scale up the "-np" setting, and control the number of MPI tasks per
-node via a "-ppn" setting.
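
For the GPU package, which survives this change, a multi-node run
combining the "-np" and "-ppn" controls might look like this (a sketch:
node and GPU counts are illustrative, and the exact "-ppn" spelling
depends on the MPI launcher):

mpirun -np 4 -ppn 2 lmp_linux_double -sf gpu -pk gpu 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam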
 
 ------------------------------------------------------------------------
 
 If the script has "titan" in its name, it was run on the Titan
 supercomputer at ORNL.
 
bench/README
@@ -71,49 +71,33 @@ integration
 
 ----------------------------------------------------------------------
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples. This assumes you have an MPI installed on your machine so
-that "mpicxx" can be used as the wrapper compiler. It also assumes
-you have an Intel compiler to use as the base compiler. You can leave
-off the "-cc mpi wrap=icc" switch if that is not the case. You can
-also leave off the "-fft fftw3" switch if you do not have the FFTW
-(v3) installed as an FFT package, in which case the default KISS FFT
-library will be used.
-
-cd src
-Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-  -cc mpi wrap=icc -fft fftw3 -a file mpi
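
A make-based equivalent of the removed build installs the same packages
and invokes a machine makefile; a sketch, assuming an MPI wrapper
compiler and the stock src/MAKE/Makefile.mpi (FFTW3 support would
instead be enabled inside that makefile):

cd src
make yes-molecule yes-manybody yes-kspace yes-granular yes-rigid
make -j 16 mpi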
-
-----------------------------------------------------------------------
-
 Here is how to run each problem, assuming the LAMMPS executable is
 named lmp_mpi, and you are using the mpirun command to launch parallel
 runs:
 
 Serial (one processor runs):
 
-lmp_mpi < in.lj
-lmp_mpi < in.chain
-lmp_mpi < in.eam
-lmp_mpi < in.chute
-lmp_mpi < in.rhodo
+lmp_mpi -in in.lj
+lmp_mpi -in in.chain
+lmp_mpi -in in.eam
+lmp_mpi -in in.chute
+lmp_mpi -in in.rhodo
 
 Parallel fixed-size runs (on 8 procs in this case):
 
-mpirun -np 8 lmp_mpi < in.lj
-mpirun -np 8 lmp_mpi < in.chain
-mpirun -np 8 lmp_mpi < in.eam
-mpirun -np 8 lmp_mpi < in.chute
-mpirun -np 8 lmp_mpi < in.rhodo
+mpirun -np 8 lmp_mpi -in in.lj
+mpirun -np 8 lmp_mpi -in in.chain
+mpirun -np 8 lmp_mpi -in in.eam
+mpirun -np 8 lmp_mpi -in in.chute
+mpirun -np 8 lmp_mpi -in in.rhodo
 
 Parallel scaled-size runs (on 16 procs in this case):
 
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
-mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.lj
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.chain.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.eam
+mpirun -np 16 lmp_mpi -var x 4 -var y 4 -in in.chute.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled
 
 For each of the scaled-size runs you must set 3 variables as -var
 command line switches. The variables x,y,z are used in the input
 
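
Inside the scaled input scripts, those variables multiply the box
dimensions, roughly in this pattern (a sketch of the idiom, not the
verbatim bench script):

variable     x index 1
variable     y index 1
variable     z index 1
variable     xx equal 20*$x
variable     yy equal 20*$y
variable     zz equal 20*$z
region       box block 0 ${xx} 0 ${yy} 0 ${zz}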
@@ -105,20 +105,11 @@ tad: temperature-accelerated dynamics of vacancy diffusion in bulk Si
 vashishta: models using the Vashishta potential
 voronoi: Voronoi tessellation via compute voronoi/atom command
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples, with the exception of the accelerate sub-directory. See the
-accelerate/README for Make.py commands suitable for its example
-scripts.
-
-cd src
-Make.py -j 16 -p none std no-lib reax meam poems reaxc orig -a lib-all mpi
-
 Here is how you might run and visualize one of the sample problems:
 
 cd indent
 cp ../../src/lmp_mpi .    # copy LAMMPS executable to this dir
-lmp_mpi < in.indent       # run the problem
+lmp_mpi -in in.indent     # run the problem
 
 Running the simulation produces the files {dump.indent} and
 {log.lammps}. You can visualize the dump file as follows:
 
@@ -1,14 +1,11 @@
 These are example scripts that can be run with any of
 the accelerator packages in LAMMPS:
 
-USER-CUDA, GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
+GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
 
 The easiest way to build LAMMPS with these packages
-is via the src/Make.py tool described in Section 2.4
-of the manual. You can also type "Make.py -h" to see
-its options. The easiest way to run these scripts
-is by using the appropriate
+is via the flags described in Section 4 of the manual.
+The easiest way to run these scripts is by using the appropriate
 Details on the individual accelerator packages
 can be found in doc/Section_accelerate.html.
 
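
The flag-based build referred to above amounts to installing packages
and invoking a machine makefile; a rough sketch (the package list and
target are illustrative):

cd src
make yes-gpu yes-user-omp yes-opt
make mpi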
@@ -16,21 +13,6 @@ can be found in doc/Section_accelerate.html.
 
 Build LAMMPS with one or more of the accelerator packages
 
-The following command will invoke the src/Make.py tool with one of the
-command-lines from the Make.list file:
-
-../../src/Make.py -r Make.list target
-
-target = one or more of the following:
-  cpu, omp, opt
-  cuda_double, cuda_mixed, cuda_single
-  gpu_double, gpu_mixed, gpu_single
-  intel_cpu, intel_phi
-  kokkos_omp, kokkos_cuda, kokkos_phi
-
-If successful, the build will produce the file lmp_target in this
-directory.
 
 Note that in addition to any accelerator packages, these packages also
 need to be installed to run all of the example scripts: ASPHERE,
 MOLECULE, KSPACE, RIGID.
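
Those four packages are installed the same way before building, e.g. (a
minimal sketch; the executable name depends on the machine makefile):

cd src
make yes-asphere yes-molecule yes-kspace yes-rigid
make mpi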
@@ -38,39 +20,11 @@ MOLECULE, KSPACE, RIGID.
 These two targets will build a single LAMMPS executable with all the
 CPU accelerator packages installed (USER-INTEL for CPU, KOKKOS for
 OMP, USER-OMP, OPT) or all the GPU accelerator packages installed
-(USER-CUDA, GPU, KOKKOS for CUDA):
+(GPU, KOKKOS for CUDA):
 
 target = all_cpu, all_gpu
 
-Note that the Make.py commands in Make.list assume an MPI environment
-exists on your machine and use mpicxx as the wrapper compiler with
-whatever underlying compiler it wraps by default. If you add "-cc mpi
-wrap=g++" or "-cc mpi wrap=icc" after the target, you can choose the
-underlying compiler for mpicxx to invoke. E.g.
-
-../../src/Make.py -r Make.list intel_cpu -cc mpi wrap=icc
-
-You should do this for any build that includes the USER-INTEL
-package, since it will perform best with the Intel compilers.
-
-Note that for kokkos_cuda, it needs to be "-cc nvcc" instead of "mpi",
-since a KOKKOS for CUDA build requires NVIDIA nvcc as the wrapper
-compiler.
-
-Also note that the Make.py commands in Make.list use the default
-FFT support which is via the KISS library. If you want to
-build with another FFT library, e.g. FFTW3, then you can add
-"-fft fftw3" after the target, e.g.
-
-../../src/Make.py -r Make.list gpu -fft fftw3
-
-For any build with USER-CUDA, GPU, or KOKKOS for CUDA, be sure to set
+For any build with GPU or KOKKOS for CUDA, be sure to set
 the arch=XX setting to the appropriate value for the GPUs and Cuda
-environment on your system. What is defined in the Make.list file is
-arch=21 for older Fermi GPUs. This can be overridden as follows,
-e.g. for Kepler GPUs:
-
-../../src/Make.py -r Make.list gpu_double -gpu mode=double arch=35
+environment on your system.
 
 ---------------------
 
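
In a make-based GPU library build, the same architecture choice lives
in the lib/gpu makefile; a one-line sketch (the sm_35 value is an
assumed Kepler setting):

CUDA_ARCH = -arch=sm_35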
@@ -118,12 +72,6 @@ Note that when running in.lj.5.0 (which has a long cutoff) with the
 GPU package, the "-pk tpa" setting should be > 1 (e.g. 8) for best
 performance.
 
-** USER-CUDA package
-
-lmp_machine -c on -sf cuda < in.lj
-mpirun -np 1 lmp_machine -c on -sf cuda < in.lj            # 1 MPI, 1 MPI/GPU
-mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 < in.lj # 2 MPI, 1 MPI/GPU
-
 ** KOKKOS package for OMP
 
 lmp_kokkos_omp -k on t 1 -sf kk -pk kokkos neigh half < in.lj
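
The thread count scales the same way; a sketch with an assumed 4-thread
run (the thread count is illustrative):

lmp_kokkos_omp -k on t 4 -sf kk -pk kokkos neigh half < in.lj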