diff --git a/bench/FERMI/README b/bench/FERMI/README
index db3f527bdc..b66e560775 100644
--- a/bench/FERMI/README
+++ b/bench/FERMI/README
@@ -1,55 +1,21 @@
 These are input scripts used to run versions of several of the
-benchmarks in the top-level bench directory using the GPU and
-USER-CUDA accelerator packages. The results of running these scripts
-on two different machines (a desktop with 2 Tesla GPUs and the ORNL
-Titan supercomputer) are shown on the "GPU (Fermi)" section of the
-Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
+benchmarks in the top-level bench directory using the GPU accelerator
+package. The results of running these scripts on two different machines
+(a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown
+on the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW
+site: lammps.sandia.gov/bench.
 
 Examples are shown below of how to run these scripts. This assumes
-you have built 3 executables with both the GPU and USER-CUDA packages
+you have built 3 executables with the GPU package
 installed, e.g.
 
 lmp_linux_single
 lmp_linux_mixed
 lmp_linux_double
 
-The precision (single, mixed, double) refers to the GPU and USER-CUDA
-package precision. See the README files in the lib/gpu and lib/cuda
-directories for instructions on how to build the packages with
-different precisions. The GPU and USER-CUDA sub-sections of the
-doc/Section_accelerate.html file also describes this process.
-
-Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
-Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=double arch=20 -o gpu_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=single arch=20 -o gpu_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=double arch=20 -o cuda_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=single arch=20 -o cuda_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-        -m cuda -o kokkos_cuda -a exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-        -o all -a libs exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -kokkos cuda arch=20 -gpu mode=double arch=20 \
-        -cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
-
 ------------------------------------------------------------------------
 
-To run on just CPUs (without using the GPU or USER-CUDA styles),
+To run on just CPUs (without using the GPU styles),
 do something like the following:
 
 mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
@@ -81,23 +47,5 @@ node via a "-ppn" setting.
 
 ------------------------------------------------------------------------
 
-To run with the USER-CUDA package, do something like the following:
-
-mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
-mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
-
-The "xyz" settings determine the problem size. The "t" setting
-determines the number of timesteps. The "np" setting determines how
-many MPI tasks (per node) the problem will run on. The numeric
-argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
-is the default. Note that the number of MPI tasks must equal the
-number of GPUs (both per node) with the USER-CUDA package.
-
-These mpirun commands run on a single node. To run on multiple nodes,
-scale up the "-np" setting, and control the number of MPI tasks per
-node via a "-ppn" setting.
-
-------------------------------------------------------------------------
-
 If the script has "titan" in its name, it was run on the Titan
 supercomputer at ORNL.
diff --git a/bench/README b/bench/README
index 85d71cbb5d..0806fcded6 100644
--- a/bench/README
+++ b/bench/README
@@ -71,49 +71,33 @@ integration
 
 ----------------------------------------------------------------------
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples. This assumes you have an MPI installed on your machine so
-that "mpicxx" can be used as the wrapper compiler. It also assumes
-you have an Intel compiler to use as the base compiler. You can leave
-off the "-cc mpi wrap=icc" switch if that is not the case. You can
-also leave off the "-fft fftw3" switch if you do not have the FFTW
-(v3) installed as an FFT package, in which case the default KISS FFT
-library will be used.
-
-cd src
-Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-        -cc mpi wrap=icc -fft fftw3 -a file mpi
-
-----------------------------------------------------------------------
-
 Here is how to run each problem, assuming the LAMMPS executable is
 named lmp_mpi, and you are using the mpirun command to launch parallel
 runs:
 
 Serial (one processor runs):
 
-lmp_mpi < in.lj
-lmp_mpi < in.chain
-lmp_mpi < in.eam
-lmp_mpi < in.chute
-lmp_mpi < in.rhodo
+lmp_mpi -in in.lj
+lmp_mpi -in in.chain
+lmp_mpi -in in.eam
+lmp_mpi -in in.chute
+lmp_mpi -in in.rhodo
 
 Parallel fixed-size runs (on 8 procs in this case):
 
-mpirun -np 8 lmp_mpi < in.lj
-mpirun -np 8 lmp_mpi < in.chain
-mpirun -np 8 lmp_mpi < in.eam
-mpirun -np 8 lmp_mpi < in.chute
-mpirun -np 8 lmp_mpi < in.rhodo
+mpirun -np 8 lmp_mpi -in in.lj
+mpirun -np 8 lmp_mpi -in in.chain
+mpirun -np 8 lmp_mpi -in in.eam
+mpirun -np 8 lmp_mpi -in in.chute
+mpirun -np 8 lmp_mpi -in in.rhodo
 
 Parallel scaled-size runs (on 16 procs in this case):
 
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
-mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.lj
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.chain.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.eam
+mpirun -np 16 lmp_mpi -var x 4 -var y 4 -in in.chute.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled
 
 For each of the scaled-size runs you must set 3 variables as -var
 command line switches. The variables x,y,z are used in the input
diff --git a/examples/README b/examples/README
index e4312e2598..702ada790b 100644
--- a/examples/README
+++ b/examples/README
@@ -105,20 +105,11 @@ tad: temperature-accelerated dynamics of vacancy diffusion in bulk Si
 vashishta: models using the Vashishta potential
 voronoi: Voronoi tesselation via compute voronoi/atom command
 
-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples, with the exception of the accelerate sub-directory. See the
-accelerate/README for Make.py commands suitable for its example
-scripts.
-
-cd src
-Make.py -j 16 -p none std no-lib reax meam poems reaxc orig -a lib-all mpi
-
 Here is how you might run and visualize one of the sample problems:
 
 cd indent
 cp ../../src/lmp_mpi .        # copy LAMMPS executable to this dir
-lmp_mpi < in.indent           # run the problem
+lmp_mpi -in in.indent         # run the problem
 
 Running the simulation produces the files {dump.indent} and
 {log.lammps}. You can visualize the dump file as follows:
diff --git a/examples/accelerate/README b/examples/accelerate/README
index 1fab296a53..c4eb5dcc8d 100644
--- a/examples/accelerate/README
+++ b/examples/accelerate/README
@@ -1,14 +1,11 @@
 These are example scripts that can be run with any of the acclerator
 packages in LAMMPS:
 
-USER-CUDA, GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
+GPU, USER-INTEL, KOKKOS, USER-OMP, OPT
 
 The easiest way to build LAMMPS with these packages
-is via the src/Make.py tool described in Section 2.4
-of the manual. You can also type "Make.py -h" to see
-its options. The easiest way to run these scripts
-is by using the appropriate
-
+is via the flags described in Section 4 of the manual.
+The easiest way to run these scripts is by using the appropriate command-line switches, as shown below.
 
 Details on the individual accelerator packages
 can be found in doc/Section_accelerate.html.
@@ -16,21 +13,6 @@ can be found in doc/Section_accelerate.html.
 
 Build LAMMPS with one or more of the accelerator packages
 
-The following command will invoke the src/Make.py tool with one of the
-command-lines from the Make.list file:
-
-../../src/Make.py -r Make.list target
-
-target = one or more of the following:
-  cpu, omp, opt
-  cuda_double, cuda_mixed, cuda_single
-  gpu_double, gpu_mixed, gpu_single
-  intel_cpu, intel_phi
-  kokkos_omp, kokkos_cuda, kokkos_phi
-
-If successful, the build will produce the file lmp_target in this
-directory.
-
 Note that in addition to any accelerator packages, these packages
 also need to be installed to run all of the example scripts: ASPHERE,
 MOLECULE, KSPACE, RIGID.
@@ -38,39 +20,11 @@ MOLECULE, KSPACE, RIGID.
 These two targets will build a single LAMMPS executable with all the
 CPU accelerator packages installed (USER-INTEL for CPU, KOKKOS for
 OMP, USER-OMP, OPT) or all the GPU accelerator packages installed
-(USER-CUDA, GPU, KOKKOS for CUDA):
+(GPU, KOKKOS for CUDA):
 
-target = all_cpu, all_gpu
-
-Note that the Make.py commands in Make.list assume an MPI environment
-exists on your machine and use mpicxx as the wrapper compiler with
-whatever underlying compiler it wraps by default. If you add "-cc mpi
-wrap=g++" or "-cc mpi wrap=icc" after the target, you can choose the
-underlying compiler for mpicxx to invoke. E.g.
-
-../../src/Make.py -r Make.list intel_cpu -cc mpi wrap=icc
-
-You should do this for any build that includes the USER-INTEL
-package, since it will perform best with the Intel compilers.
-
-Note that for kokkos_cuda, it needs to be "-cc nvcc" instead of "mpi",
-since a KOKKOS for CUDA build requires NVIDIA nvcc as the wrapper
-compiler.
-
-Also note that the Make.py commands in Make.list use the default
-FFT support which is via the KISS library. If you want to
-build with another FFT library, e.g. FFTW3, then you can add
-"-fft fftw3" after the target, e.g.
-
-../../src/Make.py -r Make.list gpu -fft fftw3
-
-For any build with USER-CUDA, GPU, or KOKKOS for CUDA, be sure to set
+For any build with GPU, or KOKKOS for CUDA, be sure to set
 the arch=XX setting to the appropriate value for the GPUs and Cuda
-environment on your system. What is defined in the Make.list file is
-arch=21 for older Fermi GPUs. This can be overridden as follows,
-e.g. for Kepler GPUs:
-
-../../src/Make.py -r Make.list gpu_double -gpu mode=double arch=35
+environment on your system.
 
 ---------------------
 
@@ -118,12 +72,6 @@ Note that when running in.lj.5.0 (which has a long cutoff) with the
 GPU package, the "-pk tpa" setting should be > 1 (e.g. 8) for best
 performance.
 
-** USER-CUDA package
-
-lmp_machine -c on -sf cuda < in.lj
-mpirun -np 1 lmp_machine -c on -sf cuda < in.lj            # 1 MPI, 1 MPI/GPU
-mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 < in.lj # 2 MPI, 1 MPI/GPU
-
 ** KOKKOS package for OMP
 
 lmp_kokkos_omp -k on t 1 -sf kk -pk kokkos neigh half < in.lj
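
For reference, the USER-CUDA run commands removed above have GPU-package
counterparts. The lines below are only a sketch: they assume an
executable named lmp_machine built with the GPU package (reusing the
executable name from the removed examples) and the in.lj script from
examples/accelerate, and they rely on the standard -sf, -pk, and -in
command-line switches.

lmp_machine -sf gpu -pk gpu 1 -in in.lj                  # 1 MPI task, 1 GPU
mpirun -np 2 lmp_machine -sf gpu -pk gpu 2 -in in.lj     # 2 MPI tasks, 2 GPUs

Unlike USER-CUDA, the GPU package does not require the number of MPI
tasks per node to equal the number of GPUs; several MPI tasks can share
one GPU. See the GPU package section earlier in
examples/accelerate/README for the authoritative forms of these
commands, including the "-pk gpu Ng tpa Nt" setting mentioned above for
long-cutoff runs.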