These are input scripts used to run versions of several of the
benchmarks in the top-level bench directory using the GPU and
USER-CUDA accelerator packages.  The results of running these scripts
on two different machines (a desktop with 2 Tesla GPUs and the ORNL
Titan supercomputer) are shown in the "GPU (Fermi)" section of the
Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.

Examples are shown below of how to run these scripts.  This assumes
you have built 3 executables with both the GPU and USER-CUDA packages
installed, e.g.

lmp_linux_single
lmp_linux_mixed
lmp_linux_double

The precision (single, mixed, double) refers to the GPU and USER-CUDA
package precision.  See the README files in the lib/gpu and lib/cuda
directories for instructions on how to build the packages with
different precisions.  The GPU and USER-CUDA sub-sections of the
doc/Section_accelerate.html file also describe this process.
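
As a rough illustration, building one of the executables might look
like the following.  This is only a hedged sketch; the exact Makefile
names and precision settings depend on your LAMMPS version and are
documented in the lib/gpu and lib/cuda README files.

cd lib/gpu
# set CUDA_PRECISION in Makefile.linux, e.g. to -D_SINGLE_SINGLE,
# -D_SINGLE_DOUBLE, or -D_DOUBLE_DOUBLE
make -f Makefile.linux
cd ../cuda
# build the USER-CUDA library at the matching precision (see lib/cuda/README)
make
cd ../../src
make yes-gpu
make yes-user-cuda
make linux          # produces lmp_linux; rename it, e.g. to lmp_linux_single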

------------------------------------------------------------------------

To run on just CPUs (without using the GPU or USER-CUDA styles),
do something like the following:

mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
mpirun -np 12 lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.lj

The "xyz" settings determine the problem size.  The "t" setting
determines the number of timesteps.

These mpirun commands run on a single node.  To run on multiple
nodes, scale up the "-np" setting.
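
For example, a hypothetical run on 2 nodes with 12 cores each might
look like:

mpirun -np 24 lmp_linux_double -v x 16 -v y 16 -v z 32 -v t 100 < in.lj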

------------------------------------------------------------------------

To run with the GPU package, do something like the following:

mpirun -np 12 lmp_linux_single -sf gpu -pk gpu 1 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj
mpirun -np 8 lmp_linux_mixed -sf gpu -pk gpu 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj

The "xyz" settings determine the problem size.  The "t" setting
determines the number of timesteps.  The "np" setting determines how
many MPI tasks (per node) the problem will run on.  The numeric
argument to the "-pk gpu" setting is the number of GPUs (per node).  Note
that you can use more MPI tasks than GPUs (per node) with the GPU
package.

These mpirun commands run on a single node.  To run on multiple
nodes, scale up the "-np" setting, and control the number of
MPI tasks per node via a "-ppn" setting.
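
For example, a hypothetical 2-node run with 12 MPI tasks and 2 GPUs
per node might look like this (the "-ppn" option is specific to
MPICH-style launchers; OpenMPI uses "--npernode" instead):

mpirun -np 24 -ppn 12 lmp_linux_mixed -sf gpu -pk gpu 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj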

------------------------------------------------------------------------

To run with the USER-CUDA package, do something like the following.
Scripts with "cuda" in their names are meant to be run with the
USER-CUDA package.  For example:

mpirun -np 1 ../lmp_linux_single -c on -sf cuda -v g 1 -v x 16 -v y 16 -v z 16 -v t 100 < in.lj.cuda

mpirun -np 2 ../lmp_linux_double -c on -sf cuda -v g 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam.cuda

The "xyz" settings determine the problem size.  The "t" setting
determines the number of timesteps.  The "np" setting determines how
many MPI tasks per compute node the problem will run on, and the "g"
setting determines how many GPUs per compute node the problem will run
on, i.e. 1 or 2 in this case.  For the USER-CUDA package, the number
of MPI tasks and GPUs (both per compute node) must be equal.

These mpirun commands run on a single node.  To run on multiple
nodes, scale up the "-np" setting.
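
For example, a hypothetical 2-node run with 2 MPI tasks and 2 GPUs per
node (using a per-node option such as "-ppn" if your MPI launcher
provides one) might look like:

mpirun -np 4 -ppn 2 ../lmp_linux_double -c on -sf cuda -v g 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam.cuda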

------------------------------------------------------------------------

If the script has "titan" in its name, it was run on the Titan
supercomputer at ORNL.
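
On a Cray machine like Titan, jobs are launched with aprun inside a
batch script rather than with mpirun.  A purely hypothetical sketch
(the executable and input-script names below are placeholders, not the
actual Titan job scripts):

# executable and input names below are placeholders
aprun -n 32 -N 16 ./lmp_titan_mixed -sf gpu -pk gpu 1 -v x 64 -v y 64 -v z 64 -v t 100 < in.lj.titan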