git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@14950 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2016-05-09 17:22:38 +00:00
parent d3e96156a7
commit 1cf54d01f4
2104 changed files with 270636 additions and 0 deletions
--- a/doc/html/_sources/accelerate_cuda.txt
+++ b/doc/html/_sources/accelerate_cuda.txt
@ -0,0 +1,223 @@
+:doc:`Return to Section accelerate overview <Section_accelerate>`
+
+5.USER-CUDA package
+-------------------
+
+The USER-CUDA package was developed by Christian Trott (Sandia) while
+at U Technology Ilmenau in Germany.  It provides NVIDIA GPU versions
+of many pair styles, many fixes, a few computes, and for long-range
+Coulombics via the PPPM command.  It has the following general
+features:
+
+* The package is designed to allow an entire LAMMPS calculation, for
+  many timesteps, to run entirely on the GPU (except for inter-processor
+  MPI communication), so that atom-based data (e.g. coordinates, forces)
+  do not have to move back-and-forth between the CPU and GPU.
+* The speed-up advantage of this approach is typically better when the
+  number of atoms per GPU is large
+* Data will stay on the GPU until a timestep where a non-USER-CUDA fix
+  or compute is invoked.  Whenever a non-GPU operation occurs (fix,
+  compute, output), data automatically moves back to the CPU as needed.
+  This may incur a performance penalty, but should otherwise work
+  transparently.
+* Neighbor lists are constructed on the GPU.
+* The package only supports use of a single MPI task, running on a
+  single CPU (core), assigned to each GPU.
+Here is a quick overview of how to use the USER-CUDA package:
+
+* build the library in lib/cuda for your GPU hardware with desired precision
+* include the USER-CUDA package and build LAMMPS
+* use the mpirun command to specify 1 MPI task per GPU (on each node)
+* enable the USER-CUDA package via the "-c on" command-line switch
+* specify the # of GPUs per node
+* use USER-CUDA styles in your input script
+
+The latter two steps can be done using the "-pk cuda" and "-sf cuda"
+:ref:`command-line switches <start_7>` respectively.  Or
+the effect of the "-pk" or "-sf" switches can be duplicated by adding
+the :doc:`package cuda <package>` or :doc:`suffix cuda <suffix>` commands
+respectively to your input script.
+
+**Required hardware/software:**
+
+To use this package, you need to have one or more NVIDIA GPUs and
+install the NVIDIA Cuda software on your system:
+
+Your NVIDIA GPU needs to support Compute Capability 1.3. This list may
+help you to find out the Compute Capability of your card:
+
+http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
+
+Install the Nvidia Cuda Toolkit (version 3.2 or higher) and the
+corresponding GPU drivers.  The Nvidia Cuda SDK is not required, but
+we recommend it also be installed.  You can then make sure its sample
+projects can be compiled without problems.
+
+**Building LAMMPS with the USER-CUDA package:**
+
+This requires two steps (a,b): build the USER-CUDA library, then build
+LAMMPS with the USER-CUDA package.
+
+You can do both these steps in one line, using the src/Make.py script,
+described in :ref:`Section 2.4 <start_4>` of the manual.
+Type "Make.py -h" for help.  If run from the src directory, this
+command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
+starting Makefile.machine:
+
+.. parsed-literal::
+
+   Make.py -p cuda -cuda mode=single arch=20 -o cuda -a lib-cuda file mpi
+
+Or you can follow these two (a,b) steps:
+
+(a) Build the USER-CUDA library
+
+The USER-CUDA library is in lammps/lib/cuda.  If your *CUDA* toolkit
+is not installed in the default system directoy */usr/local/cuda* edit
+the file *lib/cuda/Makefile.common* accordingly.
+
+To build the library with the settings in lib/cuda/Makefile.default,
+simply type:
+
+.. parsed-literal::
+
+   make
+
+To set options when the library is built, type "make OPTIONS", where
+*OPTIONS* are one or more of the following. The settings will be
+written to the *lib/cuda/Makefile.defaults* before the build.
+
+.. parsed-literal::
+
+   *precision=N* to set the precision level
+     N = 1 for single precision (default)
+     N = 2 for double precision
+     N = 3 for positions in double precision
+     N = 4 for positions and velocities in double precision
+   *arch=M* to set GPU compute capability
+     M = 35 for Kepler GPUs
+     M = 20 for CC2.0 (GF100/110, e.g. C2050,GTX580,GTX470) (default)
+     M = 21 for CC2.1 (GF104/114,  e.g. GTX560, GTX460, GTX450)
+     M = 13 for CC1.3 (GF200, e.g. C1060, GTX285)
+   *prec_timer=0/1* to use hi-precision timers
+     0 = do not use them (default)
+     1 = use them
+     this is usually only useful for Mac machines 
+   *dbg=0/1* to activate debug mode
+     0 = no debug mode (default)
+     1 = yes debug mode
+     this is only useful for developers
+   *cufft=1* for use of the CUDA FFT library
+     0 = no CUFFT support (default)
+     in the future other CUDA-enabled FFT libraries might be supported
+
+If the build is successful, it will produce the files liblammpscuda.a and
+Makefile.lammps.
+
+Note that if you change any of the options (like precision), you need
+to re-build the entire library.  Do a "make clean" first, followed by
+"make".
+
+(b) Build LAMMPS with the USER-CUDA package
+
+.. parsed-literal::
+
+   cd lammps/src
+   make yes-user-cuda
+   make machine
+
+No additional compile/link flags are needed in Makefile.machine.
+
+Note that if you change the USER-CUDA library precision (discussed
+above) and rebuild the USER-CUDA library, then you also need to
+re-install the USER-CUDA package and re-build LAMMPS, so that all
+affected files are re-compiled and linked to the new USER-CUDA
+library.
+
+**Run with the USER-CUDA package from the command line:**
+
+The mpirun or mpiexec command sets the total number of MPI tasks used
+by LAMMPS (one or multiple per compute node) and the number of MPI
+tasks used per node.  E.g. the mpirun command in MPICH does this via
+its -np and -ppn switches.  Ditto for OpenMPI via -np and -npernode.
+
+When using the USER-CUDA package, you must use exactly one MPI task
+per physical GPU.
+
+You must use the "-c on" :ref:`command-line switch <start_7>` to enable the USER-CUDA package.
+The "-c on" switch also issues a default :doc:`package cuda 1 <package>`
+command which sets various USER-CUDA options to default values, as
+discussed on the :doc:`package <package>` command doc page.
+
+Use the "-sf cuda" :ref:`command-line switch <start_7>`,
+which will automatically append "cuda" to styles that support it.  Use
+the "-pk cuda Ng" :ref:`command-line switch <start_7>` to
+set Ng = # of GPUs per node to a different value than the default set
+by the "-c on" switch (1 GPU) or change other :doc:`package cuda <package>` options.
+
+.. parsed-literal::
+
+   lmp_machine -c on -sf cuda -pk cuda 1 -in in.script                       # 1 MPI task uses 1 GPU
+   mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script          # 2 MPI tasks use 2 GPUs on a single 16-core (or whatever) node
+   mpirun -np 24 -ppn 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script  # ditto on 12 16-core nodes
+
+The syntax for the "-pk" switch is the same as same as the "package
+cuda" command.  See the :doc:`package <package>` command doc page for
+details, including the default values used for all its options if it
+is not specified.
+
+Note that the default for the :doc:`package cuda <package>` command is
+to set the Newton flag to "off" for both pairwise and bonded
+interactions.  This typically gives fastest performance.  If the
+:doc:`newton <newton>` command is used in the input script, it can
+override these defaults.
+
+**Or run with the USER-CUDA package by editing an input script:**
+
+The discussion above for the mpirun/mpiexec command and the requirement
+of one MPI task per GPU is the same.
+
+You must still use the "-c on" :ref:`command-line switch <start_7>` to enable the USER-CUDA package.
+
+Use the :doc:`suffix cuda <suffix>` command, or you can explicitly add a
+"cuda" suffix to individual styles in your input script, e.g.
+
+.. parsed-literal::
+
+   pair_style lj/cut/cuda 2.5
+
+You only need to use the :doc:`package cuda <package>` command if you
+wish to change any of its option defaults, including the number of
+GPUs/node (default = 1), as set by the "-c on" :ref:`command-line switch <start_7>`.
+
+**Speed-ups to expect:**
+
+The performance of a GPU versus a multi-core CPU is a function of your
+hardware, which pair style is used, the number of atoms/GPU, and the
+precision used on the GPU (double, single, mixed).
+
+See the `Benchmark page <http://lammps.sandia.gov/bench.html>`_ of the
+LAMMPS web site for performance of the USER-CUDA package on different
+hardware.
+
+**Guidelines for best performance:**
+
+* The USER-CUDA package offers more speed-up relative to CPU performance
+  when the number of atoms per GPU is large, e.g. on the order of tens
+  or hundreds of 1000s.
+* As noted above, this package will continue to run a simulation
+  entirely on the GPU(s) (except for inter-processor MPI communication),
+  for multiple timesteps, until a CPU calculation is required, either by
+  a fix or compute that is non-GPU-ized, or until output is performed
+  (thermo or dump snapshot or restart file).  The less often this
+  occurs, the faster your simulation will run.
+Restrictions
+""""""""""""
+
+
+None.
+
+
+.. _lws: http://lammps.sandia.gov
+.. _ld: Manual.html
+.. _lc: Section_commands.html#comm