diff --git a/doc/Section_accelerate.html b/doc/Section_accelerate.html index 27b80f3d63..7547b571af 100644 --- a/doc/Section_accelerate.html +++ b/doc/Section_accelerate.html @@ -17,22 +17,38 @@ Section performance for different classes of problems running on different kinds of machines.

-5.1 Measuring performance
-5.2 General strategies
-5.3 Packages with optimized styles
-5.4 OPT package
-5.5 USER-OMP package
-5.6 GPU package
-5.7 USER-CUDA package
-5.8 KOKKOS package
-5.9 USER-INTEL package
-5.10 Comparison of USER-CUDA, GPU, and KOKKOS packages
+

There are two thrusts to the discussion that follows. The +first is using code options that implement alternate algorithms +that can speed up a simulation. The second is using one +of the several accelerator packages provided with LAMMPS that +contain code optimized for certain kinds of hardware, including +multi-core CPUs, GPUs, and Intel Xeon Phi coprocessors. +

+

The Benchmark page of the LAMMPS web site gives performance results for the various accelerator -packages discussed in this section for several of the standard LAMMPS -benchmarks, as a function of problem size and number of compute nodes, -on different hardware platforms. +packages discussed in Section 5.2, for several of the standard LAMMPS +benchmark problems, as a function of problem size and number of +compute nodes, on different hardware platforms.


@@ -104,11 +120,9 @@ various options.
  • Staggered PPPM
  • single vs double PPPM
  • partial charge PPPM -
  • verlet/split -
  • processor mapping via processors numa command -
  • load-balancing: balance and fix balance -
  • processor command for layout -
  • OMP when lots of cores +
  • verlet/split run style +
  • processor command for proc layout and numa layout +
  • load-balancing: balance and fix balance

    2-FFT PPPM, also called analytic differentiation or ad PPPM, uses 2 FFTs instead of the 4 FFTs used by the default ik differentiation @@ -146,28 +160,30 @@ such as when using a barostat. fixes, computes, and other commands have been added to LAMMPS, which will typically run faster than the standard non-accelerated versions. Some require appropriate hardware -on your system, e.g. GPUs or Intel Xeon Phi chips. +to be present on your system, e.g. GPUs or Intel Xeon Phi +coprocessors.

    -

    All of these commands are in packages provided with LAMMPS, as -explained here. Currently, there are 6 such -accelerator packages in LAMMPS, either as standard or user packages: +

    All of these commands are in packages provided with LAMMPS. An +overview of packages is given in Section +packages. Currently, there are 6 accelerator +packages in LAMMPS, either as standard or user packages:

    - - - - - - + + + + +
    USER-CUDA for NVIDIA GPUs
    GPU for NVIDIA GPUs as well as OpenCL support
    USER-INTEL for Intel CPUs and Intel Xeon Phi
    KOKKOS for GPUs, Intel Xeon Phi, and OpenMP threading
    USER-OMP for OpenMP threading
    OPT generic CPU optimizations +
    USER-CUDA for NVIDIA GPUs
    GPU for NVIDIA GPUs as well as OpenCL support
    USER-INTEL for Intel CPUs and Intel Xeon Phi
    KOKKOS for GPUs, Intel Xeon Phi, and OpenMP threading
    USER-OMP for OpenMP threading
    OPT generic CPU optimizations

    Any accelerated style has the same name as the corresponding standard style, except that a suffix is appended. Otherwise, the syntax for -the command that specifies the style is identical, their functionality -is the same, and the numerical results it produces should also be the +the command that uses the style is identical, its functionality is +the same, and the numerical results it produces should also be the same, except for precision and round-off effects.
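The naming rule can be illustrated with a small shell sketch. The helper name accel_style is hypothetical and exists only for this example. The suffixes listed are assumed to correspond to the six accelerator packages named above (cuda, gpu, intel, kk, omp, opt).

```shell
#!/bin/sh
# Hypothetical helper: derive an accelerated style name by appending
# a package suffix to the standard style name.
accel_style() {
    # $1 = standard style name, $2 = package suffix
    echo "$1/$2"
}

# List the accelerated variants of lj/cut, one per assumed suffix.
for sfx in cuda gpu intel kk omp opt; do
    accel_style lj/cut "$sfx"
done
```

Whether a given variant actually exists depends on which packages were installed when LAMMPS was built.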

    -

    For example, all of these styles are variants of the basic +

    For example, all of these styles are accelerated variants of the Lennard-Jones pair_style lj/cut:

    -

    Assuming LAMMPS was built with the appropriate package, a simulation -using accelerated styles from the package can be run without modifying -your input script, by specifying command-line -switches. The details of how to do this -vary from package to package and are explained below. There is also a -suffix command and a package command that -accomplish the same thing and can be used within an input script if -preferred. The suffix command allows more precise -control of whether an accelerated or unaccelerated version of a style -is used at various points within an input script. +

    To see what accelerated styles are currently available, see +Section_commands 5 of the manual. The +doc pages for individual commands (e.g. pair lj/cut or +fix nve) also list any accelerated variants available +for that style.

    -

    To see what styles are currently available in each of the accelerated -packages, see Section_commands 5 of the -manual. The doc page for individual commands (e.g. pair -lj/cut or fix nve) also lists any -accelerated variants available for that style. +

    To use an accelerator package in LAMMPS, and one or more of the styles +it provides, follow these general steps. Details vary from package to +package and are explained in the individual accelerator sub-section +doc pages, listed above: +

    +
    + + + + + + + +
    build the accelerator library only for USER-CUDA and GPU packages
    install the accelerator package make yes-opt, make yes-user-intel, etc
    add compile/link flags to Makefile.machine in src/MAKE,
    only for USER-INTEL, KOKKOS, USER-OMP packages
    re-build LAMMPS make machine
    run a LAMMPS simulation lmp_machine < in.script
    enable the accelerator package via "-c on" and "-k on" command-line switches,
    only for USER-CUDA and KOKKOS packages
    set any needed options for the package via "-pk" command-line switch or package command,
    only if defaults need to be changed
    use accelerated styles in your input script via "-sf" command-line switch or suffix command +
    + +

    The first 4 steps typically only need to be done once, to create an +executable that uses one or more accelerator packages. We are working +to create a "make" tool that will perform all 4 of these steps in a +single command. +
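As a rough sketch, the install and re-build steps could look like the dry run below. It only prints the commands rather than running them. The function name build_steps is hypothetical, and the package (user-omp) and machine target (mpi) are assumptions chosen for illustration. Building an accelerator library and editing Makefile.machine are package-specific and are not shown.

```shell
#!/bin/sh
# Hypothetical dry run: print the commands that install one accelerator
# package and re-build LAMMPS, without executing anything.
build_steps() {
    pkg="$1"       # package name in lower case, e.g. opt or user-intel
    machine="$2"   # machine target, matching Makefile.<machine> in src/MAKE
    echo "cd src"
    echo "make yes-$pkg"
    echo "make $machine"
}

# Assumed example values, not taken from the text above.
build_steps user-omp mpi
```

Printing the commands first makes it easy to check the package and machine names before starting a long build.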

    +

    The last 4 steps can all be done from the command-line when LAMMPS is +launched, without changing your input script. Or you can add +package and suffix commands to your input +script.
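The command-line route can be sketched as follows. The helper lammps_cmd is hypothetical and only assembles a launch line; it does not run LAMMPS. lmp_machine and in.script are the placeholder names used in the steps above. The suffix values (opt, omp, kk) and the package arguments ("omp 4", assumed here to request 4 OpenMP threads) are illustrative assumptions; the exact options for each package are given in its own sub-section.

```shell
#!/bin/sh
# Hypothetical helper: assemble (but do not run) a LAMMPS launch line
# that selects an accelerator package from the command line.
lammps_cmd() {
    suffix="$1"    # value for the -sf switch, e.g. opt, omp, kk
    enable="$2"    # optional enabling switch, e.g. "-k on" (may be empty)
    pkgopts="$3"   # optional arguments for the -pk switch (may be empty)
    cmd="lmp_machine"
    if [ -n "$enable" ]; then cmd="$cmd $enable"; fi
    if [ -n "$pkgopts" ]; then cmd="$cmd -pk $pkgopts"; fi
    echo "$cmd -sf $suffix < in.script"
}

lammps_cmd opt                 # suffix only, package defaults
lammps_cmd omp "" "omp 4"      # suffix plus package options
lammps_cmd kk "-k on"          # package that must be enabled explicitly
```

The same effect can be had inside the input script with the package and suffix commands, which leaves the launch line unchanged.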

    The examples directory has several sub-directories with scripts and -README files for using the accelerator packages: +README files for how to use the following accelerator packages: