diff --git a/doc/Manual.html b/doc/Manual.html index e783b2eaa9..39cc7d2f0b 100644 --- a/doc/Manual.html +++ b/doc/Manual.html @@ -132,15 +132,21 @@ it gives quick access to documentation for all LAMMPS commands.
  • Accelerating LAMMPS performance - +

    NOTE: + discuss 3 precisions + if change, also have to re-link with LAMMPS + always use newton off + expt with differing numbers of CPUs vs GPU - can't tell what is fastest + give command line switches in examples +
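The "3 precisions" item in the note above refers to building the GPU library in single, mixed, or double precision. Why the choice matters can be illustrated with a short sketch of generic floating-point behavior (this is plain Python, not LAMMPS code): values accumulated in 32-bit floats drift measurably from the 64-bit result.

```python
# Generic floating-point illustration, not LAMMPS code: summing many
# small contributions in 32-bit precision drifts from the 64-bit sum,
# which is why the GPU library's precision setting affects accuracy.
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', x))[0]

exact = 0.0    # accumulated in double precision
single = 0.0   # accumulated in single precision
for _ in range(1_000_000):
    exact += 0.1
    single = to_float32(single + to_float32(0.1))

print(exact)                        # ~100000.0
print(single)                       # visibly drifted
print(abs(exact - single) > 1e-3)   # True
```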

    Hardware and software requirements:

    To use this package, you currently need to have specific NVIDIA @@ -378,9 +458,7 @@ requires that your GPU card support double precision.


    -
    - -

    5.4 USER-CUDA package +

    5.7 USER-CUDA package

    The USER-CUDA package was developed by Christian Trott at U Technology Ilmenau in Germany. It provides NVIDIA GPU versions of many pair @@ -516,7 +594,7 @@ occurs, the faster your simulation will run.


    -

    5.5 Comparison of GPU and USER-CUDA packages +

    5.8 Comparison of GPU and USER-CUDA packages

    Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation using NVIDIA hardware, but they do it in different ways. @@ -602,66 +680,4 @@ for the GPU and USER-CUDA packages.

    These contain input scripts for identical systems, so they can be used to benchmark the performance of both packages on your system.

    -
    - -

    Benchmark data: -

    -

    NOTE: We plan to add some benchmark results and plots here for the -examples described in the previous section. -

    -

    Simulations: -

    -

    1. Lennard Jones -

    - -

    2. Lennard Jones -

    - -

    3. Rhodopsin model -

    - -

    4. Lihtium-Phosphate -

    - -

    Hardware: -

    -

    Workstation: -

    - -

    eStella: -

    - -

    Keeneland: -

    - diff --git a/doc/Section_accelerate.txt b/doc/Section_accelerate.txt index c1721a87f6..388ed55741 100644 --- a/doc/Section_accelerate.txt +++ b/doc/Section_accelerate.txt @@ -11,14 +11,92 @@ Section"_Section_howto.html :c 5. Accelerating LAMMPS performance :h3 This section describes various methods for improving LAMMPS -performance for different classes of problems running -on different kinds of machines. +performance for different classes of problems running on different +kinds of machines. -5.1 "OPT package"_#acc_1 -5.2 "USER-OMP package"_#acc_2 -5.3 "GPU package"_#acc_3 -5.4 "USER-CUDA package"_#acc_4 -5.5 "Comparison of GPU and USER-CUDA packages"_#acc_5 :all(b) +5.1 "Measuring performance"_#acc_1 +5.2 "General strategies"_#acc_2 +5.3 "Packages with optimized styles"_#acc_3 +5.4 "OPT package"_#acc_4 +5.5 "USER-OMP package"_#acc_5 +5.6 "GPU package"_#acc_6 +5.7 "USER-CUDA package"_#acc_7 +5.8 "Comparison of GPU and USER-CUDA packages"_#acc_8 :all(b) + +:line +:line + +5.1 Measuring performance :h4,link(acc_1) + +Before trying to make your simulation run faster, you should +understand how it currently performs and where the bottlenecks are. + +The best way to do this is to run your system (actual number of +atoms) for a modest number of timesteps (say 100, or a few 100 at +most) on several different processor counts, including a single +processor if possible. Do this for an equilibrated version of your +system, so that the 100-step timings are representative of a much +longer run. There is typically no need to run for 1000s of timesteps +to get accurate timings; you can simply extrapolate from short runs. + +For the set of runs, look at the timing data printed to the screen and +log file at the end of each LAMMPS run. "This +section"_Section_start.html#start_8 of the manual has an overview. + +Running on one (or a few) processors should give a good estimate of +the serial performance and what portions of the timestep are taking +the most time. 
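The "Loop time of X on N procs for M steps with K atoms" summary line that LAMMPS prints at the end of each run is the starting point for this analysis; a minimal parsing helper like the following (illustrative, not part of LAMMPS) makes it easy to collect those numbers across a set of benchmark runs:

```python
import re

# Illustrative helper, not part of LAMMPS: pull the summary numbers
# out of the "Loop time" line printed at the end of a run.
LOOP_RE = re.compile(
    r"Loop time of (\S+) on (\d+) procs for (\d+) steps with (\d+) atoms")

def parse_loop_time(line):
    """Return the run summary as a dict, or None if the line doesn't match."""
    m = LOOP_RE.search(line)
    if m is None:
        return None
    secs, procs, steps, atoms = m.groups()
    return {"secs": float(secs), "procs": int(procs),
            "steps": int(steps), "atoms": int(atoms)}

info = parse_loop_time(
    "Loop time of 12.5 on 16 procs for 100 steps with 32000 atoms")
print(info)  # {'secs': 12.5, 'procs': 16, 'steps': 100, 'atoms': 32000}
```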
Running the same problem on a few different processor +counts should give an estimate of parallel scalability. I.e. if the +simulation runs 16x faster on 16 processors, it's 100% parallel +efficient; if it runs 8x faster on 16 processors, it's 50% efficient. + +The most important data to look at in the timing info is the timing +breakdown and relative percentages. For example, trying different +options for speeding up the long-range solvers will have little impact +if they only consume 10% of the run time. If the pairwise time is +dominating, you may want to look at GPU or OMP versions of the pair +style, as discussed below. Comparing how the percentages change as +you increase the processor count gives you a sense of how different +operations within the timestep are scaling. Note that if you are +running with a Kspace solver, there is additional output on the +breakdown of the Kspace time. For PPPM, this includes the fraction +spent on FFTs, which can be communication intensive. + +Another important detail in the timing info is the histograms of +atom counts and neighbor counts. If these vary widely across +processors, you have a load-imbalance issue. This often results in +inaccurate relative timing data, because processors have to wait when +communication occurs for other processors to catch up. Thus the +reported times for "Communication" or "Other" may be higher than they +really are, due to load-imbalance. If this is an issue, you can +uncomment the MPI_Barrier() lines in src/timer.cpp, and recompile +LAMMPS, to obtain synchronized timings. + +:line + +5.2 General strategies :h4,link(acc_2) + +Here is a list of general ideas for improving simulation performance. +Most of them are only applicable to certain models and certain +bottlenecks in the current performance, so let the timing data you +initially generate be your guide. 
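The timing arithmetic described above (extrapolating a long run from a short one, parallel efficiency, and the limited payoff of speeding up one small part of the timestep) can be sketched as follows; the function names and sample numbers are illustrative, not part of LAMMPS:

```python
# Sketch of the timing arithmetic discussed above; all names and
# numbers are illustrative, not part of LAMMPS itself.

def extrapolate_runtime(short_secs, short_steps, total_steps):
    """Estimate a long run's wall time from a short benchmark run."""
    return short_secs / short_steps * total_steps

def parallel_efficiency(t_serial, t_parallel, nprocs):
    """Speedup divided by processor count; 1.0 means perfectly parallel."""
    return (t_serial / t_parallel) / nprocs

def overall_speedup(fraction, component_speedup):
    """Amdahl-style bound: accelerating a component that is only a small
    fraction of the run time yields a small overall gain."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)

# A 100-step run took 12.5 s, so a 1,000,000-step run should take ~125,000 s.
print(extrapolate_runtime(12.5, 100, 1_000_000))   # 125000.0

# 16 processors, but only an 8x speedup over serial: 50% efficiency.
print(parallel_efficiency(100.0, 12.5, 16))        # 0.5

# If the long-range solver is 10% of the run time, even a huge
# speedup of that one component caps the overall gain near 1.11x.
print(overall_speedup(0.10, 1000.0))
```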
It is hard, if not impossible, to +predict how much difference these options will make, since it is a +function of your problem and your machine. There is no substitute for +simply trying them out. + +rRESPA +2-FFT PPPM +single vs double PPPM +partial charge PPPM +verlet/split +processor mapping via processors numa command +load-balancing: balance and fix balance +processor command for layout +OMP when lots of cores :ul + +:line + +5.3 Packages with optimized styles :h4,link(acc_3) Accelerated versions of various "pair_style"_pair_style.html, "fixes"_fix.html, "computes"_compute.html, and other commands have @@ -81,10 +159,9 @@ speed-ups you can expect :ul The final section compares and contrasts the GPU and USER-CUDA packages, since they are both designed to use NVIDIA GPU hardware. -:line :line -5.1 OPT package :h4,link(acc_1) +5.4 OPT package :h4,link(acc_4) The OPT package was developed by James Fischer (High Performance Technologies), David Richie, and Vincent Natoli (Stone Ridge @@ -109,10 +186,9 @@ You should see a reduction in the "Pair time" printed out at the end of the run. On most machines and problems, this will typically be a 5 to 20% savings. -:line :line -5.2 USER-OMP package :h4,link(acc_2) +5.5 USER-OMP package :h4,link(acc_5) The USER-OMP package was developed by Axel Kohlmeyer at Temple University. It provides multi-threaded versions of most pair styles, all dihedral @@ -229,10 +305,9 @@ through hyper-threading. A description of the multi-threading strategy and some performance examples are "presented here"_http://sites.google.com/site/akohlmey/software/lammps-icms/lammps-icms-tms2011-talk.pdf?attredirects=0&d=1 -:line :line -5.3 GPU package :h4,link(acc_3) +5.6 GPU package :h4,link(acc_6) The GPU package was developed by Mike Brown at ORNL. 
It provides GPU versions of several pair styles and for long-range Coulombics via the @@ -260,6 +335,19 @@ NVIDIA support as well as more general OpenCL support, so that the same functionality can eventually be supported on a variety of GPU hardware. :l,ule + + +NOTE: + discuss 3 precisions + if change, also have to re-link with LAMMPS + always use newton off + expt with differing numbers of CPUs vs GPU - can't tell what is fastest + give command line switches in examples + + + + + [Hardware and software requirements:] To use this package, you currently need to have specific NVIDIA @@ -370,10 +458,9 @@ See the lammps/lib/gpu/README file for instructions on how to build the GPU library for single, mixed, or double precision. The latter requires that your GPU card support double precision. -:line :line -5.4 USER-CUDA package :h4,link(acc_4) +5.7 USER-CUDA package :h4,link(acc_7) The USER-CUDA package was developed by Christian Trott at U Technology Ilmenau in Germany. It provides NVIDIA GPU versions of many pair @@ -508,7 +595,7 @@ occurs, the faster your simulation will run. :line :line -5.5 Comparison of GPU and USER-CUDA packages :h4,link(acc_5) +5.8 Comparison of GPU and USER-CUDA packages :h4,link(acc_8) Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation using NVIDIA hardware, but they do it in different ways. @@ -593,65 +680,3 @@ lammps/examples/USER/cuda = USER-CUDA package files :ul These contain input scripts for identical systems, so they can be used to benchmark the performance of both packages on your system. - -:line - -[Benchmark data:] - -NOTE: We plan to add some benchmark results and plots here for the -examples described in the previous section. - -Simulations: - -1. Lennard Jones - -256,000 atoms -2.5 A cutoff -0.844 density :ul - -2. Lennard Jones - -256,000 atoms -5.0 A cutoff -0.844 density :ul - -3. Rhodopsin model - -256,000 atoms -10A cutoff -Coulomb via PPPM :ul - -4. 
Lihtium-Phosphate - -295650 atoms -15A cutoff -Coulomb via PPPM :ul - -Hardware: - -Workstation: - -2x GTX470 -i7 950@3GHz -24Gb DDR3 @ 1066Mhz -CentOS 5.5 -CUDA 3.2 -Driver 260.19.12 :ul - -eStella: - -6 Nodes -2xC2050 -2xQDR Infiniband interconnect(aggregate bandwidth 80GBps) -Intel X5650 HexCore @ 2.67GHz -SL 5.5 -CUDA 3.2 -Driver 260.19.26 :ul - -Keeneland: - -HP SL-390 (Ariston) cluster -120 nodes -2x Intel Westmere hex-core CPUs -3xC2070s -QDR InfiniBand interconnect :ul diff --git a/doc/Section_commands.html b/doc/Section_commands.html index 3abb5ccfb6..aacf28ffe1 100644 --- a/doc/Section_commands.html +++ b/doc/Section_commands.html @@ -383,12 +383,12 @@ each style or click on the style itself for a full description:
    - - - - - - + + + + +
    angle/localatom/moleculebond/localcentro/atomcluster/atomcna/atom
    comcom/moleculecoord/atomdamage/atomdihedral/localdisplace/atom
    erotate/asphereerotate/sphereevent/displacegroup/groupgyrationgyration/molecule
    heat/fluximproper/localkeke/atommsdmsd/molecule
    pairpair/localpepe/atompressureproperty/atom
    property/localproperty/moleculerdfreducereduce/regionslice
    stress/atomtemptemp/aspheretemp/comtemp/deformtemp/partial
    temp/profiletemp/ramptemp/regiontemp/sphereti +
    erotate/asphereerotate/sphereerotate/sphere/atomevent/displacegroup/groupgyration
    gyration/moleculeheat/fluximproper/localkeke/atommsd
    msd/moleculepairpair/localpepe/atompressure
    property/atomproperty/localproperty/moleculerdfreducereduce/region
    slicestress/atomtemptemp/aspheretemp/comtemp/deform
    temp/partialtemp/profiletemp/ramptemp/regiontemp/sphereti

    These are compute styles contributed by users, which can be used if diff --git a/doc/Section_commands.txt b/doc/Section_commands.txt index fc8e881c27..051c5ff739 100644 --- a/doc/Section_commands.txt +++ b/doc/Section_commands.txt @@ -556,6 +556,7 @@ each style or click on the style itself for a full description: "displace/atom"_compute_displace_atom.html, "erotate/asphere"_compute_erotate_asphere.html, "erotate/sphere"_compute_erotate_sphere.html, +"erotate/sphere/atom"_compute_erotate_sphere_atom.html, "event/displace"_compute_event_displace.html, "group/group"_compute_group_group.html, "gyration"_compute_gyration.html, diff --git a/doc/Section_packages.html b/doc/Section_packages.html index ae604bbf98..bbd94a4a6f 100644 --- a/doc/Section_packages.html +++ b/doc/Section_packages.html @@ -49,7 +49,7 @@ packages, more details are provided. COLLOID colloidal particles - atom_style colloid colloid - DIPOLE point dipole particles - pair_style dipole/cut dipole - FLD Fast Lubrication Dynamics Kumar & Bybee & Higdon (1) pair_style lubricateU - - -GPU GPU-enabled potentials Mike Brown (ORNL) Section accelerate gpu lib/gpu +GPU GPU-enabled potentials Mike Brown (ORNL) Section accelerate gpu lib/gpu GRANULAR granular systems - Section_howto pour - KIM openKIM potentials Smirichinski & Elliot & Tadmor (3) pair_style kim kim lib/kim KSPACE long-range Coulombic solvers - kspace_style peptide - @@ -57,7 +57,7 @@ packages, more details are provided. 
MEAM modified EAM potential Greg Wagner (Sandia) pair_style meam meam lib/meam MC Monte Carlo options - fix gcmc - - MOLECULE molecular system force fields - Section_howto peptide - -OPT optimized pair potentials Fischer & Richie & Natoli (2) Section accelerate - - +OPT optimized pair potentials Fischer & Richie & Natoli (2) Section accelerate - - PERI Peridynamics models Mike Parks (Sandia) pair_style peri peri - POEMS coupled rigid body motion Rudra Mukherjee (JPL) fix poems rigid lib/poems REAX ReaxFF potential Aidan Thompson (Sandia) pair_style reax reax lib/reax @@ -106,11 +106,11 @@ E.g. "peptide" refers to the examples/peptide directory. USER-AWPMD wave-packet MD Ilya Valuev (JIHT) pair_style awpmd/cut USER/awpmd - lib/awpmd USER-CG-CMM coarse-graining model Axel Kohlmeyer (Temple U) pair_style lj/sdk USER/cg-cmm cg - USER-COLVARS collective variables Fiorin & Henin & Kohlmeyer (3) fix colvars USER/colvars colvars lib/colvars -USER-CUDA NVIDIA GPU styles Christian Trott (U Tech Ilmenau) Section accelerate USER/cuda - lib/cuda +USER-CUDA NVIDIA GPU styles Christian Trott (U Tech Ilmenau) Section accelerate USER/cuda - lib/cuda USER-EFF electron force field Andres Jaramillo-Botero (Caltech) pair_style eff/cut USER/eff eff - USER-EWALDN Ewald for 1/R^n Pieter in' t Veld (BASF) kspace_style - - - USER-MOLFILE VMD molfile plug-ins Axel Kohlmeyer (Temple U) dump molfile - - lib/molfile -USER-OMP OpenMP threaded styles Axel Kohlmeyer (Temple U) Section accelerate - - - +USER-OMP OpenMP threaded styles Axel Kohlmeyer (Temple U) Section accelerate - - - USER-REAXC C version of ReaxFF Metin Aktulga (LBNL) pair_style reaxc reax - - USER-SPH smoothed particle hydrodynamics Georg Ganzenmuller (EMI) userguide.pdf USER/sph sph - @@ -304,7 +304,7 @@ GPUs.

    See this section of the manual to get started:

    -

    Section_accelerate +

    Section_accelerate

    There are example scripts for using this package in examples/USER/cuda. @@ -403,7 +403,7 @@ styles, and fix styles.

    See this section of the manual to get started:

    -

    Section_accelerate +

    Section_accelerate

    The person who created this package is Axel Kohlmeyer at Temple U (akohlmey at gmail.com). Contact him directly if you have questions. diff --git a/doc/Section_packages.txt b/doc/Section_packages.txt index e3575fd5f7..51e0a33f90 100644 --- a/doc/Section_packages.txt +++ b/doc/Section_packages.txt @@ -44,7 +44,7 @@ CLASS2, class 2 force fields, -, "pair_style lj/class2"_pair_class2.html, -, - COLLOID, colloidal particles, -, "atom_style colloid"_atom_style.html, colloid, - DIPOLE, point dipole particles, -, "pair_style dipole/cut"_pair_dipole.html, dipole, - FLD, Fast Lubrication Dynamics, Kumar & Bybee & Higdon (1), "pair_style lubricateU"_pair_lubricateU.html, -, - -GPU, GPU-enabled potentials, Mike Brown (ORNL), "Section accelerate"_Section_accelerate.html#acc_3, gpu, lib/gpu +GPU, GPU-enabled potentials, Mike Brown (ORNL), "Section accelerate"_Section_accelerate.html#acc_6, gpu, lib/gpu GRANULAR, granular systems, -, "Section_howto"_Section_howto.html#howto_6, pour, - KIM, openKIM potentials, Smirichinski & Elliot & Tadmor (3), "pair_style kim"_pair_kim.html, kim, lib/kim KSPACE, long-range Coulombic solvers, -, "kspace_style"_kspace_style.html, peptide, - @@ -52,7 +52,7 @@ MANYBODY, many-body potentials, -, "pair_style tersoff"_pair_tersoff.html, shear MEAM, modified EAM potential, Greg Wagner (Sandia), "pair_style meam"_pair_meam.html, meam, lib/meam MC, Monte Carlo options, -, "fix gcmc"_fix_gcmc.html, -, - MOLECULE, molecular system force fields, -, "Section_howto"_Section_howto.html#howto_3, peptide, - -OPT, optimized pair potentials, Fischer & Richie & Natoli (2), "Section accelerate"_Section_accelerate.html#acc_1, -, - +OPT, optimized pair potentials, Fischer & Richie & Natoli (2), "Section accelerate"_Section_accelerate.html#acc_4, -, - PERI, Peridynamics models, Mike Parks (Sandia), "pair_style peri"_pair_peri.html, peri, - POEMS, coupled rigid body motion, Rudra Mukherjee (JPL), "fix poems"_fix_poems.html, rigid, lib/poems REAX, ReaxFF potential, 
Aidan Thompson (Sandia), "pair_style reax"_pair_reax.html, reax, lib/reax @@ -98,11 +98,11 @@ USER-ATC, atom-to-continuum coupling, Jones & Templeton & Zimmerman (2), "fix at USER-AWPMD, wave-packet MD, Ilya Valuev (JIHT), "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, -, lib/awpmd USER-CG-CMM, coarse-graining model, Axel Kohlmeyer (Temple U), "pair_style lj/sdk"_pair_sdk.html, USER/cg-cmm, "cg"_cg, - USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (3), "fix colvars"_fix_colvars.html, USER/colvars, "colvars"_colvars, lib/colvars -USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_4, USER/cuda, -, lib/cuda +USER-CUDA, NVIDIA GPU styles, Christian Trott (U Tech Ilmenau), "Section accelerate"_Section_accelerate.html#acc_7, USER/cuda, -, lib/cuda USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, - USER-EWALDN, Ewald for 1/R^n, Pieter in' t Veld (BASF), "kspace_style"_kspace_style.html, -, -, - USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, lib/molfile -USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section accelerate"_Section_accelerate.html#acc_2, -, -, - +USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section accelerate"_Section_accelerate.html#acc_5, -, -, - USER-REAXC, C version of ReaxFF, Metin Aktulga (LBNL), "pair_style reaxc"_pair_reax_c.html, reax, -, - USER-SPH, smoothed particle hydrodynamics, Georg Ganzenmuller (EMI), "userguide.pdf"_USER/sph/SPH_LAMMPS_userguide.pdf, USER/sph, "sph"_sph, - :tb(ea=c) @@ -291,7 +291,7 @@ GPUs. See this section of the manual to get started: -"Section_accelerate"_Section_accelerate.html#acc_4 +"Section_accelerate"_Section_accelerate.html#acc_7 There are example scripts for using this package in examples/USER/cuda. @@ -390,7 +390,7 @@ styles, and fix styles. 
See this section of the manual to get started: -"Section_accelerate"_Section_accelerate.html#acc_2 +"Section_accelerate"_Section_accelerate.html#acc_5 The person who created this package is Axel Kohlmeyer at Temple U (akohlmey at gmail.com). Contact him directly if you have questions. diff --git a/doc/compute.html b/doc/compute.html index 4e3d6ffee0..879e6fa8c5 100644 --- a/doc/compute.html +++ b/doc/compute.html @@ -181,6 +181,7 @@ available in LAMMPS:

  • displace/atom - displacement of each atom
  • erotate/asphere - rotational energy of aspherical particles
  • erotate/sphere - rotational energy of spherical particles +
  • erotate/sphere/atom - rotational energy for each spherical particle
  • event/displace - detect event on atom displacement
  • group/group - energy/force between two groups of atoms
  • gyration - radius of gyration of group of atoms diff --git a/doc/compute.txt b/doc/compute.txt index 2c53f5f4e6..02ca64a298 100644 --- a/doc/compute.txt +++ b/doc/compute.txt @@ -176,6 +176,7 @@ available in LAMMPS: "displace/atom"_compute_displace_atom.html - displacement of each atom "erotate/asphere"_compute_erotate_asphere.html - rotational energy of aspherical particles "erotate/sphere"_compute_erotate_sphere.html - rotational energy of spherical particles +"erotate/sphere/atom"_compute_erotate_sphere_atom.html - rotational energy for each spherical particle "event/displace"_compute_event_displace.html - detect event on atom displacement "group/group"_compute_group_group.html - energy/force between two groups of atoms "gyration"_compute_gyration.html - radius of gyration of group of atoms