diff --git a/doc/Section_commands.html b/doc/Section_commands.html index ca64e36c5d..74ed768f3c 100644 --- a/doc/Section_commands.html +++ b/doc/Section_commands.html @@ -311,7 +311,7 @@ included when LAMMPS was built. Not all packages are included in a default LAMMPS build. These dependencies are listed as Restrictions in the command's documentation.

-
+
@@ -335,7 +335,7 @@ in the command's documentation.

See the fix command for one-line descriptions of each style or click on the style itself for a full description:

-
angle_coeff, angle_style, atom_modify, atom_style, bond_coeff, bond_style
boundary, change_box, clear, communicate, compute, compute_modify
create_atoms, create_box, delete_atoms, delete_bonds, dielectric, dihedral_coeff
+
@@ -351,7 +351,7 @@ of each style or click on the style itself for a full description:

These are fix styles contributed by users, which can be used if LAMMPS is built with the appropriate package.

-
adapt, addforce, aveforce, ave/atom, ave/correlate, ave/histo, ave/spatial, ave/time
bond/break, bond/create, bond/swap, box/relax, deform, deposit, drag, dt/reset
efield, enforce2d, evaporate, external, freeze, gravity, heat, indent
+
atc, imd, langevin/eff, nph/eff, npt/eff, nve/eff
nvt/eff, nvt/sllod/eff, qeq/reax, smd, temp/rescale/eff
@@ -363,7 +363,7 @@ of each style or click on the style itself for a full description:

See the compute command for one-line descriptions of each style or click on the style itself for a full description:

-
+
@@ -377,7 +377,7 @@ each style or click on the style itself for a full description:

These are compute styles contributed by users, which can be used if LAMMPS is built with the appropriate package.

-
angle/local, atom/molecule, bond/local, centro/atom, cna/atom, com
com/molecule, coord/atom, damage/atom, dihedral/local, displace/atom, erotate/asphere
erotate/sphere, event/displace, group/group, gyration, gyration/molecule, heat/flux
+
ackland/atom, ke/eff, ke/atom/eff, temp/eff, temp/deform/eff, temp/region/eff
@@ -388,7 +388,7 @@ each style or click on the style itself for a full description:

See the pair_style command for an overview of pair potentials. Click on the style itself for a full description:

-
+
@@ -400,20 +400,22 @@ potentials. Click on the style itself for a full description: - - - - - - + + + + + +
none, hybrid, hybrid/overlay, airebo
born, born/coul/long, buck, buck/coul/cut
buck/coul/long, colloid, comb, coul/cut
hbond/dreiding/morse, lj/charmm/coul/charmm, lj/charmm/coul/charmm/implicit, lj/charmm/coul/long
lj/charmm/coul/long/opt, lj/class2, lj/class2/coul/cut, lj/class2/coul/long
lj/cut, lj/cut/gpu, lj/cut/opt, lj/cut/coul/cut
lj/cut/coul/debye, lj/cut/coul/long, lj/cut/coul/long/tip4p, lj/expand
lj/gromacs, lj/gromacs/coul/gromacs, lj/smooth, lj96/cut
lubricate, meam, morse, morse/opt
peri/lps, peri/pmb, reax, resquared
soft, sw, table, tersoff
tersoff/zbl, yukawa, yukawa/colloid +
lj/cut/coul/cut/gpu, lj/cut/coul/debye, lj/cut/coul/long, lj/cut/coul/long/gpu
lj/cut/coul/long/tip4p, lj/expand, lj/gromacs, lj/gromacs/coul/gromacs
lj/smooth, lj96/cut, lj96/cut/gpu, lubricate
meam, morse, morse/opt, peri/lps
peri/pmb, reax, resquared, soft
sw, table, tersoff, tersoff/zbl
yukawa, yukawa/colloid

These are pair styles contributed by users, which can be used if LAMMPS is built with the appropriate package.

-
- -
buck/coul, cg/cmm, cg/cmm/coul/cut, cg/cmm/coul/long
eam/cd, eff/cut, lj/coul, reax/c +
@@ -423,7 +425,7 @@ potentials. Click on the style itself for a full description:

See the bond_style command for an overview of bond potentials. Click on the style itself for a full description:

-
+
none, hybrid, class2, fene
fene/expand, harmonic, morse, nonlinear
quartic, table @@ -436,7 +438,7 @@ potentials. Click on the style itself for a full description:

See the angle_style command for an overview of angle potentials. Click on the style itself for a full description:

-
+
none, hybrid, charmm, class2
cosine, cosine/delta, cosine/periodic, cosine/squared
harmonic, table @@ -445,7 +447,7 @@ angle potentials. Click on the style itself for a full description:

These are angle styles contributed by users, which can be used if LAMMPS is built with the appropriate package.

-
+
cg/cmm
@@ -457,7 +459,7 @@ angle potentials. Click on the style itself for a full description: of dihedral potentials. Click on the style itself for a full description:

- @@ -470,7 +472,7 @@ description: of improper potentials. Click on the style itself for a full description:

- @@ -482,14 +484,14 @@ description:

See the kspace_style command for an overview of Kspace solvers. Click on the style itself for a full description:

-

These are Kspace solvers contributed by users, which can be used if LAMMPS is built with the appropriate package.

- diff --git a/doc/Section_commands.txt b/doc/Section_commands.txt index 2c0de9615f..4a8d83d524 100644 --- a/doc/Section_commands.txt +++ b/doc/Section_commands.txt @@ -605,14 +605,17 @@ potentials. Click on the style itself for a full description: "lj/cut/gpu"_pair_lj.html, "lj/cut/opt"_pair_lj.html, "lj/cut/coul/cut"_pair_lj.html, +"lj/cut/coul/cut/gpu"_pair_lj.html, "lj/cut/coul/debye"_pair_lj.html, "lj/cut/coul/long"_pair_lj.html, +"lj/cut/coul/long/gpu"_pair_lj.html, "lj/cut/coul/long/tip4p"_pair_lj.html, "lj/expand"_pair_lj_expand.html, "lj/gromacs"_pair_gromacs.html, "lj/gromacs/coul/gromacs"_pair_gromacs.html, "lj/smooth"_pair_lj_smooth.html, "lj96/cut"_pair_lj96_cut.html, +"lj96/cut/gpu"_pair_lj96_cut.html, "lubricate"_pair_lubricate.html, "meam"_pair_meam.html, "morse"_pair_morse.html, @@ -634,8 +637,10 @@ These are pair styles contributed by users, which can be used if "buck/coul"_pair_buck_coul.html, "cg/cmm"_pair_cmm.html, +"cg/cmm/gpu"_pair_cmm.html, "cg/cmm/coul/cut"_pair_cmm.html, "cg/cmm/coul/long"_pair_cmm.html, +"cg/cmm/coul/long/gpu"_pair_cmm.html, "eam/cd"_pair_eam.html, "eff/cut"_pair_eff.html, "lj/coul"_pair_lj_coul.html, diff --git a/doc/Section_start.html b/doc/Section_start.html index 41a45d5170..4be3c89537 100644 --- a/doc/Section_start.html +++ b/doc/Section_start.html @@ -403,9 +403,9 @@ LAMMPS is built. the files in these packages require other packages to also be included. If this is not the case, then those subsidiary files in "gpu" and "opt" will not be installed either. To install all the -files in package "gpu", the "asphere" package must also be installed. -To install all the files in package "opt", the "kspace" and "manybody" -packages must also be installed. +files in package "gpu", the "asphere" and "kspace" packages must also be +installed. To install all the files in package "opt", the "kspace" and +"manybody" packages must also be installed.

You may wish to exclude certain packages if you will never run certain kinds of simulations. This will keep you from having to build @@ -909,53 +909,141 @@ certain NVIDIA CUDA software on your system:

  • Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
  • Go to http://www.nvidia.com/object/cuda_get.html -
  • Install a driver and toolkit appopriate for your system (SDK is not necessary) -
  • Run make in lammps/lib/gpu, editing a Makefile if necessary +
  • Install a driver and toolkit appropriate for your system (SDK is not necessary) +
  • Follow the instructions in README in lammps/lib/gpu to build the library.
  • Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties
-

GPU hardware +

GPU configuration

When using GPUs, you are restricted to one physical GPU per LAMMPS -process. This can be multiple GPUs on a single node or across -multiple nodes. For each GPU pair style, the first two arguments (GPU -mode followed by GPU ID) control how GPUs are selected. If you are -running on a single node, the mode is "one/node" and the parameter is -the ID of the first GPU to select: +process. Multiple processes can share a single GPU and in many cases it +will be more efficient to run with multiple processes per GPU. Any GPU +accelerated style requires that fix gpu be used in the +input script to select and initialize the GPUs. The format for the fix +is:

-
pair_style lj/cut/gpu one/node 0 2.5 
+
fix name all gpu mode first last split 
 
-

The ID is the GPU ID reported by the driver for CUDA enabled graphics -cards. For multiple GPU cards on a node, an MPI process should be run -for each graphics card. In this case, each process will grab the GPU -with ID equal to the process rank plus the GPU parameter. +

where name is the name for the fix. The gpu fix must be the first +fix specified for a given run; otherwise, the program will exit +with an error. The gpu fix will not have any effect on runs +that do not use GPU acceleration; there should be no problem +with specifying the fix first in any input script.
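For example, a minimal ordering sketch (the fix IDs and the follow-on nve integrator are illustrative placeholders) declares the gpu fix before any other fix:

fix 0 all gpu force/neigh 0 0 1.0
fix 1 all nve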

-

For multiple nodes with one GPU per node, the mode is "one/gpu" and -the parameter is the ID of the GPU used on every node: +

mode can be either "force" or "force/neigh". In the former, +neighbor list calculation is performed on the CPU using the +standard LAMMPS routines. In the latter, the neighbor list +calculation is performed on the GPU. The GPU neighbor list +can be used for better performance; however, it +should not be used with a triclinic box.
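As a sketch of the two modes (fix ID and group are placeholders), only the mode argument differs:

fix 0 all gpu force 0 0 1.0        # neighbor lists built with the standard CPU routines
fix 0 all gpu force/neigh 0 0 1.0  # neighbor lists built on the GPU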

-
pair_style lj/cut/gpu one/gpu 1 2.5 
+

There are cases when it might be more efficient to select the CPU for neighbor +list builds. If a non-GPU enabled style requires a neighbor list, it will also +be built using CPU routines. Redundant CPU and GPU neighbor list calculations +will typically be less efficient. For hybrid pair +styles, GPU calculated neighbor lists might be less efficient because +no particles will be skipped in a given neighbor list. +

+

first is the ID (as reported by lammps/lib/gpu/nvc_get_devices) +of the first GPU that will be used on each node. last is the +ID of the last GPU that will be used on each node. If you have +only one GPU per node, first and last will typically both be +0. Selecting a non-sequential set of GPU IDs (e.g. 0,1,3) +is not currently supported. +

+

split is the fraction of particles whose forces, torques, +energies, and/or virials will be calculated on the GPU. This +can be used to perform CPU and GPU force calculations +simultaneously. If split is negative, the software will +attempt to calculate the optimal fraction automatically +every 25 timesteps based on CPU and GPU timings. Because the GPU speedups +are dependent on the number of particles, automatic calculation of the +split can be less efficient, but typically results in loop times +within 20% of an optimal fixed split. +
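For example (the values are illustrative, matching the forms shown in the fix gpu examples), a fixed split versus an automatic split on a single GPU per node:

fix 0 all gpu force/neigh 0 0 0.75   # ~75% of the particles are handled on the GPU
fix 0 all gpu force/neigh 0 0 -1.0   # split chosen automatically from CPU/GPU timings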

+

If you have two GPUs per node, 8 CPU cores per node, and +would like to run on 4 nodes with dynamic balancing of +force calculation across CPU and GPU cores, the fix +might be +

+
fix 0 all gpu force/neigh 0 1 -1 
 
-

In this case, MPI should be run with exactly one process per node. +

with LAMMPS run on 32 processes. In this case, all +CPU cores and GPU devices on the nodes would be utilized. +Each GPU device would be shared by 4 CPU cores. The +CPU cores would perform force calculations for some +fraction of the particles at the same time the GPUs +performed force calculation for the other particles.
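A possible launch line for this case is sketched below; it assumes Open MPI syntax, an executable named lmp_linux, and an input file named in.script, and the flag that sets processes per node differs between MPI implementations:

mpirun -np 32 -npernode 8 lmp_linux -in in.script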

-

For multiple nodes with multiple GPUs, the mode is "multi/gpu" and the -parameter is the number of GPUs per node: +

Because of the large number of cores on each GPU +device, it might be more efficient to run on fewer +processes per GPU when the number of particles per process +is small (100's of particles); this can be necessary +to keep the GPU cores busy.

-
pair_style lj/cut/gpu multi/gpu 3 2.5 
-
-

In this case, LAMMPS will attempt to grab 3 GPUs per node and this -requires that the number of processes per node be 3. The first GPU -selected must have ID zero for this mode (in the example, GPUs 0, 1, -and 2 will be selected on every node). An additional constraint is -that the MPI processes must be filled by slot on each node such that -the process ranks on each node are always sequential. This is a option -for the MPI launcher (mpirun/mpiexec) and will be the default on many -clusters. +

GPU input script +

+

To use GPU acceleration in LAMMPS, +fix_gpu +should be used to initialize and configure the +GPUs. Additionally, GPU-enabled styles must be +selected in the input script. Currently, +this is limited to a few pair styles. +Some GPU-enabled styles have additional restrictions +listed in their documentation. +
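A minimal input-script fragment illustrating both requirements is sketched below; the fix ID, cutoff, and coefficients are placeholders, and the rest of the system setup is assumed to exist elsewhere in the script:

newton          off
fix             0 all gpu force/neigh 0 0 1.0
pair_style      lj/cut/gpu 2.5
pair_coeff      * * 1.0 1.0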

+

GPU asynchronous pair computation +

+

The GPU accelerated pair styles can be used to perform +pair style force calculation on the GPU while other +calculations are +performed on the CPU. One method to do this is to specify +a split in the gpu fix as described above. In this case, +force calculation for the pair style will also be performed +on the CPU. +

+

When the CPU work in a GPU pair style has finished, +the next force computation will begin, possibly before the +GPU has finished. If split is 1.0 in the gpu fix, the next +force computation will begin almost immediately. This can +be used to run a hybrid GPU pair style at +the same time as a hybrid CPU pair style. In this case, the +GPU pair style should be first in the hybrid command in order to +perform simultaneous calculations. This also +allows bond, angle, +dihedral, improper, +and long-range force +computations to be run simultaneously with the GPU pair style. +Once all CPU force computations have completed, the gpu fix +will block until the GPU has finished all work before continuing +the run. +
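As a hedged illustration of that ordering (the two atom types and coefficients are invented), the GPU sub-style is listed first in the hybrid command so its force work can overlap the CPU sub-style:

pair_style      hybrid lj/cut/gpu 2.5 lj/cut 2.5
pair_coeff      1 1 lj/cut/gpu 1.0 1.0
pair_coeff      2 2 lj/cut 1.0 1.0
pair_coeff      1 2 lj/cut 1.0 1.0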

+

GPU timing +

+

GPU accelerated pair styles can perform computations asynchronously +with CPU computations. The "Pair" time reported by LAMMPS +will be the maximum of the time required to complete the CPU +pair style computations and the time required to complete the GPU +pair style computations. Any time spent for GPU-enabled pair styles +for computations that run simultaneously with bond, +angle, dihedral, +improper, and long-range calculations +will not be included in the "Pair" time. +

+

When mode for the gpu fix is force/neigh, +the time for neighbor list calculations on the GPU will be added +into the "Pair" time, not the "Neigh" time. A breakdown of the +times required for various tasks on the GPU (data copy, neighbor +calculations, force computations, etc.) is output only +with the LAMMPS screen output at the end of each run. These timings represent +total time spent on the GPU for each routine, regardless of asynchronous +CPU calculations.

GPU single vs double precision

See the lammps/lib/gpu/README file for instructions on how to build -the LAMMPS gpu library for single vs double precision. The latter -requires that your GPU card supports double precision. The lj/cut/gpu -pair style does not support double precision. +the LAMMPS gpu library for single, mixed, and double precision. The latter +requires that your GPU card supports double precision.


diff --git a/doc/Section_start.txt b/doc/Section_start.txt index 3fc685cd25..ae747ded66 100644 --- a/doc/Section_start.txt +++ b/doc/Section_start.txt @@ -396,9 +396,9 @@ The two exceptions to this are the "gpu" and "opt" packages. Some of the files in these packages require other packages to also be included. If this is not the case, then those subsidiary files in "gpu" and "opt" will not be installed either. To install all the -files in package "gpu", the "asphere" package must also be installed. -To install all the files in package "opt", the "kspace" and "manybody" -packages must also be installed. +files in package "gpu", the "asphere" and "kspace" packages must also be +installed. To install all the files in package "opt", the "kspace" and +"manybody" packages must also be installed. You may wish to exclude certain packages if you will never run certain kinds of simulations. This will keep you from having to build @@ -899,53 +899,141 @@ certain NVIDIA CUDA software on your system: Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0 Go to http://www.nvidia.com/object/cuda_get.html -Install a driver and toolkit appopriate for your system (SDK is not necessary) -Run make in lammps/lib/gpu, editing a Makefile if necessary +Install a driver and toolkit appropriate for your system (SDK is not necessary) +Follow the instructions in README in lammps/lib/gpu to build the library. Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties :ul -GPU hardware :h4 +GPU configuration :h4 When using GPUs, you are restricted to one physical GPU per LAMMPS -process. This can be multiple GPUs on a single node or across -multiple nodes. For each GPU pair style, the first two arguments (GPU -mode followed by GPU ID) control how GPUs are selected. If you are -running on a single node, the mode is "one/node" and the parameter is -the ID of the first GPU to select: +process. Multiple processes can share a single GPU and in many cases it +will be more efficient to run with multiple processes per GPU. Any GPU +accelerated style requires that "fix gpu"_fix_gpu.html be used in the +input script to select and initialize the GPUs. The format for the fix +is: -pair_style lj/cut/gpu one/node 0 2.5 :pre +fix {name} all gpu {mode} {first} {last} {split} :pre -The ID is the GPU ID reported by the driver for CUDA enabled graphics -cards. For multiple GPU cards on a node, an MPI process should be run -for each graphics card. In this case, each process will grab the GPU -with ID equal to the process rank plus the GPU parameter. +where {name} is the name for the fix. The gpu fix must be the first +fix specified for a given run, otherwise the program will exit +with an error. The gpu fix will not have any effect on runs +that do not use GPU acceleration; there should be no problem +with specifying the fix first in any input script. -For multiple nodes with one GPU per node, the mode is "one/gpu" and -the parameter is the ID of the GPU used on every node: +{mode} can be either "force" or "force/neigh". In the former, +neighbor list calculation is performed on the CPU using the +standard LAMMPS routines. In the latter, the neighbor list +calculation is performed on the GPU. The GPU neighbor list +can be used for better performance, however, it +should not be used with a triclinic box. -pair_style lj/cut/gpu one/gpu 1 2.5 :pre +There are cases when it might be more efficient to select the CPU for neighbor +list builds. 
If a non-GPU enabled style requires a neighbor list, it will also +be built using CPU routines. Redundant CPU and GPU neighbor list calculations +will typically be less efficient. For "hybrid"_pair_hybrid.html pair +styles, GPU calculated neighbor lists might be less efficient because +no particles will be skipped in a given neighbor list. -In this case, MPI should be run with exactly one process per node. +{first} is the ID (as reported by lammps/lib/gpu/nvc_get_devices) +of the first GPU that will be used on each node. {last} is the +ID of the last GPU that will be used on each node. If you have +only one GPU per node, {first} and {last} will typically both be +0. Selecting a non-sequential set of GPU IDs (e.g. 0,1,3) +is not currently supported. -For multiple nodes with multiple GPUs, the mode is "multi/gpu" and the -parameter is the number of GPUs per node: +{split} is the fraction of particles whose forces, torques, +energies, and/or virials will be calculated on the GPU. This +can be used to perform CPU and GPU force calculations +simultaneously. If {split} is negative, the software will +attempt to calculate the optimal fraction automatically +every 25 timesteps based on CPU and GPU timings. Because the GPU speedups +are dependent on the number of particles, automatic calculation of the +split can be less efficient, but typically results in loop times +within 20% of an optimal fixed split. -pair_style lj/cut/gpu multi/gpu 3 2.5 :pre +If you have two GPUs per node, 8 CPU cores per node, and +would like to run on 4 nodes with dynamic balancing of +force calculation across CPU and GPU cores, the fix +might be -In this case, LAMMPS will attempt to grab 3 GPUs per node and this -requires that the number of processes per node be 3. The first GPU -selected must have ID zero for this mode (in the example, GPUs 0, 1, -and 2 will be selected on every node). An additional constraint is -that the MPI processes must be filled by slot on each node such that -the process ranks on each node are always sequential. This is a option -for the MPI launcher (mpirun/mpiexec) and will be the default on many -clusters. +fix 0 all gpu force/neigh 0 1 -1 :pre + +with LAMMPS run on 32 processes. In this case, all +CPU cores and GPU devices on the nodes would be utilized. +Each GPU device would be shared by 4 CPU cores. The +CPU cores would perform force calculations for some +fraction of the particles at the same time the GPUs +performed force calculation for the other particles. + +Because of the large number of cores on each GPU +device, it might be more efficient to run on fewer +processes per GPU when the number of particles per process +is small (100's of particles); this can be necessary +to keep the GPU cores busy. + +GPU input script :h4 + +In order to use GPU acceleration in LAMMPS, +"fix_gpu"_fix_gpu.html +should be used in order to initialize and configure the +GPUs for use. Additionally, GPU enabled styles must be +selected in the input script. Currently, +this is limited to a few "pair styles"_pair_style.html. +Some GPU-enabled styles have additional restrictions +listed in their documentation. + +GPU asynchronous pair computation :h4 + +The GPU accelerated pair styles can be used to perform +pair style force calculation on the GPU while other +calculations are +performed on the CPU. One method to do this is to specify +a {split} in the gpu fix as described above. In this case, +force calculation for the pair style will also be performed +on the CPU. 
+ +When the CPU work in a GPU pair style has finished, +the next force computation will begin, possibly before the +GPU has finished. If {split} is 1.0 in the gpu fix, the next +force computation will begin almost immediately. This can +be used to run a "hybrid"_pair_hybrid.html GPU pair style at +the same time as a hybrid CPU pair style. In this case, the +GPU pair style should be first in the hybrid command in order to +perform simultaneous calculations. This also +allows "bond"_bond_style.html, "angle"_angle_style.html, +"dihedral"_dihedral_style.html, "improper"_improper_style.html, +and "long-range"_kspace_style.html force +computations to be run simultaneously with the GPU pair style. +Once all CPU force computations have completed, the gpu fix +will block until the GPU has finished all work before continuing +the run. + +GPU timing :h4 + +GPU accelerated pair styles can perform computations asynchronously +with CPU computations. The "Pair" time reported by LAMMPS +will be the maximum of the time required to complete the CPU +pair style computations and the time required to complete the GPU +pair style computations. Any time spent for GPU-enabled pair styles +for computations that run simultaneously with "bond"_bond_style.html, +"angle"_angle_style.html, "dihedral"_dihedral_style.html, +"improper"_improper_style.html, and "long-range"_kspace_style.html calculations +will not be included in the "Pair" time. + +When {mode} for the gpu fix is force/neigh, +the time for neighbor list calculations on the GPU will be added +into the "Pair" time, not the "Neigh" time. A breakdown of the +times required for various tasks on the GPU (data copy, neighbor +calculations, force computations, etc.) are output only +with the LAMMPS screen output at the end of each run. These timings represent +total time spent on the GPU for each routine, regardless of asynchronous +CPU calculations. GPU single vs double precision :h4 See the lammps/lib/gpu/README file for instructions on how to build -the LAMMPS gpu library for single vs double precision. The latter -requires that your GPU card supports double precision. The lj/cut/gpu -pair style does not support double precision. +the LAMMPS gpu library for single, mixed, and double precision. The latter +requires that your GPU card supports double precision. :line diff --git a/doc/fix_gpu.html b/doc/fix_gpu.html new file mode 100644 index 0000000000..72839bc0d1 --- /dev/null +++ b/doc/fix_gpu.html @@ -0,0 +1,107 @@ + +
LAMMPS WWW Site - LAMMPS Documentation - LAMMPS Commands +
+ + + + + + +
+ +

fix gpu command +

+

Syntax: +

+
fix ID group-ID gpu mode first last split 
+
+
  • ID, group-ID are documented in fix command + +
  • gpu = style name of this fix command + +
  • mode = force or force/neigh + +
  • first = ID of first GPU to be used on each node + +
  • last = ID of last GPU to be used on each node + +
  • split = fraction of particles assigned to the GPU + + +
+

Examples: +

+
fix 0 all gpu force 0 0 1.0
+fix 0 all gpu force 0 0 0.75
+fix 0 all gpu force/neigh 0 0 1.0
+fix 0 all gpu force/neigh 0 1 -1.0 
+
+

Description: +

+

Select and initialize GPUs to be used for acceleration and configure +GPU acceleration in LAMMPS. This fix is required in order to use +any style with GPU acceleration. The fix must be the first fix +specified for a run or an error will be generated. The fix will not have an +effect on any LAMMPS computations that do not use GPU acceleration, so there +should not be any problems with specifying this fix first in input scripts. +

+

mode specifies where neighbor list calculations will be performed. +If mode is force, neighbor list calculation is performed on the +CPU. If mode is force/neigh, neighbor list calculation is +performed on the GPU. GPU neighbor +list calculation currently cannot be used with a triclinic box. +GPU neighbor lists are not compatible with styles that are not GPU-enabled. +When a non-GPU enabled style requires a neighbor list, it will also be +built using CPU routines. In these cases, it will typically be more efficient +to only use CPU neighbor list builds. For hybrid pair +styles, GPU calculated neighbor lists might be less efficient because +no particles will be skipped in a given neighbor list. +

+

first and last specify the GPUs that will be used for simulation. +On each node, the GPU IDs in the inclusive range from first to last will +be used. +

+

split can be used for load balancing force calculation work between +CPU and GPU cores in GPU-enabled pair styles. If 0<split<1.0, +a fixed fraction of particles is offloaded to the GPU while force calculation +for the other particles occurs simultaneously on the CPU. If split<0, +the optimal fraction (based on CPU and GPU timings) is calculated +every 25 timesteps. If split=1.0, all force calculations for +GPU-accelerated pair styles are performed +on the GPU. In this case, hybrid, +bond, angle, +dihedral, improper, +and long-range calculations can be performed on the CPU +while the GPU is performing force calculations for the GPU-enabled pair +style. +

+

In order to use GPU acceleration, a GPU enabled style must be +selected in the input script in addition to this fix. Currently, +this is limited to a few pair styles. +

+

More details about these settings and various possible hardware +configurations are in this section of the +manual. +

+

Restart, fix_modify, output, run start/stop, minimize info: +

+

No information about this fix is written to binary restart +files. None of the fix_modify options +are relevant to this fix. +

+

No parameter of this fix can be used with the start/stop keywords of +the run command. +

+

Restrictions: +

+

The fix must be the first fix specified for a given run. The force/neigh +mode should not be used with a triclinic box or GPU-enabled pair styles +that need special_bonds settings. +

+

Currently, group-ID must be all. +

+

Related commands: none +

+

Default: none +

+ diff --git a/doc/fix_gpu.txt b/doc/fix_gpu.txt new file mode 100644 index 0000000000..88fa6f5414 --- /dev/null +++ b/doc/fix_gpu.txt @@ -0,0 +1,97 @@ +"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c + +:link(lws,http://lammps.sandia.gov) +:link(ld,Manual.html) +:link(lc,Section_commands.html#comm) + +:line + +fix gpu command :h3 + +[Syntax:] + +fix ID group-ID gpu mode first last split :pre + +ID, group-ID are documented in "fix"_fix.html command :ulb,l +gpu = style name of this fix command :l +mode = force or force/neigh :l +first = ID of first GPU to be used on each node :l +last = ID of last GPU to be used on each node :l +split = fraction of particles assigned to the GPU :l +:ule + +[Examples:] + +fix 0 all gpu force 0 0 1.0 +fix 0 all gpu force 0 0 0.75 +fix 0 all gpu force/neigh 0 0 1.0 +fix 0 all gpu force/neigh 0 1 -1.0 :pre + +[Description:] + +Select and initialize GPUs to be used for acceleration and configure +GPU acceleration in LAMMPS. This fix is required in order to use +any style with GPU acceleration. The fix must be the first fix +specified for a run or an error will be generated. The fix will not have an +effect on any LAMMPS computations that do not use GPU acceleration, so there +should not be any problems with specifying this fix first in input scripts. + +{mode} specifies where neighbor list calculations will be performed. +If {mode} is force, neighbor list calculation is performed on the +CPU. If {mode} is force/neigh, neighbor list calculation is +performed on the GPU. GPU neighbor +list calculation currently cannot be used with a triclinic box. +GPU neighbor lists are not compatible with styles that are not GPU-enabled. +When a non-GPU enabled style requires a neighbor list, it will also be +built using CPU routines. In these cases, it will typically be more efficient +to only use CPU neighbor list builds. For "hybrid"_pair_hybrid.html pair +styles, GPU calculated neighbor lists might be less efficient because +no particles will be skipped in a given neighbor list. + +{first} and {last} specify the GPUs that will be used for simulation. +On each node, the GPU IDs in the inclusive range from {first} to {last} will +be used. + +{split} can be used for load balancing force calculation work between +CPU and GPU cores in GPU-enabled pair styles. If 0<{split}<1.0, +a fixed fraction of particles is offloaded to the GPU while force calculation +for the other particles occurs simulataneously on the CPU. If {split}<0, +the optimal fraction (based on CPU and GPU timings) is calculated +every 25 timesteps. If {split}=1.0, all force calculations for +GPU accelerated pair styles are performed +on the GPU. In this case, "hybrid"_pair_hybrid.html, +"bond"_bond_style.html, "angle"_angle_style.html, +"dihedral"_dihedral_style.html, "improper"_improper_style.html, +and "long-range"_kspace_style.html calculations can be performed on the CPU +while the GPU is performing force calculations for the GPU-enabled pair +style. + +In order to use GPU acceleration, a GPU enabled style must be +selected in the input script in addition to this fix. Currently, +this is limited to a few "pair styles"_pair_style.html. + +More details about these settings and various possible hardware +configuration are in "this section"_Section_start.html#2_8 of the +manual. + +[Restart, fix_modify, output, run start/stop, minimize info:] + +No information about this fix is written to "binary restart +files"_restart.html. 
None of the "fix_modify"_fix_modify.html options +are relevant to this fix. + +No parameter of this fix can be used with the {start/stop} keywords of +the "run"_run.html command. + +[Restrictions:] + +The fix must be the first fix specified for a given run. The force/neigh +{mode} should not be used with a triclinic box or GPU-enabled pair styles +that need "special_bonds"_special_bonds.html settings. + +Currently, group-ID must be all. + +[Related commands:] none + +[Default:] none + diff --git a/doc/pair_cmm.html b/doc/pair_cmm.html index f6033ded25..5f43b6ef63 100644 --- a/doc/pair_cmm.html +++ b/doc/pair_cmm.html @@ -11,19 +11,25 @@

pair_style cg/cmm command

+

pair_style cg/cmm/gpu command +

pair_style cg/cmm/coul/cut command

pair_style cg/cmm/coul/long command

+

pair_style cg/cmm/coul/long/gpu command +

Syntax:

pair_style style args 
 
-
  • style = cg/cmm or cg/cmm/coul/cut or cg/cmm/coul/long +
    • style = cg/cmm or cg/cmm/gpu or cg/cmm/coul/cut or cg/cmm/coul/long or cg/cmm/coul/long/gpu
    • args = list of arguments for a particular style
      cg/cmm args = cutoff
         cutoff = global cutoff for Lennard Jones interactions (distance units)
    +  cg/cmm/gpu args = cutoff
    +    cutoff = global cutoff for Lennard Jones interactions (distance units)
       cg/cmm/coul/cut args = cutoff (cutoff2) (kappa)
         cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
         cutoff2 = global cutoff for Coulombic (optional) (distance units)
    @@ -32,6 +38,10 @@
         cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
         cutoff2 = global cutoff for Coulombic (optional) (distance units) 
     
    +
      cg/cmm/coul/long/gpu args = cutoff (cutoff2)
    +    cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
    +    cutoff2 = global cutoff for Coulombic (optional) (distance units) 
    +

    Examples:

    pair_style cg/cmm 2.5
    @@ -55,6 +65,9 @@ given by
     

    as required for the CMM Coarse-grained MD parametrization discussed in (Shinoda) and (DeVane). Rc is the cutoff.

    +

    Style cg/cmm/gpu is a GPU-enabled version of style cg/cmm. +See more details below. +

    Style cg/cmm/coul/cut adds a Coulombic pairwise interaction given by

    @@ -83,6 +96,9 @@ option. The Coulombic cutoff specified for this style means that pairwise interactions within this distance are computed directly; interactions outside that distance are computed in reciprocal space.

    +

    Style cg/cmm/coul/long/gpu is a GPU-enabled version of style cg/cmm/coul/long. +See more details below. +

    The following coefficients must be defined for each pair of atoms types via the pair_coeff command as in the examples above, or in the data file or restart files read by the @@ -113,6 +129,27 @@ pair_style command.


    +

    The cg/cmm/gpu and cg/cmm/coul/long/gpu styles +are identical to the cg/cmm and cg/cmm/coul/long +styles, except that each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a +speed-up. See the Running on GPUs section of +the manual for more details about hardware and software requirements +for using GPUs. +

    +

More details about these settings and various possible hardware +configurations are in this section of the +manual. +

    +

    Additional requirements in your input script to run with GPU-enabled styles +are as follows: +

    +

    The newton pair setting must be off and +fix gpu must be used. The fix controls the +essential GPU selection and initialization steps. +
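A short fragment combining these requirements with the new style is sketched below (the fix ID is a placeholder, the cutoff follows the example above, and the pair_coeff settings for the CG system are assumed to be given elsewhere):

newton          off
fix             0 all gpu force/neigh 0 0 1.0
pair_style      cg/cmm/gpu 2.5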

    +
    +

    Mixing, shift, table, tail correction, restart, and rRESPA info:

    For atom type pairs I,J and I != J, the epsilon and sigma coefficients diff --git a/doc/pair_cmm.txt b/doc/pair_cmm.txt index 7ebd3141fc..ec5884c9a5 100644 --- a/doc/pair_cmm.txt +++ b/doc/pair_cmm.txt @@ -7,17 +7,21 @@ :line pair_style cg/cmm command :h3 +pair_style cg/cmm/gpu command :h3 pair_style cg/cmm/coul/cut command :h3 pair_style cg/cmm/coul/long command :h3 +pair_style cg/cmm/coul/long/gpu command :h3 [Syntax:] pair_style style args :pre -style = {cg/cmm} or {cg/cmm/coul/cut} or {cg/cmm/coul/long} +style = {cg/cmm} or {cg/cmm/gpu} or {cg/cmm/coul/cut} or {cg/cmm/coul/long} or {cg/cmm/coul/long/gpu} args = list of arguments for a particular style :ul {cg/cmm} args = cutoff cutoff = global cutoff for Lennard Jones interactions (distance units) + {cg/cmm/gpu} args = cutoff + cutoff = global cutoff for Lennard Jones interactions (distance units) {cg/cmm/coul/cut} args = cutoff (cutoff2) (kappa) cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) cutoff2 = global cutoff for Coulombic (optional) (distance units) @@ -25,6 +29,9 @@ args = list of arguments for a particular style :ul {cg/cmm/coul/long} args = cutoff (cutoff2) cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) cutoff2 = global cutoff for Coulombic (optional) (distance units) :pre + {cg/cmm/coul/long/gpu} args = cutoff (cutoff2) + cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) + cutoff2 = global cutoff for Coulombic (optional) (distance units) :pre [Examples:] @@ -49,6 +56,9 @@ given by as required for the CMM Coarse-grained MD parametrization discussed in "(Shinoda)"_#Shinoda and "(DeVane)"_#DeVane. Rc is the cutoff. +Style {cg/cmm/gpu} is a GPU-enabled version of style {cg/cmm}. +See more details below. + Style {cg/cmm/coul/cut} adds a Coulombic pairwise interaction given by :c,image(Eqs/pair_coulomb.jpg) @@ -77,6 +87,9 @@ option. The Coulombic cutoff specified for this style means that pairwise interactions within this distance are computed directly; interactions outside that distance are computed in reciprocal space. +Style {cg/cmm/coul/long/gpu} is a GPU-enabled version of style {cg/cmm/coul/long}. +See more details below. + The following coefficients must be defined for each pair of atoms types via the "pair_coeff"_pair_coeff.html command as in the examples above, or in the data file or restart files read by the @@ -107,6 +120,27 @@ pair_style command. :line +The {cg/cmm/gpu} and {cg/cmm/coul/long/gpu} styles +are identical to the {cg/cmm} and {cg/cmm/coul/long} +styles, except that each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a +speed-up. See the "Running on GPUs"_Section_start.html#2_8 section of +the manual for more details about hardware and software requirements +for using GPUs. + +More details about these settings and various possible hardware +configuration are in "this section"_Section_start.html#2_8 of the +manual. + +Additional requirements in your input script to run with GPU-enabled styles +are as follows: + +The "newton pair"_newton.html setting must be {off} and +"fix gpu"_fix_gpu.html must be used. The fix controls the +essential GPU selection and initialization steps. 
+ +:line + [Mixing, shift, table, tail correction, restart, and rRESPA info]: For atom type pairs I,J and I != J, the epsilon and sigma coefficients diff --git a/doc/pair_gayberne.html b/doc/pair_gayberne.html index a9cbd492d1..a7533a42ca 100644 --- a/doc/pair_gayberne.html +++ b/doc/pair_gayberne.html @@ -17,11 +17,9 @@

    pair_style gayberne gamma upsilon mu cutoff 
     
    -
    pair_style gayberne/gpu gpuflag gpunum gamma upsilon mu cutoff 
    +
    pair_style gayberne/gpu gamma upsilon mu cutoff 
     
    • style = gayberne or gayberne/gpu -
    • gpumode = one/node or one/gpu or multi/gpu, only used with gayberne/gpu -
    • gpuID = ID or number of GPUs, only used with gayberne/gpu
    • gamma = shift for potential minimum (typically 1)
    • upsilon = exponent for eta orientation-dependent energy function
    • mu = exponent for chi orientation-dependent energy function @@ -30,7 +28,7 @@

      Examples:

      pair_style gayberne 1.0 1.0 1.0 10.0
      -pair_style gayberne/gpu one/node 0 1.0 1.0 1.0 10.0
      +pair_style gayberne/gpu 1.0 1.0 1.0 10.0
       pair_coeff * * 1.0 1.7 1.7 3.4 3.4 1.0 1.0 1.0 
       

      Description: @@ -50,10 +48,8 @@ both particles are spherical, the formula reduces to the usual Lennard-Jones interaction (see details below for when Gay-Berne treats a particle as "spherical").

      -

      Style gayberne/gpu is a GPU-enabled version of style gayberne that -should give identical answers. Depending on system size and the GPU -processor you have on your system, it may be 100x faster (for the -pairwise portion of the run time). See more details below. +

      Style gayberne/gpu is a GPU-enabled version of style gayberne. +See more details below.

      For large uniform molecules it has been shown that the energy parameters are approximately representable in terms of local contact @@ -141,27 +137,11 @@ to specify its interaction with other spherical particles.

      The gayberne/gpu style is identical to the gayberne style, except that each processor off-loads its pairwise calculations to a GPU chip. Depending on the hardware available on your system this can provide a -significant speed-up, espcially for the relatively expensive +significant speed-up, especially for the relatively expensive computations inherent in Gay-Berne interactions. See the Running on GPUs section of the manual for more details about hardware and software requirements for using GPUs.

      -

      The gpumode and gpuID settings in the pair_style command refer to -how the GPUs on your system are configured. -

      -

      Set gpumode to one/node if you have a single compute "node" on -your system, which may have multiple cores and/or GPUs. GpuID -should be set to the ID of the (first) GPU you wish to use with LAMMPS -(another GPU might be driving your display). -

      -

      Set gpumode to one/gpu if you have multiple compute "nodes" on -your system, with one GPU per node. GpuID should be set to the ID -of the GPU. -

      -

      Set gpumode to multi/gpu if you have multiple compute "nodes" on -your system, each with multiple GPUs. GpuID should be set to the -number of GPUs per node. -

      More details about these settings and various possible hardware configuration are in this section of the manual. @@ -169,7 +149,9 @@ manual.

      Additional requirements in your input script to run with style gayberne/gpu are as follows:

      -

      The newton pair setting must be off. +

      The newton pair setting must be off and +fix gpu must be used. The fix controls the +essential GPU selection and initialization steps.
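Putting the pieces together for this style (the fix ID is a placeholder; the pair_style and pair_coeff values follow the examples above):

newton          off
fix             0 all gpu force/neigh 0 0 1.0
pair_style      gayberne/gpu 1.0 1.0 1.0 10.0
pair_coeff      * * 1.0 1.7 1.7 3.4 3.4 1.0 1.0 1.0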


      diff --git a/doc/pair_gayberne.txt b/doc/pair_gayberne.txt index 71618b616b..d2eedd4dc2 100755 --- a/doc/pair_gayberne.txt +++ b/doc/pair_gayberne.txt @@ -12,11 +12,9 @@ pair_style gayberne/gpu command :h3 [Syntax:] pair_style gayberne gamma upsilon mu cutoff :pre -pair_style gayberne/gpu gpuflag gpunum gamma upsilon mu cutoff :pre +pair_style gayberne/gpu gamma upsilon mu cutoff :pre style = {gayberne} or {gayberne/gpu} -gpumode = {one/node} or {one/gpu} or {multi/gpu}, only used with gayberne/gpu -gpuID = ID or number of GPUs, only used with gayberne/gpu gamma = shift for potential minimum (typically 1) upsilon = exponent for eta orientation-dependent energy function mu = exponent for chi orientation-dependent energy function @@ -25,7 +23,7 @@ cutoff = global cutoff for interactions (distance units) :ul [Examples:] pair_style gayberne 1.0 1.0 1.0 10.0 -pair_style gayberne/gpu one/node 0 1.0 1.0 1.0 10.0 +pair_style gayberne/gpu 1.0 1.0 1.0 10.0 pair_coeff * * 1.0 1.7 1.7 3.4 3.4 1.0 1.0 1.0 :pre [Description:] @@ -45,10 +43,8 @@ both particles are spherical, the formula reduces to the usual Lennard-Jones interaction (see details below for when Gay-Berne treats a particle as "spherical"). -Style {gayberne/gpu} is a GPU-enabled version of style {gayberne} that -should give identical answers. Depending on system size and the GPU -processor you have on your system, it may be 100x faster (for the -pairwise portion of the run time). See more details below. +Style {gayberne/gpu} is a GPU-enabled version of style {gayberne}. +See more details below. For large uniform molecules it has been shown that the energy parameters are approximately representable in terms of local contact @@ -136,27 +132,11 @@ to specify its interaction with other spherical particles. The {gayberne/gpu} style is identical to the {gayberne} style, except that each processor off-loads its pairwise calculations to a GPU chip. Depending on the hardware available on your system this can provide a -significant speed-up, espcially for the relatively expensive +significant speed-up, especially for the relatively expensive computations inherent in Gay-Berne interactions. See the "Running on GPUs"_Section_start.html#2_8 section of the manual for more details about hardware and software requirements for using GPUs. -The {gpumode} and {gpuID} settings in the pair_style command refer to -how the GPUs on your system are configured. - -Set {gpumode} to {one/node} if you have a single compute "node" on -your system, which may have multiple cores and/or GPUs. {GpuID} -should be set to the ID of the (first) GPU you wish to use with LAMMPS -(another GPU might be driving your display). - -Set {gpumode} to {one/gpu} if you have multiple compute "nodes" on -your system, with one GPU per node. {GpuID} should be set to the ID -of the GPU. - -Set {gpumode} to {multi/gpu} if you have multiple compute "nodes" on -your system, each with multiple GPUs. {GpuID} should be set to the -number of GPUs per node. - More details about these settings and various possible hardware configuration are in "this section"_Section_start.html#2_8 of the manual. @@ -164,7 +144,9 @@ manual. Additional requirements in your input script to run with style {gayberne/gpu} are as follows: -The "newton pair"_newton.html setting must be {off}. +The "newton pair"_newton.html setting must be {off} and +"fix gpu"_fix_gpu.html must be used. The fix controls the +essential GPU selection and initialization steps. 
:line diff --git a/doc/pair_lj.html b/doc/pair_lj.html index b09d383003..c70f96baff 100644 --- a/doc/pair_lj.html +++ b/doc/pair_lj.html @@ -17,30 +17,35 @@

      pair_style lj/cut/coul/cut command

      +

      pair_style lj/cut/coul/cut/gpu command +

      pair_style lj/cut/coul/debye command

      pair_style lj/cut/coul/long command

      +

      pair_style lj/cut/coul/long/gpu command +

      pair_style lj/cut/coul/long/tip4p command

      Syntax:

      pair_style style args 
       
      -
      • style = lj/cut or lj/cut/gpu or lj/cut/opt or lj/cut/coul/cut or lj/cut/coul/debye or lj/cut/coul/long or lj/cut/coul/long/tip4p +
        • style = lj/cut or lj/cut/gpu or lj/cut/opt or lj/cut/coul/cut or lj/cut/coul/debye or lj/cut/coul/long or lj/cut/coul/long/tip4p
        • args = list of arguments for a particular style
          lj/cut args = cutoff
             cutoff = global cutoff for Lennard Jones interactions (distance units)
        -  lj/cut/gpu args = gpumode gpuID cutoff
        -    gpumode = one/node or one/gpu or multi/gpu
        -    gpuID = ID or number of GPUs
        +  lj/cut/gpu args = cutoff
             cutoff = global cutoff for Lennard Jones interactions (distance units)
           lj/cut/opt args = cutoff
             cutoff = global cutoff for Lennard Jones interactions (distance units)
           lj/cut/coul/cut args = cutoff (cutoff2)
             cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
             cutoff2 = global cutoff for Coulombic (optional) (distance units)
        +  lj/cut/coul/cut/gpu args = cutoff (cutoff2)
        +    cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
        +    cutoff2 = global cutoff for Coulombic (optional) (distance units)
           lj/cut/coul/debye args = kappa cutoff (cutoff2)
             kappa = Debye length (inverse distance units)
             cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
        @@ -48,6 +53,9 @@
           lj/cut/coul/long args = cutoff (cutoff2)
             cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
             cutoff2 = global cutoff for Coulombic (optional) (distance units)
        +  lj/cut/coul/long/gpu args = cutoff (cutoff2)
        +    cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
        +    cutoff2 = global cutoff for Coulombic (optional) (distance units)
           lj/cut/coul/long/tip4p args = otype htype btype atype qdist cutoff (cutoff2)
             otype,htype = atom types for TIP4P O and H
             btype,atype = bond and angle types for TIP4P waters
        @@ -58,12 +66,13 @@
         

        Examples:

        pair_style lj/cut 2.5
        -pair_style lj/cut/gpu one/node 0 2.5
        +pair_style lj/cut/gpu 2.5
         pair_style lj/cut/opt 2.5
         pair_coeff * * 1 1
         pair_coeff 1 1 1 1.1 2.8 
         
        pair_style lj/cut/coul/cut 10.0
        +pair_style lj/cut/coul/cut/gpu 10.0
         pair_style lj/cut/coul/cut 10.0 8.0
         pair_coeff * * 100.0 3.0
         pair_coeff 1 1 100.0 3.5 9.0
        @@ -76,6 +85,7 @@ pair_coeff 1 1 1.0 1.5 2.5
         pair_coeff 1 1 1.0 1.5 2.5 5.0 
         
        pair_style lj/cut/coul/long 10.0
        +pair_style lj/cut/coul/long/gpu 10.0
         pair_style lj/cut/coul/long 10.0 8.0
         pair_coeff * * 100.0 3.0
         pair_coeff 1 1 100.0 3.5 9.0 
        @@ -94,10 +104,8 @@ given by
         

    Rc is the cutoff.

    -

    Style lj/cut/gpu is a GPU-enabled version of style lj/cut that -should give identical answers. Depending on system size and the GPU -processor you have on your system, it may be 4x faster (for the -pairwise portion of the run time). See more details below. +

    Style lj/cut/gpu is a GPU-enabled version of style lj/cut. +See more details below.

    Style lj/cut/opt is an optimized version of style lj/cut that should give identical answers. Depending on system size and the @@ -115,6 +123,9 @@ specified in the pair_style command, it is used for both the LJ and Coulombic terms. If two cutoffs are specified, they are used as cutoffs for the LJ and Coulombic terms respectively.

    +

    Style lj/cut/coul/cut/gpu is a GPU-enabled version of style lj/cut/coul/cut. +See more details below. +

    Style lj/cut/coul/debye adds an additional exp() damping factor to the Coulombic term, given by

    @@ -131,6 +142,9 @@ option. The Coulombic cutoff specified for this style means that pairwise interactions within this distance are computed directly; interactions outside that distance are computed in reciprocal space.

    +

    Style lj/cut/coul/long/gpu is a GPU-enabled version of style lj/cut/coul/long. +See more details below. +

    Style lj/cut/coul/long/tip4p implements the TIP4P water model of (Jorgensen), which introduces a massless site located a short distance away from the oxygen atom along the bisector of the HOH @@ -177,9 +191,10 @@ Coulombic cutoff specified in the pair_style command.


    -

    The lj/cut/gpu style is identical to the lj/cut style, except that -each processor off-loads its pairwise calculations to a GPU chip. -Depending on the hardware available on your system this can provide a +

    The lj/cut/gpu, lj/cut/coul/cut/gpu, and lj/cut/coul/long/gpu styles +are identical to the lj/cut, lj/cut/coul/cut, and lj/cut/coul/long +styles, except that each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a speed-up. See the Running on GPUs section of the manual for more details about hardware and software requirements for using GPUs. @@ -204,10 +219,12 @@ number of GPUs per node. configuration are in this section of the manual.

    -

    Additional requirements in your input script to run with style -lj/cut/gpu are as follows: +

    Additional requirements in your input script to run with GPU-enabled styles +are as follows:

    -

    The newton pair setting must be off. +

    The newton pair setting must be off and +fix gpu must be used. The fix controls +the essential GPU selection and initialization steps.
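A sketch combining these requirements for the long-range GPU variant (the fix ID and kspace accuracy are placeholders; the pair values follow the examples above):

newton          off
fix             0 all gpu force/neigh 0 0 1.0
pair_style      lj/cut/coul/long/gpu 10.0
pair_coeff      * * 100.0 3.0
kspace_style    pppm 1.0e-4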


    @@ -248,7 +265,8 @@ See the run_style command for details.

    Restrictions:

    The lj/cut/coul/long and lj/cut/coul/long/tip4p styles are part of -the "kspace" package. The lj/cut/gpu style is part of the "gpu" +the "kspace" package. The lj/cut/gpu, lj/cut/coul/cut/gpu, and +lj/cut/coul/long/gpu styles are part of the "gpu" package. The lj/cut/opt style is part of the "opt" package. They are only enabled if LAMMPS was built with those packages. See the Making LAMMPS section for more info. Note diff --git a/doc/pair_lj.txt b/doc/pair_lj.txt index 3099113334..42f0c9d6cc 100644 --- a/doc/pair_lj.txt +++ b/doc/pair_lj.txt @@ -10,28 +10,31 @@ pair_style lj/cut command :h3 pair_style lj/cut/gpu command :h3 pair_style lj/cut/opt command :h3 pair_style lj/cut/coul/cut command :h3 +pair_style lj/cut/coul/cut/gpu command :h3 pair_style lj/cut/coul/debye command :h3 pair_style lj/cut/coul/long command :h3 +pair_style lj/cut/coul/long/gpu command :h3 pair_style lj/cut/coul/long/tip4p command :h3 [Syntax:] pair_style style args :pre -style = {lj/cut} or {lj/cut/gpu} or {lj/cut/opt} or {lj/cut/coul/cut} or {lj/cut/coul/debye} \ - or {lj/cut/coul/long} or {lj/cut/coul/long/tip4p} +style = {lj/cut} or {lj/cut/gpu} or {lj/cut/opt} or {lj/cut/coul/cut} \ + or {lj/cut/coul/debye} or {lj/cut/coul/long} or {lj/cut/coul/long/tip4p} args = list of arguments for a particular style :ul {lj/cut} args = cutoff cutoff = global cutoff for Lennard Jones interactions (distance units) - {lj/cut/gpu} args = gpumode gpuID cutoff - gpumode = {one/node} or {one/gpu} or {multi/gpu} - gpuID = ID or number of GPUs + {lj/cut/gpu} args = cutoff cutoff = global cutoff for Lennard Jones interactions (distance units) {lj/cut/opt} args = cutoff cutoff = global cutoff for Lennard Jones interactions (distance units) {lj/cut/coul/cut} args = cutoff (cutoff2) cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) cutoff2 = global cutoff for Coulombic (optional) (distance units) + {lj/cut/coul/cut/gpu} args = cutoff (cutoff2) + cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) + cutoff2 = global cutoff for Coulombic (optional) (distance units) {lj/cut/coul/debye} args = kappa cutoff (cutoff2) kappa = Debye length (inverse distance units) cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) @@ -39,6 +42,9 @@ args = list of arguments for a particular style :ul {lj/cut/coul/long} args = cutoff (cutoff2) cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) cutoff2 = global cutoff for Coulombic (optional) (distance units) + {lj/cut/coul/long/gpu} args = cutoff (cutoff2) + cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units) + cutoff2 = global cutoff for Coulombic (optional) (distance units) {lj/cut/coul/long/tip4p} args = otype htype btype atype qdist cutoff (cutoff2) otype,htype = atom types for TIP4P O and H btype,atype = bond and angle types for TIP4P waters @@ -49,12 +55,13 @@ args = list of arguments for a particular style :ul [Examples:] pair_style lj/cut 2.5 -pair_style lj/cut/gpu one/node 0 2.5 +pair_style lj/cut/gpu 2.5 pair_style lj/cut/opt 2.5 pair_coeff * * 1 1 pair_coeff 1 1 1 1.1 2.8 :pre pair_style lj/cut/coul/cut 10.0 +pair_style lj/cut/coul/cut/gpu 10.0 pair_style lj/cut/coul/cut 10.0 8.0 pair_coeff * * 100.0 3.0 pair_coeff 1 1 100.0 3.5 9.0 @@ -67,6 +74,7 @@ pair_coeff 1 1 1.0 1.5 2.5 pair_coeff 1 1 1.0 1.5 2.5 5.0 :pre pair_style lj/cut/coul/long 10.0 +pair_style lj/cut/coul/long/gpu 10.0 pair_style lj/cut/coul/long 10.0 8.0 pair_coeff * * 100.0 3.0 pair_coeff 1 1 100.0 3.5 9.0 
:pre @@ -85,10 +93,8 @@ given by Rc is the cutoff. -Style {lj/cut/gpu} is a GPU-enabled version of style {lj/cut} that -should give identical answers. Depending on system size and the GPU -processor you have on your system, it may be 4x faster (for the -pairwise portion of the run time). See more details below. +Style {lj/cut/gpu} is a GPU-enabled version of style {lj/cut}. +See more details below. Style {lj/cut/opt} is an optimized version of style {lj/cut} that should give identical answers. Depending on system size and the @@ -106,6 +112,9 @@ specified in the pair_style command, it is used for both the LJ and Coulombic terms. If two cutoffs are specified, they are used as cutoffs for the LJ and Coulombic terms respectively. +Style {lj/cut/coul/cut/gpu} is a GPU-enabled version of style {lj/cut/coul/cut}. +See more details below. + Style {lj/cut/coul/debye} adds an additional exp() damping factor to the Coulombic term, given by @@ -122,6 +131,9 @@ option. The Coulombic cutoff specified for this style means that pairwise interactions within this distance are computed directly; interactions outside that distance are computed in reciprocal space. +Style {lj/cut/coul/long/gpu} is a GPU-enabled version of style {lj/cut/coul/long}. +See more details below. + Style {lj/cut/coul/long/tip4p} implements the TIP4P water model of "(Jorgensen)"_#Jorgensen, which introduces a massless site located a short distance away from the oxygen atom along the bisector of the HOH @@ -168,9 +180,10 @@ Coulombic cutoff specified in the pair_style command. :line -The {lj/cut/gpu} style is identical to the {lj/cut} style, except that -each processor off-loads its pairwise calculations to a GPU chip. -Depending on the hardware available on your system this can provide a +The {lj/cut/gpu}, {lj/cut/coul/cut/gpu}, and {lj/cut/coul/long/gpu} styles +are identical to the {lj/cut}, {lj/cut/coul/cut}, and {lj/cut/coul/long} +styles, except that each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a speed-up. See the "Running on GPUs"_Section_start.html#2_8 section of the manual for more details about hardware and software requirements for using GPUs. @@ -195,10 +208,12 @@ More details about these settings and various possible hardware configuration are in "this section"_Section_start.html#2_8 of the manual. -Additional requirements in your input script to run with style -{lj/cut/gpu} are as follows: +Additional requirements in your input script to run with GPU-enabled styles +are as follows: -The "newton pair"_newton.html setting must be {off}. +The "newton pair"_newton.html setting must be {off} and +"fix gpu"_fix_gpu.html must be used. The fix controls +the essential GPU selection and initialization steps. :line @@ -239,7 +254,8 @@ See the "run_style"_run_style.html command for details. [Restrictions:] The {lj/cut/coul/long} and {lj/cut/coul/long/tip4p} styles are part of -the "kspace" package. The {lj/cut/gpu} style is part of the "gpu" +the "kspace" package. The {lj/cut/gpu}, {lj/cut/coul/cut/gpu}, and +{lj/cut/coul/long/gpu} styles are part of the "gpu" package. The {lj/cut/opt} style is part of the "opt" package. They are only enabled if LAMMPS was built with those packages. See the "Making LAMMPS"_Section_start.html#2_3 section for more info. Note diff --git a/doc/pair_lj96_cut.html b/doc/pair_lj96_cut.html index 7b0b9f184a..997ba0983c 100644 --- a/doc/pair_lj96_cut.html +++ b/doc/pair_lj96_cut.html @@ -11,15 +11,19 @@

    pair_style lj96/cut command

    +

    pair_style lj96/cut/gpu command +

    Syntax:

    -
    pair_style lj96/cut cutoff 
    +
    pair_style style cutoff 
     
    -
    • cutoff = global cutoff for lj96/cut interactions (distance units) +
      • style = lj96/cut or lj96/cut/gpu +
      • cutoff = global cutoff for lj96/cut interactions (distance units)

      Examples:

      pair_style lj96/cut 2.5
      +pair_style lj96/cut/gpu 2.5
       pair_coeff * * 1.0 1.0 4.0
       pair_coeff 1 1 1.0 1.0 
       
      @@ -32,6 +36,9 @@ of the standard 12/6 potential, given by

      Rc is the cutoff.

      +

      Style lj96/cut/gpu is a GPU-enabled version of style lj96/cut. +See more details below. +

      The following coefficients must be defined for each pair of atoms types via the pair_coeff command as in the examples above, or in the data file or restart files read by the @@ -47,6 +54,26 @@ cutoff specified in the pair_style command is used.


      +

      The lj96/cut/gpu style is identical to the lj96/cut style, except that +each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a +speed-up. See the Running on GPUs section of +the manual for more details about hardware and software requirements +for using GPUs. +

      +

More details about these settings and various possible hardware +configurations are in this section of the +manual. +

      +

      Additional requirements in your input script to run with the lj96/cut/gpu +style are as follows: +

      +

The newton pair setting must be off and +fix gpu must be used. The fix controls the +essential GPU selection and initialization steps. +
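For completeness, a short sketch under the same assumptions (the fix ID is a placeholder; the pair values follow the examples above):

newton          off
fix             0 all gpu force/neigh 0 0 1.0
pair_style      lj96/cut/gpu 2.5
pair_coeff      * * 1.0 1.0 4.0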

      +
      +

      Mixing, shift, table, tail correction, restart, rRESPA info:

      For atom type pairs I,J and I != J, the epsilon and sigma coefficients @@ -76,7 +103,11 @@ details.


      -

      Restrictions: none +

      Restrictions: +

      +

The lj96/cut/gpu style is part of the "gpu" package. It +is only enabled if LAMMPS is built with this package. See the +Making LAMMPS section for more info.

      Related commands:

      diff --git a/doc/pair_lj96_cut.txt b/doc/pair_lj96_cut.txt index 1f82e5dbd7..892fc6fa8a 100644 --- a/doc/pair_lj96_cut.txt +++ b/doc/pair_lj96_cut.txt @@ -7,16 +7,19 @@ :line pair_style lj96/cut command :h3 +pair_style lj96/cut/gpu command :h3 [Syntax:] -pair_style lj96/cut cutoff :pre +pair_style style cutoff :pre +style = {lj96/cut} or {lj96/cut/gpu} cutoff = global cutoff for lj96/cut interactions (distance units) :ul [Examples:] pair_style lj96/cut 2.5 +pair_style lj96/cut/gpu 2.5 pair_coeff * * 1.0 1.0 4.0 pair_coeff 1 1 1.0 1.0 :pre @@ -29,6 +32,9 @@ of the standard 12/6 potential, given by Rc is the cutoff. +Style {lj96/cut/gpu} is a GPU-enabled version of style {lj96/cut}. +See more details below. + The following coefficients must be defined for each pair of atoms types via the "pair_coeff"_pair_coeff.html command as in the examples above, or in the data file or restart files read by the @@ -44,6 +50,26 @@ cutoff specified in the pair_style command is used. :line +The {lj96/cut/gpu} style is identical to the {lj96/cut} style, except that +each processor off-loads its pairwise calculations to a +GPU chip. Depending on the hardware available on your system this can provide a +speed-up. See the "Running on GPUs"_Section_start.html#2_8 section of +the manual for more details about hardware and software requirements +for using GPUs. + +More details about these settings and various possible hardware +configuration are in "this section"_Section_start.html#2_8 of the +manual. + +Additional requirements in your input script to run with the {lj96/cut/gpu} +style are as follows: + +The "newton pair"_newton.html setting must be {off} and +"fix gpu"_fix_gpu.html must be used. The fix controls the +essential GPU selection and initialization steps + +:line + [Mixing, shift, table, tail correction, restart, rRESPA info]: For atom type pairs I,J and I != J, the epsilon and sigma coefficients @@ -73,7 +99,11 @@ details. :line -[Restrictions:] none +[Restrictions:] + +The {lj96/cut/gpu} style is part of the "gpu" package. It +is only enabled if LAMMPS is built with this packages. See the +"Making LAMMPS"_Section_start.html#2_3 section for more info. [Related commands:]