diff --git a/doc/Section_accelerate.html b/doc/Section_accelerate.html index 4b4838ba70..b64c88ca40 100644 --- a/doc/Section_accelerate.html +++ b/doc/Section_accelerate.html @@ -190,55 +190,16 @@ from the GPU package, you can either append "gpu" to the style name switch, or use the suffix command.

-

The fix gpu command controls the GPU selection and -initialization steps. +

The package gpu command must be used near the beginning +of your script to control the GPU selection and initialization steps. +It also enables asynchronous splitting of force computations between +the CPUs and GPUs.

-

The format for the fix is: -

-
fix fix-ID all gpu mode first last split 
-
-

where fix-ID is the name for the fix. The gpu fix must be the first -fix specified for a given run, otherwise LAMMPS will exit with an -error. The gpu fix does not have any effect on runs that do not use -GPU acceleration, so there should be no problem specifying the fix -first in any input script. -

-

The mode setting can be either "force" or "force/neigh". In the -former, neighbor list calculation is performed on the CPU using the -standard LAMMPS routines. In the latter, the neighbor list calculation -is performed on the GPU. The GPU neighbor list can be used for better -performance, however, it cannot not be used with a triclinic box or -with hybrid pair styles. -

-

There are cases when it may be more efficient to select the CPU for -neighbor list builds. If a non-GPU enabled style (e.g. a fix or -compute) requires a neighbor list, it will also be built using CPU -routines. Redundant CPU and GPU neighbor list calculations will -typically be less efficient. -

-

The first setting is the ID (as reported by -lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on -each node. The last setting is the ID of the last GPU that will be -used on each node. If you have only one GPU per node, first and -last will typically both be 0. Selecting a non-sequential set of GPU -IDs (e.g. 0,1,3) is not currently supported. -

-

The split setting is the fraction of particles whose forces, -torques, energies, and/or virials will be calculated on the GPU. This -can be used to perform CPU and GPU force calculations simultaneously, -e.g. on a hybrid node with a multicore CPU and a GPU(s). If split -is negative, the software will attempt to calculate the optimal -fraction automatically every 25 timesteps based on CPU and GPU -timings. Because the GPU speedups are dependent on the number of -particles, automatic calculation of the split can be less efficient, -but typically results in loop times within 20% of an optimal fixed -split. -

-

As an example, if you have two GPUs per node, 8 CPU cores per node, +

As an example, if you have two GPUs per node and 8 CPU cores per node, and would like to run on 4 nodes (32 cores) with dynamic balancing of -force calculation across CPU and GPU cores, the fix might be +force calculation across CPU and GPU cores, you could specify

-
fix 0 all gpu force/neigh 0 1 -1 
+
package gpu force/neigh 0 1 -1 
 

In this case, all CPU cores and GPU devices on the nodes would be utilized. Each GPU device would be shared by 4 CPU cores. The CPU @@ -246,39 +207,14 @@ cores would perform force calculations for some fraction of the particles at the same time the GPUs performed force calculation for the other particles.

-

Asynchronous pair computation on GPU and CPU -

-

The GPU accelerated pair styles can perform pair style force -calculation on the GPU at the same time other force calculations -within LAMMPS are being performed on the CPU. These include pair, -bond, angle, etc forces as well as long-range Coulombic forces. This -is enabled by the split setting in the gpu fix as described above. -

-

With a split setting less than 1.0, a portion of the pair-wise force -calculations will also be performed on the CPU. When the CPU finishes -its pair style computations (if any), the next LAMMPS force -computation will begin (bond, angle, etc), possibly before the GPU has -finished its pair style computations. -

-

This means that if split is set to 1.0, the GPU will begin the -LAMMPS force computation immediately. This can be used to run a -hybrid GPU pair style at the same time as a hybrid -CPU pair style. In this case, the GPU pair style should be first in -the hybrid command in order to perform simultaneous calculations. This -also allows bond, angle, -dihedral, improper, and -long-range force computations to run -simultaneously with the GPU pair style. If all CPU force computations -complete before the GPU, LAMMPS will block until the GPU has finished -before continuing the timestep. -

Timing output:

-

As noted above, GPU accelerated pair styles can perform computations -asynchronously with CPU computations. The "Pair" time reported by -LAMMPS will be the maximum of the time required to complete the CPU -pair style computations and the time required to complete the GPU pair -style computations. Any time spent for GPU-enabled pair styles for +

As described by the package gpu command, GPU +accelerated pair styles can perform computations asynchronously with +CPU computations. The "Pair" time reported by LAMMPS will be the +maximum of the time required to complete the CPU pair style +computations and the time required to complete the GPU pair style +computations. Any time spent for GPU-enabled pair styles for computations that run simultaneously with bond, angle, dihedral, improper, and long-range diff --git a/doc/Section_accelerate.txt b/doc/Section_accelerate.txt index c67655f13e..b348fe207d 100644 --- a/doc/Section_accelerate.txt +++ b/doc/Section_accelerate.txt @@ -185,55 +185,16 @@ from the GPU package, you can either append "gpu" to the style name switch"_Section_start.html#2_6, or use the "suffix"_suffix.html command. -The "fix gpu"_fix_gpu.html command controls the GPU selection and -initialization steps. +The "package gpu"_package.html command must be used near the beginning +of your script to control the GPU selection and initialization steps. +It also enables asynchronous splitting of force computations between +the CPUs and GPUs. -The format for the fix is: - -fix fix-ID all gpu {mode} {first} {last} {split} :pre - -where fix-ID is the name for the fix. The gpu fix must be the first -fix specified for a given run, otherwise LAMMPS will exit with an -error. The gpu fix does not have any effect on runs that do not use -GPU acceleration, so there should be no problem specifying the fix -first in any input script. - -The {mode} setting can be either "force" or "force/neigh". In the -former, neighbor list calculation is performed on the CPU using the -standard LAMMPS routines. In the latter, the neighbor list calculation -is performed on the GPU. The GPU neighbor list can be used for better -performance, however, it cannot not be used with a triclinic box or -with "hybrid"_pair_hybrid.html pair styles. - -There are cases when it may be more efficient to select the CPU for -neighbor list builds. 
If a non-GPU enabled style (e.g. a fix or -compute) requires a neighbor list, it will also be built using CPU -routines. Redundant CPU and GPU neighbor list calculations will -typically be less efficient. - -The {first} setting is the ID (as reported by -lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on -each node. The {last} setting is the ID of the last GPU that will be -used on each node. If you have only one GPU per node, {first} and -{last} will typically both be 0. Selecting a non-sequential set of GPU -IDs (e.g. 0,1,3) is not currently supported. - -The {split} setting is the fraction of particles whose forces, -torques, energies, and/or virials will be calculated on the GPU. This -can be used to perform CPU and GPU force calculations simultaneously, -e.g. on a hybrid node with a multicore CPU and a GPU(s). If {split} -is negative, the software will attempt to calculate the optimal -fraction automatically every 25 timesteps based on CPU and GPU -timings. Because the GPU speedups are dependent on the number of -particles, automatic calculation of the split can be less efficient, -but typically results in loop times within 20% of an optimal fixed -split. - -As an example, if you have two GPUs per node, 8 CPU cores per node, +As an example, if you have two GPUs per node and 8 CPU cores per node, and would like to run on 4 nodes (32 cores) with dynamic balancing of -force calculation across CPU and GPU cores, the fix might be +force calculation across CPU and GPU cores, you could specify -fix 0 all gpu force/neigh 0 1 -1 :pre +package gpu force/neigh 0 1 -1 :pre In this case, all CPU cores and GPU devices on the nodes would be utilized. Each GPU device would be shared by 4 CPU cores. The CPU @@ -241,39 +202,14 @@ cores would perform force calculations for some fraction of the particles at the same time the GPUs performed force calculation for the other particles. 
-[Asynchronous pair computation on GPU and CPU] - -The GPU accelerated pair styles can perform pair style force -calculation on the GPU at the same time other force calculations -within LAMMPS are being performed on the CPU. These include pair, -bond, angle, etc forces as well as long-range Coulombic forces. This -is enabled by the {split} setting in the gpu fix as described above. - -With a {split} setting less than 1.0, a portion of the pair-wise force -calculations will also be performed on the CPU. When the CPU finishes -its pair style computations (if any), the next LAMMPS force -computation will begin (bond, angle, etc), possibly before the GPU has -finished its pair style computations. - -This means that if {split} is set to 1.0, the GPU will begin the -LAMMPS force computation immediately. This can be used to run a -"hybrid"_pair_hybrid.html GPU pair style at the same time as a hybrid -CPU pair style. In this case, the GPU pair style should be first in -the hybrid command in order to perform simultaneous calculations. This -also allows "bond"_bond_style.html, "angle"_angle_style.html, -"dihedral"_dihedral_style.html, "improper"_improper_style.html, and -"long-range"_kspace_style.html force computations to run -simultaneously with the GPU pair style. If all CPU force computations -complete before the GPU, LAMMPS will block until the GPU has finished -before continuing the timestep. - [Timing output:] -As noted above, GPU accelerated pair styles can perform computations -asynchronously with CPU computations. The "Pair" time reported by -LAMMPS will be the maximum of the time required to complete the CPU -pair style computations and the time required to complete the GPU pair -style computations. Any time spent for GPU-enabled pair styles for +As described by the "package gpu"_package.html command, GPU +accelerated pair styles can perform computations asynchronously with +CPU computations. 
The "Pair" time reported by LAMMPS will be the +maximum of the time required to complete the CPU pair style +computations and the time required to complete the GPU pair style +computations. Any time spent for GPU-enabled pair styles for computations that run simultaneously with "bond"_bond_style.html, "angle"_angle_style.html, "dihedral"_dihedral_style.html, "improper"_improper_style.html, and "long-range"_kspace_style.html diff --git a/doc/Section_commands.html b/doc/Section_commands.html index 941b2a1de2..0618a5d250 100644 --- a/doc/Section_commands.html +++ b/doc/Section_commands.html @@ -338,15 +338,14 @@ of each style or click on the style itself for a full description:

 adapt  addforce  aveforce  ave/atom  ave/correlate  ave/histo  ave/spatial  ave/time
 bond/break  bond/create  bond/swap  box/relax  deform  deposit  drag  dt/reset
-efield  enforce2d  evaporate  external  freeze  gpu  gravity  heat
-indent  langevin  lineforce  momentum  move  msst  neb  nph
-nph/asphere  nph/sphere  npt  npt/asphere  npt/sphere  nve  nve/asphere  nve/limit
-nve/noforce  nve/sphere  nvt  nvt/asphere  nvt/sllod  nvt/sphere  orient/fcc  planeforce
-poems  pour  press/berendsen  print  qeq/comb  reax/bonds  recenter  rigid
-rigid/nve  rigid/nvt  setforce  shake  spring  spring/rg  spring/self  srd
-store/force  store/state  temp/berendsen  temp/rescale  thermal/conductivity  tmd  ttm  viscosity
-viscous  wall/colloid  wall/gran  wall/harmonic  wall/lj126  wall/lj93  wall/reflect  wall/region
-wall/srd
+efield  enforce2d  evaporate  external  freeze  gravity  heat  indent
+langevin  lineforce  momentum  move  msst  neb  nph  nph/asphere
+nph/sphere  npt  npt/asphere  npt/sphere  nve  nve/asphere  nve/limit  nve/noforce
+nve/sphere  nvt  nvt/asphere  nvt/sllod  nvt/sphere  orient/fcc  planeforce  poems
+pour  press/berendsen  print  qeq/comb  reax/bonds  recenter  rigid  rigid/nve
+rigid/nvt  setforce  shake  spring  spring/rg  spring/self  srd  store/force
+store/state  temp/berendsen  temp/rescale  thermal/conductivity  tmd  ttm  viscosity  viscous
+wall/colloid  wall/gran  wall/harmonic  wall/lj126  wall/lj93  wall/reflect  wall/region  wall/srd

These are fix styles contributed by users, which can be used if diff --git a/doc/Section_commands.txt b/doc/Section_commands.txt index 3635a753f5..f9b9b1a189 100644 --- a/doc/Section_commands.txt +++ b/doc/Section_commands.txt @@ -418,7 +418,6 @@ of each style or click on the style itself for a full description: "evaporate"_fix_evaporate.html, "external"_fix_external.html, "freeze"_fix_freeze.html, -"gpu"_fix_gpu.html, "gravity"_fix_gravity.html, "heat"_fix_heat.html, "indent"_fix_indent.html, diff --git a/doc/fix_gpu.html b/doc/fix_gpu.html deleted file mode 100644 index d48e510798..0000000000 --- a/doc/fix_gpu.html +++ /dev/null @@ -1,112 +0,0 @@ - -

LAMMPS WWW Site - LAMMPS Documentation - LAMMPS Commands -
- - - - - - -
- -

fix gpu command -

-

Syntax: -

-
fix ID group-ID gpu mode first last split 
-
- -

Examples: -

-
fix 0 all gpu force 0 0 1.0
-fix 0 all gpu force 0 0 0.75
-fix 0 all gpu force/neigh 0 0 1.0
-fix 0 all gpu force/neigh 0 1 -1.0 
-
-

Description: -

-

Select and initialize GPUs to be used for acceleration and configure -GPU acceleration in LAMMPS. This fix is required in order to use -any style with GPU acceleration. The fix must be the first fix -specified for a run or an error will be generated. The fix will not have an -effect on any LAMMPS computations that do not use GPU acceleration, so there -should not be any problems with specifying this fix first in input scripts. -

-

The mode setting specifies where neighbor list calculations will be -performed. If mode is force, neighbor list calculation is performed -on the CPU. If mode is force/neigh, neighbor list calculation is -performed on the GPU. GPU neighbor list calculation currently cannot -be used with a triclinic box. GPU neighbor list calculation currently -cannot be used with hybrid pair styles. GPU -neighbor lists are not compatible with styles that are not -GPU-enabled. When a non-GPU enabled style requires a neighbor list, -it will also be built using CPU routines. In these cases, it will -typically be more efficient to only use CPU neighbor list builds. -

-

The first and last settings specify the GPUs that will be used for -simulation. On each node, the GPU IDs in the inclusive range from -first to last will be used. -

-

The split setting can be used for load balancing force calculation -work between CPU and GPU cores in GPU-enabled pair styles. If -0<split<1.0, a fixed fraction of particles is offloaded to the GPU -while force calculation for the other particles occurs simulataneously -on the CPU. If split<0, the optimal fraction (based on CPU and GPU -timings) is calculated every 25 timesteps. If split=1.0, all force -calculations for GPU accelerated pair styles are performed on the -GPU. In this case, hybrid, bond, -angle, dihedral, -improper, and long-range -calculations can be performed on the CPU while the GPU is performing -force calculations for the GPU-enabled pair style. -

-

In order to use GPU acceleration, a GPU enabled style must be selected -in the input script in addition to this fix. Currently, this is -limited to a few pair styles and the PPPM kspace -style. -

-

See this section of the manual for more -details about using the GPU package. -

-

Restart, fix_modify, output, run start/stop, minimize info: -

-

This fix is part of the "gpu" package. It is only enabled if LAMMPS -was built with that package. See the Making -LAMMPS section for more info. -

-

No information about this fix is written to binary restart -files. None of the fix_modify options -are relevant to this fix. -

-

No parameter of this fix can be used with the start/stop keywords of -the run command. -

-

Restrictions: -

-

The fix must be the first fix specified for a given run. The -force/neigh mode should not be used with a triclinic box or -hybrid pair styles. -

-

The split setting must be positive when using -hybrid pair styles. -

-

Currently, group-ID must be all. -

-

Related commands: none -

-

Default: none -

- diff --git a/doc/fix_gpu.txt b/doc/fix_gpu.txt deleted file mode 100644 index 6abf729e74..0000000000 --- a/doc/fix_gpu.txt +++ /dev/null @@ -1,102 +0,0 @@ -"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c - -:link(lws,http://lammps.sandia.gov) -:link(ld,Manual.html) -:link(lc,Section_commands.html#comm) - -:line - -fix gpu command :h3 - -[Syntax:] - -fix ID group-ID gpu mode first last split :pre - -ID, group-ID are documented in "fix"_fix.html command :ulb,l -gpu = style name of this fix command :l -mode = force or force/neigh :l -first = ID of first GPU to be used on each node :l -last = ID of last GPU to be used on each node :l -split = fraction of particles assigned to the GPU :l -:ule - -[Examples:] - -fix 0 all gpu force 0 0 1.0 -fix 0 all gpu force 0 0 0.75 -fix 0 all gpu force/neigh 0 0 1.0 -fix 0 all gpu force/neigh 0 1 -1.0 :pre - -[Description:] - -Select and initialize GPUs to be used for acceleration and configure -GPU acceleration in LAMMPS. This fix is required in order to use -any style with GPU acceleration. The fix must be the first fix -specified for a run or an error will be generated. The fix will not have an -effect on any LAMMPS computations that do not use GPU acceleration, so there -should not be any problems with specifying this fix first in input scripts. - -The {mode} setting specifies where neighbor list calculations will be -performed. If {mode} is force, neighbor list calculation is performed -on the CPU. If {mode} is force/neigh, neighbor list calculation is -performed on the GPU. GPU neighbor list calculation currently cannot -be used with a triclinic box. GPU neighbor list calculation currently -cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU -neighbor lists are not compatible with styles that are not -GPU-enabled. When a non-GPU enabled style requires a neighbor list, -it will also be built using CPU routines. 
In these cases, it will -typically be more efficient to only use CPU neighbor list builds. - -The {first} and {last} settings specify the GPUs that will be used for -simulation. On each node, the GPU IDs in the inclusive range from -{first} to {last} will be used. - -The {split} setting can be used for load balancing force calculation -work between CPU and GPU cores in GPU-enabled pair styles. If -0<{split}<1.0, a fixed fraction of particles is offloaded to the GPU -while force calculation for the other particles occurs simulataneously -on the CPU. If {split}<0, the optimal fraction (based on CPU and GPU -timings) is calculated every 25 timesteps. If {split}=1.0, all force -calculations for GPU accelerated pair styles are performed on the -GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html, -"angle"_angle_style.html, "dihedral"_dihedral_style.html, -"improper"_improper_style.html, and "long-range"_kspace_style.html -calculations can be performed on the CPU while the GPU is performing -force calculations for the GPU-enabled pair style. - -In order to use GPU acceleration, a GPU enabled style must be selected -in the input script in addition to this fix. Currently, this is -limited to a few "pair styles"_pair_style.html and the PPPM "kspace -style"_kspace_style.html. - -See "this section"_doc/Section_accerate.html of the manual for more -details about using the GPU package. - -[Restart, fix_modify, output, run start/stop, minimize info:] - -This fix is part of the "gpu" package. It is only enabled if LAMMPS -was built with that package. See the "Making -LAMMPS"_Section_start.html#2_3 section for more info. - -No information about this fix is written to "binary restart -files"_restart.html. None of the "fix_modify"_fix_modify.html options -are relevant to this fix. - -No parameter of this fix can be used with the {start/stop} keywords of -the "run"_run.html command. - -[Restrictions:] - -The fix must be the first fix specified for a given run. 
The -force/neigh {mode} should not be used with a triclinic box or -"hybrid"_pair_hybrid.html pair styles. - -The {split} setting must be positive when using -"hybrid"_pair_hybrid.html pair styles. - -Currently, group-ID must be all. - -[Related commands:] none - -[Default:] none - diff --git a/doc/package.html b/doc/package.html index 814340bc81..c1b5b0bebf 100644 --- a/doc/package.html +++ b/doc/package.html @@ -15,39 +15,136 @@

package style args 
 
-