git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@6697 f3b2605a-c512-4ea7-a41b-209d697bcdaa

This commit is contained in:
sjplimp
2011-08-17 14:20:30 +00:00
parent 7d79f54a3c
commit a5e5cfec02
8 changed files with 247 additions and 402 deletions

View File

@ -190,55 +190,16 @@ from the GPU package, you can either append "gpu" to the style name
switch</A>, or use the <A HREF = "suffix.html">suffix</A>
command.
</P>
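<P>For example, either of the following would select a GPU-accelerated
version of the Lennard-Jones pair style (a sketch, assuming the GPU
package provides the lj/cut/gpu style):
</P>
<PRE>pair_style lj/cut/gpu 2.5    # "gpu" appended to the style name explicitly
</PRE>
<P>or, equivalently,
</P>
<PRE>suffix gpu
pair_style lj/cut 2.5        # "gpu" appended automatically by the suffix command
</PRE>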
<P>The <A HREF = "fix_gpu.html">fix gpu</A> command controls the GPU selection and
initialization steps.
<P>The <A HREF = "package.html">package gpu</A> command must be used near the beginning
of your script to control the GPU selection and initialization steps.
It also enables asynchronous splitting of force computations between
the CPUs and GPUs.
</P>
<P>The format for the fix is:
</P>
<PRE>fix fix-ID all gpu <I>mode</I> <I>first</I> <I>last</I> <I>split</I>
</PRE>
<P>where fix-ID is the name for the fix. The gpu fix must be the first
fix specified for a given run, otherwise LAMMPS will exit with an
error. The gpu fix does not have any effect on runs that do not use
GPU acceleration, so there should be no problem specifying the fix
first in any input script.
</P>
<P>The <I>mode</I> setting can be either "force" or "force/neigh". In the
former, neighbor list calculation is performed on the CPU using the
standard LAMMPS routines. In the latter, the neighbor list calculation
is performed on the GPU. The GPU neighbor list can give better
performance; however, it cannot be used with a triclinic box or
with <A HREF = "pair_hybrid.html">hybrid</A> pair styles.
</P>
<P>There are cases when it may be more efficient to select the CPU for
neighbor list builds. If a non-GPU enabled style (e.g. a fix or
compute) requires a neighbor list, it will also be built using CPU
routines. Redundant CPU and GPU neighbor list calculations will
typically be less efficient.
</P>
<P>The <I>first</I> setting is the ID (as reported by
lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on
each node. The <I>last</I> setting is the ID of the last GPU that will be
used on each node. If you have only one GPU per node, <I>first</I> and
<I>last</I> will typically both be 0. Selecting a non-sequential set of GPU
IDs (e.g. 0,1,3) is not currently supported.
</P>
<P>The <I>split</I> setting is the fraction of particles whose forces,
torques, energies, and/or virials will be calculated on the GPU. This
can be used to perform CPU and GPU force calculations simultaneously,
e.g. on a hybrid node with a multicore CPU and one or more GPUs. If
<I>split</I> is negative, the software will attempt to calculate the
optimal fraction automatically every 25 timesteps, based on CPU and
GPU timings. Because GPU speedups depend on the number of particles,
the automatically calculated split can be less efficient, but it
typically results in loop times within 20% of those for an optimal
fixed split.
</P>
<P>As an example, if you have two GPUs per node, 8 CPU cores per node,
<P>As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, the fix might be
force calculation across CPU and GPU cores, you could specify
</P>
<PRE>fix 0 all gpu force/neigh 0 1 -1
<PRE>package gpu force/neigh 0 1 -1
</PRE>
<P>In this case, all CPU cores and GPU devices on the nodes would be
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
@ -246,39 +207,14 @@ cores would perform force calculations for some fraction of the
particles at the same time the GPUs performed force calculation for
the other particles.
</P>
<P><B>Asynchronous pair computation on GPU and CPU</B>
</P>
<P>The GPU accelerated pair styles can perform pair style force
calculation on the GPU at the same time other force calculations
within LAMMPS are being performed on the CPU. These include pair,
bond, angle, etc. forces, as well as long-range Coulombic forces. This
is enabled by the <I>split</I> setting in the gpu fix as described above.
</P>
<P>With a <I>split</I> setting less than 1.0, a portion of the pair-wise force
calculations will also be performed on the CPU. When the CPU finishes
its pair style computations (if any), the next LAMMPS force
computation will begin (bond, angle, etc), possibly before the GPU has
finished its pair style computations.
</P>
<P>This means that if <I>split</I> is set to 1.0, the CPU can begin the
next LAMMPS force computation immediately, while the GPU performs the
entire pair style computation. This can be used to run a
<A HREF = "pair_hybrid.html">hybrid</A> GPU pair style at the same time as a hybrid
CPU pair style. In this case, the GPU pair style should be first in
the hybrid command in order to perform simultaneous calculations. This
also allows <A HREF = "bond_style.html">bond</A>, <A HREF = "angle_style.html">angle</A>,
<A HREF = "dihedral_style.html">dihedral</A>, <A HREF = "improper_style.html">improper</A>, and
<A HREF = "kspace_style.html">long-range</A> force computations to run
simultaneously with the GPU pair style. If all CPU force computations
complete before the GPU, LAMMPS will block until the GPU has finished
before continuing the timestep.
</P>
<P><B>Timing output:</B>
</P>
<P>As noted above, GPU accelerated pair styles can perform computations
asynchronously with CPU computations. The "Pair" time reported by
LAMMPS will be the maximum of the time required to complete the CPU
pair style computations and the time required to complete the GPU pair
style computations. Any time spent for GPU-enabled pair styles for
<P>As described by the <A HREF = "package.html">package gpu</A> command, GPU
accelerated pair styles can perform computations asynchronously with
CPU computations. The "Pair" time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with <A HREF = "bond_style.html">bond</A>,
<A HREF = "angle_style.html">angle</A>, <A HREF = "dihedral_style.html">dihedral</A>,
<A HREF = "improper_style.html">improper</A>, and <A HREF = "kspace_style.html">long-range</A>

View File

@ -185,55 +185,16 @@ from the GPU package, you can either append "gpu" to the style name
switch"_Section_start.html#2_6, or use the "suffix"_suffix.html
command.
The "fix gpu"_fix_gpu.html command controls the GPU selection and
initialization steps.
The "package gpu"_package.html command must be used near the beginning
of your script to control the GPU selection and initialization steps.
It also enables asynchronous splitting of force computations between
the CPUs and GPUs.
The format for the fix is:
fix fix-ID all gpu {mode} {first} {last} {split} :pre
where fix-ID is the name for the fix. The gpu fix must be the first
fix specified for a given run, otherwise LAMMPS will exit with an
error. The gpu fix does not have any effect on runs that do not use
GPU acceleration, so there should be no problem specifying the fix
first in any input script.
The {mode} setting can be either "force" or "force/neigh". In the
former, neighbor list calculation is performed on the CPU using the
standard LAMMPS routines. In the latter, the neighbor list calculation
is performed on the GPU. The GPU neighbor list can give better
performance; however, it cannot be used with a triclinic box or
with "hybrid"_pair_hybrid.html pair styles.
There are cases when it may be more efficient to select the CPU for
neighbor list builds. If a non-GPU enabled style (e.g. a fix or
compute) requires a neighbor list, it will also be built using CPU
routines. Redundant CPU and GPU neighbor list calculations will
typically be less efficient.
The {first} setting is the ID (as reported by
lammps/lib/gpu/nvc_get_devices) of the first GPU that will be used on
each node. The {last} setting is the ID of the last GPU that will be
used on each node. If you have only one GPU per node, {first} and
{last} will typically both be 0. Selecting a non-sequential set of GPU
IDs (e.g. 0,1,3) is not currently supported.
The {split} setting is the fraction of particles whose forces,
torques, energies, and/or virials will be calculated on the GPU. This
can be used to perform CPU and GPU force calculations simultaneously,
e.g. on a hybrid node with a multicore CPU and one or more GPUs. If
{split} is negative, the software will attempt to calculate the
optimal fraction automatically every 25 timesteps, based on CPU and
GPU timings. Because GPU speedups depend on the number of particles,
the automatically calculated split can be less efficient, but it
typically results in loop times within 20% of those for an optimal
fixed split.
As an example, if you have two GPUs per node, 8 CPU cores per node,
As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, the fix might be
force calculation across CPU and GPU cores, you could specify
fix 0 all gpu force/neigh 0 1 -1 :pre
package gpu force/neigh 0 1 -1 :pre
In this case, all CPU cores and GPU devices on the nodes would be
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
@ -241,39 +202,14 @@ cores would perform force calculations for some fraction of the
particles at the same time the GPUs performed force calculation for
the other particles.
[Asynchronous pair computation on GPU and CPU]
The GPU accelerated pair styles can perform pair style force
calculation on the GPU at the same time other force calculations
within LAMMPS are being performed on the CPU. These include pair,
bond, angle, etc. forces, as well as long-range Coulombic forces. This
is enabled by the {split} setting in the gpu fix as described above.
With a {split} setting less than 1.0, a portion of the pair-wise force
calculations will also be performed on the CPU. When the CPU finishes
its pair style computations (if any), the next LAMMPS force
computation will begin (bond, angle, etc), possibly before the GPU has
finished its pair style computations.
This means that if {split} is set to 1.0, the CPU can begin the next
LAMMPS force computation immediately, while the GPU performs the
entire pair style computation. This can be used to run a
"hybrid"_pair_hybrid.html GPU pair style at the same time as a hybrid
CPU pair style. In this case, the GPU pair style should be first in
the hybrid command in order to perform simultaneous calculations. This
also allows "bond"_bond_style.html, "angle"_angle_style.html,
"dihedral"_dihedral_style.html, "improper"_improper_style.html, and
"long-range"_kspace_style.html force computations to run
simultaneously with the GPU pair style. If all CPU force computations
complete before the GPU, LAMMPS will block until the GPU has finished
before continuing the timestep.
[Timing output:]
As noted above, GPU accelerated pair styles can perform computations
asynchronously with CPU computations. The "Pair" time reported by
LAMMPS will be the maximum of the time required to complete the CPU
pair style computations and the time required to complete the GPU pair
style computations. Any time spent for GPU-enabled pair styles for
As described by the "package gpu"_package.html command, GPU
accelerated pair styles can perform computations asynchronously with
CPU computations. The "Pair" time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with "bond"_bond_style.html,
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html

View File

@ -338,15 +338,14 @@ of each style or click on the style itself for a full description:
<DIV ALIGN=center><TABLE BORDER=1 >
<TR ALIGN="center"><TD ><A HREF = "fix_adapt.html">adapt</A></TD><TD ><A HREF = "fix_addforce.html">addforce</A></TD><TD ><A HREF = "fix_aveforce.html">aveforce</A></TD><TD ><A HREF = "fix_ave_atom.html">ave/atom</A></TD><TD ><A HREF = "fix_ave_correlate.html">ave/correlate</A></TD><TD ><A HREF = "fix_ave_histo.html">ave/histo</A></TD><TD ><A HREF = "fix_ave_spatial.html">ave/spatial</A></TD><TD ><A HREF = "fix_ave_time.html">ave/time</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_bond_break.html">bond/break</A></TD><TD ><A HREF = "fix_bond_create.html">bond/create</A></TD><TD ><A HREF = "fix_bond_swap.html">bond/swap</A></TD><TD ><A HREF = "fix_box_relax.html">box/relax</A></TD><TD ><A HREF = "fix_deform.html">deform</A></TD><TD ><A HREF = "fix_deposit.html">deposit</A></TD><TD ><A HREF = "fix_drag.html">drag</A></TD><TD ><A HREF = "fix_dt_reset.html">dt/reset</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_efield.html">efield</A></TD><TD ><A HREF = "fix_enforce2d.html">enforce2d</A></TD><TD ><A HREF = "fix_evaporate.html">evaporate</A></TD><TD ><A HREF = "fix_external.html">external</A></TD><TD ><A HREF = "fix_freeze.html">freeze</A></TD><TD ><A HREF = "fix_gpu.html">gpu</A></TD><TD ><A HREF = "fix_gravity.html">gravity</A></TD><TD ><A HREF = "fix_heat.html">heat</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_indent.html">indent</A></TD><TD ><A HREF = "fix_langevin.html">langevin</A></TD><TD ><A HREF = "fix_lineforce.html">lineforce</A></TD><TD ><A HREF = "fix_momentum.html">momentum</A></TD><TD ><A HREF = "fix_move.html">move</A></TD><TD ><A HREF = "fix_msst.html">msst</A></TD><TD ><A HREF = "fix_neb.html">neb</A></TD><TD ><A HREF = "fix_nh.html">nph</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_nph_asphere.html">nph/asphere</A></TD><TD ><A HREF = "fix_nph_sphere.html">nph/sphere</A></TD><TD ><A HREF = "fix_nh.html">npt</A></TD><TD ><A HREF = "fix_npt_asphere.html">npt/asphere</A></TD><TD ><A HREF = "fix_npt_sphere.html">npt/sphere</A></TD><TD ><A HREF = "fix_nve.html">nve</A></TD><TD ><A HREF = "fix_nve_asphere.html">nve/asphere</A></TD><TD ><A HREF = "fix_nve_limit.html">nve/limit</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_nve_noforce.html">nve/noforce</A></TD><TD ><A HREF = "fix_nve_sphere.html">nve/sphere</A></TD><TD ><A HREF = "fix_nh.html">nvt</A></TD><TD ><A HREF = "fix_nvt_asphere.html">nvt/asphere</A></TD><TD ><A HREF = "fix_nvt_sllod.html">nvt/sllod</A></TD><TD ><A HREF = "fix_nvt_sphere.html">nvt/sphere</A></TD><TD ><A HREF = "fix_orient_fcc.html">orient/fcc</A></TD><TD ><A HREF = "fix_planeforce.html">planeforce</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_poems.html">poems</A></TD><TD ><A HREF = "fix_pour.html">pour</A></TD><TD ><A HREF = "fix_press_berendsen.html">press/berendsen</A></TD><TD ><A HREF = "fix_print.html">print</A></TD><TD ><A HREF = "fix_qeq_comb.html">qeq/comb</A></TD><TD ><A HREF = "fix_reax_bonds.html">reax/bonds</A></TD><TD ><A HREF = "fix_recenter.html">recenter</A></TD><TD ><A HREF = "fix_rigid.html">rigid</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_rigid.html">rigid/nve</A></TD><TD ><A HREF = "fix_rigid.html">rigid/nvt</A></TD><TD ><A HREF = "fix_setforce.html">setforce</A></TD><TD ><A HREF = "fix_shake.html">shake</A></TD><TD ><A HREF = "fix_spring.html">spring</A></TD><TD ><A HREF = "fix_spring_rg.html">spring/rg</A></TD><TD ><A HREF = "fix_spring_self.html">spring/self</A></TD><TD ><A HREF = "fix_srd.html">srd</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_store_force.html">store/force</A></TD><TD ><A HREF = "fix_store_state.html">store/state</A></TD><TD ><A HREF = "fix_temp_berendsen.html">temp/berendsen</A></TD><TD ><A HREF = "fix_temp_rescale.html">temp/rescale</A></TD><TD ><A HREF = "fix_thermal_conductivity.html">thermal/conductivity</A></TD><TD ><A HREF = "fix_tmd.html">tmd</A></TD><TD ><A HREF = "fix_ttm.html">ttm</A></TD><TD ><A HREF = "fix_viscosity.html">viscosity</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_viscous.html">viscous</A></TD><TD ><A HREF = "fix_wall.html">wall/colloid</A></TD><TD ><A HREF = "fix_wall_gran.html">wall/gran</A></TD><TD ><A HREF = "fix_wall.html">wall/harmonic</A></TD><TD ><A HREF = "fix_wall.html">wall/lj126</A></TD><TD ><A HREF = "fix_wall.html">wall/lj93</A></TD><TD ><A HREF = "fix_wall_reflect.html">wall/reflect</A></TD><TD ><A HREF = "fix_wall_region.html">wall/region</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_wall_srd.html">wall/srd</A>
<TR ALIGN="center"><TD ><A HREF = "fix_efield.html">efield</A></TD><TD ><A HREF = "fix_enforce2d.html">enforce2d</A></TD><TD ><A HREF = "fix_evaporate.html">evaporate</A></TD><TD ><A HREF = "fix_external.html">external</A></TD><TD ><A HREF = "fix_freeze.html">freeze</A></TD><TD ><A HREF = "fix_gravity.html">gravity</A></TD><TD ><A HREF = "fix_heat.html">heat</A></TD><TD ><A HREF = "fix_indent.html">indent</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_langevin.html">langevin</A></TD><TD ><A HREF = "fix_lineforce.html">lineforce</A></TD><TD ><A HREF = "fix_momentum.html">momentum</A></TD><TD ><A HREF = "fix_move.html">move</A></TD><TD ><A HREF = "fix_msst.html">msst</A></TD><TD ><A HREF = "fix_neb.html">neb</A></TD><TD ><A HREF = "fix_nh.html">nph</A></TD><TD ><A HREF = "fix_nph_asphere.html">nph/asphere</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_nph_sphere.html">nph/sphere</A></TD><TD ><A HREF = "fix_nh.html">npt</A></TD><TD ><A HREF = "fix_npt_asphere.html">npt/asphere</A></TD><TD ><A HREF = "fix_npt_sphere.html">npt/sphere</A></TD><TD ><A HREF = "fix_nve.html">nve</A></TD><TD ><A HREF = "fix_nve_asphere.html">nve/asphere</A></TD><TD ><A HREF = "fix_nve_limit.html">nve/limit</A></TD><TD ><A HREF = "fix_nve_noforce.html">nve/noforce</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_nve_sphere.html">nve/sphere</A></TD><TD ><A HREF = "fix_nh.html">nvt</A></TD><TD ><A HREF = "fix_nvt_asphere.html">nvt/asphere</A></TD><TD ><A HREF = "fix_nvt_sllod.html">nvt/sllod</A></TD><TD ><A HREF = "fix_nvt_sphere.html">nvt/sphere</A></TD><TD ><A HREF = "fix_orient_fcc.html">orient/fcc</A></TD><TD ><A HREF = "fix_planeforce.html">planeforce</A></TD><TD ><A HREF = "fix_poems.html">poems</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_pour.html">pour</A></TD><TD ><A HREF = "fix_press_berendsen.html">press/berendsen</A></TD><TD ><A HREF = "fix_print.html">print</A></TD><TD ><A HREF = "fix_qeq_comb.html">qeq/comb</A></TD><TD ><A HREF = "fix_reax_bonds.html">reax/bonds</A></TD><TD ><A HREF = "fix_recenter.html">recenter</A></TD><TD ><A HREF = "fix_rigid.html">rigid</A></TD><TD ><A HREF = "fix_rigid.html">rigid/nve</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_rigid.html">rigid/nvt</A></TD><TD ><A HREF = "fix_setforce.html">setforce</A></TD><TD ><A HREF = "fix_shake.html">shake</A></TD><TD ><A HREF = "fix_spring.html">spring</A></TD><TD ><A HREF = "fix_spring_rg.html">spring/rg</A></TD><TD ><A HREF = "fix_spring_self.html">spring/self</A></TD><TD ><A HREF = "fix_srd.html">srd</A></TD><TD ><A HREF = "fix_store_force.html">store/force</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_store_state.html">store/state</A></TD><TD ><A HREF = "fix_temp_berendsen.html">temp/berendsen</A></TD><TD ><A HREF = "fix_temp_rescale.html">temp/rescale</A></TD><TD ><A HREF = "fix_thermal_conductivity.html">thermal/conductivity</A></TD><TD ><A HREF = "fix_tmd.html">tmd</A></TD><TD ><A HREF = "fix_ttm.html">ttm</A></TD><TD ><A HREF = "fix_viscosity.html">viscosity</A></TD><TD ><A HREF = "fix_viscous.html">viscous</A></TD></TR>
<TR ALIGN="center"><TD ><A HREF = "fix_wall.html">wall/colloid</A></TD><TD ><A HREF = "fix_wall_gran.html">wall/gran</A></TD><TD ><A HREF = "fix_wall.html">wall/harmonic</A></TD><TD ><A HREF = "fix_wall.html">wall/lj126</A></TD><TD ><A HREF = "fix_wall.html">wall/lj93</A></TD><TD ><A HREF = "fix_wall_reflect.html">wall/reflect</A></TD><TD ><A HREF = "fix_wall_region.html">wall/region</A></TD><TD ><A HREF = "fix_wall_srd.html">wall/srd</A>
</TD></TR></TABLE></DIV>
<P>These are fix styles contributed by users, which can be used if

View File

@ -418,7 +418,6 @@ of each style or click on the style itself for a full description:
"evaporate"_fix_evaporate.html,
"external"_fix_external.html,
"freeze"_fix_freeze.html,
"gpu"_fix_gpu.html,
"gravity"_fix_gravity.html,
"heat"_fix_heat.html,
"indent"_fix_indent.html,

View File

@ -1,112 +0,0 @@
<HTML>
<CENTER><A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> - <A HREF = "Manual.html">LAMMPS Documentation</A> - <A HREF = "Section_commands.html#comm">LAMMPS Commands</A>
</CENTER>
<HR>
<H3>fix gpu command
</H3>
<P><B>Syntax:</B>
</P>
<PRE>fix ID group-ID gpu mode first last split
</PRE>
<UL><LI>ID, group-ID are documented in <A HREF = "fix.html">fix</A> command
<LI>gpu = style name of this fix command
<LI>mode = force or force/neigh
<LI>first = ID of first GPU to be used on each node
<LI>last = ID of last GPU to be used on each node
<LI>split = fraction of particles assigned to the GPU
</UL>
<P><B>Examples:</B>
</P>
<PRE>fix 0 all gpu force 0 0 1.0
fix 0 all gpu force 0 0 0.75
fix 0 all gpu force/neigh 0 0 1.0
fix 0 all gpu force/neigh 0 1 -1.0
</PRE>
<P><B>Description:</B>
</P>
<P>Select and initialize GPUs to be used for acceleration and configure
GPU acceleration in LAMMPS. This fix is required in order to use
any style with GPU acceleration. The fix must be the first fix
specified for a run or an error will be generated. The fix will not have an
effect on any LAMMPS computations that do not use GPU acceleration, so there
should not be any problems with specifying this fix first in input scripts.
</P>
<P>The <I>mode</I> setting specifies where neighbor list calculations will be
performed. If <I>mode</I> is force, neighbor list calculation is performed
on the CPU. If <I>mode</I> is force/neigh, neighbor list calculation is
performed on the GPU. GPU neighbor list calculation currently cannot
be used with a triclinic box. GPU neighbor list calculation currently
cannot be used with <A HREF = "pair_hybrid.html">hybrid</A> pair styles. GPU
neighbor lists are not compatible with styles that are not
GPU-enabled. When a non-GPU enabled style requires a neighbor list,
it will also be built using CPU routines. In these cases, it will
typically be more efficient to only use CPU neighbor list builds.
</P>
<P>The <I>first</I> and <I>last</I> settings specify the GPUs that will be used for
simulation. On each node, the GPU IDs in the inclusive range from
<I>first</I> to <I>last</I> will be used.
</P>
<P>The <I>split</I> setting can be used for load balancing force calculation
work between CPU and GPU cores in GPU-enabled pair styles. If
0 < <I>split</I> < 1.0, a fixed fraction of particles is offloaded to the GPU
while force calculation for the other particles occurs simultaneously
on the CPU. If <I>split</I> < 0, the optimal fraction (based on CPU and GPU
timings) is calculated every 25 timesteps. If <I>split</I> = 1.0, all force
calculations for GPU accelerated pair styles are performed on the
GPU. In this case, <A HREF = "pair_hybrid.html">hybrid</A>, <A HREF = "bond_style.html">bond</A>,
<A HREF = "angle_style.html">angle</A>, <A HREF = "dihedral_style.html">dihedral</A>,
<A HREF = "improper_style.html">improper</A>, and <A HREF = "kspace_style.html">long-range</A>
calculations can be performed on the CPU while the GPU is performing
force calculations for the GPU-enabled pair style.
</P>
<P>In order to use GPU acceleration, a GPU enabled style must be selected
in the input script in addition to this fix. Currently, this is
limited to a few <A HREF = "pair_style.html">pair styles</A> and the PPPM <A HREF = "kspace_style.html">kspace
style</A>.
</P>
<P>See <A HREF = "doc/Section_accelerate.html">this section</A> of the manual for more
details about using the GPU package.
</P>
<P><B>Restart, fix_modify, output, run start/stop, minimize info:</B>
</P>
<P>This fix is part of the "gpu" package. It is only enabled if LAMMPS
was built with that package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>
<P>No information about this fix is written to <A HREF = "restart.html">binary restart
files</A>. None of the <A HREF = "fix_modify.html">fix_modify</A> options
are relevant to this fix.
</P>
<P>No parameter of this fix can be used with the <I>start/stop</I> keywords of
the <A HREF = "run.html">run</A> command.
</P>
<P><B>Restrictions:</B>
</P>
<P>The fix must be the first fix specified for a given run. The
force/neigh <I>mode</I> should not be used with a triclinic box or
<A HREF = "pair_hybrid.html">hybrid</A> pair styles.
</P>
<P>The <I>split</I> setting must be positive when using
<A HREF = "pair_hybrid.html">hybrid</A> pair styles.
</P>
<P>Currently, group-ID must be all.
</P>
<P><B>Related commands:</B> none
</P>
<P><B>Default:</B> none
</P>
</HTML>

View File

@ -1,102 +0,0 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix gpu command :h3
[Syntax:]
fix ID group-ID gpu mode first last split :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
gpu = style name of this fix command :l
mode = force or force/neigh :l
first = ID of first GPU to be used on each node :l
last = ID of last GPU to be used on each node :l
split = fraction of particles assigned to the GPU :l
:ule
[Examples:]
fix 0 all gpu force 0 0 1.0
fix 0 all gpu force 0 0 0.75
fix 0 all gpu force/neigh 0 0 1.0
fix 0 all gpu force/neigh 0 1 -1.0 :pre
[Description:]
Select and initialize GPUs to be used for acceleration and configure
GPU acceleration in LAMMPS. This fix is required in order to use
any style with GPU acceleration. The fix must be the first fix
specified for a run or an error will be generated. The fix will not have an
effect on any LAMMPS computations that do not use GPU acceleration, so there
should not be any problems with specifying this fix first in input scripts.
The {mode} setting specifies where neighbor list calculations will be
performed. If {mode} is force, neighbor list calculation is performed
on the CPU. If {mode} is force/neigh, neighbor list calculation is
performed on the GPU. GPU neighbor list calculation currently cannot
be used with a triclinic box. GPU neighbor list calculation currently
cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU
neighbor lists are not compatible with styles that are not
GPU-enabled. When a non-GPU enabled style requires a neighbor list,
it will also be built using CPU routines. In these cases, it will
typically be more efficient to only use CPU neighbor list builds.
The {first} and {last} settings specify the GPUs that will be used for
simulation. On each node, the GPU IDs in the inclusive range from
{first} to {last} will be used.
The {split} setting can be used for load balancing force calculation
work between CPU and GPU cores in GPU-enabled pair styles. If
0 < {split} < 1.0, a fixed fraction of particles is offloaded to the GPU
while force calculation for the other particles occurs simultaneously
on the CPU. If {split} < 0, the optimal fraction (based on CPU and GPU
timings) is calculated every 25 timesteps. If {split} = 1.0, all force
calculations for GPU accelerated pair styles are performed on the
GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html,
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html
calculations can be performed on the CPU while the GPU is performing
force calculations for the GPU-enabled pair style.
In order to use GPU acceleration, a GPU enabled style must be selected
in the input script in addition to this fix. Currently, this is
limited to a few "pair styles"_pair_style.html and the PPPM "kspace
style"_kspace_style.html.
See "this section"_doc/Section_accerate.html of the manual for more
details about using the GPU package.
[Restart, fix_modify, output, run start/stop, minimize info:]
This fix is part of the "gpu" package. It is only enabled if LAMMPS
was built with that package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
No information about this fix is written to "binary restart
files"_restart.html. None of the "fix_modify"_fix_modify.html options
are relevant to this fix.
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command.
[Restrictions:]
The fix must be the first fix specified for a given run. The
force/neigh {mode} should not be used with a triclinic box or
"hybrid"_pair_hybrid.html pair styles.
The {split} setting must be positive when using
"hybrid"_pair_hybrid.html pair styles.
Currently, group-ID must be all.
[Related commands:] none
[Default:] none

View File

@ -15,39 +15,136 @@
</P>
<PRE>package style args
</PRE>
<UL><LI>style = <I>cuda</I>
<UL><LI>style = <I>gpu</I> or <I>cuda</I> or <I>omp</I>
<LI>args = 0 or more args specific to the style
<LI>args = arguments specific to the style
<PRE> <I>cuda</I> args = to be determined
<LI> <I>gpu</I> args = mode first last split
mode = force or force/neigh
<LI> first = ID of first GPU to be used on each node
<LI> last = ID of last GPU to be used on each node
<LI> split = fraction of particles assigned to the GPU
<PRE> <I>cuda</I> args = to be determined
<I>omp</I> args = Nthreads
</PRE>
<PRE> Nthreads = # of OpenMP threads to associate with each MPI process
</PRE>
</UL>
<P><B>Examples:</B>
</P>
<PRE>package cuda blah
<PRE>package gpu force 0 0 1.0
package gpu force 0 0 0.75
package gpu force/neigh 0 0 1.0
package gpu force/neigh 0 1 -1.0
package cuda blah
package omp 4
</PRE>
<P><B>Description:</B>
</P>
<P>This command invokes package-specific settings. Currently only the
USER-CUDA package uses it.
<P>This command invokes package-specific settings. Currently the
following packages use it: GPU, USER-CUDA, and USER-OMP.
</P>
<P>See <A HREF = "doc/Section_accelerate.html">this section</A> of the manual for more
details about using these various packages for accelerating
a LAMMPS calculation.
</P>
<HR>
<P>The <I>gpu</I> style invokes options associated with the use of the GPU
package. It allows you to select and initialize GPUs to be used for
acceleration via this package and configure how the GPU acceleration
is performed. These settings are required in order to use any style
with GPU acceleration.
</P>
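<P>A minimal input script skeleton might look as follows (a sketch; the
data file name and pair style are hypothetical examples of a
GPU-enabled setup):
</P>
<PRE>package gpu force/neigh 0 0 1.0    # before the simulation box is defined
units lj
atom_style atomic
read_data data.lj
pair_style lj/cut/gpu 2.5
pair_coeff * * 1.0 1.0
run 100
</PRE>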
<P>The <I>mode</I> setting specifies where neighbor list calculations will be
performed. If <I>mode</I> is force, neighbor list calculation is performed
on the CPU. If <I>mode</I> is force/neigh, neighbor list calculation is
performed on the GPU. GPU neighbor list calculation currently cannot
be used with a triclinic box. GPU neighbor list calculation currently
cannot be used with <A HREF = "pair_hybrid.html">hybrid</A> pair styles. GPU
neighbor lists are not compatible with styles that are not
GPU-enabled. When a non-GPU enabled style requires a neighbor list,
it will also be built using CPU routines. In these cases, it will
typically be more efficient to only use CPU neighbor list builds.
</P>
<P>The <I>first</I> and <I>last</I> settings specify the GPUs that will be used for
simulation. On each node, the GPU IDs in the inclusive range from
<I>first</I> to <I>last</I> will be used.
</P>
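<P>For example (a sketch; device IDs match those reported by the
nvc_get_devices tool in lib/gpu and may differ on your system):
</P>
<PRE>package gpu force 0 0 1.0          # one GPU (ID 0) per node, CPU neighbor builds
package gpu force/neigh 0 1 1.0    # two GPUs (IDs 0 and 1) per node, GPU neighbor builds
</PRE>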
<P>The <I>split</I> setting can be used for load balancing force calculation
work between CPU and GPU cores in GPU-enabled pair styles. If 0 <
<I>split</I> < 1.0, a fixed fraction of particles is offloaded to the GPU
while force calculation for the other particles occurs simultaneously
on the CPU. If <I>split</I> < 0, the optimal fraction (based on CPU and GPU
timings) is calculated every 25 timesteps. If <I>split</I> = 1.0, all force
calculations for GPU accelerated pair styles are performed on the
GPU. In this case, <A HREF = "pair_hybrid.html">hybrid</A>, <A HREF = "bond_style.html">bond</A>,
<A HREF = "angle_style.html">angle</A>, <A HREF = "dihedral_style.html">dihedral</A>,
<A HREF = "improper_style.html">improper</A>, and <A HREF = "kspace_style.html">long-range</A>
calculations can be performed on the CPU while the GPU is performing
force calculations for the GPU-enabled pair style. If all CPU force
computations complete before the GPU, LAMMPS will block until the GPU
has finished before continuing the timestep.
</P>
<P>As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, you could specify
</P>
<PRE>package gpu force/neigh 0 1 -1
</PRE>
<P>In this case, all CPU cores and GPU devices on the nodes would be
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
cores would perform force calculations for some fraction of the
particles at the same time the GPUs performed force calculation for
the other particles.
</P>
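<P>For the <I>split</I> = 1.0 case, a <A HREF = "pair_hybrid.html">hybrid</A>
pair style can combine a GPU-enabled sub-style with a CPU sub-style so
that the two run concurrently; the GPU sub-style should be listed
first (a sketch; the sub-style names are examples only):
</P>
<PRE>package gpu force 0 0 1.0
pair_style hybrid lj/cut/gpu 2.5 lj/cut 2.5    # GPU sub-style first
</PRE>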
<HR>
<P>The <I>cuda</I> style invokes options associated with the use of the
USER-CUDA package. These will be described when the USER-CUDA package
is released with LAMMPS.
USER-CUDA package. These need to be documented.
</P>
<HR>
<P>The <I>omp</I> style invokes options associated with the use of the
USER-OMP package.
</P>
<P>The only setting to make is the number of OpenMP threads to be
allocated for each MPI process. For example, if your system has nodes
with dual quad-core processors, it has a total of 8 cores per node.
You could run MPI on 2 cores on each node (e.g. using options for the
mpirun command), and set the <I>Nthreads</I> setting to 4. This would
effectively use all 8 cores on each node, since each MPI process
would spawn 4 threads (one of which runs as part of the MPI process
itself).
</P>
<P>For performance reasons, you should not set <I>Nthreads</I> to more threads
than there are physical cores, but LAMMPS does not check for this.
</P>
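<P>A concrete pairing for the dual quad-core example above, e.g. on 4
nodes, might look as follows (a sketch; the executable name and the
mechanism for placing 2 MPI processes per node are launcher-specific):
</P>
<PRE>mpirun -np 8 lmp_machine -in in.script    # 8 MPI processes, placed 2 per node across 4 nodes
package omp 4                             # in in.script: 4 OpenMP threads per MPI process
</PRE>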
<HR>
<P><B>Restrictions:</B>
</P>
<P>This command cannot be used after the simulation box is defined by a
<A HREF = "read_data.html">read_data</A> or <A HREF = "create_box.html">create_box</A> command.
</P>
<P>The cuda style of this command can only be invoked if LAMMPS was built
with the USER-CUDA package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>
<P>Obviously, you must have GPU hardware and associated software to build
and use LAMMPS with either the GPU or USER-CUDA packages.
<P>The gpu style of this command can only be invoked if LAMMPS was built
with the GPU package. See the <A HREF = "Section_start.html#2_3">Making LAMMPS</A>
section for more info.
</P>
<P><B>Related commands:</B>
<P>The omp style of this command can only be invoked if LAMMPS was built
with the USER-OMP package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>
<P><A HREF = "fix_gpu.html">fix gpu</A>
<P><B>Related commands:</B> none
</P>
<P><B>Default:</B> none
</P>

View File

@ -12,35 +12,127 @@ package command :h3
package style args :pre
style = {cuda} :ulb,l
args = 0 or more args specific to the style :l
{cuda} args = to be determined :pre
style = {gpu} or {cuda} or {omp} :ulb,l
args = arguments specific to the style :l
{gpu} args = mode first last split
mode = force or force/neigh :l
first = ID of first GPU to be used on each node :l
last = ID of last GPU to be used on each node :l
split = fraction of particles assigned to the GPU :l
{cuda} args = to be determined
{omp} args = Nthreads :pre
Nthreads = # of OpenMP threads to associate with each MPI process :pre
:ule
[Examples:]
package cuda blah :pre
package gpu force 0 0 1.0
package gpu force 0 0 0.75
package gpu force/neigh 0 0 1.0
package gpu force/neigh 0 1 -1.0
package cuda blah
package omp 4 :pre
[Description:]
This command invokes package-specific settings. Currently only the
USER-CUDA package uses it.
This command invokes package-specific settings. Currently the
following packages use it: GPU, USER-CUDA, and USER-OMP.
See "this section"_doc/Section_accerate.html of the manual for more
details about using these various packages for accelerating
a LAMMPS calculation.
:line
The {gpu} style invokes options associated with the use of the GPU
package. It allows you to select and initialize GPUs to be used for
acceleration via this package and configure how the GPU acceleration
is performed. These settings are required in order to use any style
with GPU acceleration.
The {mode} setting specifies where neighbor list calculations will be
performed. If {mode} is force, neighbor list calculation is performed
on the CPU. If {mode} is force/neigh, neighbor list calculation is
performed on the GPU. GPU neighbor list calculation currently cannot
be used with a triclinic box. GPU neighbor list calculation currently
cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU
neighbor lists are not compatible with styles that are not
GPU-enabled. When a non-GPU enabled style requires a neighbor list,
it will also be built using CPU routines. In these cases, it will
typically be more efficient to only use CPU neighbor list builds.
The {first} and {last} settings specify the GPUs that will be used for
simulation. On each node, the GPU IDs in the inclusive range from
{first} to {last} will be used.
The {split} setting can be used for load balancing force calculation
work between CPU and GPU cores in GPU-enabled pair styles. If 0 <
{split} < 1.0, a fixed fraction of particles is offloaded to the GPU
while force calculation for the other particles occurs simultaneously
on the CPU. If {split} < 0, the optimal fraction (based on CPU and GPU
timings) is calculated every 25 timesteps. If {split} = 1.0, all force
calculations for GPU accelerated pair styles are performed on the
GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html,
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html
calculations can be performed on the CPU while the GPU is performing
force calculations for the GPU-enabled pair style. If all CPU force
computations complete before the GPU, LAMMPS will block until the GPU
has finished before continuing the timestep.
As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, you could specify
package gpu force/neigh 0 1 -1 :pre
In this case, all CPU cores and GPU devices on the nodes would be
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
cores would perform force calculations for some fraction of the
particles at the same time the GPUs performed force calculation for
the other particles.
:line
The {cuda} style invokes options associated with the use of the
USER-CUDA package. These will be described when the USER-CUDA package
is released with LAMMPS.
USER-CUDA package. These need to be documented.
:line
The {omp} style invokes options associated with the use of the
USER-OMP package.
The only setting to make is the number of OpenMP threads to be
allocated for each MPI process. For example, if your system has nodes
with dual quad-core processors, it has a total of 8 cores per node.
You could run MPI on 2 cores on each node (e.g. using options for the
mpirun command), and set the {Nthreads} setting to 4. This would
effectively use all 8 cores on each node, since each MPI process
would spawn 4 threads (one of which runs as part of the MPI process
itself).
For performance reasons, you should not set {Nthreads} to more threads
than there are physical cores, but LAMMPS does not check for this.
:line
[Restrictions:]
This command cannot be used after the simulation box is defined by a
"read_data"_read_data.html or "create_box"_create_box.html command.
The cuda style of this command can only be invoked if LAMMPS was built
with the USER-CUDA package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
Obviously, you must have GPU hardware and associated software to build
and use LAMMPS with either the GPU or USER-CUDA packages.
The gpu style of this command can only be invoked if LAMMPS was built
with the GPU package. See the "Making LAMMPS"_Section_start.html#2_3
section for more info.
[Related commands:]
The omp style of this command can only be invoked if LAMMPS was built
with the USER-OMP package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
"fix gpu"_fix_gpu.html
[Related commands:] none
[Default:] none