commit dcc7913857
parent b416be6cbc
Author: sjplimp
Date: 2011-08-17 21:55:22 +00:00

    git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@6711 f3b2605a-c512-4ea7-a41b-209d697bcdaa

12 changed files with 288 additions and 209 deletions

View File

@@ -30,6 +30,7 @@ style exist in LAMMPS:
</P>
<UL><LI><A HREF = "pair_lj.html">pair_style lj/cut</A>
<LI><A HREF = "pair_lj.html">pair_style lj/cut/opt</A>
+<LI><A HREF = "pair_lj.html">pair_style lj/cut/omp</A>
<LI><A HREF = "pair_lj.html">pair_style lj/cut/gpu</A>
<LI><A HREF = "pair_lj.html">pair_style lj/cut/cuda</A>
</UL>
@@ -45,6 +46,12 @@ input script.
<P>Styles with an "opt" suffix are part of the OPT package and typically
speed-up the pairwise calculations of your simulation by 5-25%.
</P>
+<P>Styles with an "omp" suffix are part of the USER-OMP package and allow
+a pair style to be run in threaded mode using OpenMP. This can be
+useful on nodes with high core counts, when using fewer MPI processes
+than cores is advantageous, e.g. when running with PPPM so that FFTs
+are run on fewer MPI processors.
+</P>
<P>Styles with a "gpu" or "cuda" suffix are part of the GPU or USER-CUDA
packages, and can be run on NVIDIA GPUs associated with your CPUs.
The speed-up due to GPU usage depends on a variety of factors, as
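<P>For example, a launch along these lines (a hypothetical sketch; the
executable name, input script, and core counts are placeholders) would
use 2 MPI processes and 4 OpenMP threads on an 8-core node:
</P>
<PRE>env OMP_NUM_THREADS=4 mpirun -np 2 lmp_machine -suffix omp -in in.script
</PRE>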
@@ -67,8 +74,9 @@ and kspace sections.
packages, since they are both designed to use NVIDIA GPU hardware.
</P>
10.1 <A HREF = "#10_1">OPT package</A><BR>
-10.2 <A HREF = "#10_2">GPU package</A><BR>
-10.3 <A HREF = "#10_3">USER-CUDA package</A><BR>
-10.4 <A HREF = "#10_4">Comparison of GPU and USER-CUDA packages</A> <BR>
+10.2 <A HREF = "#10_2">USER-OMP package</A><BR>
+10.3 <A HREF = "#10_3">GPU package</A><BR>
+10.4 <A HREF = "#10_4">USER-CUDA package</A><BR>
+10.5 <A HREF = "#10_5">Comparison of GPU and USER-CUDA packages</A> <BR>
<HR>
@@ -104,53 +112,62 @@ to 20% savings.
<HR>
-<H4><A NAME = "10_2"></A>10.2 GPU package
+<H4><A NAME = "10_2"></A>10.2 USER-OMP package
+</H4>
+<P>This section will be written when the USER-OMP package is released
+in main LAMMPS.
+</P>
+<HR>
+<HR>
+<H4><A NAME = "10_3"></A>10.3 GPU package
</H4>
<P>The GPU package was developed by Mike Brown at ORNL. It provides GPU
versions of several pair styles and for long-range Coulombics via the
PPPM command. It has the following features:
</P>
<UL><LI>The package is designed to exploit common GPU hardware configurations
-where one or more GPUs are coupled with one or more multi-core CPUs
-within a node of a parallel machine.
+where one or more GPUs are coupled with many cores of a multi-core
+CPU, e.g. within a node of a parallel machine.
<LI>Atom-based data (e.g. coordinates, forces) moves back-and-forth
-between the CPU and GPU every timestep.
+between the CPU(s) and GPU every timestep.
-<LI>Neighbor lists can be constructed by on the CPU or on the GPU,
-controlled by the <A HREF = "fix_gpu.html">fix gpu</A> command.
+<LI>Neighbor lists can be constructed on the CPU or on the GPU.
<LI>The charge assignment and force interpolation portions of PPPM can be
run on the GPU. The FFT portion, which requires MPI communication
between processors, runs on the CPU.
-<LI>Asynchronous force computations can be performed simulataneously on
-the CPU and GPU.
+<LI>Asynchronous force computations can be performed simultaneously on the
+CPU(s) and GPU.
-<LI>LAMMPS-specific code is in the GPU package. It makee calls to a more
+<LI>LAMMPS-specific code is in the GPU package. It makes calls to a
generic GPU library in the lib/gpu directory. This library provides
-NVIDIA support as well as a more general OpenCL support, so that the
-same functionality can eventually be supported on other GPU
+NVIDIA support as well as more general OpenCL support, so that the
+same functionality can eventually be supported on a variety of GPU
hardware.
</UL>
<P><B>Hardware and software requirements:</B>
</P>
-<P>To use this package, you need to have specific NVIDIA hardware and
-install specific NVIDIA CUDA software on your system:
+<P>To use this package, you currently need to have specific NVIDIA
+hardware and install specific NVIDIA CUDA software on your system:
</P>
<UL><LI>Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
<LI>Go to http://www.nvidia.com/object/cuda_get.html
<LI>Install a driver and toolkit appropriate for your system (SDK is not necessary)
-<LI>Follow the instructions in lammps/lib/gpu/README to build the library (also see below)
+<LI>Follow the instructions in lammps/lib/gpu/README to build the library (see below)
<LI>Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties
</UL>
<P><B>Building LAMMPS with the GPU package:</B>
</P>
-<P>As with other packages that link with a separately complied library,
-you need to first build the GPU library, before building LAMMPS
-itself. General instructions for doing this are in <A HREF = "doc/Section_start.html#2_3">this
+<P>As with other packages that include a separately compiled library, you
+need to first build the GPU library, before building LAMMPS itself.
+General instructions for doing this are in <A HREF = "doc/Section_start.html#2_3">this
section</A> of the manual. For this package,
-do the following, using a Makefile appropriate for your system:
+do the following, using a Makefile in lib/gpu appropriate for your
+system:
</P>
<PRE>cd lammps/lib/gpu
make -f Makefile.linux
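<P>Putting the checklist above together, a quick verification on a
standard Linux install might look like this (paths follow the stock
NVIDIA driver layout noted in the list; adjust for your system):
</P>
<PRE>cat /proc/driver/nvidia/cards/0
cd lammps/lib/gpu
./nvc_get_devices
</PRE>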
@@ -160,7 +177,7 @@ make -f Makefile.linux
</P>
<P>Now you are ready to build LAMMPS with the GPU package installed:
</P>
-<PRE>cd lammps/lib/src
+<PRE>cd lammps/src
make yes-gpu
make machine
</PRE>
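<P>After a successful build, a typical launch (a hypothetical sketch; the
executable name and input script are placeholders, and the -suffix
switch is described under the input script requirements below) could be:
</P>
<PRE>mpirun -np 8 lmp_machine -suffix gpu -in in.script
</PRE>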
@@ -173,28 +190,27 @@ example.
<P><B>GPU configuration</B>
</P>
<P>When using GPUs, you are restricted to one physical GPU per LAMMPS
-process, which is an MPI process running (typically) on a single core
-or processor. Multiple processes can share a single GPU and in many
-cases it will be more efficient to run with multiple processes per
-GPU.
+process, which is an MPI process running on a single core or
+processor. Multiple MPI processes (CPU cores) can share a single GPU,
+and in many cases it will be more efficient to run this way.
</P>
<P><B>Input script requirements:</B>
</P>
-<P>Additional input script requirements to run styles with a <I>gpu</I> suffix
-are as follows.
+<P>Additional input script requirements to run pair or PPPM styles with a
+<I>gpu</I> suffix are as follows:
</P>
-<P>The <A HREF = "newton.html">newton pair</A> setting must be <I>off</I>.
-</P>
-<P>To invoke specific styles from the GPU package, you can either append
-"gpu" to the style name (e.g. pair_style lj/cut/gpu), or use the
-<A HREF = "Section_start.html#2_6">-suffix command-line switch</A>, or use the
-<A HREF = "suffix.html">suffix</A> command.
-</P>
-<P>The <A HREF = "package.html">package gpu</A> command must be used near the beginning
-of your script to control the GPU selection and initialization steps.
-It also enables asynchronous splitting of force computations between
-the CPUs and GPUs.
-</P>
+<UL><LI>To invoke specific styles from the GPU package, you can either append
+"gpu" to the style name (e.g. pair_style lj/cut/gpu), or use the
+<A HREF = "Section_start.html#2_6">-suffix command-line switch</A>, or use the
+<A HREF = "suffix.html">suffix</A> command.
+<LI>The <A HREF = "newton.html">newton pair</A> setting must be <I>off</I>.
+<LI>The <A HREF = "package.html">package gpu</A> command must be used near the beginning
+of your script to control the GPU selection and initialization
+settings. It also has an option to enable asynchronous splitting of
+force computations between the CPUs and GPUs.
+</UL>
<P>As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, you could specify
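<P>The command for that exact scenario lies outside this hunk; as a
generic illustration of the three requirements above (a hypothetical
sketch; the package gpu arguments are placeholders, see the package doc
page for the real ones), an input script header might read:
</P>
<PRE>newton off
package gpu force/neigh 0 1 -1
pair_style lj/cut/gpu 2.5
</PRE>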
@@ -220,10 +236,10 @@ computations that run simultaneously with <A HREF = "bond_style.html">bond</A>,
<A HREF = "improper_style.html">improper</A>, and <A HREF = "kspace_style.html">long-range</A>
calculations will not be included in the "Pair" time.
</P>
-<P>When the <I>mode</I> setting for the gpu fix is force/neigh, the time for
-neighbor list calculations on the GPU will be added into the "Pair"
-time, not the "Neigh" time. An additional breakdown of the times
-required for various tasks on the GPU (data copy, neighbor
+<P>When the <I>mode</I> setting for the package gpu command is force/neigh,
+the time for neighbor list calculations on the GPU will be added into
+the "Pair" time, not the "Neigh" time. An additional breakdown of the
+times required for various tasks on the GPU (data copy, neighbor
calculations, force computations, etc) is output only with the LAMMPS
screen output (not in the log file) at the end of each run. These
timings represent total time spent on the GPU for each routine,
@@ -231,20 +247,23 @@ regardless of asynchronous CPU calculations.
</P>
<P><B>Performance tips:</B>
</P>
+<P>Generally speaking, for best performance, you should use multiple CPUs
+per GPU, as provided by most multi-core CPU/GPU configurations.
+</P>
<P>Because of the large number of cores within each GPU device, it may be
more efficient to run on fewer processes per GPU when the number of
particles per MPI process is small (100's of particles); this can be
necessary to keep the GPU cores busy.
</P>
<P>See the lammps/lib/gpu/README file for instructions on how to build
-the LAMMPS gpu library for single, mixed, and double precision. The
-latter requires that your GPU card support double precision.
+the GPU library for single, mixed, or double precision. The latter
+requires that your GPU card support double precision.
</P>
<HR>
<HR>
-<H4><A NAME = "10_3"></A>10.3 USER-CUDA package
+<H4><A NAME = "10_4"></A>10.4 USER-CUDA package
</H4>
<P>The USER-CUDA package was developed by Christian Trott at U Technology
Ilmenau in Germany. It provides NVIDIA GPU versions of many pair
@@ -256,19 +275,22 @@ many timesteps, to run entirely on the GPU (except for inter-processor
MPI communication), so that atom-based data (e.g. coordinates, forces)
do not have to move back-and-forth between the CPU and GPU.
-<LI>This will occur until a timestep where a non-GPU-ized fix or compute
-is invoked. E.g. whenever a non-GPU operation occurs (fix, compute,
-output), data automatically moves back to the CPU as needed. This may
-incur a performance penalty, but should otherwise just work
+<LI>Data will stay on the GPU until a timestep where a non-GPU-ized fix or
+compute is invoked. Whenever a non-GPU operation occurs (fix,
+compute, output), data automatically moves back to the CPU as needed.
+This may incur a performance penalty, but should otherwise work
transparently.
<LI>Neighbor lists for GPU-ized pair styles are constructed on the
GPU.
+<LI>The package only supports use of a single CPU (core) with each
+GPU.
</UL>
<P><B>Hardware and software requirements:</B>
</P>
<P>To use this package, you need to have specific NVIDIA hardware and
-install specific NVIDIA CUDA software on your system:
+install specific NVIDIA CUDA software on your system.
</P>
<P>Your NVIDIA GPU needs to support Compute Capability 1.3. This list may
help you to find out the Compute Capability of your card:
@@ -282,18 +304,19 @@ that its sample projects can be compiled without problems.
</P>
<P><B>Building LAMMPS with the USER-CUDA package:</B>
</P>
-<P>As with other packages that link with a separately complied library,
-you need to first build the USER-CUDA library, before building LAMMPS
+<P>As with other packages that include a separately compiled library, you
+need to first build the USER-CUDA library, before building LAMMPS
itself. General instructions for doing this are in <A HREF = "doc/Section_start.html#2_3">this
section</A> of the manual. For this package,
-do the following, using a Makefile appropriate for your system:
+do the following, using settings in the lib/cuda Makefiles appropriate
+for your system:
</P>
-<UL><LI>If your <I>CUDA</I> toolkit is not installed in the default system directoy
-<I>/usr/local/cuda</I> edit the file <I>lib/cuda/Makefile.common</I>
-accordingly.
-<LI>Go to the lammps/lib/cuda directory
+<UL><LI>Go to the lammps/lib/cuda directory
+<LI>If your <I>CUDA</I> toolkit is not installed in the default system directory
+<I>/usr/local/cuda</I> edit the file <I>lib/cuda/Makefile.common</I>
+accordingly.
<LI>Type "make OPTIONS", where <I>OPTIONS</I> are one or more of the following
options. The settings will be written to the
<I>lib/cuda/Makefile.defaults</I> and used in the next step.
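<P>As an illustration only (the option names and values shown here are
assumptions; consult the options list referenced above and the lib/cuda
README for the ones your system needs), a double-precision build for
sm_20 hardware might be configured with:
</P>
<PRE>cd lammps/lib/cuda
make precision=2 arch=20
</PRE>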
@@ -324,36 +347,38 @@ produce the file lib/libcuda.a.
</UL>
<P>Now you are ready to build LAMMPS with the USER-CUDA package installed:
</P>
-<PRE>cd lammps/lib/src
+<PRE>cd lammps/src
make yes-user-cuda
make machine
</PRE>
-<P>Note that the build will reference the lib/cuda/Makefile.common file
-to extract setting relevant to the LAMMPS build. So it is important
+<P>Note that the LAMMPS build references the lib/cuda/Makefile.common
+file to extract CUDA-specific settings. So it is important
that you have first built the cuda library (in lib/cuda) using
settings appropriate to your system.
</P>
<P><B>Input script requirements:</B>
</P>
<P>Additional input script requirements to run styles with a <I>cuda</I>
-suffix are as follows.
+suffix are as follows:
</P>
-<P>To invoke specific styles from the USER-CUDA package, you can either
-append "cuda" to the style name (e.g. pair_style lj/cut/cuda), or use
-the <A HREF = "Section_start.html#2_6">-suffix command-line switch</A>, or use the
-<A HREF = "suffix.html">suffix</A> command. One exception is that the <A HREF = "kspace_style.html">kspace_style
-pppm/cuda</A> command has to be requested explicitly.
-</P>
-<P>To use the USER-CUDA package with its default settings, no additional
-command is needed in your input script. This is because when LAMMPS
-starts up, it detects if it has been built with the USER-CUDA package.
-See the <A HREF = "Section_start.html#2_6">-cuda command-line switch</A> for more
-details.
-</P>
-<P>To change settings for the USER-CUDA package at run-time, the <A HREF = "package.html">package
-cuda</A> command can be used at the beginning of your input
-script. See the commands doc page for details.
-</P>
+<UL><LI>To invoke specific styles from the USER-CUDA package, you can either
+append "cuda" to the style name (e.g. pair_style lj/cut/cuda), or use
+the <A HREF = "Section_start.html#2_6">-suffix command-line switch</A>, or use the
+<A HREF = "suffix.html">suffix</A> command. One exception is that the <A HREF = "kspace_style.html">kspace_style
+pppm/cuda</A> command has to be requested
+explicitly.
+<LI>To use the USER-CUDA package with its default settings, no additional
+command is needed in your input script. This is because when LAMMPS
+starts up, it detects if it has been built with the USER-CUDA package.
+See the <A HREF = "Section_start.html#2_6">-cuda command-line switch</A> for more
+details.
+<LI>To change settings for the USER-CUDA package at run-time, the <A HREF = "package.html">package
+cuda</A> command can be used near the beginning of your
+input script. See the <A HREF = "package.html">package</A> command doc page for
+details.
+</UL>
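<P>A minimal sketch combining these points (a hypothetical input; the
cutoff and PPPM accuracy values are placeholders): with the suffix
command, pair_style lj/cut becomes lj/cut/cuda automatically, while
pppm/cuda is requested explicitly per the exception noted above:
</P>
<PRE>suffix cuda
pair_style lj/cut 2.5
kspace_style pppm/cuda 1.0e-4
</PRE>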
<P><B>Performance tips:</B>
</P>
<P>The USER-CUDA package offers more speed-up relative to CPU performance
@@ -365,18 +390,18 @@ entirely on the GPU(s) (except for inter-processor MPI communication),
for multiple timesteps, until a CPU calculation is required, either by
a fix or compute that is non-GPU-ized, or until output is performed
(thermo or dump snapshot or restart file). The less often this
-occurs, the faster your simulation may run.
+occurs, the faster your simulation will run.
</P>
<HR>
<HR>
-<H4><A NAME = "10_4"></A>10.4 Comparison of GPU and USER-CUDA packages
+<H4><A NAME = "10_5"></A>10.5 Comparison of GPU and USER-CUDA packages
</H4>
<P>Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation
using NVIDIA hardware, but they do it in different ways.
</P>
-<P>As a consequence, for a specific simulation on particular hardware,
+<P>As a consequence, for a particular simulation on specific hardware,
one package may be faster than the other. We give guidelines below,
but the best way to determine which package is faster for your input
script is to try both of them on your machine. See the benchmarking
@@ -384,7 +409,12 @@ section below for examples where this has been done.
</P>
<P><B>Guidelines for using each package optimally:</B>
</P>
-<UL><LI>The GPU package moves per-atom data (coordinates, forces)
+<UL><LI>The GPU package allows you to assign multiple CPUs (cores) to a single
+GPU (a common configuration for "hybrid" nodes that contain multicore
+CPU(s) and GPU(s)) and works effectively in this mode. The USER-CUDA
+package does not allow this; you can only use one CPU per GPU.
+<LI>The GPU package moves per-atom data (coordinates, forces)
back-and-forth between the CPU and GPU every timestep. The USER-CUDA
package only does this on timesteps when a CPU calculation is required
(e.g. to invoke a fix or compute that is non-GPU-ized). Hence, if you
@@ -402,28 +432,12 @@ system the crossover (in single precision) is often about 50K-100K
atoms per GPU. When performing double precision calculations the
crossover point can be significantly smaller.
-<LI>The GPU package allows you to assign multiple CPUs (cores) to a single
-GPU (a common configuration for "hybrid" nodes that contain multicore
-CPU(s) and GPU(s)) and works effectively in this mode. The USER-CUDA
-package does not; it works best when there is one CPU per GPU.
<LI>Both packages compute bonded interactions (bonds, angles, etc) on the
CPU. This means a model with bonds will force the USER-CUDA package
to transfer per-atom data back-and-forth between the CPU and GPU every
timestep. If the GPU package is running with several MPI processes
assigned to one GPU, the cost of computing the bonded interactions is
spread across more CPUs and hence the GPU package can run faster.
-</UL>
-<P><B>Chief differences between the two packages:</B>
-</P>
-<UL><LI>The GPU package accelerates only pair force, neighbor list, and PPPM
-calculations. The USER-CUDA package currently supports a wider range
-of pair styles and can also accelerate many fix styles and some
-compute styles, as well as neighbor list and PPPM calculations.
-<LI>The GPU package uses more GPU memory than the USER-CUDA package. This
-is generally not much of a problem since typical runs are
-computation-limited rather than memory-limited.
<LI>When using the GPU package with multiple CPUs assigned to one GPU, its
performance depends to some extent on high bandwidth between the CPUs
@@ -433,18 +447,30 @@ case if S2050/70 servers are used, where two devices generally share
one PCIe 2.0 16x slot. Also many multi-GPU mainboards do not provide
full 16 lanes to each of the PCIe 2.0 16x slots.
</UL>
+<P><B>Differences between the two packages:</B>
+</P>
+<UL><LI>The GPU package accelerates only pair force, neighbor list, and PPPM
+calculations. The USER-CUDA package currently supports a wider range
+of pair styles and can also accelerate many fix styles and some
+compute styles, as well as neighbor list and PPPM calculations.
+<LI>The GPU package uses more GPU memory than the USER-CUDA package. This
+is generally not a problem since typical runs are computation-limited
+rather than memory-limited.
+</UL>
<P><B>Examples:</B>
</P>
-<P>The LAMMPS distribution has two directories with sample
-input scripts for the GPU and USER-CUDA packages.
+<P>The LAMMPS distribution has two directories with sample input scripts
+for the GPU and USER-CUDA packages.
</P>
<UL><LI>lammps/examples/gpu = GPU package files
<LI>lammps/examples/USER/cuda = USER-CUDA package files
</UL>
-<P>These are files for identical systems, so they can be
-used to benchmark the performance of both packages
-on your system.
+<P>These contain input scripts for identical systems, so they can be used
+to benchmark the performance of both packages on your system.
</P>
+<HR>
<P><B>Benchmark data:</B>
</P>
<P>NOTE: We plan to add some benchmark results and plots here for the

View File

@@ -27,6 +27,7 @@ style exist in LAMMPS:
"pair_style lj/cut"_pair_lj.html
"pair_style lj/cut/opt"_pair_lj.html
+"pair_style lj/cut/omp"_pair_lj.html
"pair_style lj/cut/gpu"_pair_lj.html
"pair_style lj/cut/cuda"_pair_lj.html :ul
@@ -42,6 +43,12 @@ input script.
Styles with an "opt" suffix are part of the OPT package and typically
speed-up the pairwise calculations of your simulation by 5-25%.
+Styles with an "omp" suffix are part of the USER-OMP package and allow
+a pair style to be run in threaded mode using OpenMP. This can be
+useful on nodes with high core counts, when using fewer MPI processes
+than cores is advantageous, e.g. when running with PPPM so that FFTs
+are run on fewer MPI processors.
Styles with a "gpu" or "cuda" suffix are part of the GPU or USER-CUDA
packages, and can be run on NVIDIA GPUs associated with your CPUs.
The speed-up due to GPU usage depends on a variety of factors, as
@@ -64,8 +71,9 @@ The final section compares and contrasts the GPU and USER-CUDA
packages, since they are both designed to use NVIDIA GPU hardware.
10.1 "OPT package"_#10_1
-10.2 "GPU package"_#10_2
-10.3 "USER-CUDA package"_#10_3
-10.4 "Comparison of GPU and USER-CUDA packages"_#10_4 :all(b)
+10.2 "USER-OMP package"_#10_2
+10.3 "GPU package"_#10_3
+10.4 "USER-CUDA package"_#10_4
+10.5 "Comparison of GPU and USER-CUDA packages"_#10_5 :all(b)
:line
@@ -99,53 +107,61 @@ to 20% savings.
:line
:line
-10.2 GPU package :h4,link(10_2)
+10.2 USER-OMP package :h4,link(10_2)
+
+This section will be written when the USER-OMP package is released
+in main LAMMPS.
+
+:line
+:line
+
+10.3 GPU package :h4,link(10_3)
The GPU package was developed by Mike Brown at ORNL. It provides GPU
versions of several pair styles and for long-range Coulombics via the
PPPM command. It has the following features:
The package is designed to exploit common GPU hardware configurations
-where one or more GPUs are coupled with one or more multi-core CPUs
-within a node of a parallel machine. :ulb,l
+where one or more GPUs are coupled with many cores of a multi-core
+CPU, e.g. within a node of a parallel machine. :ulb,l
Atom-based data (e.g. coordinates, forces) moves back-and-forth
-between the CPU and GPU every timestep. :l
+between the CPU(s) and GPU every timestep. :l
-Neighbor lists can be constructed by on the CPU or on the GPU,
-controlled by the "fix gpu"_fix_gpu.html command. :l
+Neighbor lists can be constructed on the CPU or on the GPU. :l
The charge assignment and force interpolation portions of PPPM can be
run on the GPU. The FFT portion, which requires MPI communication
between processors, runs on the CPU. :l
-Asynchronous force computations can be performed simulataneously on
-the CPU and GPU. :l
+Asynchronous force computations can be performed simultaneously on the
+CPU(s) and GPU. :l
-LAMMPS-specific code is in the GPU package. It makee calls to a more
+LAMMPS-specific code is in the GPU package. It makes calls to a
generic GPU library in the lib/gpu directory. This library provides
-NVIDIA support as well as a more general OpenCL support, so that the
-same functionality can eventually be supported on other GPU
+NVIDIA support as well as more general OpenCL support, so that the
+same functionality can eventually be supported on a variety of GPU
hardware. :l,ule
[Hardware and software requirements:]
-To use this package, you need to have specific NVIDIA hardware and
-install specific NVIDIA CUDA software on your system:
+To use this package, you currently need to have specific NVIDIA
+hardware and install specific NVIDIA CUDA software on your system:
Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
Go to http://www.nvidia.com/object/cuda_get.html
Install a driver and toolkit appropriate for your system (SDK is not necessary)
-Follow the instructions in lammps/lib/gpu/README to build the library (also see below)
+Follow the instructions in lammps/lib/gpu/README to build the library (see below)
Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties :ul
[Building LAMMPS with the GPU package:]
-As with other packages that link with a separately complied library,
-you need to first build the GPU library, before building LAMMPS
-itself. General instructions for doing this are in "this
+As with other packages that include a separately compiled library, you
+need to first build the GPU library, before building LAMMPS itself.
+General instructions for doing this are in "this
section"_doc/Section_start.html#2_3 of the manual. For this package,
-do the following, using a Makefile appropriate for your system:
+do the following, using a Makefile in lib/gpu appropriate for your
+system:
cd lammps/lib/gpu
make -f Makefile.linux
@@ -155,7 +171,7 @@ If you are successful, you will produce the file lib/libgpu.a.
Now you are ready to build LAMMPS with the GPU package installed:
-cd lammps/lib/src
+cd lammps/src
make yes-gpu
make machine :pre
@@ -168,27 +184,26 @@ example.
[GPU configuration]
When using GPUs, you are restricted to one physical GPU per LAMMPS
-process, which is an MPI process running (typically) on a single core
-or processor. Multiple processes can share a single GPU and in many
-cases it will be more efficient to run with multiple processes per
-GPU.
+process, which is an MPI process running on a single core or
+processor. Multiple MPI processes (CPU cores) can share a single GPU,
+and in many cases it will be more efficient to run this way.
[Input script requirements:]
-Additional input script requirements to run styles with a {gpu} suffix
-are as follows.
+Additional input script requirements to run pair or PPPM styles with a
+{gpu} suffix are as follows:
-The "newton pair"_newton.html setting must be {off}.
To invoke specific styles from the GPU package, you can either append
"gpu" to the style name (e.g. pair_style lj/cut/gpu), or use the
"-suffix command-line switch"_Section_start.html#2_6, or use the
-"suffix"_suffix.html command.
+"suffix"_suffix.html command. :ulb,l
+The "newton pair"_newton.html setting must be {off}. :l
The "package gpu"_package.html command must be used near the beginning
-of your script to control the GPU selection and initialization steps.
-It also enables asynchronous splitting of force computations between
-the CPUs and GPUs.
+of your script to control the GPU selection and initialization
+settings. It also has an option to enable asynchronous splitting of
+force computations between the CPUs and GPUs. :l,ule
As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
@@ -215,10 +230,10 @@ computations that run simultaneously with "bond"_bond_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html
calculations will not be included in the "Pair" time.
-When the {mode} setting for the gpu fix is force/neigh, the time for
-neighbor list calculations on the GPU will be added into the "Pair"
-time, not the "Neigh" time. An additional breakdown of the times
-required for various tasks on the GPU (data copy, neighbor
+When the {mode} setting for the package gpu command is force/neigh,
+the time for neighbor list calculations on the GPU will be added into
+the "Pair" time, not the "Neigh" time. An additional breakdown of the
+times required for various tasks on the GPU (data copy, neighbor
calculations, force computations, etc) is output only with the LAMMPS
screen output (not in the log file) at the end of each run. These
timings represent total time spent on the GPU for each routine,
@@ -226,19 +241,22 @@ regardless of asynchronous CPU calculations.
[Performance tips:]
+Generally speaking, for best performance, you should use multiple CPUs
+per GPU, as provided by most multi-core CPU/GPU configurations.
Because of the large number of cores within each GPU device, it may be
more efficient to run on fewer processes per GPU when the number of
particles per MPI process is small (100's of particles); this can be
necessary to keep the GPU cores busy.
See the lammps/lib/gpu/README file for instructions on how to build
-the LAMMPS gpu library for single, mixed, and double precision. The
-latter requires that your GPU card support double precision.
+the GPU library for single, mixed, or double precision. The latter
+requires that your GPU card support double precision.
:line
:line
-10.3 USER-CUDA package :h4,link(10_3)
+10.4 USER-CUDA package :h4,link(10_4)
The USER-CUDA package was developed by Christian Trott at U Technology
Ilmenau in Germany. It provides NVIDIA GPU versions of many pair
@@ -250,19 +268,22 @@ many timesteps, to run entirely on the GPU (except for inter-processor
MPI communication), so that atom-based data (e.g. coordinates, forces)
do not have to move back-and-forth between the CPU and GPU. :ulb,l
-This will occur until a timestep where a non-GPU-ized fix or compute
-is invoked. E.g. whenever a non-GPU operation occurs (fix, compute,
-output), data automatically moves back to the CPU as needed. This may
-incur a performance penalty, but should otherwise just work
+Data will stay on the GPU until a timestep where a non-GPU-ized fix or
+compute is invoked. Whenever a non-GPU operation occurs (fix,
+compute, output), data automatically moves back to the CPU as needed.
+This may incur a performance penalty, but should otherwise work
transparently. :l
Neighbor lists for GPU-ized pair styles are constructed on the
+GPU. :l
+The package only supports use of a single CPU (core) with each
GPU. :l,ule
[Hardware and software requirements:]
To use this package, you need to have specific NVIDIA hardware and
-install specific NVIDIA CUDA software on your system:
+install specific NVIDIA CUDA software on your system.
Your NVIDIA GPU needs to support Compute Capability 1.3. This list may
help you to find out the Compute Capability of your card:
@@ -276,17 +297,18 @@ that its sample projects can be compiled without problems.
[Building LAMMPS with the USER-CUDA package:]
-As with other packages that link with a separately complied library,
-you need to first build the USER-CUDA library, before building LAMMPS
+As with other packages that include a separately compiled library, you
+need to first build the USER-CUDA library, before building LAMMPS
itself. General instructions for doing this are in "this
section"_doc/Section_start.html#2_3 of the manual. For this package,
-do the following, using a Makefile appropriate for your system:
+do the following, using settings in the lib/cuda Makefiles appropriate
+for your system:
+Go to the lammps/lib/cuda directory :ulb,l
If your {CUDA} toolkit is not installed in the default system directory
{/usr/local/cuda} edit the file {lib/cuda/Makefile.common}
-accordingly. :ulb,l
-Go to the lammps/lib/cuda directory :l
+accordingly. :l
Type "make OPTIONS", where {OPTIONS} are one or more of the following
options. The settings will be written to the
@@ -318,35 +340,37 @@ produce the file lib/libcuda.a. :l,ule
Now you are ready to build LAMMPS with the USER-CUDA package installed:
-cd lammps/lib/src
+cd lammps/src
make yes-user-cuda
make machine :pre
-Note that the build will reference the lib/cuda/Makefile.common file
-to extract setting relevant to the LAMMPS build. So it is important
+Note that the LAMMPS build references the lib/cuda/Makefile.common
+file to extract CUDA-specific settings. So it is important
that you have first built the cuda library (in lib/cuda) using
settings appropriate to your system.
[Input script requirements:]
Additional input script requirements to run styles with a {cuda}
-suffix are as follows.
+suffix are as follows:
To invoke specific styles from the USER-CUDA package, you can either
append "cuda" to the style name (e.g. pair_style lj/cut/cuda), or use
the "-suffix command-line switch"_Section_start.html#2_6, or use the
"suffix"_suffix.html command. One exception is that the "kspace_style
-pppm/cuda"_kspace_style.html command has to be requested explicitly.
+pppm/cuda"_kspace_style.html command has to be requested
+explicitly. :ulb,l
To use the USER-CUDA package with its default settings, no additional
command is needed in your input script. This is because when LAMMPS
starts up, it detects if it has been built with the USER-CUDA package.
See the "-cuda command-line switch"_Section_start.html#2_6 for more
-details.
+details. :l
To change settings for the USER-CUDA package at run-time, the "package
-cuda"_package.html command can be used at the beginning of your input
-script. See the commands doc page for details.
+cuda"_package.html command can be used near the beginning of your
+input script. See the "package"_package.html command doc page for
+details. :l,ule
[Performance tips:]
@@ -359,17 +383,17 @@ entirely on the GPU(s) (except for inter-processor MPI communication),
for multiple timesteps, until a CPU calculation is required, either by
a fix or compute that is non-GPU-ized, or until output is performed
(thermo or dump snapshot or restart file). The less often this
-occurs, the faster your simulation may run.
+occurs, the faster your simulation will run.
:line
:line
-10.4 Comparison of GPU and USER-CUDA packages :h4,link(10_4)
+10.5 Comparison of GPU and USER-CUDA packages :h4,link(10_5)
Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation
using NVIDIA hardware, but they do it in different ways.
-As a consequence, for a specific simulation on particular hardware,
+As a consequence, for a particular simulation on specific hardware,
one package may be faster than the other. We give guidelines below,
but the best way to determine which package is faster for your input
script is to try both of them on your machine. See the benchmarking
@@ -377,6 +401,11 @@ section below for examples where this has been done.
[Guidelines for using each package optimally:]
+The GPU package allows you to assign multiple CPUs (cores) to a single
+GPU (a common configuration for "hybrid" nodes that contain multicore
+CPU(s) and GPU(s)) and works effectively in this mode. The USER-CUDA
+package does not allow this; you can only use one CPU per GPU. :ulb,l
The GPU package moves per-atom data (coordinates, forces)
back-and-forth between the CPU and GPU every timestep. The USER-CUDA
package only does this on timesteps when a CPU calculation is required
@@ -385,7 +414,7 @@ can formulate your input script to only use GPU-ized fixes and
computes, and avoid doing I/O too often (thermo output, dump file
snapshots, restart files), then the data transfer cost of the
USER-CUDA package can be very low, causing it to run faster than the
-GPU package. :ulb,l
+GPU package. :l
The GPU package is often faster than the USER-CUDA package, if the
number of atoms per GPU is "small". The crossover point, in terms of
@@ -395,28 +424,12 @@ system the crossover (in single precision) is often about 50K-100K
atoms per GPU. When performing double precision calculations the
crossover point can be significantly smaller. :l
-The GPU package allows you to assign multiple CPUs (cores) to a single
-GPU (a common configuration for "hybrid" nodes that contain multicore
-CPU(s) and GPU(s)) and works effectively in this mode. The USER-CUDA
-package does not; it works best when there is one CPU per GPU. :l
Both packages compute bonded interactions (bonds, angles, etc) on the
CPU. This means a model with bonds will force the USER-CUDA package
to transfer per-atom data back-and-forth between the CPU and GPU every
timestep. If the GPU package is running with several MPI processes
assigned to one GPU, the cost of computing the bonded interactions is
-spread across more CPUs and hence the GPU package can run faster. :l,ule
+spread across more CPUs and hence the GPU package can run faster. :l
-[Chief differences between the two packages:]
-The GPU package accelerates only pair force, neighbor list, and PPPM
-calculations. The USER-CUDA package currently supports a wider range
-of pair styles and can also accelerate many fix styles and some
-compute styles, as well as neighbor list and PPPM calculations. :ulb,l
-The GPU package uses more GPU memory than the USER-CUDA package. This
-is generally not much of a problem since typical runs are
-computation-limited rather than memory-limited. :l
When using the GPU package with multiple CPUs assigned to one GPU, its
performance depends to some extent on high bandwidth between the CPUs
@@ -426,17 +439,29 @@ case if S2050/70 servers are used, where two devices generally share
one PCIe 2.0 16x slot. Also many multi-GPU mainboards do not provide
full 16 lanes to each of the PCIe 2.0 16x slots. :l,ule
+[Differences between the two packages:]
+The GPU package accelerates only pair force, neighbor list, and PPPM
+calculations. The USER-CUDA package currently supports a wider range
+of pair styles and can also accelerate many fix styles and some
+compute styles, as well as neighbor list and PPPM calculations. :ulb,l
+The GPU package uses more GPU memory than the USER-CUDA package. This
+is generally not a problem since typical runs are computation-limited
+rather than memory-limited. :l,ule
[Examples:]
-The LAMMPS distribution has two directories with sample
-input scripts for the GPU and USER-CUDA packages.
+The LAMMPS distribution has two directories with sample input scripts
+for the GPU and USER-CUDA packages.
lammps/examples/gpu = GPU package files
lammps/examples/USER/cuda = USER-CUDA package files :ul
-These are files for identical systems, so they can be
-used to benchmark the performance of both packages
-on your system.
+These contain input scripts for identical systems, so they can be used
+to benchmark the performance of both packages on your system.
+:line
[Benchmark data:]

View File

@@ -58,8 +58,8 @@ LAMMPS output options.
</P>
<P><B>Restrictions:</B>
</P>
-<P>This compute is part of the "user-ackland" package. It is only
-enabled if LAMMPS was built with that package. See the <A HREF = "Section_start.html#2_3">Making
+<P>This compute is part of the "user-misc" package. It is only enabled
+if LAMMPS was built with that package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>
<P><B>Related commands:</B>

View File

@@ -55,8 +55,8 @@ LAMMPS output options.
[Restrictions:]
-This compute is part of the "user-ackland" package. It is only
-enabled if LAMMPS was built with that package. See the "Making
+This compute is part of the "user-misc" package. It is only enabled
+if LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
[Related commands:]

View File

@@ -43,9 +43,22 @@ fix comm all imd 8888 trate 5 unwrap on fscale 10.0
<P><B>Description:</B>
</P>
<P>This fix implements the "Interactive MD" (IMD) protocol which allows
-to connect an IMD client, for example the <A HREF = "http://www.ks.uiuc.edu/Research/vmd">VMD visualization
-program</A>, to a running LAMMPS simulation and monitor the progress
-of the simulation and interactively apply forces to selected atoms.
+real-time visualization and manipulation of MD simulations through the
+IMD protocol, as initially implemented in VMD and NAMD. Specifically,
+it allows LAMMPS to connect an IMD client, for example the <A HREF = "http://www.ks.uiuc.edu/Research/vmd">VMD
+visualization program</A>, so that it can monitor the progress of the
+simulation and interactively apply forces to selected atoms.
+</P>
+<P>If LAMMPS is compiled with the preprocessor flag -DLAMMPS_ASYNC_IMD,
+then fix imd will use POSIX threads to spawn a thread on MPI rank 0 in
+order to offload data reading and writing from the main execution
+thread and potentially lower the inferred latencies for slow
+communication links. This feature has only been tested under Linux.
+</P>
+<P>There are example scripts for using this package with LAMMPS in
+examples/USER/imd. Additional examples and a driver for use with the
+Novint Falcon game controller as a haptic device can be found at:
+http://sites.google.com/site/akohlmey/software/vrpn-icms.
</P>
<P>The source code for this fix includes code developed by the
Theoretical and Computational Biophysics Group in the Beckman
@@ -138,15 +151,16 @@ This fix is not invoked during <A HREF = "minimize.html">energy minimization</A>
</P>
<P><B>Restrictions:</B>
</P>
-<P>This fix is part of the "user-imd" package. It is only enabled if
+<P>This fix is part of the "user-misc" package. It is only enabled if
LAMMPS was built with that package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
-This on platforms that support multi-threading, this fix can be
-compiled in a way that the coordinate transfers to the IMD client
-can be handled from a separate thread, when LAMMPS is compiled with
-the -DLAMMPS_ASYNC_IMD preprocessor flag. This should to keep
-MD loop times low and transfer rates high, especially for systems
-with many atoms and for slow connections.
+</P>
+<P>On platforms that support multi-threading, this fix can be compiled in
+a way that the coordinate transfers to the IMD client can be handled
+from a separate thread, when LAMMPS is compiled with the
+-DLAMMPS_ASYNC_IMD preprocessor flag. This should keep MD loop
+times low and transfer rates high, especially for systems with many
+atoms and for slow connections.
</P>
<P>When used in combination with VMD, a topology or coordinate file has
to be loaded, which matches (in number and ordering of atoms) the
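<P>A minimal sketch of enabling that threaded mode (the makefile name is
a placeholder; LAMMPS machine makefiles conventionally collect such -D
options in the LMP_INC variable) would be to set the flag there and
rebuild:
</P>
<PRE># in src/MAKE/Makefile.yourmachine (hypothetical name)
LMP_INC = -DLAMMPS_ASYNC_IMD
</PRE>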

View File

@@ -35,9 +35,22 @@ fix comm all imd 8888 trate 5 unwrap on fscale 10.0 :pre
[Description:]
This fix implements the "Interactive MD" (IMD) protocol which allows
-to connect an IMD client, for example the "VMD visualization
-program"_VMD, to a running LAMMPS simulation and monitor the progress
-of the simulation and interactively apply forces to selected atoms.
+real-time visualization and manipulation of MD simulations through the
+IMD protocol, as initially implemented in VMD and NAMD. Specifically,
+it allows LAMMPS to connect an IMD client, for example the "VMD
+visualization program"_VMD, so that it can monitor the progress of the
+simulation and interactively apply forces to selected atoms.
+
+If LAMMPS is compiled with the preprocessor flag -DLAMMPS_ASYNC_IMD,
+then fix imd will use POSIX threads to spawn a thread on MPI rank 0 in
+order to offload data reading and writing from the main execution
+thread and potentially lower the inferred latencies for slow
+communication links. This feature has only been tested under Linux.
+
+There are example scripts for using this package with LAMMPS in
+examples/USER/imd. Additional examples and a driver for use with the
+Novint Falcon game controller as a haptic device can be found at:
+http://sites.google.com/site/akohlmey/software/vrpn-icms.
The source code for this fix includes code developed by the
Theoretical and Computational Biophysics Group in the Beckman
@@ -128,15 +141,16 @@ This fix is not invoked during "energy minimization"_minimize.html.
[Restrictions:]
-This fix is part of the "user-imd" package. It is only enabled if
+This fix is part of the "user-misc" package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
-This on platforms that support multi-threading, this fix can be
-compiled in a way that the coordinate transfers to the IMD client
-can be handled from a separate thread, when LAMMPS is compiled with
-the -DLAMMPS_ASYNC_IMD preprocessor flag. This should to keep
-MD loop times low and transfer rates high, especially for systems
-with many atoms and for slow connections.
+On platforms that support multi-threading, this fix can be compiled in
+a way that the coordinate transfers to the IMD client can be handled
+from a separate thread, when LAMMPS is compiled with the
+-DLAMMPS_ASYNC_IMD preprocessor flag. This should keep MD loop
+times low and transfer rates high, especially for systems with many
+atoms and for slow connections.
When used in combination with VMD, a topology or coordinate file has
to be loaded, which matches (in number and ordering of atoms) the

View File

@@ -132,7 +132,7 @@ minimization</A>.
</P>
<P><B>Restrictions:</B>
</P>
-<P>This fix is part of the "user-smd" package. It is only enabled if
+<P>This fix is part of the "user-misc" package. It is only enabled if
LAMMPS was built with that package. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>

View File

@@ -123,7 +123,7 @@ minimization"_minimize.html.
[Restrictions:]
-This fix is part of the "user-smd" package. It is only enabled if
+This fix is part of the "user-misc" package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.

View File

@@ -101,7 +101,7 @@ the other particles.
<HR>
<P>The <I>cuda</I> style invokes options associated with the use of the
-USER-CUDA package. These need to be documented.
+USER-CUDA package. These still need to be documented.
</P>
<HR>

View File

@@ -95,7 +95,7 @@ the other particles.
:line
The {cuda} style invokes options associated with the use of the
-USER-CUDA package. These need to be documented.
+USER-CUDA package. These still need to be documented.
:line

View File

@@ -415,7 +415,7 @@ an input script that reads a restart file.
that package (which it is by default). See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.
</P>
-<P>The <I>eam/cd</I> style is part of the "user-cd-eam" package and also
+<P>The <I>eam/cd</I> style is part of the "user-misc" package and also
requires the "manybody" package. It is only enabled if LAMMPS was
built with those packages. See the <A HREF = "Section_start.html#2_3">Making
LAMMPS</A> section for more info.

View File

@@ -403,7 +403,7 @@ All of these styles except the {eam/cd} style are part of the
that package (which it is by default). See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.
-The {eam/cd} style is part of the "user-cd-eam" package and also
+The {eam/cd} style is part of the "user-misc" package and also
requires the "manybody" package. It is only enabled if LAMMPS was
built with those packages. See the "Making
LAMMPS"_Section_start.html#2_3 section for more info.