<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>7.4.1. GPU package — LAMMPS documentation</title>
<link rel="canonical" href="https://docs.lammps.org/Speed_gpu.html" />
</head>
<body>
<section id="gpu-package">
|
||
<h1><span class="section-number">7.4.1. </span>GPU package<a class="headerlink" href="#gpu-package" title="Link to this heading"></a></h1>
|
||
<p>The GPU package was developed by Mike Brown while at SNL and ORNL (now
|
||
at Intel Corp.) and his collaborators, particularly Trung Nguyen (now at
|
||
Northwestern). Support for AMD GPUs via HIP was added by Vsevolod Nikolskiy
|
||
and coworkers at HSE University.</p>
|
||
<p>The GPU package provides GPU versions of many pair styles and for
|
||
parts of the <a class="reference internal" href="kspace_style.html"><span class="doc">kspace_style pppm</span></a> for long-range
|
||
Coulombics. It has the following general features:</p>
|
||
<ul class="simple">
|
||
<li><p>It is designed to exploit common GPU hardware configurations where one
|
||
or more GPUs are coupled to many cores of one or more multicore CPUs,
|
||
e.g. within a node of a parallel machine.</p></li>
|
||
<li><p>Atom-based data (e.g. coordinates, forces) are moved back-and-forth
|
||
between the CPU(s) and GPU every timestep.</p></li>
|
||
<li><p>Neighbor lists can be built on the CPU or on the GPU</p></li>
|
||
<li><p>The charge assignment and force interpolation portions of PPPM can be
|
||
run on the GPU. The FFT portion, which requires MPI communication
|
||
between processors, runs on the CPU.</p></li>
|
||
<li><p>Force computations of different style (pair vs. bond/angle/dihedral/improper)
|
||
can be performed concurrently on the GPU and CPU(s), respectively.</p></li>
|
||
<li><p>It allows for GPU computations to be performed in single or double
|
||
precision, or in mixed-mode precision, where pairwise forces are
|
||
computed in single precision, but accumulated into double-precision
|
||
force vectors.</p></li>
|
||
<li><p>LAMMPS-specific code is in the GPU package. It makes calls to a
|
||
generic GPU library in the lib/gpu directory. This library provides
|
||
either Nvidia support, AMD support, or more general OpenCL support
|
||
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
|
||
so that the same functionality is supported on a variety of hardware.</p></li>
|
||
</ul>
|
||
<section id="required-hardware-software">
|
||
<h2>Required hardware/software<a class="headerlink" href="#required-hardware-software" title="Link to this heading"></a></h2>
|
||
<p>To compile and use this package in CUDA mode, you currently need
|
||
to have an NVIDIA GPU and install the corresponding NVIDIA CUDA
|
||
toolkit software on your system (this is only tested on Linux
|
||
and unsupported on Windows):</p>
|
||
<ul class="simple">
|
||
<li><p>Check if you have an NVIDIA GPU: <code class="docutils literal notranslate"><span class="pre">cat</span> <span class="pre">/proc/driver/nvidia/gpus/\*/information</span></code></p></li>
|
||
<li><p>Go to <a class="reference external" href="https://developer.nvidia.com/cuda-downloads">https://developer.nvidia.com/cuda-downloads</a></p></li>
|
||
<li><p>Install a driver and toolkit appropriate for your system (SDK is not necessary)</p></li>
|
||
<li><p>Run <code class="docutils literal notranslate"><span class="pre">lammps/lib/gpu/nvc_get_devices</span></code> (after building the GPU library, see below) to
|
||
list supported devices and properties</p></li>
|
||
</ul>
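<p>A minimal shell sketch of these checks, assuming a Linux system and that the
GPU library has already been built in <code class="docutils literal notranslate"><span class="pre">lib/gpu</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# query the NVIDIA kernel driver for installed GPUs
cat /proc/driver/nvidia/gpus/*/information

# list the devices and properties visible to the GPU library
cd lammps/lib/gpu
./nvc_get_devices
</pre></div>
</div>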
<p>To compile and use this package in OpenCL mode, you currently need
to have the OpenCL headers and the (vendor neutral) OpenCL library installed.
In OpenCL mode, the acceleration depends on having an <a class="reference external" href="https://www.khronos.org/news/permalink/opencl-installable-client-driver-icd-loader">OpenCL Installable Client Driver (ICD)</a>
installed. Multiple ICDs for the same or for different hardware
(GPUs, CPUs, accelerators) can be installed at the same time. OpenCL refers to those
as ‘platforms’. The GPU library will try to auto-select the most suitable platform,
but this can be overridden using the platform option of the <a class="reference internal" href="package.html"><span class="doc">package</span></a>
command. Run <code class="docutils literal notranslate"><span class="pre">lammps/lib/gpu/ocl_get_devices</span></code> to get a list of available
platforms and devices with a suitable ICD available.</p>
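<p>A short sketch, assuming the GPU library was built in OpenCL mode; the platform
keyword and its value syntax are documented with the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command, and
the platform id shown here is only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# list the OpenCL platforms and devices for which an ICD is installed
cd lammps/lib/gpu
./ocl_get_devices

# override platform auto-selection, e.g. to pick platform 1 (illustrative value)
mpirun -np 4 lmp_machine -sf gpu -pk gpu 1 platform 1 -in in.script
</pre></div>
</div>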
<p>To compile and use this package for Intel GPUs, OpenCL or the Intel oneAPI
HPC Toolkit can be installed using Linux package managers. The latter also
provides optimized C++, MPI, and many other libraries and tools. See:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html">https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html</a></p></li>
</ul>
<p>If you do not have a discrete GPU card installed, this package can still provide
significant speedups on some CPUs that include integrated GPUs. Additionally, for
many Macs, OpenCL is already included with the OS and Makefiles are available
in the <code class="docutils literal notranslate"><span class="pre">lib/gpu</span></code> directory.</p>
<p>To compile and use this package in HIP mode, you have to have the AMD ROCm
software installed. Versions of ROCm older than 3.5 are currently deprecated
by AMD.</p>
</section>
<section id="building-lammps-with-the-gpu-package">
<h2>Building LAMMPS with the GPU package<a class="headerlink" href="#building-lammps-with-the-gpu-package" title="Link to this heading"></a></h2>
<p>See the <a class="reference internal" href="Build_extras.html#gpu"><span class="std std-ref">Build extras</span></a> page for
instructions.</p>
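<p>For orientation, a possible CMake configuration is sketched below. The option
names (<code class="docutils literal notranslate"><span class="pre">PKG_GPU</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_API</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_PREC</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_ARCH</span></code>) are described on the
<a class="reference internal" href="Build_extras.html#gpu"><span class="std std-ref">Build extras</span></a> page; the architecture value shown is only an example and
must match your hardware:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# sketch: CUDA build of the GPU package in mixed precision
cd lammps
mkdir build
cd build
cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_80 ../cmake
cmake --build .
</pre></div>
</div>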
</section>
<section id="run-with-the-gpu-package-from-the-command-line">
<h2>Run with the GPU package from the command-line<a class="headerlink" href="#run-with-the-gpu-package-from-the-command-line" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> or <code class="docutils literal notranslate"><span class="pre">mpiexec</span></code> command sets the total number of MPI tasks
used by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command in MPICH does this via
its <code class="docutils literal notranslate"><span class="pre">-np</span></code> and <code class="docutils literal notranslate"><span class="pre">-ppn</span></code> switches. Ditto for Open MPI via <code class="docutils literal notranslate"><span class="pre">-np</span></code> and
<code class="docutils literal notranslate"><span class="pre">-npernode</span></code>.</p>
<p>When using the GPU package, you cannot assign more than one GPU to a
single MPI task. However, multiple MPI tasks can share the same GPU,
and in many cases it will be more efficient to run this way. Likewise
it may be more efficient to use fewer MPI tasks/node than the available
number of CPU cores. Assignment of multiple MPI tasks to a GPU will happen
automatically if you create more MPI tasks/node than there are
GPUs/node. E.g. with 8 MPI tasks/node and 2 GPUs, each GPU will be
shared by 4 MPI tasks.</p>
<p>The GPU package also has limited support for OpenMP for both
multi-threading and vectorization of routines that are run on the CPUs.
This requires that the GPU library and LAMMPS are built with flags to
enable OpenMP support (e.g. <code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code>). Some styles for time integration
are also available in the GPU package. These run completely on the CPUs
in full double precision, but exploit multi-threading and vectorization
for faster performance.</p>
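<p>A sketch of combining MPI tasks and OpenMP threads, assuming an OpenMP-enabled
build; the standard <code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS</span></code> environment variable controls the number
of threads per MPI task:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# 4 MPI tasks per node with 4 OpenMP threads each, sharing 2 GPUs
export OMP_NUM_THREADS=4
mpirun -np 4 lmp_machine -sf gpu -pk gpu 2 -in in.script
</pre></div>
</div>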
<p>Use the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switch</span></a>, which will
automatically append “gpu” to styles that support it. Use the <code class="docutils literal notranslate"><span class="pre">-pk</span>
<span class="pre">gpu</span> <span class="pre">Ng</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switch</span></a> to set <code class="docutils literal notranslate"><span class="pre">Ng</span></code> = # of
GPUs/node to use. If <code class="docutils literal notranslate"><span class="pre">Ng</span></code> is 0, the number is selected automatically as
the number of matching GPUs that have the highest number of compute
cores.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1 MPI task uses 1 GPU</span>
lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">1</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script

<span class="c1"># 12 MPI tasks share 2 GPUs on a single 16-core (or whatever) node</span>
mpirun<span class="w"> </span>-np<span class="w"> </span><span class="m">12</span><span class="w"> </span>lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">2</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script

<span class="c1"># ditto on 4 16-core nodes</span>
mpirun<span class="w"> </span>-np<span class="w"> </span><span class="m">48</span><span class="w"> </span>-ppn<span class="w"> </span><span class="m">12</span><span class="w"> </span>lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">2</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script
</pre></div>
</div>
<p>Note that if the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> switch is used, it also issues a default
<a class="reference internal" href="package.html"><span class="doc">package gpu 0</span></a> command, which will result in
automatic selection of the number of GPUs to use.</p>
<p>Using the <code class="docutils literal notranslate"><span class="pre">-pk</span></code> switch explicitly allows setting the number of
GPUs/node to use and additional options. Its syntax is the same as
the <code class="docutils literal notranslate"><span class="pre">package</span> <span class="pre">gpu</span></code> command. See the <a class="reference internal" href="package.html"><span class="doc">package</span></a>
command page for details, including the default values used for
all its options if it is not specified.</p>
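<p>For example, extra tuning keywords can be passed through <code class="docutils literal notranslate"><span class="pre">-pk</span></code>. The
<code class="docutils literal notranslate"><span class="pre">neigh</span></code> and <code class="docutils literal notranslate"><span class="pre">split</span></code> keywords and their allowed values are documented with the
<a class="reference internal" href="package.html"><span class="doc">package</span></a> command; the values below are only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# 8 MPI tasks share 2 GPUs, neighbor lists are built on the GPU,
# and the pair-style work split between CPU and GPU is balanced dynamically
mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 neigh yes split -1.0 -in in.script
</pre></div>
</div>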
<p>Note that the default for the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command is to
set the Newton flag to “off” for pairwise interactions. It does not
affect the setting for bonded interactions (LAMMPS default is “on”).
The “off” setting for pairwise interactions is currently required for
GPU package pair styles.</p>
</section>
<section id="run-with-the-gpu-package-by-editing-an-input-script">
<h2>Run with the GPU package by editing an input script<a class="headerlink" href="#run-with-the-gpu-package-by-editing-an-input-script" title="Link to this heading"></a></h2>
<p>The discussion above about the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> or <code class="docutils literal notranslate"><span class="pre">mpiexec</span></code> command, MPI
tasks/node, and the use of multiple MPI tasks per GPU applies here as well.</p>
<p>Use the <a class="reference internal" href="suffix.html"><span class="doc">suffix gpu</span></a> command, or you can explicitly add a
“gpu” suffix to individual styles in your input script, e.g.</p>
<div class="highlight-LAMMPS notranslate"><div class="highlight"><pre><span></span><span class="k">pair_style</span><span class="w"> </span><span class="n">lj</span><span class="o">/</span><span class="n">cut</span><span class="o">/</span><span class="n">gpu</span><span class="w"> </span><span class="m">2.5</span>
</pre></div>
</div>
<p>You must also use the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command to enable the
GPU package, unless the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> or <code class="docutils literal notranslate"><span class="pre">-pk</span> <span class="pre">gpu</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switches</span></a> were used. It specifies the number of
GPUs/node to use, as well as other options.</p>
</section>
<section id="speed-up-to-expect">
<h2>Speed-up to expect<a class="headerlink" href="#speed-up-to-expect" title="Link to this heading"></a></h2>
<p>The performance of a GPU versus a multicore CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed). Using the GPU package
in OpenCL mode on CPUs (which uses vectorization and multithreading)
usually results in inferior performance compared to using LAMMPS’ native
threading and vectorization support in the OPENMP and INTEL packages.</p>
<p>See the <a class="reference external" href="https://www.lammps.org/bench.html">Benchmark page</a> of the
LAMMPS website for performance of the GPU package on various
hardware, including the Titan HPC platform at ORNL.</p>
<p>You should also experiment with how many MPI tasks per GPU to use to
give the best performance for your problem and machine. This is also
a function of the problem size and the pair style being used.
Likewise, you should experiment with the precision setting for the GPU
library to see if single or mixed precision will give accurate
results, since they will typically be faster.</p>
<p>MPI parallelism typically outperforms OpenMP parallelism, but in some
cases using fewer MPI tasks and multiple OpenMP threads with the GPU
package can give better performance. Three-body potentials can often perform
better with multiple OpenMP threads, because these styles require more
inter-process communication in the GPU package in order to allow
deterministic results.</p>
</section>
<section id="guidelines-for-best-performance">
<h2>Guidelines for best performance<a class="headerlink" href="#guidelines-for-best-performance" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>Using multiple MPI tasks (2-10) per GPU will often give the best
performance, as allowed by most multicore CPU/GPU configurations.
Using too many MPI tasks will result in worse performance due to
growing overhead with the growing number of MPI tasks.</p></li>
<li><p>If the number of particles per MPI task is small (e.g. 100s of
particles), it can be more efficient to run with fewer MPI tasks per
GPU, even if you do not use all the cores on the compute node.</p></li>
<li><p>The <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command has several options for tuning
performance. Neighbor lists can be built on the GPU or CPU. Force
calculations can be dynamically balanced across the CPU cores and
GPUs. GPU-specific settings can be made which can be optimized
for different hardware. See the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command
page for details.</p></li>
<li><p>As described by the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command, GPU
accelerated pair styles can perform computations asynchronously with
CPU computations. The “Pair” time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with <a class="reference internal" href="bond_style.html"><span class="doc">bond</span></a>,
<a class="reference internal" href="angle_style.html"><span class="doc">angle</span></a>, <a class="reference internal" href="dihedral_style.html"><span class="doc">dihedral</span></a>,
<a class="reference internal" href="improper_style.html"><span class="doc">improper</span></a>, and <a class="reference internal" href="kspace_style.html"><span class="doc">long-range</span></a>
calculations will not be included in the “Pair” time.</p></li>
<li><p>Since only part of the pppm kspace style is GPU accelerated, it may be
faster to only use GPU acceleration for Pair styles with long-range
electrostatics. See the “pair/only” keyword of the <a class="reference internal" href="package.html"><span class="doc">package
command</span></a> for a shortcut to do that (see the sketch after this list). The distribution of
work between kspace on the CPU and non-bonded interactions on the GPU
can be balanced by adjusting the Coulomb cutoff without loss of
accuracy.</p></li>
<li><p>When the <em>mode</em> setting for the package gpu command is force/neigh,
the time for neighbor list calculations on the GPU will be added into
the “Pair” time, not the “Neigh” time. An additional breakdown of the
times required for various tasks on the GPU (data copy, neighbor
calculations, force computations, etc.) is output only with the LAMMPS
screen output (not in the log file) at the end of each run. These
timings represent total time spent on the GPU for each routine,
regardless of asynchronous CPU calculations.</p></li>
<li><p>The output section “GPU Time Info (average)” reports “Max Mem / Proc”.
This is the maximum memory used at one time on the GPU for data
storage by a single MPI process.</p></li>
</ul>
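<p>A command-line sketch of the “pair/only” shortcut mentioned above; the exact
keyword values accepted are documented with the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command, so the
setting shown here is only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# accelerate only the pair style on the GPU and keep kspace (PPPM) on the CPU
mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 pair/only on -in in.script
</pre></div>
</div>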
</section>
<section id="restrictions">
<h2>Restrictions<a class="headerlink" href="#restrictions" title="Link to this heading"></a></h2>
<p>When using <a class="reference internal" href="pair_hybrid.html"><span class="doc">hybrid pair styles</span></a>, the neighbor list
must be generated on the host instead of the GPU and thus the potential
GPU acceleration is reduced.</p>
</section>
</section>
<footer>
<p>© Copyright 2003-2025 Sandia Corporation.</p>
</footer>
</body>
</html>