<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>7.4.1. GPU package — LAMMPS documentation</title>
<link rel="canonical" href="https://docs.lammps.org/Speed_gpu.html" />
</head>
<body>
<section id="gpu-package">
|
||
<h1><span class="section-number">7.4.1. </span>GPU package<a class="headerlink" href="#gpu-package" title="Link to this heading"></a></h1>
|
||
<p>The GPU package was developed by Mike Brown while at SNL and ORNL (now
|
||
at Intel Corp.) and his collaborators, particularly Trung Nguyen (now at
|
||
Northwestern). Support for AMD GPUs via HIP was added by Vsevolod Nikolskiy
|
||
and coworkers at HSE University.</p>
|
||
<p>The GPU package provides GPU versions of many pair styles and for
|
||
parts of the <a class="reference internal" href="kspace_style.html"><span class="doc">kspace_style pppm</span></a> for long-range
|
||
Coulombics. It has the following general features:</p>
|
||
<ul class="simple">
|
||
<li><p>It is designed to exploit common GPU hardware configurations where one
|
||
or more GPUs are coupled to many cores of one or more multicore CPUs,
|
||
e.g. within a node of a parallel machine.</p></li>
|
||
<li><p>Atom-based data (e.g. coordinates, forces) are moved back-and-forth
|
||
between the CPU(s) and GPU every timestep.</p></li>
|
||
<li><p>Neighbor lists can be built on the CPU or on the GPU</p></li>
|
||
<li><p>The charge assignment and force interpolation portions of PPPM can be
|
||
run on the GPU. The FFT portion, which requires MPI communication
|
||
between processors, runs on the CPU.</p></li>
|
||
<li><p>Force computations of different style (pair vs. bond/angle/dihedral/improper)
|
||
can be performed concurrently on the GPU and CPU(s), respectively.</p></li>
|
||
<li><p>It allows for GPU computations to be performed in single or double
|
||
precision, or in mixed-mode precision, where pairwise forces are
|
||
computed in single precision, but accumulated into double-precision
|
||
force vectors.</p></li>
|
||
<li><p>LAMMPS-specific code is in the GPU package. It makes calls to a
|
||
generic GPU library in the lib/gpu directory. This library provides
|
||
either Nvidia support, AMD support, or more general OpenCL support
|
||
(for Nvidia GPUs, AMD GPUs, Intel GPUs, and multicore CPUs).
|
||
so that the same functionality is supported on a variety of hardware.</p></li>
|
||
</ul>
|
||
<section id="required-hardware-software">
|
||
<h2>Required hardware/software<a class="headerlink" href="#required-hardware-software" title="Link to this heading"></a></h2>
|
||
<p>To compile and use this package in CUDA mode, you currently need
|
||
to have an NVIDIA GPU and install the corresponding NVIDIA CUDA
|
||
toolkit software on your system (this is only tested on Linux
|
||
and unsupported on Windows):</p>
|
||
<ul class="simple">
|
||
<li><p>Check if you have an NVIDIA GPU: <code class="docutils literal notranslate"><span class="pre">cat</span> <span class="pre">/proc/driver/nvidia/gpus/\*/information</span></code></p></li>
|
||
<li><p>Go to <a class="reference external" href="https://developer.nvidia.com/cuda-downloads">https://developer.nvidia.com/cuda-downloads</a></p></li>
|
||
<li><p>Install a driver and toolkit appropriate for your system (SDK is not necessary)</p></li>
|
||
<li><p>Run <code class="docutils literal notranslate"><span class="pre">lammps/lib/gpu/nvc_get_devices</span></code> (after building the GPU library, see below) to
|
||
list supported devices and properties</p></li>
|
||
</ul>
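<p>A minimal shell sketch of these checks, assuming a Linux system and that the
GPU library has already been built in <code class="docutils literal notranslate"><span class="pre">lib/gpu</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# query the NVIDIA kernel driver for installed GPUs
cat /proc/driver/nvidia/gpus/*/information

# list the devices and properties visible to the GPU library
cd lammps/lib/gpu
./nvc_get_devices
</pre></div>
</div>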
<p>To compile and use this package in OpenCL mode, you currently need
to have the OpenCL headers and the (vendor neutral) OpenCL library installed.
In OpenCL mode, the acceleration depends on having an <a class="reference external" href="https://www.khronos.org/news/permalink/opencl-installable-client-driver-icd-loader">OpenCL Installable Client Driver (ICD)</a>
installed. Multiple ICDs for the same or for different hardware
(GPUs, CPUs, accelerators) can be installed at the same time. OpenCL refers to those
as ‘platforms’. The GPU library will try to auto-select the most suitable platform,
but this can be overridden using the platform option of the <a class="reference internal" href="package.html"><span class="doc">package</span></a>
command. Run <code class="docutils literal notranslate"><span class="pre">lammps/lib/gpu/ocl_get_devices</span></code> to get a list of available
platforms and devices with a suitable ICD available.</p>
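<p>A short sketch, assuming the GPU library was built in OpenCL mode; the platform
keyword and its value syntax are documented with the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command, and
the platform id shown here is only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# list the OpenCL platforms and devices for which an ICD is installed
cd lammps/lib/gpu
./ocl_get_devices

# override platform auto-selection, e.g. to pick platform 1 (illustrative value)
mpirun -np 4 lmp_machine -sf gpu -pk gpu 1 platform 1 -in in.script
</pre></div>
</div>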
<p>To compile and use this package for Intel GPUs, OpenCL or the Intel oneAPI
HPC Toolkit can be installed using Linux package managers. The latter also
provides optimized C++, MPI, and many other libraries and tools. See:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html">https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit/download.html</a></p></li>
</ul>
<p>If you do not have a discrete GPU card installed, this package can still provide
significant speedups on some CPUs that include integrated GPUs. Additionally, for
many Macs, OpenCL is already included with the OS and Makefiles are available
in the <code class="docutils literal notranslate"><span class="pre">lib/gpu</span></code> directory.</p>
<p>To compile and use this package in HIP mode, you have to have the AMD ROCm
software installed. Versions of ROCm older than 3.5 are currently deprecated
by AMD.</p>
</section>
<section id="building-lammps-with-the-gpu-package">
<h2>Building LAMMPS with the GPU package<a class="headerlink" href="#building-lammps-with-the-gpu-package" title="Link to this heading"></a></h2>
<p>See the <a class="reference internal" href="Build_extras.html#gpu"><span class="std std-ref">Build extras</span></a> page for
instructions.</p>
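<p>For orientation, a possible CMake configuration is sketched below. The option
names (<code class="docutils literal notranslate"><span class="pre">PKG_GPU</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_API</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_PREC</span></code>, <code class="docutils literal notranslate"><span class="pre">GPU_ARCH</span></code>) are described on the
<a class="reference internal" href="Build_extras.html#gpu"><span class="std std-ref">Build extras</span></a> page; the architecture value shown is only an example and
must match your hardware:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# sketch: CUDA build of the GPU package in mixed precision
cd lammps
mkdir build
cd build
cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_80 ../cmake
cmake --build .
</pre></div>
</div>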
</section>
<section id="run-with-the-gpu-package-from-the-command-line">
<h2>Run with the GPU package from the command-line<a class="headerlink" href="#run-with-the-gpu-package-from-the-command-line" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> or <code class="docutils literal notranslate"><span class="pre">mpiexec</span></code> command sets the total number of MPI tasks
used by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command in MPICH does this via
its <code class="docutils literal notranslate"><span class="pre">-np</span></code> and <code class="docutils literal notranslate"><span class="pre">-ppn</span></code> switches. Ditto for Open MPI via <code class="docutils literal notranslate"><span class="pre">-np</span></code> and
<code class="docutils literal notranslate"><span class="pre">-npernode</span></code>.</p>
<p>When using the GPU package, you cannot assign more than one GPU to a
single MPI task. However, multiple MPI tasks can share the same GPU,
and in many cases it will be more efficient to run this way. Likewise
it may be more efficient to use fewer MPI tasks/node than the available
number of CPU cores. Assignment of multiple MPI tasks to a GPU will happen
automatically if you create more MPI tasks/node than there are
GPUs/node. E.g. with 8 MPI tasks/node and 2 GPUs, each GPU will be
shared by 4 MPI tasks.</p>
<p>The GPU package also has limited support for OpenMP for both
multi-threading and vectorization of routines that are run on the CPUs.
This requires that the GPU library and LAMMPS are built with flags to
enable OpenMP support (e.g. <code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code>). Some styles for time integration
are also available in the GPU package. These run completely on the CPUs
in full double precision, but exploit multi-threading and vectorization
for faster performance.</p>
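<p>A sketch of combining MPI tasks and OpenMP threads, assuming an OpenMP-enabled
build; the standard <code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS</span></code> environment variable controls the number
of threads per MPI task:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# 4 MPI tasks per node with 4 OpenMP threads each, sharing 2 GPUs
export OMP_NUM_THREADS=4
mpirun -np 4 lmp_machine -sf gpu -pk gpu 2 -in in.script
</pre></div>
</div>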
<p>Use the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switch</span></a>, which will
automatically append “gpu” to styles that support it. Use the <code class="docutils literal notranslate"><span class="pre">-pk</span>
<span class="pre">gpu</span> <span class="pre">Ng</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switch</span></a> to set <code class="docutils literal notranslate"><span class="pre">Ng</span></code> = # of
GPUs/node to use. If <code class="docutils literal notranslate"><span class="pre">Ng</span></code> is 0, the number is selected automatically as
the number of matching GPUs that have the highest number of compute
cores.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1 MPI task uses 1 GPU</span>
lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">1</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script

<span class="c1"># 12 MPI tasks share 2 GPUs on a single 16-core (or whatever) node</span>
mpirun<span class="w"> </span>-np<span class="w"> </span><span class="m">12</span><span class="w"> </span>lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">2</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script

<span class="c1"># ditto on 4 16-core nodes</span>
mpirun<span class="w"> </span>-np<span class="w"> </span><span class="m">48</span><span class="w"> </span>-ppn<span class="w"> </span><span class="m">12</span><span class="w"> </span>lmp_machine<span class="w"> </span>-sf<span class="w"> </span>gpu<span class="w"> </span>-pk<span class="w"> </span>gpu<span class="w"> </span><span class="m">2</span><span class="w"> </span>-in<span class="w"> </span><span class="k">in</span>.script
</pre></div>
</div>
<p>Note that if the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> switch is used, it also issues a default
<a class="reference internal" href="package.html"><span class="doc">package gpu 0</span></a> command, which will result in
automatic selection of the number of GPUs to use.</p>
<p>Using the <code class="docutils literal notranslate"><span class="pre">-pk</span></code> switch explicitly allows setting the number of
GPUs/node to use and additional options. Its syntax is the same as
the <code class="docutils literal notranslate"><span class="pre">package</span> <span class="pre">gpu</span></code> command. See the <a class="reference internal" href="package.html"><span class="doc">package</span></a>
command page for details, including the default values used for
all its options if it is not specified.</p>
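<p>For example, extra tuning keywords can be passed through <code class="docutils literal notranslate"><span class="pre">-pk</span></code>. The
<code class="docutils literal notranslate"><span class="pre">neigh</span></code> and <code class="docutils literal notranslate"><span class="pre">split</span></code> keywords and their allowed values are documented with the
<a class="reference internal" href="package.html"><span class="doc">package</span></a> command; the values below are only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# 8 MPI tasks share 2 GPUs, neighbor lists are built on the GPU,
# and the pair-style work split between CPU and GPU is balanced dynamically
mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 neigh yes split -1.0 -in in.script
</pre></div>
</div>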
<p>Note that the default for the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command is to
set the Newton flag to “off” for pairwise interactions. It does not
affect the setting for bonded interactions (LAMMPS default is “on”).
The “off” setting for pairwise interactions is currently required for
GPU package pair styles.</p>
</section>
<section id="run-with-the-gpu-package-by-editing-an-input-script">
<h2>Run with the GPU package by editing an input script<a class="headerlink" href="#run-with-the-gpu-package-by-editing-an-input-script" title="Link to this heading"></a></h2>
<p>The discussion above about the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> or <code class="docutils literal notranslate"><span class="pre">mpiexec</span></code> command, MPI
tasks/node, and the use of multiple MPI tasks per GPU applies here as well.</p>
<p>Use the <a class="reference internal" href="suffix.html"><span class="doc">suffix gpu</span></a> command, or you can explicitly add a
“gpu” suffix to individual styles in your input script, e.g.</p>
<div class="highlight-LAMMPS notranslate"><div class="highlight"><pre><span></span><span class="k">pair_style</span><span class="w"> </span><span class="n">lj</span><span class="o">/</span><span class="n">cut</span><span class="o">/</span><span class="n">gpu</span><span class="w"> </span><span class="m">2.5</span>
</pre></div>
</div>
<p>You must also use the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command to enable the
GPU package, unless the <code class="docutils literal notranslate"><span class="pre">-sf</span> <span class="pre">gpu</span></code> or <code class="docutils literal notranslate"><span class="pre">-pk</span> <span class="pre">gpu</span></code> <a class="reference internal" href="Run_options.html"><span class="doc">command-line switches</span></a> were used. It specifies the number of
GPUs/node to use, as well as other options.</p>
</section>
<section id="speed-up-to-expect">
<h2>Speed-up to expect<a class="headerlink" href="#speed-up-to-expect" title="Link to this heading"></a></h2>
<p>The performance of a GPU versus a multicore CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed). Using the GPU package
in OpenCL mode on CPUs (which uses vectorization and multithreading)
usually results in inferior performance compared to using LAMMPS’ native
threading and vectorization support in the OPENMP and INTEL packages.</p>
<p>See the <a class="reference external" href="https://www.lammps.org/bench.html">Benchmark page</a> of the
LAMMPS website for performance of the GPU package on various
hardware, including the Titan HPC platform at ORNL.</p>
<p>You should also experiment with how many MPI tasks per GPU to use to
give the best performance for your problem and machine. This is also
a function of the problem size and the pair style being used.
Likewise, you should experiment with the precision setting for the GPU
library to see if single or mixed precision will give accurate
results, since they will typically be faster.</p>
<p>MPI parallelism typically outperforms OpenMP parallelism, but in some
cases using fewer MPI tasks and multiple OpenMP threads with the GPU
package can give better performance. Three-body potentials can often perform
better with multiple OpenMP threads, because these styles require more
inter-process communication in the GPU package in order to allow
deterministic results.</p>
</section>
<section id="guidelines-for-best-performance">
<h2>Guidelines for best performance<a class="headerlink" href="#guidelines-for-best-performance" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>Using multiple MPI tasks (2-10) per GPU will often give the best
performance, as allowed by most multicore CPU/GPU configurations.
Using too many MPI tasks will result in worse performance due to
growing overhead with the growing number of MPI tasks.</p></li>
<li><p>If the number of particles per MPI task is small (e.g. 100s of
particles), it can be more efficient to run with fewer MPI tasks per
GPU, even if you do not use all the cores on the compute node.</p></li>
<li><p>The <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command has several options for tuning
performance. Neighbor lists can be built on the GPU or CPU. Force
calculations can be dynamically balanced across the CPU cores and
GPUs. GPU-specific settings can be made which can be optimized
for different hardware. See the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command
page for details.</p></li>
<li><p>As described by the <a class="reference internal" href="package.html"><span class="doc">package gpu</span></a> command, GPU
accelerated pair styles can perform computations asynchronously with
CPU computations. The “Pair” time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with <a class="reference internal" href="bond_style.html"><span class="doc">bond</span></a>,
<a class="reference internal" href="angle_style.html"><span class="doc">angle</span></a>, <a class="reference internal" href="dihedral_style.html"><span class="doc">dihedral</span></a>,
<a class="reference internal" href="improper_style.html"><span class="doc">improper</span></a>, and <a class="reference internal" href="kspace_style.html"><span class="doc">long-range</span></a>
calculations will not be included in the “Pair” time.</p></li>
<li><p>Since only part of the pppm kspace style is GPU accelerated, it may be
faster to only use GPU acceleration for Pair styles with long-range
electrostatics. See the “pair/only” keyword of the <a class="reference internal" href="package.html"><span class="doc">package
command</span></a> for a shortcut to do that (see the sketch after this list). The distribution of
work between kspace on the CPU and non-bonded interactions on the GPU
can be balanced by adjusting the Coulomb cutoff without loss of
accuracy.</p></li>
<li><p>When the <em>mode</em> setting for the package gpu command is force/neigh,
the time for neighbor list calculations on the GPU will be added into
the “Pair” time, not the “Neigh” time. An additional breakdown of the
times required for various tasks on the GPU (data copy, neighbor
calculations, force computations, etc.) is output only with the LAMMPS
screen output (not in the log file) at the end of each run. These
timings represent total time spent on the GPU for each routine,
regardless of asynchronous CPU calculations.</p></li>
<li><p>The output section “GPU Time Info (average)” reports “Max Mem / Proc”.
This is the maximum memory used at one time on the GPU for data
storage by a single MPI process.</p></li>
</ul>
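<p>A command-line sketch of the “pair/only” shortcut mentioned above; the exact
keyword values accepted are documented with the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command, so the
setting shown here is only illustrative:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# accelerate only the pair style on the GPU and keep kspace (PPPM) on the CPU
mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 pair/only on -in in.script
</pre></div>
</div>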
</section>
<section id="restrictions">
<h2>Restrictions<a class="headerlink" href="#restrictions" title="Link to this heading"></a></h2>
<p>When using <a class="reference internal" href="pair_hybrid.html"><span class="doc">hybrid pair styles</span></a>, the neighbor list
must be generated on the host instead of the GPU and thus the potential
GPU acceleration is reduced.</p>
</section>
</section>
<footer>
<p>© Copyright 2003-2025 Sandia Corporation.</p>
</footer>
</body>
</html>