<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>4.4.5. OpenMP Parallelism — LAMMPS documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/sphinx-design.min.css" type="text/css" />
<link rel="stylesheet" href="_static/css/lammps.css" type="text/css" />
<link rel="shortcut icon" href="_static/lammps.ico"/>
<link rel="canonical" href="https://docs.lammps.org/Developer_par_openmp.html" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5929fcd5"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/design-tabs.js?v=f930bc37"></script>
<script async="async" src="_static/mathjax/es5/tex-mml-chtml.js?v=cadf963e"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="4.5. Accessing per-atom data" href="Developer_atom.html" />
<link rel="prev" title="4.4.4. Long-range interactions" href="Developer_par_long.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="Manual.html">
<img src="_static/lammps-logo.png" class="logo" alt="Logo"/>
</a>
<div class="lammps_version">Version: <b>19 Nov 2024</b></div>
<div class="lammps_release">git info: </div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="Manual.html">LAMMPS</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content style-external-links">
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<p><span class="math notranslate nohighlight">\(\renewcommand{\AA}{\text{Å}}\)</span></p>
<section id="openmp-parallelism">
<h1><span class="section-number">4.4.5. </span>OpenMP Parallelism<a class="headerlink" href="#openmp-parallelism" title="Link to this heading"></a></h1>
<p>The styles in the INTEL, KOKKOS, and OPENMP packages can use OpenMP
thread parallelism, predominantly to distribute loops over local data
across threads. This strategy is orthogonal to the
decomposition into spatial domains used by the <a class="reference internal" href="Developer_par_part.html"><span class="doc">MPI partitioning</span></a>. For clarity, this section discusses only the
implementation in the OPENMP package, as it is the simplest. The INTEL
and KOKKOS packages offer additional options and are more complex because
they support more features and different hardware such as co-processors
or GPUs.</p>
<p>One of the key decisions when implementing the OPENMP package was to
keep the changes to the source code small, so that the code is easier to
maintain and to keep in sync with the non-threaded standard
implementation. This is achieved by a) making the OPENMP version a
class derived from the regular version (e.g. <code class="docutils literal notranslate"><span class="pre">PairLJCutOMP</span></code> from
<code class="docutils literal notranslate"><span class="pre">PairLJCut</span></code>) and overriding only those methods that are multi-threaded or
need to be modified to support multi-threading (similar to what was done
in the OPT package), b) keeping the structure of the modified code very
similar to the original so that side-by-side comparisons remain useful, and c)
moving additional functionality and multi-thread support functions
into three separate classes: <code class="docutils literal notranslate"><span class="pre">ThrOMP</span></code>, <code class="docutils literal notranslate"><span class="pre">ThrData</span></code>, and <code class="docutils literal notranslate"><span class="pre">FixOMP</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">ThrOMP</span></code> provides additional, multi-thread aware functionality that is not
available in the corresponding base class (e.g. <code class="docutils literal notranslate"><span class="pre">Pair</span></code> for
<code class="docutils literal notranslate"><span class="pre">PairLJCutOMP</span></code>), such as multi-thread aware variants of the “tally”
functions. These functions are made available through multiple
inheritance, so they must have unique names to avoid
ambiguities; typically <code class="docutils literal notranslate"><span class="pre">_thr</span></code> is appended to the name of the function.
<code class="docutils literal notranslate"><span class="pre">ThrData</span></code> is a class that manages per-thread data structures. It is
used instead of extending the corresponding storage to per-thread arrays
in order to avoid slowdowns due to “false sharing”, which occurs when multiple threads update
adjacent elements of an array and thus force the affected CPU cache lines to be
invalidated and re-fetched. Finally, <code class="docutils literal notranslate"><span class="pre">FixOMP</span></code> manages the “multi-thread
state”, such as settings and access to per-thread storage; it is activated
by the <a class="reference internal" href="package.html"><span class="doc">package omp</span></a> command.</p>
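<p>The class layout described above can be sketched in a few lines. This is a minimal illustration, not the LAMMPS source: the names <code class="docutils literal notranslate"><span class="pre">PairBase</span></code>, <code class="docutils literal notranslate"><span class="pre">PairBaseOMP</span></code>, <code class="docutils literal notranslate"><span class="pre">ThrSupport</span></code>, and <code class="docutils literal notranslate"><span class="pre">ThrAccum</span></code> are hypothetical stand-ins, the 64-byte cache line size is an assumption, and the "interaction" is reduced to summing a list of energies.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the LAMMPS classes.

// Per-thread accumulator padded to an assumed 64-byte cache line, so that the
// accumulators of two threads never end up in the same line (no false sharing).
struct ThrAccum {
  double eng = 0.0;
  char pad[64 - sizeof(double)];
};

// Plays the role of the regular, non-threaded style.
class PairBase {
 public:
  virtual ~PairBase() = default;
  virtual void compute(const std::vector<double> &e) {
    eng_total = 0.0;
    for (double v : e) tally(v);
  }
  double eng_total = 0.0;

 protected:
  void tally(double v) { eng_total += v; }
};

// Plays the role of the thread support class. Its helper carries a _thr
// suffix so that it cannot clash with PairBase::tally under multiple inheritance.
class ThrSupport {
 protected:
  static void tally_thr(ThrAccum &acc, double v) { acc.eng += v; }
};

// Threaded variant: derives from the regular style and overrides only compute().
class PairBaseOMP : public PairBase, public ThrSupport {
 public:
  explicit PairBaseOMP(std::size_t nthreads) : accum_(nthreads) { assert(nthreads > 0); }

  void compute(const std::vector<double> &e) override {
    const int nthreads = static_cast<int>(accum_.size());
    const long n = static_cast<long>(e.size());
    for (auto &a : accum_) a.eng = 0.0;

    // One loop iteration per chunk; iteration t touches only accum_[t].
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
#pragma omp parallel for schedule(static) num_threads(nthreads)
    for (int t = 0; t < nthreads; ++t) {
      const long lo = n * t / nthreads;
      const long hi = n * (t + 1) / nthreads;
      for (long i = lo; i < hi; ++i) tally_thr(accum_[t], e[i]);
    }

    // Reduce the per-thread results into the member used by the regular style.
    eng_total = 0.0;
    for (const auto &a : accum_) eng_total += a.eng;
  }

 private:
  std::vector<ThrAccum> accum_;
};
```

<p>Because the threaded class only overrides <code class="docutils literal notranslate"><span class="pre">compute()</span></code>, code that holds a pointer to the base class can use either variant unchanged, which is the property the design above aims for.</p>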
<section id="avoiding-data-races">
<h2>Avoiding data races<a class="headerlink" href="#avoiding-data-races" title="Link to this heading"></a></h2>
<p>A key problem when implementing thread parallelism in an MD code is
avoiding data races when updating accumulated properties like forces,
energies, and stresses. Interactions always
involve multiple atoms, so race conditions arise when multiple
threads want to update per-atom data of the same atoms. Five possible
strategies have been considered to avoid this:</p>
<ol class="arabic simple">
<li><p>Restructure the code so that no overlapping access is possible
when computing in parallel, e.g. by breaking lists into multiple
parts and synchronizing threads in between.</p></li>
<li><p>Have each thread be “responsible” for a specific group of atoms and
compute each interaction multiple times, once on every thread that
is responsible for one of the atoms involved, and then have each thread update only
the properties of its own atoms.</p></li>
<li><p>Use mutexes around functions and regions of code where a data race
could happen.</p></li>
<li><p>Use atomic operations when updating per-atom properties.</p></li>
<li><p>Use replicated per-thread data structures to accumulate data without
conflicts and then use a reduction to combine those results into the
data structures used by the regular style.</p></li>
</ol>
<p>Option 5 was chosen for the OPENMP package because it retains the
performance for the case of a single thread and keeps the code more
maintainable. Option 1 would require extensive code changes,
particularly to the neighbor list code; option 2 would have incurred a
performance penalty of 2x or more for the serial case; option 3 causes
significant overhead and would enforce serialization of operations in
inner loops and thus defeat the purpose of multi-threading; option 4
slows down the serial case, although not quite as badly as option 2. The
downside of option 5 is that the overhead of the reduction operations
grows with the number of threads used, so there is a crossover
point beyond which options 2 or 4 would execute faster. That is
why, for example, option 2 is used in the GPU package: a GPU is a
processor with a massive number of threads. However, since the MPI
parallelization is generally more effective for typical MD systems, the
expectation is that thread parallelism is used only with a small number
of threads (2 to 8). At the time of its implementation, that number was
equivalent to the number of CPU cores per CPU socket on high-end
supercomputers.</p>
<p>Thus, when OpenMP support is enabled, arrays like the force array are
dimensioned to the number of atoms times the number of threads, and inside the
compute functions each thread obtains a pointer to a different chunk.
Similarly, accumulators like the potential energy or the virial are kept in
per-thread instances of the <code class="docutils literal notranslate"><span class="pre">ThrData</span></code> class and are reduced and
stored in their global counterparts only at the end of the force computation.</p>
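<p>The replicated-array strategy (option 5) can be illustrated with a small, self-contained sketch. It is not taken from the OPENMP package: the function name <code class="docutils literal notranslate"><span class="pre">pair_forces_replicated</span></code> and its arguments are hypothetical, forces are reduced to one scalar per atom, and the pair list is split by pair index rather than by atom to keep the example short.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Accumulate pair forces into per-thread copies of the force array and then
// reduce the copies. All names are hypothetical; this is not LAMMPS code.
std::vector<double> pair_forces_replicated(std::size_t natoms,
                                           const std::vector<std::pair<int, int>> &pairs,
                                           const std::vector<double> &fpair,
                                           int nthreads)
{
  assert(nthreads > 0 && pairs.size() == fpair.size());
  const std::size_t nthr = static_cast<std::size_t>(nthreads);

  // Force storage dimensioned to (number of atoms) x (number of threads).
  std::vector<double> fthr(natoms * nthr, 0.0);
  const long npairs = static_cast<long>(pairs.size());

#pragma omp parallel for schedule(static) num_threads(nthreads)
  for (int t = 0; t < nthreads; ++t) {
    // Each chunk of pairs writes only into its own copy of the force array,
    // so updating both atoms of a pair cannot race with another thread.
    double *f = fthr.data() + static_cast<std::size_t>(t) * natoms;
    const long lo = npairs * t / nthreads;
    const long hi = npairs * (t + 1) / nthreads;
    for (long p = lo; p < hi; ++p) {
      f[pairs[p].first] += fpair[p];
      f[pairs[p].second] -= fpair[p];   // equal and opposite force on the partner
    }
  }

  // Reduction: sum the per-thread copies. The cost of this step grows with
  // the number of threads, which is the downside discussed above.
  std::vector<double> f(natoms, 0.0);
  const long n = static_cast<long>(natoms);
#pragma omp parallel for schedule(static) num_threads(nthreads)
  for (long i = 0; i < n; ++i) {
    double sum = 0.0;
    for (std::size_t t = 0; t < nthr; ++t) sum += fthr[t * natoms + static_cast<std::size_t>(i)];
    f[static_cast<std::size_t>(i)] = sum;
  }
  return f;
}
```

<p>With a single thread the reduction loop adds only one copy per atom, which is why this approach keeps the single-thread cost close to that of the regular style.</p>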
</section>
<section id="loop-scheduling">
<h2>Loop scheduling<a class="headerlink" href="#loop-scheduling" title="Link to this heading"></a></h2>
<p>Multi-thread parallelization is applied by distributing (outer) loops
statically across threads. Typically, this is the loop over local
atoms <em>i</em> when processing <em>i,j</em> pairs of atoms from a neighbor list.
For homogeneous systems, the design of the neighbor list code results in
atoms having a similar number of neighbors, so load imbalances
across threads are uncommon. They typically occur in systems where
the MPI parallelization would be unbalanced as well, and that imbalance
usually has a more pronounced impact on performance. The same loop
scheduling scheme can also be applied to the reduction operations on
per-atom data in order to reduce the overhead of the reduction.</p>
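<p>A static distribution of the outer loop can be sketched as follows. The helper <code class="docutils literal notranslate"><span class="pre">static_chunk</span></code> and the function <code class="docutils literal notranslate"><span class="pre">count_pairs</span></code> are hypothetical names introduced for this illustration, and the chunk size formula is one possible choice rather than the one used in the OPENMP package.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [from, to) of loop indices handled by
// thread `tid` when `inum` iterations are split statically over `nthreads`.
std::pair<int, int> static_chunk(int tid, int inum, int nthreads)
{
  assert(nthreads > 0 && tid >= 0 && tid < nthreads && inum >= 0);
  const int idelta = (inum + nthreads - 1) / nthreads;   // chunk size, rounded up
  const int from = std::min(tid * idelta, inum);
  const int to = std::min(from + idelta, inum);
  return {from, to};
}

// Sum the neighbor counts of all local atoms, one contiguous chunk per thread.
long count_pairs(const std::vector<int> &numneigh, int nthreads)
{
  assert(nthreads > 0);
  std::vector<long> partial(static_cast<std::size_t>(nthreads), 0L);
  const int inum = static_cast<int>(numneigh.size());

#pragma omp parallel for schedule(static) num_threads(nthreads)
  for (int tid = 0; tid < nthreads; ++tid) {
    const std::pair<int, int> range = static_chunk(tid, inum, nthreads);
    long local = 0;   // accumulate locally, write the shared array only once
    for (int i = range.first; i < range.second; ++i) local += numneigh[i];
    partial[tid] = local;
  }

  long total = 0;
  for (long p : partial) total += p;
  return total;
}
```

<p>If every atom has a similar number of neighbors, contiguous chunks of equal length carry similar amounts of work, which is why a static split is usually sufficient for homogeneous systems.</p>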
</section>
<section id="neighbor-list-parallelization">
<h2>Neighbor list parallelization<a class="headerlink" href="#neighbor-list-parallelization" title="Link to this heading"></a></h2>
<p>In addition to the force computations, the
generation of the neighbor lists is also parallelized. As explained
previously, neighbor lists are built by looping over “owned” atoms and
storing the neighbors in “pages”. In the OPENMP variants of the
neighbor list code, each thread operates on a different chunk of “owned”
atoms and allocates and fills its own set of pages with neighbor list
data. This is achieved by each thread keeping its own instance of the
<a class="reference internal" href="Developer_utils.html#_CPPv4N9LAMMPS_NS6MyPageE" title="LAMMPS_NS::MyPage"><code class="xref cpp cpp-class docutils literal notranslate"><span class="pre">MyPage</span></code></a> page allocator class.</p>
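<p>The idea of one page allocator per thread can be illustrated with a strongly simplified sketch. The class <code class="docutils literal notranslate"><span class="pre">PagePool</span></code> below is a hypothetical stand-in that does not reproduce the <code class="docutils literal notranslate"><span class="pre">MyPage</span></code> interface; the one-dimensional coordinates, the brute-force neighbor search, and the names <code class="docutils literal notranslate"><span class="pre">NeighList</span></code> and <code class="docutils literal notranslate"><span class="pre">build_full_list</span></code> are likewise assumptions made to keep the example short.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, much simplified page allocator. It only mimics the idea of
// handing out chunks from fixed-size pages and is NOT the LAMMPS MyPage API.
class PagePool {
 public:
  explicit PagePool(std::size_t pagesize) : pagesize_(pagesize) { assert(pagesize > 0); }

  // Return storage for n ints (n <= pagesize), starting a new page if needed.
  int *get(std::size_t n) {
    assert(n <= pagesize_);
    if (pages_.empty() || used_ + n > pagesize_) {
      pages_.push_back(std::unique_ptr<int[]>(new int[pagesize_]));
      used_ = 0;
    }
    int *ptr = pages_.back().get() + used_;
    used_ += n;
    return ptr;
  }
  std::size_t npages() const { return pages_.size(); }

 private:
  std::size_t pagesize_;
  std::size_t used_ = 0;
  std::vector<std::unique_ptr<int[]>> pages_;
};

struct NeighList {
  std::vector<int *> firstneigh;   // start of the neighbor indices of atom i
  std::vector<int> numneigh;       // number of neighbors of atom i
  std::vector<PagePool> pools;     // one allocator per thread, owns the storage
};

// Build a full neighbor list for 1-d coordinates x by brute force. Each chunk
// of atoms is processed by one thread that uses only its own PagePool.
NeighList build_full_list(const std::vector<double> &x, double cutoff,
                          int nthreads, std::size_t pagesize)
{
  assert(nthreads > 0);
  const int natoms = static_cast<int>(x.size());
  NeighList list;
  list.firstneigh.assign(x.size(), nullptr);
  list.numneigh.assign(x.size(), 0);
  list.pools.reserve(static_cast<std::size_t>(nthreads));
  for (int t = 0; t < nthreads; ++t) list.pools.emplace_back(pagesize);

#pragma omp parallel for schedule(static) num_threads(nthreads)
  for (int t = 0; t < nthreads; ++t) {
    PagePool &pool = list.pools[t];
    const int lo = static_cast<int>(static_cast<long>(natoms) * t / nthreads);
    const int hi = static_cast<int>(static_cast<long>(natoms) * (t + 1) / nthreads);
    for (int i = lo; i < hi; ++i) {
      int n = 0;   // first pass: count the neighbors of atom i
      for (int j = 0; j < natoms; ++j)
        if (j != i && std::fabs(x[i] - x[j]) < cutoff) ++n;
      int *neigh = pool.get(static_cast<std::size_t>(n));
      int k = 0;   // second pass: store their indices in this thread's page
      for (int j = 0; j < natoms; ++j)
        if (j != i && std::fabs(x[i] - x[j]) < cutoff) neigh[k++] = j;
      list.firstneigh[i] = neigh;
      list.numneigh[i] = n;
    }
  }
  return list;
}
```

<p>Since no two threads ever request memory from the same pool, the allocator itself needs no locking; the page size must be at least as large as the largest number of neighbors of any single atom.</p>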
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="Developer_par_long.html" class="btn btn-neutral float-left" title="4.4.4. Long-range interactions" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="Developer_atom.html" class="btn btn-neutral float-right" title="4.5. Accessing per-atom data" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2003-2025 Sandia Corporation.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(false);
});
</script>
</body>
</html>