This should print a warning when 2x the bonded interaction cutoff is larger than the other cutoffs, as was the behavior before the performance optimization introduced in change 2690075405
this now covers a large set of cases where the variable name can be printed.
it also is complete for the current code, since no more default arguments than are required
are provided.
New directory: tools/doxygen
New file: tools/doxygen/Developer.dox.lammps
New file: tools/doxygen/Doxyfile.lammps
New file: tools/doxygen/doxygen.sh
New file: tools/doxygen/README
The Developer.dox.lammps file contains a slightly revised version of the
Developer.pdf file adapted to the LAMMPS "doxygen" documentation.
The Doxyfile.lammps file is a first proposal for a LAMMPS "doxygen"
documentation flavor and can be adjusted to specific requirements.
The "doxygen.sh" shell script generates the LAMMPS "doxygen"
documentation.
Detailed instructions can be found in the README file.
This reverts commit 4a3a6b4455.
As it turns out, when using the LAMMPS python wrapper from inside
code using the PYTHON package, the library symbols *are* needed.
Thanks to Richard Berger (@rbberger) for pointing this out.
NEB was not working correctly when using multiple procs
per replica and the keywords last/efirst or last/efirst/middle.
I have corrected this in the enclosed fix_neb.cpp.
I also slightly modified the nudging for this free end so that
it is applied only when the target energy is larger than
the replica energy. If the target energy is lower than the replica energy,
the replica should relax toward the target energy without adding
any nudging.
I also modified the documentation according to this change.
the sphinxcontrib.image extension was broken with sphinx 1.6.x.
however, sphinx 1.5.x breaks with newer versions of the multiprocessing module.
so we suspend the thumbnail processing and lift the version lock to sphinx 1.5.x.
also, the number of parallel sphinx tasks can be overridden with SPHINXEXTRA="-j #".
the default is to try to use all local CPU cores.
This includes an example of how to implement fix NVE in Python.
The library interface was extended to provide direct access to atom data using
numpy arrays. No data copies are made and numpy operations directly manipulate
memory of the native code.
To keep this numpy dependency optional, all functions are wrapped into the
lammps.numpy sub-object which is only loaded when accessed.
This was accomplished with several key changes:
1) Modified fix_shardlow's control flow to match fix_shardlow_kokkos so
that random numbers are pulled from the RNGs in exactly the same order.
2) Created random_external_state.h, a simplified version of the Kokkos
random number generator that keeps its state variables external to itself.
Thus it can be used both with and without Kokkos enabled, as long as the
caller stores and passes in the required state variable (see the sketch below).
3) Replaced all references to random_mars.h and Kokkos_Random.hpp code in
the fix_shardlow* files with calls to the random_external_state.h code,
guaranteeing that fix_shardlow* is using an identical RNG in all cases.
Result: most (56 of 61) of our internal tests now generate the same results
with kokkos turned on or off. Four cases still differ due to what appear
to be vectorization caused rounding differences, and the fifth case
appears to be something triggered by the kokkos "atom_style hybrid" code.
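As a rough illustration of the caller-owned-state idea (the type and function names below are made up for this sketch and are not the actual random_external_state.h API), an xorshift64*-style generator can keep all of its state in a single 64-bit word that the caller stores and passes in:

    #include <cstdint>

    typedef uint64_t es_state_t;   // the caller stores this state variable

    inline void es_init(es_state_t &state, uint64_t seed)
    {
      state = seed ? seed : 0x9E3779B97F4A7C15ULL;   // avoid the all-zero state
    }

    // xorshift64* step; identical sequence with or without Kokkos, since the
    // state lives entirely in the caller-provided variable.
    inline double es_uniform(es_state_t &state)
    {
      state ^= state >> 12;
      state ^= state << 25;
      state ^= state >> 27;
      return 5.421010862427522e-20 * double(state * 2685821657736338717ULL);
    }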
Propagate the efaa4c67 changes to npair_ssa_kokkos from npair_kokkos that
support the new neigh_modify exclude molecule/intra and /inter options.
Note: npair_ssa_kokkos could inherit from npair_kokkos to avoid this kind
of missed change. Unfortunately, inheritance from templated classes is
both tricky and messy, and not worth the complexity in this case, IMHO.
Notable features are the umbrella-integration based free energy estimator for
eABF, and the traditional thermodynamic integration estimator now available
for umbrella sampling, SMD, metadynamics. Also included are several small fixes.
Below is a list of relevant commits in the Colvars repository since the last update.
321d06a 2017-10-10 Add macros to manage colvarscript commands [Giacomo Fiorin]
26c3bec 2017-10-09 Document coming availability of Lepton in LAMMPS [Giacomo Fiorin]
cc8f249 2017-10-04 Clarify that SMP depends on code build [Giacomo Fiorin]
0b2ffac 2017-10-04 Summarize colvar definition options, clarify some details [Giacomo Fiorin]
28002e0 2017-10-01 Separate writing of restart file from other output (e.g. PMFs) [Giacomo Fiorin]
92f7c1d 2017-10-01 Deprecate colvarsTrajAppend [Giacomo Fiorin]
12a707f 2017-09-26 Accurate Jacobian calculation for RMSD variants [Jérôme Hénin]
fe389c9 2017-09-21 Allow subtractAppliedForce with extended-L again [Jérôme Hénin]
c050ce0 2017-09-18 Silence compiler warnings, remove Tabs [Giacomo Fiorin]
cb41905 2017-01-11 Add base class for TI estimator in other biases than ABF [Giacomo Fiorin]
a1bc676 2017-09-14 Avoid writing to unopened traj file [Jérôme Hénin]
b58d8cd 2017-09-08 Function to check for overlapping groups [Jérôme Hénin]
1e5efec 2017-09-07 Check for overlapping groups in coordNum [Jérôme Hénin]
03a61a4 2017-04-06 Add UI-based estimator [fhh2626]
ae43754 2017-08-17 Fix outputCenters parsing [Josh Vermaas]
1619e0e 2017-08-14 Delete static feature arrays in cvm destructor [Jérôme Hénin]
- re-indent to 2 blanks
- white space cleanup
- use force->numeric() and force->inumeric() instead of atof() and atoi()
- include system headers before local LAMMPS headers
- move example folder to examples/USER/misc/
- comment out writing of trajectory files
- reduce run length (for easier testing for regressions)
- record example outputs for 1 and 4 MPI processes
- rename readme.md to README.md for visibility
Adding raw performance numbers for Skylake xeon server.
Fixes for using older Intel compilers and compiling without OpenMP.
Fix adding in hooks for using USER-INTEL w/ minimization.
- include the used tricubic functions directly as static functions
- silence compiler warnings
- define f2c.h imported data types directly or use C equivalents
- since the direct LAPACK API was called and not cLAPACK, declare LAPACK interface and depend only on LAPACK
- add proper dependencies
- disable automatic minor version number generation. step version manually.
- comment out optional spglib functionality by default
with this change, the USER-INTEL package can be installed and
compiled without having to alter makefiles for adding -lpthread.
All "intel optimized" makefiles have been updated to have the
LRT feature enabled. This change will allow us to include the
USER-INTEL package in several automated testing configurations
and thus allows us to detect incompatibilities and compilation issues faster.
The Constant Energy DPD (DPDE) was our primary usage case, so only stubs
for the Constant Temperature case were included in Kokkos code so far.
The non-Kokkos version works fine for Constant Temperature DPD.
New function that allows for parallel tempering (replica exchange) in MD in LAMMPS in the isothermal-isobaric ensemble (NPT)
Similar to temper which works in the canonical (NVT) ensemble.
An example is included that uses temper_npt
Merge changes thru July 27, 2017 from master 6d0a2286 into USER-DPD_kokkos
Includes 67a0183b which partially reverted 7f9a331c (from May 16, 2017) in USER-DPD,
since SSA neighbor lists use ghost info, so they can't currently be used as "occasional" lists.
The default compiler flags in voro++'s config.mk file do not include
-fPIC, which makes it incompatible with building the shared object for
the python wrapper.
- building into a local directory to replace the existing installation is now the default
- add wrapper function that calls curl in case the python package has no SSL support
- have to specify -n flag to avoid wiping out the existing installation
- can specify -p to point to an existing kim-api installation (implies -n)
This example showcases the use of different 'special_bonds' settings for
different pair styles, so quip gets all the bonded neighbours but lj can
exclude them if it needs to.
The results have been checked against a pure quip implementation of the
potential; the expected lammps output is included.
DISCLAIMER: This example mixes parameters for methane and silane and is
NOT intended to be a realistic representation of either system.
this check currently only applies to rigid fixes and is needed
so that their respective enforce2d function is called _after_
the post force functions. this is required in combination with
commit a9ff593763 to allow rigid
fixes to use the langevin option correctly for 2d systems
In #514 it has been raised that the switching function that
ensures a smooth transition to the cutoff is only correct if
cutlj = 3.0. This patch gives users an opportunity to configure
the switching function together with the cutoff by specifying
the start of the transition region. Behaviour in the default case
remains unchanged.
This allows users to specify larger cutoffs than 3 (which used to
have no effect) and get correct cutoff behaviour for values less
than 3.
This change replaces the bondorderLJ() function with code provided
by Github user CF17, which is based on the bondorder() code.
It could be fixed with a shorter patch [1], but layering fix upon
fix seems to be unwise in this case.
While the code at this point departs from following the Fortran
code closely, the reason is that the bug is present in the Fortran
code as well.
Instead, the new code follows closely the bondorder() code that
already exists, which should be easier to maintain in the future.
This patch makes the two functions consistent with each other,
and makes outside contributions easier.
Since it uses a different approach to compute its value, some
explanation of that reasoning has been added on top.
1: e8c5c662b2
- use static/const
- return instead of ptr-parameter, &ref if more than one return
- replace macros from header with inline functions
- remove useless/old comments
rather than placing an if statement around every instance of calling atom->check_mass() to ensure it is only called when per-atom masses are not set, we place that check _inside_ Atom::check_mass(). This avoids unexpected error messages.
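A minimal sketch of that relocated guard (the struct, member names, and error text below are assumptions for illustration, not copied from the LAMMPS source): the early return inside check_mass() makes the call a no-op whenever per-atom masses are in use, so callers no longer wrap the call site.

    #include <cstdio>
    #include <cstdlib>

    struct AtomState {
      int rmass_flag;      // nonzero when per-atom masses are defined
      int ntypes;          // number of atom types
      int *mass_setflag;   // 1-indexed flags: was a mass set for this type?
    };

    void check_mass(const AtomState &atom)
    {
      if (atom.rmass_flag) return;   // per-atom masses in use: nothing to check
      for (int itype = 1; itype <= atom.ntypes; itype++)
        if (atom.mass_setflag[itype] == 0) {
          fprintf(stderr, "All per-type masses must be set\n");
          exit(1);
        }
    }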
- fm_exp moved to math_special (exp2 was already there)
- use std::min/max template instead of macros
- use memory->create for dynamic arrays (still 1-indexed with macro)
- remove _ from function names, adjust method visibility
- move MEAM class into LAMMPS_NS namespace
- move inclusion of meam.h header to pair_meamc.cpp to reduce namespace pollution
- use forward declaration for MEAM class reference
- make that class reference a pointer and add a destructor
- replace MAX/MIN macros with versions compatible with older compilers
There were several clean_copy() calls in pair
styles *outside device code*.
They seem to have been left over from an abandoned
effort to copy the Kokkos neighbor list as
a member of the pair style, instead of copying
out the individual views needed.
These leftover clean_copy() calls were setting
pointers to NULL that had not been freed,
leading to large memory leaks.
I've removed the clean_copy() function entirely,
and replaced it with the copymode flag system used
in many other Kokkos objects.
The copymode flag is only set to one in
functors that hold copies of the neighbor list.
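A minimal sketch of that copymode pattern, using an illustrative class rather than the actual pair style code: the destructor becomes a no-op for shallow functor copies, so only the original object frees its allocations.

    class PairExampleKokkos {
     public:
      PairExampleKokkos() : copymode(0) {}

      ~PairExampleKokkos() {
        if (copymode) return;     // this is a functor copy: free nothing
        // ... deallocate data owned by the original object here ...
      }

      void compute(int n) {
        copymode = 1;             // copies made for the parallel kernel below
                                  // are shallow and must not trigger cleanup
        // Kokkos::parallel_for(n, *this);   // functor copy carries the views
        copymode = 0;
      }

     protected:
      int copymode;               // 1 only while functor copies may exist
    };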
- Moved the particle loop inside a replica of getMixingWeights, getMixingWeightsVect,
and refactored to improve vectorization.
- Added OMP SIMD and OMP threading directly inside that function but will replace with
kokkos parallel_for and parallel_reduce methods later.
Normally, the gzip process would be pinned to the same core as the
MPI rank 0 process, which makes the pipe stay in one core's cache,
but forces the two processes to fight for that core, slowing things down.
Note: "newton on" still required if using non-kokkos pair styles or fixes.
Non-kokkos pairs/fixes don't expect their half lists with newton off,
which happens if newton is turned off globally by kokkos via commandline.
Note2: Regardless, fix_shardlow* will still use half lists and newton on.
two sort functions with different
names but identical functionality.
making them the same function
until we decide to use a different
algorithm for atoms and ghosts
KOKKOS_LAMBDA doesn't quite work on CUDA,
you have to use LAMMPS_LAMBDA.
Also, if you do use LAMMPS_LAMBDA, you need
to run on the default device type,
i.e. no using lambdas to run on OpenMP
when LAMMPS has been compiled for CUDA.
Add support for lock-free and deterministic use of Random_XorShift*_Pool
by leaving state_idx selection and lock responsibility up to the
application. Done by an overload of get_state() that takes state_idx as
an argument which the application guarantees is concurrently unique
and within the range of num_states that the application passed to init().
In other words, this allows the RNG state to be associated with some
application specific index, rather than a runtime arbitrary thread ID,
and thus the application can control which work is performed using
which RNG in a deterministic manner, regardless of which thread
performs the work.
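A hedged sketch of the resulting usage (this follows the Kokkos random-pool API as described above; treat the indexed get_state() overload as the addition this change describes rather than a long-standing interface):

    #include <Kokkos_Random.hpp>

    using RandPool = Kokkos::Random_XorShift64_Pool<Kokkos::DefaultExecutionSpace>;

    KOKKOS_INLINE_FUNCTION
    double draw_for_work_item(const RandPool &rand_pool, const int state_idx)
    {
      // The application guarantees state_idx is concurrently unique and is
      // smaller than the num_states it passed to the pool's init().
      auto rand_gen = rand_pool.get_state(state_idx);  // no lock, deterministic
      double r = rand_gen.drand();                     // uniform in [0,1)
      rand_pool.free_state(rand_gen);                  // write the state back
      return r;
    }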
Random_XorShift*_Pool<Kokkos::Cuda>::free_state() has two purposes:
1) update the state value kept in the pool
2) unlock the state
For a CUDA host thread, ONLY skip step 2, not both.
SSA atom binning algorithm was adjusted to do as much work in
parallel while preserving deterministic behavior. The final
step is done serially to preserve deterministic behavior.
An alternative would be to sort the contents of the bins so
that they are always in the same order.
ssa_update_dpde() hangs on first use of rand_gen.normal()
Switching to not using a pointer to PairDPDfdtEnergyKokkos's rand_pool
had no noticeable effect.
Eliminates a special case version of a loop just for Subphase 0.
NOTE: pair evaluation order changes, causing numerical differences!
This changed the order that close neighbors of ghosts are processed.
NOTE: pair evaluation order changes, causing numerical differences!
Atom pair processing order is fully planned out in npair_half_bin_newton_ssa
Makes the SSA neighbor list structure very different. Do not use by others!
Each local atom appears in ilist, numneigh, and firstneigh four times instead of once.
Changes LAMMPS core code that had been previously changed for USER-DPD/SSA:
Removes ssaAIR[] from class Atom as it is now unused.
Removes ndxAIR_ssa[] from class NeighList as it is now unused.
Increases length of ilist[], numneigh[], and firstneigh[] if SSA flag set.
NOTE: pair evaluation order changes, causing numerical differences!
This enables processing neighbors in subphase groups that enforce
a geometrical separation of pairs, allowing greater parallelism
once fix_shardlow (SSA) is converted to Kokkos.
This removes the distinction between pure and impure locals.
Pure and impure locals messed up the directionality of half neighbor lists,
which, it turns out, is crucial to the approach for SSA with kokkos.
- Switched from using lambda functions to operator()'s with type tags
in FixRxKokkos. The lambdas were giving big problems in Cuda with
the memory objects. This required that all referenced views be members
of the FixRXKokkos class.
- Add copymode controls to solve_reactions() to avoid the destructor
freeing pointers carried forward from the copy constructor. Added
the same to FixRX since it's called, too.
- Updated the function prototypes to include the necessary KOKKOS
macros for __host__ and __device__ functions and inlined functions.
- Changed several View definitions to match the disjoint memory spaces
that only come up with Cuda builds.
- Finished porting all scratch arrays to using the StridedArrayType
template.
- Created a single, large Kokkos device array and using that for all
scratch data passed into the StridedArrayType objects.
- Created an Array class that provides stride access for operator[]
w/o needing Kokkos views (a rough sketch follows this list). This was designed to avoid the performance
issues encountered with Views and sub-views throughout the RHS and
ODE solver functions.
- Added the diagnostics performance analysis routine to FixRxKokkos
using Kokkos views.
TODO:
- Switch to using Kokkos data for the per-iteration scratch data.
How to allocate only enough for each work-unit and not all
iterations? Can the shared-memory scratch memory work for this,
even for large sizes?
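Illustrative sketch of such a strided wrapper class (the name and layout below are hypothetical, not the actual StridedArrayType from the source): operator[] maps a logical index onto a stride-separated slot in one large backing allocation, so per-work-unit scratch data can share a single device array without Kokkos views or sub-views.

    template <typename T>
    class StridedArraySketch {
     public:
      StridedArraySketch(T *base, int stride) : base_(base), stride_(stride) {}

      // element i of this work-unit's logical array
      T &operator[](int i) const { return base_[i * stride_]; }

     private:
      T *base_;     // first element belonging to this work-unit
      int stride_;  // distance (in elements) between consecutive logical entries
    };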
usage and calls to computeLocalTemperature.
- Created request for kokkos neighbor list for fix and switched to
that neighbor list datatype in computeLocalTemperature.
- Reconfigured pre_force and setup_pre_force to call a common
solve_reactions() method to avoid duplicate code.
TODO:
- Clean-up
- Provide per-problem scratch data within kokkos framework (instead
of C++ new/delete data).
- Added a kokkos version of setup_pre_force that only sets dvector
and then communicates that.
- Converted all for loops to parallel_for's in computeLocalTemperature()
and setup_pre_force.
- Added pack/unpack forward/reverse methods with Kokkos host views.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
Added kokkos dual-view datatypes used in computeLocalTemperature and
pre_force (e.g., dpdThetaLocal) but still using the original host
pointers for the pack/unpack operations.
TODO:
- The Kokkos neighbor list is not working. Need to request a Kokkos
neighbor list in ::init(). Then, replace objects like list->ilist[]
with k_list->d_ilist().
- Add another template parameter for HALFTHREAD and create (automatic)
atomic view of dpdThetaLocal and sumWeights.
- Add modify/sync comments and replace the host-only pointers in the
pack/unpack methods.
- Added templated computeLocalTemp<>() to FixRxKokkos but still
using the original host data pointers.
- Updated the copy-back to dvector operation to be the same with
RK4 and RKF45 per discussion with J. Larentzos.
TODO:
- Add kokkos data for computeLocalTemp and parallel_for loop.
- Updated the KOKKOS installer to include the fix_rx_kokkos.[cpp,h].
- Updated the USER-DPD version of fix_rx.[cpp,h] to sync with the Kokkos
version. Solves child->parent class dependencies.
- Added kokkos-managed parameter data for the kinetics equations.
- Removed dependencies in rhs() on atom and domain objects.
TODO:
1. Switch to using KOKKOS data for dvector.
2. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).
Initial port of USER-DPD/fix_rx.cpp to KOKKOS/fix_rx_kokkos.cpp.
Using parallel_reduce(...) but still using host-only data.
TODO:
1. Switch to KOKKOS datatypes for sparse-kinetics data; dense
is finished.
2. Switch to using KOKKOS data for dvector.
3. Remove dependencies in rhs(...) on atom. Store those consts
in UserData{} or as member constants.
4. Port ComputeLocalTemp(...) to Kokkos (needs pairing algorithm).
the main bug here is the use of a local
rho_i accumulator which later gets assigned
back to rho[i].
in parallel, atomic additions can happen to
rho[i] while the local accumulator is held;
those atomic additions are lost when
the accumulator is atomically assigned.
we instead initialize the accumulator to zero
and atomically add it back to rho[i].
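A minimal sketch of the fixed pattern (variable names illustrative): the local accumulator starts at zero and its partial sum is merged with rho[i] atomically, so concurrent atomic additions by other threads are not lost.

    #include <Kokkos_Core.hpp>

    KOKKOS_INLINE_FUNCTION
    void accumulate_density(double *rho, const int i,
                            const double *contrib, const int n)
    {
      double rho_i = 0.0;                  // NOT initialized from rho[i]
      for (int jj = 0; jj < n; jj++)
        rho_i += contrib[jj];              // thread-local partial sum
      Kokkos::atomic_add(&rho[i], rho_i);  // merge safely with other threads
    }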
one Kokkos kernel was not annotated consistently,
STACKPARAMS was essentially uninitialized and
confused with a local variable,
plus lots of variables were unused in some
of the Kokkos kernels.
During dynamic load balancing, the subdomains will not be uniform so the
bbox size test in USER-DPD/fix_shardlow.cpp may only be called by one rank.
Using error->one allows any rank to stop the simulation in this scenario.
Added rcut and bbox information to help in diagnostics.
Thank you for considering contributing to the LAMMPS software project.
The following is a set of guidelines as well as explanations of policies and workflows for contributing to the LAMMPS molecular dynamics software project. These guidelines focus on submitting issues or pull requests on the LAMMPS GitHub project.
Thus please also have a look at:
* [The Section on submitting new features for inclusion in LAMMPS of the Manual](http://lammps.sandia.gov/doc/Section_modify.html#mod-15)
* [The LAMMPS GitHub Tutorial in the Manual](http://lammps.sandia.gov/doc/tutorial_github.html)
## Table of Contents
[I don't want to read this whole thing, I just have a question!](#i-dont-want-to-read-this-whole-thing-i-just-have-a-question)
[How Can I Contribute?](#how-can-i-contribute)
* [Discussing How To Use LAMMPS](#discussing-how-to-use-lammps)
## I don't want to read this whole thing I just have a question!
> **Note:** Please do not file an issue to ask a general question about LAMMPS, its features, how to use specific commands, or how to perform simulations or analysis in LAMMPS. Instead post your question to the ['lammps-users' mailing list](http://lammps.sandia.gov/mail.html). You do not need to be subscribed to post to the list (but a mailing list subscription avoids having your post delayed until it is approved by a mailing list moderator). Most posts to the mailing list receive a response within less than 24 hours. Before posting to the mailing list, please read the [mailing list guidelines](http://lammps.sandia.gov/guidelines.html). Following those guidelines will help greatly to get a helpful response. Always mention which LAMMPS version you are using.
## How Can I Contribute?
There are several ways you can actively contribute to the LAMMPS project: you can discuss compiling and using LAMMPS, and solving LAMMPS related problems, with other LAMMPS users on the lammps-users mailing list; you can report bugs or suggest enhancements by creating issues on GitHub (or posting them to the lammps-users mailing list); and you can contribute by submitting pull requests on GitHub or e-mailing your code
to one of the [LAMMPS core developers](http://lammps.sandia.gov/authors.html). As you may see from the aforementioned developer page, the LAMMPS software package includes the efforts of a very large number of contributors beyond the principal authors and maintainers.
### Discussing How To Use LAMMPS
The LAMMPS mailing list is hosted at SourceForge. The mailing list began in 2005, and now includes tens of thousands of messages in thousands of threads. LAMMPS developers try to respond to posted questions in a timely manner, but there are no guarantees. Please consider that people live in different timezones and may not have time to answer e-mails outside of their work hours.
You can post to the list by sending your email to lammps-users at lists.sourceforge.net (no subscription required), but before posting, please read the [mailing list guidelines](http://lammps.sandia.gov/guidelines.html) to maximize your chances to receive a helpful response.
Anyone can browse/search previous questions/answers in the archives. You do not have to subscribe to the list to post questions, receive answers (to your questions), or browse/search the archives. You **do** need to subscribe to the list if you want emails for **all** the posts (as individual messages or in digest form), or to answer questions yourself. Feel free to sign up and help us out! Answering questions from fellow LAMMPS users is a great way to pay back the community for providing you a useful tool for free, and to pass on the advice you have received yourself to others. It improves your karma and helps you understand your own research better.
If you post a message and you are a subscriber, your message will appear immediately. If you are not a subscriber, your message will be moderated, which typically takes one business day. Either way, when someone replies, the reply will usually be sent to both your personal email address and the mailing list. When replying to people that responded to your post to the list, please always include the mailing list in your replies (i.e. use "Reply All" and **not** "Reply"). Responses will appear on the list in a few minutes, but it can take a few hours for postings and replies to show up in the SourceForge archive. Sending replies also to the mailing list is important, so that responses are archived and people with a similar issue can search for possible solutions in the mailing list archive.
### Reporting Bugs
While developers writing code for LAMMPS are careful to test their code, LAMMPS is such a large and complex piece of software that it is impossible to test for all combinations of features under all normal and not so normal circumstances. Thus bugs do happen, and if you suspect that you have encountered one, please try to document it and report it as an [Issue](https://github.com/lammps/lammps/issues) on the LAMMPS GitHub project web page. However, before reporting a bug, you need to check whether this is something that may have already been corrected. The [Latest Features and Bug Fixes in LAMMPS](http://lammps.sandia.gov/bug.html) web page lists all significant changes to LAMMPS over the years. It also tells you what the current latest development version of LAMMPS is, and you should test whether your issue still applies to that version.
When you click on the green "New Issue" button, you will be provided with a text field where you can enter your message. That text field will contain a template with several headlines and some descriptions. Keep the headlines that are relevant to your reported potential bug and replace the descriptions with the information as suggested by the descriptions.
You can also attach small text files (please add the file name extension `.txt` or it will be rejected), images, or small compressed text files (using gzip, do not use RAR or 7-ZIP or similar tools that are uncommon outside of Windows machines). In many cases, bugs are best illustrated by providing a small input deck (do **not** attach your entire production input, but remove everything that is not required to reproduce the issue, and scale down your system size so that the resulting calculation runs fast and can be run on a small desktop quickly).
To be able to submit an issue on GitHub, you have to register for an account (for GitHub in general). If you do not want to do that, or have other reservations against submitting an issue there, you can - as an alternative and in decreasing preference - either send an e-mail to the lammps-users mailing list, the original authors of the feature that you suspect to be affected, or one or more of the core LAMMPS developers.
### Suggesting Enhancements
The LAMMPS developers welcome suggestions for enhancements or new features. These should be submitted using the [GitHub Issue Tracker](https://github.com/lammps/lammps/issues) of the LAMMPS project. This is particularly recommended when you plan to implement the feature or enhancement yourself, as this allows coordination in case there are other similar or conflicting ongoing developments.
The LAMMPS developers will review your submission and consider implementing it. Whether this will actually happen depends on many factors: how difficult it would be, how much effort it would take, how many users would benefit from it, how well the individual developer would understand the underlying physics of the feature, and whether this is a feature that would fit into a software like LAMMPS, or would be better implemented as a separate tool. Because of these factors, it matters how well the suggested enhancement is formulated and the overall benefit is argued convincingly.
To be able to submit an issue on GitHub, you have to register for an account (for GitHub in general). If you do not want to do that, or have other reservations against submitting an issue there, you can - as an alternative - send an e-mail to the lammps-users mailing list.
### Contributing Code
We encourage users to submit new features or modifications for LAMMPS to the core developers so they can be added to the LAMMPS distribution. The preferred way to manage and coordinate this is by submitting a pull request at the LAMMPS project on GitHub. For any larger modifications or programming project, you are encouraged to contact the LAMMPS developers ahead of time, in order to discuss implementation strategies and coding guidelines, that will make it easier to integrate your contribution and result in less work for everybody involved. You are also encouraged to search through the list of open issues on GitHub and submit a new issue for a planned feature, so you would not duplicate the work of others (and possibly get scooped by them) or have your work duplicated by others.
How quickly your contribution will be integrated depends largely on how much effort it takes to integrate and test it, how many changes it requires to the core code base, and of how much interest it is to the larger LAMMPS community. Please see below for a checklist of typical requirements. Once you have prepared everything, see [this tutorial](http://lammps.sandia.gov/doc/tutorial_github.html)
for instructions on how to submit your changes or new files through a GitHub pull request.
Here is a checklist of steps you need to follow to submit a single file or user package for our consideration. Following these steps will save both you and us time. See existing files in packages in the source directory for examples. If you are uncertain, please ask on the lammps-users mailing list.
* All source files you provide must compile with the most current version of LAMMPS with multiple configurations. In particular you need to test compiling LAMMPS from scratch with `-DLAMMPS_BIGBIG` set in addition to the default `-DLAMMPS_SMALLBIG` setting. Your code will need to work correctly in serial and in parallel using MPI.
* For consistency with the rest of LAMMPS and especially, if you want your contribution(s) to be added to main LAMMPS code or one of its standard packages, it needs to be written in a style compatible with other LAMMPS source files. This means: 2-character indentation per level, no tabs, no lines over 80 characters. I/O is done via the C-style stdio library, class header files should not import any system headers outside <stdio.h>, STL containers should be avoided in headers, and forward declarations used where possible or needed. All added code should be placed into the LAMMPS_NS namespace or a sub-namespace; global or static variables should be avoided, as they conflict with the modular nature of LAMMPS and the C++ class structure. Header files must not import namespaces with using. This all is so the developers can more easily understand, integrate, and maintain your contribution and reduce conflicts with other parts of LAMMPS. This basically means that the code accesses data structures, performs its operations, and is formatted similar to other LAMMPS source files, including the use of the error class for error and warning messages.
* If you want your contribution to be added as a user-contributed feature, and it is a single file (actually a `<name>.cpp` and `<name>.h` file) it can be rapidly added to the USER-MISC directory. Include the one-line entry to add to the USER-MISC/README file in that directory, along with the 2 source files. You can do this multiple times if you wish to contribute several individual features.
* If you want your contribution to be added as a user-contribution and it is several related features, it is probably best to make it a user package directory with a name like USER-FOO. In addition to your new files, the directory should contain a README text file. The README should contain your name and contact information and a brief description of what your new package does. If your files depend on other LAMMPS style files also being installed (e.g. because your file is a derived class from the other LAMMPS class), then an Install.sh file is also needed to check for those dependencies. See other README and Install.sh files in other USER directories as examples. Send us a tarball of this USER-FOO directory.
* Your new source files need to have the LAMMPS copyright, GPL notice, and your name and email address at the top, like other user-contributed LAMMPS source files. They need to create a class that is inside the LAMMPS namespace. If the file is for one of the USER packages, including USER-MISC, then we are not as picky about the coding style (see above). I.e. the files do not need to be in the same stylistic format and syntax as other LAMMPS files, though that would be nice for developers as well as users who try to read your code.
* You **must** also create or extend a documentation file for each new command or style you are adding to LAMMPS. For simplicity and convenience, the documentation of groups of closely related commands or styles may be combined into a single file. This will be one file for a single-file feature. For a package, it might be several files. These are simple text files with a specific markup language that are then auto-converted to HTML and PDF. The tools for this conversion are included in the source distribution, and the translation can be as simple as doing "make html pdf" in the doc folder. Thus the documentation source files must be in the same format and style as other `<name>.txt` files in the lammps/doc/src directory for similar commands and styles; use one or more of them as a starting point. A description of the markup can also be found in `lammps/doc/utils/txt2html/README.html`. As appropriate, the text files can include links to equations (see doc/Eqs/*.tex for examples, we auto-create the associated JPG files), or figures (see doc/JPG for examples), or even additional PDF files with further details (see doc/PDF for examples). The doc page should also include literature citations as appropriate; see the bottom of doc/fix_nh.txt for examples and the earlier part of the same file for how to format the cite itself. The "Restrictions" section of the doc page should indicate that your command is only available if LAMMPS is built with the appropriate USER-MISC or USER-FOO package. See other user package doc files for examples of how to do this. The prerequisites for building the HTML format files are Python 3.x and virtualenv; the requirement for generating the PDF format manual is the htmldoc software. Please run at least "make html" and carefully inspect and proofread the resulting HTML format doc page before submitting your code.
* For a new package (or even a single command) you should include one or more example scripts demonstrating its use. These should run in no more than a couple minutes, even on a single processor, and not require large data files as input. See directories under examples/USER for examples of input scripts other users provided for their packages. These example inputs are also required for validating memory accesses and testing for memory leaks with valgrind
* If there is a paper of yours describing your feature (either the algorithm/science behind the feature itself, or its initial usage, or its implementation in LAMMPS), you can add the citation to the *.cpp source file. See src/USER-EFF/atom_vec_electron.cpp for an example. A LaTeX citation is stored in a variable at the top of the file and a single line of code that references the variable is added to the constructor of the class. Whenever a user invokes your feature from their input script, this will cause LAMMPS to output the citation to a log.cite file and prompt the user to examine the file. Note that you should only use this for a paper you or your group authored. E.g. adding a cite in the code for a paper by Nose and Hoover if you write a fix that implements their integrator is not the intended usage. That kind of citation should just be in the doc page you provide.
Finally, as a general rule-of-thumb, the more clear and self-explanatory you make your documentation and README files, and the easier you make it for people to get started, e.g. by providing example scripts, the more likely it is that users will try out your new feature.
If the new features/files are broadly useful, we may add them as core files to LAMMPS or as part of a standard package. Otherwise, we will add them as a user-contributed file or package. Examples of user packages are in src sub-directories that start with USER. The USER-MISC package is simply a collection of (mostly) unrelated single files, which is the simplest way to have your contribution quickly added to the LAMMPS distribution. You can see a list of both the standard and user packages by typing "make package" in the LAMMPS src directory.
Note that by providing us files to release, you are agreeing to make them open-source, i.e. we can release them under the terms of the GPL, used as a license for the rest of LAMMPS. See Section 1.4 for details.
With user packages and files, all we are really providing (aside from the fame and fortune that accompanies having your name in the source code and on the Authors page of the LAMMPS WWW site) is a means for you to distribute your work to the LAMMPS user community, and a mechanism for others to easily try out your new feature. This may help you find bugs or make contact with new collaborators. Note that you are also implicitly agreeing to support your code, which means answering questions, fixing bugs, and maintaining it if LAMMPS changes in some way that breaks it (an unusual event).
To be able to submit an issue on GitHub, you have to register for an account (for GitHub in general). If you do not want to do that, or have other reservations or difficulties submitting a pull request, you can - as an alternative - contact one or more of the core LAMMPS developers and ask if one of them would be interested in manually merging your code into LAMMPS and send them your source code. Since the effort to merge a pull request is a small fraction of the effort of integrating source code manually (which would usually be done by converting the contribution into a pull request), your chances of having your new code included quickly are best with a pull request.
If you prefer to submit patches or full files, you should first make certain that your code works correctly with the latest patch-level version of LAMMPS and contains all bug fixes from it. Then create a gzipped tar file of all changed or added files or a corresponding patch file using 'diff -u' or 'diff -c' and compress it with gzip. Please only use gzip compression, as this works well on all platforms.
## GitHub Workflows
This section briefly summarizes the steps that will happen **after** you have submitted either an issue or a pull request on the LAMMPS GitHub project page.
### Issues
After submitting an issue, one or more of the LAMMPS developers will review it and categorize it by assigning labels. Confirmed bug reports will be labeled `bug`; if the bug report also contains a suggestion for how to fix it, it will be labeled `bugfix`; if the issue is a feature request, it will be labeled `enhancement`. Other labels may be attached as well, depending on which parts of the LAMMPS code are affected. If the assessment is that the issue does not warrant any changes, the `wontfix` label will be applied, and if the submission is incorrect or something that should not be submitted as an issue, the `invalid` label will be applied. In both of the last two cases, the issue will then be closed without further action.
For feature requests, what happens next is that developers may comment on the viability or relevance of the request, discuss and make suggestions for how to implement it. If a LAMMPS developer or user is planning to implement the feature, the issue will be assigned to that developer. Developers that are not yet listed as LAMMPS project collaborators will receive an invitation to be added to the LAMMPS project as a collaborator so they can get assigned. If the requested feature or enhancement is implemented, it will usually be submitted as a pull request, which will contain a reference to the issue number. Once the pull request is reviewed and accepted for inclusion into LAMMPS, the issue will be closed. For details on how pull requests are processed, please see below.
For bug reports, the next step is that one of the core LAMMPS developers will self-assign to the issue and try to confirm the bug. If confirmed, the `bug` label and potentially other labels are added to classify the issue and its impact on LAMMPS. Before confirming, further questions may be asked, or requests made for additional input files or details about the steps required to reproduce the issue. Any bugfix is likely to be submitted as a pull request (more about that below) and since most bugs require only local changes, the bugfix may be included in a pull request specifically set up to collect such local bugfixes or small enhancements. Once the bugfix is included in the master branch, the issue will be closed.
### Pull Requests
For submitting pull requests, there is a [detailed tutorial](http://lammps.sandia.gov/doc/tutorial_github.html) in the LAMMPS manual. Thus only a brief breakdown of the steps is presented here.
Immediately after the submission, the LAMMPS continuous integration server at ci.lammps.org will download your submitted branch and perform a simple compilation test, i.e. will test whether your submitted code can be compiled under various conditions. It will also do a check on whether your included documentation translates cleanly. Whether these tests succeed or fail will be recorded. If a test fails, please inspect the corresponding output on the CI server and take the necessary steps, if needed, so that the code can compile cleanly again. The test will be re-run each time the pull request is updated with a push to the remote branch on GitHub.
Next a LAMMPS core developer will self-assign and do an overall technical assessment of the submission. If you are not yet registered as a LAMMPS collaborator, you will receive an invitation for that.
You may also receive comments and suggestions on the overall submission or specific details. If permitted, additional changes may be pushed into your pull request branch or a pull request may be filed in your LAMMPS fork on GitHub to include those changes.
The LAMMPS developer may then decide to assign the pull request to another developer (e.g. when that developer is more knowledgeable about the submitted feature or enhancement or has written the modified code). It may also happen that additional developers are requested to provide a review and approve the changes. For submissions that may change the general behavior of LAMMPS, or where a possibility of unwanted side effects exists, additional tests may be requested by the assigned developer.
If the assigned developer is satisfied and considers the submission ready for inclusion into LAMMPS, the pull request will be assigned to the LAMMPS lead developer, Steve Plimpton (@sjplimp), who will then have the final decision on whether the submission will be included, whether additional changes are required, or whether it will ultimately be rejected. After the pull request is merged, you may delete the pull request branch in your personal LAMMPS fork.
Since the learning curve for git is quite steep for efficiently managing remote repositories, local and remote branches, pull requests and more, do not hesitate to ask questions if you are not sure about how to do certain steps that are asked of you. Even if the changes asked of you do not make sense to you, they may be important for the LAMMPS developers. Please also note that all of these are guidelines and not set in stone.
_Is this a 'Bug Report' or a 'Suggestion for an Enhancement'?_
## Detailed Description (Enhancement Suggestion)
_Explain how you would like to see LAMMPS enhanced, what feature(s) you are looking for, provide references to relevant background information, and whether you are willing to implement the enhancement yourself or would like to participate in the implementation_
## LAMMPS Version (Bug Report)
_Please specify which LAMMPS version this issue was detected with. If this is not the latest development version, please stop and test that version, too, and report it here if the bug persists_
## Expected Behavior (Bug Report)
_Describe the expected behavior. Quote from the LAMMPS manual where needed or explain why the expected behavior is meaningful, especially when it differs from the manual_
## Actual Behavior (Bug Report)
_Describe the actual behavior, how it differs from the expected behavior, and how this can be observed. Try to be specific and do **not** use vague terms like "doesn't work" or "wrong result". Do not assume that the person reading this has any experience with or knowledge of your specific research._
## Steps to Reproduce (Bug Report)
_Describe the steps required to quickly reproduce the issue. You can attach (small) files to the section below or add URLs from which to download an archive with all necessary files. Please try to create inputs that are as small as possible and run as fast as possible. NOTE: the less effort and time it takes to reproduce your issue, the more likely it is that somebody will look into it._
## Further Information, Files, and Links
_Put any additional information here, attach relevant text or image files and URLs to external sites, e.g. relevant publications_
_Briefly describe the new feature(s), enhancement(s), or bugfix(es) included in this pull request. If this addresses an open GitHub Issue, mention the issue number, e.g. with `fixes #221` or `closes #135`, so that issue will be automatically closed when the pull request is merged_
## Author(s)
_Please state name and affiliation of the author or authors that should be credited with the changes in this pull request_
## Backward Compatibility
_Please state whether any changes in the pull request break backward compatibility for inputs, and - if yes - explain what has been changed and why_
## Implementation Notes
_Provide any relevant details about how the changes are implemented, how correctness was verified, how other features - if any - in LAMMPS are affected_
## Post Submission Checklist
_Please check the fields below as they are completed_
- [ ] The feature or features in this pull request are complete
- [ ] Suitable new documentation files and/or updates to the existing docs are included
- [ ] One or more example input decks are included
- [ ] The source code follows the LAMMPS formatting guidelines
## Further Information, Files, and Links
_Put any additional information here, attach relevant text or image files, and URLs to external sites (e.g. DOIs or webpages)_
set(LAMMPS_MEMALIGN "64" CACHE STRING "enables the use of the posix_memalign() call instead of malloc() when large chunks of memory are allocated by LAMMPS")
E =\epsilon\left[\frac{2\sigma_{LJ}^{12}\left(7 r^5+14 r^3\sigma_{n}^2+3 r \sigma_{n}^4\right)}{945\left(r^2-\sigma_{n}^2\right)^7}-\frac{\sigma_{LJ}^6\left(2 r \sigma_{n}^3+\sigma_{n}^2\left(r^2-\sigma_{n}^2\right)\log{\left[\frac{r-\sigma_{n}}{r+\sigma_{n}}\right]}\right)}{12\sigma_{n}^5\left(r^2-\sigma_{n}^2\right)}\right]\qquad\sigma_n < r < r_c
enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_7, |
enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_6, |
only needed for KOKKOS package |
set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_7 or "package"_package.html command, |
set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_6 or "package"_package.html command, |
only if defaults need to be changed |
use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_7 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_6 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
:tb(c=2,s=|)
Note that the first 4 steps can be done as a single command, using the
src/Make.py tool. This tool is discussed in "Section
2.4"_Section_start.html#start_4 of the manual, and its use is
Note that the first 4 steps can be done as a single command with
suitable make command invocations. This is discussed in "Section
4"_Section_packages.html of the manual, and its use is
illustrated in the individual accelerator sections. Typically these
steps only need to be done once, to create an executable that uses one
Package, Description, Doc page, Example, Library
"USER-EFF"_#USER-EFF, electron force field,"pair_style eff/cut"_pair_eff.html, USER/eff, -
"USER-FEP"_#USER-FEP, free energy perturbation,"compute fep"_compute_fep.html, USER/fep, -
"USER-H5MD"_#USER-H5MD, dump output via HDF5,"dump h5md"_dump_h5md.html, -, ext
"USER-INTEL"_#USER-INTEL, optimized Intel CPU and KNL styles,"Section 5.3.2"_accelerate_intel.html, WWW bench, -
"USER-INTEL"_#USER-INTEL, optimized Intel CPU and KNL styles,"Section 5.3.2"_accelerate_intel.html, "Benchmarks"_http://lammps.sandia.gov/bench.html, -
Here is a quick overview of how to use the KOKKOS package
for CPU acceleration, assuming one or more 16-core nodes.
KOKKOS_DEVICES sets the parallelization method used for Kokkos code
(within LAMMPS). KOKKOS_DEVICES=OpenMP means that OpenMP will be
used. KOKKOS_DEVICES=Pthreads means that pthreads will be used.
KOKKOS_DEVICES=Cuda means an NVIDIA GPU running CUDA will be used.
If KOKKOS_DEVICES=Cuda, then the lo-level Makefile in the src/MAKE
directory must use "nvcc" as its compiler, via its CC setting. For
best performance its CCFLAGS setting should use -O3 and have a
KOKKOS_ARCH setting that matches the compute capability of your NVIDIA
hardware and software installation, e.g. KOKKOS_ARCH=Kepler30. Note
the minimal required compute capability is 2.0, but this will give
significantly reduced performance compared to Kepler generation GPUs
with compute capability 3.x. For the LINK setting, "nvcc" should not
be used; instead use g++ or another compiler suitable for linking C++
applications. Often you will want to use your MPI compiler wrapper
for this setting (i.e. mpicxx). Finally, the lo-level Makefile must
also have a "Compilation rule" for creating *.o files from *.cu files.
See src/Makefile.cuda for an example of a lo-level Makefile with all
of these settings.
KOKKOS_USE_TPLS=hwloc binds threads to hardware cores, so they do not
migrate during a simulation. KOKKOS_USE_TPLS=hwloc should always be
used if running with KOKKOS_DEVICES=Pthreads for pthreads. It is not
necessary for KOKKOS_DEVICES=OpenMP for OpenMP, because OpenMP
provides alternative methods via environment variables for binding
threads to hardware cores. More info on binding threads to cores is
given in "Section 5.3"_Section_accelerate.html#acc_3.
KOKKOS_ARCH=KNC enables compiler switches needed when compiling for an
Intel Phi processor.
KOKKOS_USE_TPLS=librt enables use of a more accurate timer mechanism
on most Unix platforms. This library is not available on all
platforms.
KOKKOS_DEBUG is only useful when developing a Kokkos-enabled style
within LAMMPS. KOKKOS_DEBUG=yes enables printing of run-time
debugging information that can be useful. It also enables runtime
bounds checking on Kokkos data structures.
KOKKOS_CUDA_OPTIONS are additional options for CUDA.
For more information on Kokkos see the Kokkos programmers' guide here:
/lib/kokkos/doc/Kokkos_PG.pdf.
[Run with the KOKKOS package from the command line:]
The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the mpirun command in MPICH does this via
its -np and -ppn switches. Ditto for OpenMPI via -np and -npernode.
When using KOKKOS built with host=OMP, you need to choose how many
OpenMP threads per MPI task will be used (via the "-k" command-line
switch discussed below). Note that the product of MPI tasks * OpenMP
threads/task should not exceed the physical number of cores (on a
node), otherwise performance will suffer.
When using the KOKKOS package built with device=CUDA, you must use
exactly one MPI task per physical GPU.
When using the KOKKOS package built with host=MIC for Intel Xeon Phi
coprocessor support, you need to ensure there are one or more MPI tasks
per coprocessor, and choose the number of coprocessor threads to use
per MPI task (via the "-k" command-line switch discussed below). The
product of MPI tasks * coprocessor threads/task should not exceed the
maximum number of threads the coprocessor is designed to run,
otherwise performance will suffer. This value is 240 for current
generation Xeon Phi(TM) chips, which is 60 physical cores * 4
threads/core. Note that with the KOKKOS package you do not need to
specify how many Phi coprocessors there are per node; each
coprocessor is simply treated as running some number of MPI tasks.
mpirun -np 16 lmp_kokkos_mpi_only -k on -sf kk -in in.lj # 1 node, 16 MPI tasks/node, no multi-threading
mpirun -np 2 -ppn 1 lmp_kokkos_omp -k on t 16 -sf kk -in in.lj # 2 nodes, 1 MPI task/node, 16 threads/task
mpirun -np 2 lmp_kokkos_omp -k on t 8 -sf kk -in in.lj # 1 node, 2 MPI tasks/node, 8 threads/task
mpirun -np 32 -ppn 4 lmp_kokkos_omp -k on t 4 -sf kk -in in.lj # 8 nodes, 4 MPI tasks/node, 4 threads/task :pre
To run using the KOKKOS package, use the "-k on", "-sf kk" and "-pk kokkos" "command-line switches"_Section_start.html#start_7 in your mpirun command.
You must use the "-k on" "command-line
switch"_Section_start.html#start_7 to enable the KOKKOS package. It
switch"_Section_start.html#start_7 to enable the KOKKOS package. It
takes additional arguments for hardware settings appropriate to your
system. Those arguments are "documented
here"_Section_start.html#start_7. The two most commonly used
options are:
system. Those arguments are "documented
here"_Section_start.html#start_7. For OpenMP use:
-k on t Nt g Ng :pre
The "t Nt" option applies to host=OMP (even if device=CUDA) and
host=MIC. For host=OMP, it specifies how many OpenMP threads per MPI
task to use with a node. For host=MIC, it specifies how many Xeon Phi
threads per MPI task to use within a node. The default is Nt = 1.
Note that for host=OMP this is effectively MPI-only mode which may be
fine. But for host=MIC you will typically end up using far less than
all the 240 available threads, which could give very poor performance.
The "g Ng" option applies to device=CUDA. It specifies how many GPUs
per compute node to use. The default is 1, so this only needs to be
specified if you have 2 or more GPUs per compute node.
-k on t Nt :pre
The "t Nt" option specifies how many OpenMP threads per MPI
task to use with a node. The default is Nt = 1, which is MPI-only mode.
Note that the product of MPI tasks * OpenMP
threads/task should not exceed the physical number of cores (on a
node), otherwise performance will suffer. If hyperthreading is enabled, then
the product of MPI tasks * OpenMP threads/task should not exceed the
physical number of cores * hardware threads.
The "-k on" switch also issues a "package kokkos" command (with no
additional arguments) which sets various KOKKOS options to default
values, as discussed on the "package"_package.html command doc page.
Use the "-sf kk" "command-line switch"_Section_start.html#start_7,
which will automatically append "kk" to styles that support it. Use
the "-pk kokkos" "command-line switch"_Section_start.html#start_7 if
you wish to change any of the default "package kokkos"_package.html
options set by the "-k on" "command-line
switch"_Section_start.html#start_7.
The "-sf kk" "command-line switch"_Section_start.html#start_7
will automatically append the "/kk" suffix to styles that support it.
In this manner no modification to the input script is needed. Alternatively,
one can run with the KOKKOS package by editing the input script as described below.
NOTE: The default for the "package kokkos"_package.html command is
to use "full" neighbor lists and set the Newton flag to "off" for both
pairwise and bonded interactions. This typically gives fastest
performance. If the "newton"_newton.html command is used in the input
script, it can override the Newton flag defaults.
However, when running on CPUs, it
will typically be faster to use "half" neighbor lists and set the
Newton flag to "on", just as is the case for non-accelerated pair
styles. It can also be faster to use non-threaded communication.
Use the "-pk kokkos" "command-line switch"_Section_start.html#start_7 to
change the default "package kokkos"_package.html
options. See its doc page for details and default settings. Experimenting with
its options can provide a speed-up for specific calculations. For example:
mpirun -np 16 lmp_kokkos_mpi_only -k on -sf kk -pk kokkos newton on neigh half comm no -in in.lj # Newton on, half neighbor list, non-threaded comm :pre
[Core and Thread Affinity:]
When using multi-threading, it is important for
performance to bind both MPI tasks to physical cores, and threads to
physical cores, so they do not migrate during a simulation.
If you are not certain MPI tasks are being bound (check the defaults
for your MPI installation), binding can be forced with these flags:
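A minimal sketch (the exact flags vary with the MPI implementation and version; these are the usual OpenMPI and MPICH spellings):
mpirun -np 16 --bind-to core lmp_kokkos_omp -k on t 1 -sf kk -in in.lj # OpenMPI
mpirun -np 16 -bind-to core lmp_kokkos_omp -k on t 1 -sf kk -in in.lj # MPICH/Hydra :pre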
For binding threads with KOKKOS OpenMP, use thread affinity
environment variables to force binding. With OpenMP 3.1 (gcc 4.7 or
later, Intel 12 or later) setting the environment variable
OMP_PROC_BIND=true should be sufficient. In general, for best performance
with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads.
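For example, in a bash shell before launching LAMMPS:
export OMP_PROC_BIND=spread
export OMP_PLACES=threads :pre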
For binding threads with the
KOKKOS pthreads option, compile LAMMPS with the KOKKOS HWLOC=yes option
as described below.
[Running on Knights Landing (KNL) Intel Xeon Phi:]
Here is a quick overview of how to use the KOKKOS package
for the Intel Knights Landing (KNL) Xeon Phi:
KNL Intel Phi chips have 68 physical cores. Typically 1 to 4 cores
are reserved for the OS, and only 64 or 66 cores are used. Each core
has 4 hyperthreads, so there are effectively N = 256 (4*64) or
N = 264 (4*66) cores to run on. The product of MPI tasks * OpenMP threads/task should not exceed this limit,
otherwise performance will suffer. Note that with the KOKKOS package you do not need to
specify how many KNLs there are per node; each
KNL is simply treated as running some number of MPI tasks.
Examples of mpirun commands that follow these rules are shown below.
Intel KNL node with 68 cores (272 threads/node via 4x hardware threading):
mpirun -np 64 lmp_kokkos_phi -k on t 4 -sf kk -in in.lj # 1 node, 64 MPI tasks/node, 4 threads/task
mpirun -np 66 lmp_kokkos_phi -k on t 4 -sf kk -in in.lj # 1 node, 66 MPI tasks/node, 4 threads/task
mpirun -np 32 lmp_kokkos_phi -k on t 8 -sf kk -in in.lj # 1 node, 32 MPI tasks/node, 8 threads/task
mpirun -np 512 -ppn 64 lmp_kokkos_phi -k on t 4 -sf kk -in in.lj # 8 nodes, 64 MPI tasks/node, 4 threads/task :pre
The -np setting of the mpirun command sets the number of MPI
tasks/node. The "-k on t Nt" command-line switch sets the number of
threads/task as Nt. The product of these two values should be N, i.e.
256 or 264.
NOTE: The default for the "package kokkos"_package.html command is
to use "full" neighbor lists and set the Newton flag to "off" for both
pairwise and bonded interactions. When running on KNL, this
will typically be best for pair-wise potentials. For manybody potentials,
using "half" neighbor lists and setting the
Newton flag to "on" may be faster. It can also be faster to use non-threaded communication.
Use the "-pk kokkos" "command-line switch"_Section_start.html#start_7 to
change the default "package kokkos"_package.html
options. See its doc page for details and default settings. Experimenting with
its options can provide a speed-up for specific calculations. For example:
mpirun -np 64 lmp_kokkos_phi -k on t 4 -sf kk -pk kokkos comm no -in in.lj # Newton off, full neighbor list, non-threaded comm
mpirun -np 64 lmp_kokkos_phi -k on t 4 -sf kk -pk kokkos newton on neigh half comm no -in in.reax # Newton on, half neighbor list, non-threaded comm :pre
NOTE: MPI tasks and threads should be bound to cores as described above for CPUs.
NOTE: To build with Kokkos support for Intel Xeon Phi coprocessors such as Knights Corner (KNC), your
system must be configured to use them in "native" mode, not "offload"
mode like the USER-INTEL package supports.
[Running on GPUs:]
Use the "-k" "command-line switch"_Section_commands.html#start_7 to
specify the number of GPUs per node. Typically the -np setting
of the mpirun command should set the number of MPI
tasks/node to be equal to the # of physical GPUs on the node.
You can assign multiple MPI tasks to the same GPU with the
KOKKOS package, but this is usually only faster if significant portions
of the input script have not been ported to use Kokkos. Using CUDA MPS
is recommended in this scenario. As above for multi-core CPUs (and no GPU), if N is the number
of physical cores/node, then the number of MPI tasks/node should not exceed N.
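A rough sketch of that scenario (assuming a CUDA installation that provides the nvidia-cuda-mps-control daemon and the lmp_kokkos_cuda_openmpi executable used in the examples below):
nvidia-cuda-mps-control -d # start the MPS daemon once per node
mpirun -np 4 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -in in.lj # 2 MPI tasks share each of the 2 GPUs :pre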
-k on g Ng :pre
Here are examples of how to use the KOKKOS package for GPUs,
assuming one or more nodes, each with two GPUs:
mpirun -np 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -in in.lj # 1 node, 2 MPI tasks/node, 2 GPUs/node
mpirun -np 32 -ppn 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -in in.lj # 16 nodes, 2 MPI tasks/node, 2 GPUs/node (32 GPUs total) :pre
NOTE: The default for the "package kokkos"_package.html command is
to use "full" neighbor lists and set the Newton flag to "off" for both
pairwise and bonded interactions, along with threaded communication.
When running on Maxwell or Kepler GPUs, this will typically be best. For Pascal GPUs,
using "half" neighbor lists and setting the
Newton flag to "on" may be faster. For many pair styles, setting the neighbor binsize
equal to the ghost atom cutoff will give a speedup.
Use the "-pk kokkos" "command-line switch"_Section_start.html#start_7 to
change the default "package kokkos"_package.html
options. See its doc page for details and default settings. Experimenting with
its options can provide a speed-up for specific calculations. For example:
mpirun -np 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -pk kokkos binsize 2.8 -in in.lj # Set binsize = neighbor ghost cutoff
mpirun -np 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -pk kokkos newton on neigh half binsize 2.8 -in in.lj # Newton on, half neighborlist, set binsize = neighbor ghost cutoff :pre
NOTE: For good performance of the KOKKOS package on GPUs, you must
have Kepler generation GPUs (or later). The Kokkos library exploits
texture cache options not supported by Tesla generation GPUs (or
older).
NOTE: When using a GPU, you will achieve the best performance if your
input script does not use fix or compute styles which are not yet
Kokkos-enabled. This allows data to stay on the GPU for multiple
timesteps, without being copied back to the host CPU. Invoking a
non-Kokkos fix or compute, or performing I/O for
"thermo"_thermo_style.html or "dump"_dump.html output will cause data
to be copied back to the CPU incurring a performance penalty.
NOTE: To get an accurate timing breakdown between time spent in pair,
kspace, etc., you must set the environment variable CUDA_LAUNCH_BLOCKING=1.
However, this will reduce performance and is not recommended for production runs.
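For example, in a bash shell (some MPI launchers need the variable forwarded explicitly, e.g. "mpirun -x CUDA_LAUNCH_BLOCKING" with OpenMPI):
export CUDA_LAUNCH_BLOCKING=1
mpirun -np 2 lmp_kokkos_cuda_openmpi -k on g 2 -sf kk -in in.lj :pre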
[Run with the KOKKOS package by editing an input script:]
Alternatively the effect of the "-sf" or "-pk" switches can be
duplicated by adding the "package kokkos"_package.html or "suffix
kk"_suffix.html commands to your input script.
The discussion above for building LAMMPS with the KOKKOS package, the mpirun/mpiexec command, and setting
appropriate thread and GPU values is the same.
You must still use the "-k on" "command-line
switch"_Section_start.html#start_7 to enable the KOKKOS package, and
specify its additional arguments for hardware options appropriate to
your system, as documented above.
Use the "suffix kk"_suffix.html command, or you can explicitly add a
You can use the "suffix kk"_suffix.html command, or you can explicitly add a
"kk" suffix to individual styles in your input script, e.g.
pair_style lj/cut/kk 2.5 :pre
You only need to use the "package kokkos"_package.html command if you
wish to change any of its option defaults, as set by the "-k on"
"command-line switch"_Section_start.html#start_7.
[Using OpenMP threading and CUDA together (experimental):]
With the KOKKOS package, both OpenMP multi-threading and GPUs can be used
together in a few special cases. In the Makefile, the KOKKOS_DEVICES variable must
include both "Cuda" and "OpenMP", as is the case for /src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpi
KOKKOS_DEVICES=Cuda,OpenMP :pre
The suffix "/kk" is equivalent to "/kk/device", and for Kokkos CUDA,
using the "-sf kk" in the command line gives the default CUDA version everywhere.
However, if the "/kk/host" suffix is added to a specific style in the input
script, the Kokkos OpenMP (CPU) version of that specific style will be used instead.
Set the number of OpenMP threads as "t Nt" and the number of GPUs as "g Ng":
-k on t Nt g Ng :pre
For example, the command to run with 1 GPU and 8 OpenMP threads is then:
mpiexec -np 1 lmp_kokkos_cuda_openmpi -in in.lj -k on g 1 t 8 -sf kk :pre
Conversely, if the "-sf kk/host" is used in the command line and then the
"/kk" or "/kk/device" suffix is added to a specific style in your input script,
then only that specific style will run on the GPU while everything else will
run on the CPU in OpenMP mode. Note that the execution of the CPU and GPU
styles will NOT overlap, except for a special case:
A kspace style and/or molecular topology (bonds, angles, etc.) running on
the host CPU can overlap with a pair style running on the GPU. First compile
with "--default-stream per-thread" added to CCFLAGS in the Kokkos CUDA Makefile.
Then explicitly use the "/kk/host" suffix for kspace and bonds, angles, etc.
in the input file and the "kk" suffix (equal to "kk/device") on the command line.
Also make sure the environment variable CUDA_LAUNCH_BLOCKING is not set to "1"
so CPU/GPU overlap can occur.
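A hedged sketch of that recipe (the style names assume Kokkos versions of these styles are present in your build; check the style listings):
kspace_style pppm/kk/host 1.0e-4 # kspace explicitly on the host CPU
bond_style harmonic/kk/host # bonded terms explicitly on the host CPU
pair_style lj/cut/coul/long 10.0 # picks up /kk (= /kk/device) from -sf kk and runs on the GPU :pre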
[Speed-ups to expect:]
The performance of KOKKOS running in different modes is a function of
Generally speaking, the following rules of thumb apply:
When running on CPUs only, with a single thread per MPI task,
performance of a KOKKOS style is somewhere between the standard
(un-accelerated) styles (MPI-only mode), and those provided by the
USER-OMP package. However, the difference between all 3 is small (less
than 20%). :ulb,l
When running on CPUs only, with multiple threads per MPI task,
package. :l
When running large number of atoms per GPU, KOKKOS is typically faster
than the GPU package. :l
When running on Intel hardware, KOKKOS is not as fast as
the USER-INTEL package, which is optimized for that hardware. :l
:ule
@ -374,123 +381,78 @@ See the "Benchmark page"_http://lammps.sandia.gov/bench.html of the
LAMMPS web site for performance of the KOKKOS package on different
hardware.
[Guidelines for best performance:]
Here are guidelines for using the KOKKOS package on the different
hardware configurations listed above.
Many of the guidelines use the "package kokkos"_package.html command.
See its doc page for details and default settings. Experimenting with
its options can provide a speed-up for specific calculations.
[Advanced Kokkos options:]
There are other allowed options when building with the KOKKOS package.
As above, they can be set either as variables on the make command line
or in Makefile.machine. This is the full list of options, including
those discussed above. Each takes a value shown below. The
default value is listed, which is set in the
/lib/kokkos/Makefile.kokkos file.
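For example, a minimal sketch of overriding such variables on the make command line (the machine target and KOKKOS_ARCH value here are illustrative and depend on your Makefile and hardware):
make kokkos_cuda_mpi KOKKOS_DEVICES=Cuda,OpenMP KOKKOS_ARCH=Pascal60 :pre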
The {cosine/buck6d} angle style uses the potential
:c,image(Eqs/angle_cosine_buck6d.jpg)
where K is the energy constant, n is the periodic multiplicity and
Theta0 is the equilibrium angle.
The coefficients must be defined for each angle type via the
"angle_coeff"_angle_coeff.html command as in the example above, or in
the data file or restart files read by the "read_data"_read_data.html
or "read_restart"_read_restart.html commands in the following order:
K (energy)
n
Theta0 (degrees) :ul
Theta0 is specified in degrees, but LAMMPS converts it to radians
internally.
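For example (hypothetical values for angle type 1, following the order listed above: K = 50.0 energy units, n = 2, Theta0 = 120 degrees):
angle_coeff 1 50.0 2 120.0 :pre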
In addition to the cosine term, the {cosine/buck6d} angle style computes
the short-range (vdW) interaction of the
"pair_buck6d"_pair_buck6d_coul_gauss.html styles between the end atoms of
the angle. For this reason this angle style only works in combination
with the "pair_buck6d"_pair_buck6d_coul_gauss.html styles and requires
the 1-3 interactions in "special_bonds"_special_bonds.html to be weighted
0.0 to prevent double counting.
:line
[Restrictions:]
{cosine/buck6d} can only be used in combination with the
"pair_buck6d"_pair_buck6d_coul_gauss.html style and with a
"special_bonds"_special_bonds.html 0.0 weighting of 1-3 interactions.
This angle style can only be used if LAMMPS was built with the
USER-MOFFF package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info on packages.
[Related commands:]
"angle_coeff"_angle_coeff.html
[Default:] none