- Optimizations for molecular systems
- Improved kernel performance and greater CPU overlap
- Reduced GPU to CPU communications for discrete devices
- Switch classic Intel makefiles to use LLVM-based compilers
- Prefetch optimizations supported for OpenCL
- Optimized data repack for quaternions
Since LAMMPS uses the low-level driver API of CUDA, it needs to ensure
that it is in the correct context when invoking such functions. At the
moment it creates and switches to its own context inside `UCL_Device::set`
but then assumes that the driver is still in that context for subsequent
calls into CUDA; if another part of the program uses a different context
(such as the CUDA runtime using the "primary" context) this will cause
failures inside LAMMPS.
This patch changes the context creation to instead use the primary
context for the requested device. While it's not perfect, in that it
still doesn't ensure that it's in the correct context before making
driver API calls, it at least allows it to work with libraries that use
the runtime API.