Update Kokkos library in LAMMPS to v3.3.0
@ -65,10 +65,15 @@ which activates the OpenMP backend. All of the options controlling device backen
|
|||||||
|
|
||||||
## Spack
|
## Spack
|
||||||
An alternative to manually building with CMake is to use the Spack package manager.
|
An alternative to manually building with CMake is to use the Spack package manager.
|
||||||
To do so, download the `kokkos-spack` git repo and add to the package list:
|
Make sure you have downloaded [Spack](https://github.com/spack/spack).
|
||||||
|
The easiest way to configure the Spack environment is:
|
||||||
````bash
|
````bash
|
||||||
> spack repo add $path-to-kokkos-spack
|
> source spack/share/spack/setup-env.sh
|
||||||
````
|
````
|
||||||
|
with other scripts available for other shells.
|
||||||
|
You can display information about how to install packages with:
|
||||||
|
````bash
|
||||||
|
> spack info kokkos
|
||||||
A basic installation would be done as:
|
A basic installation would be done as:
|
||||||
````bash
|
````bash
|
||||||
> spack install kokkos
|
> spack install kokkos
|
||||||
@ -178,8 +183,8 @@ Options can be enabled by specifying `-DKokkos_ENABLE_X`.
|
|||||||
|
|
||||||
## Other Options
|
## Other Options
|
||||||
* Kokkos_CXX_STANDARD
|
* Kokkos_CXX_STANDARD
|
||||||
* The C++ standard for Kokkos to use: c++11, c++14, c++17, or c++20. This should be given in CMake style as 11, 14, 17, or 20.
|
* The C++ standard for Kokkos to use: c++14, c++17, or c++20. This should be given in CMake style as 14, 17, or 20.
|
||||||
* STRING Default: 11
|
* STRING Default: 14
|
||||||
|
|
||||||
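As a rough illustration of the narrowed `Kokkos_CXX_STANDARD` range, the sketch below builds the CMake cache argument and rejects any value other than 14, 17, or 20. The helper name `kokkos_cxx_standard_flag` and the `<path-to-kokkos-source>` placeholder are assumptions made for this example; they are not part of Kokkos.

```bash
# Hypothetical helper (not part of Kokkos): emit the CMake cache argument for
# a C++ standard, accepting only the values listed in the updated text above.
kokkos_cxx_standard_flag() {
  case "$1" in
    14|17|20) echo "-DKokkos_CXX_STANDARD=$1" ;;
    *) echo "unsupported C++ standard: $1" >&2; return 1 ;;
  esac
}

# Print the configure command instead of running it, so the sketch works
# without a Kokkos checkout. The source path is a placeholder.
flag=$(kokkos_cxx_standard_flag 17) && echo "cmake $flag <path-to-kokkos-source>"
```

Passing `11`, which the old text still allowed, makes the helper return a non-zero status.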
## Third-party Libraries (TPLs)
|
## Third-party Libraries (TPLs)
|
||||||
The following options control enabling TPLs:
|
The following options control enabling TPLs:
|
||||||
|
|||||||
@ -1,5 +1,104 @@
|
|||||||
# Change Log
|
# Change Log
|
||||||
|
|
||||||
|
## [3.3.00](https://github.com/kokkos/kokkos/tree/3.3.00) (2020-12-16)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.2.01...3.3.00)
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- Require C++14 as minimum C++ standard. C++17 and C++20 are supported too.
|
||||||
|
- HIP backend is nearly feature complete. Kokkos Dynamic Task Graphs are missing.
|
||||||
|
- Major update for OpenMPTarget: many capabilities now work. For details contact us.
|
||||||
|
- Added DPC++/SYCL backend: primary capabilities are working.
|
||||||
|
- Added Kokkos Graph API analogous to CUDA Graphs.
|
||||||
|
- Added parallel_scan support with TeamThreadRange [\#3536](https://github.com/kokkos/kokkos/pull/#3536)
|
||||||
|
- Added Logical Memory Spaces [\#3546](https://github.com/kokkos/kokkos/pull/#3546)
|
||||||
|
- Added initial half precision support [\#3439](https://github.com/kokkos/kokkos/pull/#3439)
|
||||||
|
- Experimental feature: control cuda occupancy [\#3379](https://github.com/kokkos/kokkos/pull/#3379)
|
||||||
|
|
||||||
|
**Implemented enhancements Backends and Archs:**
|
||||||
|
- Add a64fx and fujitsu Compiler support [\#3614](https://github.com/kokkos/kokkos/pull/#3614)
|
||||||
|
- Adding support for AMD gfx908 architecture [\#3375](https://github.com/kokkos/kokkos/pull/#3375)
|
||||||
|
- SYCL parallel\_for MDRangePolicy [\#3583](https://github.com/kokkos/kokkos/pull/#3583)
|
||||||
|
- SYCL add parallel\_scan [\#3577](https://github.com/kokkos/kokkos/pull/#3577)
|
||||||
|
- SYCL custom reductions [\#3544](https://github.com/kokkos/kokkos/pull/#3544)
|
||||||
|
- SYCL Enable container unit tests [\#3550](https://github.com/kokkos/kokkos/pull/#3550)
|
||||||
|
- SYCL feature level 5 [\#3480](https://github.com/kokkos/kokkos/pull/#3480)
|
||||||
|
- SYCL Feature level 4 (parallel\_for) [\#3474](https://github.com/kokkos/kokkos/pull/#3474)
|
||||||
|
- SYCL feature level 3 [\#3451](https://github.com/kokkos/kokkos/pull/#3451)
|
||||||
|
- SYCL feature level 2 [\#3447](https://github.com/kokkos/kokkos/pull/#3447)
|
||||||
|
- OpenMPTarget: Hierarchical reduction for + operator on scalars [\#3504](https://github.com/kokkos/kokkos/pull/#3504)
|
||||||
|
- OpenMPTarget hierarchical [\#3411](https://github.com/kokkos/kokkos/pull/#3411)
|
||||||
|
- HIP Add Impl::atomic\_[store,load] [\#3440](https://github.com/kokkos/kokkos/pull/#3440)
|
||||||
|
- HIP enable global lock arrays [\#3418](https://github.com/kokkos/kokkos/pull/#3418)
|
||||||
|
- HIP Implement multiple occupancy paths for various HIP kernel launchers [\#3366](https://github.com/kokkos/kokkos/pull/#3366)
|
||||||
|
|
||||||
|
**Implemented enhancements Policies:**
|
||||||
|
- MDRangePolicy: Let it be semiregular [\#3494](https://github.com/kokkos/kokkos/pull/#3494)
|
||||||
|
- MDRangePolicy: Check narrowing conversion in construction [\#3527](https://github.com/kokkos/kokkos/pull/#3527)
|
||||||
|
- MDRangePolicy: CombinedReducers support [\#3395](https://github.com/kokkos/kokkos/pull/#3395)
|
||||||
|
- Kokkos Graph: Interface and Default Implementation [\#3362](https://github.com/kokkos/kokkos/pull/#3362)
|
||||||
|
- Kokkos Graph: add Cuda Graph implementation [\#3369](https://github.com/kokkos/kokkos/pull/#3369)
|
||||||
|
- TeamPolicy: implemented autotuning of team sizes and vector lengths [\#3206](https://github.com/kokkos/kokkos/pull/#3206)
|
||||||
|
- RangePolicy: Initialize all data members in default constructor [\#3509](https://github.com/kokkos/kokkos/pull/#3509)
|
||||||
|
|
||||||
|
**Implemented enhancements BuildSystem:**
|
||||||
|
- Auto-generate core test files for all backends [\#3488](https://github.com/kokkos/kokkos/pull/#3488)
|
||||||
|
- Avoid rewriting test files when calling cmake [\#3548](https://github.com/kokkos/kokkos/pull/#3548)
|
||||||
|
- RULE\_LAUNCH\_COMPILE and RULE\_LAUNCH\_LINK system for nvcc\_wrapper [\#3136](https://github.com/kokkos/kokkos/pull/#3136)
|
||||||
|
- Adding -include as a known argument to nvcc\_wrapper [\#3434](https://github.com/kokkos/kokkos/pull/#3434)
|
||||||
|
- Install hpcbind script [\#3402](https://github.com/kokkos/kokkos/pull/#3402)
|
||||||
|
- cmake/kokkos\_tribits.cmake: add parsing for args [\#3457](https://github.com/kokkos/kokkos/pull/#3457)
|
||||||
|
|
||||||
|
**Implemented enhancements Tools:**
|
||||||
|
- Changed namespacing of Kokkos::Tools::Impl::Impl::tune\_policy [\#3455](https://github.com/kokkos/kokkos/pull/#3455)
|
||||||
|
- Delegate to an impl allocate/deallocate method to allow specifying a SpaceHandle for MemorySpaces [\#3530](https://github.com/kokkos/kokkos/pull/#3530)
|
||||||
|
- Use the Kokkos Profiling interface rather than the Impl interface [\#3518](https://github.com/kokkos/kokkos/pull/#3518)
|
||||||
|
- Runtime option for tuning [\#3459](https://github.com/kokkos/kokkos/pull/#3459)
|
||||||
|
- Dual View Tool Events [\#3326](https://github.com/kokkos/kokkos/pull/#3326)
|
||||||
|
|
||||||
|
**Implemented enhancements Other:**
|
||||||
|
- Abort on errors instead of just printing [\#3528](https://github.com/kokkos/kokkos/pull/#3528)
|
||||||
|
- Enable C++14 macros unconditionally [\#3449](https://github.com/kokkos/kokkos/pull/#3449)
|
||||||
|
- Make ViewMapping trivially copyable [\#3436](https://github.com/kokkos/kokkos/pull/#3436)
|
||||||
|
- Rename struct ViewMapping to class [\#3435](https://github.com/kokkos/kokkos/pull/#3435)
|
||||||
|
- Replace enums in Kokkos\_ViewMapping.hpp (removes -Wextra) [\#3422](https://github.com/kokkos/kokkos/pull/#3422)
|
||||||
|
- Use bool for enums representing bools [\#3416](https://github.com/kokkos/kokkos/pull/#3416)
|
||||||
|
- Fence active instead of default execution space instances [\#3388](https://github.com/kokkos/kokkos/pull/#3388)
|
||||||
|
- Refactor parallel\_reduce fence usage [\#3359](https://github.com/kokkos/kokkos/pull/#3359)
|
||||||
|
- Moved Space EBO helpers to Kokkos\_EBO [\#3357](https://github.com/kokkos/kokkos/pull/#3357)
|
||||||
|
- Add remove\_cvref type trait [\#3340](https://github.com/kokkos/kokkos/pull/#3340)
|
||||||
|
- Adding identity type traits and update definition of identity\_t alias [\#3339](https://github.com/kokkos/kokkos/pull/#3339)
|
||||||
|
- Add is\_specialization\_of type trait [\#3338](https://github.com/kokkos/kokkos/pull/#3338)
|
||||||
|
- Make ScratchMemorySpace semi-regular [\#3309](https://github.com/kokkos/kokkos/pull/#3309)
|
||||||
|
- Optimize min/max atomics with early exit on no-op case [\#3265](https://github.com/kokkos/kokkos/pull/#3265)
|
||||||
|
- Refactor Backend Development [\#2941](https://github.com/kokkos/kokkos/pull/#2941)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
- Fixup MDRangePolicy construction from Kokkos arrays [\#3591](https://github.com/kokkos/kokkos/pull/#3591)
|
||||||
|
- Add atomic functions for unsigned long long using gcc built-in [\#3588](https://github.com/kokkos/kokkos/pull/#3588)
|
||||||
|
- Fixup silent pointless comparison with zero in checked\_narrow\_cast (compiler workaround) [\#3566](https://github.com/kokkos/kokkos/pull/#3566)
|
||||||
|
- Fixes for ROCm 3.9 [\#3565](https://github.com/kokkos/kokkos/pull/#3565)
|
||||||
|
- Fix windows build issues which crept in for the CUDA build [\#3532](https://github.com/kokkos/kokkos/pull/#3532)
|
||||||
|
- HIP Fix atomics of large data types and clean up lock arrays [\#3529](https://github.com/kokkos/kokkos/pull/#3529)
|
||||||
|
- Pthreads fix exception resulting from 0 grain size [\#3510](https://github.com/kokkos/kokkos/pull/#3510)
|
||||||
|
- Fixup do not require atomic operation to be default constructible [\#3503](https://github.com/kokkos/kokkos/pull/#3503)
|
||||||
|
- Fix race condition in HIP backend [\#3467](https://github.com/kokkos/kokkos/pull/#3467)
|
||||||
|
- Replace KOKKOS\_DEBUG with KOKKOS\_ENABLE\_DEBUG [\#3458](https://github.com/kokkos/kokkos/pull/#3458)
|
||||||
|
- Fix multi-stream team scratch space definition for HIP [\#3398](https://github.com/kokkos/kokkos/pull/#3398)
|
||||||
|
- HIP fix template deduction [\#3393](https://github.com/kokkos/kokkos/pull/#3393)
|
||||||
|
- Fix compiling with HIP and C++17 [\#3390](https://github.com/kokkos/kokkos/pull/#3390)
|
||||||
|
- Fix sigFPE in HIP blocksize deduction [\#3378](https://github.com/kokkos/kokkos/pull/#3378)
|
||||||
|
- Type alias change: replace CS with CTS to avoid conflicts with NVSHMEM [\#3348](https://github.com/kokkos/kokkos/pull/#3348)
|
||||||
|
- Clang compilation of CUDA backend on Windows [\#3345](https://github.com/kokkos/kokkos/pull/#3345)
|
||||||
|
- Fix HBW support [\#3343](https://github.com/kokkos/kokkos/pull/#3343)
|
||||||
|
- Added missing fences to unique token [\#3260](https://github.com/kokkos/kokkos/pull/#3260)
|
||||||
|
|
||||||
|
**Incompatibilities:**
|
||||||
|
- Remove unused utilities (forward, move, and expand\_variadic) from Kokkos::Impl [\#3535](https://github.com/kokkos/kokkos/pull/#3535)
|
||||||
|
- Remove unused traits [\#3534](https://github.com/kokkos/kokkos/pull/#3534)
|
||||||
|
- HIP: Remove old HCC code [\#3301](https://github.com/kokkos/kokkos/pull/#3301)
|
||||||
|
- Prepare for deprecation of ViewAllocateWithoutInitializing [\#3264](https://github.com/kokkos/kokkos/pull/#3264)
|
||||||
|
- Remove ROCm backend [\#3148](https://github.com/kokkos/kokkos/pull/#3148)
|
||||||
|
|
||||||
## [3.2.01](https://github.com/kokkos/kokkos/tree/3.2.01) (2020-11-17)
|
## [3.2.01](https://github.com/kokkos/kokkos/tree/3.2.01) (2020-11-17)
|
||||||
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.2.00...3.2.01)
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.2.00...3.2.01)
|
||||||
|
|
||||||
@ -36,37 +135,31 @@
|
|||||||
- Windows Cuda support [\#3018](https://github.com/kokkos/kokkos/issues/3018)
|
- Windows Cuda support [\#3018](https://github.com/kokkos/kokkos/issues/3018)
|
||||||
- Pass `-Wext-lambda-captures-this` to NVCC when support for `__host__ __device__` lambda is enabled from CUDA 11 [\#3241](https://github.com/kokkos/kokkos/issues/3241)
|
- Pass `-Wext-lambda-captures-this` to NVCC when support for `__host__ __device__` lambda is enabled from CUDA 11 [\#3241](https://github.com/kokkos/kokkos/issues/3241)
|
||||||
- Use explicit staging buffer for constant memory kernel launches and cleanup host/device synchronization [\#3234](https://github.com/kokkos/kokkos/issues/3234)
|
- Use explicit staging buffer for constant memory kernel launches and cleanup host/device synchronization [\#3234](https://github.com/kokkos/kokkos/issues/3234)
|
||||||
- Various fixup to policies including making TeamPolicy default constructible and making RangePolicy and TeamPolicy assignable 1: [\#3202](https://github.com/kokkos/kokkos/issues/3202)
|
- Various fixup to policies including making TeamPolicy default constructible and making RangePolicy and TeamPolicy assignable: [\#3202](https://github.com/kokkos/kokkos/issues/3202) , [\#3203](https://github.com/kokkos/kokkos/issues/3203) , [\#3196](https://github.com/kokkos/kokkos/issues/3196)
|
||||||
- Various fixup to policies including making TeamPolicy default constructible and making RangePolicy and TeamPolicy assignable 2: [\#3203](https://github.com/kokkos/kokkos/issues/3203)
|
|
||||||
- Various fixup to policies including making TeamPolicy default constructible and making RangePolicy and TeamPolicy assignable 3: [\#3196](https://github.com/kokkos/kokkos/issues/3196)
|
|
||||||
- Annotations for `DefaultExectutionSpace` and `DefaultHostExectutionSpace` to use in static analysis [\#3189](https://github.com/kokkos/kokkos/issues/3189)
|
- Annotations for `DefaultExectutionSpace` and `DefaultHostExectutionSpace` to use in static analysis [\#3189](https://github.com/kokkos/kokkos/issues/3189)
|
||||||
- Add documentation on using Spack to install Kokkos and developing packages that depend on Kokkos [\#3187](https://github.com/kokkos/kokkos/issues/3187)
|
- Add documentation on using Spack to install Kokkos and developing packages that depend on Kokkos [\#3187](https://github.com/kokkos/kokkos/issues/3187)
|
||||||
- Improve support for nvcc\_wrapper with exotic host compiler [\#3186](https://github.com/kokkos/kokkos/issues/3186)
|
|
||||||
- Add OpenMPTarget backend flags for NVC++ compiler [\#3185](https://github.com/kokkos/kokkos/issues/3185)
|
- Add OpenMPTarget backend flags for NVC++ compiler [\#3185](https://github.com/kokkos/kokkos/issues/3185)
|
||||||
- Move deep\_copy/create\_mirror\_view on Experimental::OffsetView into Kokkos:: namespace [\#3166](https://github.com/kokkos/kokkos/issues/3166)
|
- Move deep\_copy/create\_mirror\_view on Experimental::OffsetView into Kokkos:: namespace [\#3166](https://github.com/kokkos/kokkos/issues/3166)
|
||||||
- Allow for larger block size in HIP [\#3165](https://github.com/kokkos/kokkos/issues/3165)
|
- Allow for larger block size in HIP [\#3165](https://github.com/kokkos/kokkos/issues/3165)
|
||||||
- View: Added names of Views to the different View initialize/free kernels [\#3159](https://github.com/kokkos/kokkos/issues/3159)
|
- View: Added names of Views to the different View initialize/free kernels [\#3159](https://github.com/kokkos/kokkos/issues/3159)
|
||||||
- Cuda: Caching cudaFunctorAttributes and whether L1/Shmem prefer was set [\#3151](https://github.com/kokkos/kokkos/issues/3151)
|
- Cuda: Caching cudaFunctorAttributes and whether L1/Shmem prefer was set [\#3151](https://github.com/kokkos/kokkos/issues/3151)
|
||||||
- BuildSystem: Provide an explicit default CMAKE\_BUILD\_TYPE [\#3131](https://github.com/kokkos/kokkos/issues/3131)
|
- BuildSystem: Improved performance in default configuration by defaulting to Release build [\#3131](https://github.com/kokkos/kokkos/issues/3131)
|
||||||
- Cuda: Update CUDA occupancy calculation [\#3124](https://github.com/kokkos/kokkos/issues/3124)
|
- Cuda: Update CUDA occupancy calculation [\#3124](https://github.com/kokkos/kokkos/issues/3124)
|
||||||
- Vector: Adding data() to Vector [\#3123](https://github.com/kokkos/kokkos/issues/3123)
|
- Vector: Adding data() to Vector [\#3123](https://github.com/kokkos/kokkos/issues/3123)
|
||||||
- BuildSystem: Add CUDA Ampere configuration support [\#3122](https://github.com/kokkos/kokkos/issues/3122)
|
- BuildSystem: Add CUDA Ampere configuration support [\#3122](https://github.com/kokkos/kokkos/issues/3122)
|
||||||
- General: Apply [[noreturn]] to Kokkos::abort when applicable [\#3106](https://github.com/kokkos/kokkos/issues/3106)
|
- General: Apply [[noreturn]] to Kokkos::abort when applicable [\#3106](https://github.com/kokkos/kokkos/issues/3106)
|
||||||
- TeamPolicy: Validate storage level argument passed to TeamPolicy::set\_scratch\_size() [\#3098](https://github.com/kokkos/kokkos/issues/3098)
|
- TeamPolicy: Validate storage level argument passed to TeamPolicy::set\_scratch\_size() [\#3098](https://github.com/kokkos/kokkos/issues/3098)
|
||||||
- nvcc\_wrapper: send --cudart to nvcc instead of host compiler [\#3092](https://github.com/kokkos/kokkos/issues/3092)
|
|
||||||
- BuildSystem: Make kokkos\_has\_string() function in Makefile.kokkos case insensitive [\#3091](https://github.com/kokkos/kokkos/issues/3091)
|
- BuildSystem: Make kokkos\_has\_string() function in Makefile.kokkos case insensitive [\#3091](https://github.com/kokkos/kokkos/issues/3091)
|
||||||
- Modify KOKKOS\_FUNCTION macro for clang-tidy analysis [\#3087](https://github.com/kokkos/kokkos/issues/3087)
|
- Modify KOKKOS\_FUNCTION macro for clang-tidy analysis [\#3087](https://github.com/kokkos/kokkos/issues/3087)
|
||||||
- Move allocation profiling to allocate/deallocate calls [\#3084](https://github.com/kokkos/kokkos/issues/3084)
|
- Move allocation profiling to allocate/deallocate calls [\#3084](https://github.com/kokkos/kokkos/issues/3084)
|
||||||
- BuildSystem: FATAL\_ERROR when attempting in-source build [\#3082](https://github.com/kokkos/kokkos/issues/3082)
|
- BuildSystem: FATAL\_ERROR when attempting in-source build [\#3082](https://github.com/kokkos/kokkos/issues/3082)
|
||||||
- Change enums in ScatterView to types [\#3076](https://github.com/kokkos/kokkos/issues/3076)
|
- Change enums in ScatterView to types [\#3076](https://github.com/kokkos/kokkos/issues/3076)
|
||||||
- HIP: Changes for new compiler/runtime [\#3067](https://github.com/kokkos/kokkos/issues/3067)
|
- HIP: Changes for new compiler/runtime [\#3067](https://github.com/kokkos/kokkos/issues/3067)
|
||||||
- Extract and use get\_gpu [\#3061](https://github.com/kokkos/kokkos/issues/3061)
|
- Extract and use get\_gpu [\#3061](https://github.com/kokkos/kokkos/issues/3061) , [\#3048](https://github.com/kokkos/kokkos/issues/3048)
|
||||||
- Extract and use get\_gpu [\#3048](https://github.com/kokkos/kokkos/issues/3048)
|
|
||||||
- Add is\_allocated to View-like containers [\#3059](https://github.com/kokkos/kokkos/issues/3059)
|
- Add is\_allocated to View-like containers [\#3059](https://github.com/kokkos/kokkos/issues/3059)
|
||||||
- Combined reducers for scalar references [\#3052](https://github.com/kokkos/kokkos/issues/3052)
|
- Combined reducers for scalar references [\#3052](https://github.com/kokkos/kokkos/issues/3052)
|
||||||
- Add configurable capacity for UniqueToken [\#3051](https://github.com/kokkos/kokkos/issues/3051)
|
- Add configurable capacity for UniqueToken [\#3051](https://github.com/kokkos/kokkos/issues/3051)
|
||||||
- Add installation testing [\#3034](https://github.com/kokkos/kokkos/issues/3034)
|
- Add installation testing [\#3034](https://github.com/kokkos/kokkos/issues/3034)
|
||||||
- BuildSystem: Add -expt-relaxed-constexpr flag to nvcc\_wrapper [\#3021](https://github.com/kokkos/kokkos/issues/3021)
|
|
||||||
- HIP: Add UniqueToken [\#3020](https://github.com/kokkos/kokkos/issues/3020)
|
- HIP: Add UniqueToken [\#3020](https://github.com/kokkos/kokkos/issues/3020)
|
||||||
- Autodetect number of devices [\#3013](https://github.com/kokkos/kokkos/issues/3013)
|
- Autodetect number of devices [\#3013](https://github.com/kokkos/kokkos/issues/3013)
|
||||||
|
|
||||||
@ -82,11 +175,13 @@
|
|||||||
- ScatterView: fix for OpenmpTarget remove inheritance from reducers [\#3162](https://github.com/kokkos/kokkos/issues/3162)
|
- ScatterView: fix for OpenmpTarget remove inheritance from reducers [\#3162](https://github.com/kokkos/kokkos/issues/3162)
|
||||||
- BuildSystem: Set OpenMP flags according to host compiler [\#3127](https://github.com/kokkos/kokkos/issues/3127)
|
- BuildSystem: Set OpenMP flags according to host compiler [\#3127](https://github.com/kokkos/kokkos/issues/3127)
|
||||||
- OpenMP: Fix logic for nested omp in partition\_master bug [\#3101](https://github.com/kokkos/kokkos/issues/3101)
|
- OpenMP: Fix logic for nested omp in partition\_master bug [\#3101](https://github.com/kokkos/kokkos/issues/3101)
|
||||||
|
- nvcc\_wrapper: send --cudart to nvcc instead of host compiler [\#3092](https://github.com/kokkos/kokkos/issues/3092)
|
||||||
- BuildSystem: Fixes for Cuda/11 and c++17 [\#3085](https://github.com/kokkos/kokkos/issues/3085)
|
- BuildSystem: Fixes for Cuda/11 and c++17 [\#3085](https://github.com/kokkos/kokkos/issues/3085)
|
||||||
- HIP: Fix print\_configuration [\#3080](https://github.com/kokkos/kokkos/issues/3080)
|
- HIP: Fix print\_configuration [\#3080](https://github.com/kokkos/kokkos/issues/3080)
|
||||||
- Conditionally define get\_gpu [\#3072](https://github.com/kokkos/kokkos/issues/3072)
|
- Conditionally define get\_gpu [\#3072](https://github.com/kokkos/kokkos/issues/3072)
|
||||||
- Fix bounds for ranges in random number generator [\#3069](https://github.com/kokkos/kokkos/issues/3069)
|
- Fix bounds for ranges in random number generator [\#3069](https://github.com/kokkos/kokkos/issues/3069)
|
||||||
- Fix Cuda minor arch check [\#3035](https://github.com/kokkos/kokkos/issues/3035)
|
- Fix Cuda minor arch check [\#3035](https://github.com/kokkos/kokkos/issues/3035)
|
||||||
|
- BuildSystem: Add -expt-relaxed-constexpr flag to nvcc\_wrapper [\#3021](https://github.com/kokkos/kokkos/issues/3021)
|
||||||
|
|
||||||
**Incompatibilities:**
|
**Incompatibilities:**
|
||||||
|
|
||||||
|
|||||||
@ -111,8 +111,8 @@ ENDIF()
|
|||||||
|
|
||||||
|
|
||||||
set(Kokkos_VERSION_MAJOR 3)
|
set(Kokkos_VERSION_MAJOR 3)
|
||||||
set(Kokkos_VERSION_MINOR 2)
|
set(Kokkos_VERSION_MINOR 3)
|
||||||
set(Kokkos_VERSION_PATCH 1)
|
set(Kokkos_VERSION_PATCH 0)
|
||||||
set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
|
set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
|
||||||
math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
|
math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
|
||||||
|
|
||||||
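The `math(EXPR ...)` line in the hunk above packs the three version components into a single integer. A minimal shell sketch of the same arithmetic follows; the function name `kokkos_version` is made up for this example.

```bash
# Hypothetical helper (not part of Kokkos): encode major/minor/patch with the
# same formula as the math(EXPR ...) line: major*10000 + minor*100 + patch.
kokkos_version() {
  local major=$1 minor=$2 patch=$3
  echo $(( major * 10000 + minor * 100 + patch ))
}

kokkos_version 3 2 1   # version before this update: prints 30201
kokkos_version 3 3 0   # version after this update:  prints 30300
```

Because each component gets two decimal digits, the encoded values compare in release order as plain integers as long as minor and patch stay below 100.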
@ -139,13 +139,15 @@ ENDIF()
|
|||||||
# I really wish these were regular variables
|
# I really wish these were regular variables
|
||||||
# but scoping issues can make it difficult
|
# but scoping issues can make it difficult
|
||||||
GLOBAL_SET(KOKKOS_COMPILE_OPTIONS)
|
GLOBAL_SET(KOKKOS_COMPILE_OPTIONS)
|
||||||
GLOBAL_SET(KOKKOS_LINK_OPTIONS)
|
GLOBAL_SET(KOKKOS_LINK_OPTIONS -DKOKKOS_DEPENDENCE)
|
||||||
GLOBAL_SET(KOKKOS_CUDA_OPTIONS)
|
GLOBAL_SET(KOKKOS_CUDA_OPTIONS)
|
||||||
GLOBAL_SET(KOKKOS_CUDAFE_OPTIONS)
|
GLOBAL_SET(KOKKOS_CUDAFE_OPTIONS)
|
||||||
GLOBAL_SET(KOKKOS_XCOMPILER_OPTIONS)
|
GLOBAL_SET(KOKKOS_XCOMPILER_OPTIONS)
|
||||||
# We need to append text here for making sure TPLs
|
# We need to append text here for making sure TPLs
|
||||||
# we import are available for an installed Kokkos
|
# we import are available for an installed Kokkos
|
||||||
GLOBAL_SET(KOKKOS_TPL_EXPORTS)
|
GLOBAL_SET(KOKKOS_TPL_EXPORTS)
|
||||||
|
# this could probably be scoped to project
|
||||||
|
GLOBAL_SET(KOKKOS_COMPILE_DEFINITIONS KOKKOS_DEPENDENCE)
|
||||||
|
|
||||||
# Include a set of Kokkos-specific wrapper functions that
|
# Include a set of Kokkos-specific wrapper functions that
|
||||||
# will either call raw CMake or TriBITS
|
# will either call raw CMake or TriBITS
|
||||||
@ -191,8 +193,6 @@ ELSE()
|
|||||||
SET(KOKKOS_IS_SUBDIRECTORY FALSE)
|
SET(KOKKOS_IS_SUBDIRECTORY FALSE)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#------------------------------------------------------------------------------
|
#------------------------------------------------------------------------------
|
||||||
#
|
#
|
||||||
# A) Forward declare the package so that certain options are also defined for
|
# A) Forward declare the package so that certain options are also defined for
|
||||||
@ -253,9 +253,7 @@ KOKKOS_PROCESS_SUBPACKAGES()
|
|||||||
KOKKOS_PACKAGE_DEF()
|
KOKKOS_PACKAGE_DEF()
|
||||||
KOKKOS_EXCLUDE_AUTOTOOLS_FILES()
|
KOKKOS_EXCLUDE_AUTOTOOLS_FILES()
|
||||||
KOKKOS_PACKAGE_POSTPROCESS()
|
KOKKOS_PACKAGE_POSTPROCESS()
|
||||||
|
KOKKOS_CONFIGURE_CORE()
|
||||||
#We are ready to configure the header
|
|
||||||
CONFIGURE_FILE(cmake/KokkosCore_config.h.in KokkosCore_config.h @ONLY)
|
|
||||||
|
|
||||||
IF (NOT KOKKOS_HAS_TRILINOS AND NOT Kokkos_INSTALL_TESTING)
|
IF (NOT KOKKOS_HAS_TRILINOS AND NOT Kokkos_INSTALL_TESTING)
|
||||||
ADD_LIBRARY(kokkos INTERFACE)
|
ADD_LIBRARY(kokkos INTERFACE)
|
||||||
@ -272,7 +270,10 @@ INCLUDE(${KOKKOS_SRC_PATH}/cmake/kokkos_install.cmake)
|
|||||||
# executables also need nvcc_wrapper. Thus, we need to install it.
|
# executables also need nvcc_wrapper. Thus, we need to install it.
|
||||||
# If the argument of DESTINATION is a relative path, CMake computes it
|
# If the argument of DESTINATION is a relative path, CMake computes it
|
||||||
# as relative to ${CMAKE_INSTALL_PATH}.
|
# as relative to ${CMAKE_INSTALL_PATH}.
|
||||||
INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION ${CMAKE_INSTALL_BINDIR})
|
# KOKKOS_INSTALL_ADDITIONAL_FILES will install nvcc wrapper and other generated
|
||||||
|
# files
|
||||||
|
KOKKOS_INSTALL_ADDITIONAL_FILES()
|
||||||
|
|
||||||
|
|
||||||
# Finally - if we are a subproject - make sure the enabled devices are visible
|
# Finally - if we are a subproject - make sure the enabled devices are visible
|
||||||
IF (HAS_PARENT)
|
IF (HAS_PARENT)
|
||||||
|
|||||||
@ -11,27 +11,27 @@ CXXFLAGS += $(SHFLAGS)
|
|||||||
endif
|
endif
|
||||||
|
|
||||||
KOKKOS_VERSION_MAJOR = 3
|
KOKKOS_VERSION_MAJOR = 3
|
||||||
KOKKOS_VERSION_MINOR = 2
|
KOKKOS_VERSION_MINOR = 3
|
||||||
KOKKOS_VERSION_PATCH = 1
|
KOKKOS_VERSION_PATCH = 0
|
||||||
KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
|
KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
|
||||||
|
|
||||||
# Options: Cuda,HIP,ROCm,OpenMP,Pthread,Serial
|
# Options: Cuda,HIP,OpenMP,Pthread,Serial
|
||||||
KOKKOS_DEVICES ?= "OpenMP"
|
KOKKOS_DEVICES ?= "OpenMP"
|
||||||
#KOKKOS_DEVICES ?= "Pthread"
|
#KOKKOS_DEVICES ?= "Pthread"
|
||||||
# Options:
|
# Options:
|
||||||
# Intel: KNC,KNL,SNB,HSW,BDW,SKX
|
# Intel: KNC,KNL,SNB,HSW,BDW,SKX
|
||||||
# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80
|
# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80
|
||||||
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2
|
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
|
||||||
# IBM: BGQ,Power7,Power8,Power9
|
# IBM: BGQ,Power7,Power8,Power9
|
||||||
# AMD-GPUS: Vega900,Vega906
|
# AMD-GPUS: Vega900,Vega906,Vega908
|
||||||
# AMD-CPUS: AMDAVX,Zen,Zen2
|
# AMD-CPUS: AMDAVX,Zen,Zen2
|
||||||
KOKKOS_ARCH ?= ""
|
KOKKOS_ARCH ?= ""
|
||||||
# Options: yes,no
|
# Options: yes,no
|
||||||
KOKKOS_DEBUG ?= "no"
|
KOKKOS_DEBUG ?= "no"
|
||||||
# Options: hwloc,librt,experimental_memkind
|
# Options: hwloc,librt,experimental_memkind
|
||||||
KOKKOS_USE_TPLS ?= ""
|
KOKKOS_USE_TPLS ?= ""
|
||||||
# Options: c++11,c++14,c++1y,c++17,c++1z,c++2a
|
# Options: c++14,c++1y,c++17,c++1z,c++2a
|
||||||
KOKKOS_CXX_STANDARD ?= "c++11"
|
KOKKOS_CXX_STANDARD ?= "c++14"
|
||||||
# Options: aggressive_vectorization,disable_profiling,enable_large_mem_tests,disable_complex_align
|
# Options: aggressive_vectorization,disable_profiling,enable_large_mem_tests,disable_complex_align
|
||||||
KOKKOS_OPTIONS ?= ""
|
KOKKOS_OPTIONS ?= ""
|
||||||
KOKKOS_CMAKE ?= "no"
|
KOKKOS_CMAKE ?= "no"
|
||||||
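The settings in the hunk above are assigned with `?=`, so a value supplied by the caller takes precedence over the default. The sketch below mimics that with the shell's `${VAR-default}` expansion and prints the resulting `make` arguments. The function name `kokkos_make_args` is an assumption for illustration; only the variable names and default values come from the Makefile.

```bash
# Hypothetical helper (not part of Kokkos): print make variable assignments,
# using the new Makefile defaults unless the caller already set a value.
# ${VAR-default} substitutes only when VAR is unset, similar to Make's ?=.
kokkos_make_args() {
  echo "KOKKOS_DEVICES=${KOKKOS_DEVICES-OpenMP}" \
       "KOKKOS_ARCH=${KOKKOS_ARCH-}" \
       "KOKKOS_CXX_STANDARD=${KOKKOS_CXX_STANDARD-c++14}"
}

# Select the newly listed A64FX architecture and keep the other defaults.
# The command substitution runs in a subshell, so the override does not leak.
args=$(KOKKOS_ARCH=A64FX kokkos_make_args)
echo "make $args"
```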
@ -66,7 +66,6 @@ kokkos_path_exists=$(if $(wildcard $1),1,0)
|
|||||||
# Check for general settings
|
# Check for general settings
|
||||||
|
|
||||||
KOKKOS_INTERNAL_ENABLE_DEBUG := $(call kokkos_has_string,$(KOKKOS_DEBUG),yes)
|
KOKKOS_INTERNAL_ENABLE_DEBUG := $(call kokkos_has_string,$(KOKKOS_DEBUG),yes)
|
||||||
KOKKOS_INTERNAL_ENABLE_CXX11 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++11)
|
|
||||||
KOKKOS_INTERNAL_ENABLE_CXX14 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++14)
|
KOKKOS_INTERNAL_ENABLE_CXX14 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++14)
|
||||||
KOKKOS_INTERNAL_ENABLE_CXX1Y := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++1y)
|
KOKKOS_INTERNAL_ENABLE_CXX1Y := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++1y)
|
||||||
KOKKOS_INTERNAL_ENABLE_CXX17 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++17)
|
KOKKOS_INTERNAL_ENABLE_CXX17 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++17)
|
||||||
@ -279,14 +278,12 @@ else
|
|||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
# Set C++11 flags.
|
# Set C++ version flags.
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
|
|
||||||
KOKKOS_INTERNAL_CXX14_FLAG := --c++14
|
KOKKOS_INTERNAL_CXX14_FLAG := --c++14
|
||||||
KOKKOS_INTERNAL_CXX17_FLAG := --c++17
|
KOKKOS_INTERNAL_CXX17_FLAG := --c++17
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG := -std=c++11
|
|
||||||
KOKKOS_INTERNAL_CXX14_FLAG := -std=c++14
|
KOKKOS_INTERNAL_CXX14_FLAG := -std=c++14
|
||||||
KOKKOS_INTERNAL_CXX1Y_FLAG := -std=c++1y
|
KOKKOS_INTERNAL_CXX1Y_FLAG := -std=c++1y
|
||||||
#KOKKOS_INTERNAL_CXX17_FLAG := -std=c++17
|
#KOKKOS_INTERNAL_CXX17_FLAG := -std=c++17
|
||||||
@ -294,23 +291,17 @@ else
|
|||||||
#KOKKOS_INTERNAL_CXX2A_FLAG := -std=c++2a
|
#KOKKOS_INTERNAL_CXX2A_FLAG := -std=c++2a
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
|
|
||||||
KOKKOS_INTERNAL_CXX14_FLAG := -hstd=c++14
|
KOKKOS_INTERNAL_CXX14_FLAG := -hstd=c++14
|
||||||
#KOKKOS_INTERNAL_CXX1Y_FLAG := -hstd=c++1y
|
#KOKKOS_INTERNAL_CXX1Y_FLAG := -hstd=c++1y
|
||||||
#KOKKOS_INTERNAL_CXX17_FLAG := -hstd=c++17
|
#KOKKOS_INTERNAL_CXX17_FLAG := -hstd=c++17
|
||||||
#KOKKOS_INTERNAL_CXX1Z_FLAG := -hstd=c++1z
|
#KOKKOS_INTERNAL_CXX1Z_FLAG := -hstd=c++1z
|
||||||
#KOKKOS_INTERNAL_CXX2A_FLAG := -hstd=c++2a
|
#KOKKOS_INTERNAL_CXX2A_FLAG := -hstd=c++2a
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_HCC), 1)
|
KOKKOS_INTERNAL_CXX14_FLAG := --std=c++14
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG :=
|
KOKKOS_INTERNAL_CXX1Y_FLAG := --std=c++1y
|
||||||
else
|
KOKKOS_INTERNAL_CXX17_FLAG := --std=c++17
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
|
KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
|
||||||
KOKKOS_INTERNAL_CXX14_FLAG := --std=c++14
|
KOKKOS_INTERNAL_CXX2A_FLAG := --std=c++2a
|
||||||
KOKKOS_INTERNAL_CXX1Y_FLAG := --std=c++1y
|
|
||||||
KOKKOS_INTERNAL_CXX17_FLAG := --std=c++17
|
|
||||||
KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
|
|
||||||
KOKKOS_INTERNAL_CXX2A_FLAG := --std=c++2a
|
|
||||||
endif
|
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
@ -377,7 +368,8 @@ KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8
|
|||||||
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv81)
|
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv81)
|
||||||
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-ThunderX)
|
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-ThunderX)
|
||||||
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-TX2)
|
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-TX2)
|
||||||
KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2) | bc))
|
KOKKOS_INTERNAL_USE_ARCH_A64FX := $(call kokkos_has_string,$(KOKKOS_ARCH),A64FX)
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2)+$(KOKKOS_INTERNAL_USE_ARCH_A64FX) | bc))
|
||||||
|
|
||||||
# IBM based.
|
# IBM based.
|
||||||
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(call kokkos_has_string,$(KOKKOS_ARCH),BGQ)
|
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(call kokkos_has_string,$(KOKKOS_ARCH),BGQ)
|
||||||
@ -392,6 +384,7 @@ KOKKOS_INTERNAL_USE_ARCH_ZEN2 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen2)
|
|||||||
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
|
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
|
||||||
KOKKOS_INTERNAL_USE_ARCH_VEGA900 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega900)
|
KOKKOS_INTERNAL_USE_ARCH_VEGA900 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega900)
|
||||||
KOKKOS_INTERNAL_USE_ARCH_VEGA906 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega906)
|
KOKKOS_INTERNAL_USE_ARCH_VEGA906 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega906)
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_VEGA908 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega908)
|
||||||
|
|
||||||
# Any AVX?
|
# Any AVX?
|
||||||
KOKKOS_INTERNAL_USE_ARCH_SSE42 := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_WSM))
|
KOKKOS_INTERNAL_USE_ARCH_SSE42 := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_WSM))
|
||||||
@ -459,7 +452,6 @@ H := \#
|
|||||||
# Do not append first line
|
# Do not append first line
|
||||||
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
|
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
|
||||||
tmp := $(call kokkos_append_header,"Makefile constructed configuration:")
|
tmp := $(call kokkos_append_header,"Makefile constructed configuration:")
|
||||||
tmp := $(call kokkos_append_header,"$(shell date)")
|
|
||||||
tmp := $(call kokkos_append_header,"----------------------------------------------*/")
|
tmp := $(call kokkos_append_header,"----------------------------------------------*/")
|
||||||
|
|
||||||
tmp := $(call kokkos_append_header,'$H''if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)')
|
tmp := $(call kokkos_append_header,'$H''if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)')
|
||||||
@ -479,10 +471,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|||||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_COMPILER_CUDA_VERSION $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION)")
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_COMPILER_CUDA_VERSION $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION)")
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
|
|
||||||
tmp := $(call kokkos_append_header,'$H''define KOKKOS_ENABLE_ROCM')
|
|
||||||
tmp := $(call kokkos_append_header,'$H''define KOKKOS_IMPL_ROCM_CLANG_WORKAROUND 1')
|
|
||||||
endif
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
|
||||||
tmp := $(call kokkos_append_header,'$H''define KOKKOS_ENABLE_HIP')
|
tmp := $(call kokkos_append_header,'$H''define KOKKOS_ENABLE_HIP')
|
||||||
endif
|
endif
|
||||||
@ -542,12 +530,6 @@ endif
|
|||||||
|
|
||||||
#only add the c++ standard flags if this is not CMake
|
#only add the c++ standard flags if this is not CMake
|
||||||
tmp := $(call kokkos_append_header,"/* General Settings */")
|
tmp := $(call kokkos_append_header,"/* General Settings */")
|
||||||
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
|
|
||||||
ifneq ($(KOKKOS_STANDALONE_CMAKE), yes)
|
|
||||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
|
|
||||||
endif
|
|
||||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ENABLE_CXX11")
|
|
||||||
endif
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX14), 1)
|
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX14), 1)
|
||||||
ifneq ($(KOKKOS_STANDALONE_CMAKE), yes)
|
ifneq ($(KOKKOS_STANDALONE_CMAKE), yes)
|
||||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX14_FLAG)
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX14_FLAG)
|
||||||
@ -765,6 +747,13 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
|
|||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_A64FX), 1)
|
||||||
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_A64FX")
|
||||||
|
|
||||||
|
KOKKOS_CXXFLAGS += -march=armv8.2-a+sve
|
||||||
|
KOKKOS_LDFLAGS += -march=armv8.2-a+sve
|
||||||
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN), 1)
|
||||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_ZEN")
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_ZEN")
|
||||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_AVX2")
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_AVX2")
|
||||||
@ -1143,6 +1132,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
|
|||||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA906")
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA906")
|
||||||
KOKKOS_INTERNAL_HIP_ARCH_FLAG := --amdgpu-target=gfx906
|
KOKKOS_INTERNAL_HIP_ARCH_FLAG := --amdgpu-target=gfx906
|
||||||
endif
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_VEGA908), 1)
|
||||||
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_HIP 908")
|
||||||
|
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA908")
|
||||||
|
KOKKOS_INTERNAL_HIP_ARCH_FLAG := --amdgpu-target=gfx908
|
||||||
|
endif
|
||||||
|
|
||||||
|
|
||||||
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/HIP/*.cpp)
|
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/HIP/*.cpp)
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/HIP/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/HIP/*.hpp)
|
||||||
@ -1173,6 +1168,55 @@ ifneq ($(KOKKOS_INTERNAL_NEW_CONFIG), 0)
|
|||||||
tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
|
tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
# Functions for generating config header file
|
||||||
|
kokkos_start_config_header = $(shell sed 's~@INCLUDE_NEXT_FILE@~~g' $(KOKKOS_PATH)/cmake/KokkosCore_Config_HeaderSet.in > $1)
|
||||||
|
kokkos_update_config_header = $(shell sed 's~@HEADER_GUARD_TAG@~$1~g' $2 > $3)
|
||||||
|
kokkos_append_config_header = $(shell echo $1 >> $2)
|
||||||
|
tmp := $(call kokkos_start_config_header, "KokkosCore_Config_FwdBackend.tmp")
|
||||||
|
tmp := $(call kokkos_start_config_header, "KokkosCore_Config_SetupBackend.tmp")
|
||||||
|
tmp := $(call kokkos_start_config_header, "KokkosCore_Config_DeclareBackend.tmp")
|
||||||
|
tmp := $(call kokkos_start_config_header, "KokkosCore_Config_PostInclude.tmp")
|
||||||
|
tmp := $(call kokkos_update_config_header, KOKKOS_FWD_HPP_, "KokkosCore_Config_FwdBackend.tmp", "KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_update_config_header, KOKKOS_SETUP_HPP_, "KokkosCore_Config_SetupBackend.tmp", "KokkosCore_Config_SetupBackend.hpp")
|
||||||
|
tmp := $(call kokkos_update_config_header, KOKKOS_DECLARE_HPP_, "KokkosCore_Config_DeclareBackend.tmp", "KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
tmp := $(call kokkos_update_config_header, KOKKOS_POST_INCLUDE_HPP_, "KokkosCore_Config_PostInclude.tmp", "KokkosCore_Config_PostInclude.hpp")
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_CUDA.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_CUDA.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <setup/Kokkos_Setup_Cuda.hpp>","KokkosCore_Config_SetupBackend.hpp")
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
|
||||||
|
else
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_OPENMPTARGET.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_OPENMPTARGET.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HIP.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HIP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <setup/Kokkos_Setup_HIP.hpp>","KokkosCore_Config_SetupBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_OPENMP.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_OPENMP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_THREADS.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_THREADS.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_HPX), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HPX.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HPX.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_SERIAL.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_SERIAL.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HBWSpace.hpp>","KokkosCore_Config_FwdBackend.hpp")
|
||||||
|
tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HBWSpace.hpp>","KokkosCore_Config_DeclareBackend.hpp")
|
||||||
|
endif
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
|
||||||
@ -1290,7 +1334,7 @@ ifneq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
|
|||||||
endif
|
endif
|
||||||
|
|
||||||
# With Cygwin functions such as fdopen and fileno are not defined
|
# With Cygwin functions such as fdopen and fileno are not defined
|
||||||
# when strict ansi is enabled. strict ansi gets enabled with --std=c++11
|
# when strict ansi is enabled. strict ansi gets enabled with --std=c++14
|
||||||
# though. So we hard undefine it here. Not sure if that has any bad side effects
|
# though. So we hard undefine it here. Not sure if that has any bad side effects
|
||||||
# This is needed for gtest actually, not for Kokkos itself!
|
# This is needed for gtest actually, not for Kokkos itself!
|
||||||
ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
|
ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
|
||||||
@ -1313,7 +1357,9 @@ KOKKOS_OBJ_LINK = $(notdir $(KOKKOS_OBJ))
|
|||||||
include $(KOKKOS_PATH)/Makefile.targets
|
include $(KOKKOS_PATH)/Makefile.targets
|
||||||
|
|
||||||
kokkos-clean:
|
kokkos-clean:
|
||||||
rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
|
rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a KokkosCore_Config_SetupBackend.hpp \
|
||||||
|
KokkosCore_Config_FwdBackend.hpp KokkosCore_Config_DeclareBackend.hpp KokkosCore_Config_DeclareBackend.tmp \
|
||||||
|
KokkosCore_Config_FwdBackend.tmp KokkosCore_Config_PostInclude.hpp KokkosCore_Config_PostInclude.tmp KokkosCore_Config_SetupBackend.tmp
|
||||||
|
|
||||||
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
|
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
|
||||||
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
|
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
|
||||||
|
|||||||
@ -53,23 +53,10 @@ Kokkos_HIP_Space.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP
|
|||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Space.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Space.cpp
|
||||||
Kokkos_HIP_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Instance.cpp
|
Kokkos_HIP_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Instance.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Instance.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Instance.cpp
|
||||||
Kokkos_HIP_KernelLaunch.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_KernelLaunch.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_KernelLaunch.cpp
|
|
||||||
Kokkos_HIP_Locks.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Locks.cpp
|
Kokkos_HIP_Locks.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Locks.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Locks.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/HIP/Kokkos_HIP_Locks.cpp
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
|
|
||||||
Kokkos_ROCm_Exec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Exec.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Exec.cpp
|
|
||||||
Kokkos_ROCm_Space.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Space.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Space.cpp
|
|
||||||
Kokkos_ROCm_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Task.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Task.cpp
|
|
||||||
Kokkos_ROCm_Impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Impl.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/ROCm/Kokkos_ROCm_Impl.cpp
|
|
||||||
endif
|
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
||||||
Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
||||||
|
|||||||
@ -54,24 +54,16 @@ For specifics see the LICENSE file contained in the repository or distribution.
|
|||||||
# Requirements
|
# Requirements
|
||||||
|
|
||||||
### Primary tested compilers on X86 are:
|
### Primary tested compilers on X86 are:
|
||||||
* GCC 4.8.4
|
* GCC 5.3.0
|
||||||
* GCC 4.9.3
|
|
||||||
* GCC 5.1.0
|
|
||||||
* GCC 5.4.0
|
* GCC 5.4.0
|
||||||
* GCC 5.5.0
|
* GCC 5.5.0
|
||||||
* GCC 6.1.0
|
* GCC 6.1.0
|
||||||
* GCC 7.2.0
|
* GCC 7.2.0
|
||||||
* GCC 7.3.0
|
* GCC 7.3.0
|
||||||
* GCC 8.1.0
|
* GCC 8.1.0
|
||||||
* Intel 15.0.2
|
|
||||||
* Intel 16.0.1
|
|
||||||
* Intel 17.0.1
|
* Intel 17.0.1
|
||||||
* Intel 17.4.196
|
* Intel 17.4.196
|
||||||
* Intel 18.2.128
|
* Intel 18.2.128
|
||||||
* Clang 3.6.1
|
|
||||||
* Clang 3.7.1
|
|
||||||
* Clang 3.8.1
|
|
||||||
* Clang 3.9.0
|
|
||||||
* Clang 4.0.0
|
* Clang 4.0.0
|
||||||
* Clang 6.0.0 for CUDA (CUDA Toolkit 9.0)
|
* Clang 6.0.0 for CUDA (CUDA Toolkit 9.0)
|
||||||
* Clang 7.0.0 for CUDA (CUDA Toolkit 9.1)
|
* Clang 7.0.0 for CUDA (CUDA Toolkit 9.1)
|
||||||
@ -81,6 +73,7 @@ For specifics see the LICENSE file contained in the repository or distribution.
|
|||||||
* NVCC 9.2 for CUDA (with gcc 7.2.0)
|
* NVCC 9.2 for CUDA (with gcc 7.2.0)
|
||||||
* NVCC 10.0 for CUDA (with gcc 7.4.0)
|
* NVCC 10.0 for CUDA (with gcc 7.4.0)
|
||||||
* NVCC 10.1 for CUDA (with gcc 7.4.0)
|
* NVCC 10.1 for CUDA (with gcc 7.4.0)
|
||||||
|
* NVCC 11.0 for CUDA (with gcc 8.4.0)
|
||||||
|
|
||||||
### Primary tested compilers on Power 8 are:
|
### Primary tested compilers on Power 8 are:
|
||||||
* GCC 6.4.0 (OpenMP,Serial)
|
* GCC 6.4.0 (OpenMP,Serial)
|
||||||
@ -89,9 +82,8 @@ For specifics see the LICENSE file contained in the repository or distribution.
|
|||||||
* NVCC 9.2.88 for CUDA (with gcc 7.2.0 and XL 16.1.0)
|
* NVCC 9.2.88 for CUDA (with gcc 7.2.0 and XL 16.1.0)
|
||||||
|
|
||||||
### Primary tested compilers on Intel KNL are:
|
### Primary tested compilers on Intel KNL are:
|
||||||
* Intel 16.4.258 (with gcc 4.7.2)
|
* Intel 17.2.174 (with gcc 6.2.0 and 6.4.0)
|
||||||
* Intel 17.2.174 (with gcc 4.9.3)
|
* Intel 18.2.199 (with gcc 6.2.0 and 6.4.0)
|
||||||
* Intel 18.2.199 (with gcc 4.9.3)
|
|
||||||
|
|
||||||
### Primary tested compilers on ARM (Cavium ThunderX2)
|
### Primary tested compilers on ARM (Cavium ThunderX2)
|
||||||
* GCC 7.2.0
|
* GCC 7.2.0
|
||||||
|
|||||||
@ -806,7 +806,7 @@ class Random_XorShift64 {
|
|||||||
const double V = 2.0 * drand() - 1.0;
|
const double V = 2.0 * drand() - 1.0;
|
||||||
S = U * U + V * V;
|
S = U * U + V * V;
|
||||||
}
|
}
|
||||||
return U * std::sqrt(-2.0 * log(S) / S);
|
return U * std::sqrt(-2.0 * std::log(S) / S);
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
@ -1042,7 +1042,7 @@ class Random_XorShift1024 {
|
|||||||
const double V = 2.0 * drand() - 1.0;
|
const double V = 2.0 * drand() - 1.0;
|
||||||
S = U * U + V * V;
|
S = U * U + V * V;
|
||||||
}
|
}
|
||||||
return U * std::sqrt(-2.0 * log(S) / S);
|
return U * std::sqrt(-2.0 * std::log(S) / S);
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
|||||||
@ -222,12 +222,12 @@ class BinSort {
|
|||||||
"Kokkos::SortImpl::BinSortFunctor::bin_count", bin_op.max_bins());
|
"Kokkos::SortImpl::BinSortFunctor::bin_count", bin_op.max_bins());
|
||||||
bin_count_const = bin_count_atomic;
|
bin_count_const = bin_count_atomic;
|
||||||
bin_offsets =
|
bin_offsets =
|
||||||
offset_type(ViewAllocateWithoutInitializing(
|
offset_type(view_alloc(WithoutInitializing,
|
||||||
"Kokkos::SortImpl::BinSortFunctor::bin_offsets"),
|
"Kokkos::SortImpl::BinSortFunctor::bin_offsets"),
|
||||||
bin_op.max_bins());
|
bin_op.max_bins());
|
||||||
sort_order =
|
sort_order =
|
||||||
offset_type(ViewAllocateWithoutInitializing(
|
offset_type(view_alloc(WithoutInitializing,
|
||||||
"Kokkos::SortImpl::BinSortFunctor::sort_order"),
|
"Kokkos::SortImpl::BinSortFunctor::sort_order"),
|
||||||
range_end - range_begin);
|
range_end - range_begin);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -279,8 +279,8 @@ class BinSort {
|
|||||||
}
|
}
|
||||||
|
|
||||||
scratch_view_type sorted_values(
|
scratch_view_type sorted_values(
|
||||||
ViewAllocateWithoutInitializing(
|
view_alloc(WithoutInitializing,
|
||||||
"Kokkos::SortImpl::BinSortFunctor::sorted_values"),
|
"Kokkos::SortImpl::BinSortFunctor::sorted_values"),
|
||||||
values.rank_dynamic > 0 ? len : KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
values.rank_dynamic > 0 ? len : KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
||||||
values.rank_dynamic > 1 ? values.extent(1)
|
values.rank_dynamic > 1 ? values.extent(1)
|
||||||
: KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
: KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
||||||
|
|||||||
@ -24,7 +24,7 @@ KOKKOS_ADD_TEST_LIBRARY(
|
|||||||
# avoid deprecation warnings from MSVC
|
# avoid deprecation warnings from MSVC
|
||||||
TARGET_COMPILE_DEFINITIONS(kokkosalgorithms_gtest PUBLIC GTEST_HAS_TR1_TUPLE=0 GTEST_HAS_PTHREAD=0)
|
TARGET_COMPILE_DEFINITIONS(kokkosalgorithms_gtest PUBLIC GTEST_HAS_TR1_TUPLE=0 GTEST_HAS_PTHREAD=0)
|
||||||
|
|
||||||
IF(NOT (Kokkos_ENABLE_CUDA AND WIN32))
|
IF((NOT (Kokkos_ENABLE_CUDA AND WIN32)) AND (NOT ("${KOKKOS_CXX_COMPILER_ID}" STREQUAL "Fujitsu")))
|
||||||
TARGET_COMPILE_FEATURES(kokkosalgorithms_gtest PUBLIC cxx_std_11)
|
TARGET_COMPILE_FEATURES(kokkosalgorithms_gtest PUBLIC cxx_std_11)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
|||||||
@ -31,10 +31,10 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|||||||
TEST_TARGETS += test-cuda
|
TEST_TARGETS += test-cuda
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
|
||||||
OBJ_ROCM = TestROCm.o UnitTestMain.o gtest-all.o
|
OBJ_HIP = TestHIP.o UnitTestMain.o gtest-all.o
|
||||||
TARGETS += KokkosAlgorithms_UnitTest_ROCm
|
TARGETS += KokkosAlgorithms_UnitTest_HIP
|
||||||
TEST_TARGETS += test-rocm
|
TEST_TARGETS += test-hip
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
||||||
@ -64,8 +64,8 @@ endif
|
|||||||
KokkosAlgorithms_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
|
KokkosAlgorithms_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Cuda
|
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Cuda
|
||||||
|
|
||||||
KokkosAlgorithms_UnitTest_ROCm: $(OBJ_ROCM) $(KOKKOS_LINK_DEPENDS)
|
KokkosAlgorithms_UnitTest_HIP: $(OBJ_HIP) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(EXTRA_PATH) $(OBJ_ROCM) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_ROCm
|
$(LINK) $(EXTRA_PATH) $(OBJ_HIP) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_HIP
|
||||||
|
|
||||||
KokkosAlgorithms_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
|
KokkosAlgorithms_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Threads
|
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Threads
|
||||||
@ -82,8 +82,8 @@ KokkosAlgorithms_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
|
|||||||
test-cuda: KokkosAlgorithms_UnitTest_Cuda
|
test-cuda: KokkosAlgorithms_UnitTest_Cuda
|
||||||
./KokkosAlgorithms_UnitTest_Cuda
|
./KokkosAlgorithms_UnitTest_Cuda
|
||||||
|
|
||||||
test-rocm: KokkosAlgorithms_UnitTest_ROCm
|
test-hip: KokkosAlgorithms_UnitTest_HIP
|
||||||
./KokkosAlgorithms_UnitTest_ROCm
|
./KokkosAlgorithms_UnitTest_HIP
|
||||||
|
|
||||||
test-threads: KokkosAlgorithms_UnitTest_Threads
|
test-threads: KokkosAlgorithms_UnitTest_Threads
|
||||||
./KokkosAlgorithms_UnitTest_Threads
|
./KokkosAlgorithms_UnitTest_Threads
|
||||||
|
|||||||
@ -1,31 +1,38 @@
|
|||||||
KOKKOS_PATH = ${HOME}/kokkos
|
KOKKOS_DEVICES=Cuda
|
||||||
KOKKOS_DEVICES = "OpenMP"
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
KOKKOS_ARCH = "SNB"
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
EXE_NAME = "test"
|
|
||||||
|
|
||||||
SRC = $(wildcard *.cpp)
|
|
||||||
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
|
|
||||||
|
ifndef KOKKOS_PATH
|
||||||
|
KOKKOS_PATH = $(MAKEFILE_PATH)../..
|
||||||
|
endif
|
||||||
|
|
||||||
|
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
|
||||||
|
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
|
||||||
|
|
||||||
|
vpath %.cpp $(sort $(dir $(SRC)))
|
||||||
|
|
||||||
default: build
|
default: build
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
|
|
||||||
|
|
||||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||||
EXE = ${EXE_NAME}.cuda
|
EXE = atomic_perf.cuda
|
||||||
KOKKOS_CUDA_OPTIONS = "enable_lambda"
|
|
||||||
else
|
else
|
||||||
CXX = g++
|
CXX = g++
|
||||||
EXE = ${EXE_NAME}.host
|
EXE = atomic_perf.exe
|
||||||
endif
|
endif
|
||||||
|
|
||||||
CXXFLAGS = -O3
|
CXXFLAGS ?= -O3 -g
|
||||||
|
override CXXFLAGS += -I$(MAKEFILE_PATH)
|
||||||
LINK = ${CXX}
|
|
||||||
LINKFLAGS = -O3
|
|
||||||
|
|
||||||
DEPFLAGS = -M
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
OBJ = $(SRC:.cpp=.o)
|
OBJ = $(notdir $(SRC:.cpp=.o))
|
||||||
LIB =
|
LIB =
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
@ -35,10 +42,10 @@ build: $(EXE)
|
|||||||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f *.o *.cuda *.host
|
rm -f *.o atomic_perf.cuda atomic_perf.exe
|
||||||
|
|
||||||
# Compilation rules
|
# Compilation rules
|
||||||
|
|
||||||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
|
||||||
|
|||||||
@ -9,7 +9,7 @@ if [[ ${USE_CUDA} > 0 ]]; then
|
|||||||
BAF_EXE=bytes_and_flops.cuda
|
BAF_EXE=bytes_and_flops.cuda
|
||||||
TEAM_SIZE=256
|
TEAM_SIZE=256
|
||||||
else
|
else
|
||||||
BAF_EXE=bytes_and_flops.host
|
BAF_EXE=bytes_and_flops.exe
|
||||||
TEAM_SIZE=1
|
TEAM_SIZE=1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
KOKKOS_DEVICES=Cuda
|
KOKKOS_DEVICES=Cuda
|
||||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
KOKKOS_ARCH = "SNB,Kepler35"
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
|
|
||||||
|
|
||||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
@ -22,7 +22,7 @@ CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
|||||||
EXE = bytes_and_flops.cuda
|
EXE = bytes_and_flops.cuda
|
||||||
else
|
else
|
||||||
CXX = g++
|
CXX = g++
|
||||||
EXE = bytes_and_flops.host
|
EXE = bytes_and_flops.exe
|
||||||
endif
|
endif
|
||||||
|
|
||||||
CXXFLAGS ?= -O3 -g
|
CXXFLAGS ?= -O3 -g
|
||||||
|
|||||||
@ -1,7 +1,18 @@
|
|||||||
KOKKOS_PATH = ${HOME}/kokkos
|
|
||||||
SRC = $(wildcard *.cpp)
|
|
||||||
KOKKOS_DEVICES=Cuda
|
KOKKOS_DEVICES=Cuda
|
||||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
|
|
||||||
|
|
||||||
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
|
|
||||||
|
ifndef KOKKOS_PATH
|
||||||
|
KOKKOS_PATH = $(MAKEFILE_PATH)../..
|
||||||
|
endif
|
||||||
|
|
||||||
|
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
|
||||||
|
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
|
||||||
|
|
||||||
|
vpath %.cpp $(sort $(dir $(SRC)))
|
||||||
|
|
||||||
default: build
|
default: build
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
@ -9,36 +20,32 @@ default: build
|
|||||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||||
EXE = gather.cuda
|
EXE = gather.cuda
|
||||||
KOKKOS_DEVICES = "Cuda,OpenMP"
|
|
||||||
KOKKOS_ARCH = "SNB,Kepler35"
|
|
||||||
else
|
else
|
||||||
CXX = g++
|
CXX = g++
|
||||||
EXE = gather.host
|
EXE = gather.exe
|
||||||
KOKKOS_DEVICES = "OpenMP"
|
|
||||||
KOKKOS_ARCH = "SNB"
|
|
||||||
endif
|
endif
|
||||||
|
|
||||||
CXXFLAGS = -O3 -g
|
CXXFLAGS ?= -O3 -g
|
||||||
|
override CXXFLAGS += -I$(MAKEFILE_PATH)
|
||||||
|
|
||||||
DEPFLAGS = -M
|
DEPFLAGS = -M
|
||||||
LINK = ${CXX}
|
LINK = ${CXX}
|
||||||
LINKFLAGS =
|
LINKFLAGS =
|
||||||
|
|
||||||
OBJ = $(SRC:.cpp=.o)
|
OBJ = $(notdir $(SRC:.cpp=.o))
|
||||||
LIB =
|
LIB =
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
$(warning ${KOKKOS_CPPFLAGS})
|
|
||||||
build: $(EXE)
|
build: $(EXE)
|
||||||
|
|
||||||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f *.o *.cuda *.host
|
rm -f *.o gather.cuda gather.exe
|
||||||
|
|
||||||
# Compilation rules
|
# Compilation rules
|
||||||
|
|
||||||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
|
||||||
|
|||||||
@ -1,28 +1,38 @@
|
|||||||
#Set your Kokkos path to something appropriate
|
KOKKOS_DEVICES=Cuda
|
||||||
KOKKOS_PATH = ${HOME}/git/kokkos-github-repo
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
KOKKOS_DEVICES = "Cuda"
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
KOKKOS_ARCH = "Pascal60"
|
|
||||||
KOKKOS_CUDA_OPTIONS = enable_lambda
|
|
||||||
#KOKKOS_DEVICES = "OpenMP"
|
|
||||||
#KOKKOS_ARCH = "Power8"
|
|
||||||
|
|
||||||
SRC = gups-kokkos.cc
|
|
||||||
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
|
|
||||||
|
ifndef KOKKOS_PATH
|
||||||
|
KOKKOS_PATH = $(MAKEFILE_PATH)../..
|
||||||
|
endif
|
||||||
|
|
||||||
|
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
|
||||||
|
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
|
||||||
|
|
||||||
|
vpath %.cpp $(sort $(dir $(SRC)))
|
||||||
|
|
||||||
default: build
|
default: build
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
|
|
||||||
CXXFLAGS = -O3
|
|
||||||
CXX = ${HOME}/git/kokkos-github-repo/bin/nvcc_wrapper
|
|
||||||
#CXX = g++
|
|
||||||
|
|
||||||
LINK = ${CXX}
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||||
|
EXE = gups.cuda
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
EXE = gups.exe
|
||||||
|
endif
|
||||||
|
|
||||||
LINKFLAGS =
|
CXXFLAGS ?= -O3 -g
|
||||||
EXE = gups-kokkos
|
override CXXFLAGS += -I$(MAKEFILE_PATH)
|
||||||
|
|
||||||
DEPFLAGS = -M
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
OBJ = $(SRC:.cc=.o)
|
OBJ = $(notdir $(SRC:.cpp=.o))
|
||||||
LIB =
|
LIB =
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
@ -32,10 +42,10 @@ build: $(EXE)
|
|||||||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f *.o $(EXE)
|
rm -f *.o gups.cuda gups.exe
|
||||||
|
|
||||||
# Compilation rules
|
# Compilation rules
|
||||||
|
|
||||||
%.o:%.cc $(KOKKOS_CPP_DEPENDS)
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
|
||||||
|
|||||||
@ -1,31 +1,38 @@
|
|||||||
KOKKOS_PATH = ../..
|
KOKKOS_DEVICES=Cuda
|
||||||
SRC = $(wildcard *.cpp)
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
|
|
||||||
|
|
||||||
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
|
|
||||||
|
ifndef KOKKOS_PATH
|
||||||
|
KOKKOS_PATH = $(MAKEFILE_PATH)../..
|
||||||
|
endif
|
||||||
|
|
||||||
|
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
|
||||||
|
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
|
||||||
|
|
||||||
|
vpath %.cpp $(sort $(dir $(SRC)))
|
||||||
|
|
||||||
default: build
|
default: build
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
|
|
||||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||||
CXXFLAGS = -O3 -g
|
EXE = policy_perf.cuda
|
||||||
LINK = ${CXX}
|
|
||||||
LINKFLAGS =
|
|
||||||
EXE = policy_performance.cuda
|
|
||||||
KOKKOS_DEVICES = "Cuda,OpenMP"
|
|
||||||
KOKKOS_ARCH = "SNB,Kepler35"
|
|
||||||
KOKKOS_CUDA_OPTIONS+=enable_lambda
|
|
||||||
else
|
else
|
||||||
CXX = g++
|
CXX = g++
|
||||||
CXXFLAGS = -O3 -g -Wall -Werror
|
EXE = policy_perf.exe
|
||||||
LINK = ${CXX}
|
|
||||||
LINKFLAGS =
|
|
||||||
EXE = policy_performance.host
|
|
||||||
KOKKOS_DEVICES = "OpenMP"
|
|
||||||
KOKKOS_ARCH = "SNB"
|
|
||||||
endif
|
endif
|
||||||
|
|
||||||
DEPFLAGS = -M
|
CXXFLAGS ?= -O3 -g
|
||||||
|
override CXXFLAGS += -I$(MAKEFILE_PATH)
|
||||||
|
|
||||||
OBJ = $(SRC:.cpp=.o)
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
|
OBJ = $(notdir $(SRC:.cpp=.o))
|
||||||
LIB =
|
LIB =
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
@ -35,10 +42,10 @@ build: $(EXE)
|
|||||||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f *.o *.cuda *.host
|
rm -f *.o policy_perf.cuda policy_perf.exe
|
||||||
|
|
||||||
# Compilation rules
|
# Compilation rules
|
||||||
|
|
||||||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) main.cpp policy_perf_test.hpp
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
|
||||||
|
|||||||
@ -146,11 +146,11 @@ int main(int argc, char* argv[]) {
|
|||||||
// Call a 'warmup' test with 1 repeat - this will initialize the corresponding
|
// Call a 'warmup' test with 1 repeat - this will initialize the corresponding
|
||||||
// view appropriately for test and should obey first-touch etc Second call to
|
// view appropriately for test and should obey first-touch etc Second call to
|
||||||
// test is the one we actually care about and time
|
// test is the one we actually care about and time
|
||||||
view_type_1d v_1(Kokkos::ViewAllocateWithoutInitializing("v_1"),
|
view_type_1d v_1(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_1"),
|
||||||
team_range * team_size);
|
team_range * team_size);
|
||||||
view_type_2d v_2(Kokkos::ViewAllocateWithoutInitializing("v_2"),
|
view_type_2d v_2(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_2"),
|
||||||
team_range * team_size, thread_range);
|
team_range * team_size, thread_range);
|
||||||
view_type_3d v_3(Kokkos::ViewAllocateWithoutInitializing("v_3"),
|
view_type_3d v_3(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_3"),
|
||||||
team_range * team_size, thread_range, vector_range);
|
team_range * team_size, thread_range, vector_range);
|
||||||
|
|
||||||
double result_computed = 0.0;
|
double result_computed = 0.0;
|
||||||
|
|||||||
@ -1,28 +1,38 @@
|
|||||||
#Set your Kokkos path to something appropriate
|
KOKKOS_DEVICES=Cuda
|
||||||
KOKKOS_PATH = ${HOME}/git/kokkos-github-repo
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
#KOKKOS_DEVICES = "Cuda"
|
KOKKOS_ARCH = "SNB,Volta70"
|
||||||
#KOKKOS_ARCH = "Pascal60"
|
|
||||||
#KOKKOS_CUDA_OPTIONS = enable_lambda
|
|
||||||
KOKKOS_DEVICES = "OpenMP"
|
|
||||||
KOKKOS_ARCH = "Power8"
|
|
||||||
|
|
||||||
SRC = stream-kokkos.cc
|
|
||||||
|
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||||
|
|
||||||
|
ifndef KOKKOS_PATH
|
||||||
|
KOKKOS_PATH = $(MAKEFILE_PATH)../..
|
||||||
|
endif
|
||||||
|
|
||||||
|
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
|
||||||
|
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
|
||||||
|
|
||||||
|
vpath %.cpp $(sort $(dir $(SRC)))
|
||||||
|
|
||||||
default: build
|
default: build
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
|
|
||||||
CXXFLAGS = -O3
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
#CXX = ${HOME}/git/kokkos-github-repo/bin/nvcc_wrapper
|
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||||
|
EXE = stream.cuda
|
||||||
|
else
|
||||||
CXX = g++
|
CXX = g++
|
||||||
|
EXE = stream.exe
|
||||||
|
endif
|
||||||
|
|
||||||
LINK = ${CXX}
|
CXXFLAGS ?= -O3 -g
|
||||||
|
override CXXFLAGS += -I$(MAKEFILE_PATH)
|
||||||
LINKFLAGS =
|
|
||||||
EXE = stream-kokkos
|
|
||||||
|
|
||||||
DEPFLAGS = -M
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
OBJ = $(SRC:.cc=.o)
|
OBJ = $(notdir $(SRC:.cpp=.o))
|
||||||
LIB =
|
LIB =
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
@ -32,10 +42,10 @@ build: $(EXE)
|
|||||||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f *.o $(EXE)
|
rm -f *.o stream.cuda stream.exe
|
||||||
|
|
||||||
# Compilation rules
|
# Compilation rules
|
||||||
|
|
||||||
%.o:%.cc $(KOKKOS_CPP_DEPENDS)
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)
|
||||||
|
|||||||
87
lib/kokkos/bin/kokkos_launch_compiler
Executable file
@ -0,0 +1,87 @@
|
|||||||
|
#!/bin/bash -e
|
||||||
|
#
|
||||||
|
# This script allows CMAKE_CXX_COMPILER to be a standard
|
||||||
|
# C++ compiler and Kokkos sets RULE_LAUNCH_COMPILE and
|
||||||
|
# RULE_LAUNCH_LINK in CMake so that all compiler and link
|
||||||
|
# commands are prefixed with this script followed by the
|
||||||
|
# C++ compiler. Thus if $1 == $2 then we know the command
|
||||||
|
# was intended for the C++ compiler and we discard both
|
||||||
|
# $1 and $2 and redirect the command to NVCC_WRAPPER.
|
||||||
|
# If $1 != $2 then we know that the command was not intended
|
||||||
|
# for the C++ compiler and we just discard $1 and launch
|
||||||
|
# the original command. Examples of when $2 will not equal
|
||||||
|
# $1 are 'ar', 'cmake', etc. during the linking phase
|
||||||
|
#
|
||||||
|
|
||||||
|
# check the arguments for the KOKKOS_DEPENDENCE compiler definition
|
||||||
|
KOKKOS_DEPENDENCE=0
|
||||||
|
for i in ${@}
|
||||||
|
do
|
||||||
|
if [ -n "$(echo ${i} | grep 'KOKKOS_DEPENDENCE$')" ]; then
|
||||||
|
KOKKOS_DEPENDENCE=1
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
# if C++ is not passed, someone is probably trying to invoke it directly
|
||||||
|
if [ -z "${1}" ]; then
|
||||||
|
echo -e "\n${BASH_SOURCE[0]} was invoked without the C++ compiler as the first argument."
|
||||||
|
echo "This script is not intended to be directly invoked by any mechanism other"
|
||||||
|
echo -e "than through a RULE_LAUNCH_COMPILE or RULE_LAUNCH_LINK property set in CMake\n"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# if there aren't two args, this isn't necessarily invalid, just a bit strange
|
||||||
|
if [ -z "${2}" ]; then exit 0; fi
|
||||||
|
|
||||||
|
# store the expected C++ compiler
|
||||||
|
CXX_COMPILER=${1}
|
||||||
|
|
||||||
|
# remove the expected C++ compiler from the arguments
|
||||||
|
shift
|
||||||
|
|
||||||
|
# after the above shift, $1 is now the exe for the compile or link command, e.g.
|
||||||
|
# kokkos_launch_compiler g++ gcc -c file.c -o file.o
|
||||||
|
# becomes:
|
||||||
|
# kokkos_launch_compiler gcc -c file.c -o file.o
|
||||||
|
# Check to see if the executable is the C++ compiler and if it is not, then
|
||||||
|
# just execute the command.
|
||||||
|
#
|
||||||
|
# Summary:
|
||||||
|
# kokkos_launch_compiler g++ gcc -c file.c -o file.o
|
||||||
|
# results in this command being executed:
|
||||||
|
# gcc -c file.c -o file.o
|
||||||
|
# and
|
||||||
|
# kokkos_launch_compiler g++ g++ -c file.cpp -o file.o
|
||||||
|
# results in this command being executed:
|
||||||
|
# nvcc_wrapper -c file.cpp -o file.o
|
||||||
|
if [[ "${KOKKOS_DEPENDENCE}" -eq "0" || "${CXX_COMPILER}" != "${1}" ]]; then
|
||||||
|
# the command does not depend on Kokkos so just execute the command w/o re-directing to nvcc_wrapper
|
||||||
|
eval $@
|
||||||
|
else
|
||||||
|
# the executable is the C++ compiler, so we need to re-direct to nvcc_wrapper
|
||||||
|
|
||||||
|
# find the nvcc_wrapper from the same build/install
|
||||||
|
NVCC_WRAPPER="$(dirname ${BASH_SOURCE[0]})/nvcc_wrapper"
|
||||||
|
|
||||||
|
if [ -z "${NVCC_WRAPPER}" ]; then
|
||||||
|
echo -e "\nError: nvcc_wrapper not found in $(dirname ${BASH_SOURCE[0]}).\n"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# set default nvcc wrapper compiler if not specified
|
||||||
|
: ${NVCC_WRAPPER_DEFAULT_COMPILER:=${CXX_COMPILER}}
|
||||||
|
export NVCC_WRAPPER_DEFAULT_COMPILER
|
||||||
|
|
||||||
|
# calling itself will cause an infinitely long build
|
||||||
|
if [ "${NVCC_WRAPPER}" = "${NVCC_WRAPPER_DEFAULT_COMPILER}" ]; then
|
||||||
|
echo -e "\nError: NVCC_WRAPPER == NVCC_WRAPPER_DEFAULT_COMPILER. Terminating to avoid infinite loop!\n"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# discard the compiler from the command
|
||||||
|
shift
|
||||||
|
|
||||||
|
# execute nvcc_wrapper
|
||||||
|
${NVCC_WRAPPER} $@
|
||||||
|
fi
|
||||||
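The dispatch rule described in the header comment of `kokkos_launch_compiler` can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the script itself: the function name `choose_launch_target` is hypothetical, and it only prints which program would be run instead of executing anything.

```bash
#!/bin/bash
# Hypothetical helper illustrating the dispatch rule of kokkos_launch_compiler.
# Usage: choose_launch_target <expected-cxx> <command> [args...]
# Prints "nvcc_wrapper" when <command> is the expected C++ compiler and one of
# the arguments ends in KOKKOS_DEPENDENCE; otherwise prints <command> unchanged.
choose_launch_target() {
    cxx_compiler="$1"
    shift
    depends_on_kokkos=0
    for arg in "$@"; do
        case "$arg" in
            *KOKKOS_DEPENDENCE) depends_on_kokkos=1; break ;;
        esac
    done
    if [ "$depends_on_kokkos" -eq 1 ] && [ "$cxx_compiler" = "$1" ]; then
        echo "nvcc_wrapper"
    else
        echo "$1"
    fi
}
```

For instance, `choose_launch_target g++ gcc -c file.c -o file.o` leaves `gcc` in place, while a `g++` command carrying `-DKOKKOS_DEPENDENCE` is the case that would be redirected.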
@ -90,7 +90,12 @@ replace_pragma_ident=0
|
|||||||
# Mark first host compiler argument
|
# Mark first host compiler argument
|
||||||
first_xcompiler_arg=1
|
first_xcompiler_arg=1
|
||||||
|
|
||||||
temp_dir=${TMPDIR:-/tmp}
|
# Allow for setting temp dir without setting TMPDIR in parent (see https://docs.olcf.ornl.gov/systems/summit_user_guide.html#setting-tmpdir-causes-jsm-jsrun-errors-job-state-flip-flop)
|
||||||
|
if [[ -z ${NVCC_WRAPPER_TMPDIR+x} ]]; then
|
||||||
|
temp_dir=${TMPDIR:-/tmp}
|
||||||
|
else
|
||||||
|
temp_dir=${NVCC_WRAPPER_TMPDIR}
|
||||||
|
fi
|
||||||
|
|
||||||
# optimization flag added as a command-line argument
|
# optimization flag added as a command-line argument
|
||||||
optimization_flag=""
|
optimization_flag=""
|
||||||
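The intent stated in the comment of this hunk (letting a caller choose the scratch directory without exporting `TMPDIR`) can be sketched as follows. This is an illustrative sketch only: the function name `select_temp_dir` is hypothetical and is not part of `nvcc_wrapper`, and the precedence shown (`NVCC_WRAPPER_TMPDIR`, then `TMPDIR`, then `/tmp`) is an assumption drawn from that comment.

```bash
#!/bin/bash
# Hypothetical helper: choose the scratch directory.
# NVCC_WRAPPER_TMPDIR wins when it is set; otherwise fall back to TMPDIR,
# and finally to /tmp when neither variable is set.
select_temp_dir() {
    if [ -n "${NVCC_WRAPPER_TMPDIR+x}" ]; then
        echo "${NVCC_WRAPPER_TMPDIR}"
    else
        echo "${TMPDIR:-/tmp}"
    fi
}
```

The `${VAR+x}` expansion yields `x` only when `VAR` is set, so the test distinguishes "unset" from "set but empty".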
@ -194,7 +199,7 @@ do
|
|||||||
cuda_args="$cuda_args $1"
|
cuda_args="$cuda_args $1"
|
||||||
;;
|
;;
|
||||||
#Handle known nvcc args that have an argument
|
#Handle known nvcc args that have an argument
|
||||||
-rdc|-maxrregcount|--default-stream|-Xnvlink|--fmad|-cudart|--cudart)
|
-rdc|-maxrregcount|--default-stream|-Xnvlink|--fmad|-cudart|--cudart|-include)
|
||||||
cuda_args="$cuda_args $1 $2"
|
cuda_args="$cuda_args $1 $2"
|
||||||
shift
|
shift
|
||||||
;;
|
;;
|
||||||
|
|||||||
@ -1,3 +1,9 @@
|
|||||||
|
# No need for policy push/pop. CMake also manages a new entry for scripts
|
||||||
|
# loaded by include() and find_package() commands except when invoked with
|
||||||
|
# the NO_POLICY_SCOPE option
|
||||||
|
# CMP0057 + NEW -> IN_LIST operator in IF(...)
|
||||||
|
CMAKE_POLICY(SET CMP0057 NEW)
|
||||||
|
|
||||||
# Compute paths
|
# Compute paths
|
||||||
@PACKAGE_INIT@
|
@PACKAGE_INIT@
|
||||||
|
|
||||||
@ -12,3 +18,18 @@ GET_FILENAME_COMPONENT(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
|
|||||||
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
|
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
|
||||||
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
|
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
|
||||||
UNSET(Kokkos_CMAKE_DIR)
|
UNSET(Kokkos_CMAKE_DIR)
|
||||||
|
|
||||||
|
# if CUDA was enabled and separable compilation was not requested, i.e. without
|
||||||
|
# find_package(Kokkos COMPONENTS separable_compilation)
|
||||||
|
# then we set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK
|
||||||
|
IF(@Kokkos_ENABLE_CUDA@ AND NOT "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
|
||||||
|
# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
|
||||||
|
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})
|
||||||
|
# if not nvcc_wrapper, use RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK
|
||||||
|
IF(NOT IS_NVCC AND NOT CMAKE_CXX_COMPILER_ID STREQUAL Clang AND
|
||||||
|
(NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
|
||||||
|
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to nvcc_wrapper")
|
||||||
|
kokkos_compilation(GLOBAL)
|
||||||
|
ENDIF()
|
||||||
|
UNSET(IS_NVCC) # be mindful of the environment, pollution is bad
|
||||||
|
ENDIF()
|
||||||
|
|||||||
@ -89,3 +89,73 @@ function(kokkos_check)
|
|||||||
set(${KOKKOS_CHECK_RETURN_VALUE} ${KOKKOS_CHECK_SUCCESS} PARENT_SCOPE)
|
set(${KOKKOS_CHECK_RETURN_VALUE} ${KOKKOS_CHECK_SUCCESS} PARENT_SCOPE)
|
||||||
endif()
|
endif()
|
||||||
endfunction()
|
endfunction()
|
||||||
|
|
||||||
|
# this function is provided to easily select which files use nvcc_wrapper:
|
||||||
|
#
|
||||||
|
# GLOBAL --> all files
|
||||||
|
# TARGET --> all files in a target
|
||||||
|
# SOURCE --> specific source files
|
||||||
|
# DIRECTORY --> all files in directory
|
||||||
|
# PROJECT --> all files/targets in a project/subproject
|
||||||
|
#
|
||||||
|
FUNCTION(kokkos_compilation)
|
||||||
|
CMAKE_PARSE_ARGUMENTS(COMP "GLOBAL;PROJECT" "" "DIRECTORY;TARGET;SOURCE" ${ARGN})
|
||||||
|
|
||||||
|
# search relative first and then absolute
|
||||||
|
SET(_HINTS "${CMAKE_CURRENT_LIST_DIR}/../.." "@CMAKE_INSTALL_PREFIX@")
|
||||||
|
|
||||||
|
# find kokkos_launch_compiler
|
||||||
|
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
|
||||||
|
NAMES kokkos_launch_compiler
|
||||||
|
HINTS ${_HINTS}
|
||||||
|
PATHS ${_HINTS}
|
||||||
|
PATH_SUFFIXES bin)
|
||||||
|
|
||||||
|
IF(NOT Kokkos_COMPILE_LAUNCHER)
|
||||||
|
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
IF(COMP_GLOBAL)
|
||||||
|
# if global, don't bother setting others
|
||||||
|
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
ELSE()
|
||||||
|
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
|
||||||
|
# make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
|
||||||
|
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
|
||||||
|
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
|
||||||
|
UNSET(COMP_${_TYPE})
|
||||||
|
ENDIF()
|
||||||
|
# set the properties if defined
|
||||||
|
IF(COMP_${_TYPE})
|
||||||
|
# MESSAGE(STATUS "Using nvcc_wrapper :: ${_TYPE} :: ${COMP_${_TYPE}}")
|
||||||
|
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
ENDIF()
|
||||||
|
ENDFOREACH()
|
||||||
|
ENDIF()
|
||||||
|
ENDFUNCTION()
|
||||||
|
|
||||||
|
# A test to check whether a downstream project set the C++ compiler to NVCC or not
|
||||||
|
# this is called only when Kokkos was installed with Kokkos_ENABLE_CUDA=ON
|
||||||
|
FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER)
|
||||||
|
# Check if the compiler is nvcc (which really means nvcc_wrapper).
|
||||||
|
EXECUTE_PROCESS(COMMAND ${COMPILER} ${ARGN} --version
|
||||||
|
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
|
||||||
|
OUTPUT_STRIP_TRAILING_WHITESPACE
|
||||||
|
RESULT_VARIABLE RET)
|
||||||
|
# something went wrong
|
||||||
|
IF(RET GREATER 0)
|
||||||
|
SET(${VAR} false PARENT_SCOPE)
|
||||||
|
ELSE()
|
||||||
|
STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
|
||||||
|
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
|
||||||
|
STRING(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
|
||||||
|
IF(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
|
||||||
|
SET(${VAR} true PARENT_SCOPE)
|
||||||
|
ELSE()
|
||||||
|
SET(${VAR} false PARENT_SCOPE)
|
||||||
|
ENDIF()
|
||||||
|
ENDIF()
|
||||||
|
ENDFUNCTION()
|
||||||
|
|
||||||
|
|||||||
@ -1,4 +1,3 @@
|
|||||||
|
|
||||||
/*
|
/*
|
||||||
//@HEADER
|
//@HEADER
|
||||||
// ************************************************************************
|
// ************************************************************************
|
||||||
@ -42,6 +41,9 @@
|
|||||||
// ************************************************************************
|
// ************************************************************************
|
||||||
//@HEADER
|
//@HEADER
|
||||||
*/
|
*/
|
||||||
|
#ifndef @HEADER_GUARD_TAG@
|
||||||
|
#define @HEADER_GUARD_TAG@
|
||||||
|
|
||||||
#include <cuda/TestCuda_Category.hpp>
|
@INCLUDE_NEXT_FILE@
|
||||||
#include <TestAtomicViews.hpp>
|
|
||||||
|
#endif
|
||||||
@ -21,6 +21,7 @@
|
|||||||
#cmakedefine KOKKOS_ENABLE_HPX
|
#cmakedefine KOKKOS_ENABLE_HPX
|
||||||
#cmakedefine KOKKOS_ENABLE_MEMKIND
|
#cmakedefine KOKKOS_ENABLE_MEMKIND
|
||||||
#cmakedefine KOKKOS_ENABLE_LIBRT
|
#cmakedefine KOKKOS_ENABLE_LIBRT
|
||||||
|
#cmakedefine KOKKOS_ENABLE_SYCL
|
||||||
|
|
||||||
#ifndef __CUDA_ARCH__
|
#ifndef __CUDA_ARCH__
|
||||||
#cmakedefine KOKKOS_ENABLE_TM
|
#cmakedefine KOKKOS_ENABLE_TM
|
||||||
@ -31,7 +32,6 @@
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* General Settings */
|
/* General Settings */
|
||||||
#cmakedefine KOKKOS_ENABLE_CXX11
|
|
||||||
#cmakedefine KOKKOS_ENABLE_CXX14
|
#cmakedefine KOKKOS_ENABLE_CXX14
|
||||||
#cmakedefine KOKKOS_ENABLE_CXX17
|
#cmakedefine KOKKOS_ENABLE_CXX17
|
||||||
#cmakedefine KOKKOS_ENABLE_CXX20
|
#cmakedefine KOKKOS_ENABLE_CXX20
|
||||||
@ -58,7 +58,7 @@
|
|||||||
/* TPL Settings */
|
/* TPL Settings */
|
||||||
#cmakedefine KOKKOS_ENABLE_HWLOC
|
#cmakedefine KOKKOS_ENABLE_HWLOC
|
||||||
#cmakedefine KOKKOS_USE_LIBRT
|
#cmakedefine KOKKOS_USE_LIBRT
|
||||||
#cmakedefine KOKKOS_ENABLE_HWBSPACE
|
#cmakedefine KOKKOS_ENABLE_HBWSPACE
|
||||||
#cmakedefine KOKKOS_ENABLE_LIBDL
|
#cmakedefine KOKKOS_ENABLE_LIBDL
|
||||||
#cmakedefine KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
|
#cmakedefine KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
|
||||||
|
|
||||||
|
|||||||
@ -73,20 +73,20 @@ Compiler features are more fine-grained and require conflicting requests to be r
|
|||||||
Suppose I have
|
Suppose I have
|
||||||
````
|
````
|
||||||
add_library(A a.cpp)
|
add_library(A a.cpp)
|
||||||
target_compile_features(A PUBLIC cxx_std_11)
|
target_compile_features(A PUBLIC cxx_std_14)
|
||||||
````
|
````
|
||||||
then another target
|
then another target
|
||||||
````
|
````
|
||||||
add_library(B b.cpp)
|
add_library(B b.cpp)
|
||||||
target_compile_features(B PUBLIC cxx_std_14)
|
target_compile_features(B PUBLIC cxx_std_17)
|
||||||
target_link_libraries(A B)
|
target_link_libraries(A B)
|
||||||
````
|
````
|
||||||
I have requested two different features.
|
I have requested two different features.
|
||||||
CMake understands the requests and knows that `cxx_std_11` is a subset of `cxx_std_14`.
|
CMake understands the requests and knows that `cxx_std_14` is a subset of `cxx_std_17`.
|
||||||
CMake then picks C++14 for library `B`.
|
CMake then picks C++17 for library `A`, which inherits the stricter requirement from `B`.
|
||||||
CMake would not have been able to do feature resolution if we had directly done:
|
CMake would not have been able to do feature resolution if we had directly done:
|
||||||
````
|
````
|
||||||
target_compile_options(A PUBLIC -std=c++11)
|
target_compile_options(A PUBLIC -std=c++14)
|
||||||
````
|
````
|
||||||
|
|
||||||
### Adding Kokkos Options
|
### Adding Kokkos Options
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -38,12 +38,6 @@ MACRO(GLOBAL_SET VARNAME)
|
|||||||
SET(${VARNAME} ${ARGN} CACHE INTERNAL "" FORCE)
|
SET(${VARNAME} ${ARGN} CACHE INTERNAL "" FORCE)
|
||||||
ENDMACRO()
|
ENDMACRO()
|
||||||
|
|
||||||
FUNCTION(VERIFY_EMPTY CONTEXT)
|
|
||||||
if(${ARGN})
|
|
||||||
MESSAGE(FATAL_ERROR "Kokkos does not support all of Tribits. Unhandled arguments in ${CONTEXT}:\n${ARGN}")
|
|
||||||
endif()
|
|
||||||
ENDFUNCTION()
|
|
||||||
|
|
||||||
MACRO(PREPEND_GLOBAL_SET VARNAME)
|
MACRO(PREPEND_GLOBAL_SET VARNAME)
|
||||||
ASSERT_DEFINED(${VARNAME})
|
ASSERT_DEFINED(${VARNAME})
|
||||||
GLOBAL_SET(${VARNAME} ${ARGN} ${${VARNAME}})
|
GLOBAL_SET(${VARNAME} ${ARGN} ${${VARNAME}})
|
||||||
@ -89,7 +83,7 @@ FUNCTION(KOKKOS_ADD_TEST)
|
|||||||
CMAKE_PARSE_ARGUMENTS(TEST
|
CMAKE_PARSE_ARGUMENTS(TEST
|
||||||
""
|
""
|
||||||
"EXE;NAME;TOOL"
|
"EXE;NAME;TOOL"
|
||||||
""
|
"ARGS"
|
||||||
${ARGN})
|
${ARGN})
|
||||||
IF(TEST_EXE)
|
IF(TEST_EXE)
|
||||||
SET(EXE_ROOT ${TEST_EXE})
|
SET(EXE_ROOT ${TEST_EXE})
|
||||||
@ -102,6 +96,7 @@ FUNCTION(KOKKOS_ADD_TEST)
|
|||||||
NAME ${TEST_NAME}
|
NAME ${TEST_NAME}
|
||||||
COMM serial mpi
|
COMM serial mpi
|
||||||
NUM_MPI_PROCS 1
|
NUM_MPI_PROCS 1
|
||||||
|
ARGS ${TEST_ARGS}
|
||||||
${TEST_UNPARSED_ARGUMENTS}
|
${TEST_UNPARSED_ARGUMENTS}
|
||||||
ADDED_TESTS_NAMES_OUT ALL_TESTS_ADDED
|
ADDED_TESTS_NAMES_OUT ALL_TESTS_ADDED
|
||||||
)
|
)
|
||||||
@ -110,18 +105,25 @@ FUNCTION(KOKKOS_ADD_TEST)
|
|||||||
SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME})
|
SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME})
|
||||||
SET(EXE ${PACKAGE_NAME}_${EXE_ROOT})
|
SET(EXE ${PACKAGE_NAME}_${EXE_ROOT})
|
||||||
|
|
||||||
if(TEST_TOOL)
|
# The function TRIBITS_ADD_TEST() has a CATEGORIES argument that defaults
|
||||||
add_dependencies(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool
|
# to BASIC. If a project elects to only enable tests marked as PERFORMANCE,
|
||||||
foreach(TEST_ADDED ${ALL_TESTS_ADDED})
|
# the test won't actually be added and attempting to set a property on it below
|
||||||
set_property(TEST ${TEST_ADDED} APPEND PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>")
|
# will yield an error.
|
||||||
endforeach()
|
if(TARGET ${EXE})
|
||||||
|
if(TEST_TOOL)
|
||||||
|
add_dependencies(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool
|
||||||
|
foreach(TEST_ADDED ${ALL_TESTS_ADDED})
|
||||||
|
set_property(TEST ${TEST_ADDED} APPEND PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>")
|
||||||
|
endforeach()
|
||||||
|
endif()
|
||||||
endif()
|
endif()
|
||||||
else()
|
else()
|
||||||
CMAKE_PARSE_ARGUMENTS(TEST
|
CMAKE_PARSE_ARGUMENTS(TEST
|
||||||
"WILL_FAIL"
|
"WILL_FAIL"
|
||||||
"FAIL_REGULAR_EXPRESSION;PASS_REGULAR_EXPRESSION;EXE;NAME;TOOL"
|
"FAIL_REGULAR_EXPRESSION;PASS_REGULAR_EXPRESSION;EXE;NAME;TOOL"
|
||||||
"CATEGORIES;CMD_ARGS"
|
"CATEGORIES;ARGS"
|
||||||
${ARGN})
|
${ARGN})
|
||||||
|
SET(TESTS_ADDED)
|
||||||
# To match Tribits, we should always be receiving
|
# To match Tribits, we should always be receiving
|
||||||
# the root names of exes/libs
|
# the root names of exes/libs
|
||||||
IF(TEST_EXE)
|
IF(TEST_EXE)
|
||||||
@ -133,24 +135,46 @@ FUNCTION(KOKKOS_ADD_TEST)
|
|||||||
# These should be the full target name
|
# These should be the full target name
|
||||||
SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME})
|
SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME})
|
||||||
SET(EXE ${PACKAGE_NAME}_${EXE_ROOT})
|
SET(EXE ${PACKAGE_NAME}_${EXE_ROOT})
|
||||||
IF(WIN32)
|
IF (TEST_ARGS)
|
||||||
ADD_TEST(NAME ${TEST_NAME} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH} COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX} ${TEST_CMD_ARGS})
|
SET(TEST_NUMBER 0)
|
||||||
|
FOREACH (ARG_STR ${TEST_ARGS})
|
||||||
|
# This is passed as a single string blob to match TriBITS behavior
|
||||||
|
# We need this to be turned into a list
|
||||||
|
STRING(REPLACE " " ";" ARG_STR_LIST ${ARG_STR})
|
||||||
|
IF(WIN32)
|
||||||
|
ADD_TEST(NAME ${TEST_NAME}${TEST_NUMBER} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH}
|
||||||
|
COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX} ${ARG_STR_LIST})
|
||||||
|
ELSE()
|
||||||
|
ADD_TEST(NAME ${TEST_NAME}${TEST_NUMBER} COMMAND ${EXE} ${ARG_STR_LIST})
|
||||||
|
ENDIF()
|
||||||
|
LIST(APPEND TESTS_ADDED "${TEST_NAME}${TEST_NUMBER}")
|
||||||
|
MATH(EXPR TEST_NUMBER "${TEST_NUMBER} + 1")
|
||||||
|
ENDFOREACH()
|
||||||
ELSE()
|
ELSE()
|
||||||
ADD_TEST(NAME ${TEST_NAME} COMMAND ${EXE} ${TEST_CMD_ARGS})
|
IF(WIN32)
|
||||||
|
ADD_TEST(NAME ${TEST_NAME} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH}
|
||||||
|
COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX})
|
||||||
|
ELSE()
|
||||||
|
ADD_TEST(NAME ${TEST_NAME} COMMAND ${EXE})
|
||||||
|
ENDIF()
|
||||||
|
LIST(APPEND TESTS_ADDED "${TEST_NAME}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
IF(TEST_WILL_FAIL)
|
|
||||||
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES WILL_FAIL ${TEST_WILL_FAIL})
|
FOREACH(TEST_NAME ${TESTS_ADDED})
|
||||||
ENDIF()
|
IF(TEST_WILL_FAIL)
|
||||||
IF(TEST_FAIL_REGULAR_EXPRESSION)
|
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES WILL_FAIL ${TEST_WILL_FAIL})
|
||||||
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES FAIL_REGULAR_EXPRESSION ${TEST_FAIL_REGULAR_EXPRESSION})
|
ENDIF()
|
||||||
ENDIF()
|
IF(TEST_FAIL_REGULAR_EXPRESSION)
|
||||||
IF(TEST_PASS_REGULAR_EXPRESSION)
|
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES FAIL_REGULAR_EXPRESSION ${TEST_FAIL_REGULAR_EXPRESSION})
|
||||||
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES PASS_REGULAR_EXPRESSION ${TEST_PASS_REGULAR_EXPRESSION})
|
ENDIF()
|
||||||
ENDIF()
|
IF(TEST_PASS_REGULAR_EXPRESSION)
|
||||||
if(TEST_TOOL)
|
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES PASS_REGULAR_EXPRESSION ${TEST_PASS_REGULAR_EXPRESSION})
|
||||||
add_dependencies(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool
|
ENDIF()
|
||||||
set_property(TEST ${TEST_NAME} APPEND_STRING PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>")
|
if(TEST_TOOL)
|
||||||
endif()
|
add_dependencies(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool
|
||||||
|
set_property(TEST ${TEST_NAME} APPEND_STRING PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>")
|
||||||
|
endif()
|
||||||
|
ENDFOREACH()
|
||||||
VERIFY_EMPTY(KOKKOS_ADD_TEST ${TEST_UNPARSED_ARGUMENTS})
|
VERIFY_EMPTY(KOKKOS_ADD_TEST ${TEST_UNPARSED_ARGUMENTS})
|
||||||
endif()
|
endif()
|
||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
|
|||||||
@ -3,7 +3,7 @@ FUNCTION(kokkos_set_intel_flags full_standard int_standard)
|
|||||||
STRING(TOLOWER ${full_standard} FULL_LC_STANDARD)
|
STRING(TOLOWER ${full_standard} FULL_LC_STANDARD)
|
||||||
STRING(TOLOWER ${int_standard} INT_LC_STANDARD)
|
STRING(TOLOWER ${int_standard} INT_LC_STANDARD)
|
||||||
# The following three blocks of code were copied from
|
# The following three blocks of code were copied from
|
||||||
# /Modules/Compiler/Intel-CXX.cmake from CMake 3.7.2 and then modified.
|
# /Modules/Compiler/Intel-CXX.cmake from CMake 3.18.1 and then modified.
|
||||||
IF(CMAKE_CXX_SIMULATE_ID STREQUAL MSVC)
|
IF(CMAKE_CXX_SIMULATE_ID STREQUAL MSVC)
|
||||||
SET(_std -Qstd)
|
SET(_std -Qstd)
|
||||||
SET(_ext c++)
|
SET(_ext c++)
|
||||||
@ -11,20 +11,8 @@ FUNCTION(kokkos_set_intel_flags full_standard int_standard)
|
|||||||
SET(_std -std)
|
SET(_std -std)
|
||||||
SET(_ext gnu++)
|
SET(_ext gnu++)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
SET(KOKKOS_CXX_STANDARD_FLAG "${_std}=c++${FULL_LC_STANDARD}" PARENT_SCOPE)
|
||||||
IF(NOT KOKKOS_CXX_STANDARD STREQUAL 11 AND NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 15.0.2)
|
SET(KOKKOS_CXX_INTERMDIATE_STANDARD_FLAG "${_std}=${_ext}${INT_LC_STANDARD}" PARENT_SCOPE)
|
||||||
#There is no gnu++14 value supported; figure out what to do.
|
|
||||||
SET(KOKKOS_CXX_STANDARD_FLAG "${_std}=c++${FULL_LC_STANDARD}" PARENT_SCOPE)
|
|
||||||
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD_FLAG "${_std}=c++${INT_LC_STANDARD}" PARENT_SCOPE)
|
|
||||||
ELSEIF(KOKKOS_CXX_STANDARD STREQUAL 11 AND NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 13.0)
|
|
||||||
IF (CMAKE_CXX_EXTENSIONS)
|
|
||||||
SET(KOKKOS_CXX_STANDARD_FLAG "${_std}=${_ext}c++11" PARENT_SCOPE)
|
|
||||||
ELSE()
|
|
||||||
SET(KOKKOS_CXX_STANDARD_FLAG "${_std}=c++11" PARENT_SCOPE)
|
|
||||||
ENDIF()
|
|
||||||
ELSE()
|
|
||||||
MESSAGE(FATAL_ERROR "Intel compiler version too low - need 13.0 for C++11 and 15.0 for C++14")
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -35,6 +35,7 @@ KOKKOS_ARCH_OPTION(ARMV80 HOST "ARMv8.0 Compatible CPU")
|
|||||||
KOKKOS_ARCH_OPTION(ARMV81 HOST "ARMv8.1 Compatible CPU")
|
KOKKOS_ARCH_OPTION(ARMV81 HOST "ARMv8.1 Compatible CPU")
|
||||||
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX HOST "ARMv8 Cavium ThunderX CPU")
|
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX HOST "ARMv8 Cavium ThunderX CPU")
|
||||||
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX2 HOST "ARMv8 Cavium ThunderX2 CPU")
|
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX2 HOST "ARMv8 Cavium ThunderX2 CPU")
|
||||||
|
KOKKOS_ARCH_OPTION(A64FX HOST "ARMv8.2 with SVE Support")
|
||||||
KOKKOS_ARCH_OPTION(WSM HOST "Intel Westmere CPU")
|
KOKKOS_ARCH_OPTION(WSM HOST "Intel Westmere CPU")
|
||||||
KOKKOS_ARCH_OPTION(SNB HOST "Intel Sandy/Ivy Bridge CPUs")
|
KOKKOS_ARCH_OPTION(SNB HOST "Intel Sandy/Ivy Bridge CPUs")
|
||||||
KOKKOS_ARCH_OPTION(HSW HOST "Intel Haswell CPUs")
|
KOKKOS_ARCH_OPTION(HSW HOST "Intel Haswell CPUs")
|
||||||
@ -63,6 +64,7 @@ KOKKOS_ARCH_OPTION(ZEN HOST "AMD Zen architecture")
|
|||||||
KOKKOS_ARCH_OPTION(ZEN2 HOST "AMD Zen2 architecture")
|
KOKKOS_ARCH_OPTION(ZEN2 HOST "AMD Zen2 architecture")
|
||||||
KOKKOS_ARCH_OPTION(VEGA900 GPU "AMD GPU MI25 GFX900")
|
KOKKOS_ARCH_OPTION(VEGA900 GPU "AMD GPU MI25 GFX900")
|
||||||
KOKKOS_ARCH_OPTION(VEGA906 GPU "AMD GPU MI50/MI60 GFX906")
|
KOKKOS_ARCH_OPTION(VEGA906 GPU "AMD GPU MI50/MI60 GFX906")
|
||||||
|
KOKKOS_ARCH_OPTION(VEGA908 GPU "AMD GPU")
|
||||||
KOKKOS_ARCH_OPTION(INTEL_GEN GPU "Intel GPUs Gen9+")
|
KOKKOS_ARCH_OPTION(INTEL_GEN GPU "Intel GPUs Gen9+")
|
||||||
|
|
||||||
|
|
||||||
@ -72,6 +74,11 @@ IF(KOKKOS_ENABLE_COMPILER_WARNINGS)
|
|||||||
"-Wall" "-Wunused-parameter" "-Wshadow" "-pedantic"
|
"-Wall" "-Wunused-parameter" "-Wshadow" "-pedantic"
|
||||||
"-Wsign-compare" "-Wtype-limits" "-Wuninitialized")
|
"-Wsign-compare" "-Wtype-limits" "-Wuninitialized")
|
||||||
|
|
||||||
|
# OpenMPTarget compilers give erroneous warnings about sign comparison in loops
|
||||||
|
IF(KOKKOS_ENABLE_OPENMPTARGET)
|
||||||
|
LIST(REMOVE_ITEM COMMON_WARNINGS "-Wsign-compare")
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
SET(GNU_WARNINGS "-Wempty-body" "-Wclobbered" "-Wignored-qualifiers"
|
SET(GNU_WARNINGS "-Wempty-body" "-Wclobbered" "-Wignored-qualifiers"
|
||||||
${COMMON_WARNINGS})
|
${COMMON_WARNINGS})
|
||||||
|
|
||||||
@ -106,6 +113,12 @@ ENDIF()
|
|||||||
IF (KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
IF (KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
||||||
SET(CUDA_ARCH_FLAG "--cuda-gpu-arch")
|
SET(CUDA_ARCH_FLAG "--cuda-gpu-arch")
|
||||||
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS -x cuda)
|
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS -x cuda)
|
||||||
|
# Kokkos_CUDA_DIR has priority over CUDAToolkit_BIN_DIR
|
||||||
|
IF (Kokkos_CUDA_DIR)
|
||||||
|
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS --cuda-path=${Kokkos_CUDA_DIR})
|
||||||
|
ELSEIF(CUDAToolkit_BIN_DIR)
|
||||||
|
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS --cuda-path=${CUDAToolkit_BIN_DIR}/..)
|
||||||
|
ENDIF()
|
||||||
IF (KOKKOS_ENABLE_CUDA)
|
IF (KOKKOS_ENABLE_CUDA)
|
||||||
SET(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND ON CACHE BOOL "enable CUDA Clang workarounds" FORCE)
|
SET(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND ON CACHE BOOL "enable CUDA Clang workarounds" FORCE)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
@ -167,6 +180,12 @@ IF (KOKKOS_ARCH_ARMV8_THUNDERX2)
|
|||||||
)
|
)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
IF (KOKKOS_ARCH_A64FX)
|
||||||
|
COMPILER_SPECIFIC_FLAGS(
|
||||||
|
DEFAULT -march=armv8.2-a+sve
|
||||||
|
)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
IF (KOKKOS_ARCH_ZEN)
|
IF (KOKKOS_ARCH_ZEN)
|
||||||
COMPILER_SPECIFIC_FLAGS(
|
COMPILER_SPECIFIC_FLAGS(
|
||||||
Intel -mavx2
|
Intel -mavx2
|
||||||
@ -327,6 +346,16 @@ IF (Kokkos_ENABLE_HIP)
|
|||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
|
||||||
|
IF (Kokkos_ENABLE_SYCL)
|
||||||
|
COMPILER_SPECIFIC_FLAGS(
|
||||||
|
DEFAULT -fsycl
|
||||||
|
)
|
||||||
|
COMPILER_SPECIFIC_OPTIONS(
|
||||||
|
DEFAULT -fsycl-unnamed-lambda
|
||||||
|
)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
|
||||||
SET(CUDA_ARCH_ALREADY_SPECIFIED "")
|
SET(CUDA_ARCH_ALREADY_SPECIFIED "")
|
||||||
FUNCTION(CHECK_CUDA_ARCH ARCH FLAG)
|
FUNCTION(CHECK_CUDA_ARCH ARCH FLAG)
|
||||||
IF(KOKKOS_ARCH_${ARCH})
|
IF(KOKKOS_ARCH_${ARCH})
|
||||||
@ -392,6 +421,7 @@ ENDFUNCTION()
|
|||||||
#to the corresponding flag name if ON
|
#to the corresponding flag name if ON
|
||||||
CHECK_AMDGPU_ARCH(VEGA900 gfx900) # Radeon Instinct MI25
|
CHECK_AMDGPU_ARCH(VEGA900 gfx900) # Radeon Instinct MI25
|
||||||
CHECK_AMDGPU_ARCH(VEGA906 gfx906) # Radeon Instinct MI50 and MI60
|
CHECK_AMDGPU_ARCH(VEGA906 gfx906) # Radeon Instinct MI50 and MI60
|
||||||
|
CHECK_AMDGPU_ARCH(VEGA908 gfx908)
|
||||||
|
|
||||||
IF(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED)
|
IF(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED)
|
||||||
MESSAGE(SEND_ERROR "HIP enabled but no AMD GPU architecture currently enabled. "
|
MESSAGE(SEND_ERROR "HIP enabled but no AMD GPU architecture currently enabled. "
|
||||||
@ -477,35 +507,53 @@ ENDIF()
|
|||||||
|
|
||||||
#CMake verbose is kind of pointless
|
#CMake verbose is kind of pointless
|
||||||
#Let's just always print things
|
#Let's just always print things
|
||||||
MESSAGE(STATUS "Execution Spaces:")
|
MESSAGE(STATUS "Built-in Execution Spaces:")
|
||||||
|
|
||||||
FOREACH (_BACKEND CUDA OPENMPTARGET HIP)
|
FOREACH (_BACKEND Cuda OpenMPTarget HIP SYCL)
|
||||||
IF(KOKKOS_ENABLE_${_BACKEND})
|
STRING(TOUPPER ${_BACKEND} UC_BACKEND)
|
||||||
|
IF(KOKKOS_ENABLE_${UC_BACKEND})
|
||||||
IF(_DEVICE_PARALLEL)
|
IF(_DEVICE_PARALLEL)
|
||||||
MESSAGE(FATAL_ERROR "Multiple device parallel execution spaces are not allowed! "
|
MESSAGE(FATAL_ERROR "Multiple device parallel execution spaces are not allowed! "
|
||||||
"Trying to enable execution space ${_BACKEND}, "
|
"Trying to enable execution space ${_BACKEND}, "
|
||||||
"but execution space ${_DEVICE_PARALLEL} is already enabled. "
|
"but execution space ${_DEVICE_PARALLEL} is already enabled. "
|
||||||
"Remove the CMakeCache.txt file and re-configure.")
|
"Remove the CMakeCache.txt file and re-configure.")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
SET(_DEVICE_PARALLEL ${_BACKEND})
|
IF (${_BACKEND} STREQUAL "Cuda")
|
||||||
|
IF(KOKKOS_ENABLE_CUDA_UVM)
|
||||||
|
SET(_DEFAULT_DEVICE_MEMSPACE "Kokkos::${_BACKEND}UVMSpace")
|
||||||
|
ELSE()
|
||||||
|
SET(_DEFAULT_DEVICE_MEMSPACE "Kokkos::${_BACKEND}Space")
|
||||||
|
ENDIF()
|
||||||
|
SET(_DEVICE_PARALLEL "Kokkos::${_BACKEND}")
|
||||||
|
ELSE()
|
||||||
|
SET(_DEFAULT_DEVICE_MEMSPACE "Kokkos::Experimental::${_BACKEND}Space")
|
||||||
|
SET(_DEVICE_PARALLEL "Kokkos::Experimental::${_BACKEND}")
|
||||||
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDFOREACH()
|
ENDFOREACH()
|
||||||
IF(NOT _DEVICE_PARALLEL)
|
IF(NOT _DEVICE_PARALLEL)
|
||||||
SET(_DEVICE_PARALLEL "NONE")
|
SET(_DEVICE_PARALLEL "NoTypeDefined")
|
||||||
|
SET(_DEFAULT_DEVICE_MEMSPACE "NoTypeDefined")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
MESSAGE(STATUS " Device Parallel: ${_DEVICE_PARALLEL}")
|
MESSAGE(STATUS " Device Parallel: ${_DEVICE_PARALLEL}")
|
||||||
UNSET(_DEVICE_PARALLEL)
|
IF(KOKKOS_ENABLE_PTHREAD)
|
||||||
|
SET(KOKKOS_ENABLE_THREADS ON)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
FOREACH (_BACKEND OpenMP Threads HPX)
|
||||||
FOREACH (_BACKEND OPENMP PTHREAD HPX)
|
STRING(TOUPPER ${_BACKEND} UC_BACKEND)
|
||||||
IF(KOKKOS_ENABLE_${_BACKEND})
|
IF(KOKKOS_ENABLE_${UC_BACKEND})
|
||||||
IF(_HOST_PARALLEL)
|
IF(_HOST_PARALLEL)
|
||||||
MESSAGE(FATAL_ERROR "Multiple host parallel execution spaces are not allowed! "
|
MESSAGE(FATAL_ERROR "Multiple host parallel execution spaces are not allowed! "
|
||||||
"Trying to enable execution space ${_BACKEND}, "
|
"Trying to enable execution space ${_BACKEND}, "
|
||||||
"but execution space ${_HOST_PARALLEL} is already enabled. "
|
"but execution space ${_HOST_PARALLEL} is already enabled. "
|
||||||
"Remove the CMakeCache.txt file and re-configure.")
|
"Remove the CMakeCache.txt file and re-configure.")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
SET(_HOST_PARALLEL ${_BACKEND})
|
IF (${_BACKEND} STREQUAL "HPX")
|
||||||
|
SET(_HOST_PARALLEL "Kokkos::Experimental::${_BACKEND}")
|
||||||
|
ELSE()
|
||||||
|
SET(_HOST_PARALLEL "Kokkos::${_BACKEND}")
|
||||||
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDFOREACH()
|
ENDFOREACH()
|
||||||
|
|
||||||
@ -515,14 +563,11 @@ IF(NOT _HOST_PARALLEL AND NOT KOKKOS_ENABLE_SERIAL)
|
|||||||
"and Kokkos_ENABLE_SERIAL=OFF.")
|
"and Kokkos_ENABLE_SERIAL=OFF.")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF(NOT _HOST_PARALLEL)
|
IF(_HOST_PARALLEL)
|
||||||
SET(_HOST_PARALLEL "NONE")
|
|
||||||
ENDIF()
|
|
||||||
MESSAGE(STATUS " Host Parallel: ${_HOST_PARALLEL}")
|
MESSAGE(STATUS " Host Parallel: ${_HOST_PARALLEL}")
|
||||||
UNSET(_HOST_PARALLEL)
|
ELSE()
|
||||||
|
SET(_HOST_PARALLEL "NoTypeDefined")
|
||||||
IF(KOKKOS_ENABLE_PTHREAD)
|
MESSAGE(STATUS " Host Parallel: NoTypeDefined")
|
||||||
SET(KOKKOS_ENABLE_THREADS ON)
|
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF(KOKKOS_ENABLE_SERIAL)
|
IF(KOKKOS_ENABLE_SERIAL)
|
||||||
|
|||||||
@ -4,24 +4,42 @@ SET(KOKKOS_CXX_COMPILER ${CMAKE_CXX_COMPILER})
|
|||||||
SET(KOKKOS_CXX_COMPILER_ID ${CMAKE_CXX_COMPILER_ID})
|
SET(KOKKOS_CXX_COMPILER_ID ${CMAKE_CXX_COMPILER_ID})
|
||||||
SET(KOKKOS_CXX_COMPILER_VERSION ${CMAKE_CXX_COMPILER_VERSION})
|
SET(KOKKOS_CXX_COMPILER_VERSION ${CMAKE_CXX_COMPILER_VERSION})
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_CUDA)
|
MACRO(kokkos_internal_have_compiler_nvcc)
|
||||||
# Check if the compiler is nvcc (which really means nvcc_wrapper).
|
# Check if the compiler is nvcc (which really means nvcc_wrapper).
|
||||||
EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} --version
|
EXECUTE_PROCESS(COMMAND ${ARGN} --version
|
||||||
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
|
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
|
||||||
OUTPUT_STRIP_TRAILING_WHITESPACE)
|
OUTPUT_STRIP_TRAILING_WHITESPACE)
|
||||||
|
|
||||||
STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
|
STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
|
||||||
|
|
||||||
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
|
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
|
||||||
|
STRING(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
|
||||||
|
|
||||||
STRING(REGEX REPLACE "^ +" ""
|
|
||||||
INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
|
|
||||||
IF(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
|
IF(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
|
||||||
SET(INTERNAL_HAVE_COMPILER_NVCC true)
|
SET(INTERNAL_HAVE_COMPILER_NVCC true)
|
||||||
ELSE()
|
ELSE()
|
||||||
SET(INTERNAL_HAVE_COMPILER_NVCC false)
|
SET(INTERNAL_HAVE_COMPILER_NVCC false)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
ENDMACRO()
|
||||||
|
|
||||||
|
IF(Kokkos_ENABLE_CUDA)
|
||||||
|
# find kokkos_launch_compiler
|
||||||
|
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
|
||||||
|
NAMES kokkos_launch_compiler
|
||||||
|
HINTS ${PROJECT_SOURCE_DIR}
|
||||||
|
PATHS ${PROJECT_SOURCE_DIR}
|
||||||
|
PATH_SUFFIXES bin)
|
||||||
|
|
||||||
|
# check if compiler was set to nvcc_wrapper
|
||||||
|
kokkos_internal_have_compiler_nvcc(${CMAKE_CXX_COMPILER})
|
||||||
|
# if launcher was found and nvcc_wrapper was not specified as
|
||||||
|
# compiler, set to use launcher. Will ensure CMAKE_CXX_COMPILER
|
||||||
|
# is replaced by nvcc_wrapper
|
||||||
|
IF(Kokkos_COMPILE_LAUNCHER AND NOT INTERNAL_HAVE_COMPILER_NVCC AND NOT KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
||||||
|
# the first argument to launcher is always the C++ compiler defined by cmake
|
||||||
|
# if the second argument matches the C++ compiler, it forwards the rest of the
|
||||||
|
# args to nvcc_wrapper
|
||||||
|
kokkos_internal_have_compiler_nvcc(
|
||||||
|
${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER} ${CMAKE_CXX_COMPILER} -DKOKKOS_DEPENDENCE)
|
||||||
|
SET(INTERNAL_USE_COMPILER_LAUNCHER true)
|
||||||
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF(INTERNAL_HAVE_COMPILER_NVCC)
|
IF(INTERNAL_HAVE_COMPILER_NVCC)
|
||||||
@ -36,6 +54,35 @@ IF(INTERNAL_HAVE_COMPILER_NVCC)
|
|||||||
STRING(SUBSTRING ${TEMP_CXX_COMPILER_VERSION} 1 -1 TEMP_CXX_COMPILER_VERSION)
|
STRING(SUBSTRING ${TEMP_CXX_COMPILER_VERSION} 1 -1 TEMP_CXX_COMPILER_VERSION)
|
||||||
SET(KOKKOS_CXX_COMPILER_VERSION ${TEMP_CXX_COMPILER_VERSION} CACHE STRING INTERNAL FORCE)
|
SET(KOKKOS_CXX_COMPILER_VERSION ${TEMP_CXX_COMPILER_VERSION} CACHE STRING INTERNAL FORCE)
|
||||||
MESSAGE(STATUS "Compiler Version: ${KOKKOS_CXX_COMPILER_VERSION}")
|
MESSAGE(STATUS "Compiler Version: ${KOKKOS_CXX_COMPILER_VERSION}")
|
||||||
|
IF(INTERNAL_USE_COMPILER_LAUNCHER)
|
||||||
|
IF(Kokkos_LAUNCH_COMPILER_INFO)
|
||||||
|
GET_FILENAME_COMPONENT(BASE_COMPILER_NAME ${CMAKE_CXX_COMPILER} NAME)
|
||||||
|
# does not have STATUS intentionally
|
||||||
|
MESSAGE("")
|
||||||
|
MESSAGE("Kokkos_LAUNCH_COMPILER_INFO (${Kokkos_COMPILE_LAUNCHER}):")
|
||||||
|
MESSAGE(" - Kokkos + CUDA backend requires the C++ files to be compiled as CUDA code.")
|
||||||
|
MESSAGE(" - kokkos_launch_compiler permits CMAKE_CXX_COMPILER to be set to a traditional C++ compiler when Kokkos_ENABLE_CUDA=ON")
|
||||||
|
MESSAGE(" by prefixing all the compile and link commands with the path to the script + CMAKE_CXX_COMPILER (${CMAKE_CXX_COMPILER}).")
|
||||||
|
MESSAGE(" - If any of the compile or link commands have CMAKE_CXX_COMPILER as the first argument, it replaces CMAKE_CXX_COMPILER with nvcc_wrapper.")
|
||||||
|
MESSAGE(" - If the compile or link command is not CMAKE_CXX_COMPILER, it just executes the command.")
|
||||||
|
MESSAGE(" - If using ccache, set CMAKE_CXX_COMPILER to nvcc_wrapper explicitly.")
|
||||||
|
MESSAGE(" - kokkos_launch_compiler is available to downstream projects as well.")
|
||||||
|
MESSAGE(" - If CMAKE_CXX_COMPILER=nvcc_wrapper, all legacy behavior will be preserved during 'find_package(Kokkos)'")
|
||||||
|
MESSAGE(" - If CMAKE_CXX_COMPILER is not nvcc_wrapper, 'find_package(Kokkos)' will apply 'kokkos_compilation(GLOBAL)' unless separable compilation is enabled")
|
||||||
|
MESSAGE(" - This can be disabled via '-DKokkos_LAUNCH_COMPILER=OFF'")
|
||||||
|
MESSAGE(" - Use 'find_package(Kokkos COMPONENTS separable_compilation)' to enable separable compilation")
|
||||||
|
MESSAGE(" - Separable compilation allows you to control the scope of where the compiler transformation behavior (${BASE_COMPILER_NAME} -> nvcc_wrapper) is applied")
|
||||||
|
MESSAGE(" - The compiler transformation can be applied on a per-project, per-directory, per-target, and/or per-source-file basis")
|
||||||
|
MESSAGE(" - 'kokkos_compilation(PROJECT)' will apply the compiler transformation to all targets in a project/subproject")
|
||||||
|
MESSAGE(" - 'kokkos_compilation(TARGET <TARGET> [<TARGETS>...])' will apply the compiler transformation to the specified target(s)")
|
||||||
|
MESSAGE(" - 'kokkos_compilation(SOURCE <SOURCE> [<SOURCES>...])' will apply the compiler transformation to the specified source file(s)")
|
||||||
|
MESSAGE(" - 'kokkos_compilation(DIRECTORY <DIR> [<DIRS>...])' will apply the compiler transformation to the specified directories")
|
||||||
|
MESSAGE("")
|
||||||
|
ELSE()
|
||||||
|
MESSAGE(STATUS "kokkos_launch_compiler (${Kokkos_COMPILE_LAUNCHER}) is enabled... Set Kokkos_LAUNCH_COMPILER_INFO=ON for more info.")
|
||||||
|
ENDIF()
|
||||||
|
kokkos_compilation(GLOBAL)
|
||||||
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_HIP)
|
IF(Kokkos_ENABLE_HIP)
|
||||||
@ -90,38 +137,49 @@ IF(KOKKOS_CXX_COMPILER_ID STREQUAL Cray OR KOKKOS_CLANG_IS_CRAY)
|
|||||||
ENDIF()
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Fujitsu)
|
||||||
|
# Set the Fujitsu compiler version, which is not detected by CMake
|
||||||
|
EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} --version
|
||||||
|
OUTPUT_VARIABLE INTERNAL_CXX_COMPILER_VERSION
|
||||||
|
OUTPUT_STRIP_TRAILING_WHITESPACE)
|
||||||
|
|
||||||
|
STRING(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+"
|
||||||
|
TEMP_CXX_COMPILER_VERSION ${INTERNAL_CXX_COMPILER_VERSION})
|
||||||
|
SET(KOKKOS_CXX_COMPILER_VERSION ${TEMP_CXX_COMPILER_VERSION} CACHE STRING INTERNAL FORCE)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
# Enforce the minimum compilers supported by Kokkos.
|
# Enforce the minimum compilers supported by Kokkos.
|
||||||
SET(KOKKOS_MESSAGE_TEXT "Compiler not supported by Kokkos. Required compiler versions:")
|
SET(KOKKOS_MESSAGE_TEXT "Compiler not supported by Kokkos. Required compiler versions:")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang 3.5.2 or higher")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang 4.0.0 or higher")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC 4.8.4 or higher")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC 5.3.0 or higher")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel 15.0.2 or higher")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel 17.0.0 or higher")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC 9.0.69 or higher")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC 9.2.88 or higher")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n HIPCC 3.5.0 or higher")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n HIPCC 3.8.0 or higher")
|
||||||
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n PGI 17.1 or higher\n")
|
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n PGI 17.4 or higher\n")
|
||||||
|
|
||||||
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 3.5.2)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 4.0.0)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
|
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 4.8.4)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 5.3.0)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
|
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 15.0.2)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 17.0.0)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 9.0.69)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 9.2.88)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Kokkos turns off CXX extensions" FORCE)
|
SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Kokkos turns off CXX extensions" FORCE)
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIP)
|
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIP)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 3.5.0)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 3.8.0)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL PGI)
|
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL PGI)
|
||||||
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 17.1)
|
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 17.4)
|
||||||
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
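The version handling in this hunk has two steps: extract a `major.minor.patch` string from the compiler banner, then compare it with `VERSION_LESS`. The following is a minimal standalone sketch of those two steps, not the Kokkos implementation. The banner text, the variable names `BANNER`, `DETECTED_VERSION` and `MINIMUM_VERSION`, and the minimum value are all hypothetical.

```cmake
# Run with: cmake -P check_version.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical banner text. A real build would capture this from
# EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} --version ...).
set(BANNER "example-compiler (hypothetical) 4.2.1 build 20201216")

# Take the first major.minor.patch triple, using the same pattern as the
# Fujitsu branch above.
string(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+" DETECTED_VERSION "${BANNER}")

# Hypothetical minimum, chosen only for this illustration.
set(MINIMUM_VERSION "4.0.0")

# VERSION_LESS compares component by component, so 4.10.0 is newer than 4.9.0.
if(DETECTED_VERSION VERSION_LESS MINIMUM_VERSION)
  message(FATAL_ERROR "Version ${DETECTED_VERSION} is older than ${MINIMUM_VERSION}")
else()
  message(STATUS "Version ${DETECTED_VERSION} satisfies minimum ${MINIMUM_VERSION}")
endif()
```

A plain string comparison would order "4.10.0" before "4.9.0", which is why the checks above use `VERSION_LESS` rather than `STRLESS`.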
|
|||||||
@ -1,4 +1,4 @@
|
|||||||
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND KOKKOS_ENABLE_OPENMP AND NOT KOKKOS_CLANG_IS_CRAY AND NOT "x${CMAKE_CXX_SIMULATE_ID}" STREQUAL "xMSVC")
|
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND KOKKOS_ENABLE_OPENMP AND NOT KOKKOS_CLANG_IS_CRAY AND NOT KOKKOS_COMPILER_CLANG_MSVC)
|
||||||
# The clang "version" doesn't actually tell you what runtimes and tools
|
# The clang "version" doesn't actually tell you what runtimes and tools
|
||||||
# were built into Clang. We should therefore make sure that libomp
|
# were built into Clang. We should therefore make sure that libomp
|
||||||
# was actually built into Clang. Otherwise the user will get nonsensical
|
# was actually built into Clang. Otherwise the user will get nonsensical
|
||||||
|
|||||||
@ -25,6 +25,18 @@ IF (KOKKOS_ENABLE_PTHREAD)
|
|||||||
SET(KOKKOS_ENABLE_THREADS ON)
|
SET(KOKKOS_ENABLE_THREADS ON)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
# detect clang++ / cl / clang-cl clashes
|
||||||
|
IF (CMAKE_CXX_COMPILER_ID STREQUAL Clang AND "x${CMAKE_CXX_SIMULATE_ID}" STREQUAL "xMSVC")
|
||||||
|
# this specific test requires CMake >= 3.15
|
||||||
|
IF ("x${CMAKE_CXX_COMPILER_FRONTEND_VARIANT}" STREQUAL "xGNU")
|
||||||
|
# use pure clang++ instead of clang-cl
|
||||||
|
SET(KOKKOS_COMPILER_CLANG_MSVC OFF)
|
||||||
|
ELSE()
|
||||||
|
# it defaults to clang-cl
|
||||||
|
SET(KOKKOS_COMPILER_CLANG_MSVC ON)
|
||||||
|
ENDIF()
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
IF(Trilinos_ENABLE_Kokkos AND Trilinos_ENABLE_OpenMP)
|
IF(Trilinos_ENABLE_Kokkos AND Trilinos_ENABLE_OpenMP)
|
||||||
SET(OMP_DEFAULT ON)
|
SET(OMP_DEFAULT ON)
|
||||||
ELSE()
|
ELSE()
|
||||||
@ -39,13 +51,16 @@ IF(KOKKOS_ENABLE_OPENMP)
|
|||||||
IF(KOKKOS_CLANG_IS_INTEL)
|
IF(KOKKOS_CLANG_IS_INTEL)
|
||||||
SET(ClangOpenMPFlag -fiopenmp)
|
SET(ClangOpenMPFlag -fiopenmp)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND "x${CMAKE_CXX_SIMULATE_ID}" STREQUAL "xMSVC")
|
IF(KOKKOS_COMPILER_CLANG_MSVC)
|
||||||
#expression /openmp yields error, so add a specific Clang flag
|
#for clang-cl expression /openmp yields an error, so directly add the specific Clang flag
|
||||||
COMPILER_SPECIFIC_OPTIONS(Clang /clang:-fopenmp)
|
SET(ClangOpenMPFlag /clang:-fopenmp=libomp)
|
||||||
#link omp library from LLVM lib dir
|
ENDIF()
|
||||||
|
IF(WIN32 AND CMAKE_CXX_COMPILER_ID STREQUAL Clang)
|
||||||
|
#link omp library from LLVM lib dir, no matter if it is clang-cl or clang++
|
||||||
get_filename_component(LLVM_BIN_DIR ${CMAKE_CXX_COMPILER_AR} DIRECTORY)
|
get_filename_component(LLVM_BIN_DIR ${CMAKE_CXX_COMPILER_AR} DIRECTORY)
|
||||||
COMPILER_SPECIFIC_LIBS(Clang "${LLVM_BIN_DIR}/../lib/libomp.lib")
|
COMPILER_SPECIFIC_LIBS(Clang "${LLVM_BIN_DIR}/../lib/libomp.lib")
|
||||||
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
ENDIF()
|
||||||
|
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
||||||
COMPILER_SPECIFIC_FLAGS(
|
COMPILER_SPECIFIC_FLAGS(
|
||||||
COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
|
COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
|
||||||
Clang -Xcompiler ${ClangOpenMPFlag}
|
Clang -Xcompiler ${ClangOpenMPFlag}
|
||||||
@ -71,7 +86,7 @@ ENDIF()
|
|||||||
|
|
||||||
KOKKOS_DEVICE_OPTION(OPENMPTARGET OFF DEVICE "Whether to build the OpenMP target backend")
|
KOKKOS_DEVICE_OPTION(OPENMPTARGET OFF DEVICE "Whether to build the OpenMP target backend")
|
||||||
IF (KOKKOS_ENABLE_OPENMPTARGET)
|
IF (KOKKOS_ENABLE_OPENMPTARGET)
|
||||||
SET(ClangOpenMPFlag -fopenmp=libomp)
|
SET(ClangOpenMPFlag -fopenmp=libomp)
|
||||||
IF(KOKKOS_CLANG_IS_CRAY)
|
IF(KOKKOS_CLANG_IS_CRAY)
|
||||||
SET(ClangOpenMPFlag -fopenmp)
|
SET(ClangOpenMPFlag -fopenmp)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
@ -105,9 +120,11 @@ KOKKOS_DEVICE_OPTION(CUDA ${CUDA_DEFAULT} DEVICE "Whether to build CUDA backend"
|
|||||||
|
|
||||||
IF (KOKKOS_ENABLE_CUDA)
|
IF (KOKKOS_ENABLE_CUDA)
|
||||||
GLOBAL_SET(KOKKOS_DONT_ALLOW_EXTENSIONS "CUDA enabled")
|
GLOBAL_SET(KOKKOS_DONT_ALLOW_EXTENSIONS "CUDA enabled")
|
||||||
IF(WIN32)
|
IF(WIN32 AND NOT KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
|
||||||
GLOBAL_APPEND(KOKKOS_COMPILE_OPTIONS -x cu)
|
GLOBAL_APPEND(KOKKOS_COMPILE_OPTIONS -x cu)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
## Cuda has extra setup requirements, turn on Kokkos_Setup_Cuda.hpp in macros
|
||||||
|
LIST(APPEND DEVICE_SETUP_LIST Cuda)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
# We want this to default to OFF for cache reasons, but if no
|
# We want this to default to OFF for cache reasons, but if no
|
||||||
@ -128,3 +145,10 @@ KOKKOS_DEVICE_OPTION(SERIAL ${SERIAL_DEFAULT} HOST "Whether to build serial back
|
|||||||
KOKKOS_DEVICE_OPTION(HPX OFF HOST "Whether to build HPX backend (experimental)")
|
KOKKOS_DEVICE_OPTION(HPX OFF HOST "Whether to build HPX backend (experimental)")
|
||||||
|
|
||||||
KOKKOS_DEVICE_OPTION(HIP OFF DEVICE "Whether to build HIP backend")
|
KOKKOS_DEVICE_OPTION(HIP OFF DEVICE "Whether to build HIP backend")
|
||||||
|
|
||||||
|
## HIP has extra setup requirements, turn on Kokkos_Setup_HIP.hpp in macros
|
||||||
|
IF (KOKKOS_ENABLE_HIP)
|
||||||
|
LIST(APPEND DEVICE_SETUP_LIST HIP)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
KOKKOS_DEVICE_OPTION(SYCL OFF DEVICE "Whether to build SYCL backend")
|
||||||
|
|||||||
@ -154,13 +154,13 @@ MACRO(kokkos_export_imported_tpl NAME)
|
|||||||
KOKKOS_APPEND_CONFIG_LINE("SET_TARGET_PROPERTIES(${NAME} PROPERTIES")
|
KOKKOS_APPEND_CONFIG_LINE("SET_TARGET_PROPERTIES(${NAME} PROPERTIES")
|
||||||
GET_TARGET_PROPERTY(TPL_LIBRARY ${NAME} IMPORTED_LOCATION)
|
GET_TARGET_PROPERTY(TPL_LIBRARY ${NAME} IMPORTED_LOCATION)
|
||||||
IF(TPL_LIBRARY)
|
IF(TPL_LIBRARY)
|
||||||
KOKKOS_APPEND_CONFIG_LINE("IMPORTED_LOCATION ${TPL_LIBRARY}")
|
KOKKOS_APPEND_CONFIG_LINE("IMPORTED_LOCATION \"${TPL_LIBRARY}\"")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
GET_TARGET_PROPERTY(TPL_INCLUDES ${NAME} INTERFACE_INCLUDE_DIRECTORIES)
|
GET_TARGET_PROPERTY(TPL_INCLUDES ${NAME} INTERFACE_INCLUDE_DIRECTORIES)
|
||||||
IF(TPL_INCLUDES)
|
IF(TPL_INCLUDES)
|
||||||
KOKKOS_APPEND_CONFIG_LINE("INTERFACE_INCLUDE_DIRECTORIES ${TPL_INCLUDES}")
|
KOKKOS_APPEND_CONFIG_LINE("INTERFACE_INCLUDE_DIRECTORIES \"${TPL_INCLUDES}\"")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
GET_TARGET_PROPERTY(TPL_COMPILE_OPTIONS ${NAME} INTERFACE_COMPILE_OPTIONS)
|
GET_TARGET_PROPERTY(TPL_COMPILE_OPTIONS ${NAME} INTERFACE_COMPILE_OPTIONS)
|
||||||
@ -178,7 +178,7 @@ MACRO(kokkos_export_imported_tpl NAME)
|
|||||||
|
|
||||||
GET_TARGET_PROPERTY(TPL_LINK_LIBRARIES ${NAME} INTERFACE_LINK_LIBRARIES)
|
GET_TARGET_PROPERTY(TPL_LINK_LIBRARIES ${NAME} INTERFACE_LINK_LIBRARIES)
|
||||||
IF(TPL_LINK_LIBRARIES)
|
IF(TPL_LINK_LIBRARIES)
|
||||||
KOKKOS_APPEND_CONFIG_LINE("INTERFACE_LINK_LIBRARIES ${TPL_LINK_LIBRARIES}")
|
KOKKOS_APPEND_CONFIG_LINE("INTERFACE_LINK_LIBRARIES \"${TPL_LINK_LIBRARIES}\"")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
KOKKOS_APPEND_CONFIG_LINE(")")
|
KOKKOS_APPEND_CONFIG_LINE(")")
|
||||||
KOKKOS_APPEND_CONFIG_LINE("ENDIF()")
|
KOKKOS_APPEND_CONFIG_LINE("ENDIF()")
|
||||||
@ -770,7 +770,7 @@ FUNCTION(kokkos_link_tpl TARGET)
|
|||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
|
|
||||||
FUNCTION(COMPILER_SPECIFIC_OPTIONS_HELPER)
|
FUNCTION(COMPILER_SPECIFIC_OPTIONS_HELPER)
|
||||||
SET(COMPILERS NVIDIA PGI XL DEFAULT Cray Intel Clang AppleClang IntelClang GNU HIP)
|
SET(COMPILERS NVIDIA PGI XL DEFAULT Cray Intel Clang AppleClang IntelClang GNU HIP Fujitsu)
|
||||||
CMAKE_PARSE_ARGUMENTS(
|
CMAKE_PARSE_ARGUMENTS(
|
||||||
PARSE
|
PARSE
|
||||||
"LINK_OPTIONS;COMPILE_OPTIONS;COMPILE_DEFINITIONS;LINK_LIBRARIES"
|
"LINK_OPTIONS;COMPILE_OPTIONS;COMPILE_DEFINITIONS;LINK_LIBRARIES"
|
||||||
@ -844,7 +844,6 @@ ENDFUNCTION(COMPILER_SPECIFIC_DEFS)
|
|||||||
FUNCTION(COMPILER_SPECIFIC_LIBS)
|
FUNCTION(COMPILER_SPECIFIC_LIBS)
|
||||||
COMPILER_SPECIFIC_OPTIONS_HELPER(${ARGN} LINK_LIBRARIES)
|
COMPILER_SPECIFIC_OPTIONS_HELPER(${ARGN} LINK_LIBRARIES)
|
||||||
ENDFUNCTION(COMPILER_SPECIFIC_LIBS)
|
ENDFUNCTION(COMPILER_SPECIFIC_LIBS)
|
||||||
|
|
||||||
# Given a list of the form
|
# Given a list of the form
|
||||||
# key1;value1;key2;value2,...
|
# key1;value1;key2;value2,...
|
||||||
# Create a list of all keys in a variable named ${KEY_LIST_NAME}
|
# Create a list of all keys in a variable named ${KEY_LIST_NAME}
|
||||||
@ -877,3 +876,114 @@ FUNCTION(KOKKOS_CHECK_DEPRECATED_OPTIONS)
|
|||||||
ENDIF()
|
ENDIF()
|
||||||
ENDFOREACH()
|
ENDFOREACH()
|
||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
|
|
||||||
|
# this function checks whether the current CXX compiler supports building CUDA
|
||||||
|
FUNCTION(kokkos_cxx_compiler_cuda_test _VAR)
|
||||||
|
# don't run this test every time
|
||||||
|
IF(DEFINED ${_VAR})
|
||||||
|
RETURN()
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
FILE(WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cpp
|
||||||
|
"
|
||||||
|
#include <cuda.h>
|
||||||
|
#include <cstdlib>
|
||||||
|
|
||||||
|
__global__
|
||||||
|
void kernel(int sz, double* data)
|
||||||
|
{
|
||||||
|
auto _beg = blockIdx.x * blockDim.x + threadIdx.x;
|
||||||
|
for(int i = _beg; i < sz; ++i)
|
||||||
|
data[i] += static_cast<double>(i);
|
||||||
|
}
|
||||||
|
|
||||||
|
int main()
|
||||||
|
{
|
||||||
|
double* data = nullptr;
|
||||||
|
int blocks = 64;
|
||||||
|
int grids = 64;
|
||||||
|
auto ret = cudaMalloc(&data, blocks * grids * sizeof(double));
|
||||||
|
if(ret != cudaSuccess)
|
||||||
|
return EXIT_FAILURE;
|
||||||
|
kernel<<<grids, blocks>>>(blocks * grids, data);
|
||||||
|
cudaDeviceSynchronize();
|
||||||
|
return EXIT_SUCCESS;
|
||||||
|
}
|
||||||
|
")
|
||||||
|
|
||||||
|
TRY_COMPILE(_RET
|
||||||
|
${PROJECT_BINARY_DIR}/compile_tests
|
||||||
|
SOURCES ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cpp)
|
||||||
|
|
||||||
|
SET(${_VAR} ${_RET} CACHE STRING "CXX compiler supports building CUDA")
|
||||||
|
ENDFUNCTION()
|
||||||
|
|
||||||
|
# this function is provided to easily select which files use nvcc_wrapper:
|
||||||
|
#
|
||||||
|
# GLOBAL --> all files
|
||||||
|
# TARGET --> all files in a target
|
||||||
|
# SOURCE --> specific source files
|
||||||
|
# DIRECTORY --> all files in directory
|
||||||
|
# PROJECT --> all files/targets in a project/subproject
|
||||||
|
#
|
||||||
|
FUNCTION(kokkos_compilation)
|
||||||
|
# check whether the compiler already supports building CUDA
|
||||||
|
KOKKOS_CXX_COMPILER_CUDA_TEST(Kokkos_CXX_COMPILER_COMPILES_CUDA)
|
||||||
|
# if the CXX compiler can already build CUDA, no launcher is needed: just return
|
||||||
|
IF(Kokkos_CXX_COMPILER_COMPILES_CUDA)
|
||||||
|
RETURN()
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
CMAKE_PARSE_ARGUMENTS(COMP "GLOBAL;PROJECT" "" "DIRECTORY;TARGET;SOURCE" ${ARGN})
|
||||||
|
|
||||||
|
# find kokkos_launch_compiler
|
||||||
|
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
|
||||||
|
NAMES kokkos_launch_compiler
|
||||||
|
HINTS ${PROJECT_SOURCE_DIR}
|
||||||
|
PATHS ${PROJECT_SOURCE_DIR}
|
||||||
|
PATH_SUFFIXES bin)
|
||||||
|
|
||||||
|
IF(NOT Kokkos_COMPILE_LAUNCHER)
|
||||||
|
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
IF(COMP_GLOBAL)
|
||||||
|
# if global, don't bother setting others
|
||||||
|
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
ELSE()
|
||||||
|
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
|
||||||
|
# make project/subproject scoping easy, e.g. kokkos_compilation(PROJECT) after project(...)
|
||||||
|
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
|
||||||
|
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
|
||||||
|
UNSET(COMP_${_TYPE})
|
||||||
|
ENDIF()
|
||||||
|
# set the properties if defined
|
||||||
|
IF(COMP_${_TYPE})
|
||||||
|
# MESSAGE(STATUS "Using nvcc_wrapper :: ${_TYPE} :: ${COMP_${_TYPE}}")
|
||||||
|
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
|
||||||
|
ENDIF()
|
||||||
|
ENDFOREACH()
|
||||||
|
ENDIF()
|
||||||
|
ENDFUNCTION()
|
||||||
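A downstream project might use `kokkos_compilation` as sketched below, based on the help text printed when `Kokkos_LAUNCH_COMPILER_INFO` is set. The project name, target names and source files are hypothetical. `Kokkos::kokkos` is assumed to be the imported target exported by the installed Kokkos package. Whether a given target also needs the launcher at link time depends on the build, so treat this as an illustration only.

```cmake
cmake_minimum_required(VERSION 3.16)
project(MyApp CXX)   # hypothetical downstream project

# Requesting the separable_compilation component keeps find_package from
# applying kokkos_compilation(GLOBAL) automatically.
find_package(Kokkos REQUIRED COMPONENTS separable_compilation)

# Hypothetical targets that contain Kokkos code.
add_library(solver solver.cpp)
target_link_libraries(solver PUBLIC Kokkos::kokkos)

add_executable(app main.cpp)
target_link_libraries(app PRIVATE solver)

# Hypothetical target with no Kokkos code. It keeps the plain CXX compiler.
add_executable(report_tool report_tool.cpp)

# Route only the Kokkos targets through kokkos_launch_compiler.
kokkos_compilation(TARGET solver app)
```

With `kokkos_compilation(PROJECT)` instead, the launcher would apply to every target under the current `PROJECT_SOURCE_DIR`, including `report_tool`.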
|
## KOKKOS_CONFIG_HEADER - parse the data list which is a list of backend names
|
||||||
|
## and create output config header file...used for
|
||||||
|
## creating dynamic include files based on enabled backends
|
||||||
|
##
|
||||||
|
## SRC_FILE is input file
|
||||||
|
## TARGET_FILE output file
|
||||||
|
## HEADER_GUARD TEXT used with include header guard
|
||||||
|
## HEADER_PREFIX prefix used with include (i.e. fwd, decl, setup)
|
||||||
|
## DATA_LIST list of backends to include in generated file
|
||||||
|
FUNCTION(KOKKOS_CONFIG_HEADER SRC_FILE TARGET_FILE HEADER_GUARD HEADER_PREFIX DATA_LIST)
|
||||||
|
SET(HEADER_GUARD_TAG "${HEADER_GUARD}_HPP_")
|
||||||
|
CONFIGURE_FILE(cmake/${SRC_FILE} ${PROJECT_BINARY_DIR}/temp/${TARGET_FILE}.work COPYONLY)
|
||||||
|
FOREACH( BACKEND_NAME ${DATA_LIST} )
|
||||||
|
SET(INCLUDE_NEXT_FILE "#include <${HEADER_PREFIX}_${BACKEND_NAME}.hpp>
|
||||||
|
\@INCLUDE_NEXT_FILE\@")
|
||||||
|
CONFIGURE_FILE(${PROJECT_BINARY_DIR}/temp/${TARGET_FILE}.work ${PROJECT_BINARY_DIR}/temp/${TARGET_FILE}.work @ONLY)
|
||||||
|
ENDFOREACH()
|
||||||
|
SET(INCLUDE_NEXT_FILE "" )
|
||||||
|
CONFIGURE_FILE(${PROJECT_BINARY_DIR}/temp/${TARGET_FILE}.work ${TARGET_FILE} @ONLY)
|
||||||
|
ENDFUNCTION()
|
||||||
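`KOKKOS_CONFIG_HEADER` builds its include list by substituting a placeholder repeatedly, and each substitution puts the placeholder back. The sketch below reproduces that idea in memory with `string(CONFIGURE ... @ONLY)` instead of `CONFIGURE_FILE`, so it can be run as a script. The template text, the `Demo_Fwd` header names and the backend list are hypothetical stand-ins for `SRC_FILE`, `HEADER_PREFIX` and `DATA_LIST`.

```cmake
# Run with: cmake -P include_chain.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical template with a single @INCLUDE_NEXT_FILE@ placeholder.
set(WORK "#ifndef DEMO_HPP_\n#define DEMO_HPP_\n\@INCLUDE_NEXT_FILE\@\n#endif\n")
set(BACKENDS SERIAL OPENMP)

foreach(BACKEND_NAME ${BACKENDS})
  # The replacement ends with the placeholder again, so the next pass
  # appends its include line after this one.
  set(INCLUDE_NEXT_FILE "#include <fwd/Demo_Fwd_${BACKEND_NAME}.hpp>\n\@INCLUDE_NEXT_FILE\@")
  string(CONFIGURE "${WORK}" WORK @ONLY)
endforeach()

# Final pass: an empty value removes the leftover placeholder.
set(INCLUDE_NEXT_FILE "")
string(CONFIGURE "${WORK}" WORK @ONLY)

message("${WORK}")
```

The `@ONLY` option matters here: without it, any `${...}` text in the template would also be expanded on every pass.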
|
|||||||
@ -1,19 +1,17 @@
|
|||||||
# From CMake 3.10 documentation
|
# From CMake 3.10 documentation
|
||||||
|
|
||||||
#This can run at any time
|
#This can run at any time
|
||||||
KOKKOS_OPTION(CXX_STANDARD "" STRING "The C++ standard for Kokkos to use: 11, 14, 17, or 20. If empty, this will default to CMAKE_CXX_STANDARD. If both CMAKE_CXX_STANDARD and Kokkos_CXX_STANDARD are empty, this will default to 11")
|
KOKKOS_OPTION(CXX_STANDARD "" STRING "The C++ standard for Kokkos to use: 14, 17, or 20. If empty, this will default to CMAKE_CXX_STANDARD. If both CMAKE_CXX_STANDARD and Kokkos_CXX_STANDARD are empty, this will default to 14")
|
||||||
|
|
||||||
# Set CXX standard flags
|
# Set CXX standard flags
|
||||||
SET(KOKKOS_ENABLE_CXX11 OFF)
|
|
||||||
SET(KOKKOS_ENABLE_CXX14 OFF)
|
SET(KOKKOS_ENABLE_CXX14 OFF)
|
||||||
SET(KOKKOS_ENABLE_CXX17 OFF)
|
SET(KOKKOS_ENABLE_CXX17 OFF)
|
||||||
SET(KOKKOS_ENABLE_CXX20 OFF)
|
SET(KOKKOS_ENABLE_CXX20 OFF)
|
||||||
IF (KOKKOS_CXX_STANDARD)
|
IF (KOKKOS_CXX_STANDARD)
|
||||||
IF (${KOKKOS_CXX_STANDARD} STREQUAL "c++98")
|
IF (${KOKKOS_CXX_STANDARD} STREQUAL "c++98")
|
||||||
MESSAGE(FATAL_ERROR "Kokkos no longer supports C++98 - minimum C++11")
|
MESSAGE(FATAL_ERROR "Kokkos no longer supports C++98 - minimum C++14")
|
||||||
ELSEIF (${KOKKOS_CXX_STANDARD} STREQUAL "c++11")
|
ELSEIF (${KOKKOS_CXX_STANDARD} STREQUAL "c++11")
|
||||||
MESSAGE(WARNING "Deprecated Kokkos C++ standard set as 'c++11'. Use '11' instead.")
|
MESSAGE(FATAL_ERROR "Kokkos no longer supports C++11 - minimum C++14")
|
||||||
SET(KOKKOS_CXX_STANDARD "11")
|
|
||||||
ELSEIF(${KOKKOS_CXX_STANDARD} STREQUAL "c++14")
|
ELSEIF(${KOKKOS_CXX_STANDARD} STREQUAL "c++14")
|
||||||
MESSAGE(WARNING "Deprecated Kokkos C++ standard set as 'c++14'. Use '14' instead.")
|
MESSAGE(WARNING "Deprecated Kokkos C++ standard set as 'c++14'. Use '14' instead.")
|
||||||
SET(KOKKOS_CXX_STANDARD "14")
|
SET(KOKKOS_CXX_STANDARD "14")
|
||||||
@ -33,8 +31,8 @@ IF (KOKKOS_CXX_STANDARD)
|
|||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF (NOT KOKKOS_CXX_STANDARD AND NOT CMAKE_CXX_STANDARD)
|
IF (NOT KOKKOS_CXX_STANDARD AND NOT CMAKE_CXX_STANDARD)
|
||||||
MESSAGE(STATUS "Setting default Kokkos CXX standard to 11")
|
MESSAGE(STATUS "Setting default Kokkos CXX standard to 14")
|
||||||
SET(KOKKOS_CXX_STANDARD "11")
|
SET(KOKKOS_CXX_STANDARD "14")
|
||||||
ELSEIF(NOT KOKKOS_CXX_STANDARD)
|
ELSEIF(NOT KOKKOS_CXX_STANDARD)
|
||||||
MESSAGE(STATUS "Setting default Kokkos CXX standard to ${CMAKE_CXX_STANDARD}")
|
MESSAGE(STATUS "Setting default Kokkos CXX standard to ${CMAKE_CXX_STANDARD}")
|
||||||
SET(KOKKOS_CXX_STANDARD ${CMAKE_CXX_STANDARD})
|
SET(KOKKOS_CXX_STANDARD ${CMAKE_CXX_STANDARD})
|
||||||
|
|||||||
@ -29,7 +29,7 @@ FUNCTION(kokkos_set_cxx_standard_feature standard)
|
|||||||
ELSEIF(NOT KOKKOS_USE_CXX_EXTENSIONS AND ${STANDARD_NAME})
|
ELSEIF(NOT KOKKOS_USE_CXX_EXTENSIONS AND ${STANDARD_NAME})
|
||||||
MESSAGE(STATUS "Using ${${STANDARD_NAME}} for C++${standard} standard as feature")
|
MESSAGE(STATUS "Using ${${STANDARD_NAME}} for C++${standard} standard as feature")
|
||||||
IF (KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA AND (KOKKOS_CXX_HOST_COMPILER_ID STREQUAL GNU OR KOKKOS_CXX_HOST_COMPILER_ID STREQUAL Clang))
|
IF (KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA AND (KOKKOS_CXX_HOST_COMPILER_ID STREQUAL GNU OR KOKKOS_CXX_HOST_COMPILER_ID STREQUAL Clang))
|
||||||
SET(SUPPORTED_NVCC_FLAGS "-std=c++11;-std=c++14;-std=c++17")
|
SET(SUPPORTED_NVCC_FLAGS "-std=c++14;-std=c++17")
|
||||||
IF (NOT ${${STANDARD_NAME}} IN_LIST SUPPORTED_NVCC_FLAGS)
|
IF (NOT ${${STANDARD_NAME}} IN_LIST SUPPORTED_NVCC_FLAGS)
|
||||||
MESSAGE(FATAL_ERROR "CMake wants to use ${${STANDARD_NAME}} which is not supported by NVCC. Using a more recent host compiler or a more recent CMake version might help.")
|
MESSAGE(FATAL_ERROR "CMake wants to use ${${STANDARD_NAME}} which is not supported by NVCC. Using a more recent host compiler or a more recent CMake version might help.")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
@ -42,13 +42,16 @@ FUNCTION(kokkos_set_cxx_standard_feature standard)
|
|||||||
ELSEIF((KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA") AND WIN32)
|
ELSEIF((KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA") AND WIN32)
|
||||||
MESSAGE(STATUS "Using no flag for C++${standard} standard as feature")
|
MESSAGE(STATUS "Using no flag for C++${standard} standard as feature")
|
||||||
GLOBAL_SET(KOKKOS_CXX_STANDARD_FEATURE "")
|
GLOBAL_SET(KOKKOS_CXX_STANDARD_FEATURE "")
|
||||||
|
ELSEIF((KOKKOS_CXX_COMPILER_ID STREQUAL "Fujitsu"))
|
||||||
|
MESSAGE(STATUS "Using no flag for C++${standard} standard as feature")
|
||||||
|
GLOBAL_SET(KOKKOS_CXX_STANDARD_FEATURE "")
|
||||||
ELSE()
|
ELSE()
|
||||||
#nope, we can't do anything here
|
#nope, we can't do anything here
|
||||||
MESSAGE(WARNING "C++${standard} is not supported as a compiler feature. We will choose custom flags for now, but this behavior has been deprecated. Please open an issue at https://github.com/kokkos/kokkos/issues reporting that ${KOKKOS_CXX_COMPILER_ID} ${KOKKOS_CXX_COMPILER_VERSION} failed for ${KOKKOS_CXX_STANDARD}, preferrably including your CMake command.")
|
MESSAGE(WARNING "C++${standard} is not supported as a compiler feature. We will choose custom flags for now, but this behavior has been deprecated. Please open an issue at https://github.com/kokkos/kokkos/issues reporting that ${KOKKOS_CXX_COMPILER_ID} ${KOKKOS_CXX_COMPILER_VERSION} failed for ${KOKKOS_CXX_STANDARD}, preferably including your CMake command.")
|
||||||
GLOBAL_SET(KOKKOS_CXX_STANDARD_FEATURE "")
|
GLOBAL_SET(KOKKOS_CXX_STANDARD_FEATURE "")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
IF(NOT WIN32)
|
IF((NOT WIN32) AND (NOT ("${KOKKOS_CXX_COMPILER_ID}" STREQUAL "Fujitsu")))
|
||||||
IF(NOT ${FEATURE_NAME} IN_LIST CMAKE_CXX_COMPILE_FEATURES)
|
IF(NOT ${FEATURE_NAME} IN_LIST CMAKE_CXX_COMPILE_FEATURES)
|
||||||
MESSAGE(FATAL_ERROR "Compiler ${KOKKOS_CXX_COMPILER_ID} should support ${FEATURE_NAME}, but CMake reports feature not supported")
|
MESSAGE(FATAL_ERROR "Compiler ${KOKKOS_CXX_COMPILER_ID} should support ${FEATURE_NAME}, but CMake reports feature not supported")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
@ -65,11 +68,7 @@ IF (KOKKOS_CXX_STANDARD AND CMAKE_CXX_STANDARD)
|
|||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
|
||||||
IF (KOKKOS_CXX_STANDARD STREQUAL "11" )
|
IF(KOKKOS_CXX_STANDARD STREQUAL "14")
|
||||||
kokkos_set_cxx_standard_feature(11)
|
|
||||||
SET(KOKKOS_ENABLE_CXX11 ON)
|
|
||||||
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD "11")
|
|
||||||
ELSEIF(KOKKOS_CXX_STANDARD STREQUAL "14")
|
|
||||||
kokkos_set_cxx_standard_feature(14)
|
kokkos_set_cxx_standard_feature(14)
|
||||||
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD "1Y")
|
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD "1Y")
|
||||||
SET(KOKKOS_ENABLE_CXX14 ON)
|
SET(KOKKOS_ENABLE_CXX14 ON)
|
||||||
@ -81,21 +80,21 @@ ELSEIF(KOKKOS_CXX_STANDARD STREQUAL "20")
|
|||||||
kokkos_set_cxx_standard_feature(20)
|
kokkos_set_cxx_standard_feature(20)
|
||||||
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD "2A")
|
SET(KOKKOS_CXX_INTERMEDIATE_STANDARD "2A")
|
||||||
SET(KOKKOS_ENABLE_CXX20 ON)
|
SET(KOKKOS_ENABLE_CXX20 ON)
|
||||||
ELSEIF(KOKKOS_CXX_STANDARD STREQUAL "98")
|
ELSEIF(KOKKOS_CXX_STANDARD STREQUAL "98" OR KOKKOS_CXX_STANDARD STREQUAL "11")
|
||||||
MESSAGE(FATAL_ERROR "Kokkos requires C++11 or newer!")
|
MESSAGE(FATAL_ERROR "Kokkos requires C++14 or newer!")
|
||||||
ELSE()
|
ELSE()
|
||||||
MESSAGE(FATAL_ERROR "Unknown C++ standard ${KOKKOS_CXX_STANDARD} - must be 11, 14, 17, or 20")
|
MESSAGE(FATAL_ERROR "Unknown C++ standard ${KOKKOS_CXX_STANDARD} - must be 14, 17, or 20")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
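The accepted values for the C++ standard after this change can be summarized in a small standalone script. This is a simplified sketch, not the Kokkos logic: it folds the `c++NN` spelling and the numeric spelling into one check, and the variable names `REQUESTED`, `STD` and `SUPPORTED` are hypothetical.

```cmake
# Run with: cmake -P normalize_std.cmake
# or:       cmake -DREQUESTED=c++17 -P normalize_std.cmake
cmake_minimum_required(VERSION 3.10)

if(NOT DEFINED REQUESTED)
  set(REQUESTED "c++14")  # hypothetical user input
endif()

# Strip a leading "c++" so that "c++14" and "14" are treated alike.
string(REGEX REPLACE "^c\\+\\+" "" STD "${REQUESTED}")

set(SUPPORTED 14 17 20)
if(STD IN_LIST SUPPORTED)
  message(STATUS "Using C++${STD}")
elseif(STD STREQUAL "98" OR STD STREQUAL "11")
  message(FATAL_ERROR "C++${STD} is too old, minimum is C++14")
else()
  message(FATAL_ERROR "Unknown C++ standard ${REQUESTED} - must be 14, 17, or 20")
endif()
```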
|
|
||||||
|
|
||||||
|
|
||||||
# Enforce that extensions are turned off for nvcc_wrapper.
|
# Enforce that extensions are turned off for nvcc_wrapper.
|
||||||
# For compiling CUDA code using nvcc_wrapper, we will use the host compiler's
|
# For compiling CUDA code using nvcc_wrapper, we will use the host compiler's
|
||||||
# flags for turning on C++11. Since for compiler ID and versioning purposes
|
# flags for turning on C++14. Since for compiler ID and versioning purposes
|
||||||
# CMake recognizes the host compiler when calling nvcc_wrapper, this just
|
# CMake recognizes the host compiler when calling nvcc_wrapper, this just
|
||||||
# works. Both NVCC and nvcc_wrapper only recognize '-std=c++11' which means
|
# works. Both NVCC and nvcc_wrapper only recognize '-std=c++14' which means
|
||||||
# that we can only use host compilers for CUDA builds that use those flags.
|
# that we can only use host compilers for CUDA builds that use those flags.
|
||||||
# It also means that extensions (gnu++11) can't be turned on for CUDA builds.
|
# It also means that extensions (gnu++14) can't be turned on for CUDA builds.
|
||||||
|
|
||||||
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
||||||
IF(NOT DEFINED CMAKE_CXX_EXTENSIONS)
|
IF(NOT DEFINED CMAKE_CXX_EXTENSIONS)
|
||||||
@ -117,7 +116,7 @@ IF(KOKKOS_ENABLE_CUDA)
|
|||||||
MESSAGE(FATAL_ERROR "Compiling CUDA code with clang doesn't support C++ extensions. Set -DCMAKE_CXX_EXTENSIONS=OFF")
|
MESSAGE(FATAL_ERROR "Compiling CUDA code with clang doesn't support C++ extensions. Set -DCMAKE_CXX_EXTENSIONS=OFF")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ELSEIF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
ELSEIF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
|
||||||
MESSAGE(FATAL_ERROR "Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang, but compiler ID was ${KOKKOS_CXX_COMPILER_ID}")
|
MESSAGE(FATAL_ERROR "Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang or use kokkos_launch_compiler, but compiler ID was ${KOKKOS_CXX_COMPILER_ID}")
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
|||||||
@ -76,3 +76,7 @@ STRING(REPLACE ";" "\n" KOKKOS_TPL_EXPORT_TEMP "${KOKKOS_TPL_EXPORTS}")
|
|||||||
#Convert to a regular variable
|
#Convert to a regular variable
|
||||||
UNSET(KOKKOS_TPL_EXPORTS CACHE)
|
UNSET(KOKKOS_TPL_EXPORTS CACHE)
|
||||||
SET(KOKKOS_TPL_EXPORTS ${KOKKOS_TPL_EXPORT_TEMP})
|
SET(KOKKOS_TPL_EXPORTS ${KOKKOS_TPL_EXPORT_TEMP})
|
||||||
|
IF (KOKKOS_ENABLE_MEMKIND)
|
||||||
|
SET(KOKKOS_ENABLE_HBWSPACE)
|
||||||
|
LIST(APPEND KOKKOS_MEMSPACE_LIST HBWSpace)
|
||||||
|
ENDIF()
|
||||||
|
|||||||
@ -6,6 +6,12 @@ INCLUDE(GNUInstallDirs)
|
|||||||
|
|
||||||
MESSAGE(STATUS "The project name is: ${PROJECT_NAME}")
|
MESSAGE(STATUS "The project name is: ${PROJECT_NAME}")
|
||||||
|
|
||||||
|
FUNCTION(VERIFY_EMPTY CONTEXT)
|
||||||
|
if(${ARGN})
|
||||||
|
MESSAGE(FATAL_ERROR "Kokkos does not support all of Tribits. Unhandled arguments in ${CONTEXT}:\n${ARGN}")
|
||||||
|
endif()
|
||||||
|
ENDFUNCTION()
|
||||||
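The purpose of `VERIFY_EMPTY` is to reject arguments that no keyword claimed. A minimal sketch of that pattern with `cmake_parse_arguments` follows. The function name `demo_add_test` and its arguments are hypothetical, and the sketch tests `PARSE_UNPARSED_ARGUMENTS` directly rather than calling a helper.

```cmake
# Run with: cmake -P parse_args.cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper that accepts three multi-value keywords.
function(demo_add_test ROOT_NAME)
  cmake_parse_arguments(PARSE "" "" "SOURCES;CATEGORIES;ARGS" ${ARGN})
  # Anything that matched no keyword ends up in PARSE_UNPARSED_ARGUMENTS.
  if(PARSE_UNPARSED_ARGUMENTS)
    message(FATAL_ERROR "Unhandled arguments in ${ROOT_NAME}: ${PARSE_UNPARSED_ARGUMENTS}")
  endif()
  message(STATUS "${ROOT_NAME}: sources=${PARSE_SOURCES} args=${PARSE_ARGS}")
endfunction()

# All arguments are claimed by a keyword, so this call passes the check.
demo_add_test(unit SOURCES a.cpp b.cpp ARGS --fast)
```

Placing a stray value before the first keyword, for example `demo_add_test(unit stray.cpp SOURCES a.cpp)`, would trigger the fatal error.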
|
|
||||||
#Leave this here for now - but only do for tribits
|
#Leave this here for now - but only do for tribits
|
||||||
#This breaks the standalone CMake
|
#This breaks the standalone CMake
|
||||||
IF (KOKKOS_HAS_TRILINOS)
|
IF (KOKKOS_HAS_TRILINOS)
|
||||||
@ -135,28 +141,37 @@ FUNCTION(KOKKOS_ADD_EXECUTABLE ROOT_NAME)
|
|||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
|
|
||||||
FUNCTION(KOKKOS_ADD_EXECUTABLE_AND_TEST ROOT_NAME)
|
FUNCTION(KOKKOS_ADD_EXECUTABLE_AND_TEST ROOT_NAME)
|
||||||
|
CMAKE_PARSE_ARGUMENTS(PARSE
|
||||||
|
""
|
||||||
|
""
|
||||||
|
"SOURCES;CATEGORIES;ARGS"
|
||||||
|
${ARGN})
|
||||||
|
VERIFY_EMPTY(KOKKOS_ADD_EXECUTABLE_AND_TEST ${PARSE_UNPARSED_ARGUMENTS})
|
||||||
|
|
||||||
IF (KOKKOS_HAS_TRILINOS)
|
IF (KOKKOS_HAS_TRILINOS)
|
||||||
|
IF(DEFINED PARSE_ARGS)
|
||||||
|
STRING(REPLACE ";" " " PARSE_ARGS "${PARSE_ARGS}")
|
||||||
|
ENDIF()
|
||||||
TRIBITS_ADD_EXECUTABLE_AND_TEST(
|
TRIBITS_ADD_EXECUTABLE_AND_TEST(
|
||||||
${ROOT_NAME}
|
${ROOT_NAME}
|
||||||
|
SOURCES ${PARSE_SOURCES}
|
||||||
TESTONLYLIBS kokkos_gtest
|
TESTONLYLIBS kokkos_gtest
|
||||||
${ARGN}
|
|
||||||
NUM_MPI_PROCS 1
|
NUM_MPI_PROCS 1
|
||||||
COMM serial mpi
|
COMM serial mpi
|
||||||
|
ARGS ${PARSE_ARGS}
|
||||||
|
CATEGORIES ${PARSE_CATEGORIES}
|
||||||
|
SOURCES ${PARSE_SOURCES}
|
||||||
FAIL_REGULAR_EXPRESSION " FAILED "
|
FAIL_REGULAR_EXPRESSION " FAILED "
|
||||||
|
ARGS ${PARSE_ARGS}
|
||||||
)
|
)
|
||||||
ELSE()
|
ELSE()
|
||||||
CMAKE_PARSE_ARGUMENTS(PARSE
|
|
||||||
""
|
|
||||||
""
|
|
||||||
"SOURCES;CATEGORIES"
|
|
||||||
${ARGN})
|
|
||||||
VERIFY_EMPTY(KOKKOS_ADD_EXECUTABLE_AND_TEST ${PARSE_UNPARSED_ARGUMENTS})
|
|
||||||
KOKKOS_ADD_TEST_EXECUTABLE(${ROOT_NAME}
|
KOKKOS_ADD_TEST_EXECUTABLE(${ROOT_NAME}
|
||||||
SOURCES ${PARSE_SOURCES}
|
SOURCES ${PARSE_SOURCES}
|
||||||
)
|
)
|
||||||
KOKKOS_ADD_TEST(NAME ${ROOT_NAME}
|
KOKKOS_ADD_TEST(NAME ${ROOT_NAME}
|
||||||
EXE ${ROOT_NAME}
|
EXE ${ROOT_NAME}
|
||||||
FAIL_REGULAR_EXPRESSION " FAILED "
|
FAIL_REGULAR_EXPRESSION " FAILED "
|
||||||
|
ARGS ${PARSE_ARGS}
|
||||||
)
|
)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
ENDFUNCTION()
|
ENDFUNCTION()
|
||||||
@ -219,6 +234,7 @@ MACRO(KOKKOS_ADD_TEST_EXECUTABLE ROOT_NAME)
|
|||||||
${PARSE_UNPARSED_ARGUMENTS}
|
${PARSE_UNPARSED_ARGUMENTS}
|
||||||
TESTONLYLIBS kokkos_gtest
|
TESTONLYLIBS kokkos_gtest
|
||||||
)
|
)
|
||||||
|
SET(EXE_NAME ${PACKAGE_NAME}_${ROOT_NAME})
|
||||||
ENDMACRO()
|
ENDMACRO()
|
||||||
|
|
||||||
MACRO(KOKKOS_PACKAGE_POSTPROCESS)
|
MACRO(KOKKOS_PACKAGE_POSTPROCESS)
|
||||||
@ -227,6 +243,79 @@ MACRO(KOKKOS_PACKAGE_POSTPROCESS)
|
|||||||
endif()
|
endif()
|
||||||
ENDMACRO()
|
ENDMACRO()
|
||||||
|
|
||||||
|
## KOKKOS_CONFIGURE_CORE Configure/Generate header files for core content based
|
||||||
|
## on enabled backends.
|
||||||
|
## KOKKOS_FWD is the forward declare set
|
||||||
|
## KOKKOS_SETUP is included in Kokkos_Macros.hpp and include prefix includes/defines
|
||||||
|
## KOKKOS_DECLARE is the declaration set
|
||||||
|
## KOKKOS_POST_INCLUDE is included at the end of Kokkos_Core.hpp
|
||||||
|
MACRO(KOKKOS_CONFIGURE_CORE)
|
||||||
|
SET(FWD_BACKEND_LIST)
|
||||||
|
FOREACH(MEMSPACE ${KOKKOS_MEMSPACE_LIST})
|
||||||
|
LIST(APPEND FWD_BACKEND_LIST ${MEMSPACE})
|
||||||
|
ENDFOREACH()
|
||||||
|
FOREACH(BACKEND_ ${KOKKOS_ENABLED_DEVICES})
|
||||||
|
IF( ${BACKEND_} STREQUAL "PTHREAD")
|
||||||
|
LIST(APPEND FWD_BACKEND_LIST THREADS)
|
||||||
|
ELSE()
|
||||||
|
LIST(APPEND FWD_BACKEND_LIST ${BACKEND_})
|
||||||
|
ENDIF()
|
||||||
|
ENDFOREACH()
|
||||||
|
MESSAGE(STATUS "Kokkos Devices: ${KOKKOS_ENABLED_DEVICES}, Kokkos Backends: ${FWD_BACKEND_LIST}")
|
||||||
|
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_FwdBackend.hpp "KOKKOS_FWD" "fwd/Kokkos_Fwd" "${FWD_BACKEND_LIST}")
|
||||||
|
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_SetupBackend.hpp "KOKKOS_SETUP" "setup/Kokkos_Setup" "${DEVICE_SETUP_LIST}")
|
||||||
|
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_DeclareBackend.hpp "KOKKOS_DECLARE" "decl/Kokkos_Declare" "${FWD_BACKEND_LIST}")
|
||||||
|
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_PostInclude.hpp "KOKKOS_POST_INCLUDE" "Kokkos_Post_Include" "${KOKKOS_BACKEND_POST_INCLUDE_LIST}")
|
||||||
|
SET(_DEFAULT_HOST_MEMSPACE "::Kokkos::HostSpace")
|
||||||
|
KOKKOS_OPTION(DEFAULT_DEVICE_MEMORY_SPACE "" STRING "Override default device memory space")
|
||||||
|
KOKKOS_OPTION(DEFAULT_HOST_MEMORY_SPACE "" STRING "Override default host memory space")
|
||||||
|
KOKKOS_OPTION(DEFAULT_DEVICE_EXECUTION_SPACE "" STRING "Override default device execution space")
|
||||||
|
KOKKOS_OPTION(DEFAULT_HOST_PARALLEL_EXECUTION_SPACE "" STRING "Override default host parallel execution space")
|
||||||
|
IF (NOT Kokkos_DEFAULT_DEVICE_EXECUTION_SPACE STREQUAL "")
|
||||||
|
SET(_DEVICE_PARALLEL ${Kokkos_DEFAULT_DEVICE_EXECUTION_SPACE})
|
||||||
|
MESSAGE(STATUS "Override default device execution space: ${_DEVICE_PARALLEL}")
|
||||||
|
SET(KOKKOS_DEVICE_SPACE_ACTIVE ON)
|
||||||
|
ELSE()
|
||||||
|
IF (_DEVICE_PARALLEL STREQUAL "NoTypeDefined")
|
||||||
|
SET(KOKKOS_DEVICE_SPACE_ACTIVE OFF)
|
||||||
|
ELSE()
|
||||||
|
SET(KOKKOS_DEVICE_SPACE_ACTIVE ON)
|
||||||
|
ENDIF()
|
||||||
|
ENDIF()
|
||||||
|
IF (NOT Kokkos_DEFAULT_HOST_PARALLEL_EXECUTION_SPACE STREQUAL "")
|
||||||
|
SET(_HOST_PARALLEL ${Kokkos_DEFAULT_HOST_PARALLEL_EXECUTION_SPACE})
|
||||||
|
MESSAGE(STATUS "Override default host parallel execution space: ${_HOST_PARALLEL}")
|
||||||
|
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE ON)
|
||||||
|
ELSE()
|
||||||
|
IF (_HOST_PARALLEL STREQUAL "NoTypeDefined")
|
||||||
|
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE OFF)
|
||||||
|
ELSE()
|
||||||
|
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE ON)
|
||||||
|
ENDIF()
|
||||||
|
ENDIF()
|
||||||
|
#We are ready to configure the header
|
||||||
|
CONFIGURE_FILE(cmake/KokkosCore_config.h.in KokkosCore_config.h @ONLY)
|
||||||
|
ENDMACRO()
|
||||||
|
|
||||||
|
## KOKKOS_INSTALL_ADDITIONAL_FILES - instruct cmake to install files in target destination.
|
||||||
|
## Includes generated header files, scripts such as nvcc_wrapper and hpcbind,
|
||||||
|
## as well as other files provided through plugins.
|
||||||
|
MACRO(KOKKOS_INSTALL_ADDITIONAL_FILES)
|
||||||
|
# kokkos_launch_compiler is used by Kokkos to prefix compiler commands so that they forward to nvcc_wrapper
|
||||||
|
INSTALL(PROGRAMS
|
||||||
|
"${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper"
|
||||||
|
"${CMAKE_CURRENT_SOURCE_DIR}/bin/hpcbind"
|
||||||
|
"${CMAKE_CURRENT_SOURCE_DIR}/bin/kokkos_launch_compiler"
|
||||||
|
DESTINATION ${CMAKE_INSTALL_BINDIR})
|
||||||
|
INSTALL(FILES
|
||||||
|
"${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_config.h"
|
||||||
|
"${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_FwdBackend.hpp"
|
||||||
|
"${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_SetupBackend.hpp"
|
||||||
|
"${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_DeclareBackend.hpp"
|
||||||
|
"${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_PostInclude.hpp"
|
||||||
|
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
|
||||||
|
ENDMACRO()
|
||||||
|
|
||||||
FUNCTION(KOKKOS_SET_LIBRARY_PROPERTIES LIBRARY_NAME)
|
FUNCTION(KOKKOS_SET_LIBRARY_PROPERTIES LIBRARY_NAME)
|
||||||
CMAKE_PARSE_ARGUMENTS(PARSE
|
CMAKE_PARSE_ARGUMENTS(PARSE
|
||||||
"PLAIN_STYLE"
|
"PLAIN_STYLE"
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -1,14 +1,16 @@
|
|||||||
# @HEADER
|
# @HEADER
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
#
|
#
|
||||||
# Trilinos: An Object-Oriented Solver Framework
|
# Kokkos v. 3.0
|
||||||
# Copyright (2001) Sandia Corporation
|
# Copyright (2020) National Technology & Engineering
|
||||||
|
# Solutions of Sandia, LLC (NTESS).
|
||||||
#
|
#
|
||||||
|
# Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
# the U.S. Government retains certain rights in this software.
|
||||||
#
|
#
|
||||||
# Copyright (2001) Sandia Corporation. Under the terms of Contract
|
# Redistribution and use in source and binary forms, with or without
|
||||||
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
|
# modification, are permitted provided that the following conditions are
|
||||||
# work by or on behalf of the U.S. Government. Export of this program
|
# met:
|
||||||
# may require a license from the United States Government.
|
|
||||||
#
|
#
|
||||||
# 1. Redistributions of source code must retain the above copyright
|
# 1. Redistributions of source code must retain the above copyright
|
||||||
# notice, this list of conditions and the following disclaimer.
|
# notice, this list of conditions and the following disclaimer.
|
||||||
@ -21,10 +23,10 @@
|
|||||||
# contributors may be used to endorse or promote products derived from
|
# contributors may be used to endorse or promote products derived from
|
||||||
# this software without specific prior written permission.
|
# this software without specific prior written permission.
|
||||||
#
|
#
|
||||||
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
# THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
@ -33,22 +35,7 @@
|
|||||||
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
#
|
#
|
||||||
# NOTICE: The United States Government is granted for itself and others
|
# Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
|
|
||||||
# license in this data to reproduce, prepare derivative works, and
|
|
||||||
# perform publicly and display publicly. Beginning five (5) years from
|
|
||||||
# July 25, 2001, the United States Government is granted for itself and
|
|
||||||
# others acting on its behalf a paid-up, nonexclusive, irrevocable
|
|
||||||
# worldwide license in this data to reproduce, prepare derivative works,
|
|
||||||
# distribute copies to the public, perform publicly and display
|
|
||||||
# publicly, and to permit others to do so.
|
|
||||||
#
|
|
||||||
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
|
|
||||||
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
|
|
||||||
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
|
|
||||||
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
|
|
||||||
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
|
|
||||||
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
|
|
||||||
#
|
#
|
||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|||||||
@ -3,44 +3,26 @@ KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
|||||||
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_CUDA)
|
foreach(Tag Threads;OpenMP;Cuda;HPX;HIP)
|
||||||
SET(SOURCES
|
# Because there is always an exception to the rule
|
||||||
TestMain.cpp
|
if(Tag STREQUAL "Threads")
|
||||||
TestCuda.cpp
|
set(DEVICE "PTHREAD")
|
||||||
)
|
else()
|
||||||
|
string(TOUPPER ${Tag} DEVICE)
|
||||||
|
endif()
|
||||||
|
string(TOLOWER ${Tag} dir)
|
||||||
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST( PerformanceTest_Cuda
|
if(Kokkos_ENABLE_${DEVICE})
|
||||||
SOURCES ${SOURCES}
|
message(STATUS "Sources Test${Tag}.cpp")
|
||||||
)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_PTHREAD)
|
set(SOURCES
|
||||||
SET(SOURCES
|
TestMain.cpp
|
||||||
TestMain.cpp
|
Test${Tag}.cpp
|
||||||
TestThreads.cpp
|
)
|
||||||
)
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST( PerformanceTest_Threads
|
|
||||||
SOURCES ${SOURCES}
|
|
||||||
)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_OPENMP)
|
|
||||||
SET(SOURCES
|
|
||||||
TestMain.cpp
|
|
||||||
TestOpenMP.cpp
|
|
||||||
)
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST( PerformanceTest_OpenMP
|
|
||||||
SOURCES ${SOURCES}
|
|
||||||
)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_HPX)
|
|
||||||
SET(SOURCES
|
|
||||||
TestMain.cpp
|
|
||||||
TestHPX.cpp
|
|
||||||
)
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST( PerformanceTest_HPX
|
|
||||||
SOURCES ${SOURCES}
|
|
||||||
)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
||||||
|
PerformanceTest_${Tag}
|
||||||
|
SOURCES ${SOURCES}
|
||||||
|
)
|
||||||
|
endif()
|
||||||
|
endforeach()
|
||||||
|
|||||||
@ -58,8 +58,8 @@ endif
|
|||||||
KokkosContainers_PerformanceTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
|
KokkosContainers_PerformanceTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Cuda
|
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Cuda
|
||||||
|
|
||||||
KokkosContainers_PerformanceTest_ROCm: $(OBJ_ROCM) $(KOKKOS_LINK_DEPENDS)
|
KokkosContainers_PerformanceTest_HIP: $(OBJ_HIP) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ROCM) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_ROCm
|
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_HIP) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_HIP
|
||||||
|
|
||||||
KokkosContainers_PerformanceTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
|
KokkosContainers_PerformanceTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Threads
|
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Threads
|
||||||
@ -73,8 +73,8 @@ KokkosContainers_PerformanceTest_HPX: $(OBJ_HPX) $(KOKKOS_LINK_DEPENDS)
|
|||||||
test-cuda: KokkosContainers_PerformanceTest_Cuda
|
test-cuda: KokkosContainers_PerformanceTest_Cuda
|
||||||
./KokkosContainers_PerformanceTest_Cuda
|
./KokkosContainers_PerformanceTest_Cuda
|
||||||
|
|
||||||
test-rocm: KokkosContainers_PerformanceTest_ROCm
|
test-hip: KokkosContainers_PerformanceTest_HIP
|
||||||
./KokkosContainers_PerformanceTest_ROCm
|
./KokkosContainers_PerformanceTest_HIP
|
||||||
|
|
||||||
test-threads: KokkosContainers_PerformanceTest_Threads
|
test-threads: KokkosContainers_PerformanceTest_Threads
|
||||||
./KokkosContainers_PerformanceTest_Threads
|
./KokkosContainers_PerformanceTest_Threads
|
||||||
|
|||||||
@ -43,7 +43,6 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(KOKKOS_ENABLE_CUDA)
|
|
||||||
|
|
||||||
#include <cstdint>
|
#include <cstdint>
|
||||||
#include <string>
|
#include <string>
|
||||||
@ -66,23 +65,13 @@
|
|||||||
|
|
||||||
namespace Performance {
|
namespace Performance {
|
||||||
|
|
||||||
class cuda : public ::testing::Test {
|
TEST(TEST_CATEGORY, dynrankview_perf) {
|
||||||
protected:
|
|
||||||
static void SetUpTestCase() {
|
|
||||||
std::cout << std::setprecision(5) << std::scientific;
|
|
||||||
Kokkos::InitArguments args(-1, -1, 0);
|
|
||||||
Kokkos::initialize(args);
|
|
||||||
}
|
|
||||||
static void TearDownTestCase() { Kokkos::finalize(); }
|
|
||||||
};
|
|
||||||
|
|
||||||
TEST_F(cuda, dynrankview_perf) {
|
|
||||||
std::cout << "Cuda" << std::endl;
|
std::cout << "Cuda" << std::endl;
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::Cuda>(40960);
|
test_dynrankview_op_perf<Kokkos::Cuda>(40960);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(cuda, global_2_local) {
|
TEST(TEST_CATEGORY, global_2_local) {
|
||||||
std::cout << "Cuda" << std::endl;
|
std::cout << "Cuda" << std::endl;
|
||||||
std::cout << "size, create, generate, fill, find" << std::endl;
|
std::cout << "size, create, generate, fill, find" << std::endl;
|
||||||
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
||||||
@ -90,15 +79,12 @@ TEST_F(cuda, global_2_local) {
|
|||||||
test_global_to_local_ids<Kokkos::Cuda>(i);
|
test_global_to_local_ids<Kokkos::Cuda>(i);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(cuda, unordered_map_performance_near) {
|
TEST(TEST_CATEGORY, unordered_map_performance_near) {
|
||||||
Perf::run_performance_tests<Kokkos::Cuda, true>("cuda-near");
|
Perf::run_performance_tests<Kokkos::Cuda, true>("cuda-near");
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(cuda, unordered_map_performance_far) {
|
TEST(TEST_CATEGORY, unordered_map_performance_far) {
|
||||||
Perf::run_performance_tests<Kokkos::Cuda, false>("cuda-far");
|
Perf::run_performance_tests<Kokkos::Cuda, false>("cuda-far");
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Performance
|
} // namespace Performance
|
||||||
#else
|
|
||||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTCUDA_PREVENT_EMPTY_LINK_ERROR() {}
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
|
|
||||||
|
|||||||
@ -43,7 +43,6 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(KOKKOS_ENABLE_ROCM)
|
|
||||||
|
|
||||||
#include <cstdint>
|
#include <cstdint>
|
||||||
#include <string>
|
#include <string>
|
||||||
@ -66,46 +65,26 @@
|
|||||||
|
|
||||||
namespace Performance {
|
namespace Performance {
|
||||||
|
|
||||||
class rocm : public ::testing::Test {
|
TEST(TEST_CATEGORY, dynrankview_perf) {
|
||||||
protected:
|
std::cout << "HIP" << std::endl;
|
||||||
static void SetUpTestCase() {
|
|
||||||
std::cout << std::setprecision(5) << std::scientific;
|
|
||||||
Kokkos::HostSpace::execution_space::initialize();
|
|
||||||
Kokkos::Experimental::ROCm::initialize(
|
|
||||||
Kokkos::Experimental::ROCm::SelectDevice(0));
|
|
||||||
}
|
|
||||||
static void TearDownTestCase() {
|
|
||||||
Kokkos::Experimental::ROCm::finalize();
|
|
||||||
Kokkos::HostSpace::execution_space::finalize();
|
|
||||||
}
|
|
||||||
};
|
|
||||||
#if 0
|
|
||||||
// issue 1089
|
|
||||||
TEST_F( rocm, dynrankview_perf )
|
|
||||||
{
|
|
||||||
std::cout << "ROCm" << std::endl;
|
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::Experimental::ROCm>( 40960 );
|
test_dynrankview_op_perf<Kokkos::Experimental::HIP>(40960);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F( rocm, global_2_local)
|
TEST(TEST_CATEGORY, global_2_local) {
|
||||||
{
|
std::cout << "HIP" << std::endl;
|
||||||
std::cout << "ROCm" << std::endl;
|
|
||||||
std::cout << "size, create, generate, fill, find" << std::endl;
|
std::cout << "size, create, generate, fill, find" << std::endl;
|
||||||
for (unsigned i=Performance::begin_id_size; i<=Performance::end_id_size; i *= Performance::id_step)
|
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
||||||
test_global_to_local_ids<Kokkos::Experimental::ROCm>(i);
|
i *= Performance::id_step)
|
||||||
|
test_global_to_local_ids<Kokkos::Experimental::HIP>(i);
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif
|
TEST(TEST_CATEGORY, unordered_map_performance_near) {
|
||||||
TEST_F(rocm, unordered_map_performance_near) {
|
Perf::run_performance_tests<Kokkos::Experimental::HIP, true>("hip-near");
|
||||||
Perf::run_performance_tests<Kokkos::Experimental::ROCm, true>("rocm-near");
|
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(rocm, unordered_map_performance_far) {
|
TEST(TEST_CATEGORY, unordered_map_performance_far) {
|
||||||
Perf::run_performance_tests<Kokkos::Experimental::ROCm, false>("rocm-far");
|
Perf::run_performance_tests<Kokkos::Experimental::HIP, false>("hip-far");
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Performance
|
} // namespace Performance
|
||||||
#else
|
|
||||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTROCM_PREVENT_EMPTY_LINK_ERROR() {}
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_ROCM ) */
|
|
||||||
@ -43,7 +43,6 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(KOKKOS_ENABLE_HPX)
|
|
||||||
|
|
||||||
#include <gtest/gtest.h>
|
#include <gtest/gtest.h>
|
||||||
|
|
||||||
@ -64,25 +63,13 @@
|
|||||||
|
|
||||||
namespace Performance {
|
namespace Performance {
|
||||||
|
|
||||||
class hpx : public ::testing::Test {
|
TEST(TEST_CATEGORY, dynrankview_perf) {
|
||||||
protected:
|
|
||||||
static void SetUpTestCase() {
|
|
||||||
std::cout << std::setprecision(5) << std::scientific;
|
|
||||||
|
|
||||||
Kokkos::initialize();
|
|
||||||
Kokkos::print_configuration(std::cout);
|
|
||||||
}
|
|
||||||
|
|
||||||
static void TearDownTestCase() { Kokkos::finalize(); }
|
|
||||||
};
|
|
||||||
|
|
||||||
TEST_F(hpx, dynrankview_perf) {
|
|
||||||
std::cout << "HPX" << std::endl;
|
std::cout << "HPX" << std::endl;
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::Experimental::HPX>(8192);
|
test_dynrankview_op_perf<Kokkos::Experimental::HPX>(8192);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(hpx, global_2_local) {
|
TEST(TEST_CATEGORY, global_2_local) {
|
||||||
std::cout << "HPX" << std::endl;
|
std::cout << "HPX" << std::endl;
|
||||||
std::cout << "size, create, generate, fill, find" << std::endl;
|
std::cout << "size, create, generate, fill, find" << std::endl;
|
||||||
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
||||||
@ -90,7 +77,7 @@ TEST_F(hpx, global_2_local) {
|
|||||||
test_global_to_local_ids<Kokkos::Experimental::HPX>(i);
|
test_global_to_local_ids<Kokkos::Experimental::HPX>(i);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(hpx, unordered_map_performance_near) {
|
TEST(TEST_CATEGORY, unordered_map_performance_near) {
|
||||||
unsigned num_hpx = 4;
|
unsigned num_hpx = 4;
|
||||||
std::ostringstream base_file_name;
|
std::ostringstream base_file_name;
|
||||||
base_file_name << "hpx-" << num_hpx << "-near";
|
base_file_name << "hpx-" << num_hpx << "-near";
|
||||||
@ -98,7 +85,7 @@ TEST_F(hpx, unordered_map_performance_near) {
|
|||||||
base_file_name.str());
|
base_file_name.str());
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(hpx, unordered_map_performance_far) {
|
TEST(TEST_CATEGORY, unordered_map_performance_far) {
|
||||||
unsigned num_hpx = 4;
|
unsigned num_hpx = 4;
|
||||||
std::ostringstream base_file_name;
|
std::ostringstream base_file_name;
|
||||||
base_file_name << "hpx-" << num_hpx << "-far";
|
base_file_name << "hpx-" << num_hpx << "-far";
|
||||||
@ -106,7 +93,7 @@ TEST_F(hpx, unordered_map_performance_far) {
|
|||||||
base_file_name.str());
|
base_file_name.str());
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(hpx, scatter_view) {
|
TEST(TEST_CATEGORY, scatter_view) {
|
||||||
std::cout << "ScatterView data-duplicated test:\n";
|
std::cout << "ScatterView data-duplicated test:\n";
|
||||||
Perf::test_scatter_view<Kokkos::Experimental::HPX, Kokkos::LayoutRight,
|
Perf::test_scatter_view<Kokkos::Experimental::HPX, Kokkos::LayoutRight,
|
||||||
Kokkos::Experimental::ScatterDuplicated,
|
Kokkos::Experimental::ScatterDuplicated,
|
||||||
@ -119,6 +106,3 @@ TEST_F(hpx, scatter_view) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Performance
|
} // namespace Performance
|
||||||
#else
|
|
||||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTHPX_PREVENT_EMPTY_LINK_ERROR() {}
|
|
||||||
#endif
|
|
||||||
|
|||||||
@ -45,9 +45,13 @@
|
|||||||
#include <gtest/gtest.h>
|
#include <gtest/gtest.h>
|
||||||
#include <cstdlib>
|
#include <cstdlib>
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Core.hpp>
|
||||||
|
|
||||||
int main(int argc, char *argv[]) {
|
int main(int argc, char *argv[]) {
|
||||||
|
Kokkos::initialize(argc, argv);
|
||||||
::testing::InitGoogleTest(&argc, argv);
|
::testing::InitGoogleTest(&argc, argv);
|
||||||
return RUN_ALL_TESTS();
|
|
||||||
|
int result = RUN_ALL_TESTS();
|
||||||
|
Kokkos::finalize();
|
||||||
|
return result;
|
||||||
}
|
}
|
||||||
|
|||||||
@ -43,7 +43,6 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(KOKKOS_ENABLE_OPENMP)
|
|
||||||
|
|
||||||
#include <gtest/gtest.h>
|
#include <gtest/gtest.h>
|
||||||
|
|
||||||
@ -64,25 +63,13 @@
|
|||||||
|
|
||||||
namespace Performance {
|
namespace Performance {
|
||||||
|
|
||||||
class openmp : public ::testing::Test {
|
TEST(TEST_CATEGORY, dynrankview_perf) {
|
||||||
protected:
|
|
||||||
static void SetUpTestCase() {
|
|
||||||
std::cout << std::setprecision(5) << std::scientific;
|
|
||||||
|
|
||||||
Kokkos::initialize();
|
|
||||||
Kokkos::OpenMP::print_configuration(std::cout);
|
|
||||||
}
|
|
||||||
|
|
||||||
static void TearDownTestCase() { Kokkos::finalize(); }
|
|
||||||
};
|
|
||||||
|
|
||||||
TEST_F(openmp, dynrankview_perf) {
|
|
||||||
std::cout << "OpenMP" << std::endl;
|
std::cout << "OpenMP" << std::endl;
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::OpenMP>(8192);
|
test_dynrankview_op_perf<Kokkos::OpenMP>(8192);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(openmp, global_2_local) {
|
TEST(TEST_CATEGORY, global_2_local) {
|
||||||
std::cout << "OpenMP" << std::endl;
|
std::cout << "OpenMP" << std::endl;
|
||||||
std::cout << "size, create, generate, fill, find" << std::endl;
|
std::cout << "size, create, generate, fill, find" << std::endl;
|
||||||
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
||||||
@ -90,7 +77,7 @@ TEST_F(openmp, global_2_local) {
|
|||||||
test_global_to_local_ids<Kokkos::OpenMP>(i);
|
test_global_to_local_ids<Kokkos::OpenMP>(i);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(openmp, unordered_map_performance_near) {
|
TEST(TEST_CATEGORY, unordered_map_performance_near) {
|
||||||
unsigned num_openmp = 4;
|
unsigned num_openmp = 4;
|
||||||
if (Kokkos::hwloc::available()) {
|
if (Kokkos::hwloc::available()) {
|
||||||
num_openmp = Kokkos::hwloc::get_available_numa_count() *
|
num_openmp = Kokkos::hwloc::get_available_numa_count() *
|
||||||
@ -102,7 +89,7 @@ TEST_F(openmp, unordered_map_performance_near) {
|
|||||||
Perf::run_performance_tests<Kokkos::OpenMP, true>(base_file_name.str());
|
Perf::run_performance_tests<Kokkos::OpenMP, true>(base_file_name.str());
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(openmp, unordered_map_performance_far) {
|
TEST(TEST_CATEGORY, unordered_map_performance_far) {
|
||||||
unsigned num_openmp = 4;
|
unsigned num_openmp = 4;
|
||||||
if (Kokkos::hwloc::available()) {
|
if (Kokkos::hwloc::available()) {
|
||||||
num_openmp = Kokkos::hwloc::get_available_numa_count() *
|
num_openmp = Kokkos::hwloc::get_available_numa_count() *
|
||||||
@ -114,7 +101,7 @@ TEST_F(openmp, unordered_map_performance_far) {
|
|||||||
Perf::run_performance_tests<Kokkos::OpenMP, false>(base_file_name.str());
|
Perf::run_performance_tests<Kokkos::OpenMP, false>(base_file_name.str());
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(openmp, scatter_view) {
|
TEST(TEST_CATEGORY, scatter_view) {
|
||||||
std::cout << "ScatterView data-duplicated test:\n";
|
std::cout << "ScatterView data-duplicated test:\n";
|
||||||
Perf::test_scatter_view<Kokkos::OpenMP, Kokkos::LayoutRight,
|
Perf::test_scatter_view<Kokkos::OpenMP, Kokkos::LayoutRight,
|
||||||
Kokkos::Experimental::ScatterDuplicated,
|
Kokkos::Experimental::ScatterDuplicated,
|
||||||
@ -127,7 +114,3 @@ TEST_F(openmp, scatter_view) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Performance
|
} // namespace Performance
|
||||||
#else
|
|
||||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|||||||
@ -43,7 +43,6 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(KOKKOS_ENABLE_THREADS)
|
|
||||||
|
|
||||||
#include <gtest/gtest.h>
|
#include <gtest/gtest.h>
|
||||||
|
|
||||||
@ -65,34 +64,13 @@
|
|||||||
|
|
||||||
namespace Performance {
|
namespace Performance {
|
||||||
|
|
||||||
class threads : public ::testing::Test {
|
TEST(threads, dynrankview_perf) {
|
||||||
protected:
|
|
||||||
static void SetUpTestCase() {
|
|
||||||
std::cout << std::setprecision(5) << std::scientific;
|
|
||||||
|
|
||||||
unsigned num_threads = 4;
|
|
||||||
|
|
||||||
if (Kokkos::hwloc::available()) {
|
|
||||||
num_threads = Kokkos::hwloc::get_available_numa_count() *
|
|
||||||
Kokkos::hwloc::get_available_cores_per_numa() *
|
|
||||||
Kokkos::hwloc::get_available_threads_per_core();
|
|
||||||
}
|
|
||||||
|
|
||||||
std::cout << "Threads: " << num_threads << std::endl;
|
|
||||||
|
|
||||||
Kokkos::initialize(Kokkos::InitArguments(num_threads));
|
|
||||||
}
|
|
||||||
|
|
||||||
static void TearDownTestCase() { Kokkos::finalize(); }
|
|
||||||
};
|
|
||||||
|
|
||||||
TEST_F(threads, dynrankview_perf) {
|
|
||||||
std::cout << "Threads" << std::endl;
|
std::cout << "Threads" << std::endl;
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::Threads>(8192);
|
test_dynrankview_op_perf<Kokkos::Threads>(8192);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(threads, global_2_local) {
|
TEST(threads, global_2_local) {
|
||||||
std::cout << "Threads" << std::endl;
|
std::cout << "Threads" << std::endl;
|
||||||
std::cout << "size, create, generate, fill, find" << std::endl;
|
std::cout << "size, create, generate, fill, find" << std::endl;
|
||||||
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
for (unsigned i = Performance::begin_id_size; i <= Performance::end_id_size;
|
||||||
@ -100,7 +78,7 @@ TEST_F(threads, global_2_local) {
|
|||||||
test_global_to_local_ids<Kokkos::Threads>(i);
|
test_global_to_local_ids<Kokkos::Threads>(i);
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(threads, unordered_map_performance_near) {
|
TEST(threads, unordered_map_performance_near) {
|
||||||
unsigned num_threads = 4;
|
unsigned num_threads = 4;
|
||||||
if (Kokkos::hwloc::available()) {
|
if (Kokkos::hwloc::available()) {
|
||||||
num_threads = Kokkos::hwloc::get_available_numa_count() *
|
num_threads = Kokkos::hwloc::get_available_numa_count() *
|
||||||
@ -112,7 +90,7 @@ TEST_F(threads, unordered_map_performance_near) {
|
|||||||
Perf::run_performance_tests<Kokkos::Threads, true>(base_file_name.str());
|
Perf::run_performance_tests<Kokkos::Threads, true>(base_file_name.str());
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F(threads, unordered_map_performance_far) {
|
TEST(threads, unordered_map_performance_far) {
|
||||||
unsigned num_threads = 4;
|
unsigned num_threads = 4;
|
||||||
if (Kokkos::hwloc::available()) {
|
if (Kokkos::hwloc::available()) {
|
||||||
num_threads = Kokkos::hwloc::get_available_numa_count() *
|
num_threads = Kokkos::hwloc::get_available_numa_count() *
|
||||||
@ -125,8 +103,3 @@ TEST_F(threads, unordered_map_performance_far) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Performance
|
} // namespace Performance
|
||||||
|
|
||||||
#else
|
|
||||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTTHREADS_PREVENT_EMPTY_LINK_ERROR() {
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|||||||
@ -74,7 +74,7 @@ template <typename Device>
|
|||||||
class Bitset {
|
class Bitset {
|
||||||
public:
|
public:
|
||||||
using execution_space = Device;
|
using execution_space = Device;
|
||||||
using size_type = unsigned;
|
using size_type = unsigned int;
|
||||||
|
|
||||||
enum { BIT_SCAN_REVERSE = 1u };
|
enum { BIT_SCAN_REVERSE = 1u };
|
||||||
enum { MOVE_HINT_BACKWARD = 2u };
|
enum { MOVE_HINT_BACKWARD = 2u };
|
||||||
@ -309,7 +309,7 @@ template <typename Device>
|
|||||||
class ConstBitset {
|
class ConstBitset {
|
||||||
public:
|
public:
|
||||||
using execution_space = Device;
|
using execution_space = Device;
|
||||||
using size_type = unsigned;
|
using size_type = unsigned int;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
enum { block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT) };
|
enum { block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT) };
|
||||||
|
|||||||
@ -162,7 +162,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
/// \brief The type of a const, random-access View host mirror of
|
/// \brief The type of a const, random-access View host mirror of
|
||||||
/// \c t_dev_const_randomread.
|
/// \c t_dev_const_randomread.
|
||||||
using t_host_const_randomread_um =
|
using t_host_const_randomread_um =
|
||||||
typename t_dev_const_randomread::HostMirror;
|
typename t_dev_const_randomread_um::HostMirror;
|
||||||
|
|
||||||
//@}
|
//@}
|
||||||
//! \name Counters to keep track of changes ("modified" flags)
|
//! \name Counters to keep track of changes ("modified" flags)
|
||||||
@ -245,21 +245,6 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
h_view(create_mirror_view(d_view)) // without UVM, host View mirrors
|
h_view(create_mirror_view(d_view)) // without UVM, host View mirrors
|
||||||
{}
|
{}
|
||||||
|
|
||||||
explicit inline DualView(const ViewAllocateWithoutInitializing& arg_prop,
|
|
||||||
const size_t arg_N0 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N1 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N2 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N3 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N4 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N5 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N6 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
|
||||||
const size_t arg_N7 = KOKKOS_IMPL_CTOR_DEFAULT_ARG)
|
|
||||||
: DualView(Impl::ViewCtorProp<std::string,
|
|
||||||
Kokkos::Impl::WithoutInitializing_t>(
|
|
||||||
arg_prop.label, Kokkos::WithoutInitializing),
|
|
||||||
arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6,
|
|
||||||
arg_N7) {}
|
|
||||||
|
|
||||||
//! Copy constructor (shallow copy)
|
//! Copy constructor (shallow copy)
|
||||||
template <class SS, class LS, class DS, class MS>
|
template <class SS, class LS, class DS, class MS>
|
||||||
DualView(const DualView<SS, LS, DS, MS>& src)
|
DualView(const DualView<SS, LS, DS, MS>& src)
|
||||||
@ -457,7 +442,21 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
}
|
}
|
||||||
return dev;
|
return dev;
|
||||||
}
|
}
|
||||||
|
static constexpr const int view_header_size = 128;
|
||||||
|
void impl_report_host_sync() const noexcept {
|
||||||
|
Kokkos::Tools::syncDualView(
|
||||||
|
h_view.label(),
|
||||||
|
reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(h_view.data()) -
|
||||||
|
view_header_size),
|
||||||
|
false);
|
||||||
|
}
|
||||||
|
void impl_report_device_sync() const noexcept {
|
||||||
|
Kokkos::Tools::syncDualView(
|
||||||
|
d_view.label(),
|
||||||
|
reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(d_view.data()) -
|
||||||
|
view_header_size),
|
||||||
|
true);
|
||||||
|
}
|
||||||
/// \brief Update data on device or host only if data in the other
|
/// \brief Update data on device or host only if data in the other
|
||||||
/// space has been marked as modified.
|
/// space has been marked as modified.
|
||||||
///
|
///
|
||||||
@ -499,6 +498,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
|
|
||||||
deep_copy(d_view, h_view);
|
deep_copy(d_view, h_view);
|
||||||
modified_flags(0) = modified_flags(1) = 0;
|
modified_flags(0) = modified_flags(1) = 0;
|
||||||
|
impl_report_device_sync();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
||||||
@ -515,6 +515,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
|
|
||||||
deep_copy(h_view, d_view);
|
deep_copy(h_view, d_view);
|
||||||
modified_flags(0) = modified_flags(1) = 0;
|
modified_flags(0) = modified_flags(1) = 0;
|
||||||
|
impl_report_host_sync();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if (std::is_same<typename t_host::memory_space,
|
if (std::is_same<typename t_host::memory_space,
|
||||||
@ -539,12 +540,14 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
"Calling sync on a DualView with a const datatype.");
|
"Calling sync on a DualView with a const datatype.");
|
||||||
}
|
}
|
||||||
|
impl_report_device_sync();
|
||||||
}
|
}
|
||||||
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
||||||
if ((modified_flags(1) > 0) && (modified_flags(1) >= modified_flags(0))) {
|
if ((modified_flags(1) > 0) && (modified_flags(1) >= modified_flags(0))) {
|
||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
"Calling sync on a DualView with a const datatype.");
|
"Calling sync on a DualView with a const datatype.");
|
||||||
}
|
}
|
||||||
|
impl_report_host_sync();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -567,6 +570,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
|
|
||||||
deep_copy(h_view, d_view);
|
deep_copy(h_view, d_view);
|
||||||
modified_flags(1) = modified_flags(0) = 0;
|
modified_flags(1) = modified_flags(0) = 0;
|
||||||
|
impl_report_host_sync();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -589,6 +593,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
|
|
||||||
deep_copy(d_view, h_view);
|
deep_copy(d_view, h_view);
|
||||||
modified_flags(1) = modified_flags(0) = 0;
|
modified_flags(1) = modified_flags(0) = 0;
|
||||||
|
impl_report_device_sync();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -619,7 +624,20 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
if (modified_flags.data() == nullptr) return false;
|
if (modified_flags.data() == nullptr) return false;
|
||||||
return modified_flags(1) < modified_flags(0);
|
return modified_flags(1) < modified_flags(0);
|
||||||
}
|
}
|
||||||
|
void impl_report_device_modification() {
|
||||||
|
Kokkos::Tools::modifyDualView(
|
||||||
|
d_view.label(),
|
||||||
|
reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(d_view.data()) -
|
||||||
|
view_header_size),
|
||||||
|
true);
|
||||||
|
}
|
||||||
|
void impl_report_host_modification() {
|
||||||
|
Kokkos::Tools::modifyDualView(
|
||||||
|
h_view.label(),
|
||||||
|
reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(h_view.data()) -
|
||||||
|
view_header_size),
|
||||||
|
false);
|
||||||
|
}
|
||||||
/// \brief Mark data as modified on the given device \c Device.
|
/// \brief Mark data as modified on the given device \c Device.
|
||||||
///
|
///
|
||||||
/// If \c Device is the same as this DualView's device type, then
|
/// If \c Device is the same as this DualView's device type, then
|
||||||
@ -636,6 +654,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
||||||
: modified_flags(0)) +
|
: modified_flags(0)) +
|
||||||
1;
|
1;
|
||||||
|
impl_report_device_modification();
|
||||||
}
|
}
|
||||||
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
if (dev == 0) { // hopefully Device is the same as DualView's host type
|
||||||
// Increment the host's modified count.
|
// Increment the host's modified count.
|
||||||
@ -643,6 +662,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
||||||
: modified_flags(0)) +
|
: modified_flags(0)) +
|
||||||
1;
|
1;
|
||||||
|
impl_report_host_modification();
|
||||||
}
|
}
|
||||||
|
|
||||||
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
||||||
@ -663,6 +683,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
||||||
: modified_flags(0)) +
|
: modified_flags(0)) +
|
||||||
1;
|
1;
|
||||||
|
impl_report_host_modification();
|
||||||
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
||||||
if (modified_flags(0) && modified_flags(1)) {
|
if (modified_flags(0) && modified_flags(1)) {
|
||||||
std::string msg = "Kokkos::DualView::modify_host ERROR: ";
|
std::string msg = "Kokkos::DualView::modify_host ERROR: ";
|
||||||
@ -682,6 +703,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
|
|||||||
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
(modified_flags(1) > modified_flags(0) ? modified_flags(1)
|
||||||
: modified_flags(0)) +
|
: modified_flags(0)) +
|
||||||
1;
|
1;
|
||||||
|
impl_report_device_modification();
|
||||||
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
||||||
if (modified_flags(0) && modified_flags(1)) {
|
if (modified_flags(0) && modified_flags(1)) {
|
||||||
std::string msg = "Kokkos::DualView::modify_device ERROR: ";
|
std::string msg = "Kokkos::DualView::modify_device ERROR: ";
|
||||||
|
|||||||
@ -245,10 +245,13 @@ KOKKOS_INLINE_FUNCTION bool dyn_rank_view_verify_operator_bounds(
|
|||||||
return (size_t(i) < map.extent(R)) &&
|
return (size_t(i) < map.extent(R)) &&
|
||||||
dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
|
dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
|
||||||
} else if (i != 0) {
|
} else if (i != 0) {
|
||||||
|
// FIXME_SYCL SYCL doesn't allow printf in kernels
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
printf(
|
printf(
|
||||||
"DynRankView Debug Bounds Checking Error: at rank %u\n Extra "
|
"DynRankView Debug Bounds Checking Error: at rank %u\n Extra "
|
||||||
"arguments beyond the rank must be zero \n",
|
"arguments beyond the rank must be zero \n",
|
||||||
R);
|
R);
|
||||||
|
#endif
|
||||||
return (false) &&
|
return (false) &&
|
||||||
dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
|
dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
|
||||||
} else {
|
} else {
|
||||||
@ -1264,33 +1267,6 @@ class DynRankView : public ViewTraits<DataType, Properties...> {
|
|||||||
typename traits::array_layout(arg_N0, arg_N1, arg_N2, arg_N3,
|
typename traits::array_layout(arg_N0, arg_N1, arg_N2, arg_N3,
|
||||||
arg_N4, arg_N5, arg_N6, arg_N7)) {}
|
arg_N4, arg_N5, arg_N6, arg_N7)) {}
|
||||||
|
|
||||||
// For backward compatibility
|
|
||||||
// NDE This ctor does not take ViewCtorProp argument - should not use
|
|
||||||
// alternative createLayout call
|
|
||||||
explicit inline DynRankView(const ViewAllocateWithoutInitializing& arg_prop,
|
|
||||||
const typename traits::array_layout& arg_layout)
|
|
||||||
: DynRankView(
|
|
||||||
Kokkos::Impl::ViewCtorProp<std::string,
|
|
||||||
Kokkos::Impl::WithoutInitializing_t>(
|
|
||||||
arg_prop.label, Kokkos::WithoutInitializing),
|
|
||||||
arg_layout) {}
|
|
||||||
|
|
||||||
explicit inline DynRankView(const ViewAllocateWithoutInitializing& arg_prop,
|
|
||||||
const size_t arg_N0 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N1 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N2 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N3 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N4 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N5 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N6 = KOKKOS_INVALID_INDEX,
|
|
||||||
const size_t arg_N7 = KOKKOS_INVALID_INDEX)
|
|
||||||
: DynRankView(
|
|
||||||
Kokkos::Impl::ViewCtorProp<std::string,
|
|
||||||
Kokkos::Impl::WithoutInitializing_t>(
|
|
||||||
arg_prop.label, Kokkos::WithoutInitializing),
|
|
||||||
typename traits::array_layout(arg_N0, arg_N1, arg_N2, arg_N3,
|
|
||||||
arg_N4, arg_N5, arg_N6, arg_N7)) {}
|
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
// Memory span required to wrap these dimensions.
|
// Memory span required to wrap these dimensions.
|
||||||
static constexpr size_t required_allocation_size(
|
static constexpr size_t required_allocation_size(
|
||||||
@ -1401,7 +1377,7 @@ struct DynRankSubviewTag {};
|
|||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template <class SrcTraits, class... Args>
|
template <class SrcTraits, class... Args>
|
||||||
struct ViewMapping<
|
class ViewMapping<
|
||||||
typename std::enable_if<
|
typename std::enable_if<
|
||||||
(std::is_same<typename SrcTraits::specialize, void>::value &&
|
(std::is_same<typename SrcTraits::specialize, void>::value &&
|
||||||
(std::is_same<typename SrcTraits::array_layout,
|
(std::is_same<typename SrcTraits::array_layout,
|
||||||
@ -2052,7 +2028,7 @@ create_mirror_view_and_copy(
|
|||||||
nullptr) {
|
nullptr) {
|
||||||
using Mirror = typename Impl::MirrorDRViewType<Space, T, P...>::view_type;
|
using Mirror = typename Impl::MirrorDRViewType<Space, T, P...>::view_type;
|
||||||
std::string label = name.empty() ? src.label() : name;
|
std::string label = name.empty() ? src.label() : name;
|
||||||
auto mirror = Mirror(Kokkos::ViewAllocateWithoutInitializing(label),
|
auto mirror = Mirror(view_alloc(WithoutInitializing, label),
|
||||||
Impl::reconstructLayout(src.layout(), src.rank()));
|
Impl::reconstructLayout(src.layout(), src.rank()));
|
||||||
deep_copy(mirror, src);
|
deep_copy(mirror, src);
|
||||||
return mirror;
|
return mirror;
|
||||||
|
|||||||
@ -1940,7 +1940,7 @@ create_mirror(
|
|||||||
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
||||||
typename std::enable_if<
|
typename std::enable_if<
|
||||||
!std::is_same<typename Kokkos::ViewTraits<T, P...>::array_layout,
|
!std::is_same<typename Kokkos::ViewTraits<T, P...>::array_layout,
|
||||||
Kokkos::LayoutStride>::value>::type* = 0) {
|
Kokkos::LayoutStride>::value>::type* = nullptr) {
|
||||||
using src_type = Experimental::OffsetView<T, P...>;
|
using src_type = Experimental::OffsetView<T, P...>;
|
||||||
using dst_type = typename src_type::HostMirror;
|
using dst_type = typename src_type::HostMirror;
|
||||||
|
|
||||||
@ -1960,7 +1960,7 @@ create_mirror(
|
|||||||
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
||||||
typename std::enable_if<
|
typename std::enable_if<
|
||||||
std::is_same<typename Kokkos::ViewTraits<T, P...>::array_layout,
|
std::is_same<typename Kokkos::ViewTraits<T, P...>::array_layout,
|
||||||
Kokkos::LayoutStride>::value>::type* = 0) {
|
Kokkos::LayoutStride>::value>::type* = nullptr) {
|
||||||
using src_type = Experimental::OffsetView<T, P...>;
|
using src_type = Experimental::OffsetView<T, P...>;
|
||||||
using dst_type = typename src_type::HostMirror;
|
using dst_type = typename src_type::HostMirror;
|
||||||
|
|
||||||
@ -2028,7 +2028,7 @@ create_mirror_view(
|
|||||||
std::is_same<
|
std::is_same<
|
||||||
typename Kokkos::Experimental::OffsetView<T, P...>::data_type,
|
typename Kokkos::Experimental::OffsetView<T, P...>::data_type,
|
||||||
typename Kokkos::Experimental::OffsetView<
|
typename Kokkos::Experimental::OffsetView<
|
||||||
T, P...>::HostMirror::data_type>::value)>::type* = 0) {
|
T, P...>::HostMirror::data_type>::value)>::type* = nullptr) {
|
||||||
return Kokkos::create_mirror(src);
|
return Kokkos::create_mirror(src);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -2038,7 +2038,7 @@ typename Kokkos::Impl::MirrorOffsetViewType<Space, T, P...>::view_type
|
|||||||
create_mirror_view(const Space&,
|
create_mirror_view(const Space&,
|
||||||
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
||||||
typename std::enable_if<Impl::MirrorOffsetViewType<
|
typename std::enable_if<Impl::MirrorOffsetViewType<
|
||||||
Space, T, P...>::is_same_memspace>::type* = 0) {
|
Space, T, P...>::is_same_memspace>::type* = nullptr) {
|
||||||
return src;
|
return src;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -2048,7 +2048,7 @@ typename Kokkos::Impl::MirrorOffsetViewType<Space, T, P...>::view_type
|
|||||||
create_mirror_view(const Space&,
|
create_mirror_view(const Space&,
|
||||||
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
const Kokkos::Experimental::OffsetView<T, P...>& src,
|
||||||
typename std::enable_if<!Impl::MirrorOffsetViewType<
|
typename std::enable_if<!Impl::MirrorOffsetViewType<
|
||||||
Space, T, P...>::is_same_memspace>::type* = 0) {
|
Space, T, P...>::is_same_memspace>::type* = nullptr) {
|
||||||
return typename Kokkos::Impl::MirrorOffsetViewType<Space, T, P...>::view_type(
|
return typename Kokkos::Impl::MirrorOffsetViewType<Space, T, P...>::view_type(
|
||||||
src.label(), src.layout(),
|
src.label(), src.layout(),
|
||||||
{src.begin(0), src.begin(1), src.begin(2), src.begin(3), src.begin(4),
|
{src.begin(0), src.begin(1), src.begin(2), src.begin(3), src.begin(4),
|
||||||
@ -2063,7 +2063,7 @@ create_mirror_view(const Space&,
|
|||||||
// , std::string const& name = ""
|
// , std::string const& name = ""
|
||||||
// , typename
|
// , typename
|
||||||
// std::enable_if<Impl::MirrorViewType<Space,T,P
|
// std::enable_if<Impl::MirrorViewType<Space,T,P
|
||||||
// ...>::is_same_memspace>::type* = 0 ) {
|
// ...>::is_same_memspace>::type* = nullptr) {
|
||||||
// (void)name;
|
// (void)name;
|
||||||
// return src;
|
// return src;
|
||||||
// }
|
// }
|
||||||
@ -2076,11 +2076,11 @@ create_mirror_view(const Space&,
|
|||||||
// , std::string const& name = ""
|
// , std::string const& name = ""
|
||||||
// , typename
|
// , typename
|
||||||
// std::enable_if<!Impl::MirrorViewType<Space,T,P
|
// std::enable_if<!Impl::MirrorViewType<Space,T,P
|
||||||
// ...>::is_same_memspace>::type* = 0 ) {
|
// ...>::is_same_memspace>::type* = nullptr) {
|
||||||
// using Mirror = typename
|
// using Mirror = typename
|
||||||
// Kokkos::Experimental::Impl::MirrorViewType<Space,T,P ...>::view_type;
|
// Kokkos::Experimental::Impl::MirrorViewType<Space,T,P ...>::view_type;
|
||||||
// std::string label = name.empty() ? src.label() : name;
|
// std::string label = name.empty() ? src.label() : name;
|
||||||
// auto mirror = Mirror(ViewAllocateWithoutInitializing(label), src.layout(),
|
// auto mirror = Mirror(view_alloc(WithoutInitializing, label), src.layout(),
|
||||||
// { src.begin(0), src.begin(1), src.begin(2),
|
// { src.begin(0), src.begin(1), src.begin(2),
|
||||||
// src.begin(3), src.begin(4),
|
// src.begin(3), src.begin(4),
|
||||||
// src.begin(5), src.begin(6), src.begin(7) });
|
// src.begin(5), src.begin(6), src.begin(7) });
|
||||||
|
|||||||
@ -206,6 +206,23 @@ struct DefaultContribution<Kokkos::Experimental::HIP,
|
|||||||
};
|
};
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
#ifdef KOKKOS_ENABLE_SYCL
|
||||||
|
template <>
|
||||||
|
struct DefaultDuplication<Kokkos::Experimental::SYCL> {
|
||||||
|
using type = Kokkos::Experimental::ScatterNonDuplicated;
|
||||||
|
};
|
||||||
|
template <>
|
||||||
|
struct DefaultContribution<Kokkos::Experimental::SYCL,
|
||||||
|
Kokkos::Experimental::ScatterNonDuplicated> {
|
||||||
|
using type = Kokkos::Experimental::ScatterAtomic;
|
||||||
|
};
|
||||||
|
template <>
|
||||||
|
struct DefaultContribution<Kokkos::Experimental::SYCL,
|
||||||
|
Kokkos::Experimental::ScatterDuplicated> {
|
||||||
|
using type = Kokkos::Experimental::ScatterAtomic;
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
// FIXME All these scatter values need overhaul:
|
// FIXME All these scatter values need overhaul:
|
||||||
// - like should they be copyable at all?
|
// - like should they be copyable at all?
|
||||||
// - what is the internal handle type
|
// - what is the internal handle type
|
||||||
@ -636,19 +653,10 @@ struct ReduceDuplicatesBase {
|
|||||||
size_t stride_in, size_t start_in, size_t n_in,
|
size_t stride_in, size_t start_in, size_t n_in,
|
||||||
std::string const& name)
|
std::string const& name)
|
||||||
: src(src_in), dst(dest_in), stride(stride_in), start(start_in), n(n_in) {
|
: src(src_in), dst(dest_in), stride(stride_in), start(start_in), n(n_in) {
|
||||||
uint64_t kpID = 0;
|
parallel_for(
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
std::string("Kokkos::ScatterView::ReduceDuplicates [") + name + "]",
|
||||||
Kokkos::Profiling::beginParallelFor(std::string("reduce_") + name, 0,
|
RangePolicy<ExecSpace, size_t>(0, stride),
|
||||||
&kpID);
|
static_cast<Derived const&>(*this));
|
||||||
}
|
|
||||||
using policy_type = RangePolicy<ExecSpace, size_t>;
|
|
||||||
using closure_type = Kokkos::Impl::ParallelFor<Derived, policy_type>;
|
|
||||||
const closure_type closure(*(static_cast<Derived*>(this)),
|
|
||||||
policy_type(0, stride));
|
|
||||||
closure.execute();
|
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
|
||||||
Kokkos::Profiling::endParallelFor(kpID);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -682,19 +690,10 @@ struct ResetDuplicatesBase {
|
|||||||
ResetDuplicatesBase(ValueType* data_in, size_t size_in,
|
ResetDuplicatesBase(ValueType* data_in, size_t size_in,
|
||||||
std::string const& name)
|
std::string const& name)
|
||||||
: data(data_in) {
|
: data(data_in) {
|
||||||
uint64_t kpID = 0;
|
parallel_for(
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
std::string("Kokkos::ScatterView::ResetDuplicates [") + name + "]",
|
||||||
Kokkos::Profiling::beginParallelFor(std::string("reduce_") + name, 0,
|
RangePolicy<ExecSpace, size_t>(0, size_in),
|
||||||
&kpID);
|
static_cast<Derived const&>(*this));
|
||||||
}
|
|
||||||
using policy_type = RangePolicy<ExecSpace, size_t>;
|
|
||||||
using closure_type = Kokkos::Impl::ParallelFor<Derived, policy_type>;
|
|
||||||
const closure_type closure(*(static_cast<Derived*>(this)),
|
|
||||||
policy_type(0, size_in));
|
|
||||||
closure.execute();
|
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
|
||||||
Kokkos::Profiling::endParallelFor(kpID);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -931,8 +930,8 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
|
|||||||
ScatterView(View<RT, RP...> const& original_view)
|
ScatterView(View<RT, RP...> const& original_view)
|
||||||
: unique_token(),
|
: unique_token(),
|
||||||
internal_view(
|
internal_view(
|
||||||
Kokkos::ViewAllocateWithoutInitializing(std::string("duplicated_") +
|
view_alloc(WithoutInitializing,
|
||||||
original_view.label()),
|
std::string("duplicated_") + original_view.label()),
|
||||||
unique_token.size(),
|
unique_token.size(),
|
||||||
original_view.rank_dynamic > 0 ? original_view.extent(0)
|
original_view.rank_dynamic > 0 ? original_view.extent(0)
|
||||||
: KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
: KOKKOS_IMPL_CTOR_DEFAULT_ARG,
|
||||||
@ -955,7 +954,7 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
|
|||||||
|
|
||||||
template <typename... Dims>
|
template <typename... Dims>
|
||||||
ScatterView(std::string const& name, Dims... dims)
|
ScatterView(std::string const& name, Dims... dims)
|
||||||
: internal_view(Kokkos::ViewAllocateWithoutInitializing(name),
|
: internal_view(view_alloc(WithoutInitializing, name),
|
||||||
unique_token.size(), dims...) {
|
unique_token.size(), dims...) {
|
||||||
reset();
|
reset();
|
||||||
}
|
}
|
||||||
@ -1094,8 +1093,8 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
|
|||||||
KOKKOS_IMPL_CTOR_DEFAULT_ARG};
|
KOKKOS_IMPL_CTOR_DEFAULT_ARG};
|
||||||
arg_N[internal_view_type::rank - 1] = unique_token.size();
|
arg_N[internal_view_type::rank - 1] = unique_token.size();
|
||||||
internal_view = internal_view_type(
|
internal_view = internal_view_type(
|
||||||
Kokkos::ViewAllocateWithoutInitializing(std::string("duplicated_") +
|
view_alloc(WithoutInitializing,
|
||||||
original_view.label()),
|
std::string("duplicated_") + original_view.label()),
|
||||||
arg_N[0], arg_N[1], arg_N[2], arg_N[3], arg_N[4], arg_N[5], arg_N[6],
|
arg_N[0], arg_N[1], arg_N[2], arg_N[3], arg_N[4], arg_N[5], arg_N[6],
|
||||||
arg_N[7]);
|
arg_N[7]);
|
||||||
reset();
|
reset();
|
||||||
@ -1121,9 +1120,9 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
|
|||||||
KOKKOS_IMPL_CTOR_DEFAULT_ARG};
|
KOKKOS_IMPL_CTOR_DEFAULT_ARG};
|
||||||
Kokkos::Impl::Experimental::args_to_array(arg_N, 0, dims...);
|
Kokkos::Impl::Experimental::args_to_array(arg_N, 0, dims...);
|
||||||
arg_N[internal_view_type::rank - 1] = unique_token.size();
|
arg_N[internal_view_type::rank - 1] = unique_token.size();
|
||||||
internal_view = internal_view_type(
|
internal_view = internal_view_type(view_alloc(WithoutInitializing, name),
|
||||||
Kokkos::ViewAllocateWithoutInitializing(name), arg_N[0], arg_N[1],
|
arg_N[0], arg_N[1], arg_N[2], arg_N[3],
|
||||||
arg_N[2], arg_N[3], arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
|
arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
|
||||||
reset();
|
reset();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -306,9 +306,9 @@ class UnorderedMap {
|
|||||||
m_equal_to(equal_to),
|
m_equal_to(equal_to),
|
||||||
m_size(),
|
m_size(),
|
||||||
m_available_indexes(calculate_capacity(capacity_hint)),
|
m_available_indexes(calculate_capacity(capacity_hint)),
|
||||||
m_hash_lists(ViewAllocateWithoutInitializing("UnorderedMap hash list"),
|
m_hash_lists(view_alloc(WithoutInitializing, "UnorderedMap hash list"),
|
||||||
Impl::find_hash_size(capacity())),
|
Impl::find_hash_size(capacity())),
|
||||||
m_next_index(ViewAllocateWithoutInitializing("UnorderedMap next index"),
|
m_next_index(view_alloc(WithoutInitializing, "UnorderedMap next index"),
|
||||||
capacity() + 1) // +1 so that the *_at functions can
|
capacity() + 1) // +1 so that the *_at functions can
|
||||||
// always return a valid reference
|
// always return a valid reference
|
||||||
,
|
,
|
||||||
@ -540,7 +540,10 @@ class UnorderedMap {
|
|||||||
// Previously claimed an unused entry that was not inserted.
|
// Previously claimed an unused entry that was not inserted.
|
||||||
// Release this unused entry immediately.
|
// Release this unused entry immediately.
|
||||||
if (!m_available_indexes.reset(new_index)) {
|
if (!m_available_indexes.reset(new_index)) {
|
||||||
|
// FIXME_SYCL SYCL doesn't allow printf in kernels
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
printf("Unable to free existing\n");
|
printf("Unable to free existing\n");
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -729,16 +732,16 @@ class UnorderedMap {
|
|||||||
tmp.m_size = src.size();
|
tmp.m_size = src.size();
|
||||||
tmp.m_available_indexes = bitset_type(src.capacity());
|
tmp.m_available_indexes = bitset_type(src.capacity());
|
||||||
tmp.m_hash_lists = size_type_view(
|
tmp.m_hash_lists = size_type_view(
|
||||||
ViewAllocateWithoutInitializing("UnorderedMap hash list"),
|
view_alloc(WithoutInitializing, "UnorderedMap hash list"),
|
||||||
src.m_hash_lists.extent(0));
|
src.m_hash_lists.extent(0));
|
||||||
tmp.m_next_index = size_type_view(
|
tmp.m_next_index = size_type_view(
|
||||||
ViewAllocateWithoutInitializing("UnorderedMap next index"),
|
view_alloc(WithoutInitializing, "UnorderedMap next index"),
|
||||||
src.m_next_index.extent(0));
|
src.m_next_index.extent(0));
|
||||||
tmp.m_keys =
|
tmp.m_keys =
|
||||||
key_type_view(ViewAllocateWithoutInitializing("UnorderedMap keys"),
|
key_type_view(view_alloc(WithoutInitializing, "UnorderedMap keys"),
|
||||||
src.m_keys.extent(0));
|
src.m_keys.extent(0));
|
||||||
tmp.m_values = value_type_view(
|
tmp.m_values = value_type_view(
|
||||||
ViewAllocateWithoutInitializing("UnorderedMap values"),
|
view_alloc(WithoutInitializing, "UnorderedMap values"),
|
||||||
src.m_values.extent(0));
|
src.m_values.extent(0));
|
||||||
tmp.m_scalars = scalars_view("UnorderedMap scalars");
|
tmp.m_scalars = scalars_view("UnorderedMap scalars");
|
||||||
|
|
||||||
|
|||||||
@ -3,7 +3,7 @@ KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
|||||||
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
||||||
|
|
||||||
foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP)
|
foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL)
|
||||||
# Because there is always an exception to the rule
|
# Because there is always an exception to the rule
|
||||||
if(Tag STREQUAL "Threads")
|
if(Tag STREQUAL "Threads")
|
||||||
set(DEVICE "PTHREAD")
|
set(DEVICE "PTHREAD")
|
||||||
@ -31,13 +31,21 @@ foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP)
|
|||||||
Vector
|
Vector
|
||||||
ViewCtorPropEmbeddedDim
|
ViewCtorPropEmbeddedDim
|
||||||
)
|
)
|
||||||
|
# Write to a temporary intermediate file and call configure_file to avoid
|
||||||
|
# updating timestamps triggering unnecessary rebuilds on subsequent cmake runs.
|
||||||
set(file ${dir}/Test${Tag}_${Name}.cpp)
|
set(file ${dir}/Test${Tag}_${Name}.cpp)
|
||||||
file(WRITE ${file}
|
file(WRITE ${dir}/dummy.cpp
|
||||||
"#include <Test${Tag}_Category.hpp>\n"
|
"#include <Test${Tag}_Category.hpp>\n"
|
||||||
"#include <Test${Name}.hpp>\n"
|
"#include <Test${Name}.hpp>\n"
|
||||||
)
|
)
|
||||||
|
configure_file(${dir}/dummy.cpp ${file})
|
||||||
list(APPEND UnitTestSources ${file})
|
list(APPEND UnitTestSources ${file})
|
||||||
endforeach()
|
endforeach()
|
||||||
|
list(REMOVE_ITEM UnitTestSources
|
||||||
|
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_Bitset.cpp
|
||||||
|
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_ScatterView.cpp
|
||||||
|
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_UnorderedMap.cpp
|
||||||
|
)
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST(UnitTest_${Tag} SOURCES ${UnitTestSources})
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(UnitTest_${Tag} SOURCES ${UnitTestSources})
|
||||||
endif()
|
endif()
|
||||||
endforeach()
|
endforeach()
|
||||||
|
|||||||
@ -7,7 +7,7 @@ vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/openmp
|
|||||||
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/hpx
|
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/hpx
|
||||||
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/serial
|
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/serial
|
||||||
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/threads
|
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/threads
|
||||||
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/rocm
|
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/hip
|
||||||
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/cuda
|
vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests/cuda
|
||||||
vpath %.cpp ${CURDIR}
|
vpath %.cpp ${CURDIR}
|
||||||
default: build_all
|
default: build_all
|
||||||
|
|||||||
@ -108,7 +108,7 @@ struct test_dualview_combinations {
|
|||||||
if (with_init) {
|
if (with_init) {
|
||||||
a = ViewType("A", n, m);
|
a = ViewType("A", n, m);
|
||||||
} else {
|
} else {
|
||||||
a = ViewType(Kokkos::ViewAllocateWithoutInitializing("A"), n, m);
|
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
|
||||||
}
|
}
|
||||||
Kokkos::deep_copy(a.d_view, 1);
|
Kokkos::deep_copy(a.d_view, 1);
|
||||||
|
|
||||||
@ -404,14 +404,19 @@ void test_dualview_resize() {
|
|||||||
Impl::test_dualview_resize<Scalar, Device>();
|
Impl::test_dualview_resize<Scalar, Device>();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
TEST(TEST_CATEGORY, dualview_combination) {
|
TEST(TEST_CATEGORY, dualview_combination) {
|
||||||
test_dualview_combinations<int, TEST_EXECSPACE>(10, true);
|
test_dualview_combinations<int, TEST_EXECSPACE>(10, true);
|
||||||
}
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
TEST(TEST_CATEGORY, dualview_alloc) {
|
TEST(TEST_CATEGORY, dualview_alloc) {
|
||||||
test_dualview_alloc<int, TEST_EXECSPACE>(10);
|
test_dualview_alloc<int, TEST_EXECSPACE>(10);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
TEST(TEST_CATEGORY, dualview_combinations_without_init) {
|
TEST(TEST_CATEGORY, dualview_combinations_without_init) {
|
||||||
test_dualview_combinations<int, TEST_EXECSPACE>(10, false);
|
test_dualview_combinations<int, TEST_EXECSPACE>(10, false);
|
||||||
}
|
}
|
||||||
@ -428,6 +433,7 @@ TEST(TEST_CATEGORY, dualview_realloc) {
|
|||||||
TEST(TEST_CATEGORY, dualview_resize) {
|
TEST(TEST_CATEGORY, dualview_resize) {
|
||||||
test_dualview_resize<int, TEST_EXECSPACE>();
|
test_dualview_resize<int, TEST_EXECSPACE>();
|
||||||
}
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
} // namespace Test
|
} // namespace Test
|
||||||
|
|
||||||
|
|||||||
@ -1063,8 +1063,8 @@ class TestDynViewAPI {
|
|||||||
(void)thing;
|
(void)thing;
|
||||||
}
|
}
|
||||||
|
|
||||||
dView0 d_uninitialized(Kokkos::ViewAllocateWithoutInitializing("uninit"),
|
dView0 d_uninitialized(
|
||||||
10, 20);
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "uninit"), 10, 20);
|
||||||
ASSERT_TRUE(d_uninitialized.data() != nullptr);
|
ASSERT_TRUE(d_uninitialized.data() != nullptr);
|
||||||
ASSERT_EQ(d_uninitialized.rank(), 2);
|
ASSERT_EQ(d_uninitialized.rank(), 2);
|
||||||
ASSERT_EQ(d_uninitialized.extent(0), 10);
|
ASSERT_EQ(d_uninitialized.extent(0), 10);
|
||||||
@ -1532,7 +1532,7 @@ class TestDynViewAPI {
|
|||||||
ASSERT_EQ(ds5.extent(5), ds5plus.extent(5));
|
ASSERT_EQ(ds5.extent(5), ds5plus.extent(5));
|
||||||
|
|
||||||
#if (!defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_CUDA_UVM)) && \
|
#if (!defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_CUDA_UVM)) && \
|
||||||
!defined(KOKKOS_ENABLE_HIP)
|
!defined(KOKKOS_ENABLE_HIP) && !defined(KOKKOS_ENABLE_SYCL)
|
||||||
ASSERT_EQ(&ds5(1, 1, 1, 1, 0) - &ds5plus(1, 1, 1, 1, 0), 0);
|
ASSERT_EQ(&ds5(1, 1, 1, 1, 0) - &ds5plus(1, 1, 1, 1, 0), 0);
|
||||||
ASSERT_EQ(&ds5(1, 1, 1, 1, 0, 0) - &ds5plus(1, 1, 1, 1, 0, 0),
|
ASSERT_EQ(&ds5(1, 1, 1, 1, 0, 0) - &ds5plus(1, 1, 1, 1, 0, 0),
|
||||||
0); // passing argument to rank beyond the view's rank is allowed
|
0); // passing argument to rank beyond the view's rank is allowed
|
||||||
|
|||||||
@ -243,6 +243,8 @@ struct TestDynamicView {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// FIXME_SYCL needs resize_serial
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
TEST(TEST_CATEGORY, dynamic_view) {
|
TEST(TEST_CATEGORY, dynamic_view) {
|
||||||
using TestDynView = TestDynamicView<double, TEST_EXECSPACE>;
|
using TestDynView = TestDynamicView<double, TEST_EXECSPACE>;
|
||||||
|
|
||||||
@ -250,6 +252,7 @@ TEST(TEST_CATEGORY, dynamic_view) {
|
|||||||
TestDynView::run(100000 + 100 * i);
|
TestDynView::run(100000 + 100 * i);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
} // namespace Test
|
} // namespace Test
|
||||||
|
|
||||||
|
|||||||
@ -95,10 +95,6 @@ void test_offsetview_construction() {
|
|||||||
ASSERT_EQ(ov.extent(1), 5);
|
ASSERT_EQ(ov.extent(1), 5);
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
||||||
const int ovmin0 = ov.begin(0);
|
|
||||||
const int ovend0 = ov.end(0);
|
|
||||||
const int ovmin1 = ov.begin(1);
|
|
||||||
const int ovend1 = ov.end(1);
|
|
||||||
{
|
{
|
||||||
Kokkos::Experimental::OffsetView<Scalar*, Device> offsetV1("OneDOffsetView",
|
Kokkos::Experimental::OffsetView<Scalar*, Device> offsetV1("OneDOffsetView",
|
||||||
range0);
|
range0);
|
||||||
@ -134,6 +130,13 @@ void test_offsetview_construction() {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
|
const int ovmin0 = ov.begin(0);
|
||||||
|
const int ovend0 = ov.end(0);
|
||||||
|
const int ovmin1 = ov.begin(1);
|
||||||
|
const int ovend1 = ov.end(1);
|
||||||
|
|
||||||
using range_type =
|
using range_type =
|
||||||
Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>, Kokkos::IndexType<int> >;
|
Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>, Kokkos::IndexType<int> >;
|
||||||
using point_type = typename range_type::point_type;
|
using point_type = typename range_type::point_type;
|
||||||
@ -175,6 +178,7 @@ void test_offsetview_construction() {
|
|||||||
}
|
}
|
||||||
|
|
||||||
ASSERT_EQ(OVResult, answer) << "Bad data found in OffsetView";
|
ASSERT_EQ(OVResult, answer) << "Bad data found in OffsetView";
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
{
|
{
|
||||||
@ -211,6 +215,8 @@ void test_offsetview_construction() {
|
|||||||
point3_type{{extent0, extent1, extent2}});
|
point3_type{{extent0, extent1, extent2}});
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifdef KOKKOS_ENABLE_SYCL
|
||||||
int view3DSum = 0;
|
int view3DSum = 0;
|
||||||
Kokkos::parallel_reduce(
|
Kokkos::parallel_reduce(
|
||||||
rangePolicy3DZero,
|
rangePolicy3DZero,
|
||||||
@ -233,6 +239,7 @@ void test_offsetview_construction() {
|
|||||||
|
|
||||||
ASSERT_EQ(view3DSum, offsetView3DSum)
|
ASSERT_EQ(view3DSum, offsetView3DSum)
|
||||||
<< "construction of OffsetView from View and begins array broken.";
|
<< "construction of OffsetView from View and begins array broken.";
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
view_type viewFromOV = ov.view();
|
view_type viewFromOV = ov.view();
|
||||||
@ -259,6 +266,8 @@ void test_offsetview_construction() {
|
|||||||
Kokkos::deep_copy(aView, ov);
|
Kokkos::deep_copy(aView, ov);
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
int sum = 0;
|
int sum = 0;
|
||||||
Kokkos::parallel_reduce(
|
Kokkos::parallel_reduce(
|
||||||
rangePolicy2D,
|
rangePolicy2D,
|
||||||
@ -268,6 +277,7 @@ void test_offsetview_construction() {
|
|||||||
sum);
|
sum);
|
||||||
|
|
||||||
ASSERT_EQ(sum, 0) << "deep_copy(view, offsetView) broken.";
|
ASSERT_EQ(sum, 0) << "deep_copy(view, offsetView) broken.";
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -278,6 +288,8 @@ void test_offsetview_construction() {
|
|||||||
Kokkos::deep_copy(ov, aView);
|
Kokkos::deep_copy(ov, aView);
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
int sum = 0;
|
int sum = 0;
|
||||||
Kokkos::parallel_reduce(
|
Kokkos::parallel_reduce(
|
||||||
rangePolicy2D,
|
rangePolicy2D,
|
||||||
@ -287,6 +299,7 @@ void test_offsetview_construction() {
|
|||||||
sum);
|
sum);
|
||||||
|
|
||||||
ASSERT_EQ(sum, 0) << "deep_copy(offsetView, view) broken.";
|
ASSERT_EQ(sum, 0) << "deep_copy(offsetView, view) broken.";
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -458,6 +471,8 @@ void test_offsetview_subview() {
|
|||||||
ASSERT_EQ(offsetSubview.end(1), 9);
|
ASSERT_EQ(offsetSubview.end(1), 9);
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
// FIXME_SYCL requires MDRange policy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
using range_type = Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>,
|
using range_type = Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>,
|
||||||
Kokkos::IndexType<int> >;
|
Kokkos::IndexType<int> >;
|
||||||
using point_type = typename range_type::point_type;
|
using point_type = typename range_type::point_type;
|
||||||
@ -483,6 +498,7 @@ void test_offsetview_subview() {
|
|||||||
sum);
|
sum);
|
||||||
|
|
||||||
ASSERT_EQ(sum, 6 * (e0 - b0) * (e1 - b1));
|
ASSERT_EQ(sum, 6 * (e0 - b0) * (e1 - b1));
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -685,9 +701,12 @@ void test_offsetview_offsets_rank3() {
|
|||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
// FIXME_SYCL needs MDRangePolicy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
TEST(TEST_CATEGORY, offsetview_construction) {
|
TEST(TEST_CATEGORY, offsetview_construction) {
|
||||||
test_offsetview_construction<int, TEST_EXECSPACE>();
|
test_offsetview_construction<int, TEST_EXECSPACE>();
|
||||||
}
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
TEST(TEST_CATEGORY, offsetview_unmanaged_construction) {
|
TEST(TEST_CATEGORY, offsetview_unmanaged_construction) {
|
||||||
test_offsetview_unmanaged_construction<int, TEST_EXECSPACE>();
|
test_offsetview_unmanaged_construction<int, TEST_EXECSPACE>();
|
||||||
|
|||||||
51
lib/kokkos/containers/unit_tests/TestSYCL_Category.hpp
Normal file
@ -0,0 +1,51 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 3.0
|
||||||
|
// Copyright (2020) National Technology & Engineering
|
||||||
|
// Solutions of Sandia, LLC (NTESS).
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_TEST_SYCL_HPP
|
||||||
|
#define KOKKOS_TEST_SYCL_HPP
|
||||||
|
|
||||||
|
#define TEST_CATEGORY sycl
|
||||||
|
#define TEST_EXECSPACE Kokkos::Experimental::SYCL
|
||||||
|
|
||||||
|
#endif
|
||||||
@ -583,18 +583,9 @@ struct TestDuplicatedScatterView<
|
|||||||
};
|
};
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef KOKKOS_ENABLE_ROCM
|
|
||||||
// disable duplicated instantiation with ROCm until
|
|
||||||
// UniqueToken can support it
|
|
||||||
template <typename ScatterType>
|
|
||||||
struct TestDuplicatedScatterView<Kokkos::Experimental::ROCm, ScatterType> {
|
|
||||||
TestDuplicatedScatterView(int) {}
|
|
||||||
};
|
|
||||||
#endif
|
|
||||||
|
|
||||||
template <typename DeviceType, typename ScatterType,
|
template <typename DeviceType, typename ScatterType,
|
||||||
typename NumberType = double>
|
typename NumberType = double>
|
||||||
void test_scatter_view(int n) {
|
void test_scatter_view(int64_t n) {
|
||||||
using execution_space = typename DeviceType::execution_space;
|
using execution_space = typename DeviceType::execution_space;
|
||||||
|
|
||||||
// no atomics or duplication is only sensible if the execution space
|
// no atomics or duplication is only sensible if the execution space
|
||||||
@ -630,7 +621,7 @@ void test_scatter_view(int n) {
|
|||||||
constexpr std::size_t bytes_per_value = sizeof(NumberType) * 12;
|
constexpr std::size_t bytes_per_value = sizeof(NumberType) * 12;
|
||||||
std::size_t const maximum_allowed_copy_values =
|
std::size_t const maximum_allowed_copy_values =
|
||||||
maximum_allowed_copy_bytes / bytes_per_value;
|
maximum_allowed_copy_bytes / bytes_per_value;
|
||||||
n = std::min(n, int(maximum_allowed_copy_values));
|
n = std::min(n, int64_t(maximum_allowed_copy_values));
|
||||||
|
|
||||||
// if the default is duplicated, this needs to follow the limit
|
// if the default is duplicated, this needs to follow the limit
|
||||||
{
|
{
|
||||||
@ -683,32 +674,40 @@ TEST(TEST_CATEGORY, scatterview_devicetype) {
|
|||||||
test_scatter_view<device_type, Kokkos::Experimental::ScatterMin>(10);
|
test_scatter_view<device_type, Kokkos::Experimental::ScatterMin>(10);
|
||||||
test_scatter_view<device_type, Kokkos::Experimental::ScatterMax>(10);
|
test_scatter_view<device_type, Kokkos::Experimental::ScatterMax>(10);
|
||||||
|
|
||||||
|
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_HIP)
|
||||||
#ifdef KOKKOS_ENABLE_CUDA
|
#ifdef KOKKOS_ENABLE_CUDA
|
||||||
if (std::is_same<TEST_EXECSPACE, Kokkos::Cuda>::value) {
|
using device_execution_space = Kokkos::Cuda;
|
||||||
using cuda_device_type = Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>;
|
using device_memory_space = Kokkos::CudaSpace;
|
||||||
test_scatter_view<cuda_device_type, Kokkos::Experimental::ScatterSum,
|
using host_accessible_space = Kokkos::CudaUVMSpace;
|
||||||
|
#else
|
||||||
|
using device_execution_space = Kokkos::Experimental::HIP;
|
||||||
|
using device_memory_space = Kokkos::Experimental::HIPSpace;
|
||||||
|
using host_accessible_space = Kokkos::Experimental::HIPHostPinnedSpace;
|
||||||
|
#endif
|
||||||
|
if (std::is_same<TEST_EXECSPACE, device_execution_space>::value) {
|
||||||
|
using device_device_type =
|
||||||
|
Kokkos::Device<device_execution_space, device_memory_space>;
|
||||||
|
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterSum,
|
||||||
double>(10);
|
double>(10);
|
||||||
test_scatter_view<cuda_device_type, Kokkos::Experimental::ScatterSum,
|
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterSum,
|
||||||
unsigned int>(10);
|
unsigned int>(10);
|
||||||
test_scatter_view<cuda_device_type, Kokkos::Experimental::ScatterProd>(10);
|
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterProd>(
|
||||||
test_scatter_view<cuda_device_type, Kokkos::Experimental::ScatterMin>(10);
|
10);
|
||||||
test_scatter_view<cuda_device_type, Kokkos::Experimental::ScatterMax>(10);
|
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterMin>(10);
|
||||||
using cudauvm_device_type =
|
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterMax>(10);
|
||||||
Kokkos::Device<Kokkos::Cuda, Kokkos::CudaUVMSpace>;
|
using host_device_type =
|
||||||
test_scatter_view<cudauvm_device_type, Kokkos::Experimental::ScatterSum,
|
Kokkos::Device<device_execution_space, host_accessible_space>;
|
||||||
|
test_scatter_view<host_device_type, Kokkos::Experimental::ScatterSum,
|
||||||
double>(10);
|
double>(10);
|
||||||
test_scatter_view<cudauvm_device_type, Kokkos::Experimental::ScatterSum,
|
test_scatter_view<host_device_type, Kokkos::Experimental::ScatterSum,
|
||||||
unsigned int>(10);
|
unsigned int>(10);
|
||||||
test_scatter_view<cudauvm_device_type, Kokkos::Experimental::ScatterProd>(
|
test_scatter_view<host_device_type, Kokkos::Experimental::ScatterProd>(10);
|
||||||
10);
|
test_scatter_view<host_device_type, Kokkos::Experimental::ScatterMin>(10);
|
||||||
test_scatter_view<cudauvm_device_type, Kokkos::Experimental::ScatterMin>(
|
test_scatter_view<host_device_type, Kokkos::Experimental::ScatterMax>(10);
|
||||||
10);
|
|
||||||
test_scatter_view<cudauvm_device_type, Kokkos::Experimental::ScatterMax>(
|
|
||||||
10);
|
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Test
|
} // namespace Test
|
||||||
|
|
||||||
#endif // KOKKOS_TEST_UNORDERED_MAP_HPP
|
#endif // KOKKOS_TEST_SCATTER_VIEW_HPP
|
||||||
|
|||||||
@ -200,8 +200,7 @@ void run_test_graph3(size_t B, size_t N) {
|
|||||||
|
|
||||||
for (size_t i = 0; i < B; i++) {
|
for (size_t i = 0; i < B; i++) {
|
||||||
size_t ne = 0;
|
size_t ne = 0;
|
||||||
for (size_t j = hx.row_block_offsets(i); j < hx.row_block_offsets(i + 1);
|
for (auto j = hx.row_block_offsets(i); j < hx.row_block_offsets(i + 1); j++)
|
||||||
j++)
|
|
||||||
ne += hx.row_map(j + 1) - hx.row_map(j) + C;
|
ne += hx.row_map(j + 1) - hx.row_map(j) + C;
|
||||||
|
|
||||||
ASSERT_FALSE(
|
ASSERT_FALSE(
|
||||||
@ -212,7 +211,7 @@ void run_test_graph3(size_t B, size_t N) {
|
|||||||
|
|
||||||
template <class Space>
|
template <class Space>
|
||||||
void run_test_graph4() {
|
void run_test_graph4() {
|
||||||
using ordinal_type = unsigned;
|
using ordinal_type = unsigned int;
|
||||||
using layout_type = Kokkos::LayoutRight;
|
using layout_type = Kokkos::LayoutRight;
|
||||||
using space_type = Space;
|
using space_type = Space;
|
||||||
using memory_traits_type = Kokkos::MemoryUnmanaged;
|
using memory_traits_type = Kokkos::MemoryUnmanaged;
|
||||||
@ -286,7 +285,10 @@ void run_test_graph4() {
|
|||||||
|
|
||||||
TEST(TEST_CATEGORY, staticcrsgraph) {
|
TEST(TEST_CATEGORY, staticcrsgraph) {
|
||||||
TestStaticCrsGraph::run_test_graph<TEST_EXECSPACE>();
|
TestStaticCrsGraph::run_test_graph<TEST_EXECSPACE>();
|
||||||
|
// FIXME_SYCL requires MDRangePolicy
|
||||||
|
#ifndef KOKKOS_ENABLE_SYCL
|
||||||
TestStaticCrsGraph::run_test_graph2<TEST_EXECSPACE>();
|
TestStaticCrsGraph::run_test_graph2<TEST_EXECSPACE>();
|
||||||
|
#endif
|
||||||
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 0);
|
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 0);
|
||||||
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 1000);
|
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 1000);
|
||||||
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 10000);
|
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 10000);
|
||||||
|
|||||||
@ -78,7 +78,7 @@ struct test_vector_insert {
|
|||||||
// Looks like some std::vector implementations do not have the restriction
|
// Looks like some std::vector implementations do not have the restriction
|
||||||
// right on the overload taking three iterators, and thus the following call
|
// right on the overload taking three iterators, and thus the following call
|
||||||
// will hit that overload and then fail to compile.
|
// will hit that overload and then fail to compile.
|
||||||
#if defined(KOKKOS_COMPILER_INTEL) && (1700 > KOKKOS_COMPILER_INTEL)
|
#if defined(KOKKOS_COMPILER_INTEL)
|
||||||
// And at least GCC 4.8.4 doesn't implement vector insert correct for C++11
|
// And at least GCC 4.8.4 doesn't implement vector insert correct for C++11
|
||||||
// Return type is void ...
|
// Return type is void ...
|
||||||
#if (__GNUC__ < 5)
|
#if (__GNUC__ < 5)
|
||||||
@ -104,7 +104,7 @@ struct test_vector_insert {
|
|||||||
// Looks like some std::vector implementations do not have the restriction
|
// Looks like some std::vector implementations do not have the restriction
|
||||||
// right on the overload taking three iterators, and thus the following call
|
// right on the overload taking three iterators, and thus the following call
|
||||||
// will hit that overload and then fail to compile.
|
// will hit that overload and then fail to compile.
|
||||||
#if defined(KOKKOS_COMPILER_INTEL) && (1700 > KOKKOS_COMPILER_INTEL)
|
#if defined(KOKKOS_COMPILER_INTEL)
|
||||||
b.insert(b.begin(), typename Vector::size_type(7), 9);
|
b.insert(b.begin(), typename Vector::size_type(7), 9);
|
||||||
#else
|
#else
|
||||||
b.insert(b.begin(), 7, 9);
|
b.insert(b.begin(), 7, 9);
|
||||||
@ -125,7 +125,7 @@ struct test_vector_insert {
|
|||||||
|
|
||||||
// Testing insert at end via all three function interfaces
|
// Testing insert at end via all three function interfaces
|
||||||
a.insert(a.end(), 11);
|
a.insert(a.end(), 11);
|
||||||
#if defined(KOKKOS_COMPILER_INTEL) && (1700 > KOKKOS_COMPILER_INTEL)
|
#if defined(KOKKOS_COMPILER_INTEL)
|
||||||
a.insert(a.end(), typename Vector::size_type(2), 12);
|
a.insert(a.end(), typename Vector::size_type(2), 12);
|
||||||
#else
|
#else
|
||||||
a.insert(a.end(), 2, 12);
|
a.insert(a.end(), 2, 12);
|
||||||
|
|||||||
@ -100,6 +100,5 @@
|
|||||||
|
|
||||||
// TODO: No longer options in Kokkos. Need to be removed.
|
// TODO: No longer options in Kokkos. Need to be removed.
|
||||||
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
|
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
|
||||||
#cmakedefine KOKKOS_ENABLE_CXX11
|
|
||||||
|
|
||||||
#endif // !defined(KOKKOS_FOR_SIERRA)
|
#endif // !defined(KOKKOS_FOR_SIERRA)
|
||||||
|
|||||||
@ -48,17 +48,10 @@ SET(SOURCES
|
|||||||
PerfTest_ViewResize_8.cpp
|
PerfTest_ViewResize_8.cpp
|
||||||
)
|
)
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_HIP)
|
|
||||||
# FIXME HIP requires TeamPolicy
|
|
||||||
LIST(REMOVE_ITEM SOURCES
|
|
||||||
PerfTest_CustomReduction.cpp
|
|
||||||
PerfTest_ExecSpacePartitioning.cpp
|
|
||||||
)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
IF(Kokkos_ENABLE_OPENMPTARGET)
|
IF(Kokkos_ENABLE_OPENMPTARGET)
|
||||||
# FIXME OPENMPTARGET requires TeamPolicy Reductions and Custom Reduction
|
# FIXME OPENMPTARGET requires TeamPolicy Reductions and Custom Reduction
|
||||||
LIST(REMOVE_ITEM SOURCES
|
LIST(REMOVE_ITEM SOURCES
|
||||||
|
PerfTestGramSchmidt.cpp
|
||||||
PerfTest_CustomReduction.cpp
|
PerfTest_CustomReduction.cpp
|
||||||
PerfTest_ExecSpacePartitioning.cpp
|
PerfTest_ExecSpacePartitioning.cpp
|
||||||
)
|
)
|
||||||
@ -75,7 +68,8 @@ KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
|||||||
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
|
|
||||||
# This test currently times out for MSVC
|
# This test currently times out for MSVC
|
||||||
IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC")
|
# FIXME_SYCL these tests don't compile yet (require parallel_for).
|
||||||
|
IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC" AND NOT Kokkos_ENABLE_SYCL)
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
||||||
PerfTestExec
|
PerfTestExec
|
||||||
SOURCES ${SOURCES}
|
SOURCES ${SOURCES}
|
||||||
@ -83,17 +77,28 @@ IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC")
|
|||||||
)
|
)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
# FIXME_SYCL
|
||||||
PerformanceTest_Atomic
|
IF(NOT Kokkos_ENABLE_SYCL)
|
||||||
SOURCES test_atomic.cpp
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
||||||
CATEGORIES PERFORMANCE
|
PerformanceTest_Atomic
|
||||||
)
|
SOURCES test_atomic.cpp
|
||||||
|
CATEGORIES PERFORMANCE
|
||||||
|
)
|
||||||
|
|
||||||
|
IF(NOT KOKKOS_ENABLE_CUDA OR KOKKOS_ENABLE_CUDA_LAMBDA)
|
||||||
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
||||||
|
PerformanceTest_Atomic_MinMax
|
||||||
|
SOURCES test_atomic_minmax_simple.cpp
|
||||||
|
CATEGORIES PERFORMANCE
|
||||||
|
)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
KOKKOS_ADD_EXECUTABLE_AND_TEST(
|
||||||
PerformanceTest_Mempool
|
PerformanceTest_Mempool
|
||||||
SOURCES test_mempool.cpp
|
SOURCES test_mempool.cpp
|
||||||
CATEGORIES PERFORMANCE
|
CATEGORIES PERFORMANCE
|
||||||
)
|
)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
IF(NOT Kokkos_ENABLE_OPENMPTARGET)
|
IF(NOT Kokkos_ENABLE_OPENMPTARGET)
|
||||||
# FIXME OPENMPTARGET needs tasking
|
# FIXME OPENMPTARGET needs tasking
|
||||||
|
|||||||
@ -65,6 +65,12 @@ TEST_TARGETS += test-taskdag
|
|||||||
|
|
||||||
#
|
#
|
||||||
|
|
||||||
|
OBJ_ATOMICS_MINMAX = test_atomic_minmax_simple.o
|
||||||
|
TARGETS += KokkosCore_PerformanceTest_Atomics_MinMax
|
||||||
|
TEST_TARGETS += test-atomic-minmax
|
||||||
|
|
||||||
|
#
|
||||||
|
|
||||||
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
|
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest
|
$(LINK) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest
|
||||||
|
|
||||||
@ -77,6 +83,9 @@ KokkosCore_PerformanceTest_Mempool: $(OBJ_MEMPOOL) $(KOKKOS_LINK_DEPENDS)
|
|||||||
KokkosCore_PerformanceTest_TaskDAG: $(OBJ_TASKDAG) $(KOKKOS_LINK_DEPENDS)
|
KokkosCore_PerformanceTest_TaskDAG: $(OBJ_TASKDAG) $(KOKKOS_LINK_DEPENDS)
|
||||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_TASKDAG) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_TaskDAG
|
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_TASKDAG) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_TaskDAG
|
||||||
|
|
||||||
|
KokkosCore_PerformanceTest_Atomics_MinMax: $(OBJ_ATOMICS_MINMAX) $(KOKKOS_LINK_DEPENDS)
|
||||||
|
$(LINK) $(EXTRA_PATH) $(OBJ_ATOMICS_MINMAX) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest_Atomics_MinMax
|
||||||
|
|
||||||
test-performance: KokkosCore_PerformanceTest
|
test-performance: KokkosCore_PerformanceTest
|
||||||
./KokkosCore_PerformanceTest
|
./KokkosCore_PerformanceTest
|
||||||
|
|
||||||
@ -89,6 +98,9 @@ test-mempool: KokkosCore_PerformanceTest_Mempool
|
|||||||
test-taskdag: KokkosCore_PerformanceTest_TaskDAG
|
test-taskdag: KokkosCore_PerformanceTest_TaskDAG
|
||||||
./KokkosCore_PerformanceTest_TaskDAG
|
./KokkosCore_PerformanceTest_TaskDAG
|
||||||
|
|
||||||
|
test-atomic-minmax: KokkosCore_PerformanceTest_Atomics_MinMax
|
||||||
|
./KokkosCore_PerformanceTest_Atomics_MinMax
|
||||||
|
|
||||||
build_all: $(TARGETS)
|
build_all: $(TARGETS)
|
||||||
|
|
||||||
test: $(TEST_TARGETS)
|
test: $(TEST_TARGETS)
|
||||||
|
|||||||
@ -120,7 +120,7 @@ void run_resizeview_tests123(int N, int R) {
|
|||||||
Kokkos::Timer timer;
|
Kokkos::Timer timer;
|
||||||
for (int r = 0; r < R; r++) {
|
for (int r = 0; r < R; r++) {
|
||||||
Kokkos::View<double*, Layout> a1(
|
Kokkos::View<double*, Layout> a1(
|
||||||
Kokkos::ViewAllocateWithoutInitializing("A1"), int(N8 * 1.1));
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "A1"), int(N8 * 1.1));
|
||||||
double* a1_ptr = a1.data();
|
double* a1_ptr = a1.data();
|
||||||
Kokkos::parallel_for(
|
Kokkos::parallel_for(
|
||||||
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
||||||
@ -201,7 +201,7 @@ void run_resizeview_tests45(int N, int R) {
|
|||||||
Kokkos::Timer timer;
|
Kokkos::Timer timer;
|
||||||
for (int r = 0; r < R; r++) {
|
for (int r = 0; r < R; r++) {
|
||||||
Kokkos::View<double*, Layout> a1(
|
Kokkos::View<double*, Layout> a1(
|
||||||
Kokkos::ViewAllocateWithoutInitializing("A1"), int(N8 * 1.1));
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "A1"), int(N8 * 1.1));
|
||||||
double* a1_ptr = a1.data();
|
double* a1_ptr = a1.data();
|
||||||
Kokkos::parallel_for(
|
Kokkos::parallel_for(
|
||||||
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
||||||
@ -258,7 +258,7 @@ void run_resizeview_tests6(int N, int R) {
|
|||||||
Kokkos::Timer timer;
|
Kokkos::Timer timer;
|
||||||
for (int r = 0; r < R; r++) {
|
for (int r = 0; r < R; r++) {
|
||||||
Kokkos::View<double*, Layout> a1(
|
Kokkos::View<double*, Layout> a1(
|
||||||
Kokkos::ViewAllocateWithoutInitializing("A1"), int(N8 * 1.1));
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "A1"), int(N8 * 1.1));
|
||||||
double* a1_ptr = a1.data();
|
double* a1_ptr = a1.data();
|
||||||
Kokkos::parallel_for(
|
Kokkos::parallel_for(
|
||||||
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
||||||
@ -311,7 +311,7 @@ void run_resizeview_tests7(int N, int R) {
|
|||||||
Kokkos::Timer timer;
|
Kokkos::Timer timer;
|
||||||
for (int r = 0; r < R; r++) {
|
for (int r = 0; r < R; r++) {
|
||||||
Kokkos::View<double*, Layout> a1(
|
Kokkos::View<double*, Layout> a1(
|
||||||
Kokkos::ViewAllocateWithoutInitializing("A1"), int(N8 * 1.1));
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "A1"), int(N8 * 1.1));
|
||||||
double* a1_ptr = a1.data();
|
double* a1_ptr = a1.data();
|
||||||
Kokkos::parallel_for(
|
Kokkos::parallel_for(
|
||||||
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
||||||
@ -366,7 +366,7 @@ void run_resizeview_tests8(int N, int R) {
|
|||||||
Kokkos::Timer timer;
|
Kokkos::Timer timer;
|
||||||
for (int r = 0; r < R; r++) {
|
for (int r = 0; r < R; r++) {
|
||||||
Kokkos::View<double*, Layout> a1(
|
Kokkos::View<double*, Layout> a1(
|
||||||
Kokkos::ViewAllocateWithoutInitializing("A1"), int(N8 * 1.1));
|
Kokkos::view_alloc(Kokkos::WithoutInitializing, "A1"), int(N8 * 1.1));
|
||||||
double* a1_ptr = a1.data();
|
double* a1_ptr = a1.data();
|
||||||
Kokkos::parallel_for(
|
Kokkos::parallel_for(
|
||||||
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
N8, KOKKOS_LAMBDA(const int& i) { a1_ptr[i] = a_ptr[i]; });
|
||||||
|
|||||||
244
lib/kokkos/core/perf_test/test_atomic_minmax_simple.cpp
Normal file
@ -0,0 +1,244 @@
|
|||||||
|
// export OMP_PROC_BIND=spread ; export OMP_PLACES=threads
|
||||||
|
// c++ -O2 -g -DNDEBUG -fopenmp
|
||||||
|
// ../core/perf_test/test_atomic_minmax_simple.cpp -I../core/src/ -I. -o
|
||||||
|
// test_atomic_minmax_simple.x containers/src/libkokkoscontainers.a
|
||||||
|
// core/src/libkokkoscore.a -ldl && OMP_NUM_THREADS=1
|
||||||
|
// ./test_atomic_minmax_simple.x 10000000
|
||||||
|
|
||||||
|
#include <cstdio>
|
||||||
|
#include <cstdlib>
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <typeinfo>
|
||||||
|
|
||||||
|
#include <Kokkos_Core.hpp>
|
||||||
|
#include <impl/Kokkos_Timer.hpp>
|
||||||
|
|
||||||
|
using exec_space = Kokkos::DefaultExecutionSpace;
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
void test(const int length) {
|
||||||
|
Kokkos::Impl::Timer timer;
|
||||||
|
|
||||||
|
using vector = Kokkos::View<T*, exec_space>;
|
||||||
|
|
||||||
|
vector inp("input", length);
|
||||||
|
T max = std::numeric_limits<T>::max();
|
||||||
|
T min = std::numeric_limits<T>::lowest();
|
||||||
|
|
||||||
|
// input is max values - all min atomics will replace
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) { inp(i) = max; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
timer.reset();
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) {
|
||||||
|
(void)Kokkos::atomic_fetch_min(&(inp(i)), (T)i);
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
int errors(0);
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
length,
|
||||||
|
KOKKOS_LAMBDA(const int i, int& inner) { inner += (inp(i) != (T)i); },
|
||||||
|
errors);
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
if (errors) {
|
||||||
|
std::cerr << "Error in 100% min replacements: " << errors << std::endl;
|
||||||
|
std::cerr << "inp(0)=" << inp(0) << std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for 100% min replacements: " << time << std::endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// input is min values - all max atomics will replace
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) { inp(i) = min; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
timer.reset();
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) {
|
||||||
|
(void)Kokkos::atomic_max_fetch(&(inp(i)), (T)i);
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
int errors(0);
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
length,
|
||||||
|
KOKKOS_LAMBDA(const int i, int& inner) { inner += (inp(i) != (T)i); },
|
||||||
|
errors);
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
if (errors) {
|
||||||
|
std::cerr << "Error in 100% max replacements: " << errors << std::endl;
|
||||||
|
std::cerr << "inp(0)=" << inp(0) << std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for 100% max replacements: " << time << std::endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// input is max values - all max atomics will early exit
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) { inp(i) = max; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
timer.reset();
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) {
|
||||||
|
(void)Kokkos::atomic_max_fetch(&(inp(i)), (T)i);
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
int errors(0);
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
length,
|
||||||
|
KOKKOS_LAMBDA(const int i, int& inner) {
|
||||||
|
T ref = max;
|
||||||
|
inner += (inp(i) != ref);
|
||||||
|
},
|
||||||
|
errors);
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
if (errors) {
|
||||||
|
std::cerr << "Error in 100% max early exits: " << errors << std::endl;
|
||||||
|
std::cerr << "inp(0)=" << inp(0) << std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for 100% max early exits: " << time << std::endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// input is min values - all min atomics will early exit
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) { inp(i) = min; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
timer.reset();
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
length, KOKKOS_LAMBDA(const int i) {
|
||||||
|
(void)Kokkos::atomic_min_fetch(&(inp(i)), (T)i);
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
int errors(0);
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
length,
|
||||||
|
KOKKOS_LAMBDA(const int i, int& inner) {
|
||||||
|
T ref = min;
|
||||||
|
inner += (inp(i) != ref);
|
||||||
|
},
|
||||||
|
errors);
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
if (errors) {
|
||||||
|
std::cerr << "Error in 100% min early exits: " << errors << std::endl;
|
||||||
|
std::cerr << "inp(0)=" << inp(0) << std::endl;
|
||||||
|
if (length > 9) std::cout << "inp(9)=" << inp(9) << std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for 100% min early exits: " << time << std::endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// limit iterations for contentious test, takes ~50x longer for same length
|
||||||
|
auto con_length = length / 5;
|
||||||
|
// input is min values - some max atomics will replace
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
1, KOKKOS_LAMBDA(const int i) { inp(i) = min; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
T current(0);
|
||||||
|
timer.reset();
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
con_length,
|
||||||
|
KOKKOS_LAMBDA(const int i, T& inner) {
|
||||||
|
inner = Kokkos::atomic_max_fetch(&(inp(0)), inner + 1);
|
||||||
|
if (i == con_length - 1) {
|
||||||
|
Kokkos::atomic_max_fetch(&(inp(0)), max);
|
||||||
|
inner = max;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
Kokkos::Max<T>(current));
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
if (current < max) {
|
||||||
|
std::cerr << "Error in contentious max replacements: " << std::endl;
|
||||||
|
std::cerr << "final=" << current << " inp(0)=" << inp(0) << " max=" << max
|
||||||
|
<< std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for contentious max " << con_length
|
||||||
|
<< " replacements: " << time << std::endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// input is max values - some min atomics will replace
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(
|
||||||
|
1, KOKKOS_LAMBDA(const int i) { inp(i) = max; });
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
timer.reset();
|
||||||
|
T current(100000000);
|
||||||
|
Kokkos::parallel_reduce(
|
||||||
|
con_length,
|
||||||
|
KOKKOS_LAMBDA(const int i, T& inner) {
|
||||||
|
inner = Kokkos::atomic_min_fetch(&(inp(0)), inner - 1);
|
||||||
|
if (i == con_length - 1) {
|
||||||
|
Kokkos::atomic_min_fetch(&(inp(0)), min);
|
||||||
|
inner = min;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
Kokkos::Min<T>(current));
|
||||||
|
Kokkos::fence();
|
||||||
|
double time = timer.seconds();
|
||||||
|
|
||||||
|
if (current > min) {
|
||||||
|
std::cerr << "Error in contentious min replacements: " << std::endl;
|
||||||
|
std::cerr << "final=" << current << " inp(0)=" << inp(0) << " min=" << min
|
||||||
|
<< std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "Time for contentious min " << con_length
|
||||||
|
<< " replacements: " << time << std::endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char* argv[]) {
|
||||||
|
Kokkos::initialize(argc, argv);
|
||||||
|
{
|
||||||
|
int length = 1000000;
|
||||||
|
if (argc == 2) {
|
||||||
|
length = std::stoi(argv[1]);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (length < 1) {
|
||||||
|
throw std::invalid_argument("");
|
||||||
|
}
|
||||||
|
|
||||||
|
std::cout << "================ int" << std::endl;
|
||||||
|
test<int>(length);
|
||||||
|
std::cout << "================ long" << std::endl;
|
||||||
|
test<long>(length);
|
||||||
|
std::cout << "================ long long" << std::endl;
|
||||||
|
test<long long>(length);
|
||||||
|
|
||||||
|
std::cout << "================ unsigned int" << std::endl;
|
||||||
|
test<unsigned int>(length);
|
||||||
|
std::cout << "================ unsigned long" << std::endl;
|
||||||
|
test<unsigned long>(length);
|
||||||
|
std::cout << "================ unsigned long long" << std::endl;
|
||||||
|
test<unsigned long long>(length);
|
||||||
|
|
||||||
|
std::cout << "================ float" << std::endl;
|
||||||
|
test<float>(length);
|
||||||
|
std::cout << "================ double" << std::endl;
|
||||||
|
test<double>(length);
|
||||||
|
}
|
||||||
|
Kokkos::finalize();
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
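The benchmark above times two regimes for atomic min/max: updates that always replace the stored value, and updates that can exit early because the stored value already satisfies the bound. As a rough illustration of why those regimes differ in cost, here is a minimal sketch of a compare-and-swap loop with an early exit, written against `std::atomic`. It is not the Kokkos implementation; the names `fetch_min_sketch` and `max_fetch_sketch` are hypothetical, and only the fetch-then-op versus op-then-fetch return conventions are taken from the calls used in the test.

```cpp
#include <atomic>

// Hypothetical helper (not part of Kokkos): atomic min that returns the
// value observed BEFORE the operation, like a fetch-then-op call.
template <typename T>
T fetch_min_sketch(std::atomic<T>& target, T value) {
  T observed = target.load(std::memory_order_relaxed);
  // Early exit: if the stored value is already <= value, no write happens.
  while (value < observed) {
    // On failure compare_exchange_weak refreshes `observed`, so the loop
    // re-checks the early-exit condition against the current value.
    if (target.compare_exchange_weak(observed, value,
                                     std::memory_order_relaxed)) {
      break;
    }
  }
  return observed;
}

// Hypothetical helper (not part of Kokkos): atomic max that returns the
// value stored AFTER the operation, like an op-then-fetch call.
template <typename T>
T max_fetch_sketch(std::atomic<T>& target, T value) {
  T observed = target.load(std::memory_order_relaxed);
  while (observed < value) {
    if (target.compare_exchange_weak(observed, value,
                                     std::memory_order_relaxed)) {
      return value;  // our value won and is now stored
    }
  }
  return observed;  // early exit: stored value was already >= value
}
```

In the "early exit" cases the loop body never runs, so the cost is a single load; in the "replace" cases every call performs at least one compare-and-swap, and under contention on a single address the swap may need retries.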
@ -19,10 +19,6 @@ SET(KOKKOS_CORE_HEADERS)
|
|||||||
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
|
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
|
||||||
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/impl/*.hpp)
|
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/impl/*.hpp)
|
||||||
|
|
||||||
IF (KOKKOS_ENABLE_ROCM)
|
|
||||||
APPEND_GLOB(KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/ROCm/*.cpp)
|
|
||||||
ENDIF()
|
|
||||||
|
|
||||||
IF (KOKKOS_ENABLE_CUDA)
|
IF (KOKKOS_ENABLE_CUDA)
|
||||||
APPEND_GLOB(KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/Cuda/*.cpp)
|
APPEND_GLOB(KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/Cuda/*.cpp)
|
||||||
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/Cuda/*.hpp)
|
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/Cuda/*.hpp)
|
||||||
@ -64,6 +60,11 @@ ELSE()
|
|||||||
LIST(REMOVE_ITEM KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/impl/Kokkos_Serial_task.cpp)
|
LIST(REMOVE_ITEM KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/impl/Kokkos_Serial_task.cpp)
|
||||||
ENDIF()
|
ENDIF()
|
||||||
|
|
||||||
|
IF (KOKKOS_ENABLE_SYCL)
|
||||||
|
APPEND_GLOB(KOKKOS_CORE_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/SYCL/*.cpp)
|
||||||
|
APPEND_GLOB(KOKKOS_CORE_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/SYCL/*.hpp)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
KOKKOS_ADD_LIBRARY(
|
KOKKOS_ADD_LIBRARY(
|
||||||
kokkoscore
|
kokkoscore
|
||||||
SOURCES ${KOKKOS_CORE_SRCS}
|
SOURCES ${KOKKOS_CORE_SRCS}
|
||||||
|
|||||||
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -146,9 +146,9 @@ void CudaSpace::access_error(const void *const) {
|
|||||||
|
|
||||||
bool CudaUVMSpace::available() {
|
bool CudaUVMSpace::available() {
|
||||||
#if defined(CUDA_VERSION) && !defined(__APPLE__)
|
#if defined(CUDA_VERSION) && !defined(__APPLE__)
|
||||||
enum { UVM_available = true };
|
enum : bool { UVM_available = true };
|
||||||
#else
|
#else
|
||||||
enum { UVM_available = false };
|
enum : bool { UVM_available = false };
|
||||||
#endif
|
#endif
|
||||||
return UVM_available;
|
return UVM_available;
|
||||||
}
|
}
|
||||||
@ -201,8 +201,15 @@ CudaHostPinnedSpace::CudaHostPinnedSpace() {}
|
|||||||
void *CudaSpace::allocate(const size_t arg_alloc_size) const {
|
void *CudaSpace::allocate(const size_t arg_alloc_size) const {
|
||||||
return allocate("[unlabeled]", arg_alloc_size);
|
return allocate("[unlabeled]", arg_alloc_size);
|
||||||
}
|
}
|
||||||
|
|
||||||
void *CudaSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
void *CudaSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
return impl_allocate(arg_label, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
void *CudaSpace::impl_allocate(
|
||||||
|
const char *arg_label, const size_t arg_alloc_size,
|
||||||
|
const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
void *ptr = nullptr;
|
void *ptr = nullptr;
|
||||||
|
|
||||||
auto error_code = cudaMalloc(&ptr, arg_alloc_size);
|
auto error_code = cudaMalloc(&ptr, arg_alloc_size);
|
||||||
@ -219,9 +226,7 @@ void *CudaSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
|||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::allocateData(
|
Kokkos::Profiling::allocateData(arg_handle, arg_label, ptr, reported_size);
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, ptr,
|
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
return ptr;
|
return ptr;
|
||||||
}
|
}
|
||||||
@ -231,6 +236,12 @@ void *CudaUVMSpace::allocate(const size_t arg_alloc_size) const {
|
|||||||
}
|
}
|
||||||
void *CudaUVMSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
void *CudaUVMSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
return impl_allocate(arg_label, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
void *CudaUVMSpace::impl_allocate(
|
||||||
|
const char *arg_label, const size_t arg_alloc_size,
|
||||||
|
const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
void *ptr = nullptr;
|
void *ptr = nullptr;
|
||||||
|
|
||||||
Cuda::impl_static_fence();
|
Cuda::impl_static_fence();
|
||||||
@ -260,19 +271,22 @@ void *CudaUVMSpace::allocate(const char *arg_label, const size_t arg_alloc_size,
|
|||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::allocateData(
|
Kokkos::Profiling::allocateData(arg_handle, arg_label, ptr, reported_size);
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, ptr,
|
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
return ptr;
|
return ptr;
|
||||||
}
|
}
|
||||||
|
|
||||||
void *CudaHostPinnedSpace::allocate(const size_t arg_alloc_size) const {
|
void *CudaHostPinnedSpace::allocate(const size_t arg_alloc_size) const {
|
||||||
return allocate("[unlabeled]", arg_alloc_size);
|
return allocate("[unlabeled]", arg_alloc_size);
|
||||||
}
|
}
|
||||||
void *CudaHostPinnedSpace::allocate(const char *arg_label,
|
void *CudaHostPinnedSpace::allocate(const char *arg_label,
|
||||||
const size_t arg_alloc_size,
|
const size_t arg_alloc_size,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
return impl_allocate(arg_label, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
void *CudaHostPinnedSpace::impl_allocate(
|
||||||
|
const char *arg_label, const size_t arg_alloc_size,
|
||||||
|
const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
void *ptr = nullptr;
|
void *ptr = nullptr;
|
||||||
|
|
||||||
auto error_code = cudaHostAlloc(&ptr, arg_alloc_size, cudaHostAllocDefault);
|
auto error_code = cudaHostAlloc(&ptr, arg_alloc_size, cudaHostAllocDefault);
|
||||||
@ -288,9 +302,7 @@ void *CudaHostPinnedSpace::allocate(const char *arg_label,
|
|||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::allocateData(
|
Kokkos::Profiling::allocateData(arg_handle, arg_label, ptr, reported_size);
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, ptr,
|
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
return ptr;
|
return ptr;
|
||||||
}
|
}
|
||||||
@ -304,12 +316,17 @@ void CudaSpace::deallocate(void *const arg_alloc_ptr,
|
|||||||
void CudaSpace::deallocate(const char *arg_label, void *const arg_alloc_ptr,
|
void CudaSpace::deallocate(const char *arg_label, void *const arg_alloc_ptr,
|
||||||
const size_t arg_alloc_size,
|
const size_t arg_alloc_size,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
impl_deallocate(arg_label, arg_alloc_ptr, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
void CudaSpace::impl_deallocate(
|
||||||
|
const char *arg_label, void *const arg_alloc_ptr,
|
||||||
|
const size_t arg_alloc_size, const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::deallocateData(
|
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, arg_alloc_ptr,
|
reported_size);
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
@ -327,13 +344,21 @@ void CudaUVMSpace::deallocate(const char *arg_label, void *const arg_alloc_ptr,
|
|||||||
|
|
||||||
,
|
,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
impl_deallocate(arg_label, arg_alloc_ptr, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
void CudaUVMSpace::impl_deallocate(
|
||||||
|
const char *arg_label, void *const arg_alloc_ptr,
|
||||||
|
const size_t arg_alloc_size
|
||||||
|
|
||||||
|
,
|
||||||
|
const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
Cuda::impl_static_fence();
|
Cuda::impl_static_fence();
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::deallocateData(
|
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, arg_alloc_ptr,
|
reported_size);
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
try {
|
try {
|
||||||
if (arg_alloc_ptr != nullptr) {
|
if (arg_alloc_ptr != nullptr) {
|
||||||
@ -349,17 +374,22 @@ void CudaHostPinnedSpace::deallocate(void *const arg_alloc_ptr,
|
|||||||
const size_t arg_alloc_size) const {
|
const size_t arg_alloc_size) const {
|
||||||
deallocate("[unlabeled]", arg_alloc_ptr, arg_alloc_size);
|
deallocate("[unlabeled]", arg_alloc_ptr, arg_alloc_size);
|
||||||
}
|
}
|
||||||
|
|
||||||
void CudaHostPinnedSpace::deallocate(const char *arg_label,
|
void CudaHostPinnedSpace::deallocate(const char *arg_label,
|
||||||
void *const arg_alloc_ptr,
|
void *const arg_alloc_ptr,
|
||||||
const size_t arg_alloc_size,
|
const size_t arg_alloc_size,
|
||||||
const size_t arg_logical_size) const {
|
const size_t arg_logical_size) const {
|
||||||
|
impl_deallocate(arg_label, arg_alloc_ptr, arg_alloc_size, arg_logical_size);
|
||||||
|
}
|
||||||
|
|
||||||
|
void CudaHostPinnedSpace::impl_deallocate(
|
||||||
|
const char *arg_label, void *const arg_alloc_ptr,
|
||||||
|
const size_t arg_alloc_size, const size_t arg_logical_size,
|
||||||
|
const Kokkos::Tools::SpaceHandle arg_handle) const {
|
||||||
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
if (Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
const size_t reported_size =
|
const size_t reported_size =
|
||||||
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
(arg_logical_size > 0) ? arg_logical_size : arg_alloc_size;
|
||||||
Kokkos::Profiling::deallocateData(
|
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
|
||||||
Kokkos::Profiling::make_space_handle(name()), arg_label, arg_alloc_ptr,
|
reported_size);
|
||||||
reported_size);
|
|
||||||
}
|
}
|
||||||
try {
|
try {
|
||||||
CUDA_SAFE_CALL(cudaFreeHost(arg_alloc_ptr));
|
CUDA_SAFE_CALL(cudaFreeHost(arg_alloc_ptr));
|
||||||
@ -375,7 +405,7 @@ void CudaHostPinnedSpace::deallocate(const char *arg_label,
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
SharedAllocationRecord<void, void>
|
SharedAllocationRecord<void, void>
|
||||||
SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record;
|
SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record;
|
||||||
|
|
||||||
@ -551,7 +581,7 @@ SharedAllocationRecord<Kokkos::CudaSpace, void>::SharedAllocationRecord(
|
|||||||
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
||||||
// Pass through deallocation function
|
// Pass through deallocation function
|
||||||
: SharedAllocationRecord<void, void>(
|
: SharedAllocationRecord<void, void>(
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
&SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record,
|
&SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record,
|
||||||
#endif
|
#endif
|
||||||
Impl::checked_allocation_with_header(arg_space, arg_label,
|
Impl::checked_allocation_with_header(arg_space, arg_label,
|
||||||
@ -582,7 +612,7 @@ SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::SharedAllocationRecord(
|
|||||||
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
||||||
// Pass through deallocation function
|
// Pass through deallocation function
|
||||||
: SharedAllocationRecord<void, void>(
|
: SharedAllocationRecord<void, void>(
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
&SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::s_root_record,
|
&SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::s_root_record,
|
||||||
#endif
|
#endif
|
||||||
Impl::checked_allocation_with_header(arg_space, arg_label,
|
Impl::checked_allocation_with_header(arg_space, arg_label,
|
||||||
@ -610,7 +640,7 @@ SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::
|
|||||||
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
// Pass through allocated [ SharedAllocationHeader , user_memory ]
|
||||||
// Pass through deallocation function
|
// Pass through deallocation function
|
||||||
: SharedAllocationRecord<void, void>(
|
: SharedAllocationRecord<void, void>(
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
&SharedAllocationRecord<Kokkos::CudaHostPinnedSpace,
|
&SharedAllocationRecord<Kokkos::CudaHostPinnedSpace,
|
||||||
void>::s_root_record,
|
void>::s_root_record,
|
||||||
#endif
|
#endif
|
||||||
@ -830,7 +860,7 @@ void SharedAllocationRecord<Kokkos::CudaSpace, void>::print_records(
|
|||||||
std::ostream &s, const Kokkos::CudaSpace &, bool detail) {
|
std::ostream &s, const Kokkos::CudaSpace &, bool detail) {
|
||||||
(void)s;
|
(void)s;
|
||||||
(void)detail;
|
(void)detail;
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
SharedAllocationRecord<void, void> *r = &s_root_record;
|
SharedAllocationRecord<void, void> *r = &s_root_record;
|
||||||
|
|
||||||
char buffer[256];
|
char buffer[256];
|
||||||
@ -896,7 +926,7 @@ void SharedAllocationRecord<Kokkos::CudaSpace, void>::print_records(
|
|||||||
#else
|
#else
|
||||||
Kokkos::Impl::throw_runtime_exception(
|
Kokkos::Impl::throw_runtime_exception(
|
||||||
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
||||||
"KOKKOS_DEBUG enabled");
|
"KOKKOS_ENABLE_DEBUG enabled");
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -904,13 +934,13 @@ void SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::print_records(
|
|||||||
std::ostream &s, const Kokkos::CudaUVMSpace &, bool detail) {
|
std::ostream &s, const Kokkos::CudaUVMSpace &, bool detail) {
|
||||||
(void)s;
|
(void)s;
|
||||||
(void)detail;
|
(void)detail;
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
SharedAllocationRecord<void, void>::print_host_accessible_records(
|
SharedAllocationRecord<void, void>::print_host_accessible_records(
|
||||||
s, "CudaUVM", &s_root_record, detail);
|
s, "CudaUVM", &s_root_record, detail);
|
||||||
#else
|
#else
|
||||||
Kokkos::Impl::throw_runtime_exception(
|
Kokkos::Impl::throw_runtime_exception(
|
||||||
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
||||||
"KOKKOS_DEBUG enabled");
|
"KOKKOS_ENABLE_DEBUG enabled");
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -918,13 +948,13 @@ void SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::print_records(
|
|||||||
std::ostream &s, const Kokkos::CudaHostPinnedSpace &, bool detail) {
|
std::ostream &s, const Kokkos::CudaHostPinnedSpace &, bool detail) {
|
||||||
(void)s;
|
(void)s;
|
||||||
(void)detail;
|
(void)detail;
|
||||||
#ifdef KOKKOS_DEBUG
|
#ifdef KOKKOS_ENABLE_DEBUG
|
||||||
SharedAllocationRecord<void, void>::print_host_accessible_records(
|
SharedAllocationRecord<void, void>::print_host_accessible_records(
|
||||||
s, "CudaHostPinned", &s_root_record, detail);
|
s, "CudaHostPinned", &s_root_record, detail);
|
||||||
#else
|
#else
|
||||||
Kokkos::Impl::throw_runtime_exception(
|
Kokkos::Impl::throw_runtime_exception(
|
||||||
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
"SharedAllocationHeader<CudaSpace>::print_records only works with "
|
||||||
"KOKKOS_DEBUG enabled");
|
"KOKKOS_ENABLE_DEBUG enabled");
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -198,6 +198,39 @@ int cuda_get_opt_block_size(const CudaInternal* cuda_instance,
|
|||||||
LaunchBounds{});
|
LaunchBounds{});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Assuming cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1)
|
||||||
|
// NOTE these numbers can be obtained in several ways:
|
||||||
|
// * One option is to download the CUDA Occupancy Calculator spreadsheet, select
|
||||||
|
// "Compute Capability" first and check what is the smallest "Shared Memory
|
||||||
|
// Size Config" that is available. The "Shared Memory Per Multiprocessor" in
|
||||||
|
// bytes is then to be found below in the summary.
|
||||||
|
// * Another option would be to look for the information in the "Tuning
|
||||||
|
// Guide(s)" of the CUDA Toolkit Documentation for each GPU architecture, in
|
||||||
|
// the "Shared Memory" section (more tedious)
|
||||||
|
inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
|
||||||
|
int const compute_capability = properties.major * 10 + properties.minor;
|
||||||
|
return [compute_capability]() {
|
||||||
|
switch (compute_capability) {
|
||||||
|
case 30:
|
||||||
|
case 32:
|
||||||
|
case 35: return 16;
|
||||||
|
case 37: return 80;
|
||||||
|
case 50:
|
||||||
|
case 53:
|
||||||
|
case 60:
|
||||||
|
case 62: return 64;
|
||||||
|
case 52:
|
||||||
|
case 61: return 96;
|
||||||
|
case 70:
|
||||||
|
case 80: return 8;
|
||||||
|
case 75: return 32;
|
||||||
|
default:
|
||||||
|
Kokkos::Impl::throw_runtime_exception(
|
||||||
|
"Unknown device in cuda block size deduction");
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}() * 1024;
|
||||||
|
}
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
|||||||
210
lib/kokkos/core/src/Cuda/Kokkos_Cuda_GraphNodeKernel.hpp
Normal file
@ -0,0 +1,210 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 3.0
|
||||||
|
// Copyright (2020) National Technology & Engineering
|
||||||
|
// Solutions of Sandia, LLC (NTESS).
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_KOKKOS_CUDA_GRAPHNODEKERNEL_IMPL_HPP
|
||||||
|
#define KOKKOS_KOKKOS_CUDA_GRAPHNODEKERNEL_IMPL_HPP
|
||||||
|
|
||||||
|
#include <Kokkos_Macros.hpp>
|
||||||
|
|
||||||
|
#if defined(KOKKOS_ENABLE_CUDA) && defined(KOKKOS_CUDA_ENABLE_GRAPHS)
|
||||||
|
|
||||||
|
#include <Kokkos_Graph_fwd.hpp>
|
||||||
|
|
||||||
|
#include <impl/Kokkos_GraphImpl.hpp> // GraphAccess needs to be complete
|
||||||
|
#include <impl/Kokkos_SharedAlloc.hpp> // SharedAllocationRecord
|
||||||
|
|
||||||
|
#include <Kokkos_Parallel.hpp>
|
||||||
|
#include <Kokkos_Parallel_Reduce.hpp>
|
||||||
|
#include <Kokkos_PointerOwnership.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_Cuda.hpp>
|
||||||
|
#include <cuda_runtime_api.h>
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
template <class PolicyType, class Functor, class PatternTag, class... Args>
|
||||||
|
class GraphNodeKernelImpl<Kokkos::Cuda, PolicyType, Functor, PatternTag,
|
||||||
|
Args...>
|
||||||
|
: public PatternImplSpecializationFromTag<PatternTag, Functor, PolicyType,
|
||||||
|
Args..., Kokkos::Cuda>::type {
|
||||||
|
private:
|
||||||
|
using base_t =
|
||||||
|
typename PatternImplSpecializationFromTag<PatternTag, Functor, PolicyType,
|
||||||
|
Args..., Kokkos::Cuda>::type;
|
||||||
|
using size_type = Kokkos::Cuda::size_type;
|
||||||
|
// These are really functioning as optional references, though I'm not sure
|
||||||
|
// that the cudaGraph_t one needs to be since it's a pointer under the
|
||||||
|
// covers and we're not modifying it
|
||||||
|
Kokkos::ObservingRawPtr<const cudaGraph_t> m_graph_ptr = nullptr;
|
||||||
|
Kokkos::ObservingRawPtr<cudaGraphNode_t> m_graph_node_ptr = nullptr;
|
||||||
|
// Note: owned pointer to CudaSpace memory (used for global memory launches),
|
||||||
|
// which we're responsible for deallocating, but not responsible for calling
|
||||||
|
// its destructor.
|
||||||
|
using Record = Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>;
|
||||||
|
// Basically, we have to make this mutable for the same reasons that the
|
||||||
|
// global kernel buffers in the Cuda instance are mutable...
|
||||||
|
mutable Kokkos::OwningRawPtr<base_t> m_driver_storage = nullptr;
|
||||||
|
|
||||||
|
public:
|
||||||
|
using Policy = PolicyType;
|
||||||
|
using graph_kernel = GraphNodeKernelImpl;
|
||||||
|
|
||||||
|
// TODO Ensure the execution space of the graph is the same as the one
|
||||||
|
// attached to the policy?
|
||||||
|
// TODO @graph kernel name info propagation
|
||||||
|
template <class PolicyDeduced, class... ArgsDeduced>
|
||||||
|
GraphNodeKernelImpl(std::string, Kokkos::Cuda const&, Functor arg_functor,
|
||||||
|
PolicyDeduced&& arg_policy, ArgsDeduced&&... args)
|
||||||
|
// This is super ugly, but it works for now and is the most minimal change
|
||||||
|
// to the codebase for now...
|
||||||
|
: base_t(std::move(arg_functor), (PolicyDeduced &&) arg_policy,
|
||||||
|
(ArgsDeduced &&) args...) {}
|
||||||
|
|
||||||
|
// FIXME @graph Forward through the instance once that works in the backends
|
||||||
|
template <class PolicyDeduced>
|
||||||
|
GraphNodeKernelImpl(Kokkos::Cuda const& ex, Functor arg_functor,
|
||||||
|
PolicyDeduced&& arg_policy)
|
||||||
|
: GraphNodeKernelImpl("", ex, std::move(arg_functor),
|
||||||
|
(PolicyDeduced &&) arg_policy) {}
|
||||||
|
|
||||||
|
~GraphNodeKernelImpl() {
|
||||||
|
if (m_driver_storage) {
|
||||||
|
// We should be the only owner, but this is still the easiest way to
|
||||||
|
// allocate and deallocate aligned memory for these sorts of things
|
||||||
|
Record::decrement(Record::get_record(m_driver_storage));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void set_cuda_graph_ptr(cudaGraph_t* arg_graph_ptr) {
|
||||||
|
m_graph_ptr = arg_graph_ptr;
|
||||||
|
}
|
||||||
|
void set_cuda_graph_node_ptr(cudaGraphNode_t* arg_node_ptr) {
|
||||||
|
m_graph_node_ptr = arg_node_ptr;
|
||||||
|
}
|
||||||
|
cudaGraphNode_t* get_cuda_graph_node_ptr() const { return m_graph_node_ptr; }
|
||||||
|
cudaGraph_t const* get_cuda_graph_ptr() const { return m_graph_ptr; }
|
||||||
|
|
||||||
|
Kokkos::ObservingRawPtr<base_t> allocate_driver_memory_buffer() const {
|
||||||
|
KOKKOS_EXPECTS(m_driver_storage == nullptr)
|
||||||
|
|
||||||
|
auto* record = Record::allocate(
|
||||||
|
Kokkos::CudaSpace{}, "GraphNodeKernel global memory functor storage",
|
||||||
|
sizeof(base_t));
|
||||||
|
|
||||||
|
Record::increment(record);
|
||||||
|
m_driver_storage = reinterpret_cast<base_t*>(record->data());
|
||||||
|
KOKKOS_ENSURES(m_driver_storage != nullptr)
|
||||||
|
return m_driver_storage;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct CudaGraphNodeAggregateKernel {
|
||||||
|
using graph_kernel = CudaGraphNodeAggregateKernel;
|
||||||
|
|
||||||
|
// Aggregates don't need a policy, but for the purposes of checking the static
|
||||||
|
// assertions about graph kernels, we provide a trivial one.
|
||||||
|
struct Policy {
|
||||||
|
using is_graph_kernel = std::true_type;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class KernelType,
|
||||||
|
class Tag =
|
||||||
|
typename PatternTagFromImplSpecialization<KernelType>::type>
|
||||||
|
struct get_graph_node_kernel_type
|
||||||
|
: identity<GraphNodeKernelImpl<Kokkos::Cuda, typename KernelType::Policy,
|
||||||
|
typename KernelType::functor_type, Tag>> {};
|
||||||
|
template <class KernelType>
|
||||||
|
struct get_graph_node_kernel_type<KernelType, Kokkos::ParallelReduceTag>
|
||||||
|
: identity<GraphNodeKernelImpl<Kokkos::Cuda, typename KernelType::Policy,
|
||||||
|
typename KernelType::functor_type,
|
||||||
|
Kokkos::ParallelReduceTag,
|
||||||
|
typename KernelType::reducer_type>> {};
|
||||||
|
|
||||||
|
//==============================================================================
|
||||||
|
// <editor-fold desc="get_cuda_graph_*() helper functions"> {{{1
|
||||||
|
|
||||||
|
template <class KernelType>
|
||||||
|
auto* allocate_driver_storage_for_kernel(KernelType const& kernel) {
|
||||||
|
using graph_node_kernel_t =
|
||||||
|
typename get_graph_node_kernel_type<KernelType>::type;
|
||||||
|
auto const& kernel_as_graph_kernel =
|
||||||
|
static_cast<graph_node_kernel_t const&>(kernel);
|
||||||
|
// TODO @graphs we need to somehow indicate the need for a fence in the
|
||||||
|
// destructor of the GraphImpl object (so that we don't have to
|
||||||
|
// just always do it)
|
||||||
|
return kernel_as_graph_kernel.allocate_driver_memory_buffer();
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class KernelType>
|
||||||
|
auto const& get_cuda_graph_from_kernel(KernelType const& kernel) {
|
||||||
|
using graph_node_kernel_t =
|
||||||
|
typename get_graph_node_kernel_type<KernelType>::type;
|
||||||
|
auto const& kernel_as_graph_kernel =
|
||||||
|
static_cast<graph_node_kernel_t const&>(kernel);
|
||||||
|
cudaGraph_t const* graph_ptr = kernel_as_graph_kernel.get_cuda_graph_ptr();
|
||||||
|
KOKKOS_EXPECTS(graph_ptr != nullptr);
|
||||||
|
return *graph_ptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class KernelType>
|
||||||
|
auto& get_cuda_graph_node_from_kernel(KernelType const& kernel) {
|
||||||
|
using graph_node_kernel_t =
|
||||||
|
typename get_graph_node_kernel_type<KernelType>::type;
|
||||||
|
auto const& kernel_as_graph_kernel =
|
||||||
|
static_cast<graph_node_kernel_t const&>(kernel);
|
||||||
|
auto* graph_node_ptr = kernel_as_graph_kernel.get_cuda_graph_node_ptr();
|
||||||
|
KOKKOS_EXPECTS(graph_node_ptr != nullptr);
|
||||||
|
return *graph_node_ptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
// </editor-fold> end get_cuda_graph_*() helper functions }}}1
|
||||||
|
//==============================================================================
|
||||||
|
|
||||||
|
} // end namespace Impl
|
||||||
|
} // end namespace Kokkos
|
||||||
|
|
||||||
|
#endif // defined(KOKKOS_ENABLE_CUDA) && defined(KOKKOS_CUDA_ENABLE_GRAPHS)
|
||||||
|
#endif // KOKKOS_KOKKOS_CUDA_GRAPHNODEKERNEL_IMPL_HPP
|
||||||
@ -42,85 +42,62 @@
|
|||||||
//@HEADER
|
//@HEADER
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#include <type_traits>
|
#ifndef KOKKOS_KOKKOS_CUDA_GRAPHNODE_IMPL_HPP
|
||||||
|
#define KOKKOS_KOKKOS_CUDA_GRAPHNODE_IMPL_HPP
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
|
|
||||||
#if !defined(KOKKOS_ROCM_INVOKE_H)
|
#if defined(KOKKOS_ENABLE_CUDA) && defined(KOKKOS_CUDA_ENABLE_GRAPHS)
|
||||||
#define KOKKOS_ROCM_INVOKE_H
|
|
||||||
|
#include <Kokkos_Graph_fwd.hpp>
|
||||||
|
|
||||||
|
#include <impl/Kokkos_GraphImpl.hpp> // GraphAccess needs to be complete
|
||||||
|
|
||||||
|
#include <Kokkos_Cuda.hpp>
|
||||||
|
#include <cuda_runtime_api.h>
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template <class Tag, class F, class... Ts,
|
template <>
|
||||||
typename std::enable_if<(!std::is_void<Tag>()), int>::type = 0>
|
struct GraphNodeBackendSpecificDetails<Kokkos::Cuda> {
|
||||||
KOKKOS_INLINE_FUNCTION void rocm_invoke(F&& f, Ts&&... xs) {
|
cudaGraphNode_t node = nullptr;
|
||||||
f(Tag(), static_cast<Ts&&>(xs)...);
|
|
||||||
}
|
|
||||||
|
|
||||||
template <class Tag, class F, class... Ts,
|
//----------------------------------------------------------------------------
|
||||||
typename std::enable_if<(std::is_void<Tag>()), int>::type = 0>
|
// <editor-fold desc="Ctors, destructor, and assignment"> {{{2
|
||||||
KOKKOS_INLINE_FUNCTION void rocm_invoke(F&& f, Ts&&... xs) {
|
|
||||||
f(static_cast<Ts&&>(xs)...);
|
|
||||||
}
|
|
||||||
|
|
||||||
template <class F, class Tag = void>
|
explicit GraphNodeBackendSpecificDetails() = default;
|
||||||
struct rocm_invoke_fn {
|
|
||||||
F* f;
|
|
||||||
rocm_invoke_fn(F& f_) : f(&f_) {}
|
|
||||||
|
|
||||||
template <class... Ts>
|
explicit GraphNodeBackendSpecificDetails(
|
||||||
KOKKOS_INLINE_FUNCTION void operator()(Ts&&... xs) const {
|
_graph_node_is_root_ctor_tag) noexcept {}
|
||||||
rocm_invoke<Tag>(*f, static_cast<Ts&&>(xs)...);
|
|
||||||
}
|
// </editor-fold> end Ctors, destructor, and assignment }}}2
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
};
|
};
|
||||||
|
|
||||||
template <class Tag, class F>
|
template <class Kernel, class PredecessorRef>
|
||||||
KOKKOS_INLINE_FUNCTION rocm_invoke_fn<F, Tag> make_rocm_invoke_fn(F& f) {
|
struct GraphNodeBackendDetailsBeforeTypeErasure<Kokkos::Cuda, Kernel,
|
||||||
return {f};
|
PredecessorRef> {
|
||||||
}
|
protected:
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
// <editor-fold desc="ctors, destructor, and assignment"> {{{2
|
||||||
|
|
||||||
template <class T>
|
GraphNodeBackendDetailsBeforeTypeErasure(
|
||||||
KOKKOS_INLINE_FUNCTION T& rocm_unwrap(T& x) {
|
Kokkos::Cuda const&, Kernel&, PredecessorRef const&,
|
||||||
return x;
|
GraphNodeBackendSpecificDetails<Kokkos::Cuda>&) noexcept {}
|
||||||
}
|
|
||||||
|
|
||||||
template <class T>
|
GraphNodeBackendDetailsBeforeTypeErasure(
|
||||||
KOKKOS_INLINE_FUNCTION T& rocm_unwrap(std::reference_wrapper<T> x) {
|
Kokkos::Cuda const&, _graph_node_is_root_ctor_tag,
|
||||||
return x;
|
GraphNodeBackendSpecificDetails<Kokkos::Cuda>&) noexcept {}
|
||||||
}
|
|
||||||
|
|
||||||
template <class F, class T>
|
// </editor-fold> end ctors, destructor, and assignment }}}2
|
||||||
struct rocm_capture_fn {
|
//----------------------------------------------------------------------------
|
||||||
F f;
|
|
||||||
T data;
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION rocm_capture_fn(F f_, T x) : f(f_), data(x) {}
|
|
||||||
|
|
||||||
template <class... Ts>
|
|
||||||
KOKKOS_INLINE_FUNCTION void operator()(Ts&&... xs) const {
|
|
||||||
f(rocm_unwrap(data), static_cast<Ts&&>(xs)...);
|
|
||||||
}
|
|
||||||
};
|
};
|
||||||
|
|
||||||
template <class F, class T>
|
} // end namespace Impl
|
||||||
KOKKOS_INLINE_FUNCTION rocm_capture_fn<F, T> rocm_capture(F f, T x) {
|
} // end namespace Kokkos
|
||||||
return {f, x};
|
|
||||||
}
|
|
||||||
|
|
||||||
template <class F, class T, class U, class... Ts>
|
#include <Cuda/Kokkos_Cuda_GraphNodeKernel.hpp>
|
||||||
KOKKOS_INLINE_FUNCTION auto rocm_capture(F f, T x, U y, Ts... xs)
|
|
||||||
-> decltype(rocm_capture(rocm_capture(f, x), y, xs...)) {
|
|
||||||
return rocm_capture(rocm_capture(f, x), y, xs...);
|
|
||||||
}
|
|
||||||
|
|
||||||
struct rocm_apply_op {
|
#endif // defined(KOKKOS_ENABLE_CUDA)
|
||||||
template <class F, class... Ts>
|
#endif // KOKKOS_KOKKOS_CUDA_GRAPHNODE_IMPL_HPP
|
||||||
KOKKOS_INLINE_FUNCTION void operator()(F&& f, Ts&&... xs) const {
|
|
||||||
f(static_cast<Ts&&>(xs)...);
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
} // namespace Impl
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
#endif
|
|
||||||
219
lib/kokkos/core/src/Cuda/Kokkos_Cuda_Graph_Impl.hpp
Normal file
@ -0,0 +1,219 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 3.0
|
||||||
|
// Copyright (2020) National Technology & Engineering
|
||||||
|
// Solutions of Sandia, LLC (NTESS).
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_KOKKOS_CUDA_GRAPH_IMPL_HPP
|
||||||
|
#define KOKKOS_KOKKOS_CUDA_GRAPH_IMPL_HPP
|
||||||
|
|
||||||
|
#include <Kokkos_Macros.hpp>
|
||||||
|
|
||||||
|
#if defined(KOKKOS_ENABLE_CUDA) && defined(KOKKOS_CUDA_ENABLE_GRAPHS)
|
||||||
|
|
||||||
|
#include <Kokkos_Graph_fwd.hpp>
|
||||||
|
|
||||||
|
#include <impl/Kokkos_GraphImpl.hpp> // GraphAccess needs to be complete
|
||||||
|
|
||||||
|
// GraphNodeImpl needs to be complete because GraphImpl here is a full
|
||||||
|
// specialization and not just a partial one
|
||||||
|
#include <impl/Kokkos_GraphNodeImpl.hpp>
|
||||||
|
#include <Cuda/Kokkos_Cuda_GraphNode_Impl.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_Cuda.hpp>
|
||||||
|
#include <cuda_runtime_api.h>
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
template <>
|
||||||
|
struct GraphImpl<Kokkos::Cuda> {
|
||||||
|
public:
|
||||||
|
using execution_space = Kokkos::Cuda;
|
||||||
|
|
||||||
|
private:
|
||||||
|
execution_space m_execution_space;
|
||||||
|
cudaGraph_t m_graph = nullptr;
|
||||||
|
cudaGraphExec_t m_graph_exec = nullptr;
|
||||||
|
|
||||||
|
using cuda_graph_flags_t = unsigned int;
|
||||||
|
|
||||||
|
using node_details_t = GraphNodeBackendSpecificDetails<Kokkos::Cuda>;
|
||||||
|
|
||||||
|
void _instantiate_graph() {
|
||||||
|
constexpr size_t error_log_size = 256;
|
||||||
|
cudaGraphNode_t error_node = nullptr;
|
||||||
|
char error_log[error_log_size];
|
||||||
|
CUDA_SAFE_CALL(cudaGraphInstantiate(&m_graph_exec, m_graph, &error_node,
|
||||||
|
error_log, error_log_size));
|
||||||
|
// TODO @graphs print out errors
|
||||||
|
}
|
||||||
|
|
||||||
|
public:
|
||||||
|
using root_node_impl_t =
|
||||||
|
GraphNodeImpl<Kokkos::Cuda, Kokkos::Experimental::TypeErasedTag,
|
||||||
|
Kokkos::Experimental::TypeErasedTag>;
|
||||||
|
using aggregate_kernel_impl_t = CudaGraphNodeAggregateKernel;
|
||||||
|
using aggregate_node_impl_t =
|
||||||
|
GraphNodeImpl<Kokkos::Cuda, aggregate_kernel_impl_t,
|
||||||
|
Kokkos::Experimental::TypeErasedTag>;
|
||||||
|
|
||||||
|
// Not moveable or copyable; it spends its whole life as a shared_ptr in the
|
||||||
|
// Graph object
|
||||||
|
GraphImpl() = delete;
|
||||||
|
GraphImpl(GraphImpl const&) = delete;
|
||||||
|
GraphImpl(GraphImpl&&) = delete;
|
||||||
|
GraphImpl& operator=(GraphImpl const&) = delete;
|
||||||
|
GraphImpl& operator=(GraphImpl&&) = delete;
|
||||||
|
~GraphImpl() {
|
||||||
|
// TODO @graphs we need to somehow indicate the need for a fence in the
|
||||||
|
// destructor of the GraphImpl object (so that we don't have to
|
||||||
|
// just always do it)
|
||||||
|
m_execution_space.fence();
|
||||||
|
KOKKOS_EXPECTS(bool(m_graph))
|
||||||
|
if (bool(m_graph_exec)) {
|
||||||
|
CUDA_SAFE_CALL(cudaGraphExecDestroy(m_graph_exec));
|
||||||
|
}
|
||||||
|
CUDA_SAFE_CALL(cudaGraphDestroy(m_graph));
|
||||||
|
};
|
||||||
|
|
||||||
|
explicit GraphImpl(Kokkos::Cuda arg_instance)
|
||||||
|
: m_execution_space(std::move(arg_instance)) {
|
||||||
|
CUDA_SAFE_CALL(cudaGraphCreate(&m_graph, cuda_graph_flags_t{0}));
|
||||||
|
}
|
||||||
|
|
||||||
|
void add_node(std::shared_ptr<aggregate_node_impl_t> const& arg_node_ptr) {
|
||||||
|
// All of the predecessors are just added as normal, so all we need to
|
||||||
|
// do here is add an empty node
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddEmptyNode(&(arg_node_ptr->node_details_t::node),
|
||||||
|
m_graph,
|
||||||
|
/* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class NodeImpl>
|
||||||
|
// requires NodeImpl is a specialization of GraphNodeImpl
|
||||||
|
// Also requires that the kernel has the graph node tag in its policy
|
||||||
|
void add_node(std::shared_ptr<NodeImpl> const& arg_node_ptr) {
|
||||||
|
static_assert(
|
||||||
|
NodeImpl::kernel_type::Policy::is_graph_kernel::value,
|
||||||
|
"Something has gone horribly wrong, but it's too complicated to "
|
||||||
|
"explain here. Buy Daisy a coffee and she'll explain it to you.");
|
||||||
|
KOKKOS_EXPECTS(bool(arg_node_ptr));
|
||||||
|
// The Kernel launch from the execute() method has been shimmed to insert
|
||||||
|
// the node into the graph
|
||||||
|
auto& kernel = arg_node_ptr->get_kernel();
|
||||||
|
// note: using arg_node_ptr->node_details_t::node caused an ICE in NVCC 10.1
|
||||||
|
auto& cuda_node = static_cast<node_details_t*>(arg_node_ptr.get())->node;
|
||||||
|
KOKKOS_EXPECTS(!bool(cuda_node));
|
||||||
|
kernel.set_cuda_graph_ptr(&m_graph);
|
||||||
|
kernel.set_cuda_graph_node_ptr(&cuda_node);
|
||||||
|
kernel.execute();
|
||||||
|
KOKKOS_ENSURES(bool(cuda_node));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class NodeImplPtr, class PredecessorRef>
|
||||||
|
// requires PredecessorRef is a specialization of GraphNodeRef that has
|
||||||
|
// already been added to this graph and NodeImpl is a specialization of
|
||||||
|
// GraphNodeImpl that has already been added to this graph.
|
||||||
|
void add_predecessor(NodeImplPtr arg_node_ptr, PredecessorRef arg_pred_ref) {
|
||||||
|
KOKKOS_EXPECTS(bool(arg_node_ptr))
|
||||||
|
auto pred_ptr = GraphAccess::get_node_ptr(arg_pred_ref);
|
||||||
|
KOKKOS_EXPECTS(bool(pred_ptr))
|
||||||
|
|
||||||
|
// clang-format off
|
||||||
|
// NOTE const-qualifiers below are commented out because of an API break
|
||||||
|
// from CUDA 10.0 to CUDA 10.1
|
||||||
|
// cudaGraphAddDependencies(cudaGraph_t, cudaGraphNode_t*, cudaGraphNode_t*, size_t)
|
||||||
|
// cudaGraphAddDependencies(cudaGraph_t, const cudaGraphNode_t*, const cudaGraphNode_t*, size_t)
|
||||||
|
// clang-format on
|
||||||
|
auto /*const*/& pred_cuda_node = pred_ptr->node_details_t::node;
|
||||||
|
KOKKOS_EXPECTS(bool(pred_cuda_node))
|
||||||
|
|
||||||
|
auto /*const*/& cuda_node = arg_node_ptr->node_details_t::node;
|
||||||
|
KOKKOS_EXPECTS(bool(cuda_node))
|
||||||
|
|
||||||
|
CUDA_SAFE_CALL(
|
||||||
|
cudaGraphAddDependencies(m_graph, &pred_cuda_node, &cuda_node, 1));
|
||||||
|
}
|
||||||
|
|
||||||
|
void submit() {
|
||||||
|
if (!bool(m_graph_exec)) {
|
||||||
|
_instantiate_graph();
|
||||||
|
}
|
||||||
|
CUDA_SAFE_CALL(
|
||||||
|
cudaGraphLaunch(m_graph_exec, m_execution_space.cuda_stream()));
|
||||||
|
}
|
||||||
|
|
||||||
|
execution_space const& get_execution_space() const noexcept {
|
||||||
|
return m_execution_space;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto create_root_node_ptr() {
|
||||||
|
KOKKOS_EXPECTS(bool(m_graph))
|
||||||
|
KOKKOS_EXPECTS(!bool(m_graph_exec))
|
||||||
|
auto rv = std::make_shared<root_node_impl_t>(
|
||||||
|
get_execution_space(), _graph_node_is_root_ctor_tag{});
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddEmptyNode(&(rv->node_details_t::node), m_graph,
|
||||||
|
/* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0));
|
||||||
|
KOKKOS_ENSURES(bool(rv->node_details_t::node))
|
||||||
|
return rv;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class... PredecessorRefs>
|
||||||
|
// See requirements/expectations in GraphBuilder
|
||||||
|
auto create_aggregate_ptr(PredecessorRefs&&...) {
|
||||||
|
// The attachment to predecessors, which is all we really need, happens
|
||||||
|
// in the generic layer, which calls through to add_predecessor for
|
||||||
|
// each predecessor ref, so all we need to do here is create the (trivial)
|
||||||
|
// aggregate node.
|
||||||
|
return std::make_shared<aggregate_node_impl_t>(
|
||||||
|
m_execution_space, _graph_node_kernel_ctor_tag{},
|
||||||
|
aggregate_kernel_impl_t{});
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
} // end namespace Impl
|
||||||
|
} // end namespace Kokkos
|
||||||
|
|
||||||
|
#endif // defined(KOKKOS_ENABLE_CUDA) && defined(KOKKOS_CUDA_ENABLE_GRAPHS)
|
||||||
|
#endif // KOKKOS_KOKKOS_CUDA_GRAPH_IMPL_HPP
|
||||||
710
lib/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp
Normal file
@ -0,0 +1,710 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 3.0
|
||||||
|
// Copyright (2020) National Technology & Engineering
|
||||||
|
// Solutions of Sandia, LLC (NTESS).
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-NA0003525 with NTESS,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_CUDA_HALF_HPP_
|
||||||
|
#define KOKKOS_CUDA_HALF_HPP_
|
||||||
|
|
||||||
|
#include <Kokkos_Macros.hpp>
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA
|
||||||
|
#if !(defined(KOKKOS_COMPILER_CLANG) && KOKKOS_COMPILER_CLANG < 900) && \
|
||||||
|
!(defined(KOKKOS_ARCH_KEPLER) || defined(KOKKOS_ARCH_MAXWELL50) || \
|
||||||
|
defined(KOKKOS_ARCH_MAXWELL52))
|
||||||
|
#include <cuda_fp16.h>
|
||||||
|
|
||||||
|
#ifndef KOKKOS_IMPL_HALF_TYPE_DEFINED
|
||||||
|
// Make sure no one else tries to define half_t
|
||||||
|
#define KOKKOS_IMPL_HALF_TYPE_DEFINED
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
struct half_impl_t {
|
||||||
|
using type = __half;
|
||||||
|
};
|
||||||
|
} // namespace Impl
|
||||||
|
namespace Experimental {
|
||||||
|
|
||||||
|
// Forward declarations
|
||||||
|
class half_t;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(float val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(bool val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(double val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(short val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(int val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long long val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned short val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned int val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long val);
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long long val);
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, float>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, bool>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, double>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, short>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, int>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long long>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned short>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, unsigned int>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long long>::value, T>
|
||||||
|
cast_from_half(half_t);
|
||||||
|
|
||||||
|
class half_t {
|
||||||
|
public:
|
||||||
|
using impl_type = Kokkos::Impl::half_impl_t::type;
|
||||||
|
|
||||||
|
private:
|
||||||
|
impl_type val;
|
||||||
|
|
||||||
|
public:
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t() : val(0.0F) {}
|
||||||
|
|
||||||
|
// Don't support implicit conversion back to impl_type.
|
||||||
|
// impl_type is a storage-only type on the host.
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator impl_type() const { return val; }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator float() const { return cast_from_half<float>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator bool() const { return cast_from_half<bool>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator double() const { return cast_from_half<double>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator short() const { return cast_from_half<short>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator int() const { return cast_from_half<int>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator long() const { return cast_from_half<long>(*this); }
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator long long() const {
|
||||||
|
return cast_from_half<long long>(*this);
|
||||||
|
}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator unsigned short() const {
|
||||||
|
return cast_from_half<unsigned short>(*this);
|
||||||
|
}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator unsigned int() const {
|
||||||
|
return cast_from_half<unsigned int>(*this);
|
||||||
|
}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator unsigned long() const {
|
||||||
|
return cast_from_half<unsigned long>(*this);
|
||||||
|
}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit operator unsigned long long() const {
|
||||||
|
return cast_from_half<unsigned long long>(*this);
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t(impl_type rhs) : val(rhs) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(float rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(bool rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(double rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(short rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(int rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(long rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(long long rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(unsigned short rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(unsigned int rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(unsigned long rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
explicit half_t(unsigned long long rhs) : val(cast_to_half(rhs).val) {}
|
||||||
|
|
||||||
|
// Unary operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t operator+() const {
|
||||||
|
half_t tmp = *this;
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
tmp.val = +tmp.val;
|
||||||
|
#else
|
||||||
|
tmp.val = __float2half(+__half2float(tmp.val));
|
||||||
|
#endif
|
||||||
|
return tmp;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t operator-() const {
|
||||||
|
half_t tmp = *this;
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
tmp.val = -tmp.val;
|
||||||
|
#else
|
||||||
|
tmp.val = __float2half(-__half2float(tmp.val));
|
||||||
|
#endif
|
||||||
|
return tmp;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Prefix operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator++() {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
++val;
|
||||||
|
#else
|
||||||
|
float tmp = __half2float(val);
|
||||||
|
++tmp;
|
||||||
|
val = __float2half(tmp);
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator--() {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
--val;
|
||||||
|
#else
|
||||||
|
float tmp = __half2float(val);
|
||||||
|
--tmp;
|
||||||
|
val = __float2half(tmp);
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Postfix operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t operator++(int) {
|
||||||
|
half_t tmp = *this;
|
||||||
|
operator++();
|
||||||
|
return tmp;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t operator--(int) {
|
||||||
|
half_t tmp = *this;
|
||||||
|
operator--();
|
||||||
|
return tmp;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Binary operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator=(impl_type rhs) {
|
||||||
|
val = rhs;
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_FUNCTION half_t& operator=(T rhs) {
|
||||||
|
val = cast_to_half(rhs).val;
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Compound operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator+=(half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
val += rhs.val;
|
||||||
|
#else
|
||||||
|
val = __float2half(__half2float(val) + __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator-=(half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
val -= rhs.val;
|
||||||
|
#else
|
||||||
|
val = __float2half(__half2float(val) - __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator*=(half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
val *= rhs.val;
|
||||||
|
#else
|
||||||
|
val = __float2half(__half2float(val) * __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t& operator/=(half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
val /= rhs.val;
|
||||||
|
#else
|
||||||
|
val = __float2half(__half2float(val) / __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Binary Arithmetic
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t friend operator+(half_t lhs, half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
lhs.val += rhs.val;
|
||||||
|
#else
|
||||||
|
lhs.val = __float2half(__half2float(lhs.val) + __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return lhs;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t friend operator-(half_t lhs, half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
lhs.val -= rhs.val;
|
||||||
|
#else
|
||||||
|
lhs.val = __float2half(__half2float(lhs.val) - __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return lhs;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t friend operator*(half_t lhs, half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
lhs.val *= rhs.val;
|
||||||
|
#else
|
||||||
|
lhs.val = __float2half(__half2float(lhs.val) * __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return lhs;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
half_t friend operator/(half_t lhs, half_t rhs) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
lhs.val /= rhs.val;
|
||||||
|
#else
|
||||||
|
lhs.val = __float2half(__half2float(lhs.val) / __half2float(rhs.val));
|
||||||
|
#endif
|
||||||
|
return lhs;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Logical operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator!() const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(!val);
|
||||||
|
#else
|
||||||
|
return !__half2float(val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// NOTE: Loses short-circuit evaluation
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator&&(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val && rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) && __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// NOTE: Loses short-circuit evaluation
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator||(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val || rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) || __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// Comparison operators
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator==(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val == rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) == __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator!=(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val != rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) != __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator<(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val < rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) < __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator>(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val > rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) > __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator<=(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val <= rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) <= __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
bool operator>=(half_t rhs) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return static_cast<bool>(val >= rhs.val);
|
||||||
|
#else
|
||||||
|
return __half2float(val) >= __half2float(rhs.val);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// Before CUDA 11.1, only the half <-> float conversions are marked host device.
|
||||||
|
// On the host we therefore convert through float for every other type,
|
||||||
|
// while device code still calls the type-specific conversion functions.
|
||||||
|
#if (CUDA_VERSION < 11100)
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(half_t val) { return val; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(float val) { return half_t(__float2half(val)); }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(bool val) { return cast_to_half(static_cast<float>(val)); }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(double val) {
|
||||||
|
// __double2half was only introduced in CUDA 11 too, so convert through float
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(short val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__short2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned short val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__ushort2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(int val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__int2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned int val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__uint2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long long val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__ll2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long long val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return half_t(__ull2half_rn(val));
|
||||||
|
#else
|
||||||
|
return half_t(__float2half(static_cast<float>(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long val) {
|
||||||
|
return cast_to_half(static_cast<long long>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long val) {
|
||||||
|
return cast_to_half(static_cast<unsigned long long>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, float>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2float(half_t::impl_type(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, bool>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(cast_from_half<float>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, double>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, short>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2short_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned short>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2ushort_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, int>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2int_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, unsigned>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2uint_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2ll_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return __half2ull_rz(half_t::impl_type(val));
|
||||||
|
#else
|
||||||
|
return static_cast<T>(__half2float(half_t::impl_type(val)));
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(cast_from_half<long long>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(cast_from_half<unsigned long long>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
#else // CUDA >= 11.1 versions follow
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(float val) { return __float2half(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(double val) { return __double2half(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(short val) { return __short2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned short val) { return __ushort2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(int val) { return __int2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned int val) { return __uint2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long long val) { return __ll2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long long val) { return __ull2half_rn(val); }
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(long val) {
|
||||||
|
return cast_to_half(static_cast<long long>(val));
|
||||||
|
}
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
half_t cast_to_half(unsigned long val) {
|
||||||
|
return cast_to_half(static_cast<unsigned long long>(val));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, float>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2float(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, double>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2double(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, short>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2short_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned short>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2ushort_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, int>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2int_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, unsigned int>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2uint_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2ll_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return __half2ull_rz(val);
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(cast_from_half<long long>(val));
|
||||||
|
}
|
||||||
|
template <class T>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
std::enable_if_t<std::is_same<T, unsigned long>::value, T>
|
||||||
|
cast_from_half(half_t val) {
|
||||||
|
return static_cast<T>(cast_from_half<unsigned long long>(val));
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
} // namespace Experimental
|
||||||
|
} // namespace Kokkos
|
||||||
|
#endif // KOKKOS_IMPL_HALF_TYPE_DEFINED
|
||||||
|
#endif // Disables for half_t on cuda:
|
||||||
|
// Clang/8||KEPLER30||KEPLER32||KEPLER37||MAXWELL50||MAXWELL52
|
||||||
|
#endif // KOKKOS_ENABLE_CUDA
|
||||||
|
#endif // KOKKOS_CUDA_HALF_HPP_
|
||||||
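The `cast_from_half<T>` overloads in the header above are selected through `std::enable_if_t` on the requested target type. A minimal host-only sketch of that dispatch pattern follows. `toy_half`, `cast_from_toy`, and `toy_cast_demo` are hypothetical names invented for this illustration and are not part of Kokkos; the toy type stores a plain `float` so that the example builds without CUDA.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for half_t: it stores a float so the sketch
// compiles on any host compiler. It is not the Kokkos type.
struct toy_half {
  float stored;
};

// Participates in overload resolution only when T is float.
template <class T>
std::enable_if_t<std::is_same<T, float>::value, T> cast_from_toy(toy_half h) {
  return h.stored;
}

// Participates only when T is int; static_cast truncates toward zero.
template <class T>
std::enable_if_t<std::is_same<T, int>::value, T> cast_from_toy(toy_half h) {
  return static_cast<T>(h.stored);
}

// Participates only when T is bool; routed through the float overload,
// in the same spirit as the host fallback in the pre-11.1 branch.
template <class T>
std::enable_if_t<std::is_same<T, bool>::value, T> cast_from_toy(toy_half h) {
  return static_cast<T>(cast_from_toy<float>(h));
}

// Small self-check showing which overload each explicit T selects.
inline void toy_cast_demo() {
  toy_half h{-2.75f};
  assert(cast_from_toy<float>(h) == -2.75f);
  assert(cast_from_toy<int>(h) == -2);
  assert(cast_from_toy<bool>(h));
}
```

Because the target type appears only in the return type, callers must spell it explicitly, as in `cast_from_toy<int>(h)`; the header's explicit conversion operators do the same with `cast_from_half<int>(*this)`.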
@ -132,7 +132,7 @@ int cuda_kernel_arch() {
|
|||||||
bool cuda_launch_blocking() {
|
bool cuda_launch_blocking() {
|
||||||
const char *env = getenv("CUDA_LAUNCH_BLOCKING");
|
const char *env = getenv("CUDA_LAUNCH_BLOCKING");
|
||||||
|
|
||||||
if (env == 0) return false;
|
if (env == nullptr) return false;
|
||||||
|
|
||||||
return std::stoi(env);
|
return std::stoi(env);
|
||||||
}
|
}
|
||||||
@ -509,14 +509,14 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
|
|||||||
const char *env_force_device_alloc =
|
const char *env_force_device_alloc =
|
||||||
getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
|
getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
|
||||||
bool force_device_alloc;
|
bool force_device_alloc;
|
||||||
if (env_force_device_alloc == 0)
|
if (env_force_device_alloc == nullptr)
|
||||||
force_device_alloc = false;
|
force_device_alloc = false;
|
||||||
else
|
else
|
||||||
force_device_alloc = std::stoi(env_force_device_alloc) != 0;
|
force_device_alloc = std::stoi(env_force_device_alloc) != 0;
|
||||||
|
|
||||||
const char *env_visible_devices = getenv("CUDA_VISIBLE_DEVICES");
|
const char *env_visible_devices = getenv("CUDA_VISIBLE_DEVICES");
|
||||||
bool visible_devices_one = true;
|
bool visible_devices_one = true;
|
||||||
if (env_visible_devices == 0) visible_devices_one = false;
|
if (env_visible_devices == nullptr) visible_devices_one = false;
|
||||||
|
|
||||||
if (Kokkos::show_warnings() &&
|
if (Kokkos::show_warnings() &&
|
||||||
(!visible_devices_one && !force_device_alloc)) {
|
(!visible_devices_one && !force_device_alloc)) {
|
||||||
@ -893,6 +893,92 @@ const cudaDeviceProp &Cuda::cuda_device_prop() const {
|
|||||||
return m_space_instance->m_deviceProp;
|
return m_space_instance->m_deviceProp;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
int get_gpu(const InitArguments &args);
|
||||||
|
|
||||||
|
int g_cuda_space_factory_initialized =
|
||||||
|
initialize_space_factory<CudaSpaceInitializer>("150_Cuda");
|
||||||
|
|
||||||
|
void CudaSpaceInitializer::initialize(const InitArguments &args) {
|
||||||
|
int use_gpu = get_gpu(args);
|
||||||
|
if (std::is_same<Kokkos::Cuda, Kokkos::DefaultExecutionSpace>::value ||
|
||||||
|
0 < use_gpu) {
|
||||||
|
if (use_gpu > -1) {
|
||||||
|
Kokkos::Cuda::impl_initialize(Kokkos::Cuda::SelectDevice(use_gpu));
|
||||||
|
} else {
|
||||||
|
Kokkos::Cuda::impl_initialize();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void CudaSpaceInitializer::finalize(bool all_spaces) {
|
||||||
|
if ((std::is_same<Kokkos::Cuda, Kokkos::DefaultExecutionSpace>::value ||
|
||||||
|
all_spaces) &&
|
||||||
|
Kokkos::Cuda::impl_is_initialized()) {
|
||||||
|
Kokkos::Cuda::impl_finalize();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void CudaSpaceInitializer::fence() { Kokkos::Cuda::impl_static_fence(); }
|
||||||
|
|
||||||
|
void CudaSpaceInitializer::print_configuration(std::ostream &msg,
|
||||||
|
const bool detail) {
|
||||||
|
msg << "Device Execution Space:" << std::endl;
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA: ";
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
|
||||||
|
msg << "Cuda Atomics:" << std::endl;
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA_ATOMICS: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA_ATOMICS
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
msg << "Cuda Options:" << std::endl;
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA_LAMBDA: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA_LAMBDA
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
msg << " KOKKOS_ENABLE_CUDA_UVM: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUDA_UVM
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
msg << " KOKKOS_ENABLE_CUSPARSE: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CUSPARSE
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
msg << " KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: ";
|
||||||
|
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
|
||||||
|
msg << "yes" << std::endl;
|
||||||
|
#else
|
||||||
|
msg << "no" << std::endl;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
msg << "\nCuda Runtime Configuration:" << std::endl;
|
||||||
|
Cuda::print_configuration(msg, detail);
|
||||||
|
}
|
||||||
|
} // namespace Impl
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|||||||
@ -34,7 +34,9 @@ struct CudaTraits {
|
|||||||
enum : CudaSpace::size_type {
|
enum : CudaSpace::size_type {
|
||||||
KernelArgumentLimit = 0x001000 /* 4k bytes */
|
KernelArgumentLimit = 0x001000 /* 4k bytes */
|
||||||
};
|
};
|
||||||
|
enum : CudaSpace::size_type {
|
||||||
|
MaxHierarchicalParallelism = 1024 /* team_size * vector_length */
|
||||||
|
};
|
||||||
using ConstantGlobalBufferType =
|
using ConstantGlobalBufferType =
|
||||||
unsigned long[ConstantMemoryUsage / sizeof(unsigned long)];
|
unsigned long[ConstantMemoryUsage / sizeof(unsigned long)];
|
||||||
|
|
||||||
|
|||||||
@ -48,20 +48,23 @@
|
|||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#ifdef KOKKOS_ENABLE_CUDA
|
#ifdef KOKKOS_ENABLE_CUDA
|
||||||
|
|
||||||
|
#include <mutex>
|
||||||
#include <string>
|
#include <string>
|
||||||
#include <cstdint>
|
#include <cstdint>
|
||||||
|
#include <cmath>
|
||||||
#include <Kokkos_Parallel.hpp>
|
#include <Kokkos_Parallel.hpp>
|
||||||
#include <impl/Kokkos_Error.hpp>
|
#include <impl/Kokkos_Error.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_abort.hpp>
|
#include <Cuda/Kokkos_Cuda_abort.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_Locks.hpp>
|
#include <Cuda/Kokkos_Cuda_Locks.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_Instance.hpp>
|
#include <Cuda/Kokkos_Cuda_Instance.hpp>
|
||||||
|
#include <impl/Kokkos_GraphImpl_fwd.hpp>
|
||||||
|
#include <Cuda/Kokkos_Cuda_GraphNodeKernel.hpp>
|
||||||
|
#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp>
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#if defined(__CUDACC__)
|
|
||||||
|
|
||||||
/** \brief Access to constant memory on the device */
|
/** \brief Access to constant memory on the device */
|
||||||
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
|
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
|
||||||
|
|
||||||
@ -140,29 +143,85 @@ __global__ __launch_bounds__(
|
|||||||
driver->operator()();
|
driver->operator()();
|
||||||
}
|
}
|
||||||
|
|
||||||
template <class DriverType>
|
//==============================================================================
|
||||||
__global__ static void cuda_parallel_launch_constant_or_global_memory(
|
// <editor-fold desc="Some helper functions for launch code readability"> {{{1
|
||||||
const DriverType* driver_ptr) {
|
|
||||||
const DriverType& driver =
|
|
||||||
driver_ptr != nullptr
|
|
||||||
? *driver_ptr
|
|
||||||
: *((const DriverType*)kokkos_impl_cuda_constant_memory_buffer);
|
|
||||||
|
|
||||||
driver();
|
inline bool is_empty_launch(dim3 const& grid, dim3 const& block) {
|
||||||
|
return (grid.x == 0) || ((block.x * block.y * block.z) == 0);
|
||||||
}
|
}
|
||||||
|
|
||||||
template <class DriverType, unsigned int maxTperB, unsigned int minBperSM>
|
inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) {
|
||||||
__global__
|
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
||||||
__launch_bounds__(maxTperB, minBperSM) static void cuda_parallel_launch_constant_or_global_memory(
|
Kokkos::Impl::throw_runtime_exception(
|
||||||
const DriverType* driver_ptr) {
|
std::string("CudaParallelLaunch (or graph node creation) FAILED: shared"
|
||||||
const DriverType& driver =
|
" memory request is too large"));
|
||||||
driver_ptr != nullptr
|
}
|
||||||
? *driver_ptr
|
|
||||||
: *((const DriverType*)kokkos_impl_cuda_constant_memory_buffer);
|
|
||||||
|
|
||||||
driver();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template <class KernelFuncPtr>
|
||||||
|
inline void configure_shmem_preference(KernelFuncPtr const& func,
|
||||||
|
bool prefer_shmem) {
|
||||||
|
#ifndef KOKKOS_ARCH_KEPLER
|
||||||
|
// On Kepler the L1 has no benefit since it doesn't cache reads
|
||||||
|
auto set_cache_config = [&] {
|
||||||
|
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
||||||
|
func,
|
||||||
|
(prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)));
|
||||||
|
return prefer_shmem;
|
||||||
|
};
|
||||||
|
static bool cache_config_preference_cached = set_cache_config();
|
||||||
|
if (cache_config_preference_cached != prefer_shmem) {
|
||||||
|
cache_config_preference_cached = set_cache_config();
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
// Use the parameters so we don't get a warning
|
||||||
|
(void)func;
|
||||||
|
(void)prefer_shmem;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Policy>
|
||||||
|
std::enable_if_t<Policy::experimental_contains_desired_occupancy>
|
||||||
|
modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||||
|
Policy const& policy, cudaDeviceProp const& properties,
|
||||||
|
cudaFuncAttributes const& attributes, dim3 const& block, int& shmem,
|
||||||
|
bool& prefer_shmem) {
|
||||||
|
int const block_size = block.x * block.y * block.z;
|
||||||
|
int const desired_occupancy = policy.impl_get_desired_occupancy().value();
|
||||||
|
|
||||||
|
size_t const shmem_per_sm_prefer_l1 = get_shmem_per_sm_prefer_l1(properties);
|
||||||
|
size_t const static_shmem = attributes.sharedSizeBytes;
|
||||||
|
|
||||||
|
// round to nearest integer and avoid division by zero
|
||||||
|
int active_blocks = std::max(
|
||||||
|
1, static_cast<int>(std::round(
|
||||||
|
static_cast<double>(properties.maxThreadsPerMultiProcessor) /
|
||||||
|
block_size * desired_occupancy / 100)));
|
||||||
|
int const dynamic_shmem =
|
||||||
|
shmem_per_sm_prefer_l1 / active_blocks - static_shmem;
|
||||||
|
|
||||||
|
if (dynamic_shmem > shmem) {
|
||||||
|
shmem = dynamic_shmem;
|
||||||
|
prefer_shmem = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Policy>
|
||||||
|
std::enable_if_t<!Policy::experimental_contains_desired_occupancy>
|
||||||
|
modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||||
|
Policy const&, cudaDeviceProp const&, cudaFuncAttributes const&,
|
||||||
|
dim3 const& /*block*/, int& /*shmem*/, bool& /*prefer_shmem*/) {}
|
||||||
|
|
||||||
|
// </editor-fold> end Some helper functions for launch code readability }}}1
|
||||||
|
//==============================================================================
|
||||||
|
|
||||||
|
//==============================================================================
|
||||||
|
// <editor-fold desc="DeduceCudaLaunchMechanism"> {{{2
|
||||||
|
|
||||||
|
// Use local memory up to ConstantMemoryUseThreshold
|
||||||
|
// Use global memory above ConstantMemoryUsage
|
||||||
|
// In between use ConstantMemory
|
||||||
|
|
||||||
template <class DriverType>
|
template <class DriverType>
|
||||||
struct DeduceCudaLaunchMechanism {
|
struct DeduceCudaLaunchMechanism {
|
||||||
constexpr static const Kokkos::Experimental::WorkItemProperty::
|
constexpr static const Kokkos::Experimental::WorkItemProperty::
|
||||||
@ -217,408 +276,362 @@ struct DeduceCudaLaunchMechanism {
|
|||||||
: Experimental::CudaLaunchMechanism::GlobalMemory)
|
: Experimental::CudaLaunchMechanism::GlobalMemory)
|
||||||
: (default_launch_mechanism));
|
: (default_launch_mechanism));
|
||||||
};
|
};
|
||||||
// Use local memory up to ConstantMemoryUseThreshold
|
|
||||||
// Use global memory above ConstantMemoryUsage
|
// </editor-fold> end DeduceCudaLaunchMechanism }}}2
|
||||||
// In between use ConstantMemory
|
//==============================================================================
|
||||||
template <class DriverType, class LaunchBounds = Kokkos::LaunchBounds<>,
|
|
||||||
Experimental::CudaLaunchMechanism LaunchMechanism =
|
//==============================================================================
|
||||||
DeduceCudaLaunchMechanism<DriverType>::launch_mechanism>
|
// <editor-fold desc="CudaParallelLaunchKernelInvoker"> {{{1
|
||||||
struct CudaParallelLaunch;
|
|
||||||
|
// Base classes that summarize the differences between the different launch
|
||||||
|
// mechanisms
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunchKernelFunc;
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunchKernelInvoker;
|
||||||
|
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
// <editor-fold desc="Local memory"> {{{2
|
||||||
|
|
||||||
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
||||||
unsigned int MinBlocksPerSM>
|
unsigned int MinBlocksPerSM>
|
||||||
struct CudaParallelLaunch<
|
struct CudaParallelLaunchKernelFunc<
|
||||||
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
|
||||||
Experimental::CudaLaunchMechanism::ConstantMemory> {
|
|
||||||
static_assert(sizeof(DriverType) < CudaTraits::ConstantMemoryUsage,
|
|
||||||
"Kokkos Error: Requested CudaLaunchConstantMemory with a "
|
|
||||||
"Functor larger than 32kB.");
|
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
|
||||||
const dim3& block, const int shmem,
|
|
||||||
const CudaInternal* cuda_instance,
|
|
||||||
const bool prefer_shmem) {
|
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_constant_memory<
|
|
||||||
DriverType, MaxThreadsPerBlock, MinBlocksPerSM>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
|
||||||
|
|
||||||
// Wait until the previous kernel that uses the constant buffer is done
|
|
||||||
CUDA_SAFE_CALL(cudaEventSynchronize(cuda_instance->constantMemReusable));
|
|
||||||
|
|
||||||
// Copy functor (synchronously) to staging buffer in pinned host memory
|
|
||||||
unsigned long* staging = cuda_instance->constantMemHostStaging;
|
|
||||||
memcpy(staging, &driver, sizeof(DriverType));
|
|
||||||
|
|
||||||
// Copy functor asynchronously from there to constant memory on the device
|
|
||||||
cudaMemcpyToSymbolAsync(kokkos_impl_cuda_constant_memory_buffer, staging,
|
|
||||||
sizeof(DriverType), 0, cudaMemcpyHostToDevice,
|
|
||||||
cudaStream_t(cuda_instance->m_stream));
|
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
|
||||||
cuda_parallel_launch_constant_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>
|
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>();
|
|
||||||
|
|
||||||
// Record an event that says when the constant buffer can be reused
|
|
||||||
CUDA_SAFE_CALL(cudaEventRecord(cuda_instance->constantMemReusable,
|
|
||||||
cudaStream_t(cuda_instance->m_stream)));
|
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
|
||||||
Kokkos::Cuda().fence();
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
|
||||||
// Race condition inside of cudaFuncGetAttributes if the same address is
|
|
||||||
// given requires using a local variable as input instead of a static Rely
|
|
||||||
// on static variable initialization to make sure only one thread executes
|
|
||||||
// the code and the result is visible.
|
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
|
||||||
cudaFuncAttributes attr_tmp;
|
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
|
||||||
&attr_tmp,
|
|
||||||
cuda_parallel_launch_constant_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>));
|
|
||||||
return attr_tmp;
|
|
||||||
};
|
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
|
||||||
return attr;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template <class DriverType>
|
|
||||||
struct CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
|
||||||
Experimental::CudaLaunchMechanism::ConstantMemory> {
|
|
||||||
static_assert(sizeof(DriverType) < CudaTraits::ConstantMemoryUsage,
|
|
||||||
"Kokkos Error: Requested CudaLaunchConstantMemory with a "
|
|
||||||
"Functor larger than 32kB.");
|
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
|
||||||
const dim3& block, const int shmem,
|
|
||||||
const CudaInternal* cuda_instance,
|
|
||||||
const bool prefer_shmem) {
|
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_constant_memory<DriverType>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
|
||||||
|
|
||||||
// Wait until the previous kernel that uses the constant buffer is done
|
|
||||||
CUDA_SAFE_CALL(cudaEventSynchronize(cuda_instance->constantMemReusable));
|
|
||||||
|
|
||||||
// Copy functor (synchronously) to staging buffer in pinned host memory
|
|
||||||
unsigned long* staging = cuda_instance->constantMemHostStaging;
|
|
||||||
memcpy(staging, &driver, sizeof(DriverType));
|
|
||||||
|
|
||||||
// Copy functor asynchronously from there to constant memory on the device
|
|
||||||
cudaMemcpyToSymbolAsync(kokkos_impl_cuda_constant_memory_buffer, staging,
|
|
||||||
sizeof(DriverType), 0, cudaMemcpyHostToDevice,
|
|
||||||
cudaStream_t(cuda_instance->m_stream));
|
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
|
||||||
cuda_parallel_launch_constant_memory<DriverType>
|
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>();
|
|
||||||
|
|
||||||
// Record an event that says when the constant buffer can be reused
|
|
||||||
CUDA_SAFE_CALL(cudaEventRecord(cuda_instance->constantMemReusable,
|
|
||||||
cudaStream_t(cuda_instance->m_stream)));
|
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
|
||||||
Kokkos::Cuda().fence();
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
|
||||||
// Race condition inside of cudaFuncGetAttributes if the same address is
|
|
||||||
// given requires using a local variable as input instead of a static Rely
|
|
||||||
// on static variable initialization to make sure only one thread executes
|
|
||||||
// the code and the result is visible.
|
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
|
||||||
cudaFuncAttributes attr_tmp;
|
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
|
||||||
&attr_tmp, cuda_parallel_launch_constant_memory<DriverType>));
|
|
||||||
return attr_tmp;
|
|
||||||
};
|
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
|
||||||
return attr;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
|
||||||
unsigned int MinBlocksPerSM>
|
|
||||||
struct CudaParallelLaunch<
|
|
||||||
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
Experimental::CudaLaunchMechanism::LocalMemory> {
|
Experimental::CudaLaunchMechanism::LocalMemory> {
|
||||||
static_assert(sizeof(DriverType) < CudaTraits::KernelArgumentLimit,
|
static std::decay_t<decltype(cuda_parallel_launch_local_memory<
|
||||||
"Kokkos Error: Requested CudaLaunchLocalMemory with a Functor "
|
DriverType, MaxThreadsPerBlock, MinBlocksPerSM>)>
|
||||||
"larger than 4096 bytes.");
|
get_kernel_func() {
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
return cuda_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
||||||
const dim3& block, const int shmem,
|
MinBlocksPerSM>;
|
||||||
const CudaInternal* cuda_instance,
|
|
||||||
const bool prefer_shmem) {
|
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
|
||||||
cuda_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>
|
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>(driver);
|
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
|
||||||
Kokkos::Cuda().fence();
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
|
||||||
// Race condition inside of cudaFuncGetAttributes if the same address is
|
|
||||||
// given requires using a local variable as input instead of a static Rely
|
|
||||||
// on static variable initialization to make sure only one thread executes
|
|
||||||
// the code and the result is visible.
|
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
|
||||||
cudaFuncAttributes attr_tmp;
|
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
|
||||||
&attr_tmp,
|
|
||||||
cuda_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>));
|
|
||||||
return attr_tmp;
|
|
||||||
};
|
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
|
||||||
return attr;
|
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
template <class DriverType>
|
template <class DriverType>
|
||||||
struct CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
struct CudaParallelLaunchKernelFunc<
|
||||||
Experimental::CudaLaunchMechanism::LocalMemory> {
|
DriverType, Kokkos::LaunchBounds<0, 0>,
|
||||||
|
Experimental::CudaLaunchMechanism::LocalMemory> {
|
||||||
|
static std::decay_t<decltype(cuda_parallel_launch_local_memory<DriverType>)>
|
||||||
|
get_kernel_func() {
|
||||||
|
return cuda_parallel_launch_local_memory<DriverType>;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds>
|
||||||
|
struct CudaParallelLaunchKernelInvoker<
|
||||||
|
DriverType, LaunchBounds, Experimental::CudaLaunchMechanism::LocalMemory>
|
||||||
|
: CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::LocalMemory> {
|
||||||
|
using base_t = CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds, Experimental::CudaLaunchMechanism::LocalMemory>;
|
||||||
static_assert(sizeof(DriverType) < CudaTraits::KernelArgumentLimit,
|
static_assert(sizeof(DriverType) < CudaTraits::KernelArgumentLimit,
|
||||||
"Kokkos Error: Requested CudaLaunchLocalMemory with a Functor "
|
"Kokkos Error: Requested CudaLaunchLocalMemory with a Functor "
|
||||||
"larger than 4096 bytes.");
|
"larger than 4096 bytes.");
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
|
||||||
const dim3& block, const int shmem,
|
|
||||||
const CudaInternal* cuda_instance,
|
|
||||||
const bool prefer_shmem) {
|
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_local_memory<DriverType>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
static void invoke_kernel(DriverType const& driver, dim3 const& grid,
|
||||||
|
dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance) {
|
||||||
|
(base_t::
|
||||||
|
get_kernel_func())<<<grid, block, shmem, cuda_instance->m_stream>>>(
|
||||||
|
driver);
|
||||||
|
}
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
cuda_parallel_launch_local_memory<DriverType>
|
inline static void create_parallel_launch_graph_node(
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>(driver);
|
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||||
|
//----------------------------------------
|
||||||
|
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
|
||||||
|
KOKKOS_EXPECTS(bool(graph));
|
||||||
|
auto& graph_node = Impl::get_cuda_graph_node_from_kernel(driver);
|
||||||
|
// Expect node not yet initialized
|
||||||
|
KOKKOS_EXPECTS(!bool(graph_node));
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
if (!Impl::is_empty_launch(grid, block)) {
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
Impl::check_shmem_request(cuda_instance, shmem);
|
||||||
Kokkos::Cuda().fence();
|
Impl::configure_shmem_preference(base_t::get_kernel_func(), prefer_shmem);
|
||||||
#endif
|
|
||||||
|
void const* args[] = {&driver};
|
||||||
|
|
||||||
|
cudaKernelNodeParams params = {};
|
||||||
|
|
||||||
|
params.blockDim = block;
|
||||||
|
params.gridDim = grid;
|
||||||
|
params.sharedMemBytes = shmem;
|
||||||
|
params.func = (void*)base_t::get_kernel_func();
|
||||||
|
params.kernelParams = (void**)args;
|
||||||
|
params.extra = nullptr;
|
||||||
|
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddKernelNode(
|
||||||
|
&graph_node, graph, /* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0, &params));
|
||||||
|
} else {
|
||||||
|
// We still need an empty node for the dependency structure
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddEmptyNode(&graph_node, graph,
|
||||||
|
/* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0));
|
||||||
}
|
}
|
||||||
|
KOKKOS_ENSURES(bool(graph_node))
|
||||||
}
|
}
|
||||||
|
#endif
|
||||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
|
||||||
// Race condition inside of cudaFuncGetAttributes if the same address is
|
|
||||||
// given requires using a local variable as input instead of a static Rely
|
|
||||||
// on static variable initialization to make sure only one thread executes
|
|
||||||
// the code and the result is visible.
|
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
|
||||||
cudaFuncAttributes attr_tmp;
|
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
|
||||||
&attr_tmp, cuda_parallel_launch_local_memory<DriverType>));
|
|
||||||
return attr_tmp;
|
|
||||||
};
|
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
|
||||||
return attr;
|
|
||||||
}
|
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// </editor-fold> end local memory }}}2
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
// <editor-fold desc="Global Memory"> {{{2
|
||||||
|
|
||||||
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
||||||
unsigned int MinBlocksPerSM>
|
unsigned int MinBlocksPerSM>
|
||||||
struct CudaParallelLaunch<
|
struct CudaParallelLaunchKernelFunc<
|
||||||
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
Experimental::CudaLaunchMechanism::GlobalMemory> {
|
Experimental::CudaLaunchMechanism::GlobalMemory> {
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
static void* get_kernel_func() {
|
||||||
const dim3& block, const int shmem,
|
return cuda_parallel_launch_global_memory<DriverType, MaxThreadsPerBlock,
|
||||||
CudaInternal* cuda_instance,
|
MinBlocksPerSM>;
|
||||||
const bool prefer_shmem) {
|
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_global_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
|
||||||
|
|
||||||
DriverType* driver_ptr = nullptr;
|
|
||||||
driver_ptr = reinterpret_cast<DriverType*>(
|
|
||||||
cuda_instance->scratch_functor(sizeof(DriverType)));
|
|
||||||
cudaMemcpyAsync(driver_ptr, &driver, sizeof(DriverType),
|
|
||||||
cudaMemcpyDefault, cuda_instance->m_stream);
|
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
|
||||||
cuda_parallel_launch_global_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>
|
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>(driver_ptr);
|
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
|
||||||
Kokkos::Cuda().fence();
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
static cudaFuncAttributes get_cuda_func_attributes() {
|
|
||||||
// Race condition inside of cudaFuncGetAttributes if the same address is
|
|
||||||
// given requires using a local variable as input instead of a static Rely
|
|
||||||
// on static variable initialization to make sure only one thread executes
|
|
||||||
// the code and the result is visible.
|
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
|
||||||
cudaFuncAttributes attr_tmp;
|
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
|
||||||
&attr_tmp,
|
|
||||||
cuda_parallel_launch_global_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>));
|
|
||||||
return attr_tmp;
|
|
||||||
};
|
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
|
||||||
return attr;
|
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
template <class DriverType>
|
template <class DriverType>
|
||||||
struct CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
struct CudaParallelLaunchKernelFunc<
|
||||||
Experimental::CudaLaunchMechanism::GlobalMemory> {
|
DriverType, Kokkos::LaunchBounds<0, 0>,
|
||||||
inline CudaParallelLaunch(const DriverType& driver, const dim3& grid,
|
Experimental::CudaLaunchMechanism::GlobalMemory> {
|
||||||
const dim3& block, const int shmem,
|
static std::decay_t<decltype(cuda_parallel_launch_global_memory<DriverType>)>
|
||||||
CudaInternal* cuda_instance,
|
get_kernel_func() {
|
||||||
const bool prefer_shmem) {
|
return cuda_parallel_launch_global_memory<DriverType>;
|
||||||
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
|
}
|
||||||
if (cuda_instance->m_maxShmemPerBlock < shmem) {
|
};
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
|
||||||
"CudaParallelLaunch FAILED: shared memory request is too large"));
|
|
||||||
}
|
|
||||||
#ifndef KOKKOS_ARCH_KEPLER
|
|
||||||
// On Kepler the L1 has no benefit since it doesn't cache reads
|
|
||||||
else {
|
|
||||||
static bool cache_config_set = false;
|
|
||||||
if (!cache_config_set) {
|
|
||||||
CUDA_SAFE_CALL(cudaFuncSetCacheConfig(
|
|
||||||
cuda_parallel_launch_global_memory<DriverType>,
|
|
||||||
(prefer_shmem ? cudaFuncCachePreferShared
|
|
||||||
: cudaFuncCachePreferL1)));
|
|
||||||
cache_config_set = true;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#else
|
|
||||||
(void)prefer_shmem;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
DriverType* driver_ptr = nullptr;
|
template <class DriverType, class LaunchBounds>
|
||||||
driver_ptr = reinterpret_cast<DriverType*>(
|
struct CudaParallelLaunchKernelInvoker<
|
||||||
cuda_instance->scratch_functor(sizeof(DriverType)));
|
DriverType, LaunchBounds, Experimental::CudaLaunchMechanism::GlobalMemory>
|
||||||
|
: CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::GlobalMemory> {
|
||||||
|
using base_t = CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::GlobalMemory>;
|
||||||
|
|
||||||
|
static void invoke_kernel(DriverType const& driver, dim3 const& grid,
|
||||||
|
dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance) {
|
||||||
|
DriverType* driver_ptr = reinterpret_cast<DriverType*>(
|
||||||
|
cuda_instance->scratch_functor(sizeof(DriverType)));
|
||||||
|
|
||||||
|
cudaMemcpyAsync(driver_ptr, &driver, sizeof(DriverType), cudaMemcpyDefault,
|
||||||
|
cuda_instance->m_stream);
|
||||||
|
(base_t::
|
||||||
|
get_kernel_func())<<<grid, block, shmem, cuda_instance->m_stream>>>(
|
||||||
|
driver_ptr);
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
inline static void create_parallel_launch_graph_node(
|
||||||
|
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||||
|
//----------------------------------------
|
||||||
|
auto const& graph = Impl::get_cuda_graph_from_kernel(driver);
|
||||||
|
KOKKOS_EXPECTS(bool(graph));
|
||||||
|
auto& graph_node = Impl::get_cuda_graph_node_from_kernel(driver);
|
||||||
|
// Expect node not yet initialized
|
||||||
|
KOKKOS_EXPECTS(!bool(graph_node));
|
||||||
|
|
||||||
|
if (!Impl::is_empty_launch(grid, block)) {
|
||||||
|
Impl::check_shmem_request(cuda_instance, shmem);
|
||||||
|
Impl::configure_shmem_preference(base_t::get_kernel_func(), prefer_shmem);
|
||||||
|
|
||||||
|
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
|
||||||
|
|
||||||
|
// Unlike in the non-graph case, we can get away with doing an async copy
|
||||||
|
// here because the `DriverType` instance is held in the GraphNodeImpl
|
||||||
|
// which is guaranteed to be alive until the graph instance itself is
|
||||||
|
// destroyed, where there should be a fence ensuring that the allocation
|
||||||
|
// associated with this kernel on the device side isn't deleted.
|
||||||
cudaMemcpyAsync(driver_ptr, &driver, sizeof(DriverType),
|
cudaMemcpyAsync(driver_ptr, &driver, sizeof(DriverType),
|
||||||
cudaMemcpyDefault, cuda_instance->m_stream);
|
cudaMemcpyDefault, cuda_instance->m_stream);
|
||||||
|
|
||||||
cuda_parallel_launch_global_memory<DriverType>
|
void const* args[] = {&driver_ptr};
|
||||||
<<<grid, block, shmem, cuda_instance->m_stream>>>(driver_ptr);
|
|
||||||
|
cudaKernelNodeParams params = {};
|
||||||
|
|
||||||
|
params.blockDim = block;
|
||||||
|
params.gridDim = grid;
|
||||||
|
params.sharedMemBytes = shmem;
|
||||||
|
params.func = (void*)base_t::get_kernel_func();
|
||||||
|
params.kernelParams = (void**)args;
|
||||||
|
params.extra = nullptr;
|
||||||
|
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddKernelNode(
|
||||||
|
&graph_node, graph, /* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0, &params));
|
||||||
|
} else {
|
||||||
|
// We still need an empty node for the dependency structure
|
||||||
|
CUDA_SAFE_CALL(cudaGraphAddEmptyNode(&graph_node, graph,
|
||||||
|
/* dependencies = */ nullptr,
|
||||||
|
/* numDependencies = */ 0));
|
||||||
|
}
|
||||||
|
KOKKOS_ENSURES(bool(graph_node))
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
};
|
||||||
|
|
||||||
|
// </editor-fold> end Global Memory }}}2
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
// <editor-fold desc="Constant Memory"> {{{2
|
||||||
|
|
||||||
|
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
||||||
|
unsigned int MinBlocksPerSM>
|
||||||
|
struct CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
|
Experimental::CudaLaunchMechanism::ConstantMemory> {
|
||||||
|
static std::decay_t<decltype(cuda_parallel_launch_constant_memory<
|
||||||
|
DriverType, MaxThreadsPerBlock, MinBlocksPerSM>)>
|
||||||
|
get_kernel_func() {
|
||||||
|
return cuda_parallel_launch_constant_memory<DriverType, MaxThreadsPerBlock,
|
||||||
|
MinBlocksPerSM>;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class DriverType>
|
||||||
|
struct CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, Kokkos::LaunchBounds<0, 0>,
|
||||||
|
Experimental::CudaLaunchMechanism::ConstantMemory> {
|
||||||
|
static std::decay_t<
|
||||||
|
decltype(cuda_parallel_launch_constant_memory<DriverType>)>
|
||||||
|
get_kernel_func() {
|
||||||
|
return cuda_parallel_launch_constant_memory<DriverType>;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds>
|
||||||
|
struct CudaParallelLaunchKernelInvoker<
|
||||||
|
DriverType, LaunchBounds, Experimental::CudaLaunchMechanism::ConstantMemory>
|
||||||
|
: CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::ConstantMemory> {
|
||||||
|
using base_t = CudaParallelLaunchKernelFunc<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::ConstantMemory>;
|
||||||
|
static_assert(sizeof(DriverType) < CudaTraits::ConstantMemoryUsage,
|
||||||
|
"Kokkos Error: Requested CudaLaunchConstantMemory with a "
|
||||||
|
"Functor larger than 32kB.");
|
||||||
|
|
||||||
|
static void invoke_kernel(DriverType const& driver, dim3 const& grid,
|
||||||
|
dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance) {
|
||||||
|
// Wait until the previous kernel that uses the constant buffer is done
|
||||||
|
CUDA_SAFE_CALL(cudaEventSynchronize(cuda_instance->constantMemReusable));
|
||||||
|
|
||||||
|
// Copy functor (synchronously) to staging buffer in pinned host memory
|
||||||
|
unsigned long* staging = cuda_instance->constantMemHostStaging;
|
||||||
|
memcpy(staging, &driver, sizeof(DriverType));
|
||||||
|
|
||||||
|
// Copy functor asynchronously from there to constant memory on the device
|
||||||
|
cudaMemcpyToSymbolAsync(kokkos_impl_cuda_constant_memory_buffer, staging,
|
||||||
|
sizeof(DriverType), 0, cudaMemcpyHostToDevice,
|
||||||
|
cudaStream_t(cuda_instance->m_stream));
|
||||||
|
|
||||||
|
// Invoke the driver function on the device
|
||||||
|
(base_t::
|
||||||
|
get_kernel_func())<<<grid, block, shmem, cuda_instance->m_stream>>>();
|
||||||
|
|
||||||
|
// Record an event that says when the constant buffer can be reused
|
||||||
|
CUDA_SAFE_CALL(cudaEventRecord(cuda_instance->constantMemReusable,
|
||||||
|
cudaStream_t(cuda_instance->m_stream)));
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
inline static void create_parallel_launch_graph_node(
|
||||||
|
DriverType const& driver, dim3 const& grid, dim3 const& block, int shmem,
|
||||||
|
CudaInternal const* cuda_instance, bool prefer_shmem) {
|
||||||
|
// Just use global memory; coordinating through events to share constant
|
||||||
|
// memory with the non-graph interface is not really reasonable since
|
||||||
|
// events don't work with Graphs directly, and this would anyway require
|
||||||
|
// a much more complicated structure that finds previous nodes in the
|
||||||
|
// dependency structure of the graph and creates an implicit dependence
|
||||||
|
// based on the need for constant memory (which we would then have to
|
||||||
|
// somehow go and prove was not creating a dependency cycle, and I don't
|
||||||
|
// even know if there's an efficient way to do that, let alone in the
|
||||||
|
// structure we currently have).
|
||||||
|
using global_launch_impl_t = CudaParallelLaunchKernelInvoker<
|
||||||
|
DriverType, LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism::GlobalMemory>;
|
||||||
|
global_launch_impl_t::create_parallel_launch_graph_node(
|
||||||
|
driver, grid, block, shmem, cuda_instance, prefer_shmem);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
};
|
||||||
|
|
||||||
|
// </editor-fold> end Constant Memory }}}2
|
||||||
|
//------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
// </editor-fold> end CudaParallelLaunchKernelInvoker }}}1
|
||||||
|
//==============================================================================
|
||||||
|
|
||||||
|
//==============================================================================
|
||||||
|
// <editor-fold desc="CudaParallelLaunchImpl"> {{{1
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunchImpl;
|
||||||
|
|
||||||
|
template <class DriverType, unsigned int MaxThreadsPerBlock,
|
||||||
|
unsigned int MinBlocksPerSM,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunchImpl<
|
||||||
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
|
LaunchMechanism>
|
||||||
|
: CudaParallelLaunchKernelInvoker<
|
||||||
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
|
LaunchMechanism> {
|
||||||
|
using base_t = CudaParallelLaunchKernelInvoker<
|
||||||
|
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
|
||||||
|
LaunchMechanism>;
|
||||||
|
|
||||||
|
inline static void launch_kernel(const DriverType& driver, const dim3& grid,
|
||||||
|
const dim3& block, int shmem,
|
||||||
|
const CudaInternal* cuda_instance,
|
||||||
|
bool prefer_shmem) {
|
||||||
|
if (!Impl::is_empty_launch(grid, block)) {
|
||||||
|
// Prevent multiple threads from simultaneously setting the cache
|
||||||
|
// configuration preference and launching the same kernel
|
||||||
|
static std::mutex mutex;
|
||||||
|
std::lock_guard<std::mutex> lock(mutex);
|
||||||
|
|
||||||
|
Impl::check_shmem_request(cuda_instance, shmem);
|
||||||
|
|
||||||
|
// If a desired occupancy is specified, we compute how much shared memory
|
||||||
|
// to ask for to achieve that occupancy, assuming that the cache
|
||||||
|
// configuration is `cudaFuncCachePreferL1`. If the amount of dynamic
|
||||||
|
// shared memory computed is actually smaller than `shmem` we overwrite
|
||||||
|
// `shmem` and set `prefer_shmem` to `false`.
|
||||||
|
modify_launch_configuration_if_desired_occupancy_is_specified(
|
||||||
|
driver.get_policy(), cuda_instance->m_deviceProp,
|
||||||
|
get_cuda_func_attributes(), block, shmem, prefer_shmem);
|
||||||
|
|
||||||
|
Impl::configure_shmem_preference(base_t::get_kernel_func(), prefer_shmem);
|
||||||
|
|
||||||
|
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
|
||||||
|
|
||||||
|
// Invoke the driver function on the device
|
||||||
|
base_t::invoke_kernel(driver, grid, block, shmem, cuda_instance);
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
|
||||||
CUDA_SAFE_CALL(cudaGetLastError());
|
CUDA_SAFE_CALL(cudaGetLastError());
|
||||||
Kokkos::Cuda().fence();
|
cuda_instance->fence();
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
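The hunk above serializes "set cache preference, then launch" behind a function-local static mutex. The following is a minimal host-only sketch of that pattern, not the Kokkos code itself: `FakeKernelState`, `configure_and_launch`, `KernelA`, and `run_concurrent_launches` are hypothetical names, and no CUDA calls are made.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-kernel configuration state that must not be
// changed by one thread while another thread is launching.
struct FakeKernelState {
  bool prefer_shmem = false;
  int launches = 0;
};

// One instantiation of this function template exists per Tag, so each
// instantiation gets its own function-local static mutex.
template <class Tag>
void configure_and_launch(FakeKernelState& state, bool prefer_shmem) {
  static std::mutex mutex;
  std::lock_guard<std::mutex> lock(mutex);
  state.prefer_shmem = prefer_shmem;  // stands in for "configure"
  ++state.launches;                   // stands in for "launch"
}

struct KernelA {};

// Launches from several threads and returns the final launch count.
int run_concurrent_launches(int num_threads, int launches_per_thread) {
  FakeKernelState state;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&state, launches_per_thread] {
      for (int i = 0; i < launches_per_thread; ++i)
        configure_and_launch<KernelA>(state, i % 2 == 0);
    });
  }
  for (auto& w : workers) w.join();
  return state.launches;
}
```

Because the configure step and the launch step happen under the same lock, no thread can change the preference between another thread's configure and launch.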
@ -630,15 +643,63 @@ struct CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
|||||||
// the code and the result is visible.
|
// the code and the result is visible.
|
||||||
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
auto wrap_get_attributes = []() -> cudaFuncAttributes {
|
||||||
cudaFuncAttributes attr_tmp;
|
cudaFuncAttributes attr_tmp;
|
||||||
CUDA_SAFE_CALL(cudaFuncGetAttributes(
|
CUDA_SAFE_CALL(
|
||||||
&attr_tmp, cuda_parallel_launch_global_memory<DriverType>));
|
cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()));
|
||||||
return attr_tmp;
|
return attr_tmp;
|
||||||
};
|
};
|
||||||
static cudaFuncAttributes attr = wrap_get_attributes();
|
static cudaFuncAttributes attr = wrap_get_attributes();
|
||||||
return attr;
|
return attr;
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
// </editor-fold> end CudaParallelLaunchImpl }}}1
|
||||||
|
//==============================================================================
|
||||||
|
|
||||||
|
//==============================================================================
|
||||||
|
// <editor-fold desc="CudaParallelLaunch"> {{{1
|
||||||
|
|
||||||
|
template <class DriverType, class LaunchBounds = Kokkos::LaunchBounds<>,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism =
|
||||||
|
DeduceCudaLaunchMechanism<DriverType>::launch_mechanism,
|
||||||
|
bool DoGraph = DriverType::Policy::is_graph_kernel::value
|
||||||
|
#ifndef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
&& false
|
||||||
|
#endif
|
||||||
|
>
|
||||||
|
struct CudaParallelLaunch;
|
||||||
|
|
||||||
|
// General launch mechanism
|
||||||
|
template <class DriverType, class LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunch<DriverType, LaunchBounds, LaunchMechanism,
|
||||||
|
/* DoGraph = */ false>
|
||||||
|
: CudaParallelLaunchImpl<DriverType, LaunchBounds, LaunchMechanism> {
|
||||||
|
using base_t =
|
||||||
|
CudaParallelLaunchImpl<DriverType, LaunchBounds, LaunchMechanism>;
|
||||||
|
template <class... Args>
|
||||||
|
CudaParallelLaunch(Args&&... args) {
|
||||||
|
base_t::launch_kernel((Args &&) args...);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
// Launch mechanism for creating graph nodes
|
||||||
|
template <class DriverType, class LaunchBounds,
|
||||||
|
Experimental::CudaLaunchMechanism LaunchMechanism>
|
||||||
|
struct CudaParallelLaunch<DriverType, LaunchBounds, LaunchMechanism,
|
||||||
|
/* DoGraph = */ true>
|
||||||
|
: CudaParallelLaunchImpl<DriverType, LaunchBounds, LaunchMechanism> {
|
||||||
|
using base_t =
|
||||||
|
CudaParallelLaunchImpl<DriverType, LaunchBounds, LaunchMechanism>;
|
||||||
|
template <class... Args>
|
||||||
|
CudaParallelLaunch(Args&&... args) {
|
||||||
|
base_t::create_parallel_launch_graph_node((Args &&) args...);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
// </editor-fold> end CudaParallelLaunch }}}1
|
||||||
|
//==============================================================================
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
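`CudaParallelLaunch` above is declared once and then specialized on the `DoGraph` boolean, with each specialization forwarding its constructor arguments to a different base-class entry point. A reduced sketch of that dispatch, assuming hypothetical names (`LaunchImplSketch`, `LaunchSketch`, `dispatch`) and a string log in place of real launches:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for CudaParallelLaunchImpl: offers both entry points.
struct LaunchImplSketch {
  static void launch_kernel(std::string& log, int value) {
    log += "launch:" + std::to_string(value);
  }
  static void create_graph_node(std::string& log, int value) {
    log += "node:" + std::to_string(value);
  }
};

// Primary template is only declared; the bool selects a specialization.
template <bool DoGraph>
struct LaunchSketch;

// DoGraph == false: constructing the object performs an immediate launch.
template <>
struct LaunchSketch<false> : LaunchImplSketch {
  template <class... Args>
  explicit LaunchSketch(Args&&... args) {
    launch_kernel(std::forward<Args>(args)...);
  }
};

// DoGraph == true: constructing the object records a graph node instead.
template <>
struct LaunchSketch<true> : LaunchImplSketch {
  template <class... Args>
  explicit LaunchSketch(Args&&... args) {
    create_graph_node(std::forward<Args>(args)...);
  }
};

// Helper used to observe which path a given DoGraph value selects.
template <bool DoGraph>
std::string dispatch(int value) {
  std::string log;
  LaunchSketch<DoGraph> launcher(log, value);
  (void)launcher;
  return log;
}
```

Call sites stay identical for both paths; only the compile-time boolean decides whether construction launches or records a node.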
@ -646,6 +707,5 @@ struct CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* defined( __CUDACC__ ) */
|
|
||||||
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
|
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
|
||||||
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */
|
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */
|
||||||
|
|||||||
@ -42,13 +42,10 @@
|
|||||||
//@HEADER
|
//@HEADER
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Core.hpp>
|
||||||
|
|
||||||
#ifdef KOKKOS_ENABLE_CUDA
|
#ifdef KOKKOS_ENABLE_CUDA
|
||||||
|
|
||||||
#include <Cuda/Kokkos_Cuda_Locks.hpp>
|
#include <Cuda/Kokkos_Cuda_Locks.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
||||||
#include <Kokkos_Cuda.hpp>
|
|
||||||
|
|
||||||
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
|
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|||||||
@ -81,8 +81,6 @@ void finalize_host_cuda_lock_arrays();
|
|||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
#if defined(__CUDACC__)
|
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
@ -173,8 +171,6 @@ inline int eliminate_warning_for_lock_array() { return lock_array_copied; }
|
|||||||
KOKKOS_COPY_CUDA_LOCK_ARRAYS_TO_DEVICE()
|
KOKKOS_COPY_CUDA_LOCK_ARRAYS_TO_DEVICE()
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#endif /* defined( __CUDACC__ ) */
|
|
||||||
|
|
||||||
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
|
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_LOCKS_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_LOCKS_HPP */
|
||||||
|
|||||||
@ -46,7 +46,7 @@
|
|||||||
#define KOKKOS_CUDA_PARALLEL_HPP
|
#define KOKKOS_CUDA_PARALLEL_HPP
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(__CUDACC__) && defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
|
||||||
#include <algorithm>
|
#include <algorithm>
|
||||||
#include <string>
|
#include <string>
|
||||||
@ -99,6 +99,8 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
int m_team_scratch_size[2];
|
int m_team_scratch_size[2];
|
||||||
int m_thread_scratch_size[2];
|
int m_thread_scratch_size[2];
|
||||||
int m_chunk_size;
|
int m_chunk_size;
|
||||||
|
bool m_tune_team;
|
||||||
|
bool m_tune_vector;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
//! Execution space of this execution policy
|
//! Execution space of this execution policy
|
||||||
@ -115,6 +117,8 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
|
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
|
||||||
m_chunk_size = p.m_chunk_size;
|
m_chunk_size = p.m_chunk_size;
|
||||||
m_space = p.m_space;
|
m_space = p.m_space;
|
||||||
|
m_tune_team = p.m_tune_team;
|
||||||
|
m_tune_vector = p.m_tune_vector;
|
||||||
}
|
}
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
@ -130,10 +134,10 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
Kokkos::Impl::cuda_get_max_block_size<FunctorType,
|
Kokkos::Impl::cuda_get_max_block_size<FunctorType,
|
||||||
typename traits::launch_bounds>(
|
typename traits::launch_bounds>(
|
||||||
space().impl_internal_space_instance(), attr, f,
|
space().impl_internal_space_instance(), attr, f,
|
||||||
(size_t)vector_length(),
|
(size_t)impl_vector_length(),
|
||||||
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
||||||
(size_t)thread_scratch_size(0) + sizeof(double));
|
(size_t)thread_scratch_size(0) + sizeof(double));
|
||||||
return block_size / vector_length();
|
return block_size / impl_vector_length();
|
||||||
}
|
}
|
||||||
|
|
||||||
template <class FunctorType>
|
template <class FunctorType>
|
||||||
@ -171,10 +175,10 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
Kokkos::Impl::cuda_get_opt_block_size<FunctorType,
|
Kokkos::Impl::cuda_get_opt_block_size<FunctorType,
|
||||||
typename traits::launch_bounds>(
|
typename traits::launch_bounds>(
|
||||||
space().impl_internal_space_instance(), attr, f,
|
space().impl_internal_space_instance(), attr, f,
|
||||||
(size_t)vector_length(),
|
(size_t)impl_vector_length(),
|
||||||
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
||||||
(size_t)thread_scratch_size(0) + sizeof(double));
|
(size_t)thread_scratch_size(0) + sizeof(double));
|
||||||
return block_size / vector_length();
|
return block_size / impl_vector_length();
|
||||||
}
|
}
|
||||||
|
|
||||||
template <class FunctorType>
|
template <class FunctorType>
|
||||||
@ -234,9 +238,18 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
|
|
||||||
inline int vector_length() const { return m_vector_length; }
|
KOKKOS_DEPRECATED inline int vector_length() const {
|
||||||
|
return impl_vector_length();
|
||||||
|
}
|
||||||
|
inline int impl_vector_length() const { return m_vector_length; }
|
||||||
inline int team_size() const { return m_team_size; }
|
inline int team_size() const { return m_team_size; }
|
||||||
inline int league_size() const { return m_league_size; }
|
inline int league_size() const { return m_league_size; }
|
||||||
|
inline bool impl_auto_team_size() const { return m_tune_team; }
|
||||||
|
inline bool impl_auto_vector_length() const { return m_tune_vector; }
|
||||||
|
inline void impl_set_team_size(size_t team_size) { m_team_size = team_size; }
|
||||||
|
inline void impl_set_vector_length(size_t vector_length) {
|
||||||
|
m_vector_length = vector_length;
|
||||||
|
}
|
||||||
inline int scratch_size(int level, int team_size_ = -1) const {
|
inline int scratch_size(int level, int team_size_ = -1) const {
|
||||||
if (team_size_ < 0) team_size_ = m_team_size;
|
if (team_size_ < 0) team_size_ = m_team_size;
|
||||||
return m_team_scratch_size[level] +
|
return m_team_scratch_size[level] +
|
||||||
@ -258,18 +271,25 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
m_vector_length(0),
|
m_vector_length(0),
|
||||||
m_team_scratch_size{0, 0},
|
m_team_scratch_size{0, 0},
|
||||||
m_thread_scratch_size{0, 0},
|
m_thread_scratch_size{0, 0},
|
||||||
m_chunk_size(32) {}
|
m_chunk_size(Impl::CudaTraits::WarpSize),
|
||||||
|
m_tune_team(false),
|
||||||
|
m_tune_vector(false) {}
|
||||||
|
|
||||||
/** \brief Specify league size, request team size */
|
/** \brief Specify league size, specify team size, specify vector length */
|
||||||
TeamPolicyInternal(const execution_space space_, int league_size_,
|
TeamPolicyInternal(const execution_space space_, int league_size_,
|
||||||
int team_size_request, int vector_length_request = 1)
|
int team_size_request, int vector_length_request = 1)
|
||||||
: m_space(space_),
|
: m_space(space_),
|
||||||
m_league_size(league_size_),
|
m_league_size(league_size_),
|
||||||
m_team_size(team_size_request),
|
m_team_size(team_size_request),
|
||||||
m_vector_length(verify_requested_vector_length(vector_length_request)),
|
m_vector_length(
|
||||||
|
(vector_length_request > 0)
|
||||||
|
? verify_requested_vector_length(vector_length_request)
|
||||||
|
: verify_requested_vector_length(1)),
|
||||||
m_team_scratch_size{0, 0},
|
m_team_scratch_size{0, 0},
|
||||||
m_thread_scratch_size{0, 0},
|
m_thread_scratch_size{0, 0},
|
||||||
m_chunk_size(32) {
|
m_chunk_size(Impl::CudaTraits::WarpSize),
|
||||||
|
m_tune_team(bool(team_size_request <= 0)),
|
||||||
|
m_tune_vector(bool(vector_length_request <= 0)) {
|
||||||
// Make sure league size is permissible
|
// Make sure league size is permissible
|
||||||
if (league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
|
if (league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
|
||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
@ -277,72 +297,56 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
"space.");
|
"space.");
|
||||||
|
|
||||||
// Make sure total block size is permissible
|
// Make sure total block size is permissible
|
||||||
if (m_team_size * m_vector_length > 1024) {
|
if (m_team_size * m_vector_length >
|
||||||
|
int(Impl::CudaTraits::MaxHierarchicalParallelism)) {
|
||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. "
|
std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. "
|
||||||
"Team size x vector length must be smaller than 1024."));
|
"Team size x vector length must be smaller than 1024."));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/** \brief Specify league size, request team size */
|
/** \brief Specify league size, request team size, specify vector length */
|
||||||
TeamPolicyInternal(const execution_space space_, int league_size_,
|
TeamPolicyInternal(const execution_space space_, int league_size_,
|
||||||
const Kokkos::AUTO_t& /* team_size_request */
|
const Kokkos::AUTO_t& /* team_size_request */
|
||||||
,
|
,
|
||||||
int vector_length_request = 1)
|
int vector_length_request = 1)
|
||||||
: m_space(space_),
|
: TeamPolicyInternal(space_, league_size_, -1, vector_length_request) {}
|
||||||
m_league_size(league_size_),
|
|
||||||
m_team_size(-1),
|
/** \brief Specify league size, request team size and vector length */
|
||||||
m_vector_length(verify_requested_vector_length(vector_length_request)),
|
TeamPolicyInternal(const execution_space space_, int league_size_,
|
||||||
m_team_scratch_size{0, 0},
|
const Kokkos::AUTO_t& /* team_size_request */,
|
||||||
m_thread_scratch_size{0, 0},
|
const Kokkos::AUTO_t& /* vector_length_request */
|
||||||
m_chunk_size(32) {
|
)
|
||||||
// Make sure league size is permissible
|
: TeamPolicyInternal(space_, league_size_, -1, -1) {}
|
||||||
if (league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
|
|
||||||
Impl::throw_runtime_exception(
|
/** \brief Specify league size, specify team size, request vector length */
|
||||||
"Requested too large league_size for TeamPolicy on Cuda execution "
|
TeamPolicyInternal(const execution_space space_, int league_size_,
|
||||||
"space.");
|
int team_size_request, const Kokkos::AUTO_t&)
|
||||||
}
|
: TeamPolicyInternal(space_, league_size_, team_size_request, -1) {}
|
||||||
|
|
||||||
TeamPolicyInternal(int league_size_, int team_size_request,
|
TeamPolicyInternal(int league_size_, int team_size_request,
|
||||||
int vector_length_request = 1)
|
int vector_length_request = 1)
|
||||||
: m_space(typename traits::execution_space()),
|
: TeamPolicyInternal(typename traits::execution_space(), league_size_,
|
||||||
m_league_size(league_size_),
|
team_size_request, vector_length_request) {}
|
||||||
m_team_size(team_size_request),
|
|
||||||
m_vector_length(verify_requested_vector_length(vector_length_request)),
|
|
||||||
m_team_scratch_size{0, 0},
|
|
||||||
m_thread_scratch_size{0, 0},
|
|
||||||
m_chunk_size(32) {
|
|
||||||
// Make sure league size is permissible
|
|
||||||
if (league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
|
|
||||||
Impl::throw_runtime_exception(
|
|
||||||
"Requested too large league_size for TeamPolicy on Cuda execution "
|
|
||||||
"space.");
|
|
||||||
|
|
||||||
// Make sure total block size is permissible
|
TeamPolicyInternal(int league_size_, const Kokkos::AUTO_t& team_size_request,
|
||||||
if (m_team_size * m_vector_length > 1024) {
|
|
||||||
Impl::throw_runtime_exception(
|
|
||||||
std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. "
|
|
||||||
"Team size x vector length must be smaller than 1024."));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
TeamPolicyInternal(int league_size_,
|
|
||||||
const Kokkos::AUTO_t& /* team_size_request */
|
|
||||||
,
|
|
||||||
int vector_length_request = 1)
|
int vector_length_request = 1)
|
||||||
: m_space(typename traits::execution_space()),
|
: TeamPolicyInternal(typename traits::execution_space(), league_size_,
|
||||||
m_league_size(league_size_),
|
team_size_request, vector_length_request)
|
||||||
m_team_size(-1),
|
|
||||||
m_vector_length(verify_requested_vector_length(vector_length_request)),
|
{}
|
||||||
m_team_scratch_size{0, 0},
|
|
||||||
m_thread_scratch_size{0, 0},
|
/** \brief Specify league size, request team size and vector length */
|
||||||
m_chunk_size(32) {
|
TeamPolicyInternal(int league_size_, const Kokkos::AUTO_t& team_size_request,
|
||||||
// Make sure league size is permissible
|
const Kokkos::AUTO_t& vector_length_request)
|
||||||
if (league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
|
: TeamPolicyInternal(typename traits::execution_space(), league_size_,
|
||||||
Impl::throw_runtime_exception(
|
team_size_request, vector_length_request) {}
|
||||||
"Requested too large league_size for TeamPolicy on Cuda execution "
|
|
||||||
"space.");
|
/** \brief Specify league size, specify team size, request vector length */
|
||||||
}
|
TeamPolicyInternal(int league_size_, int team_size_request,
|
||||||
|
const Kokkos::AUTO_t& vector_length_request)
|
||||||
|
: TeamPolicyInternal(typename traits::execution_space(), league_size_,
|
||||||
|
team_size_request, vector_length_request) {}
|
||||||
|
|
||||||
inline int chunk_size() const { return m_chunk_size; }
|
inline int chunk_size() const { return m_chunk_size; }
|
||||||
|
|
||||||
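The constructor changes above collapse several near-duplicate bodies into one constructor that the `AUTO_t` overloads delegate to, with a non-positive request meaning "tune this value". A simplified sketch of that shape, assuming hypothetical names (`AutoTag`, `PolicySketch`) and omitting the execution space, league size, and `verify_requested_vector_length` checks:

```cpp
#include <cassert>

// Hypothetical tag type standing in for Kokkos::AUTO_t.
struct AutoTag {};

// Hypothetical, simplified policy: only the fields relevant to the hunk.
class PolicySketch {
  int m_team_size;
  int m_vector_length;
  bool m_tune_team;
  bool m_tune_vector;

 public:
  // The one constructor that does real work. Non-positive requests mean
  // "choose for me" and set the corresponding tuning flag.
  PolicySketch(int team_size_request, int vector_length_request)
      : m_team_size(team_size_request),
        m_vector_length(vector_length_request > 0 ? vector_length_request : 1),
        m_tune_team(team_size_request <= 0),
        m_tune_vector(vector_length_request <= 0) {}

  // In this sketch every tag overload delegates with -1 as the sentinel.
  PolicySketch(AutoTag, int vector_length_request)
      : PolicySketch(-1, vector_length_request) {}
  PolicySketch(int team_size_request, AutoTag)
      : PolicySketch(team_size_request, -1) {}
  PolicySketch(AutoTag, AutoTag) : PolicySketch(-1, -1) {}

  int team_size() const { return m_team_size; }
  int vector_length() const { return m_vector_length; }
  bool auto_team_size() const { return m_tune_team; }
  bool auto_vector_length() const { return m_tune_vector; }
};
```

Keeping validation and flag setup in a single constructor means a new overload cannot silently skip a check, which is the duplication the old side of the hunk suffered from.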
@ -394,7 +398,7 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
get_cuda_func_attributes();
|
get_cuda_func_attributes();
|
||||||
const int block_size = std::forward<BlockSizeCallable>(block_size_callable)(
|
const int block_size = std::forward<BlockSizeCallable>(block_size_callable)(
|
||||||
space().impl_internal_space_instance(), attr, f,
|
space().impl_internal_space_instance(), attr, f,
|
||||||
(size_t)vector_length(),
|
(size_t)impl_vector_length(),
|
||||||
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
(size_t)team_scratch_size(0) + 2 * sizeof(double),
|
||||||
(size_t)thread_scratch_size(0) + sizeof(double) +
|
(size_t)thread_scratch_size(0) + sizeof(double) +
|
||||||
((functor_value_traits::StaticValueSize != 0)
|
((functor_value_traits::StaticValueSize != 0)
|
||||||
@ -406,7 +410,7 @@ class TeamPolicyInternal<Kokkos::Cuda, Properties...>
|
|||||||
int p2 = 1;
|
int p2 = 1;
|
||||||
while (p2 <= block_size) p2 *= 2;
|
while (p2 <= block_size) p2 *= 2;
|
||||||
p2 /= 2;
|
p2 /= 2;
|
||||||
return p2 / vector_length();
|
return p2 / impl_vector_length();
|
||||||
}
|
}
|
||||||
|
|
||||||
template <class ClosureType, class FunctorType>
|
template <class ClosureType, class FunctorType>
|
||||||
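The last lines of the hunk above round the computed block size down to a power of two before dividing by the vector length. A standalone sketch of that arithmetic, where `round_down_to_power_of_two` and `team_size_for` are illustrative names rather than Kokkos functions:

```cpp
#include <cassert>

// Largest power of two that is <= block_size (for block_size >= 1), using the
// same doubling loop as the hunk above.
int round_down_to_power_of_two(int block_size) {
  int p2 = 1;
  while (p2 <= block_size) p2 *= 2;  // overshoot to the next power of two
  p2 /= 2;                           // step back to the one that fits
  return p2;
}

// Team size for a given vector length; both parameters are illustrative.
int team_size_for(int block_size, int vector_length) {
  return round_down_to_power_of_two(block_size) / vector_length;
}
```

For example, a block size of 768 rounds down to 512, so a vector length of 4 yields a team size of 128.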
@ -468,6 +472,8 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
|
|||||||
public:
|
public:
|
||||||
using functor_type = FunctorType;
|
using functor_type = FunctorType;
|
||||||
|
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
inline __device__ void operator()(void) const {
|
inline __device__ void operator()(void) const {
|
||||||
const Member work_stride = blockDim.y * gridDim.x;
|
const Member work_stride = blockDim.y * gridDim.x;
|
||||||
const Member work_end = m_policy.end();
|
const Member work_end = m_policy.end();
|
||||||
@ -518,7 +524,8 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
|
|||||||
template <class FunctorType, class... Traits>
|
template <class FunctorType, class... Traits>
|
||||||
class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
||||||
public:
|
public:
|
||||||
using Policy = Kokkos::MDRangePolicy<Traits...>;
|
using Policy = Kokkos::MDRangePolicy<Traits...>;
|
||||||
|
using functor_type = FunctorType;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
using RP = Policy;
|
using RP = Policy;
|
||||||
@ -530,10 +537,11 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
|||||||
const Policy m_rp;
|
const Policy m_rp;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_rp; }
|
||||||
|
|
||||||
inline __device__ void operator()(void) const {
|
inline __device__ void operator()(void) const {
|
||||||
Kokkos::Impl::Refactor::DeviceIterateTile<Policy::rank, Policy, FunctorType,
|
Kokkos::Impl::DeviceIterateTile<Policy::rank, Policy, FunctorType,
|
||||||
typename Policy::work_tag>(
|
typename Policy::work_tag>(m_rp, m_functor)
|
||||||
m_rp, m_functor)
|
|
||||||
.exec_range();
|
.exec_range();
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -621,8 +629,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
|
|||||||
*this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
*this, grid, block, 0, m_rp.space().impl_internal_space_instance(),
|
||||||
false);
|
false);
|
||||||
} else {
|
} else {
|
||||||
printf("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
|
Kokkos::abort("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
|
||||||
Kokkos::abort("Aborting");
|
|
||||||
}
|
}
|
||||||
|
|
||||||
} // end execute
|
} // end execute
|
||||||
@ -636,7 +643,7 @@ template <class FunctorType, class... Properties>
|
|||||||
class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
||||||
Kokkos::Cuda> {
|
Kokkos::Cuda> {
|
||||||
public:
|
public:
|
||||||
using Policy = TeamPolicyInternal<Kokkos::Cuda, Properties...>;
|
using Policy = TeamPolicy<Properties...>;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
using Member = typename Policy::member_type;
|
using Member = typename Policy::member_type;
|
||||||
@ -680,6 +687,8 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
}
|
}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
__device__ inline void operator()(void) const {
|
__device__ inline void operator()(void) const {
|
||||||
// Iterate this block through the league
|
// Iterate this block through the league
|
||||||
int64_t threadid = 0;
|
int64_t threadid = 0;
|
||||||
@ -749,7 +758,7 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
m_policy(arg_policy),
|
m_policy(arg_policy),
|
||||||
m_league_size(arg_policy.league_size()),
|
m_league_size(arg_policy.league_size()),
|
||||||
m_team_size(arg_policy.team_size()),
|
m_team_size(arg_policy.team_size()),
|
||||||
m_vector_size(arg_policy.vector_length()) {
|
m_vector_size(arg_policy.impl_vector_length()) {
|
||||||
cudaFuncAttributes attr =
|
cudaFuncAttributes attr =
|
||||||
CudaParallelLaunch<ParallelFor,
|
CudaParallelLaunch<ParallelFor,
|
||||||
LaunchBounds>::get_cuda_func_attributes();
|
LaunchBounds>::get_cuda_func_attributes();
|
||||||
@ -796,10 +805,10 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
if (int(m_team_size) >
|
if (int(m_team_size) >
|
||||||
int(Kokkos::Impl::cuda_get_max_block_size<FunctorType, LaunchBounds>(
|
int(Kokkos::Impl::cuda_get_max_block_size<FunctorType, LaunchBounds>(
|
||||||
m_policy.space().impl_internal_space_instance(), attr,
|
m_policy.space().impl_internal_space_instance(), attr,
|
||||||
arg_functor, arg_policy.vector_length(),
|
arg_functor, arg_policy.impl_vector_length(),
|
||||||
arg_policy.team_scratch_size(0),
|
arg_policy.team_scratch_size(0),
|
||||||
arg_policy.thread_scratch_size(0)) /
|
arg_policy.thread_scratch_size(0)) /
|
||||||
arg_policy.vector_length())) {
|
arg_policy.impl_vector_length())) {
|
||||||
Kokkos::Impl::throw_runtime_exception(std::string(
|
Kokkos::Impl::throw_runtime_exception(std::string(
|
||||||
"Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
|
"Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
|
||||||
}
|
}
|
||||||
@ -847,6 +856,7 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
using functor_type = FunctorType;
|
using functor_type = FunctorType;
|
||||||
using size_type = Kokkos::Cuda::size_type;
|
using size_type = Kokkos::Cuda::size_type;
|
||||||
using index_type = typename Policy::index_type;
|
using index_type = typename Policy::index_type;
|
||||||
|
using reducer_type = ReducerType;
|
||||||
|
|
||||||
// Algorithmic constraints: blockSize is a power of two AND blockDim.y ==
|
// Algorithmic constraints: blockSize is a power of two AND blockDim.y ==
|
||||||
// blockDim.z == 1
|
// blockDim.z == 1
|
||||||
@ -873,6 +883,8 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
using DummySHMEMReductionType = int;
|
using DummySHMEMReductionType = int;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
// Make the exec_range calls call to Reduce::DeviceIterateTile
|
// Make the exec_range calls call to Reduce::DeviceIterateTile
|
||||||
template <class TagType>
|
template <class TagType>
|
||||||
__device__ inline
|
__device__ inline
|
||||||
@ -949,36 +961,44 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
||||||
global[i] = shared[i];
|
global[i] = shared[i];
|
||||||
}
|
}
|
||||||
} else if (cuda_single_inter_block_reduce_scan<false, ReducerTypeFwd,
|
// return ;
|
||||||
WorkTagFwd>(
|
}
|
||||||
ReducerConditional::select(m_functor, m_reducer), blockIdx.x,
|
|
||||||
gridDim.x, kokkos_impl_cuda_shared_memory<size_type>(),
|
|
||||||
m_scratch_space, m_scratch_flags)) {
|
|
||||||
// This is the final block with the final result at the final threads'
|
|
||||||
// location
|
|
||||||
|
|
||||||
size_type* const shared = kokkos_impl_cuda_shared_memory<size_type>() +
|
if (m_policy.begin() != m_policy.end()) {
|
||||||
(blockDim.y - 1) * word_count.value;
|
{
|
||||||
size_type* const global =
|
if (cuda_single_inter_block_reduce_scan<false, ReducerTypeFwd,
|
||||||
m_result_ptr_device_accessible
|
WorkTagFwd>(
|
||||||
? reinterpret_cast<size_type*>(m_result_ptr)
|
ReducerConditional::select(m_functor, m_reducer), blockIdx.x,
|
||||||
: (m_unified_space ? m_unified_space : m_scratch_space);
|
gridDim.x, kokkos_impl_cuda_shared_memory<size_type>(),
|
||||||
|
m_scratch_space, m_scratch_flags)) {
|
||||||
|
// This is the final block with the final result at the final threads'
|
||||||
|
// location
|
||||||
|
|
||||||
if (threadIdx.y == 0) {
|
size_type* const shared =
|
||||||
Kokkos::Impl::FunctorFinal<ReducerTypeFwd, WorkTagFwd>::final(
|
kokkos_impl_cuda_shared_memory<size_type>() +
|
||||||
ReducerConditional::select(m_functor, m_reducer), shared);
|
(blockDim.y - 1) * word_count.value;
|
||||||
}
|
size_type* const global =
|
||||||
|
m_result_ptr_device_accessible
|
||||||
|
? reinterpret_cast<size_type*>(m_result_ptr)
|
||||||
|
: (m_unified_space ? m_unified_space : m_scratch_space);
|
||||||
|
|
||||||
if (CudaTraits::WarpSize < word_count.value) {
|
if (threadIdx.y == 0) {
|
||||||
__syncthreads();
|
Kokkos::Impl::FunctorFinal<ReducerTypeFwd, WorkTagFwd>::final(
|
||||||
}
|
ReducerConditional::select(m_functor, m_reducer), shared);
|
||||||
|
}
|
||||||
|
|
||||||
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
if (CudaTraits::WarpSize < word_count.value) {
|
||||||
global[i] = shared[i];
|
__syncthreads();
|
||||||
|
}
|
||||||
|
|
||||||
|
for (unsigned i = threadIdx.y; i < word_count.value;
|
||||||
|
i += blockDim.y) {
|
||||||
|
global[i] = shared[i];
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* __device__ inline
|
/* __device__ inline
|
||||||
void run(const DummyShflReductionType&) const
|
void run(const DummyShflReductionType&) const
|
||||||
{
|
{
|
||||||
@ -1055,6 +1075,9 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
const bool need_device_set = ReduceFunctorHasInit<FunctorType>::value ||
|
const bool need_device_set = ReduceFunctorHasInit<FunctorType>::value ||
|
||||||
ReduceFunctorHasFinal<FunctorType>::value ||
|
ReduceFunctorHasFinal<FunctorType>::value ||
|
||||||
!m_result_ptr_host_accessible ||
|
!m_result_ptr_host_accessible ||
|
||||||
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
Policy::is_graph_kernel::value ||
|
||||||
|
#endif
|
||||||
!std::is_same<ReducerType, InvalidType>::value;
|
!std::is_same<ReducerType, InvalidType>::value;
|
||||||
if ((nwork > 0) || need_device_set) {
|
if ((nwork > 0) || need_device_set) {
|
||||||
const int block_size = local_block_size(m_functor);
|
const int block_size = local_block_size(m_functor);
|
||||||
@ -1077,6 +1100,7 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
dim3 grid(std::min(int(block.y), int((nwork + block.y - 1) / block.y)), 1,
|
dim3 grid(std::min(int(block.y), int((nwork + block.y - 1) / block.y)), 1,
|
||||||
1);
|
1);
|
||||||
|
|
||||||
|
// TODO @graph We need to effectively insert this in to the graph
|
||||||
const int shmem =
|
const int shmem =
|
||||||
UseShflReduction
|
UseShflReduction
|
||||||
? 0
|
? 0
|
||||||
@ -1117,6 +1141,7 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
|
|||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
if (m_result_ptr) {
|
if (m_result_ptr) {
|
||||||
|
// TODO @graph We need to effectively insert this in to the graph
|
||||||
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
||||||
m_result_ptr);
|
m_result_ptr);
|
||||||
}
|
}
|
||||||
@ -1195,6 +1220,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
|||||||
using reference_type = typename ValueTraits::reference_type;
|
using reference_type = typename ValueTraits::reference_type;
|
||||||
using functor_type = FunctorType;
|
using functor_type = FunctorType;
|
||||||
using size_type = Cuda::size_type;
|
using size_type = Cuda::size_type;
|
||||||
|
using reducer_type = ReducerType;
|
||||||
|
|
||||||
// Algorithmic constraints: blockSize is a power of two AND blockDim.y ==
|
// Algorithmic constraints: blockSize is a power of two AND blockDim.y ==
|
||||||
// blockDim.z == 1
|
// blockDim.z == 1
|
||||||
@ -1214,16 +1240,16 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
|||||||
|
|
||||||
// Shall we use the shfl based reduction or not (only use it for static sized
|
// Shall we use the shfl based reduction or not (only use it for static sized
|
||||||
// types of more than 128bit
|
// types of more than 128bit
|
||||||
enum {
|
static constexpr bool UseShflReduction = false;
|
||||||
UseShflReduction = ((sizeof(value_type) > 2 * sizeof(double)) &&
|
//((sizeof(value_type)>2*sizeof(double)) && ValueTraits::StaticValueSize)
|
||||||
(ValueTraits::StaticValueSize != 0))
|
|
||||||
};
|
|
||||||
// Some crutch to do function overloading
|
// Some crutch to do function overloading
|
||||||
private:
|
private:
|
||||||
using DummyShflReductionType = double;
|
using DummyShflReductionType = double;
|
||||||
using DummySHMEMReductionType = int;
|
using DummySHMEMReductionType = int;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
inline __device__ void exec_range(reference_type update) const {
|
inline __device__ void exec_range(reference_type update) const {
|
||||||
Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType,
|
Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType,
|
||||||
typename Policy::work_tag,
|
typename Policy::work_tag,
|
||||||
@ -1390,6 +1416,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
|||||||
// Required grid.x <= block.y
|
// Required grid.x <= block.y
|
||||||
const dim3 grid(std::min(int(block.y), int(nwork)), 1, 1);
|
const dim3 grid(std::min(int(block.y), int(nwork)), 1, 1);
|
||||||
|
|
||||||
|
// TODO @graph We need to effectively insert this in to the graph
|
||||||
const int shmem =
|
const int shmem =
|
||||||
UseShflReduction
|
UseShflReduction
|
||||||
? 0
|
? 0
|
||||||
@ -1403,7 +1430,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
|||||||
false); // copy to device and execute
|
false); // copy to device and execute
|
||||||
|
|
||||||
if (!m_result_ptr_device_accessible) {
|
if (!m_result_ptr_device_accessible) {
|
||||||
Cuda().fence();
|
m_policy.space().fence();
|
||||||
|
|
||||||
if (m_result_ptr) {
|
if (m_result_ptr) {
|
||||||
if (m_unified_space) {
|
if (m_unified_space) {
|
||||||
@ -1421,6 +1448,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
|
|||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
if (m_result_ptr) {
|
if (m_result_ptr) {
|
||||||
|
// TODO @graph We need to effectively insert this in to the graph
|
||||||
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
||||||
m_result_ptr);
|
m_result_ptr);
|
||||||
}
|
}
|
||||||
@ -1464,7 +1492,7 @@ template <class FunctorType, class ReducerType, class... Properties>
|
|||||||
class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
||||||
ReducerType, Kokkos::Cuda> {
|
ReducerType, Kokkos::Cuda> {
|
||||||
public:
|
public:
|
||||||
using Policy = TeamPolicyInternal<Kokkos::Cuda, Properties...>;
|
using Policy = TeamPolicy<Properties...>;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
using Member = typename Policy::member_type;
|
using Member = typename Policy::member_type;
|
||||||
@ -1491,8 +1519,11 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
public:
|
public:
|
||||||
using functor_type = FunctorType;
|
using functor_type = FunctorType;
|
||||||
using size_type = Cuda::size_type;
|
using size_type = Cuda::size_type;
|
||||||
|
using reducer_type = ReducerType;
|
||||||
|
|
||||||
enum { UseShflReduction = (true && (ValueTraits::StaticValueSize != 0)) };
|
enum : bool {
|
||||||
|
UseShflReduction = (true && (ValueTraits::StaticValueSize != 0))
|
||||||
|
};
|
||||||
|
|
||||||
private:
|
private:
|
||||||
using DummyShflReductionType = double;
|
using DummyShflReductionType = double;
|
||||||
@ -1539,6 +1570,8 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
}
|
}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
__device__ inline void operator()() const {
|
__device__ inline void operator()() const {
|
||||||
int64_t threadid = 0;
|
int64_t threadid = 0;
|
||||||
if (m_scratch_size[1] > 0) {
|
if (m_scratch_size[1] > 0) {
|
||||||
@ -1631,31 +1664,35 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
||||||
global[i] = shared[i];
|
global[i] = shared[i];
|
||||||
}
|
}
|
||||||
} else if (cuda_single_inter_block_reduce_scan<false, FunctorType, WorkTag>(
|
}
|
||||||
ReducerConditional::select(m_functor, m_reducer), blockIdx.x,
|
|
||||||
gridDim.x, kokkos_impl_cuda_shared_memory<size_type>(),
|
|
||||||
m_scratch_space, m_scratch_flags)) {
|
|
||||||
// This is the final block with the final result at the final threads'
|
|
||||||
// location
|
|
||||||
|
|
||||||
size_type* const shared = kokkos_impl_cuda_shared_memory<size_type>() +
|
if (m_league_size != 0) {
|
||||||
(blockDim.y - 1) * word_count.value;
|
if (cuda_single_inter_block_reduce_scan<false, FunctorType, WorkTag>(
|
||||||
size_type* const global =
|
ReducerConditional::select(m_functor, m_reducer), blockIdx.x,
|
||||||
m_result_ptr_device_accessible
|
gridDim.x, kokkos_impl_cuda_shared_memory<size_type>(),
|
||||||
? reinterpret_cast<size_type*>(m_result_ptr)
|
m_scratch_space, m_scratch_flags)) {
|
||||||
: (m_unified_space ? m_unified_space : m_scratch_space);
|
// This is the final block with the final result at the final threads'
|
||||||
|
// location
|
||||||
|
|
||||||
if (threadIdx.y == 0) {
|
size_type* const shared = kokkos_impl_cuda_shared_memory<size_type>() +
|
||||||
Kokkos::Impl::FunctorFinal<ReducerTypeFwd, WorkTagFwd>::final(
|
(blockDim.y - 1) * word_count.value;
|
||||||
ReducerConditional::select(m_functor, m_reducer), shared);
|
size_type* const global =
|
||||||
}
|
m_result_ptr_device_accessible
|
||||||
|
? reinterpret_cast<size_type*>(m_result_ptr)
|
||||||
|
: (m_unified_space ? m_unified_space : m_scratch_space);
|
||||||
|
|
||||||
if (CudaTraits::WarpSize < word_count.value) {
|
if (threadIdx.y == 0) {
|
||||||
__syncthreads();
|
Kokkos::Impl::FunctorFinal<ReducerTypeFwd, WorkTagFwd>::final(
|
||||||
}
|
ReducerConditional::select(m_functor, m_reducer), shared);
|
||||||
|
}
|
||||||
|
|
||||||
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
if (CudaTraits::WarpSize < word_count.value) {
|
||||||
global[i] = shared[i];
|
__syncthreads();
|
||||||
|
}
|
||||||
|
|
||||||
|
for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
|
||||||
|
global[i] = shared[i];
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -1717,6 +1754,9 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
const bool need_device_set = ReduceFunctorHasInit<FunctorType>::value ||
|
const bool need_device_set = ReduceFunctorHasInit<FunctorType>::value ||
|
||||||
ReduceFunctorHasFinal<FunctorType>::value ||
|
ReduceFunctorHasFinal<FunctorType>::value ||
|
||||||
!m_result_ptr_host_accessible ||
|
!m_result_ptr_host_accessible ||
|
||||||
|
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
|
||||||
|
Policy::is_graph_kernel::value ||
|
||||||
|
#endif
|
||||||
!std::is_same<ReducerType, InvalidType>::value;
|
!std::is_same<ReducerType, InvalidType>::value;
|
||||||
if ((nwork > 0) || need_device_set) {
|
if ((nwork > 0) || need_device_set) {
|
||||||
const int block_count =
|
const int block_count =
|
||||||
@ -1770,6 +1810,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
if (m_result_ptr) {
|
if (m_result_ptr) {
|
||||||
|
// TODO @graph We need to effectively insert this in to the graph
|
||||||
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
ValueInit::init(ReducerConditional::select(m_functor, m_reducer),
|
||||||
m_result_ptr);
|
m_result_ptr);
|
||||||
}
|
}
|
||||||
@ -1800,7 +1841,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
m_scratch_ptr{nullptr, nullptr},
|
m_scratch_ptr{nullptr, nullptr},
|
||||||
m_league_size(arg_policy.league_size()),
|
m_league_size(arg_policy.league_size()),
|
||||||
m_team_size(arg_policy.team_size()),
|
m_team_size(arg_policy.team_size()),
|
||||||
m_vector_size(arg_policy.vector_length()) {
|
m_vector_size(arg_policy.impl_vector_length()) {
|
||||||
cudaFuncAttributes attr =
|
cudaFuncAttributes attr =
|
||||||
CudaParallelLaunch<ParallelReduce,
|
CudaParallelLaunch<ParallelReduce,
|
||||||
LaunchBounds>::get_cuda_func_attributes();
|
LaunchBounds>::get_cuda_func_attributes();
|
||||||
@ -1838,7 +1879,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
|
|
||||||
// The global parallel_reduce does not support vector_length other than 1 at
|
// The global parallel_reduce does not support vector_length other than 1 at
|
||||||
// the moment
|
// the moment
|
||||||
if ((arg_policy.vector_length() > 1) && !UseShflReduction)
|
if ((arg_policy.impl_vector_length() > 1) && !UseShflReduction)
|
||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
"Kokkos::parallel_reduce with a TeamPolicy using a vector length of "
|
"Kokkos::parallel_reduce with a TeamPolicy using a vector length of "
|
||||||
"greater than 1 is not currently supported for CUDA for dynamic "
|
"greater than 1 is not currently supported for CUDA for dynamic "
|
||||||
@ -1899,7 +1940,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
m_scratch_ptr{nullptr, nullptr},
|
m_scratch_ptr{nullptr, nullptr},
|
||||||
m_league_size(arg_policy.league_size()),
|
m_league_size(arg_policy.league_size()),
|
||||||
m_team_size(arg_policy.team_size()),
|
m_team_size(arg_policy.team_size()),
|
||||||
m_vector_size(arg_policy.vector_length()) {
|
m_vector_size(arg_policy.impl_vector_length()) {
|
||||||
cudaFuncAttributes attr =
|
cudaFuncAttributes attr =
|
||||||
CudaParallelLaunch<ParallelReduce,
|
CudaParallelLaunch<ParallelReduce,
|
||||||
LaunchBounds>::get_cuda_func_attributes();
|
LaunchBounds>::get_cuda_func_attributes();
|
||||||
@ -1936,7 +1977,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
|
|||||||
|
|
||||||
// The global parallel_reduce does not support vector_length other than 1 at
|
// The global parallel_reduce does not support vector_length other than 1 at
|
||||||
// the moment
|
// the moment
|
||||||
if ((arg_policy.vector_length() > 1) && !UseShflReduction)
|
if ((arg_policy.impl_vector_length() > 1) && !UseShflReduction)
|
||||||
Impl::throw_runtime_exception(
|
Impl::throw_runtime_exception(
|
||||||
"Kokkos::parallel_reduce with a TeamPolicy using a vector length of "
|
"Kokkos::parallel_reduce with a TeamPolicy using a vector length of "
|
||||||
"greater than 1 is not currently supported for CUDA for dynamic "
|
"greater than 1 is not currently supported for CUDA for dynamic "
|
||||||
@ -2150,6 +2191,8 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
|
|
||||||
__device__ inline void operator()(void) const {
|
__device__ inline void operator()(void) const {
|
||||||
@ -2440,6 +2483,8 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
|
|||||||
}
|
}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
|
|
||||||
__device__ inline void operator()(void) const {
|
__device__ inline void operator()(void) const {
|
||||||
@ -2799,5 +2844,5 @@ struct ParallelReduceFunctorType<FunctorTypeIn, ExecPolicy, ValueType, Cuda> {
|
|||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
#endif /* defined( __CUDACC__ ) */
|
#endif /* defined(KOKKOS_ENABLE_CUDA) */
|
||||||
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
|
||||||
|
|||||||
@ -46,7 +46,7 @@
|
|||||||
#define KOKKOS_CUDA_REDUCESCAN_HPP
|
#define KOKKOS_CUDA_REDUCESCAN_HPP
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(__CUDACC__) && defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
|
||||||
#include <utility>
|
#include <utility>
|
||||||
|
|
||||||
@ -983,5 +983,5 @@ inline unsigned cuda_single_inter_block_reduce_scan_shmem(
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( __CUDACC__ ) */
|
#endif /* #if defined(KOKKOS_ENABLE_CUDA) */
|
||||||
#endif /* KOKKOS_CUDA_REDUCESCAN_HPP */
|
#endif /* KOKKOS_CUDA_REDUCESCAN_HPP */
|
||||||
|
|||||||
@ -390,7 +390,7 @@ class TaskQueueSpecializationConstrained<
|
|||||||
((int*)&task_ptr)[0] = KOKKOS_IMPL_CUDA_SHFL(((int*)&task_ptr)[0], 0, 32);
|
((int*)&task_ptr)[0] = KOKKOS_IMPL_CUDA_SHFL(((int*)&task_ptr)[0], 0, 32);
|
||||||
((int*)&task_ptr)[1] = KOKKOS_IMPL_CUDA_SHFL(((int*)&task_ptr)[1], 0, 32);
|
((int*)&task_ptr)[1] = KOKKOS_IMPL_CUDA_SHFL(((int*)&task_ptr)[1], 0, 32);
|
||||||
|
|
||||||
#if defined(KOKKOS_DEBUG)
|
#if defined(KOKKOS_ENABLE_DEBUG)
|
||||||
KOKKOS_IMPL_CUDA_SYNCWARP_OR_RETURN("TaskQueue CUDA task_ptr");
|
KOKKOS_IMPL_CUDA_SYNCWARP_OR_RETURN("TaskQueue CUDA task_ptr");
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
@ -799,7 +799,6 @@ namespace Kokkos {
|
|||||||
* i=0..N-1.
|
* i=0..N-1.
|
||||||
*
|
*
|
||||||
* The range i=0..N-1 is mapped to all threads of the calling thread team.
|
* The range i=0..N-1 is mapped to all threads of the calling thread team.
|
||||||
* This functionality requires C++11 support.
|
|
||||||
*/
|
*/
|
||||||
template <typename iType, class Lambda, class Scheduler>
|
template <typename iType, class Lambda, class Scheduler>
|
||||||
KOKKOS_INLINE_FUNCTION void parallel_for(
|
KOKKOS_INLINE_FUNCTION void parallel_for(
|
||||||
|
|||||||
@ -50,7 +50,7 @@
|
|||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
|
|
||||||
/* only compile this file if CUDA is enabled for Kokkos */
|
/* only compile this file if CUDA is enabled for Kokkos */
|
||||||
#if defined(__CUDACC__) && defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
|
||||||
#include <utility>
|
#include <utility>
|
||||||
#include <Kokkos_Parallel.hpp>
|
#include <Kokkos_Parallel.hpp>
|
||||||
@ -290,7 +290,7 @@ class CudaTeamMember {
|
|||||||
*/
|
*/
|
||||||
template <typename Type>
|
template <typename Type>
|
||||||
KOKKOS_INLINE_FUNCTION Type team_scan(const Type& value) const {
|
KOKKOS_INLINE_FUNCTION Type team_scan(const Type& value) const {
|
||||||
return this->template team_scan<Type>(value, 0);
|
return this->template team_scan<Type>(value, nullptr);
|
||||||
}
|
}
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
@ -935,6 +935,54 @@ KOKKOS_INLINE_FUNCTION
|
|||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
/** \brief Inter-thread parallel exclusive prefix sum.
|
||||||
|
*
|
||||||
|
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
|
||||||
|
*
|
||||||
|
* The range [0..N) is mapped to each rank in the team (whose global rank is
|
||||||
|
* less than N) and a scan operation is performed. The last call to closure has
|
||||||
|
* final == true.
|
||||||
|
*/
|
||||||
|
// This is the same code as in HIP and largely the same as in OpenMPTarget
|
||||||
|
template <typename iType, typename FunctorType>
|
||||||
|
KOKKOS_INLINE_FUNCTION void parallel_scan(
|
||||||
|
const Impl::TeamThreadRangeBoundariesStruct<iType, Impl::CudaTeamMember>&
|
||||||
|
loop_bounds,
|
||||||
|
const FunctorType& lambda) {
|
||||||
|
// Extract value_type from lambda
|
||||||
|
using value_type = typename Kokkos::Impl::FunctorAnalysis<
|
||||||
|
Kokkos::Impl::FunctorPatternInterface::SCAN, void,
|
||||||
|
FunctorType>::value_type;
|
||||||
|
|
||||||
|
const auto start = loop_bounds.start;
|
||||||
|
const auto end = loop_bounds.end;
|
||||||
|
auto& member = loop_bounds.member;
|
||||||
|
const auto team_size = member.team_size();
|
||||||
|
const auto team_rank = member.team_rank();
|
||||||
|
const auto nchunk = (end - start + team_size - 1) / team_size;
|
||||||
|
value_type accum = 0;
|
||||||
|
// each team has to process one or more chunks of the prefix scan
|
||||||
|
for (iType i = 0; i < nchunk; ++i) {
|
||||||
|
auto ii = start + i * team_size + team_rank;
|
||||||
|
// local accumulation for this chunk
|
||||||
|
value_type local_accum = 0;
|
||||||
|
// user updates value with prefix value
|
||||||
|
if (ii < loop_bounds.end) lambda(ii, local_accum, false);
|
||||||
|
// perform team scan
|
||||||
|
local_accum = member.team_scan(local_accum);
|
||||||
|
// add this block's accum to total accumulation
|
||||||
|
auto val = accum + local_accum;
|
||||||
|
// user updates their data with total accumulation
|
||||||
|
if (ii < loop_bounds.end) lambda(ii, val, true);
|
||||||
|
// the last value needs to be propagated to next chunk
|
||||||
|
if (team_rank == team_size - 1) accum = val;
|
||||||
|
// broadcast last value to rest of the team
|
||||||
|
member.team_broadcast(accum, team_size - 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
/** \brief Intra-thread vector parallel exclusive prefix sum.
|
/** \brief Intra-thread vector parallel exclusive prefix sum.
|
||||||
*
|
*
|
||||||
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
|
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
|
||||||
@ -1089,6 +1137,6 @@ KOKKOS_INLINE_FUNCTION void single(
|
|||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
#endif /* defined( __CUDACC__ ) */
|
#endif /* defined(KOKKOS_ENABLE_CUDA) */
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_TEAM_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_TEAM_HPP */
|
||||||
|
|||||||
@ -77,6 +77,8 @@ class ParallelFor<FunctorType, Kokkos::WorkGraphPolicy<Traits...>,
|
|||||||
}
|
}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
Policy const& get_policy() const { return m_policy; }
|
||||||
|
|
||||||
__device__ inline void operator()() const noexcept {
|
__device__ inline void operator()() const noexcept {
|
||||||
if (0 == (threadIdx.y % 16)) {
|
if (0 == (threadIdx.y % 16)) {
|
||||||
// Spin until COMPLETED_TOKEN.
|
// Spin until COMPLETED_TOKEN.
|
||||||
|
|||||||
@ -48,7 +48,7 @@
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
#if defined(__CUDACC__) && defined(KOKKOS_ENABLE_CUDA)
|
#if defined(KOKKOS_ENABLE_CUDA)
|
||||||
|
|
||||||
#include <cuda.h>
|
#include <cuda.h>
|
||||||
|
|
||||||
@ -97,5 +97,5 @@ __device__ inline void cuda_abort(const char *const message) {
|
|||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
#else
|
#else
|
||||||
void KOKKOS_CORE_SRC_CUDA_ABORT_PREVENT_LINK_ERROR() {}
|
void KOKKOS_CORE_SRC_CUDA_ABORT_PREVENT_LINK_ERROR() {}
|
||||||
#endif /* #if defined(__CUDACC__) && defined( KOKKOS_ENABLE_CUDA ) */
|
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
|
||||||
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
|
||||||
|
|||||||
@ -45,6 +45,10 @@
|
|||||||
#ifndef KOKKOS_HIP_ATOMIC_HPP
|
#ifndef KOKKOS_HIP_ATOMIC_HPP
|
||||||
#define KOKKOS_HIP_ATOMIC_HPP
|
#define KOKKOS_HIP_ATOMIC_HPP
|
||||||
|
|
||||||
|
#include <impl/Kokkos_Atomic_Memory_Order.hpp>
|
||||||
|
#include <impl/Kokkos_Memory_Fence.hpp>
|
||||||
|
#include <HIP/Kokkos_HIP_Locks.hpp>
|
||||||
|
|
||||||
#if defined(KOKKOS_ENABLE_HIP_ATOMICS)
|
#if defined(KOKKOS_ENABLE_HIP_ATOMICS)
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
// HIP can do:
|
// HIP can do:
|
||||||
@ -103,19 +107,16 @@ atomic_exchange(volatile T *const dest,
|
|||||||
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
||||||
sizeof(T) != sizeof(long long),
|
sizeof(T) != sizeof(long long),
|
||||||
const T>::type &val) {
|
const T>::type &val) {
|
||||||
// FIXME_HIP
|
|
||||||
Kokkos::abort("atomic_exchange not implemented for large types.\n");
|
|
||||||
T return_val;
|
T return_val;
|
||||||
int done = 0;
|
int done = 0;
|
||||||
unsigned int active = __ballot(1);
|
unsigned int active = __ballot(1);
|
||||||
unsigned int done_active = 0;
|
unsigned int done_active = 0;
|
||||||
while (active != done_active) {
|
while (active != done_active) {
|
||||||
if (!done) {
|
if (!done) {
|
||||||
// if (Impl::lock_address_hip_space((void*)dest))
|
if (Impl::lock_address_hip_space((void *)dest)) {
|
||||||
{
|
|
||||||
return_val = *dest;
|
return_val = *dest;
|
||||||
*dest = val;
|
*dest = val;
|
||||||
// Impl::unlock_address_hip_space((void*)dest);
|
Impl::unlock_address_hip_space((void *)dest);
|
||||||
done = 1;
|
done = 1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -215,19 +216,16 @@ __inline__ __device__ T atomic_compare_exchange(
|
|||||||
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
||||||
sizeof(T) != sizeof(long long),
|
sizeof(T) != sizeof(long long),
|
||||||
const T>::type &val) {
|
const T>::type &val) {
|
||||||
// FIXME_HIP
|
|
||||||
Kokkos::abort("atomic_compare_exchange not implemented for large types.\n");
|
|
||||||
T return_val;
|
T return_val;
|
||||||
int done = 0;
|
int done = 0;
|
||||||
unsigned int active = __ballot(1);
|
unsigned int active = __ballot(1);
|
||||||
unsigned int done_active = 0;
|
unsigned int done_active = 0;
|
||||||
while (active != done_active) {
|
while (active != done_active) {
|
||||||
if (!done) {
|
if (!done) {
|
||||||
// if (Impl::lock_address_hip_space((void*)dest))
|
if (Impl::lock_address_hip_space((void *)dest)) {
|
||||||
{
|
|
||||||
return_val = *dest;
|
return_val = *dest;
|
||||||
if (return_val == compare) *dest = val;
|
if (return_val == compare) *dest = val;
|
||||||
// Impl::unlock_address_hip_space((void*)dest);
|
Impl::unlock_address_hip_space((void *)dest);
|
||||||
done = 1;
|
done = 1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -350,19 +348,16 @@ atomic_fetch_add(volatile T *dest,
|
|||||||
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
||||||
sizeof(T) != sizeof(long long),
|
sizeof(T) != sizeof(long long),
|
||||||
const T &>::type val) {
|
const T &>::type val) {
|
||||||
// FIXME_HIP
|
|
||||||
Kokkos::abort("atomic_fetch_add not implemented for large types.\n");
|
|
||||||
T return_val;
|
T return_val;
|
||||||
int done = 0;
|
int done = 0;
|
||||||
unsigned int active = __ballot(1);
|
unsigned int active = __ballot(1);
|
||||||
unsigned int done_active = 0;
|
unsigned int done_active = 0;
|
||||||
while (active != done_active) {
|
while (active != done_active) {
|
||||||
if (!done) {
|
if (!done) {
|
||||||
// if(Kokkos::Impl::lock_address_hip_space((void *)dest))
|
if (Kokkos::Impl::lock_address_hip_space((void *)dest)) {
|
||||||
{
|
|
||||||
return_val = *dest;
|
return_val = *dest;
|
||||||
*dest = return_val + val;
|
*dest = return_val + val;
|
||||||
// Kokkos::Impl::unlock_address_hip_space((void *)dest);
|
Kokkos::Impl::unlock_address_hip_space((void *)dest);
|
||||||
done = 1;
|
done = 1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -513,19 +508,16 @@ atomic_fetch_sub(volatile T *const dest,
|
|||||||
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
typename std::enable_if<sizeof(T) != sizeof(int) &&
|
||||||
sizeof(T) != sizeof(long long),
|
sizeof(T) != sizeof(long long),
|
||||||
const T>::type &val) {
|
const T>::type &val) {
|
||||||
// FIXME_HIP
|
|
||||||
Kokkos::abort("atomic_fetch_sub not implemented for large types.\n");
|
|
||||||
T return_val;
|
T return_val;
|
||||||
int done = 0;
|
int done = 0;
|
||||||
unsigned int active = __ballot(1);
|
unsigned int active = __ballot(1);
|
||||||
unsigned int done_active = 0;
|
unsigned int done_active = 0;
|
||||||
while (active != done_active) {
|
while (active != done_active) {
|
||||||
if (!done) {
|
if (!done) {
|
||||||
/*if (Impl::lock_address_hip_space((void*)dest)) */
|
if (Impl::lock_address_hip_space((void *)dest)) {
|
||||||
{
|
|
||||||
return_val = *dest;
|
return_val = *dest;
|
||||||
*dest = return_val - val;
|
*dest = return_val - val;
|
||||||
// Impl::unlock_address_hip_space((void*)dest);
|
Impl::unlock_address_hip_space((void *)dest);
|
||||||
done = 1;
|
done = 1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -569,6 +561,62 @@ __inline__ __device__ unsigned long long int atomic_fetch_and(
|
|||||||
unsigned long long int const val) {
|
unsigned long long int const val) {
|
||||||
return atomicAnd(const_cast<unsigned long long int *>(dest), val);
|
return atomicAnd(const_cast<unsigned long long int *>(dest), val);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ void _atomic_store(T *ptr, T val,
|
||||||
|
memory_order_relaxed_t) {
|
||||||
|
(void)atomic_exchange(ptr, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ void _atomic_store(T *ptr, T val,
|
||||||
|
memory_order_seq_cst_t) {
|
||||||
|
memory_fence();
|
||||||
|
atomic_store(ptr, val, memory_order_relaxed);
|
||||||
|
memory_fence();
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ void _atomic_store(T *ptr, T val,
|
||||||
|
memory_order_release_t) {
|
||||||
|
memory_fence();
|
||||||
|
atomic_store(ptr, val, memory_order_relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ void _atomic_store(T *ptr, T val) {
|
||||||
|
atomic_store(ptr, val, memory_order_relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ T _atomic_load(T *ptr, memory_order_relaxed_t) {
|
||||||
|
T dummy{};
|
||||||
|
return atomic_compare_exchange(ptr, dummy, dummy);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ T _atomic_load(T *ptr, memory_order_seq_cst_t) {
|
||||||
|
memory_fence();
|
||||||
|
T rv = atomic_load(ptr, memory_order_relaxed);
|
||||||
|
memory_fence();
|
||||||
|
return rv;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ T _atomic_load(T *ptr, memory_order_acquire_t) {
|
||||||
|
T rv = atomic_load(ptr, memory_order_relaxed);
|
||||||
|
memory_fence();
|
||||||
|
return rv;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
__inline__ __device__ T _atomic_load(T *ptr) {
|
||||||
|
return atomic_load(ptr, memory_order_relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace Impl
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
|||||||
@ -55,6 +55,26 @@
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
namespace Experimental {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template <typename DriverType, bool, int MaxThreadsPerBlock, int MinBlocksPerSM>
|
||||||
|
void hipOccupancy(int *numBlocks, int blockSize, int sharedmem) {
|
||||||
|
// FIXME_HIP - currently the "constant" path is unimplemented.
|
||||||
|
// we should look at whether it's functional, and
|
||||||
|
// perform some simple scaling studies to see when /
|
||||||
|
// if the constant launcher outperforms the current
|
||||||
|
// pass by pointer shared launcher
|
||||||
|
HIP_SAFE_CALL(hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
||||||
|
numBlocks,
|
||||||
|
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
||||||
|
MinBlocksPerSM>,
|
||||||
|
blockSize, sharedmem));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename DriverType, bool constant>
|
||||||
|
void hipOccupancy(int *numBlocks, int blockSize, int sharedmem) {
|
||||||
|
hipOccupancy<DriverType, constant, HIPTraits::MaxThreadsPerBlock, 1>(
|
||||||
|
numBlocks, blockSize, sharedmem);
|
||||||
|
}
|
||||||
template <typename DriverType, typename LaunchBounds, bool Large>
|
template <typename DriverType, typename LaunchBounds, bool Large>
|
||||||
struct HIPGetMaxBlockSize;
|
struct HIPGetMaxBlockSize;
|
||||||
|
|
||||||
@ -78,31 +98,26 @@ int hip_internal_get_block_size(const F &condition_check,
|
|||||||
const int min_blocks_per_sm =
|
const int min_blocks_per_sm =
|
||||||
LaunchBounds::minBperSM == 0 ? 1 : LaunchBounds::minBperSM;
|
LaunchBounds::minBperSM == 0 ? 1 : LaunchBounds::minBperSM;
|
||||||
const int max_threads_per_block = LaunchBounds::maxTperB == 0
|
const int max_threads_per_block = LaunchBounds::maxTperB == 0
|
||||||
? hip_instance->m_maxThreadsPerBlock
|
? HIPTraits::MaxThreadsPerBlock
|
||||||
: LaunchBounds::maxTperB;
|
: LaunchBounds::maxTperB;
|
||||||
|
|
||||||
const int regs_per_wavefront = attr.numRegs;
|
const int regs_per_wavefront = std::max(attr.numRegs, 1);
|
||||||
const int regs_per_sm = hip_instance->m_regsPerSM;
|
const int regs_per_sm = hip_instance->m_regsPerSM;
|
||||||
const int shmem_per_sm = hip_instance->m_shmemPerSM;
|
const int shmem_per_sm = hip_instance->m_shmemPerSM;
|
||||||
const int max_shmem_per_block = hip_instance->m_maxShmemPerBlock;
|
const int max_shmem_per_block = hip_instance->m_maxShmemPerBlock;
|
||||||
const int max_blocks_per_sm = hip_instance->m_maxBlocksPerSM;
|
const int max_blocks_per_sm = hip_instance->m_maxBlocksPerSM;
|
||||||
const int max_threads_per_sm = hip_instance->m_maxThreadsPerSM;
|
const int max_threads_per_sm = hip_instance->m_maxThreadsPerSM;
|
||||||
|
|
||||||
// FIXME_HIP this is broken in 3.5, but should be in 3.6
|
|
||||||
#if (HIP_VERSION_MAJOR > 3 || HIP_VERSION_MINOR > 5 || \
|
|
||||||
HIP_VERSION_PATCH >= 20226)
|
|
||||||
int block_size = std::min(attr.maxThreadsPerBlock, max_threads_per_block);
|
|
||||||
#else
|
|
||||||
int block_size = max_threads_per_block;
|
int block_size = max_threads_per_block;
|
||||||
#endif
|
|
||||||
KOKKOS_ASSERT(block_size > 0);
|
KOKKOS_ASSERT(block_size > 0);
|
||||||
|
const int blocks_per_warp =
|
||||||
|
(block_size + HIPTraits::WarpSize - 1) / HIPTraits::WarpSize;
|
||||||
|
|
||||||
int functor_shmem = ::Kokkos::Impl::FunctorTeamShmemSize<FunctorType>::value(
|
int functor_shmem = ::Kokkos::Impl::FunctorTeamShmemSize<FunctorType>::value(
|
||||||
f, block_size / vector_length);
|
f, block_size / vector_length);
|
||||||
int total_shmem = shmem_block + shmem_thread * (block_size / vector_length) +
|
int total_shmem = shmem_block + shmem_thread * (block_size / vector_length) +
|
||||||
functor_shmem + attr.sharedSizeBytes;
|
functor_shmem + attr.sharedSizeBytes;
|
||||||
int max_blocks_regs =
|
int max_blocks_regs = regs_per_sm / (regs_per_wavefront * blocks_per_warp);
|
||||||
regs_per_sm / (regs_per_wavefront * (block_size / HIPTraits::WarpSize));
|
|
||||||
int max_blocks_shmem =
|
int max_blocks_shmem =
|
||||||
(total_shmem < max_shmem_per_block)
|
(total_shmem < max_shmem_per_block)
|
||||||
? (total_shmem > 0 ? shmem_per_sm / total_shmem : max_blocks_regs)
|
? (total_shmem > 0 ? shmem_per_sm / total_shmem : max_blocks_regs)
|
||||||
@ -113,7 +128,8 @@ int hip_internal_get_block_size(const F &condition_check,
|
|||||||
blocks_per_sm = max_threads_per_sm / block_size;
|
blocks_per_sm = max_threads_per_sm / block_size;
|
||||||
threads_per_sm = blocks_per_sm * block_size;
|
threads_per_sm = blocks_per_sm * block_size;
|
||||||
}
|
}
|
||||||
int opt_block_size = (blocks_per_sm >= min_blocks_per_sm) ? block_size : 0;
|
int opt_block_size =
|
||||||
|
(blocks_per_sm >= min_blocks_per_sm) ? block_size : min_blocks_per_sm;
|
||||||
int opt_threads_per_sm = threads_per_sm;
|
int opt_threads_per_sm = threads_per_sm;
|
||||||
// printf("BlockSizeMax: %i Shmem: %i %i %i %i Regs: %i %i Blocks: %i %i
|
// printf("BlockSizeMax: %i Shmem: %i %i %i %i Regs: %i %i Blocks: %i %i
|
||||||
// Achieved: %i %i Opt: %i %i\n",block_size,
|
// Achieved: %i %i Opt: %i %i\n",block_size,
|
||||||
@ -126,8 +142,7 @@ int hip_internal_get_block_size(const F &condition_check,
|
|||||||
f, block_size / vector_length);
|
f, block_size / vector_length);
|
||||||
total_shmem = shmem_block + shmem_thread * (block_size / vector_length) +
|
total_shmem = shmem_block + shmem_thread * (block_size / vector_length) +
|
||||||
functor_shmem + attr.sharedSizeBytes;
|
functor_shmem + attr.sharedSizeBytes;
|
||||||
max_blocks_regs =
|
max_blocks_regs = regs_per_sm / (regs_per_wavefront * blocks_per_warp);
|
||||||
regs_per_sm / (regs_per_wavefront * (block_size / HIPTraits::WarpSize));
|
|
||||||
max_blocks_shmem =
|
max_blocks_shmem =
|
||||||
(total_shmem < max_shmem_per_block)
|
(total_shmem < max_shmem_per_block)
|
||||||
? (total_shmem > 0 ? shmem_per_sm / total_shmem : max_blocks_regs)
|
? (total_shmem > 0 ? shmem_per_sm / total_shmem : max_blocks_regs)
|
||||||
@ -163,28 +178,21 @@ int hip_get_max_block_size(const HIPInternal *hip_instance,
|
|||||||
[](int x) { return x == 0; }, hip_instance, attr, f, vector_length,
|
[](int x) { return x == 0; }, hip_instance, attr, f, vector_length,
|
||||||
shmem_block, shmem_thread);
|
shmem_block, shmem_thread);
|
||||||
}
|
}
|
||||||
template <typename DriverType>
|
template <typename DriverType, class LaunchBounds>
|
||||||
struct HIPGetMaxBlockSize<DriverType, Kokkos::LaunchBounds<>, true> {
|
struct HIPGetMaxBlockSize<DriverType, LaunchBounds, true> {
|
||||||
static int get_block_size(typename DriverType::functor_type const &f,
|
static int get_block_size(typename DriverType::functor_type const &f,
|
||||||
size_t const vector_length,
|
size_t const vector_length,
|
||||||
size_t const shmem_extra_block,
|
size_t const shmem_extra_block,
|
||||||
size_t const shmem_extra_thread) {
|
size_t const shmem_extra_thread) {
|
||||||
// FIXME_HIP -- remove this once the API change becomes mature
|
int numBlocks = 0;
|
||||||
#if !defined(__HIP__)
|
int blockSize = LaunchBounds::maxTperB == 0 ? 1024 : LaunchBounds::maxTperB;
|
||||||
using blocktype = unsigned int;
|
|
||||||
#else
|
|
||||||
using blocktype = int;
|
|
||||||
#endif
|
|
||||||
blocktype numBlocks = 0;
|
|
||||||
int blockSize = 1024;
|
|
||||||
int sharedmem =
|
int sharedmem =
|
||||||
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
|
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
|
||||||
::Kokkos::Impl::FunctorTeamShmemSize<
|
::Kokkos::Impl::FunctorTeamShmemSize<
|
||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
|
||||||
&numBlocks, hip_parallel_launch_constant_memory<DriverType>, blockSize,
|
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
|
||||||
sharedmem);
|
|
||||||
|
|
||||||
if (numBlocks > 0) return blockSize;
|
if (numBlocks > 0) return blockSize;
|
||||||
while (blockSize > HIPTraits::WarpSize && numBlocks == 0) {
|
while (blockSize > HIPTraits::WarpSize && numBlocks == 0) {
|
||||||
@ -195,9 +203,7 @@ struct HIPGetMaxBlockSize<DriverType, Kokkos::LaunchBounds<>, true> {
|
|||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
|
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
|
||||||
&numBlocks, hip_parallel_launch_constant_memory<DriverType>,
|
|
||||||
blockSize, sharedmem);
|
|
||||||
}
|
}
|
||||||
int blockSizeUpperBound = blockSize * 2;
|
int blockSizeUpperBound = blockSize * 2;
|
||||||
while (blockSize < blockSizeUpperBound && numBlocks > 0) {
|
while (blockSize < blockSizeUpperBound && numBlocks > 0) {
|
||||||
@ -208,9 +214,7 @@ struct HIPGetMaxBlockSize<DriverType, Kokkos::LaunchBounds<>, true> {
|
|||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
|
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
|
||||||
&numBlocks, hip_parallel_launch_constant_memory<DriverType>,
|
|
||||||
blockSize, sharedmem);
|
|
||||||
}
|
}
|
||||||
return blockSize - HIPTraits::WarpSize;
|
return blockSize - HIPTraits::WarpSize;
|
||||||
}
|
}
|
||||||
@ -255,7 +259,7 @@ struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, true> {
|
|||||||
int maxOccupancy = 0;
|
int maxOccupancy = 0;
|
||||||
int bestBlockSize = 0;
|
int bestBlockSize = 0;
|
||||||
|
|
||||||
while (blockSize < 1024) {
|
while (blockSize < HIPTraits::MaxThreadsPerBlock) {
|
||||||
blockSize *= 2;
|
blockSize *= 2;
|
||||||
|
|
||||||
// calculate the occupancy with that optBlockSize and check whether its
|
// calculate the occupancy with that optBlockSize and check whether its
|
||||||
@ -265,9 +269,7 @@ struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, true> {
|
|||||||
::Kokkos::Impl::FunctorTeamShmemSize<
|
::Kokkos::Impl::FunctorTeamShmemSize<
|
||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
|
||||||
&numBlocks, hip_parallel_launch_constant_memory<DriverType>,
|
|
||||||
blockSize, sharedmem);
|
|
||||||
if (maxOccupancy < numBlocks * blockSize) {
|
if (maxOccupancy < numBlocks * blockSize) {
|
||||||
maxOccupancy = numBlocks * blockSize;
|
maxOccupancy = numBlocks * blockSize;
|
||||||
bestBlockSize = blockSize;
|
bestBlockSize = blockSize;
|
||||||
@ -289,7 +291,7 @@ struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, false> {
|
|||||||
int maxOccupancy = 0;
|
int maxOccupancy = 0;
|
||||||
int bestBlockSize = 0;
|
int bestBlockSize = 0;
|
||||||
|
|
||||||
while (blockSize < 1024) {
|
while (blockSize < HIPTraits::MaxThreadsPerBlock) {
|
||||||
blockSize *= 2;
|
blockSize *= 2;
|
||||||
sharedmem =
|
sharedmem =
|
||||||
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
|
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
|
||||||
@ -297,9 +299,7 @@ struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, false> {
|
|||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
|
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, false>(&numBlocks, blockSize, sharedmem);
|
||||||
&numBlocks, hip_parallel_launch_local_memory<DriverType>, blockSize,
|
|
||||||
sharedmem);
|
|
||||||
|
|
||||||
if (maxOccupancy < numBlocks * blockSize) {
|
if (maxOccupancy < numBlocks * blockSize) {
|
||||||
maxOccupancy = numBlocks * blockSize;
|
maxOccupancy = numBlocks * blockSize;
|
||||||
@ -340,11 +340,8 @@ struct HIPGetOptBlockSize<
|
|||||||
::Kokkos::Impl::FunctorTeamShmemSize<
|
::Kokkos::Impl::FunctorTeamShmemSize<
|
||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, true, MaxThreadsPerBlock, MinBlocksPerSM>(
|
||||||
&numBlocks,
|
&numBlocks, blockSize, sharedmem);
|
||||||
hip_parallel_launch_constant_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>,
|
|
||||||
blockSize, sharedmem);
|
|
||||||
if (numBlocks >= static_cast<int>(MinBlocksPerSM) &&
|
if (numBlocks >= static_cast<int>(MinBlocksPerSM) &&
|
||||||
blockSize <= static_cast<int>(MaxThreadsPerBlock)) {
|
blockSize <= static_cast<int>(MaxThreadsPerBlock)) {
|
||||||
if (maxOccupancy < numBlocks * blockSize) {
|
if (maxOccupancy < numBlocks * blockSize) {
|
||||||
@ -384,11 +381,8 @@ struct HIPGetOptBlockSize<
|
|||||||
typename DriverType::functor_type>::value(f, blockSize /
|
typename DriverType::functor_type>::value(f, blockSize /
|
||||||
vector_length);
|
vector_length);
|
||||||
|
|
||||||
hipOccupancyMaxActiveBlocksPerMultiprocessor(
|
hipOccupancy<DriverType, false, MaxThreadsPerBlock, MinBlocksPerSM>(
|
||||||
&numBlocks,
|
&numBlocks, blockSize, sharedmem);
|
||||||
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
|
||||||
MinBlocksPerSM>,
|
|
||||||
blockSize, sharedmem);
|
|
||||||
if (numBlocks >= int(MinBlocksPerSM) &&
|
if (numBlocks >= int(MinBlocksPerSM) &&
|
||||||
blockSize <= int(MaxThreadsPerBlock)) {
|
blockSize <= int(MaxThreadsPerBlock)) {
|
||||||
if (maxOccupancy < numBlocks * blockSize) {
|
if (maxOccupancy < numBlocks * blockSize) {
|
||||||
|
|||||||
@ -56,10 +56,10 @@ namespace Kokkos {
|
|||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
void hip_internal_error_throw(hipError_t e, const char* name,
|
void hip_internal_error_throw(hipError_t e, const char* name,
|
||||||
const char* file = NULL, const int line = 0);
|
const char* file = nullptr, const int line = 0);
|
||||||
|
|
||||||
inline void hip_internal_safe_call(hipError_t e, const char* name,
|
inline void hip_internal_safe_call(hipError_t e, const char* name,
|
||||||
const char* file = NULL,
|
const char* file = nullptr,
|
||||||
const int line = 0) {
|
const int line = 0) {
|
||||||
if (hipSuccess != e) {
|
if (hipSuccess != e) {
|
||||||
hip_internal_error_throw(e, name, file, line);
|
hip_internal_error_throw(e, name, file, line);
|
||||||
|
|||||||
@ -114,7 +114,7 @@ void HIPInternal::print_configuration(std::ostream &s) const {
|
|||||||
<< (dev_info.m_hipProp[i].major) << "." << dev_info.m_hipProp[i].minor
|
<< (dev_info.m_hipProp[i].major) << "." << dev_info.m_hipProp[i].minor
|
||||||
<< ", Total Global Memory: "
|
<< ", Total Global Memory: "
|
||||||
<< ::Kokkos::Impl::human_memory_size(dev_info.m_hipProp[i].totalGlobalMem)
|
<< ::Kokkos::Impl::human_memory_size(dev_info.m_hipProp[i].totalGlobalMem)
|
||||||
<< ", Shared Memory per Wavefront: "
|
<< ", Shared Memory per Block: "
|
||||||
<< ::Kokkos::Impl::human_memory_size(
|
<< ::Kokkos::Impl::human_memory_size(
|
||||||
dev_info.m_hipProp[i].sharedMemPerBlock);
|
dev_info.m_hipProp[i].sharedMemPerBlock);
|
||||||
if (m_hipDev == i) s << " : Selected";
|
if (m_hipDev == i) s << " : Selected";
|
||||||
@ -140,10 +140,10 @@ HIPInternal::~HIPInternal() {
|
|||||||
m_maxShmemPerBlock = 0;
|
m_maxShmemPerBlock = 0;
|
||||||
m_scratchSpaceCount = 0;
|
m_scratchSpaceCount = 0;
|
||||||
m_scratchFlagsCount = 0;
|
m_scratchFlagsCount = 0;
|
||||||
m_scratchSpace = 0;
|
m_scratchSpace = nullptr;
|
||||||
m_scratchFlags = 0;
|
m_scratchFlags = nullptr;
|
||||||
m_scratchConcurrentBitset = nullptr;
|
m_scratchConcurrentBitset = nullptr;
|
||||||
m_stream = 0;
|
m_stream = nullptr;
|
||||||
}
|
}
|
||||||
|
|
||||||
int HIPInternal::verify_is_initialized(const char *const label) const {
|
int HIPInternal::verify_is_initialized(const char *const label) const {
|
||||||
@ -183,7 +183,7 @@ void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
|
|||||||
|
|
||||||
const HIPInternalDevices &dev_info = HIPInternalDevices::singleton();
|
const HIPInternalDevices &dev_info = HIPInternalDevices::singleton();
|
||||||
|
|
||||||
const bool ok_init = 0 == m_scratchSpace || 0 == m_scratchFlags;
|
const bool ok_init = nullptr == m_scratchSpace || nullptr == m_scratchFlags;
|
||||||
|
|
||||||
// Need at least a GPU device
|
// Need at least a GPU device
|
||||||
const bool ok_id =
|
const bool ok_id =
|
||||||
@ -195,9 +195,11 @@ void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
|
|||||||
m_hipDev = hip_device_id;
|
m_hipDev = hip_device_id;
|
||||||
m_deviceProp = hipProp;
|
m_deviceProp = hipProp;
|
||||||
|
|
||||||
hipSetDevice(m_hipDev);
|
HIP_SAFE_CALL(hipSetDevice(m_hipDev));
|
||||||
|
|
||||||
m_stream = stream;
|
m_stream = stream;
|
||||||
|
m_team_scratch_current_size = 0;
|
||||||
|
m_team_scratch_ptr = nullptr;
|
||||||
|
|
||||||
// number of multiprocessors
|
// number of multiprocessors
|
||||||
m_multiProcCount = hipProp.multiProcessorCount;
|
m_multiProcCount = hipProp.multiProcessorCount;
|
||||||
@ -216,14 +218,19 @@ void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
|
|||||||
m_maxBlock = hipProp.maxGridSize[0];
|
m_maxBlock = hipProp.maxGridSize[0];
|
||||||
|
|
||||||
// theoretically, we can get 40 WF's / CU, but only can sustain 32
|
// theoretically, we can get 40 WF's / CU, but only can sustain 32
|
||||||
|
// see
|
||||||
|
// https://github.com/ROCm-Developer-Tools/HIP/blob/a0b5dfd625d99af7e288629747b40dd057183173/vdi/hip_platform.cpp#L742
|
||||||
m_maxBlocksPerSM = 32;
|
m_maxBlocksPerSM = 32;
|
||||||
// FIXME_HIP - Nick to implement this upstream
|
// FIXME_HIP - Nick to implement this upstream
|
||||||
m_regsPerSM = 262144 / 32;
|
// Register count comes from Sec. 2.2. "Data Sharing" of the
|
||||||
m_shmemPerSM = hipProp.maxSharedMemoryPerMultiProcessor;
|
// Vega 7nm ISA document (see the diagram)
|
||||||
m_maxShmemPerBlock = hipProp.sharedMemPerBlock;
|
// https://developer.amd.com/wp-content/resources/Vega_7nm_Shader_ISA.pdf
|
||||||
m_maxThreadsPerSM = m_maxBlocksPerSM * HIPTraits::WarpSize;
|
// VGPRS = 4 (SIMD/CU) * 256 VGPR/SIMD * 64 registers / VGPR =
|
||||||
m_maxThreadsPerBlock = hipProp.maxThreadsPerBlock;
|
// 65536 VGPR/CU
|
||||||
|
m_regsPerSM = 65536;
|
||||||
|
m_shmemPerSM = hipProp.maxSharedMemoryPerMultiProcessor;
|
||||||
|
m_maxShmemPerBlock = hipProp.sharedMemPerBlock;
|
||||||
|
m_maxThreadsPerSM = m_maxBlocksPerSM * HIPTraits::WarpSize;
|
||||||
//----------------------------------
|
//----------------------------------
|
||||||
// Multiblock reduction uses scratch flags for counters
|
// Multiblock reduction uses scratch flags for counters
|
||||||
// and scratch space for partial reduction values.
|
// and scratch space for partial reduction values.
|
||||||
@ -277,8 +284,7 @@ void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Init the array for used for arbitrarily sized atomics
|
// Init the array for used for arbitrarily sized atomics
|
||||||
// FIXME_HIP uncomment this when global variable works
|
if (m_stream == nullptr) ::Kokkos::Impl::initialize_host_hip_lock_arrays();
|
||||||
// if (m_stream == 0) ::Kokkos::Impl::initialize_host_hip_lock_arrays();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -327,18 +333,35 @@ Kokkos::Experimental::HIP::size_type *HIPInternal::scratch_flags(
|
|||||||
|
|
||||||
m_scratchFlags = reinterpret_cast<size_type *>(r->data());
|
m_scratchFlags = reinterpret_cast<size_type *>(r->data());
|
||||||
|
|
||||||
hipMemset(m_scratchFlags, 0, m_scratchFlagsCount * sizeScratchGrain);
|
HIP_SAFE_CALL(
|
||||||
|
hipMemset(m_scratchFlags, 0, m_scratchFlagsCount * sizeScratchGrain));
|
||||||
}
|
}
|
||||||
|
|
||||||
return m_scratchFlags;
|
return m_scratchFlags;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void *HIPInternal::resize_team_scratch_space(std::int64_t bytes,
|
||||||
|
bool force_shrink) {
|
||||||
|
if (m_team_scratch_current_size == 0) {
|
||||||
|
m_team_scratch_current_size = bytes;
|
||||||
|
m_team_scratch_ptr = Kokkos::kokkos_malloc<Kokkos::Experimental::HIPSpace>(
|
||||||
|
"HIPSpace::ScratchMemory", m_team_scratch_current_size);
|
||||||
|
}
|
||||||
|
if ((bytes > m_team_scratch_current_size) ||
|
||||||
|
((bytes < m_team_scratch_current_size) && (force_shrink))) {
|
||||||
|
m_team_scratch_current_size = bytes;
|
||||||
|
m_team_scratch_ptr = Kokkos::kokkos_realloc<Kokkos::Experimental::HIPSpace>(
|
||||||
|
m_team_scratch_ptr, m_team_scratch_current_size);
|
||||||
|
}
|
||||||
|
return m_team_scratch_ptr;
|
||||||
|
}
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
void HIPInternal::finalize() {
|
void HIPInternal::finalize() {
|
||||||
HIP().fence();
|
this->fence();
|
||||||
was_finalized = true;
|
was_finalized = true;
|
||||||
if (0 != m_scratchSpace || 0 != m_scratchFlags) {
|
if (nullptr != m_scratchSpace || nullptr != m_scratchFlags) {
|
||||||
using RecordHIP =
|
using RecordHIP =
|
||||||
Kokkos::Impl::SharedAllocationRecord<Kokkos::Experimental::HIPSpace>;
|
Kokkos::Impl::SharedAllocationRecord<Kokkos::Experimental::HIPSpace>;
|
||||||
|
|
||||||
@ -346,19 +369,24 @@ void HIPInternal::finalize() {
|
|||||||
RecordHIP::decrement(RecordHIP::get_record(m_scratchSpace));
|
RecordHIP::decrement(RecordHIP::get_record(m_scratchSpace));
|
||||||
RecordHIP::decrement(RecordHIP::get_record(m_scratchConcurrentBitset));
|
RecordHIP::decrement(RecordHIP::get_record(m_scratchConcurrentBitset));
|
||||||
|
|
||||||
m_hipDev = -1;
|
if (m_team_scratch_current_size > 0)
|
||||||
m_hipArch = -1;
|
Kokkos::kokkos_free<Kokkos::Experimental::HIPSpace>(m_team_scratch_ptr);
|
||||||
m_multiProcCount = 0;
|
|
||||||
m_maxWarpCount = 0;
|
m_hipDev = -1;
|
||||||
m_maxBlock = 0;
|
m_hipArch = -1;
|
||||||
m_maxSharedWords = 0;
|
m_multiProcCount = 0;
|
||||||
m_maxShmemPerBlock = 0;
|
m_maxWarpCount = 0;
|
||||||
m_scratchSpaceCount = 0;
|
m_maxBlock = 0;
|
||||||
m_scratchFlagsCount = 0;
|
m_maxSharedWords = 0;
|
||||||
m_scratchSpace = 0;
|
m_maxShmemPerBlock = 0;
|
||||||
m_scratchFlags = 0;
|
m_scratchSpaceCount = 0;
|
||||||
m_scratchConcurrentBitset = nullptr;
|
m_scratchFlagsCount = 0;
|
||||||
m_stream = 0;
|
m_scratchSpace = nullptr;
|
||||||
|
m_scratchFlags = nullptr;
|
||||||
|
m_scratchConcurrentBitset = nullptr;
|
||||||
|
m_stream = nullptr;
|
||||||
|
m_team_scratch_current_size = 0;
|
||||||
|
m_team_scratch_ptr = nullptr;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -57,6 +57,8 @@ struct HIPTraits {
|
|||||||
static int constexpr WarpSize = 64;
|
static int constexpr WarpSize = 64;
|
||||||
static int constexpr WarpIndexMask = 0x003f; /* hexadecimal for 63 */
|
static int constexpr WarpIndexMask = 0x003f; /* hexadecimal for 63 */
|
||||||
static int constexpr WarpIndexShift = 6; /* WarpSize == 1 << WarpShift*/
|
static int constexpr WarpIndexShift = 6; /* WarpSize == 1 << WarpShift*/
|
||||||
|
static int constexpr MaxThreadsPerBlock =
|
||||||
|
1024; // FIXME_HIP -- assumed constant for now
|
||||||
|
|
||||||
static int constexpr ConstantMemoryUsage = 0x008000; /* 32k bytes */
|
static int constexpr ConstantMemoryUsage = 0x008000; /* 32k bytes */
|
||||||
static int constexpr ConstantMemoryUseThreshold = 0x000200; /* 512 bytes */
|
static int constexpr ConstantMemoryUseThreshold = 0x000200; /* 512 bytes */
|
||||||
@ -92,9 +94,11 @@ class HIPInternal {
|
|||||||
int m_shmemPerSM;
|
int m_shmemPerSM;
|
||||||
int m_maxShmemPerBlock;
|
int m_maxShmemPerBlock;
|
||||||
int m_maxThreadsPerSM;
|
int m_maxThreadsPerSM;
|
||||||
int m_maxThreadsPerBlock;
|
|
||||||
|
// Scratch Spaces for Reductions
|
||||||
size_type m_scratchSpaceCount;
|
size_type m_scratchSpaceCount;
|
||||||
size_type m_scratchFlagsCount;
|
size_type m_scratchFlagsCount;
|
||||||
|
|
||||||
size_type *m_scratchSpace;
|
size_type *m_scratchSpace;
|
||||||
size_type *m_scratchFlags;
|
size_type *m_scratchFlags;
|
||||||
uint32_t *m_scratchConcurrentBitset = nullptr;
|
uint32_t *m_scratchConcurrentBitset = nullptr;
|
||||||
@ -103,6 +107,10 @@ class HIPInternal {
|
|||||||
|
|
||||||
hipStream_t m_stream;
|
hipStream_t m_stream;
|
||||||
|
|
||||||
|
// Team Scratch Level 1 Space
|
||||||
|
mutable int64_t m_team_scratch_current_size;
|
||||||
|
mutable void *m_team_scratch_ptr;
|
||||||
|
|
||||||
bool was_finalized = false;
|
bool was_finalized = false;
|
||||||
|
|
||||||
static HIPInternal &singleton();
|
static HIPInternal &singleton();
|
||||||
@ -113,7 +121,7 @@ class HIPInternal {
|
|||||||
return m_hipDev >= 0;
|
return m_hipDev >= 0;
|
||||||
} // 0 != m_scratchSpace && 0 != m_scratchFlags ; }
|
} // 0 != m_scratchSpace && 0 != m_scratchFlags ; }
|
||||||
|
|
||||||
void initialize(int hip_device_id, hipStream_t stream = 0);
|
void initialize(int hip_device_id, hipStream_t stream = nullptr);
|
||||||
void finalize();
|
void finalize();
|
||||||
|
|
||||||
void print_configuration(std::ostream &) const;
|
void print_configuration(std::ostream &) const;
|
||||||
@ -132,15 +140,21 @@ class HIPInternal {
|
|||||||
m_shmemPerSM(0),
|
m_shmemPerSM(0),
|
||||||
m_maxShmemPerBlock(0),
|
m_maxShmemPerBlock(0),
|
||||||
m_maxThreadsPerSM(0),
|
m_maxThreadsPerSM(0),
|
||||||
m_maxThreadsPerBlock(0),
|
|
||||||
m_scratchSpaceCount(0),
|
m_scratchSpaceCount(0),
|
||||||
m_scratchFlagsCount(0),
|
m_scratchFlagsCount(0),
|
||||||
m_scratchSpace(0),
|
m_scratchSpace(nullptr),
|
||||||
m_scratchFlags(0),
|
m_scratchFlags(nullptr),
|
||||||
m_stream(0) {}
|
m_stream(nullptr),
|
||||||
|
m_team_scratch_current_size(0),
|
||||||
|
m_team_scratch_ptr(nullptr) {}
|
||||||
|
|
||||||
|
// Resizing of reduction related scratch spaces
|
||||||
size_type *scratch_space(const size_type size);
|
size_type *scratch_space(const size_type size);
|
||||||
size_type *scratch_flags(const size_type size);
|
size_type *scratch_flags(const size_type size);
|
||||||
|
|
||||||
|
// Resizing of team level 1 scratch
|
||||||
|
void *resize_team_scratch_space(std::int64_t bytes,
|
||||||
|
bool force_shrink = false);
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|||||||
@ -64,7 +64,7 @@ namespace Kokkos {
|
|||||||
namespace Experimental {
|
namespace Experimental {
|
||||||
template <typename T>
|
template <typename T>
|
||||||
inline __device__ T *kokkos_impl_hip_shared_memory() {
|
inline __device__ T *kokkos_impl_hip_shared_memory() {
|
||||||
extern __shared__ HIPSpace::size_type sh[];
|
HIP_DYNAMIC_SHARED(HIPSpace::size_type, sh);
|
||||||
return (T *)sh;
|
return (T *)sh;
|
||||||
}
|
}
|
||||||
} // namespace Experimental
|
} // namespace Experimental
|
||||||
@ -74,18 +74,17 @@ namespace Kokkos {
|
|||||||
namespace Experimental {
|
namespace Experimental {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
void *hip_resize_scratch_space(std::int64_t bytes, bool force_shrink = false);
|
|
||||||
|
|
||||||
template <typename DriverType>
|
template <typename DriverType>
|
||||||
__global__ static void hip_parallel_launch_constant_memory() {
|
__global__ static void hip_parallel_launch_constant_memory() {
|
||||||
// cannot use global constants in HCC
|
const DriverType &driver = *(reinterpret_cast<const DriverType *>(
|
||||||
#ifdef __HCC__
|
kokkos_impl_hip_constant_memory_buffer));
|
||||||
__device__ __constant__ unsigned long kokkos_impl_hip_constant_memory_buffer
|
driver();
|
||||||
[Kokkos::Experimental::Impl::HIPTraits::ConstantMemoryUsage /
|
}
|
||||||
sizeof(unsigned long)];
|
|
||||||
#endif
|
|
||||||
|
|
||||||
const DriverType *const driver = (reinterpret_cast<const DriverType *>(
|
template <typename DriverType, unsigned int maxTperB, unsigned int minBperSM>
|
||||||
|
__global__ __launch_bounds__(
|
||||||
|
maxTperB, minBperSM) static void hip_parallel_launch_constant_memory() {
|
||||||
|
const DriverType &driver = *(reinterpret_cast<const DriverType *>(
|
||||||
kokkos_impl_hip_constant_memory_buffer));
|
kokkos_impl_hip_constant_memory_buffer));
|
||||||
|
|
||||||
driver->operator()();
|
driver->operator()();
|
||||||
@ -147,6 +146,8 @@ struct HIPParallelLaunch<
|
|||||||
"HIPParallelLaunch FAILED: shared memory request is too large");
|
"HIPParallelLaunch FAILED: shared memory request is too large");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
KOKKOS_ENSURE_HIP_LOCK_ARRAYS_ON_DEVICE();
|
||||||
|
|
||||||
// FIXME_HIP -- there is currently an error copying (some) structs
|
// FIXME_HIP -- there is currently an error copying (some) structs
|
||||||
// by value to the device in HIP-Clang / VDI
|
// by value to the device in HIP-Clang / VDI
|
||||||
// As a workaround, we can malloc the DriverType and explictly copy over.
|
// As a workaround, we can malloc the DriverType and explictly copy over.
|
||||||
@ -169,12 +170,15 @@ struct HIPParallelLaunch<
|
|||||||
}
|
}
|
||||||
|
|
||||||
static hipFuncAttributes get_hip_func_attributes() {
|
static hipFuncAttributes get_hip_func_attributes() {
|
||||||
hipFuncAttributes attr;
|
static hipFuncAttributes attr = []() {
|
||||||
hipFuncGetAttributes(
|
hipFuncAttributes attr;
|
||||||
&attr,
|
HIP_SAFE_CALL(hipFuncGetAttributes(
|
||||||
reinterpret_cast<void const *>(
|
&attr,
|
||||||
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
reinterpret_cast<void const *>(
|
||||||
MinBlocksPerSM>));
|
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
|
||||||
|
MinBlocksPerSM>)));
|
||||||
|
return attr;
|
||||||
|
}();
|
||||||
return attr;
|
return attr;
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
@ -192,6 +196,8 @@ struct HIPParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
|||||||
"HIPParallelLaunch FAILED: shared memory request is too large"));
|
"HIPParallelLaunch FAILED: shared memory request is too large"));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
KOKKOS_ENSURE_HIP_LOCK_ARRAYS_ON_DEVICE();
|
||||||
|
|
||||||
// Invoke the driver function on the device
|
// Invoke the driver function on the device
|
||||||
|
|
||||||
// FIXME_HIP -- see note about struct copy by value above
|
// FIXME_HIP -- see note about struct copy by value above
|
||||||
@ -212,10 +218,13 @@ struct HIPParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
|
|||||||
}
|
}
|
||||||
|
|
||||||
static hipFuncAttributes get_hip_func_attributes() {
|
static hipFuncAttributes get_hip_func_attributes() {
|
||||||
hipFuncAttributes attr;
|
static hipFuncAttributes attr = []() {
|
||||||
hipFuncGetAttributes(
|
hipFuncAttributes attr;
|
||||||
&attr, reinterpret_cast<void *>(
|
HIP_SAFE_CALL(hipFuncGetAttributes(
|
||||||
&hip_parallel_launch_local_memory<DriverType, 1024, 1>));
|
&attr, reinterpret_cast<void const *>(
|
||||||
|
hip_parallel_launch_local_memory<DriverType, 1024, 1>)));
|
||||||
|
return attr;
|
||||||
|
}();
|
||||||
return attr;
|
return attr;
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|||||||
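Note on the `hip_internal_get_block_size` hunks above: the old register bound divided by `block_size / HIPTraits::WarpSize`, which truncates to 0 for any block smaller than one wavefront and then divides by zero. The new code rounds up instead and clamps `attr.numRegs` to at least 1. The following is a minimal host-only sketch of that arithmetic, not the Kokkos implementation. The names `kWarpSize`, `wavefronts_truncating`, `wavefronts_ceil`, and `max_blocks_regs` are hypothetical. The diff calls the rounded-up quantity `blocks_per_warp`, although what it computes is the number of wavefronts per block.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for HIPTraits::WarpSize, which is 64 in the diff.
constexpr int kWarpSize = 64;

// Old formula: truncating division. Yields 0 when block_size < kWarpSize,
// which would make the register bound below divide by zero.
int wavefronts_truncating(int block_size) { return block_size / kWarpSize; }

// New formula: ceiling division, so a partial wavefront counts as one.
int wavefronts_ceil(int block_size) {
  return (block_size + kWarpSize - 1) / kWarpSize;
}

// Register-limited number of blocks per compute unit, following the new
// expression regs_per_sm / (regs_per_wavefront * blocks_per_warp).
int max_blocks_regs(int regs_per_sm, int num_regs, int block_size) {
  assert(block_size > 0);  // mirrors KOKKOS_ASSERT(block_size > 0)
  // Clamp as in std::max(attr.numRegs, 1): a reported count of 0 must not
  // produce a zero divisor.
  const int regs_per_wavefront = std::max(num_regs, 1);
  return regs_per_sm / (regs_per_wavefront * wavefronts_ceil(block_size));
}
```

With the value `m_regsPerSM = 65536` set in this commit, a kernel reporting 64 registers and a block size of 256 gives `65536 / (64 * 4) = 256` under this sketch, and a block size of 32 no longer produces a zero divisor.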
Some files were not shown because too many files have changed in this diff.