Update Kokkos library in LAMMPS to v3.4.0

Stan Gerald Moore
2021-04-26 16:28:19 -06:00
parent 39f3c1684f
commit 692da3bf88
358 changed files with 16375 additions and 10003 deletions

View File

@ -1,5 +1,168 @@
# Change Log
## [3.4.00](https://github.com/kokkos/kokkos/tree/3.4.00) (2021-04-25)
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.3.01...3.4.00)
**Highlights:**
- SYCL Backend Almost Feature Complete
- OpenMPTarget Backend Almost Feature Complete
- Performance Improvements for HIP backend
- Require CMake 3.16 or newer
- Tool Callback Interface Enhancements
- cmath wrapper functions available now in Kokkos::Experimental
**Features:**
- Implement parallel_scan with ThreadVectorRange and Reducer [\#3861](https://github.com/kokkos/kokkos/pull/3861)
- Implement SYCL Random [\#3849](https://github.com/kokkos/kokkos/pull/3849)
- OpenMPTarget: Adding Implementation for nested reducers [\#3845](https://github.com/kokkos/kokkos/pull/3845)
- Implement UniqueToken for SYCL [\#3833](https://github.com/kokkos/kokkos/pull/3833)
- OpenMPTarget: UniqueToken::Global implementation [\#3823](https://github.com/kokkos/kokkos/pull/3823)
- DualView sync's on ExecutionSpaces [\#3822](https://github.com/kokkos/kokkos/pull/3822)
- SYCL outer TeamPolicy parallel_reduce [\#3818](https://github.com/kokkos/kokkos/pull/3818)
- SYCL TeamPolicy::team_scan [\#3815](https://github.com/kokkos/kokkos/pull/3815)
- SYCL MDRangePolicy parallel_reduce [\#3801](https://github.com/kokkos/kokkos/pull/3801)
- Enable use of execution space instances in ScatterView [\#3786](https://github.com/kokkos/kokkos/pull/3786)
- SYCL TeamPolicy nested parallel_reduce [\#3783](https://github.com/kokkos/kokkos/pull/3783)
- OpenMPTarget: MDRange with TagType for parallel_for [\#3781](https://github.com/kokkos/kokkos/pull/3781)
- Adding OpenMPTarget parallel_scan [\#3655](https://github.com/kokkos/kokkos/pull/3655)
- SYCL basic TeamPolicy [\#3654](https://github.com/kokkos/kokkos/pull/3654)
- OpenMPTarget: scratch memory implementation [\#3611](https://github.com/kokkos/kokkos/pull/3611)
**Implemented enhancements Backends and Archs:**
- SYCL choose a specific GPU [\#3918](https://github.com/kokkos/kokkos/pull/3918)
- [HIP] Lock access to scratch memory when using Teams [\#3916](https://github.com/kokkos/kokkos/pull/3916)
- [HIP] fix multithreaded access to get_next_driver [\#3908](https://github.com/kokkos/kokkos/pull/3908)
- Forward declare HIPHostPinnedSpace and SYCLSharedUSMSpace [\#3902](https://github.com/kokkos/kokkos/pull/3902)
- Let SYCL USMObjectMem use SharedAllocationRecord [\#3898](https://github.com/kokkos/kokkos/pull/3898)
- Implement clock_tic for SYCL [\#3893](https://github.com/kokkos/kokkos/pull/3893)
- Don't use a static variable in HIPInternal::scratch_space [\#3866](https://github.com/kokkos/kokkos/pull/3866)
- Reuse memory for SYCL parallel_reduce [\#3873](https://github.com/kokkos/kokkos/pull/3873)
- Update SYCL compiler in CI [\#3826](https://github.com/kokkos/kokkos/pull/3826)
- Introduce HostSharedPtr to manage m_space_instance for Cuda/HIP/SYCL [\#3824](https://github.com/kokkos/kokkos/pull/3824)
- [HIP] Use shuffle for range reduction [\#3811](https://github.com/kokkos/kokkos/pull/3811)
- OpenMPTarget: Changes to the hierarchical parallelism [\#3808](https://github.com/kokkos/kokkos/pull/3808)
- Remove ExtendedReferenceWrapper for SYCL parallel_reduce [\#3802](https://github.com/kokkos/kokkos/pull/3802)
- Eliminate sycl_indirect_launch [\#3777](https://github.com/kokkos/kokkos/pull/3777)
- OpenMPTarget: scratch implementation for parallel_reduce [\#3776](https://github.com/kokkos/kokkos/pull/3776)
- Allow initializing SYCL execution space from sycl::queue and SYCL::impl_static_fence [\#3767](https://github.com/kokkos/kokkos/pull/3767)
- SYCL TeamPolicy scratch memory alternative [\#3763](https://github.com/kokkos/kokkos/pull/3763)
- Alternative implementation for SYCL TeamPolicy [\#3759](https://github.com/kokkos/kokkos/pull/3759)
- Unify handling of synchronous errors in SYCL [\#3754](https://github.com/kokkos/kokkos/pull/3754)
- core/Cuda: Half_t updates for cgsolve [\#3746](https://github.com/kokkos/kokkos/pull/3746)
- Unify HIPParallelLaunch structures [\#3733](https://github.com/kokkos/kokkos/pull/3733)
- Improve performance for SYCL parallel_reduce [\#3732](https://github.com/kokkos/kokkos/pull/3732)
- Use consistent types in Kokkos_OpenMPTarget_Parallel.hpp [\#3703](https://github.com/kokkos/kokkos/pull/3703)
- Implement non-blocking kernel launches for HIP backend [\#3697](https://github.com/kokkos/kokkos/pull/3697)
- Change SYCLInternal::m_queue std::unique_ptr -> std::optional [\#3677](https://github.com/kokkos/kokkos/pull/3677)
- Use alternative SYCL parallel_reduce implementation [\#3671](https://github.com/kokkos/kokkos/pull/3671)
- Use runtime values in KokkosExp_MDRangePolicy.hpp [\#3626](https://github.com/kokkos/kokkos/pull/3626)
- Clean up AnalyzePolicy [\#3564](https://github.com/kokkos/kokkos/pull/3564)
- Changes for indirect launch of SYCL parallel reduce [\#3511](https://github.com/kokkos/kokkos/pull/3511)
**Implemented enhancements BuildSystem:**
- Also require C++14 when building gtest [\#3912](https://github.com/kokkos/kokkos/pull/3912)
- Fix compiling SYCL with OpenMP [\#3874](https://github.com/kokkos/kokkos/pull/3874)
- Require C++17 for SYCL (at configuration time) [\#3869](https://github.com/kokkos/kokkos/pull/3869)
- Add COMPILE_DEFINITIONS argument to kokkos_create_imported_tpl [\#3862](https://github.com/kokkos/kokkos/pull/3862)
- Do not pass arch flags to the linker with no rdc [\#3846](https://github.com/kokkos/kokkos/pull/3846)
- Try compiling C++14 check with C++14 support and print error message [\#3843](https://github.com/kokkos/kokkos/pull/3843)
- Enable HIP with Cray Clang [\#3842](https://github.com/kokkos/kokkos/pull/3842)
- Add an option to disable header self containment tests [\#3834](https://github.com/kokkos/kokkos/pull/3834)
- CMake check for C++14 [\#3809](https://github.com/kokkos/kokkos/pull/3809)
- Prefer -std=* over --std=* [\#3779](https://github.com/kokkos/kokkos/pull/3779)
- Kokkos launch compiler updates [\#3778](https://github.com/kokkos/kokkos/pull/3778)
- Updated comments and enabled no-op for kokkos_launch_compiler [\#3774](https://github.com/kokkos/kokkos/pull/3774)
- Apple's Clang not correctly recognised [\#3772](https://github.com/kokkos/kokkos/pull/3772)
- kokkos_launch_compiler + CUDA auto-detect arch [\#3770](https://github.com/kokkos/kokkos/pull/3770)
- Add Spack test support for Kokkos [\#3753](https://github.com/kokkos/kokkos/pull/3753)
- Split SYCL tests for aot compilation [\#3741](https://github.com/kokkos/kokkos/pull/3741)
- Use consistent OpenMP flag for IntelClang [\#3735](https://github.com/kokkos/kokkos/pull/3735)
- Add support for -Wno-deprecated-gpu-targets [\#3722](https://github.com/kokkos/kokkos/pull/3722)
- Add configuration to target CUDA compute capability 8.6 [\#3713](https://github.com/kokkos/kokkos/pull/3713)
- Added VERSION and SOVERSION to KOKKOS_INTERNAL_ADD_LIBRARY [\#3706](https://github.com/kokkos/kokkos/pull/3706)
- Add fast-math to known NVCC flags [\#3699](https://github.com/kokkos/kokkos/pull/3699)
- Add MI-100 arch string [\#3698](https://github.com/kokkos/kokkos/pull/3698)
- Require CMake >=3.16 [\#3679](https://github.com/kokkos/kokkos/pull/3679)
- KokkosCI.cmake, KokkosCTest.cmake.in, CTestConfig.cmake.in + CI updates [\#2844](https://github.com/kokkos/kokkos/pull/2844)
**Implemented enhancements Tools:**
- Improve readability of the callback invocation in profiling [\#3860](https://github.com/kokkos/kokkos/pull/3860)
- V1.1 Tools Interface: incremental, action-based [\#3812](https://github.com/kokkos/kokkos/pull/3812)
- Enable launch latency simulations [\#3721](https://github.com/kokkos/kokkos/pull/3721)
- Added metadata callback to tools interface [\#3711](https://github.com/kokkos/kokkos/pull/3711)
- MDRange Tile Size Tuning [\#3688](https://github.com/kokkos/kokkos/pull/3688)
- Added support for command-line args for kokkos-tools [\#3627](https://github.com/kokkos/kokkos/pull/3627)
- Query max tile sizes for an MDRangePolicy, and set tile sizes on an existing policy [\#3481](https://github.com/kokkos/kokkos/pull/3481)
**Implemented enhancements Other:**
- Try detecting ndevices in get_gpu [\#3921](https://github.com/kokkos/kokkos/pull/3921)
- Use strcmp to compare names() [\#3909](https://github.com/kokkos/kokkos/pull/3909)
- Add execution space arguments for constructor overloads that might allocate a new underlying View [\#3904](https://github.com/kokkos/kokkos/pull/3904)
- Prefix labels in internal use of kokkos_malloc [\#3891](https://github.com/kokkos/kokkos/pull/3891)
- Prefix labels for internal uses of SharedAllocationRecord [\#3890](https://github.com/kokkos/kokkos/pull/3890)
- Add missing hypot math function [\#3880](https://github.com/kokkos/kokkos/pull/3880)
- Unify algorithm unit tests to avoid code duplication [\#3851](https://github.com/kokkos/kokkos/pull/3851)
- DualView.template view() better matches for Devices in UVMSpace cases [\#3857](https://github.com/kokkos/kokkos/pull/3857)
- More extensive disentangling of Policy Traits [\#3829](https://github.com/kokkos/kokkos/pull/3829)
- Replaced nanosleep and sched_yield with STL routines [\#3825](https://github.com/kokkos/kokkos/pull/3825)
- Constructing Atomic Subviews [\#3810](https://github.com/kokkos/kokkos/pull/3810)
- Metadata Declaration in Core [\#3729](https://github.com/kokkos/kokkos/pull/3729)
- Allow using tagged final functor in parallel_reduce [\#3714](https://github.com/kokkos/kokkos/pull/3714)
- Major duplicate code removal in SharedAllocationRecord specializations [\#3658](https://github.com/kokkos/kokkos/pull/3658)
**Fixed bugs:**
- Provide forward declarations in Kokkos_ViewLayoutTiled.hpp for XL [\#3911](https://github.com/kokkos/kokkos/pull/3911)
- Fixup absolute value of floating points in Kokkos complex [\#3882](https://github.com/kokkos/kokkos/pull/3882)
- Address intel 17 ICE [\#3881](https://github.com/kokkos/kokkos/pull/3881)
- Add missing pow(Kokkos::complex) overloads [\#3868](https://github.com/kokkos/kokkos/pull/3868)
- Fix bug {pow, log}(Kokkos::complex) [\#3866](https://github.com/kokkos/kokkos/pull/3866)
- Cleanup writing to output streams in Cuda [\#3859](https://github.com/kokkos/kokkos/pull/3859)
- Fixup cache CUDA fallback execution space instance used by DualView::sync [\#3856](https://github.com/kokkos/kokkos/pull/3856)
- Fix cmake warning with pthread [\#3854](https://github.com/kokkos/kokkos/pull/3854)
- Fix typo FOUND_CUDA_{DRIVVER -> DRIVER} [\#3852](https://github.com/kokkos/kokkos/pull/3852)
- Fix bug in SYCL team_reduce [\#3848](https://github.com/kokkos/kokkos/pull/3848)
- Atrocious bug in MDRange tuning [\#3803](https://github.com/kokkos/kokkos/pull/3803)
- Fix compiling SYCL with Kokkos_ENABLE_TUNING=ON [\#3800](https://github.com/kokkos/kokkos/pull/3800)
- Fixed command line parsing bug [\#3797](https://github.com/kokkos/kokkos/pull/3797)
- Workaround race condition in SYCL parallel_reduce [\#3782](https://github.com/kokkos/kokkos/pull/3782)
- Fix Atomic{Min,Max} for Kepler30 [\#3780](https://github.com/kokkos/kokkos/pull/3780)
- Fix SYCL typo [\#3755](https://github.com/kokkos/kokkos/pull/3755)
- Fixed Kokkos_install_additional_files macro [\#3752](https://github.com/kokkos/kokkos/pull/3752)
- Fix a typo for Kokkos_ARCH_A64FX [\#3751](https://github.com/kokkos/kokkos/pull/3751)
- OpenMPTarget: fixes and workarounds to work with "Release" build type [\#3748](https://github.com/kokkos/kokkos/pull/3748)
- Fix parsing bug for number of devices command line argument [\#3724](https://github.com/kokkos/kokkos/pull/3724)
- Avoid more warnings with clang and C++20 [\#3719](https://github.com/kokkos/kokkos/pull/3719)
- Fix gcc-10.1 C++20 warnings [\#3718](https://github.com/kokkos/kokkos/pull/3718)
- Fix cuda cache config not being set correct [\#3712](https://github.com/kokkos/kokkos/pull/3712)
- Fix dualview deepcopy perftools [\#3701](https://github.com/kokkos/kokkos/pull/3701)
- use drand instead of frand in drand [\#3696](https://github.com/kokkos/kokkos/pull/3696)
**Incompatibilities:**
- Remove unimplemented member functions of SYCLDevice [\#3919](https://github.com/kokkos/kokkos/pull/3919)
- Replace cl::sycl [\#3896](https://github.com/kokkos/kokkos/pull/3896)
- Get rid of SYCL workaround in Kokkos_Complex.hpp [\#3884](https://github.com/kokkos/kokkos/pull/3884)
- Replace most uses of if_c [\#3883](https://github.com/kokkos/kokkos/pull/3883)
- Remove Impl::enable_if_type [\#3863](https://github.com/kokkos/kokkos/pull/3863)
- Remove HostBarrier test [\#3847](https://github.com/kokkos/kokkos/pull/3847)
- Avoid (void) interface [\#3836](https://github.com/kokkos/kokkos/pull/3836)
- Remove VerifyExecutionCanAccessMemorySpace [\#3813](https://github.com/kokkos/kokkos/pull/3813)
- Avoid duplicated code in ScratchMemorySpace [\#3793](https://github.com/kokkos/kokkos/pull/3793)
- Remove superfluous FunctorFinal specialization [\#3788](https://github.com/kokkos/kokkos/pull/3788)
- Rename cl::sycl -> sycl in Kokkos_MathematicalFunctions.hpp [\#3678](https://github.com/kokkos/kokkos/pull/3678)
- Remove integer_sequence backward compatibility implementation [\#3533](https://github.com/kokkos/kokkos/pull/3533)
**Enabled tests:**
- Fixup re-enable core performance tests [\#3903](https://github.com/kokkos/kokkos/pull/3903)
- Enable more SYCL tests [\#3900](https://github.com/kokkos/kokkos/pull/3900)
- Restrict MDRange Policy tests for Intel GPUs [\#3853](https://github.com/kokkos/kokkos/pull/3853)
- Disable death tests for rawhide [\#3844](https://github.com/kokkos/kokkos/pull/3844)
- OpenMPTarget: Block unit tests that do not pass with the nvidia compiler [\#3839](https://github.com/kokkos/kokkos/pull/3839)
- Enable Bitset container test for SYCL [\#3830](https://github.com/kokkos/kokkos/pull/3830)
- Enable some more SYCL tests [\#3744](https://github.com/kokkos/kokkos/pull/3744)
- Enable SYCL atomic tests [\#3742](https://github.com/kokkos/kokkos/pull/3742)
- Enable more SYCL perf_tests [\#3692](https://github.com/kokkos/kokkos/pull/3692)
- Enable examples for SYCL [\#3691](https://github.com/kokkos/kokkos/pull/3691)
## [3.3.01](https://github.com/kokkos/kokkos/tree/3.3.01) (2021-01-06)
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.3.00...3.3.01)
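
The highlights above mention cmath wrapper functions now available in Kokkos::Experimental (hypot, for example, was added in \#3880). As an illustration only, not part of this commit, a device-side call would look roughly like the sketch below; the exact set of wrappers and the header that provides them are assumed from the 3.4.00 release notes rather than taken from this diff.

// Illustrative sketch, not part of this commit: using a Kokkos::Experimental
// math wrapper (hypot) inside a parallel_reduce. Sizes and labels are arbitrary.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    double result = 0.0;
    Kokkos::parallel_reduce(
        "norm", 1000,
        KOKKOS_LAMBDA(const int i, double& sum) {
          // The wrapper dispatches to the backend's native math routine on device.
          sum += Kokkos::Experimental::hypot(static_cast<double>(i), 1.0);
        },
        result);
  }
  Kokkos::finalize();
  return 0;
}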

View File

@ -72,7 +72,7 @@ ENDFUNCTION()
 LIST(APPEND CMAKE_MODULE_PATH cmake/Modules)
 IF(NOT KOKKOS_HAS_TRILINOS)
-cmake_minimum_required(VERSION 3.10 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
 set(CMAKE_DISABLE_SOURCE_CHANGES ON)
 set(CMAKE_DISABLE_IN_SOURCE_BUILD ON)
 IF (Spack_WORKAROUND)
@ -111,21 +111,19 @@ ENDIF()
 set(Kokkos_VERSION_MAJOR 3)
-set(Kokkos_VERSION_MINOR 3)
+set(Kokkos_VERSION_MINOR 4)
-set(Kokkos_VERSION_PATCH 1)
+set(Kokkos_VERSION_PATCH 00)
 set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
 math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
-IF(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.12.0")
 MESSAGE(STATUS "Setting policy CMP0074 to use <Package>_ROOT variables")
 CMAKE_POLICY(SET CMP0074 NEW)
-ENDIF()
 # Load either the real TriBITS or a TriBITS wrapper
 # for certain utility functions that are universal (like GLOBAL_SET)
 INCLUDE(${KOKKOS_SRC_PATH}/cmake/fake_tribits.cmake)
-IF (Kokkos_ENABLE_CUDA AND ${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.14.0")
+IF (Kokkos_ENABLE_CUDA)
 # If we are building CUDA, we have tricked CMake because we declare a CXX project
 # If the default C++ standard for a given compiler matches the requested
 # standard, then CMake just omits the -std flag in later versions of CMake
@ -139,15 +137,19 @@ ENDIF()
 # I really wish these were regular variables
 # but scoping issues can make it difficult
 GLOBAL_SET(KOKKOS_COMPILE_OPTIONS)
-GLOBAL_SET(KOKKOS_LINK_OPTIONS -DKOKKOS_DEPENDENCE)
+GLOBAL_SET(KOKKOS_LINK_OPTIONS)
 GLOBAL_SET(KOKKOS_CUDA_OPTIONS)
 GLOBAL_SET(KOKKOS_CUDAFE_OPTIONS)
 GLOBAL_SET(KOKKOS_XCOMPILER_OPTIONS)
 # We need to append text here for making sure TPLs
 # we import are available for an installed Kokkos
 GLOBAL_SET(KOKKOS_TPL_EXPORTS)
-# this could probably be scoped to project
+# KOKKOS_DEPENDENCE is used by kokkos_launch_compiler
 GLOBAL_SET(KOKKOS_COMPILE_DEFINITIONS KOKKOS_DEPENDENCE)
+# MSVC never goes through kokkos_launch_compiler
+IF(NOT MSVC)
+  GLOBAL_APPEND(KOKKOS_LINK_OPTIONS -DKOKKOS_DEPENDENCE)
+ENDIF()
 # Include a set of Kokkos-specific wrapper functions that
 # will either call raw CMake or TriBITS

View File

@ -11,8 +11,8 @@ CXXFLAGS += $(SHFLAGS)
 endif
 KOKKOS_VERSION_MAJOR = 3
-KOKKOS_VERSION_MINOR = 3
+KOKKOS_VERSION_MINOR = 4
-KOKKOS_VERSION_PATCH = 1
+KOKKOS_VERSION_PATCH = 00
 KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
 # Options: Cuda,HIP,OpenMP,Pthread,Serial
@ -20,7 +20,7 @@ KOKKOS_DEVICES ?= "OpenMP"
 #KOKKOS_DEVICES ?= "Pthread"
 # Options:
 # Intel: KNC,KNL,SNB,HSW,BDW,SKX
-# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80
+# NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86
 # ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
 # IBM: BGQ,Power7,Power8,Power9
 # AMD-GPUS: Vega900,Vega906,Vega908
@ -164,17 +164,17 @@ KOKKOS_INTERNAL_OS_DARWIN := $(call kokkos_has_string,$(KOKKOS_OS),Darwin)
 KOKKOS_CXX_VERSION := $(strip $(shell $(CXX) --version 2>&1))
 KOKKOS_INTERNAL_COMPILER_INTEL := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Intel Corporation)
 KOKKOS_INTERNAL_COMPILER_PGI := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),PGI)
-KOKKOS_INTERNAL_COMPILER_XL := $(strip $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l))
+KOKKOS_INTERNAL_COMPILER_XL := $(strip $(shell $(CXX) -qversion 2>&1 | grep -c XL))
-KOKKOS_INTERNAL_COMPILER_CRAY := $(strip $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l))
+KOKKOS_INTERNAL_COMPILER_CRAY := $(strip $(shell $(CXX) -craype-verbose 2>&1 | grep -c "CC-"))
-KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell echo "$(shell export OMPI_CXX=$(OMPI_CXX); export MPICH_CXX=$(MPICH_CXX); $(CXX) --version 2>&1 | grep nvcc | wc -l)>0" | bc))
+KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell echo "$(shell export OMPI_CXX=$(OMPI_CXX); export MPICH_CXX=$(MPICH_CXX); $(CXX) --version 2>&1 | grep -c nvcc)>0" | bc))
 KOKKOS_INTERNAL_COMPILER_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),clang)
-KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Apple LLVM)
+KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Apple clang)
 KOKKOS_INTERNAL_COMPILER_HCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),HCC)
 KOKKOS_INTERNAL_COMPILER_GCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),GCC)
 # Check Host Compiler if using NVCC through nvcc_wrapper
 ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
-KOKKOS_INTERNAL_COMPILER_NVCC_WRAPPER := $(strip $(shell echo $(CXX) | grep nvcc_wrapper | wc -l))
+KOKKOS_INTERNAL_COMPILER_NVCC_WRAPPER := $(strip $(shell echo $(CXX) | grep -c nvcc_wrapper))
 ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC_WRAPPER), 1)
 KOKKOS_CXX_HOST_VERSION := $(strip $(shell $(CXX) $(CXXFLAGS) --host-version 2>&1))
@ -297,11 +297,11 @@ else
 #KOKKOS_INTERNAL_CXX1Z_FLAG := -hstd=c++1z
 #KOKKOS_INTERNAL_CXX2A_FLAG := -hstd=c++2a
 else
-KOKKOS_INTERNAL_CXX14_FLAG := --std=c++14
+KOKKOS_INTERNAL_CXX14_FLAG := -std=c++14
-KOKKOS_INTERNAL_CXX1Y_FLAG := --std=c++1y
+KOKKOS_INTERNAL_CXX1Y_FLAG := -std=c++1y
-KOKKOS_INTERNAL_CXX17_FLAG := --std=c++17
+KOKKOS_INTERNAL_CXX17_FLAG := -std=c++17
-KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
+KOKKOS_INTERNAL_CXX1Z_FLAG := -std=c++1z
-KOKKOS_INTERNAL_CXX2A_FLAG := --std=c++2a
+KOKKOS_INTERNAL_CXX2A_FLAG := -std=c++2a
 endif
 endif
 endif
@ -332,6 +332,7 @@ KOKKOS_INTERNAL_USE_ARCH_VOLTA70 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volt
 KOKKOS_INTERNAL_USE_ARCH_VOLTA72 := $(call kokkos_has_string,$(KOKKOS_ARCH),Volta72)
 KOKKOS_INTERNAL_USE_ARCH_TURING75 := $(call kokkos_has_string,$(KOKKOS_ARCH),Turing75)
 KOKKOS_INTERNAL_USE_ARCH_AMPERE80 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere80)
+KOKKOS_INTERNAL_USE_ARCH_AMPERE86 := $(call kokkos_has_string,$(KOKKOS_ARCH),Ampere86)
 KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
 + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
 + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
@ -344,7 +345,8 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLE
 + $(KOKKOS_INTERNAL_USE_ARCH_VOLTA70) \
 + $(KOKKOS_INTERNAL_USE_ARCH_VOLTA72) \
 + $(KOKKOS_INTERNAL_USE_ARCH_TURING75) \
-+ $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80))
++ $(KOKKOS_INTERNAL_USE_ARCH_AMPERE80) \
++ $(KOKKOS_INTERNAL_USE_ARCH_AMPERE86))
 #SEK: This seems like a bug to me
 ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
@ -585,10 +587,10 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT), 1)
 endif
 ifeq ($(KOKKOS_INTERNAL_ENABLE_TUNING), 1)
-tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_TUNING")
+tmp := $(call kokkos_append_header,"$H""define KOKKOS_ENABLE_TUNING")
 endif
-tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_LIBDL")
+tmp := $(call kokkos_append_header,"$H""define KOKKOS_ENABLE_LIBDL")
 ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
 ifneq ($(KOKKOS_CMAKE), yes)
@ -752,6 +754,14 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_A64FX), 1)
 KOKKOS_CXXFLAGS += -march=armv8.2-a+sve
 KOKKOS_LDFLAGS += -march=armv8.2-a+sve
+ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
+KOKKOS_CXXFLAGS += -msve-vector-bits=512
+KOKKOS_LDFLAGS += -msve-vector-bits=512
+endif
+ifeq ($(KOKKOS_INTERNAL_COMPILER_GCC), 1)
+KOKKOS_CXXFLAGS += -msve-vector-bits=512
+KOKKOS_LDFLAGS += -msve-vector-bits=512
+endif
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN), 1)
@ -1100,6 +1110,11 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA_ARCH), 1)
 tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE80")
 KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_80
 endif
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMPERE86), 1)
+tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE")
+tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMPERE86")
+KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_86
+endif
 ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
 KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
@ -1159,7 +1174,7 @@ endif
 KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h 2>&1)
 ifeq ($(KOKKOS_INTERNAL_LS_CONFIG), KokkosCore_config.h)
-KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
+KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep -c define))
 else
 KOKKOS_INTERNAL_NEW_CONFIG := 1
 endif
@ -1181,41 +1196,41 @@ tmp := $(call kokkos_update_config_header, KOKKOS_SETUP_HPP_, "KokkosCore_Config
 tmp := $(call kokkos_update_config_header, KOKKOS_DECLARE_HPP_, "KokkosCore_Config_DeclareBackend.tmp", "KokkosCore_Config_DeclareBackend.hpp")
 tmp := $(call kokkos_update_config_header, KOKKOS_POST_INCLUDE_HPP_, "KokkosCore_Config_PostInclude.tmp", "KokkosCore_Config_PostInclude.hpp")
 ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_CUDA.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_CUDA.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_CUDA.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_CUDA.hpp>","KokkosCore_Config_DeclareBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <setup/Kokkos_Setup_Cuda.hpp>","KokkosCore_Config_SetupBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <setup/Kokkos_Setup_Cuda.hpp>","KokkosCore_Config_SetupBackend.hpp")
 ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
 else
 endif
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_OPENMPTARGET.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_OPENMPTARGET.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_OPENMPTARGET.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_OPENMPTARGET.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HIP.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_HIP.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HIP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_HIP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <setup/Kokkos_Setup_HIP.hpp>","KokkosCore_Config_SetupBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <setup/Kokkos_Setup_HIP.hpp>","KokkosCore_Config_SetupBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_OPENMP.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_OPENMP.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_OPENMP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_OPENMP.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_THREADS.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_THREADS.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_THREADS.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_THREADS.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_HPX), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HPX.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_HPX.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HPX.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_HPX.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_SERIAL.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_SERIAL.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_SERIAL.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_SERIAL.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
-tmp := $(call kokkos_append_config_header,"\#include <fwd/Kokkos_Fwd_HBWSpace.hpp>","KokkosCore_Config_FwdBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <fwd/Kokkos_Fwd_HBWSpace.hpp>","KokkosCore_Config_FwdBackend.hpp")
-tmp := $(call kokkos_append_config_header,"\#include <decl/Kokkos_Declare_HBWSpace.hpp>","KokkosCore_Config_DeclareBackend.hpp")
+tmp := $(call kokkos_append_config_header,"$H""include <decl/Kokkos_Declare_HBWSpace.hpp>","KokkosCore_Config_DeclareBackend.hpp")
 endif
 KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
 KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
@ -1334,7 +1349,7 @@ ifneq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
 endif
 # With Cygwin functions such as fdopen and fileno are not defined
-# when strict ansi is enabled. strict ansi gets enabled with --std=c++14
+# when strict ansi is enabled. strict ansi gets enabled with -std=c++14
 # though. So we hard undefine it here. Not sure if that has any bad side effects
 # This is needed for gtest actually, not for Kokkos itself!
 ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)

View File

@ -36,6 +36,8 @@ Kokkos_MemorySpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_
 $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemorySpace.cpp
 Kokkos_HostSpace_deepcopy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace_deepcopy.cpp
 $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace_deepcopy.cpp
+Kokkos_NumericTraits.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_NumericTraits.cpp
+$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_NumericTraits.cpp
 ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
 Kokkos_Cuda_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Instance.cpp

View File

@ -668,6 +668,25 @@ struct Random_UniqueIndex<Kokkos::Experimental::HIP> {
 };
 #endif
+#ifdef KOKKOS_ENABLE_SYCL
+template <>
+struct Random_UniqueIndex<Kokkos::Experimental::SYCL> {
+  using locks_view_type = View<int*, Kokkos::Experimental::SYCL>;
+  KOKKOS_FUNCTION
+  static int get_state_idx(const locks_view_type& locks_) {
+#ifdef KOKKOS_ARCH_INTEL_GEN
+    int i = Kokkos::Impl::clock_tic() % locks_.extent(0);
+#else
+    int i = 0;
+#endif
+    while (Kokkos::atomic_compare_exchange(&locks_(i), 0, 1)) {
+      i = (i + 1) % static_cast<int>(locks_.extent(0));
+    }
+    return i;
+  }
+};
+#endif
 } // namespace Impl
 template <class DeviceType>
@ -1028,7 +1047,7 @@ class Random_XorShift1024 {
 KOKKOS_INLINE_FUNCTION
 double drand(const double& start, const double& end) {
-  return frand(end - start) + start;
+  return drand(end - start) + start;
 }
 // Marsaglia polar method for drawing a standard normal distributed random
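
The second hunk above fixes the range overload of drand(), which previously delegated to the single-precision frand(). A minimal usage sketch of the pool-based generators affected by this fix follows; the seed, sizes, and labels are illustrative and not taken from this commit.

// Illustrative sketch, not part of this commit: drawing double-precision
// uniform values with Random_XorShift1024's drand(start, end) overload.
#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::Random_XorShift1024_Pool<> pool(/*seed=*/12345);
    Kokkos::View<double*> draws("draws", 1000);
    Kokkos::parallel_for(
        "fill_uniform", 1000, KOKKOS_LAMBDA(const int i) {
          auto gen = pool.get_state();
          // With the fix, this value stays in double precision end to end.
          draws(i) = gen.drand(0.0, 10.0);
          pool.free_state(gen);
        });
  }
  Kokkos::finalize();
  return 0;
}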

View File

@ -3,6 +3,7 @@
 KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
 KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
 KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
+KOKKOS_INCLUDE_DIRECTORIES(${KOKKOS_SOURCE_DIR}/core/unit_test/category_files)
 SET(GTEST_SOURCE_DIR ${${PARENT_PACKAGE_NAME}_SOURCE_DIR}/tpls/gtest)
@ -25,7 +26,7 @@ KOKKOS_ADD_TEST_LIBRARY(
 TARGET_COMPILE_DEFINITIONS(kokkosalgorithms_gtest PUBLIC GTEST_HAS_TR1_TUPLE=0 GTEST_HAS_PTHREAD=0)
 IF((NOT (Kokkos_ENABLE_CUDA AND WIN32)) AND (NOT ("${KOKKOS_CXX_COMPILER_ID}" STREQUAL "Fujitsu")))
-  TARGET_COMPILE_FEATURES(kokkosalgorithms_gtest PUBLIC cxx_std_11)
+  TARGET_COMPILE_FEATURES(kokkosalgorithms_gtest PUBLIC cxx_std_14)
 ENDIF()
 # Suppress clang-tidy diagnostics on code that we do not have control over
@ -33,51 +34,42 @@ IF(CMAKE_CXX_CLANG_TIDY)
 SET_TARGET_PROPERTIES(kokkosalgorithms_gtest PROPERTIES CXX_CLANG_TIDY "")
 ENDIF()
-SET(SOURCES
-  UnitTestMain.cpp
-)
+SET(ALGORITHM UnitTestMain.cpp)
 IF(Kokkos_ENABLE_OPENMP)
-  LIST( APPEND SOURCES
-    TestOpenMP.cpp
+  LIST(APPEND ALGORITHM_SOURCES
     TestOpenMP_Sort1D.cpp
     TestOpenMP_Sort3D.cpp
     TestOpenMP_SortDynamicView.cpp
-    TestOpenMP_Random.cpp
   )
 ENDIF()
-IF(Kokkos_ENABLE_HIP)
-  LIST( APPEND SOURCES
-    TestHIP.cpp
-  )
-ENDIF()
-IF(Kokkos_ENABLE_CUDA)
-  LIST( APPEND SOURCES
-    TestCuda.cpp
-  )
-ENDIF()
-IF(Kokkos_ENABLE_HPX)
-  LIST( APPEND SOURCES
-    TestHPX.cpp
-  )
-ENDIF()
-IF(Kokkos_ENABLE_SERIAL)
-  LIST( APPEND SOURCES
-    TestSerial.cpp
-  )
-ENDIF()
-IF(Kokkos_ENABLE_PTHREAD)
-  LIST( APPEND SOURCES
-    TestThreads.cpp
-  )
-ENDIF()
+foreach(Tag Threads;Serial;OpenMP;Cuda;HPX;HIP;SYCL)
+  # Because there is always an exception to the rule
+  if(Tag STREQUAL "Threads")
+    set(DEVICE "PTHREAD")
+  else()
+    string(TOUPPER ${Tag} DEVICE)
+  endif()
+  if(Kokkos_ENABLE_${DEVICE})
+    set(dir ${CMAKE_CURRENT_BINARY_DIR})
+    set(file ${dir}/Test${Tag}.cpp)
+    # Write to a temporary intermediate file and call configure_file to avoid
+    # updating timestamps triggering unnecessary rebuilds on subsequent cmake runs.
+    file(WRITE ${dir}/dummy.cpp
+      "#include <Test${Tag}_Category.hpp>\n"
+      "#include <TestRandomCommon.hpp>\n"
+      "#include <TestSortCommon.hpp>\n"
+    )
+    configure_file(${dir}/dummy.cpp ${file})
+    list(APPEND ALGORITHM_SOURCES ${file})
+  endif()
+endforeach()
 KOKKOS_ADD_EXECUTABLE_AND_TEST(
   UnitTest
-  SOURCES ${SOURCES}
+  SOURCES
+    UnitTestMain.cpp
+    ${ALGORITHM_SOURCES}
 )

View File

@ -20,11 +20,19 @@ override LDFLAGS += -lpthread
 include $(KOKKOS_PATH)/Makefile.kokkos
-KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
+KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files
 TEST_TARGETS =
 TARGETS =
+tmp := $(foreach device, $(KOKKOS_DEVICELIST), \
+  $(if $(filter Test$(device).cpp, $(shell ls Test$(device).cpp 2>/dev/null)),,\
+    $(shell echo "\#include <Test"${device}"_Category.hpp>" > Test$(device).cpp); \
+    $(shell echo "\#include <TestRandomCommon.hpp>" >> Test$(device).cpp); \
+    $(shell echo "\#include <TestSortCommon.hpp>" >> Test$(device).cpp); \
+  ) \
+)
 ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
 OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
 TARGETS += KokkosAlgorithms_UnitTest_Cuda
@ -44,7 +52,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
-OBJ_OPENMP = TestOpenMP.o TestOpenMP_Random.o TestOpenMP_Sort1D.o TestOpenMP_Sort3D.o TestOpenMP_SortDynamicView.o UnitTestMain.o gtest-all.o
+OBJ_OPENMP = TestOpenMP.o TestOpenMP_Sort1D.o TestOpenMP_Sort3D.o TestOpenMP_SortDynamicView.o UnitTestMain.o gtest-all.o
 TARGETS += KokkosAlgorithms_UnitTest_OpenMP
 TEST_TARGETS += test-openmp
 endif

View File

@ -59,6 +59,8 @@ TEST(openmp, SortUnsigned1D) {
 Impl::test_1D_sort<Kokkos::OpenMP, unsigned>(171);
 }
+TEST(openmp, SortIssue1160) { Impl::test_issue_1160_sort<Kokkos::OpenMP>(); }
 } // namespace Test
 #else
 void KOKKOS_ALGORITHMS_UNITTESTS_TESTOPENMP_PREVENT_LINK_ERROR() {}

View File

@ -491,6 +491,34 @@ void test_random(unsigned int num_draws) {
 }
 } // namespace Impl
+template <typename ExecutionSpace>
+void test_random_xorshift64() {
+#if defined(KOKKOS_ENABLE_SYCL) || defined(KOKKOS_ENABLE_CUDA) || \
+    defined(KOKKOS_ENABLE_HIP)
+  const int num_draws = 132141141;
+#else  // SERIAL, HPX, OPENMP
+  const int num_draws = 10240000;
+#endif
+  Impl::test_random<Kokkos::Random_XorShift64_Pool<ExecutionSpace>>(num_draws);
+  Impl::test_random<Kokkos::Random_XorShift64_Pool<
+      Kokkos::Device<ExecutionSpace, typename ExecutionSpace::memory_space>>>(
+      num_draws);
+}
+template <typename ExecutionSpace>
+void test_random_xorshift1024() {
+#if defined(KOKKOS_ENABLE_SYCL) || defined(KOKKOS_ENABLE_CUDA) || \
+    defined(KOKKOS_ENABLE_HIP)
+  const int num_draws = 52428813;
+#else  // SERIAL, HPX, OPENMP
+  const int num_draws = 10130144;
+#endif
+  Impl::test_random<Kokkos::Random_XorShift1024_Pool<ExecutionSpace>>(
+      num_draws);
+  Impl::test_random<Kokkos::Random_XorShift1024_Pool<
+      Kokkos::Device<ExecutionSpace, typename ExecutionSpace::memory_space>>>(
+      num_draws);
+}
 } // namespace Test
 #endif  // KOKKOS_TEST_UNORDERED_MAP_HPP

View File

@ -0,0 +1,60 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_ALGORITHMS_UNITTESTS_TESTRANDOM_COMMON_HPP
#define KOKKOS_ALGORITHMS_UNITTESTS_TESTRANDOM_COMMON_HPP
#include <TestRandom.hpp>
namespace Test {
TEST(TEST_CATEGORY, Random_XorShift64) {
test_random_xorshift64<TEST_EXECSPACE>();
}
TEST(TEST_CATEGORY, Random_XorShift1024_0) {
test_random_xorshift1024<TEST_EXECSPACE>();
}
} // namespace Test
#endif
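
For context on how this common header gets instantiated per backend: the generated Test<Tag>.cpp files produced by the CMake and Makefile changes earlier in this commit include a Test<Tag>_Category.hpp header together with TestRandomCommon.hpp and TestSortCommon.hpp. Assuming the category header defines TEST_CATEGORY and TEST_EXECSPACE the way the core unit tests do (for example openmp and Kokkos::OpenMP; that definition is an assumption, not shown in this diff), the gtest cases above effectively expand to something like the following sketch.

// Hypothetical expansion for the OpenMP backend only; the macro values are
// assumed from core/unit_test/category_files and are not part of this diff.
#include <gtest/gtest.h>
#include <TestRandom.hpp>

namespace Test {

TEST(openmp, Random_XorShift64) { test_random_xorshift64<Kokkos::OpenMP>(); }

TEST(openmp, Random_XorShift1024_0) {
  test_random_xorshift1024<Kokkos::OpenMP>();
}

}  // namespace Test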

View File

@ -42,10 +42,14 @@
 //@HEADER
 */
-#ifndef KOKKOS_TEST_CUDA_HPP
-#define KOKKOS_TEST_CUDA_HPP
-#define TEST_CATEGORY cuda
-#define TEST_EXECSPACE Kokkos::Cuda
+#ifndef KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_COMMON_HPP
+#define KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_COMMON_HPP
+#include <TestSort.hpp>
+namespace Test {
+TEST(TEST_CATEGORY, SortUnsigned) {
+  Impl::test_sort<TEST_EXECSPACE, unsigned>(171);
+}
+} // namespace Test
 #endif
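
The header above boils every backend's sort test down to one call into Impl::test_sort. As a standalone illustration only (the names, sizes, and fill step are not from this commit), the public API being exercised here is Kokkos::sort on a device view:

// Illustrative sketch, not part of this commit: sorting a device view in place.
#include <Kokkos_Core.hpp>
#include <Kokkos_Random.hpp>
#include <Kokkos_Sort.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<unsigned*> keys("keys", 171);
    // Fill with random keys, then sort on the default execution space.
    Kokkos::Random_XorShift64_Pool<> pool(/*seed=*/54321);
    Kokkos::fill_random(keys, pool, 100000u);
    Kokkos::sort(keys);
  }
  Kokkos::finalize();
  return 0;
}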

View File

@ -3,8 +3,4 @@ image:
 clone_folder: c:\projects\source
 build_script:
 - cmd: >-
-    mkdir build &&
-    cd build &&
-    cmake c:\projects\source -DKokkos_ENABLE_TESTS=ON &&
-    cmake --build . --target install &&
-    ctest -C Debug -V
+    cmake c:\projects\source -DKokkos_ENABLE_TESTS=ON -DCMAKE_CXX_FLAGS="/W0 /EHsc /d1reportClassLayoutChanges" -DCTEST_ARGS="-C Debug -V --output-on-failure" -DBUILD_NAME=MSVC-2019 -DBUILD_TYPE=Debug -DSITE=AppVeyor -DTARGET=install -P cmake/KokkosCI.cmake

View File

@ -13,6 +13,17 @@
 # $1 are 'ar', 'cmake', etc. during the linking phase
 #
+# emit a message about the underlying command executed
+: ${DEBUG:=0}
+: ${KOKKOS_DEBUG_LAUNCH_COMPILER:=${DEBUG}}
+debug-message()
+{
+    if [ "${KOKKOS_DEBUG_LAUNCH_COMPILER}" -ne 0 ]; then
+        echo -e "##### $(basename ${BASH_SOURCE[0]}) executing: \"$@\"... #####"
+    fi
+}
 # check the arguments for the KOKKOS_DEPENDENCE compiler definition
 KOKKOS_DEPENDENCE=0
 for i in ${@}
@ -23,16 +34,30 @@ do
 fi
 done
-# if C++ is not passed, someone is probably trying to invoke it directly
+# if Kokkos compiler is not passed, someone is probably trying to invoke it directly
 if [ -z "${1}" ]; then
-    echo -e "\n${BASH_SOURCE[0]} was invoked without the C++ compiler as the first argument."
+    echo -e "\n${BASH_SOURCE[0]} was invoked without the Kokkos compiler as the first argument."
     echo "This script is not indended to be directly invoked by any mechanism other"
-    echo -e "than through a RULE_LAUNCH_COMPILE or RULE_LAUNCH_LINK property set in CMake\n"
+    echo -e "than through a RULE_LAUNCH_COMPILE or RULE_LAUNCH_LINK property set in CMake.\n"
+    exit 1
+fi
+# if Kokkos compiler is not passed, someone is probably trying to invoke it directly
+if [ -z "${2}" ]; then
+    echo -e "\n${BASH_SOURCE[0]} was invoked without the C++ compiler as the second argument."
+    echo "This script is not indended to be directly invoked by any mechanism other"
+    echo -e "than through a RULE_LAUNCH_COMPILE or RULE_LAUNCH_LINK property set in CMake.\n"
     exit 1
 fi
 # if there aren't two args, this isn't necessarily invalid, just a bit strange
-if [ -z "${2}" ]; then exit 0; fi
+if [ -z "${3}" ]; then exit 0; fi
+# store the Kokkos compiler
+KOKKOS_COMPILER=${1}
+# remove the Kokkos compiler from the arguments
+shift
 # store the expected C++ compiler
 CXX_COMPILER=${1}
@ -40,48 +65,57 @@ CXX_COMPILER=${1}
 # remove the expected C++ compiler from the arguments
 shift
-# after the above shift, $1 is now the exe for the compile or link command, e.g.
-# kokkos_launch_compiler g++ gcc -c file.c -o file.o
+# NOTE: in below, ${KOKKOS_COMPILER} is usually nvcc_wrapper
+#
+# after the above shifts, $1 is now the exe for the compile or link command, e.g.
+# kokkos_launch_compiler ${KOKKOS_COMPILER} g++ gcc -c file.c -o file.o
 # becomes:
 # kokkos_launch_compiler gcc -c file.c -o file.o
-# Check to see if the executable is the C++ compiler and if it is not, then
+# We check to see if the executable is the C++ compiler and if it is not, then
 # just execute the command.
 #
 # Summary:
-# kokkos_launch_compiler g++ gcc -c file.c -o file.o
+# kokkos_launch_compiler ${KOKKOS_COMPILER} g++ gcc -c file.c -o file.o
 # results in this command being executed:
 # gcc -c file.c -o file.o
 # and
-# kokkos_launch_compiler g++ g++ -c file.cpp -o file.o
+# kokkos_launch_compiler ${KOKKOS_COMPILER} g++ g++ -c file.cpp -o file.o
 # results in this command being executed:
-# nvcc_wrapper -c file.cpp -o file.o
+# ${KOKKOS_COMPILER} -c file.cpp -o file.o
 if [[ "${KOKKOS_DEPENDENCE}" -eq "0" || "${CXX_COMPILER}" != "${1}" ]]; then
+    debug-message $@
-    # the command does not depend on Kokkos so just execute the command w/o re-directing to nvcc_wrapper
+    # the command does not depend on Kokkos so just execute the command w/o re-directing to ${KOKKOS_COMPILER}
     eval $@
 else
-    # the executable is the C++ compiler, so we need to re-direct to nvcc_wrapper
+    # the executable is the C++ compiler, so we need to re-direct to ${KOKKOS_COMPILER}
+    if [ ! -f "${KOKKOS_COMPILER}" ]; then
+        echo -e "\nError: the compiler redirect for Kokkos was not found at ${KOKKOS_COMPILER}\n"
+        exit 1
+    fi
     # find the nvcc_wrapper from the same build/install
     NVCC_WRAPPER="$(dirname ${BASH_SOURCE[0]})/nvcc_wrapper"
-    if [ -z "${NVCC_WRAPPER}" ]; then
-        echo -e "\nError: nvcc_wrapper not found in $(dirname ${BASH_SOURCE[0]}).\n"
-        exit 1
-    fi
+    if [ "${KOKKOS_COMPILER}" = "${NVCC_WRAPPER}" ]; then
+        # this should only be valid in the install tree -- it will be set to CMAKE_CXX_COMPILER used using Kokkos installation
+        if [ -z $(echo "@NVCC_WRAPPER_DEFAULT_COMPILER@" | grep 'NVCC_WRAPPER_DEFAULT_COMPILER') ]; then
+            : ${NVCC_WRAPPER_DEFAULT_COMPILER:="@NVCC_WRAPPER_DEFAULT_COMPILER@"}
+        fi
     # set default nvcc wrapper compiler if not specified
     : ${NVCC_WRAPPER_DEFAULT_COMPILER:=${CXX_COMPILER}}
     export NVCC_WRAPPER_DEFAULT_COMPILER
-    # calling itself will cause an infinitely long build
+    # nvcc_wrapper calling itself will cause an infinitely long build
     if [ "${NVCC_WRAPPER}" = "${NVCC_WRAPPER_DEFAULT_COMPILER}" ]; then
         echo -e "\nError: NVCC_WRAPPER == NVCC_WRAPPER_DEFAULT_COMPILER. Terminating to avoid infinite loop!\n"
         exit 1
     fi
+    fi
     # discard the compiler from the command
     shift
-    # execute nvcc_wrapper
-    ${NVCC_WRAPPER} $@
+    debug-message ${KOKKOS_COMPILER} $@
+    # execute ${KOKKOS_COMPILER} (again, usually nvcc_wrapper)
+    ${KOKKOS_COMPILER} $@
 fi

View File

@ -191,11 +191,11 @@ do
 shift
 ;;
 #Handle known nvcc args
---dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|-expt-relaxed-constexpr|--resource-usage|-Xptxas*|--fmad*|--Wext-lambda-captures-this|-Wext-lambda-captures-this)
+--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|-expt-relaxed-constexpr|--resource-usage|-Xptxas*|--fmad*|--use_fast_math|--Wext-lambda-captures-this|-Wext-lambda-captures-this)
 cuda_args="$cuda_args $1"
 ;;
 #Handle more known nvcc args
---expt-extended-lambda|--expt-relaxed-constexpr)
+--expt-extended-lambda|--expt-relaxed-constexpr|--Wno-deprecated-gpu-targets|-Wno-deprecated-gpu-targets)
 cuda_args="$cuda_args $1"
 ;;
 #Handle known nvcc args that have an argument

View File

@ -0,0 +1,91 @@
#----------------------------------------------------------------------------------------#
#
# CTestConfig.cmake template for Kokkos
#
#----------------------------------------------------------------------------------------#
#
# dash-board related
#
set(CTEST_PROJECT_NAME "Kokkos")
set(CTEST_NIGHTLY_START_TIME "01:00:00 UTC")
set(CTEST_DROP_METHOD "https")
set(CTEST_DROP_SITE "cdash.nersc.gov")
set(CTEST_DROP_LOCATION "/submit.php?project=${CTEST_PROJECT_NAME}")
set(CTEST_CDASH_VERSION "1.6")
set(CTEST_CDASH_QUERY_VERSION TRUE)
set(CTEST_SUBMIT_RETRY_COUNT "1")
set(CTEST_SUBMIT_RETRY_DELAY "30")
#
# configure/build related
#
set(CTEST_BUILD_NAME "@BUILD_NAME@")
set(CTEST_MODEL "@MODEL@")
set(CTEST_SITE "@SITE@")
set(CTEST_CONFIGURATION_TYPE "@BUILD_TYPE@")
set(CTEST_SOURCE_DIRECTORY "@SOURCE_REALDIR@")
set(CTEST_BINARY_DIRECTORY "@BINARY_REALDIR@")
#
# configure/build related
#
set(CTEST_UPDATE_TYPE "git")
set(CTEST_UPDATE_VERSION_ONLY ON)
# set(CTEST_GENERATOR "")
# set(CTEST_GENERATOR_PLATFORM "")
#
# testing related
#
set(CTEST_TIMEOUT "7200")
set(CTEST_TEST_TIMEOUT "7200")
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS "100")
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS "100")
set(CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE "1048576")
#
# coverage related
#
set(CTEST_CUSTOM_COVERAGE_EXCLUDE ".*tpls/.*;/usr/.*;.*unit_test/.*;.*unit_tests/.*;.*perf_test/.*")
#
# commands
#
if(NOT "@CHECKOUT_COMMAND@" STREQUAL "")
set(CTEST_CHECKOUT_COMMAND "@CHECKOUT_COMMAND@")
endif()
set(CTEST_UPDATE_COMMAND "@GIT_EXECUTABLE@")
set(CTEST_CONFIGURE_COMMAND "@CMAKE_COMMAND@ -DCMAKE_BUILD_TYPE=@BUILD_TYPE@ -DKokkos_ENABLE_TESTS=ON @CONFIG_ARGS@ @SOURCE_REALDIR@")
set(CTEST_BUILD_COMMAND "@CMAKE_COMMAND@ --build @BINARY_REALDIR@ --target @TARGET@")
if(NOT WIN32)
set(CTEST_BUILD_COMMAND "${CTEST_BUILD_COMMAND} -- -j@BUILD_JOBS@")
endif()
set(CTEST_COVERAGE_COMMAND "gcov")
set(CTEST_MEMORYCHECK_COMMAND "valgrind")
set(CTEST_GIT_COMMAND "@GIT_EXECUTABLE@")
#
# various configs
#
set(APPEND_VALUE @APPEND@)
if(APPEND_VALUE)
set(APPEND_CTEST APPEND)
endif()
macro(SET_TEST_PROP VAR)
if(NOT "${ARGS}" STREQUAL "")
set(${VAR}_CTEST ${VAR} ${ARGN})
endif()
endmacro()
set_test_prop(START @START@)
set_test_prop(END @END@)
set_test_prop(STRIDE @STRIDE@)
set_test_prop(INCLUDE @INCLUDE@)
set_test_prop(EXCLUDE @EXCLUDE@)
set_test_prop(INCLUDE_LABEL @INCLUDE_LABEL@)
set_test_prop(EXCLUDE_LABEL @EXCLUDE_LABEL@)
set_test_prop(PARALLEL_LEVEL @PARALLEL_LEVEL@)
set_test_prop(STOP_TIME @STOP_TIME@)
set_test_prop(COVERAGE_LABELS @LABELS@)

View File

@ -0,0 +1,350 @@
cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
message(STATUS "")
get_cmake_property(_cached_vars CACHE_VARIABLES)
set(KOKKOS_CMAKE_ARGS)
set(EXCLUDED_VARIABLES "CMAKE_COMMAND" "CMAKE_CPACK_COMMAND" "CMAKE_CTEST_COMMAND" "CMAKE_ROOT"
"CTEST_ARGS" "BUILD_NAME" "CMAKE_CXX_FLAGS" "CMAKE_BUILD_TYPE")
list(SORT _cached_vars)
foreach(_var ${_cached_vars})
if(NOT "${_var}" IN_LIST EXCLUDED_VARIABLES)
list(APPEND KOKKOS_CMAKE_ARGS ${_var})
if("${_var}" STREQUAL "CMAKE_BUILD_TYPE")
set(BUILD_TYPE "${CMAKE_BUILD_TYPE}")
endif()
endif()
endforeach()
#----------------------------------------------------------------------------------------#
#
# Macros and variables
#
#----------------------------------------------------------------------------------------#
macro(CHECK_REQUIRED VAR)
if(NOT DEFINED ${VAR})
message(FATAL_ERROR "Error! Variable '${VAR}' must be defined")
endif()
endmacro()
# require the build name variable
CHECK_REQUIRED(BUILD_NAME)
# uses all args
macro(SET_DEFAULT VAR)
if(NOT DEFINED ${VAR})
set(${VAR} ${ARGN})
endif()
# remove these ctest configuration variables from the defines
# passed to the Kokkos configuration
if("${VAR}" IN_LIST KOKKOS_CMAKE_ARGS)
list(REMOVE_ITEM KOKKOS_CMAKE_ARGS "${VAR}")
endif()
endmacro()
# uses first arg -- useful for selecting via priority from multiple
# potentially defined variables, e.g.:
#
# set_default_arg1(BUILD_NAME ${TRAVIS_BUILD_NAME} ${BUILD_NAME})
#
macro(SET_DEFAULT_ARG1 VAR)
if(NOT DEFINED ${VAR})
foreach(_ARG ${ARGN})
if(NOT "${_ARG}" STREQUAL "")
set(${VAR} ${_ARG})
break()
endif()
endforeach()
endif()
# remove these ctest configuration variables from the defines
# passed to the Kokkos configuration
if("${VAR}" IN_LIST KOKKOS_CMAKE_ARGS)
list(REMOVE_ITEM KOKKOS_CMAKE_ARGS "${VAR}")
endif()
endmacro()
# determine the default working directory
if(NOT "$ENV{WORKSPACE}" STREQUAL "")
set(WORKING_DIR "$ENV{WORKSPACE}")
else()
get_filename_component(WORKING_DIR ${CMAKE_CURRENT_LIST_DIR} DIRECTORY)
endif()
# determine the hostname
execute_process(COMMAND hostname
OUTPUT_VARIABLE HOSTNAME
OUTPUT_STRIP_TRAILING_WHITESPACE)
SET_DEFAULT(HOSTNAME "$ENV{HOSTNAME}")
# get the number of processors
include(ProcessorCount)
ProcessorCount(NUM_PROCESSORS)
# find git
find_package(Git QUIET)
if(NOT GIT_EXECUTABLE)
unset(GIT_EXECUTABLE CACHE)
unset(GIT_EXECUTABLE)
endif()
function(EXECUTE_GIT_COMMAND VAR)
set(${VAR} "" PARENT_SCOPE)
execute_process(COMMAND ${GIT_EXECUTABLE} ${ARGN}
OUTPUT_VARIABLE VAL
RESULT_VARIABLE RET
OUTPUT_STRIP_TRAILING_WHITESPACE
WORKING_DIRECTORY ${CMAKE_CURRENT_LIST_DIR}
ERROR_QUIET)
string(REPLACE ";" " " _CMD "${GIT_EXECUTABLE} ${ARGN}")
set(LAST_GIT_COMMAND "${_CMD}" PARENT_SCOPE)
if(RET EQUAL 0)
set(${VAR} "${VAL}" PARENT_SCOPE)
endif()
endfunction()
# just gets the git branch name if available
function(GET_GIT_BRANCH_NAME VAR)
execute_git_command(GIT_BRANCH branch --show-current)
set(_INVALID "%D" "HEAD")
if(NOT GIT_BRANCH OR "${GIT_BRANCH}" IN_LIST _INVALID)
execute_git_command(GIT_BRANCH show -s --format=%D)
if(NOT GIT_BRANCH OR "${GIT_BRANCH}" IN_LIST _INVALID)
            execute_git_command(GIT_BRANCH describe --all)
endif()
endif()
#
if(GIT_BRANCH)
string(REPLACE " " ";" _DESC "${GIT_BRANCH}")
        # take the last entry via a loop instead of awkward CMake list-index manipulation
foreach(_ITR ${_DESC})
set(GIT_BRANCH "${_ITR}")
endforeach()
set(${VAR} "${GIT_BRANCH}" PARENT_SCOPE)
message(STATUS "GIT BRANCH via '${LAST_GIT_COMMAND}': ${GIT_BRANCH}")
endif()
endfunction()
# just gets the git author name if available
function(GET_GIT_AUTHOR_NAME VAR)
execute_git_command(GIT_AUTHOR show -s --format=%an)
if(GIT_AUTHOR)
string(LENGTH "${GIT_AUTHOR}" STRLEN)
# if the build name gets too long, this can cause submission errors
if(STRLEN GREATER 24)
# remove middle initial
string(REGEX REPLACE " [A-Z]\. " " " GIT_AUTHOR "${GIT_AUTHOR}")
            # get the first name and surname
string(REGEX REPLACE "([A-Za-z]+) ([A-Za-z]+)" "\\1" F_NAME "${GIT_AUTHOR}")
string(REGEX REPLACE "([A-Za-z]+) ([A-Za-z]+)" "\\2" S_NAME "${GIT_AUTHOR}")
if(S_NAME)
set(GIT_AUTHOR "${S_NAME}")
elseif(F_NAME)
set(GIT_AUTHOR "${F_NAME}")
endif()
endif()
# remove any spaces, quotes, periods, etc.
string(REGEX REPLACE "[ ',;_\.\"]+" "" GIT_AUTHOR "${GIT_AUTHOR}")
set(${VAR} "${GIT_AUTHOR}" PARENT_SCOPE)
message(STATUS "GIT AUTHOR via '${LAST_GIT_COMMAND}': ${GIT_AUTHOR}")
endif()
endfunction()
# get the name of the branch
GET_GIT_BRANCH_NAME(GIT_BRANCH)
# get the name of the author
GET_GIT_AUTHOR_NAME(GIT_AUTHOR)
# author, prefer git method for consistency
SET_DEFAULT_ARG1(AUTHOR ${GIT_AUTHOR} $ENV{GIT_AUTHOR} $ENV{AUTHOR})
# SLUG == owner_name/repo_name
SET_DEFAULT_ARG1(SLUG $ENV{TRAVIS_PULL_REQUEST_SLUG} $ENV{TRAVIS_REPO_SLUG} $ENV{APPVEYOR_REPO_NAME} $ENV{PULL_REQUEST_SLUG} $ENV{REPO_SLUG})
# branch name
SET_DEFAULT_ARG1(BRANCH $ENV{TRAVIS_PULL_REQUEST_BRANCH} $ENV{TRAVIS_BRANCH} $ENV{APPVEYOR_PULL_REQUEST_HEAD_REPO_BRANCH} $ENV{APPVEYOR_REPO_BRANCH} $ENV{GIT_BRANCH} $ENV{BRANCH_NAME} $ENV{BRANCH} ${GIT_BRANCH})
# pull request number
SET_DEFAULT_ARG1(PULL_REQUEST_NUM $ENV{TRAVIS_PULL_REQUEST} $ENV{CHANGE_ID} $ENV{APPVEYOR_PULL_REQUEST_NUMBER} $ENV{PULL_REQUEST_NUM})
# get the event type, e.g. push, pull_request, api, cron, etc.
SET_DEFAULT_ARG1(EVENT_TYPE $ENV{TRAVIS_EVENT_TYPE} ${EVENT_TYPE})
if("${BRANCH}" STREQUAL "")
message(STATUS "Checked: environment variables for Travis, Appveyor, Jenkins (git plugin), BRANCH_NAME, BRANCH and 'git branch --show-current'")
message(FATAL_ERROR "Error! Git branch could not be determined. Please provide -DBRANCH=<name>")
endif()
#----------------------------------------------------------------------------------------#
#
# Set default values if not provided on command-line
#
#----------------------------------------------------------------------------------------#
SET_DEFAULT(SOURCE_DIR "${WORKING_DIR}") # source directory
SET_DEFAULT(BINARY_DIR "${WORKING_DIR}/build") # build directory
SET_DEFAULT(BUILD_TYPE "${CMAKE_BUILD_TYPE}") # Release, Debug, etc.
SET_DEFAULT(MODEL "Continuous") # Continuous, Nightly, or Experimental
SET_DEFAULT(JOBS 1) # number of parallel ctests
SET_DEFAULT(CTEST_COMMAND "${CMAKE_CTEST_COMMAND}") # just in case
SET_DEFAULT(CTEST_ARGS "-V --output-on-failure") # extra arguments when ctest is called
SET_DEFAULT(GIT_EXECUTABLE "git") # ctest_update
SET_DEFAULT(TARGET "all") # build target
SET_DEFAULT_ARG1(SITE "$ENV{SITE}"
"${HOSTNAME}") # update site
SET_DEFAULT_ARG1(BUILD_JOBS "$ENV{BUILD_JOBS}"
"${NUM_PROCESSORS}") # number of parallel compile jobs
#
# The variables below correspond to ctest arguments, e.g. START, END, and STRIDE
# map to '-I START,END,STRIDE'
#
SET_DEFAULT(START "")
SET_DEFAULT(END "")
SET_DEFAULT(STRIDE "")
SET_DEFAULT(INCLUDE "")
SET_DEFAULT(EXCLUDE "")
SET_DEFAULT(INCLUDE_LABEL "")
SET_DEFAULT(EXCLUDE_LABEL "")
SET_DEFAULT(PARALLEL_LEVEL "")
SET_DEFAULT(STOP_TIME "")
SET_DEFAULT(LABELS "")
SET_DEFAULT(NOTES "")
# default static build tag for Nightly
set(BUILD_TAG "${BRANCH}")
if(NOT BUILD_TYPE)
# default for kokkos if not specified
set(BUILD_TYPE "RelWithDebInfo")
endif()
# generate dynamic name if continuous or experimental model
if(NOT "${MODEL}" STREQUAL "Nightly")
if(EVENT_TYPE AND PULL_REQUEST_NUM)
# e.g. pull_request/123
if(AUTHOR)
set(BUILD_TAG "${AUTHOR}/${EVENT_TYPE}/${PULL_REQUEST_NUM}")
else()
set(BUILD_TAG "${EVENT_TYPE}/${PULL_REQUEST_NUM}")
endif()
elseif(SLUG)
# e.g. owner_name/repo_name
set(BUILD_TAG "${SLUG}")
elseif(AUTHOR)
set(BUILD_TAG "${AUTHOR}/${BRANCH}")
endif()
if(EVENT_TYPE AND NOT PULL_REQUEST_NUM)
set(BUILD_TAG "${BUILD_TAG}-${EVENT_TYPE}")
endif()
endif()
# strip unnecessary remote-tracking prefixes from the build tag
string(REPLACE "/remotes/" "/" BUILD_TAG "${BUILD_TAG}")
string(REPLACE "/origin/" "/" BUILD_TAG "${BUILD_TAG}")
message(STATUS "BUILD_TAG: ${BUILD_TAG}")
set(BUILD_NAME "[${BUILD_TAG}] [${BUILD_NAME}-${BUILD_TYPE}]")
# colons in build name create extra (empty) entries in CDash
string(REPLACE ":" "-" BUILD_NAME "${BUILD_NAME}")
# drop the redundant '/merge' suffix appended by some CI providers
string(REPLACE "/merge]" "]" BUILD_NAME "${BUILD_NAME}")
# normalize pull-request naming for consistency
string(REPLACE "/pr/" "/pull/" BUILD_NAME "${BUILD_NAME}")
string(REPLACE "pull_request/" "pull/" BUILD_NAME "${BUILD_NAME}")
# clean up artifacts left by missing fields
string(REPLACE "--" "-" BUILD_NAME "${BUILD_NAME}")
string(REPLACE "-]" "]" BUILD_NAME "${BUILD_NAME}")
# check binary directory
if(EXISTS ${BINARY_DIR})
if(NOT IS_DIRECTORY "${BINARY_DIR}")
message(FATAL_ERROR "Error! '${BINARY_DIR}' already exists and is not a directory!")
endif()
file(GLOB BINARY_DIR_FILES "${BINARY_DIR}/*")
if(NOT "${BINARY_DIR_FILES}" STREQUAL "")
message(FATAL_ERROR "Error! '${BINARY_DIR}' already exists and is not empty!")
endif()
endif()
get_filename_component(SOURCE_REALDIR ${SOURCE_DIR} REALPATH)
get_filename_component(BINARY_REALDIR ${BINARY_DIR} REALPATH)
#----------------------------------------------------------------------------------------#
#
# Generate the CTestConfig.cmake
#
#----------------------------------------------------------------------------------------#
set(CONFIG_ARGS)
foreach(_ARG ${KOKKOS_CMAKE_ARGS})
if(NOT "${${_ARG}}" STREQUAL "")
get_property(_ARG_TYPE CACHE ${_ARG} PROPERTY TYPE)
if("${_ARG_TYPE}" STREQUAL "UNINITIALIZED")
if("${${_ARG}}" STREQUAL "ON" OR "${${_ARG}}" STREQUAL "OFF")
set(_ARG_TYPE "BOOL")
elseif(EXISTS "${${_ARG}}" AND NOT IS_DIRECTORY "${${_ARG}}")
set(_ARG_TYPE "FILEPATH")
elseif(EXISTS "${${_ARG}}" AND IS_DIRECTORY "${${_ARG}}")
set(_ARG_TYPE "PATH")
elseif(NOT "${${_ARG}}" STREQUAL "")
set(_ARG_TYPE "STRING")
endif()
endif()
set(CONFIG_ARGS "${CONFIG_ARGS}set(${_ARG} \"${${_ARG}}\" CACHE ${_ARG_TYPE} \"\")\n")
endif()
endforeach()
file(WRITE ${BINARY_REALDIR}/initial-cache.cmake
"
set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS}\" CACHE STRING \"\")
${CONFIG_ARGS}
")
file(READ ${BINARY_REALDIR}/initial-cache.cmake _CACHE_INFO)
message(STATUS "Initial cache:\n${_CACHE_INFO}")
# initialize the cache
set(CONFIG_ARGS "-C ${BINARY_REALDIR}/initial-cache.cmake")
# generate the CTestConfig.cmake
configure_file(
${CMAKE_CURRENT_LIST_DIR}/CTestConfig.cmake.in
${BINARY_REALDIR}/CTestConfig.cmake
@ONLY)
# copy/generate the dashboard script
configure_file(
${CMAKE_CURRENT_LIST_DIR}/KokkosCTest.cmake.in
${BINARY_REALDIR}/KokkosCTest.cmake
@ONLY)
# custom CTest settings go in ${BINARY_DIR}/CTestCustom.cmake
execute_process(
COMMAND ${CMAKE_COMMAND} -E touch CTestCustom.cmake
WORKING_DIRECTORY ${BINARY_REALDIR}
)
#----------------------------------------------------------------------------------------#
#
# Execute CTest
#
#----------------------------------------------------------------------------------------#
message(STATUS "")
message(STATUS "BUILD_NAME: ${BUILD_NAME}")
message(STATUS "Executing '${CTEST_COMMAND} -S KokkosCTest.cmake ${CTEST_ARGS}'...")
message(STATUS "")
# e.g. -DCTEST_ARGS="--output-on-failure -VV" should really be -DCTEST_ARGS="--output-on-failure;-VV"
string(REPLACE " " ";" CTEST_ARGS "${CTEST_ARGS}")
execute_process(
COMMAND ${CTEST_COMMAND} -S KokkosCTest.cmake ${CTEST_ARGS}
RESULT_VARIABLE RET
WORKING_DIRECTORY ${BINARY_REALDIR}
)
# ensure that any non-zero result variable gets propagated
if(NOT RET EQUAL 0)
message(FATAL_ERROR "CTest return non-zero exit code: ${RET}")
endif()
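
As a hedged usage sketch (the driver's file name and the option values below are assumptions, not stated in this changeset), the script above is meant to be run in CMake script mode, with -D definitions supplying the build name, any Kokkos options to forward, and optional CTest arguments:

    # hypothetical invocation, assuming the driver above is saved as cmake/KokkosCI.cmake
    cmake -DBUILD_NAME=gcc-openmp \
          -DCMAKE_BUILD_TYPE=Release \
          -DKokkos_ENABLE_OPENMP=ON \
          -DCTEST_ARGS="--output-on-failure;-VV" \
          -P cmake/KokkosCI.cmake

Cache definitions not consumed by the driver itself (such as Kokkos_ENABLE_OPENMP above) are forwarded to the Kokkos configure step through the generated initial-cache.cmake.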

View File

@ -0,0 +1,261 @@
cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
if(EXISTS "${CMAKE_CURRENT_LIST_DIR}/CTestConfig.cmake")
include("${CMAKE_CURRENT_LIST_DIR}/CTestConfig.cmake")
endif()
include(ProcessorCount)
ProcessorCount(CTEST_PROCESSOR_COUNT)
cmake_policy(SET CMP0009 NEW)
cmake_policy(SET CMP0011 NEW)
# ---------------------------------------------------------------------------- #
# -- Commands
# ---------------------------------------------------------------------------- #
find_program(CTEST_CMAKE_COMMAND NAMES cmake)
find_program(CTEST_UNAME_COMMAND NAMES uname)
find_program(CTEST_BZR_COMMAND NAMES bzr)
find_program(CTEST_CVS_COMMAND NAMES cvs)
find_program(CTEST_GIT_COMMAND NAMES git)
find_program(CTEST_HG_COMMAND NAMES hg)
find_program(CTEST_P4_COMMAND NAMES p4)
find_program(CTEST_SVN_COMMAND NAMES svn)
find_program(VALGRIND_COMMAND NAMES valgrind)
find_program(GCOV_COMMAND NAMES gcov)
find_program(LCOV_COMMAND NAMES llvm-cov)
find_program(MEMORYCHECK_COMMAND NAMES valgrind )
set(MEMORYCHECK_TYPE Valgrind)
# set(MEMORYCHECK_TYPE Purify)
# set(MEMORYCHECK_TYPE BoundsChecker)
# set(MEMORYCHECK_TYPE ThreadSanitizer)
# set(MEMORYCHECK_TYPE AddressSanitizer)
# set(MEMORYCHECK_TYPE LeakSanitizer)
# set(MEMORYCHECK_TYPE MemorySanitizer)
# set(MEMORYCHECK_TYPE UndefinedBehaviorSanitizer)
set(MEMORYCHECK_COMMAND_OPTIONS "--trace-children=yes --leak-check=full")
# ---------------------------------------------------------------------------- #
# -- Settings
# ---------------------------------------------------------------------------- #
## -- Process timeout in seconds
set(CTEST_TIMEOUT "7200")
## -- Set output to English
set(ENV{LC_MESSAGES} "en_EN" )
# ---------------------------------------------------------------------------- #
# -- Copy ctest configuration file
# ---------------------------------------------------------------------------- #
macro(COPY_CTEST_CONFIG_FILES)
foreach(_FILE CTestConfig.cmake CTestCustom.cmake)
        # if neither the current list directory nor the source directory is the binary directory
if(NOT "${CMAKE_CURRENT_LIST_DIR}" STREQUAL "${CTEST_BINARY_DIRECTORY}" AND
NOT "${CTEST_SOURCE_DIRECTORY}" STREQUAL "${CTEST_BINARY_DIRECTORY}")
# if file exists in current directory
if(EXISTS ${CMAKE_CURRENT_LIST_DIR}/${_FILE})
configure_file(${CMAKE_CURRENT_LIST_DIR}/${_FILE}
${CTEST_BINARY_DIRECTORY}/${_FILE} COPYONLY)
endif()
# if source and binary differ
elseif(NOT "${CTEST_SOURCE_DIRECTORY}" STREQUAL "${CTEST_BINARY_DIRECTORY}")
# if file exists in source directory but not in binary directory
if(EXISTS ${CTEST_SOURCE_DIRECTORY}/${_FILE} AND
NOT EXISTS ${CTEST_BINARY_DIRECTORY}/${_FILE})
configure_file(${CTEST_SOURCE_DIRECTORY}/${_FILE}
${CTEST_BINARY_DIRECTORY}/${_FILE} COPYONLY)
endif()
endif()
endforeach()
endmacro()
ctest_read_custom_files("${CMAKE_CURRENT_LIST_DIR}")
message(STATUS "CTEST_MODEL: ${CTEST_MODEL}")
#-------------------------------------------------------------------------#
# Start
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running START_CTEST stage...")
message(STATUS "")
ctest_start(${CTEST_MODEL} TRACK ${CTEST_MODEL} ${APPEND_CTEST}
${CTEST_SOURCE_DIRECTORY} ${CTEST_BINARY_DIRECTORY})
#-------------------------------------------------------------------------#
# Config
#
copy_ctest_config_files()
ctest_read_custom_files("${CTEST_BINARY_DIRECTORY}")
#-------------------------------------------------------------------------#
# Update
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_UPDATE stage...")
message(STATUS "")
ctest_update(SOURCE "${CTEST_SOURCE_DIRECTORY}"
RETURN_VALUE up_ret)
#-------------------------------------------------------------------------#
# Configure
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_CONFIGURE stage...")
message(STATUS "")
ctest_configure(BUILD "${CTEST_BINARY_DIRECTORY}"
SOURCE ${CTEST_SOURCE_DIRECTORY}
${APPEND_CTEST}
OPTIONS "${CTEST_CONFIGURE_OPTIONS}"
RETURN_VALUE config_ret)
#-------------------------------------------------------------------------#
# Echo the configure log to aid debugging of dashboard failures
#
file(GLOB _configure_log "${CTEST_BINARY_DIRECTORY}/Testing/Temporary/LastConfigure*.log")
# should only have one but loop just for safety
foreach(_LOG ${_configure_log})
file(READ ${_LOG} _LOG_MESSAGE)
message(STATUS "Configure Log: ${_LOG}")
message(STATUS "\n${_LOG_MESSAGE}\n")
endforeach()
#-------------------------------------------------------------------------#
# Build
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_BUILD stage...")
message(STATUS "")
ctest_build(BUILD "${CTEST_BINARY_DIRECTORY}"
${APPEND_CTEST}
RETURN_VALUE build_ret)
#-------------------------------------------------------------------------#
# Echo the build log to aid debugging of dashboard failures
#
file(GLOB _build_log "${CTEST_BINARY_DIRECTORY}/Testing/Temporary/LastBuild*.log")
# should only have one but loop just for safety
foreach(_LOG ${_build_log})
file(READ ${_LOG} _LOG_MESSAGE)
message(STATUS "Build Log: ${_LOG}")
message(STATUS "\n${_LOG_MESSAGE}\n")
endforeach()
#-------------------------------------------------------------------------#
# Test
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_TEST stage...")
message(STATUS "")
ctest_test(RETURN_VALUE test_ret
${APPEND_CTEST}
${START_CTEST}
${END_CTEST}
${STRIDE_CTEST}
${INCLUDE_CTEST}
${EXCLUDE_CTEST}
${INCLUDE_LABEL_CTEST}
${EXCLUDE_LABEL_CTEST}
${PARALLEL_LEVEL_CTEST}
${STOP_TIME_CTEST}
SCHEDULE_RANDOM OFF)
#-------------------------------------------------------------------------#
# Coverage
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_COVERAGE stage...")
message(STATUS "")
execute_process(COMMAND ${CTEST_COVERAGE_COMMAND} ${CTEST_COVERAGE_EXTRA_FLAGS}
WORKING_DIRECTORY ${CTEST_BINARY_DIRECTORY}
ERROR_QUIET)
ctest_coverage(${APPEND_CTEST}
${CTEST_COVERAGE_LABELS}
RETURN_VALUE cov_ret)
#-------------------------------------------------------------------------#
# MemCheck
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_MEMCHECK stage...")
message(STATUS "")
ctest_memcheck(RETURN_VALUE mem_ret
${APPEND_CTEST}
${START_CTEST}
${END_CTEST}
${STRIDE_CTEST}
${INCLUDE_CTEST}
${EXCLUDE_CTEST}
${INCLUDE_LABEL_CTEST}
${EXCLUDE_LABEL_CTEST}
${PARALLEL_LEVEL_CTEST})
#-------------------------------------------------------------------------#
# Submit
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Running CTEST_SUBMIT stage...")
message(STATUS "")
file(GLOB_RECURSE NOTE_FILES "${CTEST_BINARY_DIRECTORY}/*CTestNotes.cmake")
foreach(_FILE ${NOTE_FILES})
message(STATUS "Including CTest notes files: \"${_FILE}\"...")
include("${_FILE}")
endforeach()
# capture the submit error so the script does not abort on a submission failure
ctest_submit(RETURN_VALUE submit_ret
RETRY_COUNT 2
RETRY_DELAY 10
CAPTURE_CMAKE_ERROR submit_err)
#-------------------------------------------------------------------------#
# Done
#
message(STATUS "")
message(STATUS "[${CTEST_BUILD_NAME}] Finished ${CTEST_MODEL} Stages (${STAGES})")
message(STATUS "")
#-------------------------------------------------------------------------#
# Non-zero exit codes for important errors
#
if(NOT config_ret EQUAL 0)
message(FATAL_ERROR "Error during configuration! Exit code: ${config_ret}")
endif()
if(NOT build_ret EQUAL 0)
message(FATAL_ERROR "Error during build! Exit code: ${build_ret}")
endif()
if(NOT test_ret EQUAL 0)
message(FATAL_ERROR "Error during testing! Exit code: ${test_ret}")
endif()

View File

@ -19,17 +19,44 @@ INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake") INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
UNSET(Kokkos_CMAKE_DIR) UNSET(Kokkos_CMAKE_DIR)
# if CUDA was enabled and separable compilation was specified, e.g. # check for conflicts
# find_package(Kokkos COMPONENTS separable_compilation) IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS AND
# then we set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
IF(@Kokkos_ENABLE_CUDA@ AND NOT "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS) MESSAGE(STATUS "'launch_compiler' implies global redirection of targets depending on Kokkos to appropriate compiler.")
MESSAGE(STATUS "'separable_compilation' implies explicitly defining where redirection occurs via 'kokkos_compilation(PROJECT|TARGET|SOURCE|DIRECTORY ...)'")
MESSAGE(FATAL_ERROR "Conflicting COMPONENTS: 'launch_compiler' and 'separable_compilation'")
ENDIF()
IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if find_package(Kokkos COMPONENTS launch_compiler) then rely on the
# RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK to always redirect to the
# appropriate compiler for Kokkos
#
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(
GLOBAL
CHECK_CUDA_COMPILES)
ELSEIF(@Kokkos_ENABLE_CUDA@ AND NOT "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if CUDA was enabled, separable compilation was not specified, and current compiler
# cannot compile CUDA, then set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK globally and
# kokkos_launch_compiler will re-direct to the compiler used to compile CUDA code during installation.
# kokkos_launch_compiler will re-direct if ${CMAKE_CXX_COMPILER} and -DKOKKOS_DEPENDENCE is present,
# otherwise, the original command will be executed
#
# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper # run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER}) kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})
# if not nvcc_wrapper, use RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK
IF(NOT IS_NVCC AND NOT CMAKE_CXX_COMPILER_ID STREQUAL Clang AND # if not nvcc_wrapper and Kokkos_LAUNCH_COMPILER was not set to OFF
(NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER)) IF(NOT IS_NVCC AND (NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to nvcc_wrapper") MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(GLOBAL) kokkos_compilation(GLOBAL)
ENDIF() ENDIF()
UNSET(IS_NVCC) # be mindful of the environment, pollution is bad
# be mindful of the environment, pollution is bad
UNSET(IS_NVCC)
ENDIF() ENDIF()

View File

@ -3,6 +3,7 @@ SET(Kokkos_OPTIONS @KOKKOS_ENABLED_OPTIONS@)
SET(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@) SET(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@)
SET(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@) SET(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@)
SET(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@") SET(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@")
SET(Kokkos_CXX_COMPILER_ID "@KOKKOS_CXX_COMPILER_ID@")
# These are needed by KokkosKernels # These are needed by KokkosKernels
FOREACH(DEV ${Kokkos_DEVICES}) FOREACH(DEV ${Kokkos_DEVICES})
@ -13,7 +14,7 @@ IF(NOT Kokkos_FIND_QUIETLY)
MESSAGE(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}") MESSAGE(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}")
ENDIF() ENDIF()
IF (Kokkos_ENABLE_CUDA AND ${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.14.0") IF (Kokkos_ENABLE_CUDA)
# If we are building CUDA, we have tricked CMake because we declare a CXX project # If we are building CUDA, we have tricked CMake because we declare a CXX project
# If the default C++ standard for a given compiler matches the requested # If the default C++ standard for a given compiler matches the requested
# standard, then CMake just omits the -std flag in later versions of CMake # standard, then CMake just omits the -std flag in later versions of CMake
@ -90,52 +91,6 @@ function(kokkos_check)
endif() endif()
endfunction() endfunction()
# this function is provided to easily select which files use nvcc_wrapper:
#
# GLOBAL --> all files
# TARGET --> all files in a target
# SOURCE --> specific source files
# DIRECTORY --> all files in directory
# PROJECT --> all files/targets in a project/subproject
#
FUNCTION(kokkos_compilation)
CMAKE_PARSE_ARGUMENTS(COMP "GLOBAL;PROJECT" "" "DIRECTORY;TARGET;SOURCE" ${ARGN})
# search relative first and then absolute
SET(_HINTS "${CMAKE_CURRENT_LIST_DIR}/../.." "@CMAKE_INSTALL_PREFIX@")
# find kokkos_launch_compiler
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
IF(NOT Kokkos_COMPILE_LAUNCHER)
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
ENDIF()
IF(COMP_GLOBAL)
# if global, don't bother setting others
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
ELSE()
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
UNSET(COMP_${_TYPE})
ENDIF()
# set the properties if defined
IF(COMP_${_TYPE})
# MESSAGE(STATUS "Using nvcc_wrapper :: ${_TYPE} :: ${COMP_${_TYPE}}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
ENDIF()
ENDFOREACH()
ENDIF()
ENDFUNCTION()
# A test to check whether a downstream project set the C++ compiler to NVCC or not # A test to check whether a downstream project set the C++ compiler to NVCC or not
# this is called only when Kokkos was installed with Kokkos_ENABLE_CUDA=ON # this is called only when Kokkos was installed with Kokkos_ENABLE_CUDA=ON
FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER) FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER)
@ -159,3 +114,161 @@ FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER)
ENDIF() ENDIF()
ENDFUNCTION() ENDFUNCTION()
# this function checks whether the current CXX compiler supports building CUDA
FUNCTION(kokkos_cxx_compiler_cuda_test _VAR _COMPILER)
FILE(WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
"
#include <cuda.h>
#include <cstdlib>
__global__
void kernel(int sz, double* data)
{
int _beg = blockIdx.x * blockDim.x + threadIdx.x;
for(int i = _beg; i < sz; ++i)
data[i] += static_cast<double>(i);
}
int main()
{
double* data = NULL;
int blocks = 64;
int grids = 64;
int ret = cudaMalloc(&data, blocks * grids * sizeof(double));
if(ret != cudaSuccess)
return EXIT_FAILURE;
kernel<<<grids, blocks>>>(blocks * grids, data);
cudaDeviceSynchronize();
return EXIT_SUCCESS;
}
")
# save the command for debugging
SET(_COMMANDS "${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
# use execute_process instead of try compile because we want to set custom compiler
EXECUTE_PROCESS(COMMAND ${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)
IF(NOT _RET EQUAL 0)
# save the command for debugging
SET(_COMMANDS "${_COMMAND}\n${_COMPILER} --cuda-gpu-arch=sm_35 ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
# try the compile test again with clang arguments
EXECUTE_PROCESS(COMMAND ${_COMPILER} --cuda-gpu-arch=sm_35 -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)
ENDIF()
SET(${_VAR}_COMMANDS "${_COMMANDS}" PARENT_SCOPE)
SET(${_VAR} ${_RET} PARENT_SCOPE)
ENDFUNCTION()
# this function is provided to easily select which files use the same compiler as Kokkos
# when it was installed (or nvcc_wrapper):
#
# GLOBAL --> all files
# TARGET --> all files in a target
# SOURCE --> specific source files
# DIRECTORY --> all files in directory
# PROJECT --> all files/targets in a project/subproject
#
# Use the COMPILER argument to specify a compiler, if needed. By default, it uses
# ${Kokkos_CXX_COMPILER}, unless Kokkos_ENABLE_CUDA=ON and Kokkos_CXX_COMPILER_ID
# is NVIDIA, in which case it uses nvcc_wrapper
#
# Use CHECK_CUDA_COMPILES to run a check when CUDA is enabled
#
FUNCTION(kokkos_compilation)
CMAKE_PARSE_ARGUMENTS(COMP
"GLOBAL;PROJECT;CHECK_CUDA_COMPILES"
"COMPILER"
"DIRECTORY;TARGET;SOURCE;COMMAND_PREFIX"
${ARGN})
    # if Kokkos was built without CUDA support, this is essentially a no-op
SET(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)
# search relative first and then absolute
SET(_HINTS "${CMAKE_CURRENT_LIST_DIR}/../.." "@CMAKE_INSTALL_PREFIX@")
# find kokkos_launch_compiler
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
IF(NOT Kokkos_COMPILE_LAUNCHER)
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
ENDIF()
# if COMPILER was not specified, assume Kokkos_CXX_COMPILER
IF(NOT COMP_COMPILER)
SET(COMP_COMPILER ${Kokkos_CXX_COMPILER})
IF(_Kokkos_ENABLE_CUDA AND Kokkos_CXX_COMPILER_ID STREQUAL NVIDIA)
# find nvcc_wrapper
FIND_PROGRAM(Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
            # fatal if we can't find nvcc_wrapper
IF(NOT Kokkos_NVCC_WRAPPER)
MESSAGE(FATAL_ERROR "Kokkos could not find nvcc_wrapper. Please set '-DKokkos_NVCC_WRAPPER=/path/to/nvcc_wrapper'")
ENDIF()
SET(COMP_COMPILER ${Kokkos_NVCC_WRAPPER})
ENDIF()
ENDIF()
# check that the original compiler still exists!
IF(NOT EXISTS ${COMP_COMPILER})
MESSAGE(FATAL_ERROR "Kokkos could not find original compiler: '${COMP_COMPILER}'")
ENDIF()
# try to ensure that compiling cuda code works!
IF(_Kokkos_ENABLE_CUDA AND COMP_CHECK_CUDA_COMPILES)
        # this may fail if kokkos_launch_compiler was used during install
kokkos_cxx_compiler_cuda_test(_COMPILES_CUDA
${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER})
# if above failed, throw an error
IF(NOT _COMPILES_CUDA)
MESSAGE(FATAL_ERROR "kokkos_cxx_compiler_cuda_test failed! Test commands:\n${_COMPILES_CUDA_COMMANDS}")
ENDIF()
ENDIF()
IF(COMP_COMMAND_PREFIX)
SET(_PREFIX "${COMP_COMMAND_PREFIX}")
STRING(REPLACE ";" " " _PREFIX "${COMP_COMMAND_PREFIX}")
SET(Kokkos_COMPILER_LAUNCHER "${_PREFIX} ${Kokkos_COMPILE_LAUNCHER}")
ENDIF()
IF(COMP_GLOBAL)
# if global, don't bother setting others
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ELSE()
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
UNSET(COMP_${_TYPE})
ENDIF()
# set the properties if defined
IF(COMP_${_TYPE})
# MESSAGE(STATUS "Using ${COMP_COMPILER} :: ${_TYPE} :: ${COMP_${_TYPE}}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ENDIF()
ENDFOREACH()
ENDIF()
ENDFUNCTION()
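
For downstream use, a hedged sketch (project and target names are made up): requesting the separable_compilation component suppresses the global redirection, after which the function above can scope the redirection to a single target; Kokkos::kokkos is the usual exported target name.

    # hypothetical downstream CMakeLists.txt fragment
    find_package(Kokkos REQUIRED COMPONENTS separable_compilation)
    add_executable(my_app main.cpp)
    target_link_libraries(my_app PRIVATE Kokkos::kokkos)
    # route only this target's compile/link commands through kokkos_launch_compiler
    kokkos_compilation(TARGET my_app)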

View File

@ -78,6 +78,7 @@
#cmakedefine KOKKOS_ARCH_POWER7 #cmakedefine KOKKOS_ARCH_POWER7
#cmakedefine KOKKOS_ARCH_POWER8 #cmakedefine KOKKOS_ARCH_POWER8
#cmakedefine KOKKOS_ARCH_POWER9 #cmakedefine KOKKOS_ARCH_POWER9
#cmakedefine KOKKOS_ARCH_INTEL_GEN
#cmakedefine KOKKOS_ARCH_KEPLER #cmakedefine KOKKOS_ARCH_KEPLER
#cmakedefine KOKKOS_ARCH_KEPLER30 #cmakedefine KOKKOS_ARCH_KEPLER30
#cmakedefine KOKKOS_ARCH_KEPLER32 #cmakedefine KOKKOS_ARCH_KEPLER32
@ -95,5 +96,8 @@
#cmakedefine KOKKOS_ARCH_VOLTA72 #cmakedefine KOKKOS_ARCH_VOLTA72
#cmakedefine KOKKOS_ARCH_TURING75 #cmakedefine KOKKOS_ARCH_TURING75
#cmakedefine KOKKOS_ARCH_AMPERE80 #cmakedefine KOKKOS_ARCH_AMPERE80
#cmakedefine KOKKOS_ARCH_AMPERE86
#cmakedefine KOKKOS_ARCH_AMD_ZEN #cmakedefine KOKKOS_ARCH_AMD_ZEN
#cmakedefine KOKKOS_ARCH_AMD_ZEN2 #cmakedefine KOKKOS_ARCH_AMD_ZEN2
#cmakedefine KOKKOS_IMPL_DISABLE_SYCL_DEVICE_PRINTF

View File

@ -481,76 +481,6 @@ if(CMAKE_CUDA_COMPILER_LOADED AND NOT CUDAToolkit_BIN_DIR AND CMAKE_CUDA_COMPILE
unset(cuda_dir) unset(cuda_dir)
endif() endif()
IF(CMAKE_VERSION VERSION_LESS "3.12.0")
function(import_target_link_libraries target)
cmake_parse_arguments(HACK
"SYSTEM;INTERFACE;PUBLIC"
""
""
${ARGN}
)
get_target_property(LIBS ${target} INTERFACE_LINK_LIBRARIES)
if (LIBS)
list(APPEND LIBS ${HACK_UNPARSED_ARGUMENTS})
else()
set(LIBS ${HACK_UNPARSED_ARGUMENTS})
endif()
set_target_properties(${target} PROPERTIES
INTERFACE_LINK_LIBRARIES "${LIBS}")
endfunction()
ELSE()
function(import_target_link_libraries)
target_link_libraries(${ARGN})
endfunction()
ENDIF()
IF(CMAKE_VERSION VERSION_LESS "3.13.0")
function(import_target_link_directories target)
cmake_parse_arguments(HACK
"SYSTEM;INTERFACE;PUBLIC"
""
""
${ARGN}
)
get_target_property(LINK_LIBS ${target} INTERFACE_LINK_LIBRARIES)
if (LINK_LIBS) #could be not-found
set(LINK_LIBS_LIST ${LINK_LIBS})
endif()
foreach(LIB ${HACK_UNPARSED_ARGUMENTS})
list(APPEND LINK_LIBS_LIST -L${LIB})
endforeach()
set_target_properties(${target} PROPERTIES
INTERFACE_LINK_LIBRARIES "${LINK_LIBS_LIST}")
endfunction()
ELSE()
function(import_target_link_directories)
target_link_directories(${ARGN})
endfunction()
ENDIF()
IF(CMAKE_VERSION VERSION_LESS "3.12.0")
function(import_target_include_directories target)
cmake_parse_arguments(HACK
"SYSTEM;INTERFACE;PUBLIC"
""
""
${ARGN}
)
get_target_property(INLUDE_DIRS ${target} INTERFACE_INCLUDE_DIRECTORIES)
if (INCLUDE_DIRS)
list(APPEND INCLUDE_DIRS ${HACK_UNPARSED_ARGUMENTS})
else()
set(INCLUDE_DIRS ${HACK_UNPARSED_ARGUMENTS})
endif()
set_target_properties(${target} PROPERTIES
INTERFACE_INCLUDE_DIRECTORIES "${INCLUDE_DIRS}")
endfunction()
ELSE()
function(import_target_include_directories)
target_include_directories(${ARGN})
endfunction()
ENDIF()
# Try language- or user-provided path first. # Try language- or user-provided path first.
if(CUDAToolkit_BIN_DIR) if(CUDAToolkit_BIN_DIR)
find_program(CUDAToolkit_NVCC_EXECUTABLE find_program(CUDAToolkit_NVCC_EXECUTABLE
@ -854,11 +784,11 @@ if(CUDAToolkit_FOUND)
if (NOT TARGET CUDA::${lib_name} AND CUDA_${lib_name}_LIBRARY) if (NOT TARGET CUDA::${lib_name} AND CUDA_${lib_name}_LIBRARY)
add_library(CUDA::${lib_name} IMPORTED INTERFACE) add_library(CUDA::${lib_name} IMPORTED INTERFACE)
import_target_include_directories(CUDA::${lib_name} SYSTEM INTERFACE "${CUDAToolkit_INCLUDE_DIRS}") target_include_directories(CUDA::${lib_name} SYSTEM INTERFACE "${CUDAToolkit_INCLUDE_DIRS}")
import_target_link_libraries(CUDA::${lib_name} INTERFACE "${CUDA_${lib_name}_LIBRARY}") target_link_libraries(CUDA::${lib_name} INTERFACE "${CUDA_${lib_name}_LIBRARY}")
foreach(dep ${arg_DEPS}) foreach(dep ${arg_DEPS})
if(TARGET CUDA::${dep}) if(TARGET CUDA::${dep})
import_target_link_libraries(CUDA::${lib_name} INTERFACE CUDA::${dep}) target_link_libraries(CUDA::${lib_name} INTERFACE CUDA::${dep})
endif() endif()
endforeach() endforeach()
endif() endif()
@ -866,8 +796,8 @@ if(CUDAToolkit_FOUND)
if(NOT TARGET CUDA::toolkit) if(NOT TARGET CUDA::toolkit)
add_library(CUDA::toolkit IMPORTED INTERFACE) add_library(CUDA::toolkit IMPORTED INTERFACE)
import_target_include_directories(CUDA::toolkit SYSTEM INTERFACE "${CUDAToolkit_INCLUDE_DIRS}") target_include_directories(CUDA::toolkit SYSTEM INTERFACE "${CUDAToolkit_INCLUDE_DIRS}")
import_target_link_directories(CUDA::toolkit INTERFACE "${CUDAToolkit_LIBRARY_DIR}") target_link_directories(CUDA::toolkit INTERFACE "${CUDAToolkit_LIBRARY_DIR}")
endif() endif()
_CUDAToolkit_find_and_add_import_lib(cuda_driver ALT cuda) _CUDAToolkit_find_and_add_import_lib(cuda_driver ALT cuda)
@ -882,11 +812,11 @@ if(CUDAToolkit_FOUND)
AND TARGET CUDA::cudart_static) AND TARGET CUDA::cudart_static)
add_library(CUDA::cudart_static_deps IMPORTED INTERFACE) add_library(CUDA::cudart_static_deps IMPORTED INTERFACE)
import_target_link_libraries(CUDA::cudart_static INTERFACE CUDA::cudart_static_deps) target_link_libraries(CUDA::cudart_static INTERFACE CUDA::cudart_static_deps)
if(UNIX AND (CMAKE_C_COMPILER OR CMAKE_CXX_COMPILER)) if(UNIX AND (CMAKE_C_COMPILER OR CMAKE_CXX_COMPILER))
find_package(Threads REQUIRED) find_package(Threads REQUIRED)
import_target_link_libraries(CUDA::cudart_static_deps INTERFACE Threads::Threads ${CMAKE_DL_LIBS}) target_link_libraries(CUDA::cudart_static_deps INTERFACE Threads::Threads ${CMAKE_DL_LIBS})
endif() endif()
if(UNIX AND NOT APPLE) if(UNIX AND NOT APPLE)
@ -896,7 +826,7 @@ if(CUDAToolkit_FOUND)
if(NOT CUDAToolkit_rt_LIBRARY) if(NOT CUDAToolkit_rt_LIBRARY)
message(WARNING "Could not find librt library, needed by CUDA::cudart_static") message(WARNING "Could not find librt library, needed by CUDA::cudart_static")
else() else()
import_target_link_libraries(CUDA::cudart_static_deps INTERFACE ${CUDAToolkit_rt_LIBRARY}) target_link_libraries(CUDA::cudart_static_deps INTERFACE ${CUDAToolkit_rt_LIBRARY})
endif() endif()
endif() endif()
endif() endif()

View File

@ -25,7 +25,7 @@ IF (TARGET CUDA::cuda_driver)
SET(FOUND_CUDA_DRIVER TRUE) SET(FOUND_CUDA_DRIVER TRUE)
KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cuda_driver) KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cuda_driver)
ELSE() ELSE()
SET(FOUND_CUDA_DRIVVER FALSE) SET(FOUND_CUDA_DRIVER FALSE)
ENDIF() ENDIF()
include(FindPackageHandleStandardArgs) include(FindPackageHandleStandardArgs)

View File

@ -10,7 +10,7 @@ TRY_COMPILE(KOKKOS_HAS_PTHREAD_ARG
# ${CMAKE_CXX${KOKKOS_CXX_STANDARD}_STANDARD_COMPILE_OPTION} # ${CMAKE_CXX${KOKKOS_CXX_STANDARD}_STANDARD_COMPILE_OPTION}
INCLUDE(FindPackageHandleStandardArgs) INCLUDE(FindPackageHandleStandardArgs)
FIND_PACKAGE_HANDLE_STANDARD_ARGS(PTHREAD DEFAULT_MSG KOKKOS_HAS_PTHREAD_ARG) FIND_PACKAGE_HANDLE_STANDARD_ARGS(TPLPTHREAD DEFAULT_MSG KOKKOS_HAS_PTHREAD_ARG)
#Only create the TPL if we succeed #Only create the TPL if we succeed
IF (KOKKOS_HAS_PTHREAD_ARG) IF (KOKKOS_HAS_PTHREAD_ARG)
KOKKOS_CREATE_IMPORTED_TPL(PTHREAD KOKKOS_CREATE_IMPORTED_TPL(PTHREAD

View File

@ -0,0 +1,11 @@
include(FindPackageHandleStandardArgs)
FIND_LIBRARY(AMD_HIP_LIBRARY amdhip64 PATHS ENV ROCM_PATH PATH_SUFFIXES lib)
FIND_LIBRARY(HSA_RUNTIME_LIBRARY hsa-runtime64 PATHS ENV ROCM_PATH PATH_SUFFIXES lib)
find_package_handle_standard_args(TPLROCM DEFAULT_MSG AMD_HIP_LIBRARY HSA_RUNTIME_LIBRARY)
kokkos_create_imported_tpl(ROCM INTERFACE
LINK_LIBRARIES ${HSA_RUNTIME_LIBRARY} ${AMD_HIP_LIBRARY}
COMPILE_DEFINITIONS __HIP_ROCclr__
)

View File

@ -0,0 +1,8 @@
#include <type_traits>
int main() {
// _t versions of type traits were added in C++14
std::remove_cv_t<int> i = 0;
return i;
}

View File

@ -72,6 +72,7 @@ int main() {
case 72: std::cout << "Set -DKokkos_ARCH_VOLTA72=ON ." << std::endl; break; case 72: std::cout << "Set -DKokkos_ARCH_VOLTA72=ON ." << std::endl; break;
case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break; case 75: std::cout << "Set -DKokkos_ARCH_TURING75=ON ." << std::endl; break;
case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break; case 80: std::cout << "Set -DKokkos_ARCH_AMPERE80=ON ." << std::endl; break;
case 86: std::cout << "Set -DKokkos_ARCH_AMPERE86=ON ." << std::endl; break;
default: default:
std::cout << "Compute capability " << compute_capability std::cout << "Compute capability " << compute_capability
<< " is not supported" << std::endl; << " is not supported" << std::endl;

View File

@ -2,7 +2,7 @@
void* kokkos_test(void* args) { return args; } void* kokkos_test(void* args) { return args; }
int main(void) { int main() {
pthread_t thread; pthread_t thread;
/* Use NULL to avoid C++11. Some compilers /* Use NULL to avoid C++11. Some compilers
do not have C++11 by default. Forcing C++11 do not have C++11 by default. Forcing C++11

View File

@ -81,10 +81,16 @@ ENDMACRO()
FUNCTION(KOKKOS_ADD_TEST) FUNCTION(KOKKOS_ADD_TEST)
if (KOKKOS_HAS_TRILINOS) if (KOKKOS_HAS_TRILINOS)
CMAKE_PARSE_ARGUMENTS(TEST CMAKE_PARSE_ARGUMENTS(TEST
"" "SKIP_TRIBITS"
"EXE;NAME;TOOL" "EXE;NAME;TOOL"
"ARGS" "ARGS"
${ARGN}) ${ARGN})
IF(TEST_SKIP_TRIBITS)
MESSAGE(STATUS "Skipping test ${TEST_NAME} in TriBits")
RETURN()
ENDIF()
IF(TEST_EXE) IF(TEST_EXE)
SET(EXE_ROOT ${TEST_EXE}) SET(EXE_ROOT ${TEST_EXE})
ELSE() ELSE()
@ -119,11 +125,10 @@ FUNCTION(KOKKOS_ADD_TEST)
endif() endif()
else() else()
CMAKE_PARSE_ARGUMENTS(TEST CMAKE_PARSE_ARGUMENTS(TEST
"WILL_FAIL" "WILL_FAIL;SKIP_TRIBITS"
"FAIL_REGULAR_EXPRESSION;PASS_REGULAR_EXPRESSION;EXE;NAME;TOOL" "FAIL_REGULAR_EXPRESSION;PASS_REGULAR_EXPRESSION;EXE;NAME;TOOL"
"CATEGORIES;ARGS" "CATEGORIES;ARGS"
${ARGN}) ${ARGN})
SET(TESTS_ADDED)
# To match Tribits, we should always be receiving # To match Tribits, we should always be receiving
# the root names of exes/libs # the root names of exes/libs
IF(TEST_EXE) IF(TEST_EXE)
@ -135,32 +140,12 @@ FUNCTION(KOKKOS_ADD_TEST)
# These should be the full target name # These should be the full target name
SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME}) SET(TEST_NAME ${PACKAGE_NAME}_${TEST_NAME})
SET(EXE ${PACKAGE_NAME}_${EXE_ROOT}) SET(EXE ${PACKAGE_NAME}_${EXE_ROOT})
IF (TEST_ARGS)
SET(TEST_NUMBER 0)
FOREACH (ARG_STR ${TEST_ARGS})
# This is passed as a single string blob to match TriBITS behavior
# We need this to be turned into a list
STRING(REPLACE " " ";" ARG_STR_LIST ${ARG_STR})
IF(WIN32)
ADD_TEST(NAME ${TEST_NAME}${TEST_NUMBER} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH}
COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX} ${ARG_STR_LIST})
ELSE()
ADD_TEST(NAME ${TEST_NAME}${TEST_NUMBER} COMMAND ${EXE} ${ARG_STR_LIST})
ENDIF()
LIST(APPEND TESTS_ADDED "${TEST_NAME}${TEST_NUMBER}")
MATH(EXPR TEST_NUMBER "${TEST_NUMBER} + 1")
ENDFOREACH()
ELSE()
IF(WIN32) IF(WIN32)
ADD_TEST(NAME ${TEST_NAME} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH} ADD_TEST(NAME ${TEST_NAME} WORKING_DIRECTORY ${LIBRARY_OUTPUT_PATH}
COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX}) COMMAND ${EXE}${CMAKE_EXECUTABLE_SUFFIX} ${TEST_ARGS})
ELSE() ELSE()
ADD_TEST(NAME ${TEST_NAME} COMMAND ${EXE}) ADD_TEST(NAME ${TEST_NAME} COMMAND ${EXE} ${TEST_ARGS})
ENDIF() ENDIF()
LIST(APPEND TESTS_ADDED "${TEST_NAME}")
ENDIF()
FOREACH(TEST_NAME ${TESTS_ADDED})
IF(TEST_WILL_FAIL) IF(TEST_WILL_FAIL)
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES WILL_FAIL ${TEST_WILL_FAIL}) SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES WILL_FAIL ${TEST_WILL_FAIL})
ENDIF() ENDIF()
@ -170,13 +155,12 @@ FUNCTION(KOKKOS_ADD_TEST)
IF(TEST_PASS_REGULAR_EXPRESSION) IF(TEST_PASS_REGULAR_EXPRESSION)
SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES PASS_REGULAR_EXPRESSION ${TEST_PASS_REGULAR_EXPRESSION}) SET_TESTS_PROPERTIES(${TEST_NAME} PROPERTIES PASS_REGULAR_EXPRESSION ${TEST_PASS_REGULAR_EXPRESSION})
ENDIF() ENDIF()
if(TEST_TOOL) IF(TEST_TOOL)
add_dependencies(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool ADD_DEPENDENCIES(${EXE} ${TEST_TOOL}) #make sure the exe has to build the tool
set_property(TEST ${TEST_NAME} APPEND_STRING PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>") SET_PROPERTY(TEST ${TEST_NAME} APPEND_STRING PROPERTY ENVIRONMENT "KOKKOS_PROFILE_LIBRARY=$<TARGET_FILE:${TEST_TOOL}>")
endif() ENDIF()
ENDFOREACH()
VERIFY_EMPTY(KOKKOS_ADD_TEST ${TEST_UNPARSED_ARGUMENTS}) VERIFY_EMPTY(KOKKOS_ADD_TEST ${TEST_UNPARSED_ARGUMENTS})
endif() ENDIF()
ENDFUNCTION() ENDFUNCTION()
FUNCTION(KOKKOS_ADD_ADVANCED_TEST) FUNCTION(KOKKOS_ADD_ADVANCED_TEST)
@ -326,14 +310,6 @@ ENDIF()
ENDFUNCTION() ENDFUNCTION()
FUNCTION(KOKKOS_TARGET_COMPILE_DEFINITIONS)
IF (KOKKOS_HAS_TRILINOS)
TARGET_COMPILE_DEFINITIONS(${TARGET} ${ARGN})
ELSE()
TARGET_COMPILE_DEFINITIONS(${TARGET} ${ARGN})
ENDIF()
ENDFUNCTION()
FUNCTION(KOKKOS_INCLUDE_DIRECTORIES) FUNCTION(KOKKOS_INCLUDE_DIRECTORIES)
IF(KOKKOS_HAS_TRILINOS) IF(KOKKOS_HAS_TRILINOS)
TRIBITS_INCLUDE_DIRECTORIES(${ARGN}) TRIBITS_INCLUDE_DIRECTORIES(${ARGN})
@ -350,10 +326,6 @@ ENDIF()
ENDFUNCTION() ENDFUNCTION()
MACRO(KOKKOS_ADD_COMPILE_OPTIONS)
ADD_COMPILE_OPTIONS(${ARGN})
ENDMACRO()
MACRO(PRINTALL match) MACRO(PRINTALL match)
get_cmake_property(_variableNames VARIABLES) get_cmake_property(_variableNames VARIABLES)
list (SORT _variableNames) list (SORT _variableNames)
@ -376,4 +348,3 @@ FUNCTION(GLOBAL_APPEND VARNAME)
LIST(APPEND TEMP ${ARGN}) LIST(APPEND TEMP ${ARGN})
GLOBAL_SET(${VARNAME} ${TEMP}) GLOBAL_SET(${VARNAME} ${TEMP})
ENDFUNCTION() ENDFUNCTION()

View File

@ -35,7 +35,7 @@ KOKKOS_ARCH_OPTION(ARMV80 HOST "ARMv8.0 Compatible CPU")
KOKKOS_ARCH_OPTION(ARMV81 HOST "ARMv8.1 Compatible CPU") KOKKOS_ARCH_OPTION(ARMV81 HOST "ARMv8.1 Compatible CPU")
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX HOST "ARMv8 Cavium ThunderX CPU") KOKKOS_ARCH_OPTION(ARMV8_THUNDERX HOST "ARMv8 Cavium ThunderX CPU")
KOKKOS_ARCH_OPTION(ARMV8_THUNDERX2 HOST "ARMv8 Cavium ThunderX2 CPU") KOKKOS_ARCH_OPTION(ARMV8_THUNDERX2 HOST "ARMv8 Cavium ThunderX2 CPU")
KOKKOS_ARCH_OPTION(A64FX HOST "ARMv8.2 with SVE Suport") KOKKOS_ARCH_OPTION(A64FX HOST "ARMv8.2 with SVE Support")
KOKKOS_ARCH_OPTION(WSM HOST "Intel Westmere CPU") KOKKOS_ARCH_OPTION(WSM HOST "Intel Westmere CPU")
KOKKOS_ARCH_OPTION(SNB HOST "Intel Sandy/Ivy Bridge CPUs") KOKKOS_ARCH_OPTION(SNB HOST "Intel Sandy/Ivy Bridge CPUs")
KOKKOS_ARCH_OPTION(HSW HOST "Intel Haswell CPUs") KOKKOS_ARCH_OPTION(HSW HOST "Intel Haswell CPUs")
@ -60,11 +60,12 @@ KOKKOS_ARCH_OPTION(VOLTA70 GPU "NVIDIA Volta generation CC 7.0")
KOKKOS_ARCH_OPTION(VOLTA72 GPU "NVIDIA Volta generation CC 7.2") KOKKOS_ARCH_OPTION(VOLTA72 GPU "NVIDIA Volta generation CC 7.2")
KOKKOS_ARCH_OPTION(TURING75 GPU "NVIDIA Turing generation CC 7.5") KOKKOS_ARCH_OPTION(TURING75 GPU "NVIDIA Turing generation CC 7.5")
KOKKOS_ARCH_OPTION(AMPERE80 GPU "NVIDIA Ampere generation CC 8.0") KOKKOS_ARCH_OPTION(AMPERE80 GPU "NVIDIA Ampere generation CC 8.0")
KOKKOS_ARCH_OPTION(AMPERE86 GPU "NVIDIA Ampere generation CC 8.6")
KOKKOS_ARCH_OPTION(ZEN HOST "AMD Zen architecture") KOKKOS_ARCH_OPTION(ZEN HOST "AMD Zen architecture")
KOKKOS_ARCH_OPTION(ZEN2 HOST "AMD Zen2 architecture") KOKKOS_ARCH_OPTION(ZEN2 HOST "AMD Zen2 architecture")
KOKKOS_ARCH_OPTION(VEGA900 GPU "AMD GPU MI25 GFX900") KOKKOS_ARCH_OPTION(VEGA900 GPU "AMD GPU MI25 GFX900")
KOKKOS_ARCH_OPTION(VEGA906 GPU "AMD GPU MI50/MI60 GFX906") KOKKOS_ARCH_OPTION(VEGA906 GPU "AMD GPU MI50/MI60 GFX906")
KOKKOS_ARCH_OPTION(VEGA908 GPU "AMD GPU") KOKKOS_ARCH_OPTION(VEGA908 GPU "AMD GPU MI100 GFX908")
KOKKOS_ARCH_OPTION(INTEL_GEN GPU "Intel GPUs Gen9+") KOKKOS_ARCH_OPTION(INTEL_GEN GPU "Intel GPUs Gen9+")
@ -141,8 +142,16 @@ ENDIF()
#------------------------------- KOKKOS_HIP_OPTIONS --------------------------- #------------------------------- KOKKOS_HIP_OPTIONS ---------------------------
#clear anything that might be in the cache #clear anything that might be in the cache
GLOBAL_SET(KOKKOS_AMDGPU_OPTIONS) GLOBAL_SET(KOKKOS_AMDGPU_OPTIONS)
IF(KOKKOS_CXX_COMPILER_ID STREQUAL HIP) IF(KOKKOS_ENABLE_HIP)
IF(KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC)
SET(AMDGPU_ARCH_FLAG "--amdgpu-target") SET(AMDGPU_ARCH_FLAG "--amdgpu-target")
ELSE()
SET(AMDGPU_ARCH_FLAG "--offload-arch")
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS -x hip)
IF(DEFINED ENV{ROCM_PATH})
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS --rocm-path=$ENV{ROCM_PATH})
ENDIF()
ENDIF()
ENDIF() ENDIF()
@ -183,6 +192,8 @@ ENDIF()
IF (KOKKOS_ARCH_A64FX) IF (KOKKOS_ARCH_A64FX)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
DEFAULT -march=armv8.2-a+sve DEFAULT -march=armv8.2-a+sve
Clang -march=armv8.2-a+sve -msve-vector-bits=512
GCC -march=armv8.2-a+sve -msve-vector-bits=512
) )
ENDIF() ENDIF()
@ -309,7 +320,7 @@ IF (KOKKOS_ARCH_POWER8 OR KOKKOS_ARCH_POWER9)
SET(KOKKOS_USE_ISA_POWERPCLE ON) SET(KOKKOS_USE_ISA_POWERPCLE ON)
ENDIF() ENDIF()
IF (Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE) IF (KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
Clang -fcuda-rdc Clang -fcuda-rdc
NVIDIA --relocatable-device-code=true NVIDIA --relocatable-device-code=true
@ -333,8 +344,8 @@ ENDIF()
#Right now we cannot get the compiler ID when cross-compiling, so just check #Right now we cannot get the compiler ID when cross-compiling, so just check
#that HIP is enabled #that HIP is enabled
IF (Kokkos_ENABLE_HIP) IF (KOKKOS_ENABLE_HIP)
IF (Kokkos_ENABLE_HIP_RELOCATABLE_DEVICE_CODE) IF (KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
DEFAULT -fgpu-rdc DEFAULT -fgpu-rdc
) )
@ -345,8 +356,7 @@ IF (Kokkos_ENABLE_HIP)
ENDIF() ENDIF()
ENDIF() ENDIF()
IF (KOKKOS_ENABLE_SYCL)
IF (Kokkos_ENABLE_SYCL)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
DEFAULT -fsycl DEFAULT -fsycl
) )
@ -363,7 +373,7 @@ FUNCTION(CHECK_CUDA_ARCH ARCH FLAG)
MESSAGE(FATAL_ERROR "Multiple GPU architectures given! Already have ${CUDA_ARCH_ALREADY_SPECIFIED}, but trying to add ${ARCH}. If you are re-running CMake, try clearing the cache and running again.") MESSAGE(FATAL_ERROR "Multiple GPU architectures given! Already have ${CUDA_ARCH_ALREADY_SPECIFIED}, but trying to add ${ARCH}. If you are re-running CMake, try clearing the cache and running again.")
ENDIF() ENDIF()
SET(CUDA_ARCH_ALREADY_SPECIFIED ${ARCH} PARENT_SCOPE) SET(CUDA_ARCH_ALREADY_SPECIFIED ${ARCH} PARENT_SCOPE)
IF (NOT KOKKOS_ENABLE_CUDA AND NOT KOKKOS_ENABLE_OPENMPTARGET) IF (NOT KOKKOS_ENABLE_CUDA AND NOT KOKKOS_ENABLE_OPENMPTARGET AND NOT KOKKOS_ENABLE_SYCL)
MESSAGE(WARNING "Given CUDA arch ${ARCH}, but Kokkos_ENABLE_CUDA and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.") MESSAGE(WARNING "Given CUDA arch ${ARCH}, but Kokkos_ENABLE_CUDA and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.")
UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE) UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE)
ELSE() ELSE()
@ -396,6 +406,7 @@ CHECK_CUDA_ARCH(VOLTA70 sm_70)
CHECK_CUDA_ARCH(VOLTA72 sm_72) CHECK_CUDA_ARCH(VOLTA72 sm_72)
CHECK_CUDA_ARCH(TURING75 sm_75) CHECK_CUDA_ARCH(TURING75 sm_75)
CHECK_CUDA_ARCH(AMPERE80 sm_80) CHECK_CUDA_ARCH(AMPERE80 sm_80)
CHECK_CUDA_ARCH(AMPERE86 sm_86)
SET(AMDGPU_ARCH_ALREADY_SPECIFIED "") SET(AMDGPU_ARCH_ALREADY_SPECIFIED "")
FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG) FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
@ -405,12 +416,12 @@ FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
ENDIF() ENDIF()
SET(AMDGPU_ARCH_ALREADY_SPECIFIED ${ARCH} PARENT_SCOPE) SET(AMDGPU_ARCH_ALREADY_SPECIFIED ${ARCH} PARENT_SCOPE)
IF (NOT KOKKOS_ENABLE_HIP AND NOT KOKKOS_ENABLE_OPENMPTARGET) IF (NOT KOKKOS_ENABLE_HIP AND NOT KOKKOS_ENABLE_OPENMPTARGET)
MESSAGE(WARNING "Given HIP arch ${ARCH}, but Kokkos_ENABLE_AMDGPU and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.") MESSAGE(WARNING "Given AMD GPU architecture ${ARCH}, but Kokkos_ENABLE_HIP and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.")
UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE) UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE)
ELSE() ELSE()
SET(KOKKOS_AMDGPU_ARCH_FLAG ${FLAG} PARENT_SCOPE) SET(KOKKOS_AMDGPU_ARCH_FLAG ${FLAG} PARENT_SCOPE)
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}") GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}")
IF(KOKKOS_ENABLE_HIP) IF(KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE)
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}") GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}")
ENDIF() ENDIF()
ENDIF() ENDIF()
@ -451,6 +462,24 @@ IF (KOKKOS_ENABLE_OPENMPTARGET)
ENDIF() ENDIF()
ENDIF() ENDIF()
IF (KOKKOS_ENABLE_SYCL)
IF(CUDA_ARCH_ALREADY_SPECIFIED)
IF(KOKKOS_ENABLE_UNSUPPORTED_ARCHS)
COMPILER_SPECIFIC_FLAGS(
DEFAULT -fsycl-targets=nvptx64-nvidia-cuda-sycldevice
)
# FIXME_SYCL The CUDA backend doesn't support printf yet.
GLOBAL_SET(KOKKOS_IMPL_DISABLE_SYCL_DEVICE_PRINTF ON)
ELSE()
MESSAGE(SEND_ERROR "Setting a CUDA architecture for SYCL is only allowed with Kokkos_ENABLE_UNSUPPORTED_ARCHS=ON!")
ENDIF()
ELSEIF(KOKKOS_ARCH_INTEL_GEN)
COMPILER_SPECIFIC_FLAGS(
DEFAULT -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xsycl-target-backend "-device skl"
)
ENDIF()
ENDIF()
IF(KOKKOS_ENABLE_CUDA AND NOT CUDA_ARCH_ALREADY_SPECIFIED) IF(KOKKOS_ENABLE_CUDA AND NOT CUDA_ARCH_ALREADY_SPECIFIED)
# Try to autodetect the CUDA Compute Capability by asking the device # Try to autodetect the CUDA Compute Capability by asking the device
SET(_BINARY_TEST_DIR ${CMAKE_CURRENT_BINARY_DIR}/cmake/compile_tests/CUDAComputeCapabilityWorkdir) SET(_BINARY_TEST_DIR ${CMAKE_CURRENT_BINARY_DIR}/cmake/compile_tests/CUDAComputeCapabilityWorkdir)
@ -464,6 +493,43 @@ IF(KOKKOS_ENABLE_CUDA AND NOT CUDA_ARCH_ALREADY_SPECIFIED)
${CMAKE_CURRENT_SOURCE_DIR}/cmake/compile_tests/cuda_compute_capability.cc ${CMAKE_CURRENT_SOURCE_DIR}/cmake/compile_tests/cuda_compute_capability.cc
COMPILE_DEFINITIONS -DSM_ONLY COMPILE_DEFINITIONS -DSM_ONLY
RUN_OUTPUT_VARIABLE _CUDA_COMPUTE_CAPABILITY) RUN_OUTPUT_VARIABLE _CUDA_COMPUTE_CAPABILITY)
  # if the user is using kokkos_launch_compiler, the above will fail.
IF(NOT _COMPILE_RESULT OR NOT _RESULT EQUAL 0)
# check to see if CUDA is not already enabled (may happen when Kokkos is subproject)
GET_PROPERTY(_ENABLED_LANGUAGES GLOBAL PROPERTY ENABLED_LANGUAGES)
# language has to be fully enabled, just checking for CMAKE_CUDA_COMPILER isn't enough
IF(NOT "CUDA" IN_LIST _ENABLED_LANGUAGES)
# make sure the user knows that we aren't using CUDA compiler for anything else
MESSAGE(STATUS "CUDA auto-detection of architecture failed with ${CMAKE_CXX_COMPILER}. Enabling CUDA language ONLY to auto-detect architecture...")
INCLUDE(CheckLanguage)
CHECK_LANGUAGE(CUDA)
IF(CMAKE_CUDA_COMPILER)
ENABLE_LANGUAGE(CUDA)
ELSE()
MESSAGE(STATUS "CUDA language could not be enabled")
ENDIF()
ENDIF()
# if CUDA was enabled, this will be defined
IF(CMAKE_CUDA_COMPILER)
# copy our test to .cu so cmake compiles as CUDA
CONFIGURE_FILE(
${PROJECT_SOURCE_DIR}/cmake/compile_tests/cuda_compute_capability.cc
${PROJECT_BINARY_DIR}/compile_tests/cuda_compute_capability.cu
COPYONLY
)
# run test again
TRY_RUN(
_RESULT
_COMPILE_RESULT
${_BINARY_TEST_DIR}
${PROJECT_BINARY_DIR}/compile_tests/cuda_compute_capability.cu
COMPILE_DEFINITIONS -DSM_ONLY
RUN_OUTPUT_VARIABLE _CUDA_COMPUTE_CAPABILITY)
ENDIF()
ENDIF()
LIST(FIND KOKKOS_CUDA_ARCH_FLAGS sm_${_CUDA_COMPUTE_CAPABILITY} FLAG_INDEX) LIST(FIND KOKKOS_CUDA_ARCH_FLAGS sm_${_CUDA_COMPUTE_CAPABILITY} FLAG_INDEX)
IF(_COMPILE_RESULT AND _RESULT EQUAL 0 AND NOT FLAG_INDEX EQUAL -1) IF(_COMPILE_RESULT AND _RESULT EQUAL 0 AND NOT FLAG_INDEX EQUAL -1)
MESSAGE(STATUS "Detected CUDA Compute Capability ${_CUDA_COMPUTE_CAPABILITY}") MESSAGE(STATUS "Detected CUDA Compute Capability ${_CUDA_COMPUTE_CAPABILITY}")
@ -500,7 +566,7 @@ IF (KOKKOS_ENABLE_CUDA)
SET(KOKKOS_ARCH_VOLTA ON) SET(KOKKOS_ARCH_VOLTA ON)
ENDIF() ENDIF()
IF (KOKKOS_ARCH_AMPERE80) IF (KOKKOS_ARCH_AMPERE80 OR KOKKOS_ARCH_AMPERE86)
SET(KOKKOS_ARCH_AMPERE ON) SET(KOKKOS_ARCH_AMPERE ON)
ENDIF() ENDIF()
ENDIF() ENDIF()

View File

@ -27,6 +27,12 @@ IF(Kokkos_ENABLE_CUDA)
PATHS ${PROJECT_SOURCE_DIR} PATHS ${PROJECT_SOURCE_DIR}
PATH_SUFFIXES bin) PATH_SUFFIXES bin)
FIND_PROGRAM(Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${PROJECT_SOURCE_DIR}
PATHS ${PROJECT_SOURCE_DIR}
PATH_SUFFIXES bin)
# check if compiler was set to nvcc_wrapper # check if compiler was set to nvcc_wrapper
kokkos_internal_have_compiler_nvcc(${CMAKE_CXX_COMPILER}) kokkos_internal_have_compiler_nvcc(${CMAKE_CXX_COMPILER})
# if launcher was found and nvcc_wrapper was not specified as # if launcher was found and nvcc_wrapper was not specified as
@ -37,7 +43,7 @@ IF(Kokkos_ENABLE_CUDA)
# if the second argument matches the C++ compiler, it forwards the rest of the # if the second argument matches the C++ compiler, it forwards the rest of the
# args to nvcc_wrapper # args to nvcc_wrapper
kokkos_internal_have_compiler_nvcc( kokkos_internal_have_compiler_nvcc(
${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER} ${CMAKE_CXX_COMPILER} -DKOKKOS_DEPENDENCE) ${Kokkos_COMPILE_LAUNCHER} ${Kokkos_NVCC_WRAPPER} ${CMAKE_CXX_COMPILER} ${CMAKE_CXX_COMPILER} -DKOKKOS_DEPENDENCE)
SET(INTERNAL_USE_COMPILER_LAUNCHER true) SET(INTERNAL_USE_COMPILER_LAUNCHER true)
ENDIF() ENDIF()
ENDIF() ENDIF()
@ -55,32 +61,7 @@ IF(INTERNAL_HAVE_COMPILER_NVCC)
SET(KOKKOS_CXX_COMPILER_VERSION ${TEMP_CXX_COMPILER_VERSION} CACHE STRING INTERNAL FORCE) SET(KOKKOS_CXX_COMPILER_VERSION ${TEMP_CXX_COMPILER_VERSION} CACHE STRING INTERNAL FORCE)
MESSAGE(STATUS "Compiler Version: ${KOKKOS_CXX_COMPILER_VERSION}") MESSAGE(STATUS "Compiler Version: ${KOKKOS_CXX_COMPILER_VERSION}")
IF(INTERNAL_USE_COMPILER_LAUNCHER) IF(INTERNAL_USE_COMPILER_LAUNCHER)
IF(Kokkos_LAUNCH_COMPILER_INFO) MESSAGE(STATUS "kokkos_launch_compiler (${Kokkos_COMPILE_LAUNCHER}) is enabled...")
GET_FILENAME_COMPONENT(BASE_COMPILER_NAME ${CMAKE_CXX_COMPILER} NAME)
# does not have STATUS intentionally
MESSAGE("")
MESSAGE("Kokkos_LAUNCH_COMPILER_INFO (${Kokkos_COMPILE_LAUNCHER}):")
MESSAGE(" - Kokkos + CUDA backend requires the C++ files to be compiled as CUDA code.")
MESSAGE(" - kokkos_launch_compiler permits CMAKE_CXX_COMPILER to be set to a traditional C++ compiler when Kokkos_ENABLE_CUDA=ON")
MESSAGE(" by prefixing all the compile and link commands with the path to the script + CMAKE_CXX_COMPILER (${CMAKE_CXX_COMPILER}).")
MESSAGE(" - If any of the compile or link commands have CMAKE_CXX_COMPILER as the first argument, it replaces CMAKE_CXX_COMPILER with nvcc_wrapper.")
MESSAGE(" - If the compile or link command is not CMAKE_CXX_COMPILER, it just executes the command.")
MESSAGE(" - If using ccache, set CMAKE_CXX_COMPILER to nvcc_wrapper explicitly.")
MESSAGE(" - kokkos_compiler_launcher is available to downstream projects as well.")
MESSAGE(" - If CMAKE_CXX_COMPILER=nvcc_wrapper, all legacy behavior will be preserved during 'find_package(Kokkos)'")
MESSAGE(" - If CMAKE_CXX_COMPILER is not nvcc_wrapper, 'find_package(Kokkos)' will apply 'kokkos_compilation(GLOBAL)' unless separable compilation is enabled")
MESSAGE(" - This can be disabled via '-DKokkos_LAUNCH_COMPILER=OFF'")
MESSAGE(" - Use 'find_package(Kokkos COMPONENTS separable_compilation)' to enable separable compilation")
MESSAGE(" - Separable compilation allows you to control the scope of where the compiler transformation behavior (${BASE_COMPILER_NAME} -> nvcc_wrapper) is applied")
MESSAGE(" - The compiler transformation can be applied on a per-project, per-directory, per-target, and/or per-source-file basis")
MESSAGE(" - 'kokkos_compilation(PROJECT)' will apply the compiler transformation to all targets in a project/subproject")
MESSAGE(" - 'kokkos_compilation(TARGET <TARGET> [<TARGETS>...])' will apply the compiler transformation to the specified target(s)")
MESSAGE(" - 'kokkos_compilation(SOURCE <SOURCE> [<SOURCES>...])' will apply the compiler transformation to the specified source file(s)")
MESSAGE(" - 'kokkos_compilation(DIRECTORY <DIR> [<DIRS>...])' will apply the compiler transformation to the specified directories")
MESSAGE("")
ELSE()
MESSAGE(STATUS "kokkos_launch_compiler (${Kokkos_COMPILE_LAUNCHER}) is enabled... Set Kokkos_LAUNCH_COMPILER_INFO=ON for more info.")
ENDIF()
    kokkos_compilation(GLOBAL)
  ENDIF()
ENDIF()
@ -92,7 +73,11 @@ IF(Kokkos_ENABLE_HIP)
    OUTPUT_STRIP_TRAILING_WHITESPACE)
  STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
SET(KOKKOS_CXX_COMPILER_ID HIP CACHE STRING INTERNAL FORCE)
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "HIP version" INTERNAL_COMPILER_VERSION_CONTAINS_HIP)
IF(INTERNAL_COMPILER_VERSION_CONTAINS_HIP GREATER -1)
SET(KOKKOS_CXX_COMPILER_ID HIPCC CACHE STRING INTERNAL FORCE)
ENDIF()
  STRING(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+"
         TEMP_CXX_COMPILER_VERSION ${INTERNAL_COMPILER_VERSION_ONE_LINE})
@ -103,8 +88,7 @@ ENDIF()
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
  # The Cray compiler reports as Clang to most versions of CMake
  EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} --version
                  COMMAND grep Cray
                  COMMAND grep -c Cray
                  COMMAND wc -l
                  OUTPUT_VARIABLE INTERNAL_HAVE_CRAY_COMPILER
                  OUTPUT_STRIP_TRAILING_WHITESPACE)
  IF (INTERNAL_HAVE_CRAY_COMPILER) #not actually Clang
@ -112,8 +96,7 @@ IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
  ENDIF()
  # The clang based Intel compiler reports as Clang to most versions of CMake
  EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} --version
                  COMMAND grep icpx
                  COMMAND grep -c "DPC++\\|icpx"
                  COMMAND wc -l
                  OUTPUT_VARIABLE INTERNAL_HAVE_INTEL_COMPILER
                  OUTPUT_STRIP_TRAILING_WHITESPACE)
  IF (INTERNAL_HAVE_INTEL_COMPILER) #not actually Clang
@ -174,7 +157,7 @@ ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
    MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
  ENDIF()
  SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Kokkos turns off CXX extensions" FORCE)
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIP)
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC)
  IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 3.8.0)
    MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
  ENDIF()


@ -49,11 +49,14 @@ ENDIF()
IF (KOKKOS_CXX_STANDARD STREQUAL 17)
  IF (KOKKOS_CXX_COMPILER_ID STREQUAL GNU AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 7)
    MESSAGE(FATAL_ERROR "You have requested C++17 support for GCC ${KOKKOS_CXX_COMPILER_VERSION}. Although CMake has allowed this and GCC accepts -std=c++1z/c++17, GCC < 7 does not properly support *this capture. Please reduce the C++ standard to 14 or upgrade the compiler if you do need C++17 support.")
  ENDIF()
  IF (KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 11)
    MESSAGE(FATAL_ERROR "You have requested C++17 support for NVCC ${KOKKOS_CXX_COMPILER_VERSION}. NVCC only supports C++17 from version 11 on. Please reduce the C++ standard to 14 or upgrade the compiler if you need C++17 support.")
ENDIF()
IF (KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA AND KOKKOS_ENABLE_CUDA_CONSTEXPR)
MESSAGE(WARNING "You have requested -DKokkos_ENABLE_CUDA_CONSTEXPR=ON with C++17 support for NVCC ${KOKKOS_CXX_COMPILER_VERSION} which is known to trigger compiler bugs. See https://github.com/kokkos/kokkos/issues/3496")
  ENDIF()
ENDIF()


@ -48,9 +48,6 @@ IF(KOKKOS_ENABLE_OPENMP)
  IF(KOKKOS_CLANG_IS_CRAY)
    SET(ClangOpenMPFlag -fopenmp)
  ENDIF()
IF(KOKKOS_CLANG_IS_INTEL)
SET(ClangOpenMPFlag -fiopenmp)
ENDIF()
  IF(KOKKOS_COMPILER_CLANG_MSVC)
    #for clang-cl expression /openmp yields an error, so directly add the specific Clang flag
    SET(ClangOpenMPFlag /clang:-fopenmp=libomp)
@ -64,6 +61,7 @@ IF(KOKKOS_ENABLE_OPENMP)
    COMPILER_SPECIFIC_FLAGS(
      COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
      Clang      -Xcompiler ${ClangOpenMPFlag}
      IntelClang -Xcompiler -fiopenmp
      PGI        -Xcompiler -mp
      Cray       NO-VALUE-SPECIFIED
      XL         -Xcompiler -qsmp=omp
@ -72,6 +70,7 @@ IF(KOKKOS_ENABLE_OPENMP)
  ELSE()
    COMPILER_SPECIFIC_FLAGS(
      Clang      ${ClangOpenMPFlag}
      IntelClang -fiopenmp
      AppleClang -Xpreprocessor -fopenmp
      PGI        -mp
      Cray       NO-VALUE-SPECIFIED
@ -152,3 +151,11 @@ IF (KOKKOS_ENABLE_HIP)
ENDIF()
KOKKOS_DEVICE_OPTION(SYCL OFF DEVICE "Whether to build SYCL backend")
## SYCL has extra setup requirements, turn on Kokkos_Setup_SYCL.hpp in macros
IF (KOKKOS_ENABLE_SYCL)
IF(KOKKOS_CXX_STANDARD LESS 17)
MESSAGE(FATAL_ERROR "SYCL backend requires C++17 or newer!")
ENDIF()
LIST(APPEND DEVICE_SETUP_LIST SYCL)
ENDIF()


@ -48,6 +48,7 @@ KOKKOS_ENABLE_OPTION(COMPILER_WARNINGS OFF "Whether to print all compiler war
KOKKOS_ENABLE_OPTION(PROFILING_LOAD_PRINT OFF "Whether to print information about which profiling tools got loaded")
KOKKOS_ENABLE_OPTION(TUNING OFF "Whether to create bindings for tuning tools")
KOKKOS_ENABLE_OPTION(AGGRESSIVE_VECTORIZATION OFF "Whether to aggressively vectorize loops")
KOKKOS_ENABLE_OPTION(LAUNCH_COMPILER ON "Whether to potentially use the launch compiler")
IF (KOKKOS_ENABLE_CUDA)
  SET(KOKKOS_COMPILER_CUDA_VERSION "${KOKKOS_COMPILER_VERSION_MAJOR}${KOKKOS_COMPILER_VERSION_MINOR}")
@ -68,6 +69,15 @@ ELSE()
ENDIF()
KOKKOS_ENABLE_OPTION(COMPLEX_ALIGN ${COMPLEX_ALIGN_DEFAULT} "Whether to align Kokkos::complex to 2*alignof(RealType)")
IF (KOKKOS_ENABLE_TESTS)
SET(HEADER_SELF_CONTAINMENT_TESTS_DEFAULT ON)
ELSE()
SET(HEADER_SELF_CONTAINMENT_TESTS_DEFAULT OFF)
ENDIF()
KOKKOS_ENABLE_OPTION(HEADER_SELF_CONTAINMENT_TESTS ${HEADER_SELF_CONTAINMENT_TESTS_DEFAULT} "Enable header self-containment unit tests")
IF (NOT KOKKOS_ENABLE_TESTS AND KOKKOS_ENABLE_HEADER_SELF_CONTAINMENT_TESTS)
MESSAGE(WARNING "Kokkos_ENABLE_HEADER_SELF_CONTAINMENT_TESTS is ON but Kokkos_ENABLE_TESTS is OFF. Option will be ignored.")
ENDIF()
IF (KOKKOS_ENABLE_CUDA AND (KOKKOS_CXX_COMPILER_ID STREQUAL Clang))
  SET(CUDA_CONSTEXPR_DEFAULT ON)
@ -76,15 +86,15 @@ ELSE()
ENDIF()
KOKKOS_ENABLE_OPTION(CUDA_CONSTEXPR ${CUDA_CONSTEXPR_DEFAULT} "Whether to activate experimental relaxed constexpr functions")
Kokkos_ENABLE_OPTION(UNSUPPORTED_ARCHS OFF "Whether to allow architectures in backends Kokkos doesn't optimize for")
FUNCTION(check_device_specific_options)
  CMAKE_PARSE_ARGUMENTS(SOME "" "DEVICE" "OPTIONS" ${ARGN})
  IF(NOT KOKKOS_ENABLE_${SOME_DEVICE})
    FOREACH(OPTION ${SOME_OPTIONS})
      IF(CMAKE_VERSION VERSION_GREATER_EQUAL 3.14)
      IF(NOT DEFINED CACHE{Kokkos_ENABLE_${OPTION}} OR NOT DEFINED CACHE{Kokkos_ENABLE_${SOME_DEVICE}})
        MESSAGE(FATAL_ERROR "Internal logic error: option '${OPTION}' or device '${SOME_DEVICE}' not recognized.")
      ENDIF()
      ENDIF()
      IF(KOKKOS_ENABLE_${OPTION})
        MESSAGE(WARNING "Kokkos_ENABLE_${OPTION} is ON but ${SOME_DEVICE} backend is not enabled. Option will be ignored.")
        UNSET(KOKKOS_ENABLE_${OPTION} PARENT_SCOPE)


@ -169,9 +169,7 @@ MACRO(kokkos_export_imported_tpl NAME)
  ENDIF()
  SET(TPL_LINK_OPTIONS)
  IF(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.13.0")
  GET_TARGET_PROPERTY(TPL_LINK_OPTIONS ${NAME} INTERFACE_LINK_OPTIONS)
  ENDIF()
  IF(TPL_LINK_OPTIONS)
    KOKKOS_APPEND_CONFIG_LINE("INTERFACE_LINK_OPTIONS ${TPL_LINK_OPTIONS}")
  ENDIF()
@ -230,9 +228,7 @@ MACRO(kokkos_import_tpl NAME)
  # I have still been getting errors about ROOT variables being ignored
  # I'm not sure if this is a scope issue - but make sure
  # the policy is set before we do any find_package calls
  IF(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.12.0")
  CMAKE_POLICY(SET CMP0074 NEW)
  ENDIF()
  IF (KOKKOS_ENABLE_${NAME})
    #Tack on a TPL here to make sure we avoid using anyone else's find
@ -314,7 +310,7 @@ MACRO(kokkos_create_imported_tpl NAME)
  CMAKE_PARSE_ARGUMENTS(TPL
    "INTERFACE"
    "LIBRARY"
    "LINK_LIBRARIES;INCLUDES;COMPILE_OPTIONS;LINK_OPTIONS"
    "LINK_LIBRARIES;INCLUDES;COMPILE_DEFINITIONS;COMPILE_OPTIONS;LINK_OPTIONS"
    ${ARGN})
@ -334,6 +330,9 @@ MACRO(kokkos_create_imported_tpl NAME)
  IF(TPL_INCLUDES)
    TARGET_INCLUDE_DIRECTORIES(${NAME} INTERFACE ${TPL_INCLUDES})
  ENDIF()
IF(TPL_COMPILE_DEFINITIONS)
TARGET_COMPILE_DEFINITIONS(${NAME} INTERFACE ${TPL_COMPILE_DEFINITIONS})
ENDIF()
  IF(TPL_COMPILE_OPTIONS)
    TARGET_COMPILE_OPTIONS(${NAME} INTERFACE ${TPL_COMPILE_OPTIONS})
  ENDIF()
@ -355,6 +354,10 @@ MACRO(kokkos_create_imported_tpl NAME)
    SET_TARGET_PROPERTIES(${NAME} PROPERTIES
      INTERFACE_INCLUDE_DIRECTORIES "${TPL_INCLUDES}")
  ENDIF()
IF(TPL_COMPILE_DEFINITIONS)
SET_TARGET_PROPERTIES(${NAME} PROPERTIES
INTERFACE_COMPILE_DEFINITIONS "${TPL_COMPILE_DEFINITIONS}")
ENDIF()
  IF(TPL_COMPILE_OPTIONS)
    SET_TARGET_PROPERTIES(${NAME} PROPERTIES
      INTERFACE_COMPILE_OPTIONS "${TPL_COMPILE_OPTIONS}")
@ -770,7 +773,7 @@ FUNCTION(kokkos_link_tpl TARGET)
ENDFUNCTION()
FUNCTION(COMPILER_SPECIFIC_OPTIONS_HELPER)
  SET(COMPILERS NVIDIA PGI XL DEFAULT Cray Intel Clang AppleClang IntelClang GNU HIP Fujitsu)
  SET(COMPILERS NVIDIA PGI XL DEFAULT Cray Intel Clang AppleClang IntelClang GNU HIPCC Fujitsu)
  CMAKE_PARSE_ARGUMENTS(
    PARSE
    "LINK_OPTIONS;COMPILE_OPTIONS;COMPILE_DEFINITIONS;LINK_LIBRARIES"
@ -926,6 +929,9 @@ ENDFUNCTION()
# DIRECTORY --> all files in directory
# PROJECT --> all files/targets in a project/subproject
#
# NOTE: this is VERY DIFFERENT than the version in KokkosConfigCommon.cmake.in.
# This version explicitly uses nvcc_wrapper.
#
FUNCTION(kokkos_compilation)
  # check whether the compiler already supports building CUDA
  KOKKOS_CXX_COMPILER_CUDA_TEST(Kokkos_CXX_COMPILER_COMPILES_CUDA)
@ -947,10 +953,21 @@ FUNCTION(kokkos_compilation)
    MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
  ENDIF()
# find nvcc_wrapper
FIND_PROGRAM(Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${PROJECT_SOURCE_DIR}
PATHS ${PROJECT_SOURCE_DIR}
PATH_SUFFIXES bin)
IF(NOT Kokkos_COMPILE_LAUNCHER)
MESSAGE(FATAL_ERROR "Kokkos could not find 'nvcc_wrapper'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/nvcc_wrapper'")
ENDIF()
  IF(COMP_GLOBAL)
    # if global, don't bother setting others
    SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
    SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${Kokkos_NVCC_WRAPPER} ${CMAKE_CXX_COMPILER}")
    SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
    SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${Kokkos_NVCC_WRAPPER} ${CMAKE_CXX_COMPILER}")
  ELSE()
    FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
      # make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
@ -961,8 +978,8 @@ FUNCTION(kokkos_compilation)
      # set the properties if defined
      IF(COMP_${_TYPE})
        # MESSAGE(STATUS "Using nvcc_wrapper :: ${_TYPE} :: ${COMP_${_TYPE}}")
        SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
        SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${Kokkos_NVCC_WRAPPER} ${CMAKE_CXX_COMPILER}")
        SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${CMAKE_CXX_COMPILER}")
        SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${Kokkos_NVCC_WRAPPER} ${CMAKE_CXX_COMPILER}")
      ENDIF()
    ENDFOREACH()
  ENDIF()


@ -86,6 +86,19 @@ ELSE()
  MESSAGE(FATAL_ERROR "Unknown C++ standard ${KOKKOS_CXX_STANDARD} - must be 14, 17, or 20")
ENDIF()
# Enforce that we can compile a simple C++14 program
TRY_COMPILE(CAN_COMPILE_CPP14
${KOKKOS_TOP_BUILD_DIR}/corner_cases
${KOKKOS_SOURCE_DIR}/cmake/compile_tests/cplusplus14.cpp
OUTPUT_VARIABLE ERROR_MESSAGE
CXX_STANDARD 14
)
if (NOT CAN_COMPILE_CPP14)
UNSET(CAN_COMPILE_CPP14 CACHE) #make sure CMake always re-runs this
MESSAGE(FATAL_ERROR "C++${KOKKOS_CXX_STANDARD}-compliant compiler detected, but unable to compile C++14 or later program. Verify that ${CMAKE_CXX_COMPILER_ID}:${CMAKE_CXX_COMPILER_VERSION} is set up correctly (e.g., check that correct library headers are being used).\nFailing output:\n ${ERROR_MESSAGE}")
ENDIF()
UNSET(CAN_COMPILE_CPP14 CACHE) #make sure CMake always re-runs this
# Enforce that extensions are turned off for nvcc_wrapper.


@ -1,5 +1,6 @@
KOKKOS_CFG_DEPENDS(TPLS OPTIONS)
KOKKOS_CFG_DEPENDS(TPLS DEVICES)
KOKKOS_CFG_DEPENDS(TPLS COMPILER_ID)
FUNCTION(KOKKOS_TPL_OPTION PKG DEFAULT)
  CMAKE_PARSE_ARGUMENTS(PARSED
@ -38,6 +39,12 @@ IF(KOKKOS_ENABLE_MEMKIND)
ENDIF()
KOKKOS_TPL_OPTION(CUDA ${Kokkos_ENABLE_CUDA} TRIBITS CUDA)
KOKKOS_TPL_OPTION(LIBRT Off)
IF(KOKKOS_ENABLE_HIP AND NOT KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC)
SET(ROCM_DEFAULT ON)
ELSE()
SET(ROCM_DEFAULT OFF)
ENDIF()
KOKKOS_TPL_OPTION(ROCM ${ROCM_DEFAULT})
IF (WIN32)
  SET(LIBDL_DEFAULT Off)
@ -70,6 +77,7 @@ KOKKOS_IMPORT_TPL(LIBRT)
KOKKOS_IMPORT_TPL(LIBDL)
KOKKOS_IMPORT_TPL(MEMKIND)
KOKKOS_IMPORT_TPL(PTHREAD INTERFACE)
KOKKOS_IMPORT_TPL(ROCM INTERFACE)
#Convert list to newlines (which CMake doesn't always like in cache variables)
STRING(REPLACE ";" "\n" KOKKOS_TPL_EXPORT_TEMP "${KOKKOS_TPL_EXPORTS}")


@ -168,12 +168,27 @@ ELSE()
  KOKKOS_ADD_TEST_EXECUTABLE(${ROOT_NAME}
    SOURCES ${PARSE_SOURCES}
  )
IF (PARSE_ARGS)
SET(TEST_NUMBER 0)
FOREACH (ARG_STR ${PARSE_ARGS})
# This is passed as a single string blob to match TriBITS behavior
# We need this to be turned into a list
STRING(REPLACE " " ";" ARG_STR_LIST ${ARG_STR})
LIST(APPEND TEST_NAME "${ROOT_NAME}${TEST_NUMBER}")
MATH(EXPR TEST_NUMBER "${TEST_NUMBER} + 1")
KOKKOS_ADD_TEST(NAME ${TEST_NAME}
EXE ${ROOT_NAME}
FAIL_REGULAR_EXPRESSION " FAILED "
ARGS ${ARG_STR_LIST}
)
ENDFOREACH()
ELSE()
    KOKKOS_ADD_TEST(NAME ${ROOT_NAME}
      EXE ${ROOT_NAME}
      FAIL_REGULAR_EXPRESSION " FAILED "
      ARGS ${PARSE_ARGS}
    )
    ENDIF()
  ENDIF()
ENDFUNCTION()
FUNCTION(KOKKOS_SET_EXE_PROPERTY ROOT_NAME)
@ -301,11 +316,26 @@ ENDMACRO()
## Includes generated header files, scripts such as nvcc_wrapper and hpcbind,
## as well as other files provided through plugins.
MACRO(KOKKOS_INSTALL_ADDITIONAL_FILES)
# kokkos_launch_compiler is used by Kokkos to prefix compiler commands so that they forward to nvcc_wrapper
# kokkos_launch_compiler is used by Kokkos to prefix compiler commands so that they forward to original kokkos compiler
# if nvcc_wrapper was not used as CMAKE_CXX_COMPILER, configure the original compiler into kokkos_launch_compiler
IF(NOT "${CMAKE_CXX_COMPILER}" MATCHES "nvcc_wrapper")
SET(NVCC_WRAPPER_DEFAULT_COMPILER "${CMAKE_CXX_COMPILER}")
ELSE()
IF(NOT "$ENV{NVCC_WRAPPER_DEFAULT_COMPILER}" STREQUAL "")
SET(NVCC_WRAPPER_DEFAULT_COMPILER "$ENV{NVCC_WRAPPER_DEFAULT_COMPILER}")
ENDIF()
ENDIF()
CONFIGURE_FILE(${CMAKE_CURRENT_SOURCE_DIR}/bin/kokkos_launch_compiler
${PROJECT_BINARY_DIR}/temp/kokkos_launch_compiler
@ONLY)
  INSTALL(PROGRAMS
    "${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper"
    "${CMAKE_CURRENT_SOURCE_DIR}/bin/hpcbind"
    "${CMAKE_CURRENT_SOURCE_DIR}/bin/kokkos_launch_compiler"
    "${PROJECT_BINARY_DIR}/temp/kokkos_launch_compiler"
    DESTINATION ${CMAKE_INSTALL_BINDIR})
  INSTALL(FILES
    "${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_config.h"
@ -313,7 +343,7 @@ MACRO(KOKKOS_INSTALL_ADDITIONAL_FILES)
    "${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_SetupBackend.hpp"
    "${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_DeclareBackend.hpp"
    "${CMAKE_CURRENT_BINARY_DIR}/KokkosCore_Config_PostInclude.hpp"
    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
    DESTINATION ${KOKKOS_HEADER_DIR})
ENDMACRO()
FUNCTION(KOKKOS_SET_LIBRARY_PROPERTIES LIBRARY_NAME)
@ -330,24 +360,12 @@ FUNCTION(KOKKOS_SET_LIBRARY_PROPERTIES LIBRARY_NAME)
      ${LIBRARY_NAME} PUBLIC
      $<$<LINK_LANGUAGE:CXX>:${KOKKOS_LINK_OPTIONS}>
    )
  ELSEIF(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.13")
  ELSE()
    #I can use link options
    #just assume CXX linkage
    TARGET_LINK_OPTIONS(
      ${LIBRARY_NAME} PUBLIC ${KOKKOS_LINK_OPTIONS}
    )
ELSE()
#assume CXX linkage, we have no good way to check otherwise
IF (PARSE_PLAIN_STYLE)
TARGET_LINK_LIBRARIES(
${LIBRARY_NAME} ${KOKKOS_LINK_OPTIONS}
)
ELSE()
#well, have to do it the wrong way for now
TARGET_LINK_LIBRARIES(
${LIBRARY_NAME} PUBLIC ${KOKKOS_LINK_OPTIONS}
)
ENDIF()
  ENDIF()
  TARGET_COMPILE_OPTIONS(
@ -448,6 +466,13 @@ FUNCTION(KOKKOS_INTERNAL_ADD_LIBRARY LIBRARY_NAME)
    ${PARSE_SOURCES}
  )
IF(PARSE_SHARED OR BUILD_SHARED_LIBS)
SET_TARGET_PROPERTIES(${LIBRARY_NAME} PROPERTIES
VERSION ${Kokkos_VERSION}
SOVERSION ${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}
)
ENDIF()
  KOKKOS_INTERNAL_ADD_LIBRARY_INSTALL(${LIBRARY_NAME})
  #In case we are building in-tree, add an alias name


@ -26,8 +26,6 @@ KOKKOS_ADD_LIBRARY(
  HEADERS ${KOKKOS_CONTAINER_HEADERS}
)
SET_TARGET_PROPERTIES(kokkoscontainers PROPERTIES VERSION ${Kokkos_VERSION})
KOKKOS_LIB_INCLUDE_DIRECTORIES(kokkoscontainers
  ${KOKKOS_TOP_BUILD_DIR}
  ${CMAKE_CURRENT_BINARY_DIR}
@ -36,4 +34,3 @@ KOKKOS_LIB_INCLUDE_DIRECTORIES(kokkoscontainers
KOKKOS_LINK_INTERNAL_LIBRARY(kokkoscontainers kokkoscore)
#-----------------------------------------------------------------------------


@ -91,6 +91,25 @@ namespace Kokkos {
 * behavior. Please see the documentation of Kokkos::View for
 * examples. The default suffices for most users.
 */
namespace Impl {
#ifdef KOKKOS_ENABLE_CUDA
inline const Kokkos::Cuda& get_cuda_space(const Kokkos::Cuda& in) { return in; }
inline const Kokkos::Cuda& get_cuda_space() {
return *Kokkos::Impl::cuda_get_deep_copy_space();
}
template <typename NonCudaExecSpace>
inline const Kokkos::Cuda& get_cuda_space(const NonCudaExecSpace&) {
return get_cuda_space();
}
#endif // KOKKOS_ENABLE_CUDA
} // namespace Impl
template <class DataType, class Arg1Type = void, class Arg2Type = void,
          class Arg3Type = void>
class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
@ -295,6 +314,53 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
                  "DualView constructed with incompatible views");
    }
  }
// does the DualView have only one device
struct impl_dualview_is_single_device {
enum : bool {
value = std::is_same<typename t_dev::device_type,
typename t_host::device_type>::value
};
};
// does the given device match the device of t_dev?
template <typename Device>
struct impl_device_matches_tdev_device {
enum : bool {
value = std::is_same<typename t_dev::device_type, Device>::value
};
};
// does the given device match the device of t_host?
template <typename Device>
struct impl_device_matches_thost_device {
enum : bool {
value = std::is_same<typename t_host::device_type, Device>::value
};
};
// does the given device match the execution space of t_host?
template <typename Device>
struct impl_device_matches_thost_exec {
enum : bool {
value = std::is_same<typename t_host::execution_space, Device>::value
};
};
// does the given device match the execution space of t_dev?
template <typename Device>
struct impl_device_matches_tdev_exec {
enum : bool {
value = std::is_same<typename t_dev::execution_space, Device>::value
};
};
// does the given device's memory space match the memory space of t_dev?
template <typename Device>
struct impl_device_matches_tdev_memory_space {
enum : bool {
value = std::is_same<typename t_dev::memory_space,
typename Device::memory_space>::value
};
};
  //@}
  //! \name Methods for synchronizing, marking as modified, and getting Views.
@ -302,7 +368,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  /// \brief Return a View on a specific device \c Device.
  ///
  /// Please don't be afraid of the if_c expression in the return
  /// Please don't be afraid of the nested if_c expressions in the return
  /// value's type. That just tells the method what the return type
  /// should be: t_dev if the \c Device template parameter matches
  /// this DualView's device type, else t_host.
@ -323,10 +389,17 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  /// typename dual_view_type::t_host hostView = DV.view<host_device_type> ();
  /// \endcode
  template <class Device>
  KOKKOS_INLINE_FUNCTION const typename Impl::if_c<
      std::is_same<typename t_dev::memory_space,
                   typename Device::memory_space>::value,
      t_dev, t_host>::type&
  KOKKOS_INLINE_FUNCTION const typename std::conditional_t<
      impl_device_matches_tdev_device<Device>::value, t_dev,
      typename std::conditional_t<
          impl_device_matches_thost_device<Device>::value, t_host,
typename std::conditional_t<
impl_device_matches_thost_exec<Device>::value, t_host,
typename std::conditional_t<
impl_device_matches_tdev_exec<Device>::value, t_dev,
typename std::conditional_t<
impl_device_matches_tdev_memory_space<Device>::value,
t_dev, t_host> > > > >
  view() const {
    constexpr bool device_is_memspace =
        std::is_same<Device, typename Device::memory_space>::value;
@ -463,6 +536,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
          true);
    }
  }
  /// \brief Update data on device or host only if data in the other
  /// space has been marked as modified.
  ///
@ -480,12 +554,9 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  /// the data in either View. You must manually mark modified data
  /// as modified, by calling the modify() method with the
  /// appropriate template parameter.
  template <class Device>
  void sync(const typename std::enable_if<
      (std::is_same<typename traits::data_type,
                    typename traits::non_const_data_type>::value) ||
          (std::is_same<Device, int>::value),
      int>::type& = 0) {
  // deliberately passing args by cref as they're used multiple times
  template <class Device, class... Args>
  void sync_impl(std::true_type, Args const&... args) {
    if (modified_flags.data() == nullptr) return;
    int dev = get_device_side<Device>();
@ -497,12 +568,12 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
                     Kokkos::CudaUVMSpace>::value) {
        if (d_view.data() == h_view.data())
          Kokkos::Impl::cuda_prefetch_pointer(
              Kokkos::Cuda(), d_view.data(),
              Impl::get_cuda_space(args...), d_view.data(),
              sizeof(typename t_dev::value_type) * d_view.span(), true);
      }
#endif
      deep_copy(d_view, h_view);
      deep_copy(args..., d_view, h_view);
      modified_flags(0) = modified_flags(1) = 0;
      impl_report_device_sync();
    }
@ -514,12 +585,12 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
                     Kokkos::CudaUVMSpace>::value) {
        if (d_view.data() == h_view.data())
          Kokkos::Impl::cuda_prefetch_pointer(
              Kokkos::Cuda(), d_view.data(),
              Impl::get_cuda_space(args...), d_view.data(),
              sizeof(typename t_dev::value_type) * d_view.span(), false);
      }
#endif
      deep_copy(h_view, d_view);
      deep_copy(args..., h_view, d_view);
      modified_flags(0) = modified_flags(1) = 0;
      impl_report_host_sync();
    }
@ -533,10 +604,26 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  template <class Device>
  void sync(const typename std::enable_if<
      (std::is_same<typename traits::data_type,
                    typename traits::non_const_data_type>::value) ||
          (std::is_same<Device, int>::value),
      int>::type& = 0) {
sync_impl<Device>(std::true_type{});
}
template <class Device, class ExecutionSpace>
void sync(const ExecutionSpace& exec,
const typename std::enable_if<
(std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::true_type{}, exec);
}
// deliberately passing args by cref as they're used multiple times
template <class Device, class... Args>
void sync_impl(std::false_type, Args const&...) {
    if (modified_flags.data() == nullptr) return;
    int dev = get_device_side<Device>();
@ -557,7 +644,27 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
    }
  }
  void sync_host() {
  template <class Device>
void sync(const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{});
}
template <class Device, class ExecutionSpace>
void sync(const ExecutionSpace& exec,
const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{}, exec);
}
// deliberately passing args by cref as they're used multiple times
template <typename... Args>
void sync_host_impl(Args const&... args) {
    if (!std::is_same<typename traits::data_type,
                      typename traits::non_const_data_type>::value)
      Impl::throw_runtime_exception(
@ -569,18 +676,26 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
                   Kokkos::CudaUVMSpace>::value) {
      if (d_view.data() == h_view.data())
        Kokkos::Impl::cuda_prefetch_pointer(
            Kokkos::Cuda(), d_view.data(),
            Impl::get_cuda_space(args...), d_view.data(),
            sizeof(typename t_dev::value_type) * d_view.span(), false);
    }
#endif
    deep_copy(h_view, d_view);
    deep_copy(args..., h_view, d_view);
    modified_flags(1) = modified_flags(0) = 0;
    impl_report_host_sync();
    }
  }
  void sync_device() {
  template <class ExecSpace>
void sync_host(const ExecSpace& exec) {
sync_host_impl(exec);
}
void sync_host() { sync_host_impl(); }
// deliberately passing args by cref as they're used multiple times
template <typename... Args>
void sync_device_impl(Args const&... args) {
    if (!std::is_same<typename traits::data_type,
                      typename traits::non_const_data_type>::value)
      Impl::throw_runtime_exception(
@ -592,17 +707,23 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
                   Kokkos::CudaUVMSpace>::value) {
      if (d_view.data() == h_view.data())
        Kokkos::Impl::cuda_prefetch_pointer(
            Kokkos::Cuda(), d_view.data(),
            Impl::get_cuda_space(args...), d_view.data(),
            sizeof(typename t_dev::value_type) * d_view.span(), true);
    }
#endif
    deep_copy(d_view, h_view);
    deep_copy(args..., d_view, h_view);
    modified_flags(1) = modified_flags(0) = 0;
    impl_report_device_sync();
    }
  }
template <class ExecSpace>
void sync_device(const ExecSpace& exec) {
sync_device_impl(exec);
}
void sync_device() { sync_device_impl(); }
  template <class Device>
  bool need_sync() const {
    if (modified_flags.data() == nullptr) return false;
@ -658,6 +779,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  template <class Device>
  void modify() {
    if (modified_flags.data() == nullptr) return;
    if (impl_dualview_is_single_device::value) return;
    int dev = get_device_side<Device>();
    if (dev == 1) {  // if Device is the same as DualView's device type
@ -690,6 +812,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  }
  inline void modify_host() {
    if (impl_dualview_is_single_device::value) return;
    if (modified_flags.data() != nullptr) {
      modified_flags(0) =
          (modified_flags(1) > modified_flags(0) ? modified_flags(1)
@ -710,6 +833,7 @@ class DualView : public ViewTraits<DataType, Arg1Type, Arg2Type, Arg3Type> {
  }
  inline void modify_device() {
    if (impl_dualview_is_single_device::value) return;
    if (modified_flags.data() != nullptr) {
      modified_flags(1) =
          (modified_flags(1) > modified_flags(0) ? modified_flags(1)
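The DualView hunks above add overloads of sync, sync_host, and sync_device that accept an execution space instance and forward it to deep_copy (and to the CudaUVMSpace prefetch path), plus a single-device short-circuit in modify. A minimal usage sketch of that API; the label "dv", the extent, and the surrounding main() are illustrative assumptions, not part of this diff:

```cpp
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Hypothetical label and extent, chosen only for illustration.
    Kokkos::DualView<double*> dv("dv", 100);

    // Fill the host side and mark it as modified.
    for (int i = 0; i < 100; ++i) dv.h_view(i) = double(i);
    dv.modify_host();

    // New in this update: pass an execution space instance so the
    // host-to-device deep_copy is enqueued on that instance instead of
    // on the default one.
    Kokkos::DefaultExecutionSpace exec;
    dv.sync_device(exec);
    exec.fence();  // make sure the copy finished before using d_view

    // dv.d_view is now up to date and can be used in kernels launched
    // on 'exec'.
  }
  Kokkos::finalize();
}
```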


@ -245,13 +245,10 @@ KOKKOS_INLINE_FUNCTION bool dyn_rank_view_verify_operator_bounds(
    return (size_t(i) < map.extent(R)) &&
           dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
  } else if (i != 0) {
    // FIXME_SYCL SYCL doesn't allow printf in kernels
#ifndef KOKKOS_ENABLE_SYCL
    printf(
    KOKKOS_IMPL_DO_NOT_USE_PRINTF(
        "DynRankView Debug Bounds Checking Error: at rank %u\n Extra "
        "arguments beyond the rank must be zero \n",
        R);
#endif
    return (false) &&
           dyn_rank_view_verify_operator_bounds<R + 1>(rank, map, args...);
  } else {
@ -575,28 +572,13 @@ class DynRankView : public ViewTraits<DataType, Properties...> {
    (is_layout_left || is_layout_right || is_layout_stride)
  };
template <class Space, bool = Kokkos::Impl::MemorySpaceAccess<
Space, typename traits::memory_space>::accessible>
struct verify_space {
KOKKOS_FORCEINLINE_FUNCTION static void check() {}
};
template <class Space>
struct verify_space<Space, false> {
KOKKOS_FORCEINLINE_FUNCTION static void check() {
Kokkos::abort(
"Kokkos::DynRankView ERROR: attempt to access inaccessible memory "
"space");
};
};
// Bounds checking macros
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
// rank of the calling operator - included as first argument in ARG
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(ARG) \
  DynRankView::template verify_space< \
      Kokkos::Impl::ActiveExecutionMemorySpace>::check(); \
  Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
                             typename traits::memory_space>::check(); \
  Kokkos::Impl::dyn_rank_view_verify_operator_bounds< \
      typename traits::memory_space> \
      ARG;
@ -604,8 +586,8 @@ class DynRankView : public ViewTraits<DataType, Properties...> {
#else
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(ARG) \
  DynRankView::template verify_space< \
      Kokkos::Impl::ActiveExecutionMemorySpace>::check();
  Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
                             typename traits::memory_space>::check();
#endif


@ -76,6 +76,12 @@ struct ChunkArraySpace<Kokkos::Experimental::HIPSpace> {
  using memory_space = typename Kokkos::Experimental::HIPHostPinnedSpace;
};
#endif
#ifdef KOKKOS_ENABLE_SYCL
template <>
struct ChunkArraySpace<Kokkos::Experimental::SYCLDeviceUSMSpace> {
using memory_space = typename Kokkos::Experimental::SYCLSharedUSMSpace;
};
#endif
} // end namespace Impl
/** \brief Dynamic views are restricted to rank-one and no layout.


@ -377,25 +377,11 @@ class OffsetView : public ViewTraits<DataType, Properties...> {
      std::is_same<typename traits::specialize, void>::value &&
      (is_layout_left || is_layout_right || is_layout_stride);
template <class Space, bool = Kokkos::Impl::MemorySpaceAccess<
Space, typename traits::memory_space>::accessible>
struct verify_space {
KOKKOS_FORCEINLINE_FUNCTION static void check() {}
};
template <class Space>
struct verify_space<Space, false> {
KOKKOS_FORCEINLINE_FUNCTION static void check() {
Kokkos::abort(
"Kokkos::View ERROR: attempt to access inaccessible memory space");
};
};
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
#define KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY(ARG) \
  OffsetView::template verify_space< \
      Kokkos::Impl::ActiveExecutionMemorySpace>::check(); \
  Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
                             typename traits::memory_space>::check(); \
  Kokkos::Experimental::Impl::offsetview_verify_operator_bounds< \
      typename traits::memory_space> \
      ARG;
@ -403,8 +389,8 @@ class OffsetView : public ViewTraits<DataType, Properties...> {
#else
#define KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY(ARG) \
  OffsetView::template verify_space< \
      Kokkos::Impl::ActiveExecutionMemorySpace>::check();
  Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
                             typename traits::memory_space>::check();
#endif
 public:


@ -649,13 +649,13 @@ struct ReduceDuplicatesBase {
  size_t stride;
  size_t start;
  size_t n;
  ReduceDuplicatesBase(ValueType const* src_in, ValueType* dest_in,
                       size_t stride_in, size_t start_in, size_t n_in,
                       std::string const& name)
  ReduceDuplicatesBase(ExecSpace const& exec_space, ValueType const* src_in,
                       ValueType* dest_in, size_t stride_in, size_t start_in,
                       size_t n_in, std::string const& name)
      : src(src_in), dst(dest_in), stride(stride_in), start(start_in), n(n_in) {
    parallel_for(
        std::string("Kokkos::ScatterView::ReduceDuplicates [") + name + "]",
        RangePolicy<ExecSpace, size_t>(0, stride),
        RangePolicy<ExecSpace, size_t>(exec_space, 0, stride),
        static_cast<Derived const&>(*this));
  }
};
@ -667,9 +667,10 @@ template <typename ExecSpace, typename ValueType, typename Op>
struct ReduceDuplicates
    : public ReduceDuplicatesBase<ExecSpace, ValueType, Op> {
  using Base = ReduceDuplicatesBase<ExecSpace, ValueType, Op>;
  ReduceDuplicates(ValueType const* src_in, ValueType* dst_in, size_t stride_in,
                   size_t start_in, size_t n_in, std::string const& name)
      : Base(src_in, dst_in, stride_in, start_in, n_in, name) {}
  ReduceDuplicates(ExecSpace const& exec_space, ValueType const* src_in,
                   ValueType* dst_in, size_t stride_in, size_t start_in,
                   size_t n_in, std::string const& name)
      : Base(exec_space, src_in, dst_in, stride_in, start_in, n_in, name) {}
  KOKKOS_FORCEINLINE_FUNCTION void operator()(size_t i) const {
    for (size_t j = Base::start; j < Base::n; ++j) {
      ScatterValue<ValueType, Op, ExecSpace,
@ -687,12 +688,12 @@ template <typename ExecSpace, typename ValueType, typename Op>
struct ResetDuplicatesBase {
  using Derived = ResetDuplicates<ExecSpace, ValueType, Op>;
  ValueType* data;
  ResetDuplicatesBase(ValueType* data_in, size_t size_in,
                      std::string const& name)
  ResetDuplicatesBase(ExecSpace const& exec_space, ValueType* data_in,
                      size_t size_in, std::string const& name)
      : data(data_in) {
    parallel_for(
        std::string("Kokkos::ScatterView::ResetDuplicates [") + name + "]",
        RangePolicy<ExecSpace, size_t>(0, size_in),
        RangePolicy<ExecSpace, size_t>(exec_space, 0, size_in),
        static_cast<Derived const&>(*this));
  }
};
@ -703,8 +704,9 @@ struct ResetDuplicatesBase {
template <typename ExecSpace, typename ValueType, typename Op>
struct ResetDuplicates : public ResetDuplicatesBase<ExecSpace, ValueType, Op> {
  using Base = ResetDuplicatesBase<ExecSpace, ValueType, Op>;
  ResetDuplicates(ValueType* data_in, size_t size_in, std::string const& name)
      : Base(data_in, size_in, name) {}
  ResetDuplicates(ExecSpace const& exec_space, ValueType* data_in,
                  size_t size_in, std::string const& name)
      : Base(exec_space, data_in, size_in, name) {}
  KOKKOS_FORCEINLINE_FUNCTION void operator()(size_t i) const {
    ScatterValue<ValueType, Op, ExecSpace,
                 Kokkos::Experimental::ScatterNonAtomic>
@ -713,6 +715,16 @@ struct ResetDuplicates : public ResetDuplicatesBase<ExecSpace, ValueType, Op> {
  }
};
template <typename... P>
void check_scatter_view_allocation_properties_argument(
ViewCtorProp<P...> const&) {
static_assert(ViewCtorProp<P...>::has_execution_space &&
ViewCtorProp<P...>::has_label &&
ViewCtorProp<P...>::initialize,
"Allocation property must have an execution name as well as a "
"label, and must perform the view initialization");
}
} // namespace Experimental
} // namespace Impl
} // namespace Kokkos
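The ReduceDuplicates and ResetDuplicates helpers above now take an execution space instance, and check_scatter_view_allocation_properties_argument requires the allocation property passed to the new ScatterView constructors to carry an execution space, a label, and initialization, i.e. the Kokkos::view_alloc(exec, "label") form. A hedged sketch of how that fits together; the histogram use case, the view names, and the sizes are assumptions made for illustration:

```cpp
#include <Kokkos_Core.hpp>
#include <Kokkos_ScatterView.hpp>

// Accumulate a histogram into 'bins' using a ScatterView whose internal
// buffer is allocated and reset on the given execution space instance.
void fill_histogram(const Kokkos::DefaultExecutionSpace& exec,
                    Kokkos::View<int*> bins, Kokkos::View<int*> samples) {
  namespace KE = Kokkos::Experimental;

  // view_alloc(exec, "histogram") supplies the execution space, the label,
  // and (by default) initialization, satisfying the static_assert above.
  KE::ScatterView<int*> scatter(Kokkos::view_alloc(exec, "histogram"),
                                bins.extent(0));

  Kokkos::parallel_for(
      "fill_histogram", Kokkos::RangePolicy<>(exec, 0, samples.extent(0)),
      KOKKOS_LAMBDA(const int i) {
        auto access = scatter.access();
        access(samples(i)) += 1;  // assumes samples(i) indexes into bins
      });

  // Combine duplicates/contributions into 'bins' on the same instance,
  // then re-zero the scatter buffer for reuse.
  scatter.contribute_into(exec, bins);
  scatter.reset(exec);
}
```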
@ -762,10 +774,26 @@ class ScatterView<DataType, Layout, DeviceType, Op, ScatterNonDuplicated,
  ScatterView(View<RT, RP...> const& original_view)
      : internal_view(original_view) {}
template <typename RT, typename... P, typename... RP>
ScatterView(execution_space const& /* exec_space */,
View<RT, RP...> const& original_view)
: internal_view(original_view) {}
  template <typename... Dims>
  ScatterView(std::string const& name, Dims... dims)
      : internal_view(name, dims...) {}
// This overload allows specifying an execution space instance to be
// used by passing, e.g., Kokkos::view_alloc(exec_space, "label") as
// first argument.
template <typename... P, typename... Dims>
ScatterView(::Kokkos::Impl::ViewCtorProp<P...> const& arg_prop, Dims... dims)
: internal_view(arg_prop, dims...) {
using ::Kokkos::Impl::Experimental::
check_scatter_view_allocation_properties_argument;
check_scatter_view_allocation_properties_argument(arg_prop);
}
  template <typename OtherDataType, typename OtherDeviceType>
  KOKKOS_FUNCTION ScatterView(
      const ScatterView<OtherDataType, Layout, OtherDeviceType, Op,
@ -796,27 +824,41 @@ class ScatterView<DataType, Layout, DeviceType, Op, ScatterNonDuplicated,
  template <typename DT, typename... RP>
  void contribute_into(View<DT, RP...> const& dest) const {
contribute_into(execution_space(), dest);
}
template <typename DT, typename... RP>
void contribute_into(execution_space const& exec_space,
View<DT, RP...> const& dest) const {
    using dest_type = View<DT, RP...>;
    static_assert(std::is_same<typename dest_type::array_layout, Layout>::value,
                  "ScatterView contribute destination has different layout");
    static_assert(
        Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
            memory_space, typename dest_type::memory_space>::value,
        Kokkos::Impl::SpaceAccessibility<
            execution_space, typename dest_type::memory_space>::accessible,
        "ScatterView contribute destination memory space not accessible");
    if (dest.data() == internal_view.data()) return;
    Kokkos::Impl::Experimental::ReduceDuplicates<execution_space,
                                                 original_value_type, Op>(
        internal_view.data(), dest.data(), 0, 0, 1, internal_view.label());
        exec_space, internal_view.data(), dest.data(), 0, 0, 1,
        internal_view.label());
  }
  void reset() {
  void reset(execution_space const& exec_space = execution_space()) {
    Kokkos::Impl::Experimental::ResetDuplicates<execution_space,
                                                original_value_type, Op>(
        internal_view.data(), internal_view.size(), internal_view.label());
        exec_space, internal_view.data(), internal_view.size(),
        internal_view.label());
  }
  template <typename DT, typename... RP>
  void reset_except(View<DT, RP...> const& view) {
    if (view.data() != internal_view.data()) reset();
    reset_except(execution_space(), view);
  }
template <typename DT, typename... RP>
void reset_except(const execution_space& exec_space,
View<DT, RP...> const& view) {
if (view.data() != internal_view.data()) reset(exec_space);
  }
  void resize(const size_t n0 = 0, const size_t n1 = 0, const size_t n2 = 0,
@ -928,10 +970,16 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
  template <typename RT, typename... RP>
  ScatterView(View<RT, RP...> const& original_view)
: ScatterView(execution_space(), original_view) {}
template <typename RT, typename... P, typename... RP>
ScatterView(execution_space const& exec_space,
View<RT, RP...> const& original_view)
      : unique_token(),
        internal_view(
            view_alloc(WithoutInitializing,
                       std::string("duplicated_") + original_view.label()),
                       std::string("duplicated_") + original_view.label(),
                       exec_space),
            unique_token.size(),
            original_view.rank_dynamic > 0 ? original_view.extent(0)
                                           : KOKKOS_IMPL_CTOR_DEFAULT_ARG,
@ -949,14 +997,32 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
                                           : KOKKOS_IMPL_CTOR_DEFAULT_ARG)
  {
    reset();
    reset(exec_space);
  }
  template <typename... Dims>
  ScatterView(std::string const& name, Dims... dims)
      : internal_view(view_alloc(WithoutInitializing, name),
      : ScatterView(view_alloc(execution_space(), name), dims...) {}
// This overload allows specifying an execution space instance to be
// used by passing, e.g., Kokkos::view_alloc(exec_space, "label") as
// first argument.
template <typename... P, typename... Dims>
ScatterView(::Kokkos::Impl::ViewCtorProp<P...> const& arg_prop, Dims... dims)
: internal_view(view_alloc(WithoutInitializing,
static_cast<::Kokkos::Impl::ViewCtorProp<
void, std::string> const&>(arg_prop)
.value),
                      unique_token.size(), dims...) {
    using ::Kokkos::Impl::Experimental::
check_scatter_view_allocation_properties_argument;
check_scatter_view_allocation_properties_argument(arg_prop);
auto const exec_space =
static_cast<::Kokkos::Impl::ViewCtorProp<void, execution_space> const&>(
arg_prop)
.value;
reset(exec_space);
  }
  template <typename OverrideContribution = Contribution>
@ -984,37 +1050,51 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
  template <typename DT, typename... RP>
  void contribute_into(View<DT, RP...> const& dest) const {
contribute_into(execution_space(), dest);
}
template <typename DT, typename... RP>
void contribute_into(execution_space const& exec_space,
View<DT, RP...> const& dest) const {
    using dest_type = View<DT, RP...>;
    static_assert(std::is_same<typename dest_type::array_layout,
                               Kokkos::LayoutRight>::value,
                  "ScatterView deep_copy destination has different layout");
    static_assert(
        Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
            memory_space, typename dest_type::memory_space>::value,
        Kokkos::Impl::SpaceAccessibility<
            execution_space, typename dest_type::memory_space>::accessible,
        "ScatterView deep_copy destination memory space not accessible");
    bool is_equal = (dest.data() == internal_view.data());
    size_t start = is_equal ? 1 : 0;
    Kokkos::Impl::Experimental::ReduceDuplicates<execution_space,
                                                 original_value_type, Op>(
        internal_view.data(), dest.data(), internal_view.stride(0), start,
        internal_view.extent(0), internal_view.label());
        exec_space, internal_view.data(), dest.data(), internal_view.stride(0),
        start, internal_view.extent(0), internal_view.label());
  }
  void reset() {
  void reset(execution_space const& exec_space = execution_space()) {
    Kokkos::Impl::Experimental::ResetDuplicates<execution_space,
                                                original_value_type, Op>(
        internal_view.data(), internal_view.size(), internal_view.label());
        exec_space, internal_view.data(), internal_view.size(),
        internal_view.label());
  }
  template <typename DT, typename... RP>
  void reset_except(View<DT, RP...> const& view) {
reset_except(execution_space(), view);
}
template <typename DT, typename... RP>
void reset_except(execution_space const& exec_space,
View<DT, RP...> const& view) {
    if (view.data() != internal_view.data()) {
      reset();
      reset(exec_space);
      return;
    }
    Kokkos::Impl::Experimental::ResetDuplicates<execution_space,
                                                original_value_type, Op>(
        internal_view.data() + view.size(), internal_view.size() - view.size(),
        internal_view.label());
        exec_space, internal_view.data() + view.size(),
        internal_view.size() - view.size(), internal_view.label());
  }
  void resize(const size_t n0 = 0, const size_t n1 = 0, const size_t n2 = 0,
@ -1075,7 +1155,13 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
ScatterView() = default; ScatterView() = default;
template <typename RT, typename... RP> template <typename RT, typename... RP>
ScatterView(View<RT, RP...> const& original_view) : unique_token() { ScatterView(View<RT, RP...> const& original_view)
: ScatterView(execution_space(), original_view) {}
template <typename RT, typename... P, typename... RP>
ScatterView(execution_space const& exec_space,
View<RT, RP...> const& original_view)
: unique_token() {
size_t arg_N[8] = {original_view.rank > 0 ? original_view.extent(0) size_t arg_N[8] = {original_view.rank > 0 ? original_view.extent(0)
: KOKKOS_IMPL_CTOR_DEFAULT_ARG, : KOKKOS_IMPL_CTOR_DEFAULT_ARG,
original_view.rank > 1 ? original_view.extent(1) original_view.rank > 1 ? original_view.extent(1)
@ -1094,14 +1180,27 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
arg_N[internal_view_type::rank - 1] = unique_token.size(); arg_N[internal_view_type::rank - 1] = unique_token.size();
internal_view = internal_view_type( internal_view = internal_view_type(
view_alloc(WithoutInitializing, view_alloc(WithoutInitializing,
std::string("duplicated_") + original_view.label()), std::string("duplicated_") + original_view.label(),
exec_space),
arg_N[0], arg_N[1], arg_N[2], arg_N[3], arg_N[4], arg_N[5], arg_N[6], arg_N[0], arg_N[1], arg_N[2], arg_N[3], arg_N[4], arg_N[5], arg_N[6],
arg_N[7]); arg_N[7]);
reset(); reset(exec_space);
} }
template <typename... Dims> template <typename... Dims>
ScatterView(std::string const& name, Dims... dims) { ScatterView(std::string const& name, Dims... dims)
: ScatterView(view_alloc(execution_space(), name), dims...) {}
// This overload allows specifying an execution space instance to be
// used by passing, e.g., Kokkos::view_alloc(exec_space, "label") as
// first argument.
template <typename... P, typename... Dims>
ScatterView(::Kokkos::Impl::ViewCtorProp<P...> const& arg_prop,
Dims... dims) {
using ::Kokkos::Impl::Experimental::
check_scatter_view_allocation_properties_argument;
check_scatter_view_allocation_properties_argument(arg_prop);
original_view_type original_view; original_view_type original_view;
size_t arg_N[8] = {original_view.rank > 0 ? original_view.static_extent(0) size_t arg_N[8] = {original_view.rank > 0 ? original_view.static_extent(0)
: KOKKOS_IMPL_CTOR_DEFAULT_ARG, : KOKKOS_IMPL_CTOR_DEFAULT_ARG,
@ -1120,10 +1219,20 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
KOKKOS_IMPL_CTOR_DEFAULT_ARG}; KOKKOS_IMPL_CTOR_DEFAULT_ARG};
Kokkos::Impl::Experimental::args_to_array(arg_N, 0, dims...); Kokkos::Impl::Experimental::args_to_array(arg_N, 0, dims...);
arg_N[internal_view_type::rank - 1] = unique_token.size(); arg_N[internal_view_type::rank - 1] = unique_token.size();
auto const name =
static_cast<::Kokkos::Impl::ViewCtorProp<void, std::string> const&>(
arg_prop)
.value;
internal_view = internal_view_type(view_alloc(WithoutInitializing, name), internal_view = internal_view_type(view_alloc(WithoutInitializing, name),
arg_N[0], arg_N[1], arg_N[2], arg_N[3], arg_N[0], arg_N[1], arg_N[2], arg_N[3],
arg_N[4], arg_N[5], arg_N[6], arg_N[7]); arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
reset();
auto const exec_space =
static_cast<::Kokkos::Impl::ViewCtorProp<void, execution_space> const&>(
arg_prop)
.value;
reset(exec_space);
} }
template <typename OtherDataType, typename OtherDeviceType> template <typename OtherDataType, typename OtherDeviceType>
@ -1166,6 +1275,12 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
template <typename... RP> template <typename... RP>
void contribute_into(View<RP...> const& dest) const { void contribute_into(View<RP...> const& dest) const {
contribute_into(execution_space(), dest);
}
template <typename... RP>
void contribute_into(execution_space const& exec_space,
View<RP...> const& dest) const {
using dest_type = View<RP...>; using dest_type = View<RP...>;
static_assert( static_assert(
std::is_same<typename dest_type::value_type, std::is_same<typename dest_type::value_type,
@ -1175,34 +1290,42 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
Kokkos::LayoutLeft>::value, Kokkos::LayoutLeft>::value,
"ScatterView deep_copy destination has different layout"); "ScatterView deep_copy destination has different layout");
static_assert( static_assert(
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< Kokkos::Impl::SpaceAccessibility<
memory_space, typename dest_type::memory_space>::value, execution_space, typename dest_type::memory_space>::accessible,
"ScatterView deep_copy destination memory space not accessible"); "ScatterView deep_copy destination memory space not accessible");
auto extent = internal_view.extent(internal_view_type::rank - 1); auto extent = internal_view.extent(internal_view_type::rank - 1);
bool is_equal = (dest.data() == internal_view.data()); bool is_equal = (dest.data() == internal_view.data());
size_t start = is_equal ? 1 : 0; size_t start = is_equal ? 1 : 0;
Kokkos::Impl::Experimental::ReduceDuplicates<execution_space, Kokkos::Impl::Experimental::ReduceDuplicates<execution_space,
original_value_type, Op>( original_value_type, Op>(
internal_view.data(), dest.data(), exec_space, internal_view.data(), dest.data(),
internal_view.stride(internal_view_type::rank - 1), start, extent, internal_view.stride(internal_view_type::rank - 1), start, extent,
internal_view.label()); internal_view.label());
} }
void reset() { void reset(execution_space const& exec_space = execution_space()) {
Kokkos::Impl::Experimental::ResetDuplicates<execution_space, Kokkos::Impl::Experimental::ResetDuplicates<execution_space,
original_value_type, Op>( original_value_type, Op>(
internal_view.data(), internal_view.size(), internal_view.label()); exec_space, internal_view.data(), internal_view.size(),
internal_view.label());
} }
template <typename DT, typename... RP> template <typename DT, typename... RP>
void reset_except(View<DT, RP...> const& view) { void reset_except(View<DT, RP...> const& view) {
reset_except(execution_space(), view);
}
template <typename DT, typename... RP>
void reset_except(execution_space const& exec_space,
View<DT, RP...> const& view) {
if (view.data() != internal_view.data()) { if (view.data() != internal_view.data()) {
reset(); reset(exec_space);
return; return;
} }
Kokkos::Impl::Experimental::ResetDuplicates<execution_space, Kokkos::Impl::Experimental::ResetDuplicates<execution_space,
original_value_type, Op>( original_value_type, Op>(
internal_view.data() + view.size(), internal_view.size() - view.size(), exec_space, internal_view.data() + view.size(),
internal_view.label()); internal_view.size() - view.size(), internal_view.label());
} }
void resize(const size_t n0 = 0, const size_t n1 = 0, const size_t n2 = 0, void resize(const size_t n0 = 0, const size_t n1 = 0, const size_t n2 = 0,
@ -1316,21 +1439,21 @@ template <typename Op = Kokkos::Experimental::ScatterSum,
ScatterView< ScatterView<
RT, typename ViewTraits<RT, RP...>::array_layout, RT, typename ViewTraits<RT, RP...>::array_layout,
typename ViewTraits<RT, RP...>::device_type, Op, typename ViewTraits<RT, RP...>::device_type, Op,
typename Kokkos::Impl::if_c< std::conditional_t<
std::is_same<Duplication, void>::value, std::is_same<Duplication, void>::value,
typename Kokkos::Impl::Experimental::DefaultDuplication< typename Kokkos::Impl::Experimental::DefaultDuplication<
typename ViewTraits<RT, RP...>::execution_space>::type, typename ViewTraits<RT, RP...>::execution_space>::type,
Duplication>::type, Duplication>,
typename Kokkos::Impl::if_c< std::conditional_t<
std::is_same<Contribution, void>::value, std::is_same<Contribution, void>::value,
typename Kokkos::Impl::Experimental::DefaultContribution< typename Kokkos::Impl::Experimental::DefaultContribution<
typename ViewTraits<RT, RP...>::execution_space, typename ViewTraits<RT, RP...>::execution_space,
typename Kokkos::Impl::if_c< typename std::conditional_t<
std::is_same<Duplication, void>::value, std::is_same<Duplication, void>::value,
typename Kokkos::Impl::Experimental::DefaultDuplication< typename Kokkos::Impl::Experimental::DefaultDuplication<
typename ViewTraits<RT, RP...>::execution_space>::type, typename ViewTraits<RT, RP...>::execution_space>::type,
Duplication>::type>::type, Duplication>>::type,
Contribution>::type> Contribution>>
create_scatter_view(View<RT, RP...> const& original_view) { create_scatter_view(View<RT, RP...> const& original_view) {
return original_view; // implicit ScatterView constructor call return original_view; // implicit ScatterView constructor call
} }
@ -1365,12 +1488,21 @@ create_scatter_view(Op, Duplication, Contribution,
namespace Kokkos { namespace Kokkos {
namespace Experimental { namespace Experimental {
template <typename DT1, typename DT2, typename LY, typename ES, typename OP,
typename CT, typename DP, typename... VP>
void contribute(
typename ES::execution_space const& exec_space, View<DT1, VP...>& dest,
Kokkos::Experimental::ScatterView<DT2, LY, ES, OP, CT, DP> const& src) {
src.contribute_into(exec_space, dest);
}
template <typename DT1, typename DT2, typename LY, typename ES, typename OP, template <typename DT1, typename DT2, typename LY, typename ES, typename OP,
typename CT, typename DP, typename... VP> typename CT, typename DP, typename... VP>
void contribute( void contribute(
View<DT1, VP...>& dest, View<DT1, VP...>& dest,
Kokkos::Experimental::ScatterView<DT2, LY, ES, OP, CT, DP> const& src) { Kokkos::Experimental::ScatterView<DT2, LY, ES, OP, CT, DP> const& src) {
src.contribute_into(dest); using execution_space = typename ES::execution_space;
contribute(execution_space{}, dest, src);
} }
} // namespace Experimental } // namespace Experimental
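
Editor's note on the two additions in this file: the comment above ("This overload allows specifying an execution space instance to be used by passing, e.g., Kokkos::view_alloc(exec_space, \"label\") as first argument") and the new `contribute` overload both take an execution space instance. The sketch below is hypothetical usage, not code from this change; it assumes the default backend's ScatterView configuration provides the `ViewCtorProp` constructor shown above, and the names `exec`, `results`, and `scatter` are illustrative only.

```cpp
// Hypothetical sketch: scatter-add into a histogram on a specific execution
// space instance, then reduce the (possibly duplicated) results back.
#include <Kokkos_Core.hpp>
#include <Kokkos_ScatterView.hpp>

void scatter_example() {
  Kokkos::DefaultExecutionSpace exec;  // could be a stream-backed instance
  Kokkos::View<double*> results("results", 100);
  // Constructor added in this change: view_alloc(exec_space, label) as the
  // first argument selects the execution space used for allocation and reset.
  Kokkos::Experimental::ScatterView<double*> scatter(
      Kokkos::view_alloc(exec, "scatter"), 100);
  Kokkos::parallel_for(
      "fill_scatter", Kokkos::RangePolicy<>(exec, 0, 1000),
      KOKKOS_LAMBDA(const int i) {
        auto access = scatter.access();
        access(i % 100) += 1.0;
      });
  // Overload added in this change: contribute on the same instance.
  Kokkos::Experimental::contribute(exec, results, scatter);
  exec.fence();
}
```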

View File

@ -264,26 +264,24 @@ class UnorderedMap {
private: private:
enum : size_type { invalid_index = ~static_cast<size_type>(0) }; enum : size_type { invalid_index = ~static_cast<size_type>(0) };
using impl_value_type = using impl_value_type = std::conditional_t<is_set, int, declared_value_type>;
typename Impl::if_c<is_set, int, declared_value_type>::type;
using key_type_view = typename Impl::if_c< using key_type_view = std::conditional_t<
is_insertable_map, View<key_type *, device_type>, is_insertable_map, View<key_type *, device_type>,
View<const key_type *, device_type, MemoryTraits<RandomAccess> > >::type; View<const key_type *, device_type, MemoryTraits<RandomAccess> > >;
using value_type_view = using value_type_view = std::conditional_t<
typename Impl::if_c<is_insertable_map || is_modifiable_map, is_insertable_map || is_modifiable_map,
View<impl_value_type *, device_type>, View<impl_value_type *, device_type>,
View<const impl_value_type *, device_type, View<const impl_value_type *, device_type, MemoryTraits<RandomAccess> > >;
MemoryTraits<RandomAccess> > >::type;
using size_type_view = typename Impl::if_c< using size_type_view = std::conditional_t<
is_insertable_map, View<size_type *, device_type>, is_insertable_map, View<size_type *, device_type>,
View<const size_type *, device_type, MemoryTraits<RandomAccess> > >::type; View<const size_type *, device_type, MemoryTraits<RandomAccess> > >;
using bitset_type = using bitset_type =
typename Impl::if_c<is_insertable_map, Bitset<execution_space>, std::conditional_t<is_insertable_map, Bitset<execution_space>,
ConstBitset<execution_space> >::type; ConstBitset<execution_space> >;
enum { modified_idx = 0, erasable_idx = 1, failed_insert_idx = 2 }; enum { modified_idx = 0, erasable_idx = 1, failed_insert_idx = 2 };
enum { num_scalars = 3 }; enum { num_scalars = 3 };
@ -540,10 +538,7 @@ class UnorderedMap {
// Previously claimed an unused entry that was not inserted. // Previously claimed an unused entry that was not inserted.
// Release this unused entry immediately. // Release this unused entry immediately.
if (!m_available_indexes.reset(new_index)) { if (!m_available_indexes.reset(new_index)) {
// FIXME_SYCL SYCL doesn't allow printf in kernels KOKKOS_IMPL_DO_NOT_USE_PRINTF("Unable to free existing\n");
#ifndef KOKKOS_ENABLE_SYCL
printf("Unable to free existing\n");
#endif
} }
} }
@ -659,8 +654,8 @@ class UnorderedMap {
/// ///
/// 'const value_type' via Cuda texture fetch must return by value. /// 'const value_type' via Cuda texture fetch must return by value.
KOKKOS_FORCEINLINE_FUNCTION KOKKOS_FORCEINLINE_FUNCTION
typename Impl::if_c<(is_set || has_const_value), impl_value_type, std::conditional_t<(is_set || has_const_value), impl_value_type,
impl_value_type &>::type impl_value_type &>
value_at(size_type i) const { value_at(size_type i) const {
return m_values[is_set ? 0 : (i < capacity() ? i : capacity())]; return m_values[is_set ? 0 : (i < capacity() ? i : capacity())];
} }
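
Editor's note on the `value_at` return type rewritten above (`std::conditional_t<(is_set || has_const_value), impl_value_type, impl_value_type&>`): the sketch below illustrates the distinction but is not part of this change. It assumes a host execution space so the map can be used directly from host code; all names are illustrative.

```cpp
// Hypothetical sketch: for an insertable map with non-const values, value_at
// returns a reference, so the stored value can be updated in place; a set or
// const-valued map would instead receive the value by value.
#include <Kokkos_Core.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::UnorderedMap<int, double, Kokkos::DefaultHostExecutionSpace> map(
        128);
    auto result = map.insert(7, 1.5);
    if (result.success()) {
      map.value_at(map.find(7)) += 1.0;  // mutate through the reference
    }
    std::printf("value at key 7: %g\n", map.value_at(map.find(7)));
  }
  Kokkos::finalize();
  return 0;
}
```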

View File

@ -57,10 +57,22 @@
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
KOKKOS_FORCEINLINE_FUNCTION
unsigned rotate_left(unsigned i, int r) {
constexpr int size = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
return r ? ((i << r) | (i >> (size - r))) : i;
}
KOKKOS_FORCEINLINE_FUNCTION KOKKOS_FORCEINLINE_FUNCTION
unsigned rotate_right(unsigned i, int r) { unsigned rotate_right(unsigned i, int r) {
enum { size = static_cast<int>(sizeof(unsigned) * CHAR_BIT) }; constexpr int size = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
// FIXME_SYCL llvm.fshr.i32 missing
// (https://github.com/intel/llvm/issues/3308)
#ifdef __SYCL_DEVICE_ONLY__
return rotate_left(i, size - r);
#else
return r ? ((i >> r) | (i << (size - r))) : i; return r ? ((i >> r) | (i << (size - r))) : i;
#endif
} }
template <typename Bitset> template <typename Bitset>
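
Editor's note: the SYCL workaround above relies on the identity rotate_right(i, r) == rotate_left(i, size - r) for 0 < r < size. Below is a small standalone check of that identity in plain C++, independent of Kokkos; the helper names `rotl` and `rotr` are made up here to avoid clashing with the functions above.

```cpp
#include <cassert>
#include <climits>

// Same formulas as in the code above, renamed to keep this sketch
// self-contained.
static unsigned rotl(unsigned i, int r) {
  constexpr int size = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
  return r ? ((i << r) | (i >> (size - r))) : i;
}
static unsigned rotr(unsigned i, int r) {
  constexpr int size = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
  return r ? ((i >> r) | (i << (size - r))) : i;
}

int main() {
  constexpr int size = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
  const unsigned samples[] = {0u, 1u, 0x2468ACEFu, ~0u};
  for (unsigned x : samples)
    for (int r = 1; r < size; ++r)  // r == 0 and r == size are excluded
      assert(rotr(x, r) == rotl(x, size - r));
  return 0;
}
```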

View File

@ -250,8 +250,8 @@ struct UnorderedMapPrint {
uint32_t list = m_map.m_hash_lists(i); uint32_t list = m_map.m_hash_lists(i);
for (size_type curr = list, ii = 0; curr != invalid_index; for (size_type curr = list, ii = 0; curr != invalid_index;
curr = m_map.m_next_index[curr], ++ii) { curr = m_map.m_next_index[curr], ++ii) {
printf("%d[%d]: %d->%d\n", list, ii, m_map.key_at(curr), KOKKOS_IMPL_DO_NOT_USE_PRINTF("%d[%d]: %d->%d\n", list, ii,
m_map.value_at(curr)); m_map.key_at(curr), m_map.value_at(curr));
} }
} }
}; };

View File

@ -2,6 +2,7 @@
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR}) KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR}) KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src ) KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
KOKKOS_INCLUDE_DIRECTORIES(${KOKKOS_SOURCE_DIR}/core/unit_test/category_files)
foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL) foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL)
# Because there is always an exception to the rule # Because there is always an exception to the rule
@ -41,11 +42,6 @@ foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL)
configure_file(${dir}/dummy.cpp ${file}) configure_file(${dir}/dummy.cpp ${file})
list(APPEND UnitTestSources ${file}) list(APPEND UnitTestSources ${file})
endforeach() endforeach()
list(REMOVE_ITEM UnitTestSources
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_Bitset.cpp
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_ScatterView.cpp
${CMAKE_CURRENT_BINARY_DIR}/sycl/TestSYCL_UnorderedMap.cpp
)
KOKKOS_ADD_EXECUTABLE_AND_TEST(UnitTest_${Tag} SOURCES ${UnitTestSources}) KOKKOS_ADD_EXECUTABLE_AND_TEST(UnitTest_${Tag} SOURCES ${UnitTestSources})
endif() endif()
endforeach() endforeach()

View File

@ -26,7 +26,7 @@ override LDFLAGS += -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files
TEST_TARGETS = TEST_TARGETS =
TARGETS = TARGETS =

View File

@ -114,6 +114,8 @@ struct test_dualview_combinations {
a.template modify<typename ViewType::execution_space>(); a.template modify<typename ViewType::execution_space>();
a.template sync<typename ViewType::host_mirror_space>(); a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});
a.h_view(5, 1) = 3; a.h_view(5, 1) = 3;
a.h_view(6, 1) = 4; a.h_view(6, 1) = 4;
@ -122,11 +124,15 @@ struct test_dualview_combinations {
ViewType b = Kokkos::subview(a, std::pair<unsigned int, unsigned int>(6, 9), ViewType b = Kokkos::subview(a, std::pair<unsigned int, unsigned int>(6, 9),
std::pair<unsigned int, unsigned int>(0, 1)); std::pair<unsigned int, unsigned int>(0, 1));
a.template sync<typename ViewType::execution_space>(); a.template sync<typename ViewType::execution_space>();
a.template sync<typename ViewType::execution_space>(
Kokkos::DefaultExecutionSpace{});
b.template modify<typename ViewType::execution_space>(); b.template modify<typename ViewType::execution_space>();
Kokkos::deep_copy(b.d_view, 2); Kokkos::deep_copy(b.d_view, 2);
a.template sync<typename ViewType::host_mirror_space>(); a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});
Scalar count = 0; Scalar count = 0;
for (unsigned int i = 0; i < a.d_view.extent(0); i++) for (unsigned int i = 0; i < a.d_view.extent(0); i++)
for (unsigned int j = 0; j < a.d_view.extent(1); j++) for (unsigned int j = 0; j < a.d_view.extent(1); j++)
@ -180,6 +186,7 @@ struct test_dual_view_deep_copy {
} else { } else {
a.modify_device(); a.modify_device();
a.sync_host(); a.sync_host();
a.sync_host(Kokkos::DefaultExecutionSpace{});
} }
// Check device view is initialized as expected // Check device view is initialized as expected
@ -208,6 +215,7 @@ struct test_dual_view_deep_copy {
b.template sync<typename ViewType::host_mirror_space>(); b.template sync<typename ViewType::host_mirror_space>();
} else { } else {
b.sync_host(); b.sync_host();
b.sync_host(Kokkos::DefaultExecutionSpace{});
} }
// Perform same checks on b as done on a // Perform same checks on b as done on a
@ -302,6 +310,7 @@ struct test_dualview_resize {
ASSERT_EQ(a.extent(1), m / factor); ASSERT_EQ(a.extent(1), m / factor);
a.sync_device(); a.sync_device();
a.sync_device(Kokkos::DefaultExecutionSpace{});
// Check device view is initialized as expected // Check device view is initialized as expected
a_d_sum = 0; a_d_sum = 0;
@ -404,19 +413,14 @@ void test_dualview_resize() {
Impl::test_dualview_resize<Scalar, Device>(); Impl::test_dualview_resize<Scalar, Device>();
} }
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
TEST(TEST_CATEGORY, dualview_combination) { TEST(TEST_CATEGORY, dualview_combination) {
test_dualview_combinations<int, TEST_EXECSPACE>(10, true); test_dualview_combinations<int, TEST_EXECSPACE>(10, true);
} }
#endif
TEST(TEST_CATEGORY, dualview_alloc) { TEST(TEST_CATEGORY, dualview_alloc) {
test_dualview_alloc<int, TEST_EXECSPACE>(10); test_dualview_alloc<int, TEST_EXECSPACE>(10);
} }
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
TEST(TEST_CATEGORY, dualview_combinations_without_init) { TEST(TEST_CATEGORY, dualview_combinations_without_init) {
test_dualview_combinations<int, TEST_EXECSPACE>(10, false); test_dualview_combinations<int, TEST_EXECSPACE>(10, false);
} }
@ -433,8 +437,133 @@ TEST(TEST_CATEGORY, dualview_realloc) {
TEST(TEST_CATEGORY, dualview_resize) { TEST(TEST_CATEGORY, dualview_resize) {
test_dualview_resize<int, TEST_EXECSPACE>(); test_dualview_resize<int, TEST_EXECSPACE>();
} }
namespace {
/**
*
* The following tests are a response to
* https://github.com/kokkos/kokkos/issues/3850
* and
* https://github.com/kokkos/kokkos/pull/3857
*
* DualViews were returning incorrect view types and taking
* inappropriate actions based on the templated view methods.
*
* Specifically, template view methods were always returning
* a device view if the memory space was UVM and a Kokkos::Device was passed.
* Sync/modify methods completely broke down, so these tests exist to make sure
* that we keep the semantics of UVM DualViews intact.
*/
// modify if we have other UVM enabled backends
#ifdef KOKKOS_ENABLE_CUDA // OR other UVM builds
#define UVM_ENABLED_BUILD
#endif #endif
#ifdef UVM_ENABLED_BUILD
template <typename ExecSpace>
struct UVMSpaceFor;
#endif
#ifdef KOKKOS_ENABLE_CUDA // specific to CUDA
template <>
struct UVMSpaceFor<Kokkos::Cuda> {
using type = Kokkos::CudaUVMSpace;
};
#endif
#ifdef UVM_ENABLED_BUILD
template <>
struct UVMSpaceFor<Kokkos::DefaultHostExecutionSpace> {
using type = typename UVMSpaceFor<Kokkos::DefaultExecutionSpace>::type;
};
#else
template <typename ExecSpace>
struct UVMSpaceFor {
using type = typename ExecSpace::memory_space;
};
#endif
using ExecSpace = Kokkos::DefaultExecutionSpace;
using MemSpace = typename UVMSpaceFor<Kokkos::DefaultExecutionSpace>::type;
using DeviceType = Kokkos::Device<ExecSpace, MemSpace>;
using DualViewType = Kokkos::DualView<double*, Kokkos::LayoutLeft, DeviceType>;
using d_device = DeviceType;
using h_device = Kokkos::Device<
Kokkos::DefaultHostExecutionSpace,
typename UVMSpaceFor<Kokkos::DefaultHostExecutionSpace>::type>;
TEST(TEST_CATEGORY, dualview_device_correct_kokkos_device) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
auto v_d = dv.template view<d_device>();
using vdt = decltype(v_d);
using vdt_d = vdt::device_type;
using vdt_d_e = vdt_d::execution_space;
ASSERT_STREQ(vdt_d_e::name(), Kokkos::DefaultExecutionSpace::name());
}
TEST(TEST_CATEGORY, dualview_host_correct_kokkos_device) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
auto v_h = dv.template view<h_device>();
using vht = decltype(v_h);
using vht_d = vht::device_type;
using vht_d_e = vht_d::execution_space;
ASSERT_STREQ(vht_d_e::name(), Kokkos::DefaultHostExecutionSpace::name());
}
TEST(TEST_CATEGORY, dualview_host_modify_template_device_sync) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
dv.modify_host();
dv.template sync<d_device>();
EXPECT_TRUE(!dv.need_sync_device());
EXPECT_TRUE(!dv.need_sync_host());
dv.clear_sync_state();
}
TEST(TEST_CATEGORY, dualview_host_modify_template_device_execspace_sync) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
dv.modify_host();
dv.template sync<d_device::execution_space>();
EXPECT_TRUE(!dv.need_sync_device());
EXPECT_TRUE(!dv.need_sync_host());
dv.clear_sync_state();
}
TEST(TEST_CATEGORY, dualview_device_modify_template_host_sync) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
dv.modify_device();
dv.template sync<h_device>();
EXPECT_TRUE(!dv.need_sync_device());
EXPECT_TRUE(!dv.need_sync_host());
dv.clear_sync_state();
}
TEST(TEST_CATEGORY, dualview_device_modify_template_host_execspace_sync) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
dv.modify_device();
dv.template sync<h_device::execution_space>();
EXPECT_TRUE(!dv.need_sync_device());
EXPECT_TRUE(!dv.need_sync_host());
dv.clear_sync_state();
}
TEST(TEST_CATEGORY,
dualview_template_views_return_correct_executionspace_views) {
DualViewType dv("myView", 100);
dv.clear_sync_state();
using hvt = decltype(dv.view<typename Kokkos::DefaultHostExecutionSpace>());
using dvt = decltype(dv.view<typename Kokkos::DefaultExecutionSpace>());
ASSERT_STREQ(Kokkos::DefaultExecutionSpace::name(),
dvt::device_type::execution_space::name());
ASSERT_STREQ(Kokkos::DefaultHostExecutionSpace::name(),
hvt::device_type::execution_space::name());
}
} // anonymous namespace
} // namespace Test } // namespace Test
#endif // KOKKOS_TEST_DUALVIEW_HPP #endif // KOKKOS_TEST_DUALVIEW_HPP

View File

@ -243,8 +243,6 @@ struct TestDynamicView {
} }
}; };
// FIXME_SYCL needs resize_serial
#ifndef KOKKOS_ENABLE_SYCL
TEST(TEST_CATEGORY, dynamic_view) { TEST(TEST_CATEGORY, dynamic_view) {
using TestDynView = TestDynamicView<double, TEST_EXECSPACE>; using TestDynView = TestDynamicView<double, TEST_EXECSPACE>;
@ -252,7 +250,6 @@ TEST(TEST_CATEGORY, dynamic_view) {
TestDynView::run(100000 + 100 * i); TestDynView::run(100000 + 100 * i);
} }
} }
#endif
} // namespace Test } // namespace Test

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_HIP_HPP
#define KOKKOS_TEST_HIP_HPP
#define TEST_CATEGORY hip
#define TEST_EXECSPACE Kokkos::Experimental::HIP
#endif

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_HPX_HPP
#define KOKKOS_TEST_HPX_HPP
#define TEST_CATEGORY hpx
#define TEST_EXECSPACE Kokkos::Experimental::HPX
#endif

View File

@ -130,8 +130,6 @@ void test_offsetview_construction() {
} }
} }
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
const int ovmin0 = ov.begin(0); const int ovmin0 = ov.begin(0);
const int ovend0 = ov.end(0); const int ovend0 = ov.end(0);
const int ovmin1 = ov.begin(1); const int ovmin1 = ov.begin(1);
@ -178,7 +176,6 @@ void test_offsetview_construction() {
} }
ASSERT_EQ(OVResult, answer) << "Bad data found in OffsetView"; ASSERT_EQ(OVResult, answer) << "Bad data found in OffsetView";
#endif
#endif #endif
{ {
@ -215,8 +212,6 @@ void test_offsetview_construction() {
point3_type{{extent0, extent1, extent2}}); point3_type{{extent0, extent1, extent2}});
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA) #if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
// FIXME_SYCL requires MDRange policy
#ifdef KOKKOS_ENABLE_SYCL
int view3DSum = 0; int view3DSum = 0;
Kokkos::parallel_reduce( Kokkos::parallel_reduce(
rangePolicy3DZero, rangePolicy3DZero,
@ -239,7 +234,6 @@ void test_offsetview_construction() {
ASSERT_EQ(view3DSum, offsetView3DSum) ASSERT_EQ(view3DSum, offsetView3DSum)
<< "construction of OffsetView from View and begins array broken."; << "construction of OffsetView from View and begins array broken.";
#endif
#endif #endif
} }
view_type viewFromOV = ov.view(); view_type viewFromOV = ov.view();
@ -266,8 +260,6 @@ void test_offsetview_construction() {
Kokkos::deep_copy(aView, ov); Kokkos::deep_copy(aView, ov);
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA) #if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
int sum = 0; int sum = 0;
Kokkos::parallel_reduce( Kokkos::parallel_reduce(
rangePolicy2D, rangePolicy2D,
@ -277,7 +269,6 @@ void test_offsetview_construction() {
sum); sum);
ASSERT_EQ(sum, 0) << "deep_copy(view, offsetView) broken."; ASSERT_EQ(sum, 0) << "deep_copy(view, offsetView) broken.";
#endif
#endif #endif
} }
@ -288,8 +279,6 @@ void test_offsetview_construction() {
Kokkos::deep_copy(ov, aView); Kokkos::deep_copy(ov, aView);
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA) #if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
int sum = 0; int sum = 0;
Kokkos::parallel_reduce( Kokkos::parallel_reduce(
rangePolicy2D, rangePolicy2D,
@ -299,7 +288,6 @@ void test_offsetview_construction() {
sum); sum);
ASSERT_EQ(sum, 0) << "deep_copy(offsetView, view) broken."; ASSERT_EQ(sum, 0) << "deep_copy(offsetView, view) broken.";
#endif
#endif #endif
} }
} }
@ -471,8 +459,6 @@ void test_offsetview_subview() {
ASSERT_EQ(offsetSubview.end(1), 9); ASSERT_EQ(offsetSubview.end(1), 9);
#if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA) #if defined(KOKKOS_ENABLE_CUDA_LAMBDA) || !defined(KOKKOS_ENABLE_CUDA)
// FIXME_SYCL requires MDRange policy
#ifndef KOKKOS_ENABLE_SYCL
using range_type = Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>, using range_type = Kokkos::MDRangePolicy<Device, Kokkos::Rank<2>,
Kokkos::IndexType<int> >; Kokkos::IndexType<int> >;
using point_type = typename range_type::point_type; using point_type = typename range_type::point_type;
@ -498,7 +484,6 @@ void test_offsetview_subview() {
sum); sum);
ASSERT_EQ(sum, 6 * (e0 - b0) * (e1 - b1)); ASSERT_EQ(sum, 6 * (e0 - b0) * (e1 - b1));
#endif
#endif #endif
} }
@ -701,12 +686,9 @@ void test_offsetview_offsets_rank3() {
} }
#endif #endif
// FIXME_SYCL needs MDRangePolicy
#ifndef KOKKOS_ENABLE_SYCL
TEST(TEST_CATEGORY, offsetview_construction) { TEST(TEST_CATEGORY, offsetview_construction) {
test_offsetview_construction<int, TEST_EXECSPACE>(); test_offsetview_construction<int, TEST_EXECSPACE>();
} }
#endif
TEST(TEST_CATEGORY, offsetview_unmanaged_construction) { TEST(TEST_CATEGORY, offsetview_unmanaged_construction) {
test_offsetview_unmanaged_construction<int, TEST_EXECSPACE>(); test_offsetview_unmanaged_construction<int, TEST_EXECSPACE>();

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_OPENMP_HPP
#define KOKKOS_TEST_OPENMP_HPP
#define TEST_CATEGORY openmp
#define TEST_EXECSPACE Kokkos::OpenMP
#endif

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_SYCL_HPP
#define KOKKOS_TEST_SYCL_HPP
#define TEST_CATEGORY sycl
#define TEST_EXECSPACE Kokkos::Experimental::SYCL
#endif

View File

@ -437,6 +437,10 @@ struct test_scatter_view_config {
Contribution, Op, Contribution, Op,
NumberType>::orig_view_type; NumberType>::orig_view_type;
void compile_constructor() {
auto sv = scatter_view_def(Kokkos::view_alloc(DeviceType{}, "label"), 10);
}
void run_test(int n) { void run_test(int n) {
// test allocation // test allocation
{ {

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_SERIAL_HPP
#define KOKKOS_TEST_SERIAL_HPP
#define TEST_CATEGORY serial
#define TEST_EXECSPACE Kokkos::Serial
#endif

View File

@ -285,10 +285,7 @@ void run_test_graph4() {
TEST(TEST_CATEGORY, staticcrsgraph) { TEST(TEST_CATEGORY, staticcrsgraph) {
TestStaticCrsGraph::run_test_graph<TEST_EXECSPACE>(); TestStaticCrsGraph::run_test_graph<TEST_EXECSPACE>();
// FIXME_SYCL requires MDRangePolicy
#ifndef KOKKOS_ENABLE_SYCL
TestStaticCrsGraph::run_test_graph2<TEST_EXECSPACE>(); TestStaticCrsGraph::run_test_graph2<TEST_EXECSPACE>();
#endif
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 0); TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 0);
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 1000); TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 1000);
TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 10000); TestStaticCrsGraph::run_test_graph3<TEST_EXECSPACE>(1, 10000);

View File

@ -1,51 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 3.0
// Copyright (2020) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY NTESS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NTESS OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Christian R. Trott (crtrott@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_THREADS_HPP
#define KOKKOS_TEST_THREADS_HPP
#define TEST_CATEGORY threads
#define TEST_EXECSPACE Kokkos::Threads
#endif

View File

@ -163,7 +163,8 @@ struct TestFind {
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION
void operator()(typename execution_space::size_type i, void operator()(typename execution_space::size_type i,
value_type &errors) const { value_type &errors) const {
const bool expect_to_find_i = (i < m_max_key); const bool expect_to_find_i =
(i < typename execution_space::size_type(m_max_key));
const bool exists = m_map.exists(i); const bool exists = m_map.exists(i);
@ -293,10 +294,11 @@ void test_deep_copy(uint32_t num_nodes) {
} }
} }
// FIXME_HIP wrong result in CI but works locally // FIXME_SYCL wrong results on Nvidia GPUs but correct on Host and Intel GPUs
#ifndef KOKKOS_ENABLE_HIP // FIXME_HIP
// WORKAROUND MSVC // WORKAROUND MSVC
#ifndef _WIN32 #if !(defined(KOKKOS_ENABLE_HIP) && (HIP_VERSION < 401)) && \
!defined(_WIN32) && !defined(KOKKOS_ENABLE_SYCL)
TEST(TEST_CATEGORY, UnorderedMap_insert) { TEST(TEST_CATEGORY, UnorderedMap_insert) {
for (int i = 0; i < 500; ++i) { for (int i = 0; i < 500; ++i) {
test_insert<TEST_EXECSPACE>(100000, 90000, 100, true); test_insert<TEST_EXECSPACE>(100000, 90000, 100, true);
@ -304,7 +306,6 @@ TEST(TEST_CATEGORY, UnorderedMap_insert) {
} }
} }
#endif #endif
#endif
TEST(TEST_CATEGORY, UnorderedMap_failed_insert) { TEST(TEST_CATEGORY, UnorderedMap_failed_insert) {
for (int i = 0; i < 1000; ++i) test_failed_insert<TEST_EXECSPACE>(10000); for (int i = 0; i < 1000; ++i) test_failed_insert<TEST_EXECSPACE>(10000);

View File

@ -9,6 +9,14 @@
# that in TriBITS KokkosAlgorithms can be disabled... # that in TriBITS KokkosAlgorithms can be disabled...
#INCLUDE_DIRECTORIES("${CMAKE_CURRENT_SOURCE_DIR}/../../algorithms/src") #INCLUDE_DIRECTORIES("${CMAKE_CURRENT_SOURCE_DIR}/../../algorithms/src")
# FIXME_OPENMPTARGET - the NVIDIA HPC compiler nvc++ in the OpenMPTarget backend does not pass the perf_tests.
IF (KOKKOS_ENABLE_OPENMPTARGET
AND (KOKKOS_CXX_COMPILER_ID STREQUAL PGI
OR KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC))
RETURN()
ENDIF()
SET(SOURCES SET(SOURCES
PerfTestMain.cpp PerfTestMain.cpp
PerfTestGramSchmidt.cpp PerfTestGramSchmidt.cpp
@ -68,8 +76,7 @@ KOKKOS_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR}) KOKKOS_INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
# This test currently times out for MSVC # This test currently times out for MSVC
# FIXME_SYCL these tests don't compile yet (require parallel_for). IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC")
IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC" AND NOT Kokkos_ENABLE_SYCL)
KOKKOS_ADD_EXECUTABLE_AND_TEST( KOKKOS_ADD_EXECUTABLE_AND_TEST(
PerfTestExec PerfTestExec
SOURCES ${SOURCES} SOURCES ${SOURCES}
@ -77,8 +84,6 @@ IF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC" AND NOT Kokkos_ENABLE_SYCL)
) )
ENDIF() ENDIF()
# FIXME_SYCL
IF(NOT Kokkos_ENABLE_SYCL)
KOKKOS_ADD_EXECUTABLE_AND_TEST( KOKKOS_ADD_EXECUTABLE_AND_TEST(
PerformanceTest_Atomic PerformanceTest_Atomic
SOURCES test_atomic.cpp SOURCES test_atomic.cpp
@ -98,7 +103,6 @@ KOKKOS_ADD_EXECUTABLE_AND_TEST(
SOURCES test_mempool.cpp SOURCES test_mempool.cpp
CATEGORIES PERFORMANCE CATEGORIES PERFORMANCE
) )
ENDIF()
IF(NOT Kokkos_ENABLE_OPENMPTARGET) IF(NOT Kokkos_ENABLE_OPENMPTARGET)
# FIXME OPENMPTARGET needs tasking # FIXME OPENMPTARGET needs tasking

View File

@ -69,7 +69,7 @@ struct InvNorm2 : public Kokkos::DotSingle<VectorView> {
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION
void final(value_type& result) const { void final(value_type& result) const {
result = std::sqrt(result); result = Kokkos::Experimental::sqrt(result);
Rjj() = result; Rjj() = result;
inv() = (0 < result) ? 1.0 / result : 0; inv() = (0 < result) ? 1.0 / result : 0;
} }
@ -145,7 +145,7 @@ struct ModifiedGramSchmidt {
// Q(:,j) *= ( 1 / R(j,j) ); => Q(:,j) *= tmp ; // Q(:,j) *= ( 1 / R(j,j) ); => Q(:,j) *= tmp ;
Kokkos::scale(tmp, Qj); Kokkos::scale(tmp, Qj);
for (size_t k = j + 1; k < count; ++k) { for (size_type k = j + 1; k < count; ++k) {
const vector_type Qk = Kokkos::subview(Q_, Kokkos::ALL(), k); const vector_type Qk = Kokkos::subview(Q_, Kokkos::ALL(), k);
const value_view Rjk = Kokkos::subview(R_, j, k); const value_view Rjk = Kokkos::subview(R_, j, k);
@ -165,7 +165,7 @@ struct ModifiedGramSchmidt {
//-------------------------------------------------------------------------- //--------------------------------------------------------------------------
static double test(const size_t length, const size_t count, static double test(const size_type length, const size_type count,
const size_t iter = 1) { const size_t iter = 1) {
multivector_type Q_("Q", length, count); multivector_type Q_("Q", length, count);
multivector_type R_("R", count, count); multivector_type R_("R", count, count);
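
Editor's note on the loop comments above ("Q(:,j) *= ( 1 / R(j,j) ); => Q(:,j) *= tmp ;"): the modified Gram-Schmidt step this perf test implements can be summarized as below. This is a restatement of the existing code, not part of the change.

```latex
% Modified Gram-Schmidt, j-th step, as realized by the loop above:
% normalize column j, then orthogonalize the remaining columns against it.
\begin{aligned}
  R_{jj} &= \lVert Q_{:,j} \rVert_2, \qquad Q_{:,j} \mathrel{*}= 1 / R_{jj},\\
  R_{jk} &= Q_{:,j} \cdot Q_{:,k}, \qquad Q_{:,k} \mathrel{-}= R_{jk}\, Q_{:,j},
          \qquad k = j+1, \dots, \text{count}-1.
\end{aligned}
```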

View File

@ -72,8 +72,6 @@ KOKKOS_ADD_LIBRARY(
ADD_BUILD_OPTIONS # core should be given all the necessary compiler/linker flags ADD_BUILD_OPTIONS # core should be given all the necessary compiler/linker flags
) )
SET_TARGET_PROPERTIES(kokkoscore PROPERTIES VERSION ${Kokkos_VERSION})
KOKKOS_LIB_INCLUDE_DIRECTORIES(kokkoscore KOKKOS_LIB_INCLUDE_DIRECTORIES(kokkoscore
${KOKKOS_TOP_BUILD_DIR} ${KOKKOS_TOP_BUILD_DIR}
${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_BINARY_DIR}
@ -87,3 +85,4 @@ KOKKOS_LINK_TPL(kokkoscore PUBLIC HPX)
KOKKOS_LINK_TPL(kokkoscore PUBLIC LIBDL) KOKKOS_LINK_TPL(kokkoscore PUBLIC LIBDL)
KOKKOS_LINK_TPL(kokkoscore PUBLIC LIBRT) KOKKOS_LINK_TPL(kokkoscore PUBLIC LIBRT)
KOKKOS_LINK_TPL(kokkoscore PUBLIC PTHREAD) KOKKOS_LINK_TPL(kokkoscore PUBLIC PTHREAD)
KOKKOS_LINK_TPL(kokkoscore PUBLIC ROCM)

View File

@ -45,6 +45,10 @@
#include <Kokkos_Macros.hpp> #include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA #ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
#include <Kokkos_CudaSpace.hpp>
#include <cstdlib> #include <cstdlib>
#include <iostream> #include <iostream>
#include <sstream> #include <sstream>
@ -52,10 +56,6 @@
#include <algorithm> #include <algorithm>
#include <atomic> #include <atomic>
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
#include <Kokkos_CudaSpace.hpp>
//#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp> //#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp>
#include <impl/Kokkos_Error.hpp> #include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_MemorySpace.hpp> #include <impl/Kokkos_MemorySpace.hpp>
@ -65,6 +65,22 @@
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
cudaStream_t Kokkos::Impl::cuda_get_deep_copy_stream() {
static cudaStream_t s = nullptr;
if (s == nullptr) {
cudaStreamCreate(&s);
}
return s;
}
const std::unique_ptr<Kokkos::Cuda> &Kokkos::Impl::cuda_get_deep_copy_space(
bool initialize) {
static std::unique_ptr<Cuda> space = nullptr;
if (!space && initialize)
space = std::make_unique<Cuda>(Kokkos::Impl::cuda_get_deep_copy_stream());
return space;
}
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
@ -72,13 +88,6 @@ namespace {
static std::atomic<int> num_uvm_allocations(0); static std::atomic<int> num_uvm_allocations(0);
cudaStream_t get_deep_copy_stream() {
static cudaStream_t s = nullptr;
if (s == nullptr) {
cudaStreamCreate(&s);
}
return s;
}
} // namespace } // namespace
DeepCopy<CudaSpace, CudaSpace, Cuda>::DeepCopy(void *dst, const void *src, DeepCopy<CudaSpace, CudaSpace, Cuda>::DeepCopy(void *dst, const void *src,
@ -115,7 +124,7 @@ DeepCopy<CudaSpace, HostSpace, Cuda>::DeepCopy(const Cuda &instance, void *dst,
} }
void DeepCopyAsyncCuda(void *dst, const void *src, size_t n) { void DeepCopyAsyncCuda(void *dst, const void *src, size_t n) {
cudaStream_t s = get_deep_copy_stream(); cudaStream_t s = cuda_get_deep_copy_stream();
CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, n, cudaMemcpyDefault, s)); CUDA_SAFE_CALL(cudaMemcpyAsync(dst, src, n, cudaMemcpyDefault, s));
cudaStreamSynchronize(s); cudaStreamSynchronize(s);
} }
@ -128,14 +137,14 @@ void DeepCopyAsyncCuda(void *dst, const void *src, size_t n) {
namespace Kokkos { namespace Kokkos {
void CudaSpace::access_error() { KOKKOS_DEPRECATED void CudaSpace::access_error() {
const std::string msg( const std::string msg(
"Kokkos::CudaSpace::access_error attempt to execute Cuda function from " "Kokkos::CudaSpace::access_error attempt to execute Cuda function from "
"non-Cuda space"); "non-Cuda space");
Kokkos::Impl::throw_runtime_exception(msg); Kokkos::Impl::throw_runtime_exception(msg);
} }
void CudaSpace::access_error(const void *const) { KOKKOS_DEPRECATED void CudaSpace::access_error(const void *const) {
const std::string msg( const std::string msg(
"Kokkos::CudaSpace::access_error attempt to execute Cuda function from " "Kokkos::CudaSpace::access_error attempt to execute Cuda function from "
"non-Cuda space"); "non-Cuda space");
@ -459,79 +468,6 @@ SharedAllocationRecord<Kokkos::CudaSpace, void>::attach_texture_object(
return tex_obj; return tex_obj;
} }
//==============================================================================
// <editor-fold desc="SharedAllocationRecord::get_label()"> {{{1
std::string SharedAllocationRecord<Kokkos::CudaSpace, void>::get_label() const {
SharedAllocationHeader header;
Kokkos::Impl::DeepCopy<Kokkos::HostSpace, Kokkos::CudaSpace>(
&header, RecordBase::head(), sizeof(SharedAllocationHeader));
return std::string(header.m_label);
}
std::string SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::get_label()
const {
return std::string(RecordBase::head()->m_label);
}
std::string
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::get_label() const {
return std::string(RecordBase::head()->m_label);
}
// </editor-fold> end SharedAllocationRecord::get_label() }}}1
//==============================================================================
//==============================================================================
// <editor-fold desc="SharedAllocationRecord allocate()"> {{{1
SharedAllocationRecord<Kokkos::CudaSpace, void>
*SharedAllocationRecord<Kokkos::CudaSpace, void>::allocate(
const Kokkos::CudaSpace &arg_space, const std::string &arg_label,
const size_t arg_alloc_size) {
return new SharedAllocationRecord(arg_space, arg_label, arg_alloc_size);
}
SharedAllocationRecord<Kokkos::CudaUVMSpace, void>
*SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::allocate(
const Kokkos::CudaUVMSpace &arg_space, const std::string &arg_label,
const size_t arg_alloc_size) {
return new SharedAllocationRecord(arg_space, arg_label, arg_alloc_size);
}
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>
*SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::allocate(
const Kokkos::CudaHostPinnedSpace &arg_space,
const std::string &arg_label, const size_t arg_alloc_size) {
return new SharedAllocationRecord(arg_space, arg_label, arg_alloc_size);
}
// </editor-fold> end SharedAllocationRecord allocate() }}}1
//==============================================================================
//==============================================================================
// <editor-fold desc="SharedAllocationRecord deallocate"> {{{1
void SharedAllocationRecord<Kokkos::CudaSpace, void>::deallocate(
SharedAllocationRecord<void, void> *arg_rec) {
delete static_cast<SharedAllocationRecord *>(arg_rec);
}
void SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::deallocate(
SharedAllocationRecord<void, void> *arg_rec) {
delete static_cast<SharedAllocationRecord *>(arg_rec);
}
void SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::deallocate(
SharedAllocationRecord<void, void> *arg_rec) {
delete static_cast<SharedAllocationRecord *>(arg_rec);
}
// </editor-fold> end SharedAllocationRecord deallocate }}}1
//==============================================================================
//============================================================================== //==============================================================================
// <editor-fold desc="SharedAllocationRecord destructors"> {{{1 // <editor-fold desc="SharedAllocationRecord destructors"> {{{1
@ -580,7 +516,7 @@ SharedAllocationRecord<Kokkos::CudaSpace, void>::SharedAllocationRecord(
const SharedAllocationRecord<void, void>::function_type arg_dealloc) const SharedAllocationRecord<void, void>::function_type arg_dealloc)
// Pass through allocated [ SharedAllocationHeader , user_memory ] // Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function // Pass through deallocation function
: SharedAllocationRecord<void, void>( : base_t(
#ifdef KOKKOS_ENABLE_DEBUG #ifdef KOKKOS_ENABLE_DEBUG
&SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record, &SharedAllocationRecord<Kokkos::CudaSpace, void>::s_root_record,
#endif #endif
@ -592,13 +528,7 @@ SharedAllocationRecord<Kokkos::CudaSpace, void>::SharedAllocationRecord(
SharedAllocationHeader header; SharedAllocationHeader header;
// Fill in the Header information this->base_t::_fill_host_accessible_header_info(header, arg_label);
header.m_record = static_cast<SharedAllocationRecord<void, void> *>(this);
strncpy(header.m_label, arg_label.c_str(),
SharedAllocationHeader::maximum_label_length);
// Set last element zero, in case c_str is too long
header.m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
// Copy to device memory // Copy to device memory
Kokkos::Impl::DeepCopy<CudaSpace, HostSpace>(RecordBase::m_alloc_ptr, &header, Kokkos::Impl::DeepCopy<CudaSpace, HostSpace>(RecordBase::m_alloc_ptr, &header,
@ -611,7 +541,7 @@ SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::SharedAllocationRecord(
const SharedAllocationRecord<void, void>::function_type arg_dealloc) const SharedAllocationRecord<void, void>::function_type arg_dealloc)
// Pass through allocated [ SharedAllocationHeader , user_memory ] // Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function // Pass through deallocation function
: SharedAllocationRecord<void, void>( : base_t(
#ifdef KOKKOS_ENABLE_DEBUG #ifdef KOKKOS_ENABLE_DEBUG
&SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::s_root_record, &SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::s_root_record,
#endif #endif
@ -620,16 +550,8 @@ SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::SharedAllocationRecord(
sizeof(SharedAllocationHeader) + arg_alloc_size, arg_dealloc), sizeof(SharedAllocationHeader) + arg_alloc_size, arg_dealloc),
m_tex_obj(0), m_tex_obj(0),
m_space(arg_space) { m_space(arg_space) {
// Fill in the Header information, directly accessible via UVM this->base_t::_fill_host_accessible_header_info(*base_t::m_alloc_ptr,
arg_label);
RecordBase::m_alloc_ptr->m_record = this;
strncpy(RecordBase::m_alloc_ptr->m_label, arg_label.c_str(),
SharedAllocationHeader::maximum_label_length);
// Set last element zero, in case c_str is too long
RecordBase::m_alloc_ptr
->m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
} }
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>:: SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::
@ -639,7 +561,7 @@ SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::
const SharedAllocationRecord<void, void>::function_type arg_dealloc) const SharedAllocationRecord<void, void>::function_type arg_dealloc)
// Pass through allocated [ SharedAllocationHeader , user_memory ] // Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function // Pass through deallocation function
: SharedAllocationRecord<void, void>( : base_t(
#ifdef KOKKOS_ENABLE_DEBUG #ifdef KOKKOS_ENABLE_DEBUG
&SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, &SharedAllocationRecord<Kokkos::CudaHostPinnedSpace,
void>::s_root_record, void>::s_root_record,
@ -648,319 +570,13 @@ SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::
arg_alloc_size), arg_alloc_size),
sizeof(SharedAllocationHeader) + arg_alloc_size, arg_dealloc), sizeof(SharedAllocationHeader) + arg_alloc_size, arg_dealloc),
m_space(arg_space) { m_space(arg_space) {
// Fill in the Header information, directly accessible on the host this->base_t::_fill_host_accessible_header_info(*base_t::m_alloc_ptr,
arg_label);
RecordBase::m_alloc_ptr->m_record = this;
strncpy(RecordBase::m_alloc_ptr->m_label, arg_label.c_str(),
SharedAllocationHeader::maximum_label_length);
// Set last element zero, in case c_str is too long
RecordBase::m_alloc_ptr
->m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
} }
// </editor-fold> end SharedAllocationRecord constructors }}}1 // </editor-fold> end SharedAllocationRecord constructors }}}1
//============================================================================== //==============================================================================
//==============================================================================
// <editor-fold desc="SharedAllocationRecored::(re|de|)allocate_tracked"> {{{1
void *SharedAllocationRecord<Kokkos::CudaSpace, void>::allocate_tracked(
const Kokkos::CudaSpace &arg_space, const std::string &arg_alloc_label,
const size_t arg_alloc_size) {
if (!arg_alloc_size) return nullptr;
SharedAllocationRecord *const r =
allocate(arg_space, arg_alloc_label, arg_alloc_size);
RecordBase::increment(r);
return r->data();
}
void SharedAllocationRecord<Kokkos::CudaSpace, void>::deallocate_tracked(
void *const arg_alloc_ptr) {
if (arg_alloc_ptr != nullptr) {
SharedAllocationRecord *const r = get_record(arg_alloc_ptr);
RecordBase::decrement(r);
}
}
void *SharedAllocationRecord<Kokkos::CudaSpace, void>::reallocate_tracked(
void *const arg_alloc_ptr, const size_t arg_alloc_size) {
SharedAllocationRecord *const r_old = get_record(arg_alloc_ptr);
SharedAllocationRecord *const r_new =
allocate(r_old->m_space, r_old->get_label(), arg_alloc_size);
Kokkos::Impl::DeepCopy<CudaSpace, CudaSpace>(
r_new->data(), r_old->data(), std::min(r_old->size(), r_new->size()));
RecordBase::increment(r_new);
RecordBase::decrement(r_old);
return r_new->data();
}
void *SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::allocate_tracked(
const Kokkos::CudaUVMSpace &arg_space, const std::string &arg_alloc_label,
const size_t arg_alloc_size) {
if (!arg_alloc_size) return nullptr;
SharedAllocationRecord *const r =
allocate(arg_space, arg_alloc_label, arg_alloc_size);
RecordBase::increment(r);
return r->data();
}
void SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::deallocate_tracked(
void *const arg_alloc_ptr) {
if (arg_alloc_ptr != nullptr) {
SharedAllocationRecord *const r = get_record(arg_alloc_ptr);
RecordBase::decrement(r);
}
}
void *SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::reallocate_tracked(
void *const arg_alloc_ptr, const size_t arg_alloc_size) {
SharedAllocationRecord *const r_old = get_record(arg_alloc_ptr);
SharedAllocationRecord *const r_new =
allocate(r_old->m_space, r_old->get_label(), arg_alloc_size);
Kokkos::Impl::DeepCopy<CudaUVMSpace, CudaUVMSpace>(
r_new->data(), r_old->data(), std::min(r_old->size(), r_new->size()));
RecordBase::increment(r_new);
RecordBase::decrement(r_old);
return r_new->data();
}
void *
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::allocate_tracked(
const Kokkos::CudaHostPinnedSpace &arg_space,
const std::string &arg_alloc_label, const size_t arg_alloc_size) {
if (!arg_alloc_size) return nullptr;
SharedAllocationRecord *const r =
allocate(arg_space, arg_alloc_label, arg_alloc_size);
RecordBase::increment(r);
return r->data();
}
void SharedAllocationRecord<Kokkos::CudaHostPinnedSpace,
void>::deallocate_tracked(void *const
arg_alloc_ptr) {
if (arg_alloc_ptr != nullptr) {
SharedAllocationRecord *const r = get_record(arg_alloc_ptr);
RecordBase::decrement(r);
}
}
void *
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::reallocate_tracked(
void *const arg_alloc_ptr, const size_t arg_alloc_size) {
SharedAllocationRecord *const r_old = get_record(arg_alloc_ptr);
SharedAllocationRecord *const r_new =
allocate(r_old->m_space, r_old->get_label(), arg_alloc_size);
Kokkos::Impl::DeepCopy<CudaHostPinnedSpace, CudaHostPinnedSpace>(
r_new->data(), r_old->data(), std::min(r_old->size(), r_new->size()));
RecordBase::increment(r_new);
RecordBase::decrement(r_old);
return r_new->data();
}
// </editor-fold> end SharedAllocationRecord::(re|de|)allocate_tracked }}}1
//==============================================================================
//==============================================================================
// <editor-fold desc="SharedAllocationRecord::get_record()"> {{{1
SharedAllocationRecord<Kokkos::CudaSpace, void> *
SharedAllocationRecord<Kokkos::CudaSpace, void>::get_record(void *alloc_ptr) {
using RecordCuda = SharedAllocationRecord<Kokkos::CudaSpace, void>;
using Header = SharedAllocationHeader;
// Copy the header from the allocation
Header head;
Header const *const head_cuda =
alloc_ptr ? Header::get_header(alloc_ptr) : nullptr;
if (alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, CudaSpace>(
&head, head_cuda, sizeof(SharedAllocationHeader));
}
RecordCuda *const record =
alloc_ptr ? static_cast<RecordCuda *>(head.m_record) : nullptr;
if (!alloc_ptr || record->m_alloc_ptr != head_cuda) {
Kokkos::Impl::throw_runtime_exception(
std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , "
"void >::get_record ERROR"));
}
return record;
}
SharedAllocationRecord<Kokkos::CudaUVMSpace, void> *SharedAllocationRecord<
Kokkos::CudaUVMSpace, void>::get_record(void *alloc_ptr) {
using Header = SharedAllocationHeader;
using RecordCuda = SharedAllocationRecord<Kokkos::CudaUVMSpace, void>;
Header *const h =
alloc_ptr ? reinterpret_cast<Header *>(alloc_ptr) - 1 : nullptr;
if (!alloc_ptr || h->m_record->m_alloc_ptr != h) {
Kokkos::Impl::throw_runtime_exception(
std::string("Kokkos::Impl::SharedAllocationRecord< "
"Kokkos::CudaUVMSpace , void >::get_record ERROR"));
}
return static_cast<RecordCuda *>(h->m_record);
}
SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>
*SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::get_record(
void *alloc_ptr) {
using Header = SharedAllocationHeader;
using RecordCuda = SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>;
Header *const h =
alloc_ptr ? reinterpret_cast<Header *>(alloc_ptr) - 1 : nullptr;
if (!alloc_ptr || h->m_record->m_alloc_ptr != h) {
Kokkos::Impl::throw_runtime_exception(
std::string("Kokkos::Impl::SharedAllocationRecord< "
"Kokkos::CudaHostPinnedSpace , void >::get_record ERROR"));
}
return static_cast<RecordCuda *>(h->m_record);
}
// </editor-fold> end SharedAllocationRecord::get_record() }}}1
//==============================================================================
//==============================================================================
// <editor-fold desc="SharedAllocationRecord::print_records()"> {{{1
// Iterate records to print orphaned memory ...
void SharedAllocationRecord<Kokkos::CudaSpace, void>::print_records(
std::ostream &s, const Kokkos::CudaSpace &, bool detail) {
(void)s;
(void)detail;
#ifdef KOKKOS_ENABLE_DEBUG
SharedAllocationRecord<void, void> *r = &s_root_record;
char buffer[256];
SharedAllocationHeader head;
if (detail) {
do {
if (r->m_alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, CudaSpace>(
&head, r->m_alloc_ptr, sizeof(SharedAllocationHeader));
} else {
head.m_label[0] = 0;
}
// Formatting dependent on sizeof(uintptr_t)
const char *format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string =
"Cuda addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx "
"+ %.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
} else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string =
"Cuda addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ "
"0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
snprintf(buffer, 256, format_string, reinterpret_cast<uintptr_t>(r),
reinterpret_cast<uintptr_t>(r->m_prev),
reinterpret_cast<uintptr_t>(r->m_next),
reinterpret_cast<uintptr_t>(r->m_alloc_ptr), r->m_alloc_size,
r->m_count, reinterpret_cast<uintptr_t>(r->m_dealloc),
head.m_label);
s << buffer;
r = r->m_next;
} while (r != &s_root_record);
} else {
do {
if (r->m_alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, CudaSpace>(
&head, r->m_alloc_ptr, sizeof(SharedAllocationHeader));
// Formatting dependent on sizeof(uintptr_t)
const char *format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda [ 0x%.12lx + %ld ] %s\n";
} else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda [ 0x%.12llx + %ld ] %s\n";
}
snprintf(buffer, 256, format_string,
reinterpret_cast<uintptr_t>(r->data()), r->size(),
head.m_label);
} else {
snprintf(buffer, 256, "Cuda [ 0 + 0 ]\n");
}
s << buffer;
r = r->m_next;
} while (r != &s_root_record);
}
#else
Kokkos::Impl::throw_runtime_exception(
"SharedAllocationHeader<CudaSpace>::print_records only works with "
"KOKKOS_ENABLE_DEBUG enabled");
#endif
}
void SharedAllocationRecord<Kokkos::CudaUVMSpace, void>::print_records(
std::ostream &s, const Kokkos::CudaUVMSpace &, bool detail) {
(void)s;
(void)detail;
#ifdef KOKKOS_ENABLE_DEBUG
SharedAllocationRecord<void, void>::print_host_accessible_records(
s, "CudaUVM", &s_root_record, detail);
#else
Kokkos::Impl::throw_runtime_exception(
"SharedAllocationHeader<CudaSpace>::print_records only works with "
"KOKKOS_ENABLE_DEBUG enabled");
#endif
}
void SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>::print_records(
std::ostream &s, const Kokkos::CudaHostPinnedSpace &, bool detail) {
(void)s;
(void)detail;
#ifdef KOKKOS_ENABLE_DEBUG
SharedAllocationRecord<void, void>::print_host_accessible_records(
s, "CudaHostPinned", &s_root_record, detail);
#else
Kokkos::Impl::throw_runtime_exception(
"SharedAllocationHeader<CudaSpace>::print_records only works with "
"KOKKOS_ENABLE_DEBUG enabled");
#endif
}
// </editor-fold> end SharedAllocationRecord::print_records() }}}1
//==============================================================================
void cuda_prefetch_pointer(const Cuda &space, const void *ptr, size_t bytes, void cuda_prefetch_pointer(const Cuda &space, const void *ptr, size_t bytes,
bool to_device) { bool to_device) {
if ((ptr == nullptr) || (bytes == 0)) return; if ((ptr == nullptr) || (bytes == 0)) return;
@ -984,6 +600,29 @@ void cuda_prefetch_pointer(const Cuda &space, const void *ptr, size_t bytes,
} // namespace Impl } // namespace Impl
} // namespace Kokkos } // namespace Kokkos
//==============================================================================
// <editor-fold desc="Explicit instantiations of CRTP Base classes"> {{{1
#include <impl/Kokkos_SharedAlloc_timpl.hpp>
namespace Kokkos {
namespace Impl {
// To avoid additional compilation cost for something that's (mostly?) not
// performance sensitive, we explicitly instantiate these CRTP base classes here,
// where we have access to the associated *_timpl.hpp header files.
template class SharedAllocationRecordCommon<Kokkos::CudaSpace>;
template class HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>;
template class SharedAllocationRecordCommon<Kokkos::CudaUVMSpace>;
template class SharedAllocationRecordCommon<Kokkos::CudaHostPinnedSpace>;
} // end namespace Impl
} // end namespace Kokkos
// </editor-fold> end Explicit instantiations of CRTP Base classes }}}1
//==============================================================================
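The comment above describes the explicit-instantiation idiom: the templated record code lives in Kokkos_SharedAlloc_timpl.hpp, is compiled once here for each Cuda memory space, and every other translation unit only sees declarations. A minimal standalone sketch of the same idiom, with hypothetical RecordCommon and DeviceSpace names standing in for the Kokkos classes:

struct DeviceSpace {};  // hypothetical stand-in for a memory space

// record.hpp -- declaration only; the member definition lives in a _timpl header
template <class MemorySpace>
struct RecordCommon {
  void fill_header();
};
extern template struct RecordCommon<DeviceSpace>;  // suppress implicit instantiation

// record_timpl.hpp -- out-of-line definitions, included by very few translation units
template <class MemorySpace>
void RecordCommon<MemorySpace>::fill_header() { /* write label, back-pointer, ... */ }

// record.cpp -- the one translation unit that pays the instantiation cost
template struct RecordCommon<DeviceSpace>;  // analogous to the instantiations above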
#else #else
void KOKKOS_CORE_SRC_CUDA_CUDASPACE_PREVENT_LINK_ERROR() {} void KOKKOS_CORE_SRC_CUDA_CUDASPACE_PREVENT_LINK_ERROR() {}
#endif // KOKKOS_ENABLE_CUDA #endif // KOKKOS_ENABLE_CUDA

View File

@ -140,7 +140,7 @@ inline int cuda_deduce_block_size(bool early_termination,
} }
} }
if (early_termination && blocks_per_sm != 0) break; if (early_termination && opt_block_size != 0) break;
} }
return opt_block_size; return opt_block_size;
@ -222,7 +222,8 @@ inline size_t get_shmem_per_sm_prefer_l1(cudaDeviceProp const& properties) {
case 52: case 52:
case 61: return 96; case 61: return 96;
case 70: case 70:
case 80: return 8; case 80:
case 86: return 8;
case 75: return 32; case 75: return 32;
default: default:
Kokkos::Impl::throw_runtime_exception( Kokkos::Impl::throw_runtime_exception(
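The switch above picks the preferred shared-memory carve-out when L1 is favored; this change adds compute capability 8.6 next to 7.0 and 8.0, while the earlier hunk makes the block-size search break early on the chosen opt_block_size instead of blocks_per_sm. A hedged standalone restatement of just the mapping visible in the hunk (values in KiB; other architectures fall through to the error path), with a hypothetical helper name:

#include <cstddef>
#include <stdexcept>

inline std::size_t shmem_prefer_l1_kib(int compute_capability) {
  switch (compute_capability) {
    case 52:
    case 61: return 96;
    case 70:
    case 80:
    case 86: return 8;  // 8.6 now handled the same way as 7.0 and 8.0
    case 75: return 32;
    default:
      throw std::runtime_error("compute capability not covered by this sketch");
  }
}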

View File

@ -175,30 +175,42 @@ class half_t {
return cast_from_half<unsigned long long>(*this); return cast_from_half<unsigned long long>(*this);
} }
/**
* Conversion constructors.
*
* Support implicit conversions from impl_type, float, double -> half_t
* Mixed precision expressions require upcasting which is done in the
* "// Binary Arithmetic" operator overloads below.
*
* Support implicit conversions from integral types -> half_t.
* Expressions involving half_t with integral types require downcasting
* the integral types to half_t. Existing operator overloads can handle this
* with the addition of the below implicit conversion constructors.
*/
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t(impl_type rhs) : val(rhs) {} half_t(impl_type rhs) : val(rhs) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(float rhs) : val(cast_to_half(rhs).val) {} half_t(float rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(double rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(bool rhs) : val(cast_to_half(rhs).val) {} explicit half_t(bool rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(double rhs) : val(cast_to_half(rhs).val) {} half_t(short rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(short rhs) : val(cast_to_half(rhs).val) {} half_t(int rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(int rhs) : val(cast_to_half(rhs).val) {} half_t(long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(long rhs) : val(cast_to_half(rhs).val) {} half_t(long long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(long long rhs) : val(cast_to_half(rhs).val) {} half_t(unsigned short rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(unsigned short rhs) : val(cast_to_half(rhs).val) {} half_t(unsigned int rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(unsigned int rhs) : val(cast_to_half(rhs).val) {} half_t(unsigned long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION KOKKOS_FUNCTION
explicit half_t(unsigned long rhs) : val(cast_to_half(rhs).val) {} half_t(unsigned long long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
explicit half_t(unsigned long long rhs) : val(cast_to_half(rhs).val) {}
// Unary operators // Unary operators
KOKKOS_FUNCTION KOKKOS_FUNCTION
@ -276,6 +288,11 @@ class half_t {
return *this; return *this;
} }
template <class T>
KOKKOS_FUNCTION void operator=(T rhs) volatile {
val = cast_to_half(rhs).val;
}
// Compound operators // Compound operators
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t& operator+=(half_t rhs) { half_t& operator+=(half_t rhs) {
@ -287,6 +304,47 @@ class half_t {
return *this; return *this;
} }
KOKKOS_FUNCTION
volatile half_t& operator+=(half_t rhs) volatile {
#ifdef __CUDA_ARCH__
// Cuda 10 supports __half volatile stores but not volatile arithmetic
// operands. Cast away volatile-ness of val for arithmetic but not for store
// location.
val = const_cast<impl_type&>(val) + rhs.val;
#else
// Use non-volatile val_ref to suppress:
// "warning: implicit dereference will not access object of type volatile
// __half in statement"
auto val_ref = const_cast<impl_type&>(val);
val_ref = __float2half(__half2float(const_cast<impl_type&>(val)) +
__half2float(rhs.val));
#endif
return *this;
}
// Compound operators: upcast overloads for +=
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator+=(T& lhs, half_t rhs) {
lhs += static_cast<T>(rhs);
return lhs;
}
KOKKOS_FUNCTION
half_t& operator+=(float rhs) {
float result = static_cast<float>(val) + rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION
half_t& operator+=(double rhs) {
double result = static_cast<double>(val) + rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t& operator-=(half_t rhs) { half_t& operator-=(half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -297,6 +355,47 @@ class half_t {
return *this; return *this;
} }
KOKKOS_FUNCTION
volatile half_t& operator-=(half_t rhs) volatile {
#ifdef __CUDA_ARCH__
// Cuda 10 supports __half volatile stores but not volatile arithmetic
// operands. Cast away volatile-ness of val for arithmetic but not for store
// location.
val = const_cast<impl_type&>(val) - rhs.val;
#else
// Use non-volatile val_ref to suppress:
// "warning: implicit dereference will not access object of type volatile
// __half in statement"
auto val_ref = const_cast<impl_type&>(val);
val_ref = __float2half(__half2float(const_cast<impl_type&>(val)) -
__half2float(rhs.val));
#endif
return *this;
}
// Compound operators: upcast overloads for -=
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator-=(T& lhs, half_t rhs) {
lhs -= static_cast<T>(rhs);
return lhs;
}
KOKKOS_FUNCTION
half_t& operator-=(float rhs) {
float result = static_cast<float>(val) - rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION
half_t& operator-=(double rhs) {
double result = static_cast<double>(val) - rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t& operator*=(half_t rhs) { half_t& operator*=(half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -307,6 +406,47 @@ class half_t {
return *this; return *this;
} }
KOKKOS_FUNCTION
volatile half_t& operator*=(half_t rhs) volatile {
#ifdef __CUDA_ARCH__
// Cuda 10 supports __half volatile stores but not volatile arithmetic
// operands. Cast away volatile-ness of val for arithmetic but not for store
// location.
val = const_cast<impl_type&>(val) * rhs.val;
#else
// Use non-volatile val_ref to suppress:
// "warning: implicit dereference will not access object of type volatile
// __half in statement"
auto val_ref = const_cast<impl_type&>(val);
val_ref = __float2half(__half2float(const_cast<impl_type&>(val)) *
__half2float(rhs.val));
#endif
return *this;
}
// Compound operators: upcast overloads for *=
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator*=(T& lhs, half_t rhs) {
lhs *= static_cast<T>(rhs);
return lhs;
}
KOKKOS_FUNCTION
half_t& operator*=(float rhs) {
float result = static_cast<float>(val) * rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION
half_t& operator*=(double rhs) {
double result = static_cast<double>(val) * rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t& operator/=(half_t rhs) { half_t& operator/=(half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -317,6 +457,47 @@ class half_t {
return *this; return *this;
} }
KOKKOS_FUNCTION
volatile half_t& operator/=(half_t rhs) volatile {
#ifdef __CUDA_ARCH__
// Cuda 10 supports __half volatile stores but not volatile arithmetic
// operands. Cast away volatile-ness of val for arithmetic but not for store
// location.
val = const_cast<impl_type&>(val) / rhs.val;
#else
// Use non-volatile val_ref to suppress:
// "warning: implicit dereference will not access object of type volatile
// __half in statement"
auto val_ref = const_cast<impl_type&>(val);
val_ref = __float2half(__half2float(const_cast<impl_type&>(val)) /
__half2float(rhs.val));
#endif
return *this;
}
// Compound operators: upcast overloads for /=
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator/=(T& lhs, half_t rhs) {
lhs /= static_cast<T>(rhs);
return lhs;
}
KOKKOS_FUNCTION
half_t& operator/=(float rhs) {
float result = static_cast<float>(val) / rhs;
val = static_cast<impl_type>(result);
return *this;
}
KOKKOS_FUNCTION
half_t& operator/=(double rhs) {
double result = static_cast<double>(val) / rhs;
val = static_cast<impl_type>(result);
return *this;
}
// Binary Arithmetic // Binary Arithmetic
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t friend operator+(half_t lhs, half_t rhs) { half_t friend operator+(half_t lhs, half_t rhs) {
@ -328,6 +509,21 @@ class half_t {
return lhs; return lhs;
} }
// Binary Arithmetic upcast operators for +
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator+(half_t lhs, T rhs) {
return T(lhs) + rhs;
}
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator+(T lhs, half_t rhs) {
return lhs + T(rhs);
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t friend operator-(half_t lhs, half_t rhs) { half_t friend operator-(half_t lhs, half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -338,6 +534,21 @@ class half_t {
return lhs; return lhs;
} }
// Binary Arithmetic upcast operators for -
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator-(half_t lhs, T rhs) {
return T(lhs) - rhs;
}
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator-(T lhs, half_t rhs) {
return lhs - T(rhs);
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t friend operator*(half_t lhs, half_t rhs) { half_t friend operator*(half_t lhs, half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -348,6 +559,21 @@ class half_t {
return lhs; return lhs;
} }
// Binary Arithmetic upcast operators for *
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator*(half_t lhs, T rhs) {
return T(lhs) * rhs;
}
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator*(T lhs, half_t rhs) {
return lhs * T(rhs);
}
KOKKOS_FUNCTION KOKKOS_FUNCTION
half_t friend operator/(half_t lhs, half_t rhs) { half_t friend operator/(half_t lhs, half_t rhs) {
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
@ -358,6 +584,21 @@ class half_t {
return lhs; return lhs;
} }
// Binary Arithmetic upcast operators for /
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator/(half_t lhs, T rhs) {
return T(lhs) / rhs;
}
template <class T>
KOKKOS_FUNCTION std::enable_if_t<
std::is_same<T, float>::value || std::is_same<T, double>::value, T> friend
operator/(T lhs, half_t rhs) {
return lhs / T(rhs);
}
// Logical operators // Logical operators
KOKKOS_FUNCTION KOKKOS_FUNCTION
bool operator!() const { bool operator!() const {
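Taken together, the now-implicit conversion constructors, the volatile compound assignments, and the float/double upcast overloads added in this file let half_t mix with built-in arithmetic types without explicit casts. A short usage sketch, with our own function and variable names, assuming half_t is exposed as Kokkos::Experimental::half_t as in this release:

#include <Kokkos_Core.hpp>

void half_demo() {
  using Kokkos::Experimental::half_t;
  half_t h = 1;        // integral -> half_t is implicit after this change
  h += 0.5f;           // compound assignment computes in float, stores back as half
  double d = 2.0 * h;  // binary upcast overload: h is converted to double
  h = d;               // double -> half_t via the implicit conversion constructor
  (void)h;
}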

View File

@ -54,6 +54,7 @@
#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp> #include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp>
#include <Cuda/Kokkos_Cuda_Instance.hpp> #include <Cuda/Kokkos_Cuda_Instance.hpp>
#include <Cuda/Kokkos_Cuda_Locks.hpp> #include <Cuda/Kokkos_Cuda_Locks.hpp>
#include <Cuda/Kokkos_Cuda_UniqueToken.hpp>
#include <impl/Kokkos_Error.hpp> #include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Tools.hpp> #include <impl/Kokkos_Tools.hpp>
@ -248,11 +249,11 @@ void CudaInternal::print_configuration(std::ostream &s) const {
const CudaInternalDevices &dev_info = CudaInternalDevices::singleton(); const CudaInternalDevices &dev_info = CudaInternalDevices::singleton();
#if defined(KOKKOS_ENABLE_CUDA) #if defined(KOKKOS_ENABLE_CUDA)
s << "macro KOKKOS_ENABLE_CUDA : defined" << std::endl; s << "macro KOKKOS_ENABLE_CUDA : defined\n";
#endif #endif
#if defined(CUDA_VERSION) #if defined(CUDA_VERSION)
s << "macro CUDA_VERSION = " << CUDA_VERSION << " = version " s << "macro CUDA_VERSION = " << CUDA_VERSION << " = version "
<< CUDA_VERSION / 1000 << "." << (CUDA_VERSION % 1000) / 10 << std::endl; << CUDA_VERSION / 1000 << "." << (CUDA_VERSION % 1000) / 10 << '\n';
#endif #endif
for (int i = 0; i < dev_info.m_cudaDevCount; ++i) { for (int i = 0; i < dev_info.m_cudaDevCount; ++i) {
@ -274,7 +275,6 @@ CudaInternal::~CudaInternal() {
m_scratchConcurrentBitset) { m_scratchConcurrentBitset) {
std::cerr << "Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()" std::cerr << "Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()"
<< std::endl; << std::endl;
std::cerr.flush();
} }
m_cudaDev = -1; m_cudaDev = -1;
@ -358,8 +358,7 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
if (m_cudaArch == 0) { if (m_cudaArch == 0) {
std::stringstream ss; std::stringstream ss;
ss << "Kokkos::Cuda::initialize ERROR: likely mismatch of architecture" ss << "Kokkos::Cuda::initialize ERROR: likely mismatch of architecture\n";
<< std::endl;
std::string msg = ss.str(); std::string msg = ss.str();
Kokkos::abort(msg.c_str()); Kokkos::abort(msg.c_str());
} }
@ -373,7 +372,7 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
"compute capability " "compute capability "
<< compiled_major << "." << compiled_minor << compiled_major << "." << compiled_minor
<< " on device with compute capability " << cudaProp.major << "." << " on device with compute capability " << cudaProp.major << "."
<< cudaProp.minor << " is not supported by CUDA!" << std::endl; << cudaProp.minor << " is not supported by CUDA!\n";
std::string msg = ss.str(); std::string msg = ss.str();
Kokkos::abort(msg.c_str()); Kokkos::abort(msg.c_str());
} }
@ -458,7 +457,7 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>; Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>;
Record *const r = Record *const r =
Record::allocate(Kokkos::CudaSpace(), "InternalScratchBitset", Record::allocate(Kokkos::CudaSpace(), "Kokkos::InternalScratchBitset",
sizeof(uint32_t) * buffer_bound); sizeof(uint32_t) * buffer_bound);
Record::increment(r); Record::increment(r);
@ -492,17 +491,11 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
#ifdef KOKKOS_ENABLE_CUDA_UVM #ifdef KOKKOS_ENABLE_CUDA_UVM
if (Kokkos::show_warnings() && !cuda_launch_blocking()) { if (Kokkos::show_warnings() && !cuda_launch_blocking()) {
std::cerr << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into " std::cerr << R"warning(
"UVMSpace by default" Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
<< std::endl; without setting CUDA_LAUNCH_BLOCKING=1.
std::cerr << " without setting " The code must call Cuda().fence() after each kernel
"CUDA_LAUNCH_BLOCKING=1." or will likely crash when accessing data on the host.)warning"
<< std::endl;
std::cerr << " The code must call "
"Cuda().fence() after each kernel"
<< std::endl;
std::cerr << " or will likely crash when "
"accessing data on the host."
<< std::endl; << std::endl;
} }
@ -520,19 +513,13 @@ void CudaInternal::initialize(int cuda_device_id, cudaStream_t stream) {
if (Kokkos::show_warnings() && if (Kokkos::show_warnings() &&
(!visible_devices_one && !force_device_alloc)) { (!visible_devices_one && !force_device_alloc)) {
std::cerr << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into " std::cerr << R"warning(
"UVMSpace by default" Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or
setting CUDA_VISIBLE_DEVICES.
This could on multi GPU systems lead to severe performance"
penalties.)warning"
<< std::endl; << std::endl;
std::cerr << " without setting "
"CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or "
<< std::endl;
std::cerr
<< " setting CUDA_VISIBLE_DEVICES."
<< std::endl;
std::cerr << " This could on multi GPU "
"systems lead to severe performance"
<< std::endl;
std::cerr << " penalties." << std::endl;
} }
#endif #endif
@ -575,7 +562,7 @@ Cuda::size_type *CudaInternal::scratch_flags(const Cuda::size_type size) const {
if (m_scratchFlags) Record::decrement(Record::get_record(m_scratchFlags)); if (m_scratchFlags) Record::decrement(Record::get_record(m_scratchFlags));
Record *const r = Record *const r =
Record::allocate(Kokkos::CudaSpace(), "InternalScratchFlags", Record::allocate(Kokkos::CudaSpace(), "Kokkos::InternalScratchFlags",
(sizeof(ScratchGrain) * m_scratchFlagsCount)); (sizeof(ScratchGrain) * m_scratchFlagsCount));
Record::increment(r); Record::increment(r);
@ -600,7 +587,7 @@ Cuda::size_type *CudaInternal::scratch_space(const Cuda::size_type size) const {
if (m_scratchSpace) Record::decrement(Record::get_record(m_scratchSpace)); if (m_scratchSpace) Record::decrement(Record::get_record(m_scratchSpace));
Record *const r = Record *const r =
Record::allocate(Kokkos::CudaSpace(), "InternalScratchSpace", Record::allocate(Kokkos::CudaSpace(), "Kokkos::InternalScratchSpace",
(sizeof(ScratchGrain) * m_scratchSpaceCount)); (sizeof(ScratchGrain) * m_scratchSpaceCount));
Record::increment(r); Record::increment(r);
@ -624,7 +611,7 @@ Cuda::size_type *CudaInternal::scratch_unified(
Record::decrement(Record::get_record(m_scratchUnified)); Record::decrement(Record::get_record(m_scratchUnified));
Record *const r = Record::allocate( Record *const r = Record::allocate(
Kokkos::CudaHostPinnedSpace(), "InternalScratchUnified", Kokkos::CudaHostPinnedSpace(), "Kokkos::InternalScratchUnified",
(sizeof(ScratchGrain) * m_scratchUnifiedCount)); (sizeof(ScratchGrain) * m_scratchUnifiedCount));
Record::increment(r); Record::increment(r);
@ -646,8 +633,9 @@ Cuda::size_type *CudaInternal::scratch_functor(
if (m_scratchFunctor) if (m_scratchFunctor)
Record::decrement(Record::get_record(m_scratchFunctor)); Record::decrement(Record::get_record(m_scratchFunctor));
Record *const r = Record::allocate( Record *const r =
Kokkos::CudaSpace(), "InternalScratchFunctor", m_scratchFunctorSize); Record::allocate(Kokkos::CudaSpace(), "Kokkos::InternalScratchFunctor",
m_scratchFunctorSize);
Record::increment(r); Record::increment(r);
@ -662,7 +650,7 @@ void *CudaInternal::resize_team_scratch_space(std::int64_t bytes,
if (m_team_scratch_current_size == 0) { if (m_team_scratch_current_size == 0) {
m_team_scratch_current_size = bytes; m_team_scratch_current_size = bytes;
m_team_scratch_ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>( m_team_scratch_ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>(
"CudaSpace::ScratchMemory", m_team_scratch_current_size); "Kokkos::CudaSpace::TeamScratchMemory", m_team_scratch_current_size);
} }
if ((bytes > m_team_scratch_current_size) || if ((bytes > m_team_scratch_current_size) ||
((bytes < m_team_scratch_current_size) && (force_shrink))) { ((bytes < m_team_scratch_current_size) && (force_shrink))) {
@ -676,6 +664,9 @@ void *CudaInternal::resize_team_scratch_space(std::int64_t bytes,
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
void CudaInternal::finalize() { void CudaInternal::finalize() {
// skip if finalize() has already been called
if (was_finalized) return;
was_finalized = true; was_finalized = true;
if (nullptr != m_scratchSpace || nullptr != m_scratchFlags) { if (nullptr != m_scratchSpace || nullptr != m_scratchFlags) {
// Only finalize this if we're the singleton // Only finalize this if we're the singleton
@ -719,6 +710,11 @@ void CudaInternal::finalize() {
if (this == &singleton()) { if (this == &singleton()) {
cudaFreeHost(constantMemHostStaging); cudaFreeHost(constantMemHostStaging);
cudaEventDestroy(constantMemReusable); cudaEventDestroy(constantMemReusable);
auto &deep_copy_space =
Kokkos::Impl::cuda_get_deep_copy_space(/*initialize*/ false);
if (deep_copy_space)
deep_copy_space->impl_internal_space_instance()->finalize();
cudaStreamDestroy(cuda_get_deep_copy_stream());
} }
} }
@ -821,62 +817,23 @@ Cuda::size_type Cuda::device_arch() {
void Cuda::impl_finalize() { Impl::CudaInternal::singleton().finalize(); } void Cuda::impl_finalize() { Impl::CudaInternal::singleton().finalize(); }
Cuda::Cuda() Cuda::Cuda()
: m_space_instance(&Impl::CudaInternal::singleton()), m_counter(nullptr) { : m_space_instance(&Impl::CudaInternal::singleton(),
[](Impl::CudaInternal *) {}) {
Impl::CudaInternal::singleton().verify_is_initialized( Impl::CudaInternal::singleton().verify_is_initialized(
"Cuda instance constructor"); "Cuda instance constructor");
} }
Cuda::Cuda(cudaStream_t stream) Cuda::Cuda(cudaStream_t stream)
: m_space_instance(new Impl::CudaInternal), m_counter(new int(1)) { : m_space_instance(new Impl::CudaInternal, [](Impl::CudaInternal *ptr) {
ptr->finalize();
delete ptr;
}) {
Impl::CudaInternal::singleton().verify_is_initialized( Impl::CudaInternal::singleton().verify_is_initialized(
"Cuda instance constructor"); "Cuda instance constructor");
m_space_instance->initialize(Impl::CudaInternal::singleton().m_cudaDev, m_space_instance->initialize(Impl::CudaInternal::singleton().m_cudaDev,
stream); stream);
} }
KOKKOS_FUNCTION Cuda::Cuda(Cuda &&other) noexcept {
m_space_instance = other.m_space_instance;
other.m_space_instance = nullptr;
m_counter = other.m_counter;
other.m_counter = nullptr;
}
KOKKOS_FUNCTION Cuda::Cuda(const Cuda &other)
: m_space_instance(other.m_space_instance), m_counter(other.m_counter) {
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
if (m_counter) Kokkos::atomic_add(m_counter, 1);
#endif
}
KOKKOS_FUNCTION Cuda &Cuda::operator=(Cuda &&other) noexcept {
m_space_instance = other.m_space_instance;
other.m_space_instance = nullptr;
m_counter = other.m_counter;
other.m_counter = nullptr;
return *this;
}
KOKKOS_FUNCTION Cuda &Cuda::operator=(const Cuda &other) {
m_space_instance = other.m_space_instance;
m_counter = other.m_counter;
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
if (m_counter) Kokkos::atomic_add(m_counter, 1);
#endif
return *this;
}
KOKKOS_FUNCTION Cuda::~Cuda() noexcept {
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
if (m_counter == nullptr) return;
int const count = Kokkos::atomic_fetch_sub(m_counter, 1);
if (count == 1) {
delete m_counter;
m_space_instance->finalize();
delete m_space_instance;
}
#endif
}
void Cuda::print_configuration(std::ostream &s, const bool) { void Cuda::print_configuration(std::ostream &s, const bool) {
Impl::CudaInternal::singleton().print_configuration(s); Impl::CudaInternal::singleton().print_configuration(s);
} }
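The two constructors above replace the deleted hand-written copy/move members and the atomic m_counter with a reference-counted handle that carries a custom deleter: the default instance aliases the singleton and is never destroyed, while a stream-backed instance is finalized and deleted when its last owner goes away. A minimal sketch of that ownership pattern, with std::shared_ptr standing in for Kokkos' HostSharedPtr and hypothetical names throughout:

#include <memory>

struct Internal {
  void finalize() { /* release scratch buffers, streams, ... */ }
  static Internal& singleton() { static Internal s; return s; }
};

// Default handle: refers to the singleton, deleter is a no-op.
std::shared_ptr<Internal> default_instance() {
  return std::shared_ptr<Internal>(&Internal::singleton(), [](Internal*) {});
}

// Stream-backed handle: owns its state, finalized and deleted with the last copy.
std::shared_ptr<Internal> owned_instance() {
  return std::shared_ptr<Internal>(new Internal,
                                   [](Internal* p) { p->finalize(); delete p; });
}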
@ -924,54 +881,53 @@ void CudaSpaceInitializer::fence() { Kokkos::Cuda::impl_static_fence(); }
void CudaSpaceInitializer::print_configuration(std::ostream &msg, void CudaSpaceInitializer::print_configuration(std::ostream &msg,
const bool detail) { const bool detail) {
msg << "Device Execution Space:" << std::endl; msg << "Device Execution Space:\n";
msg << " KOKKOS_ENABLE_CUDA: "; msg << " KOKKOS_ENABLE_CUDA: yes\n";
msg << "yes" << std::endl;
msg << "Cuda Atomics:" << std::endl; msg << "Cuda Atomics:\n";
msg << " KOKKOS_ENABLE_CUDA_ATOMICS: "; msg << " KOKKOS_ENABLE_CUDA_ATOMICS: ";
#ifdef KOKKOS_ENABLE_CUDA_ATOMICS #ifdef KOKKOS_ENABLE_CUDA_ATOMICS
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << "Cuda Options:" << std::endl; msg << "Cuda Options:\n";
msg << " KOKKOS_ENABLE_CUDA_LAMBDA: "; msg << " KOKKOS_ENABLE_CUDA_LAMBDA: ";
#ifdef KOKKOS_ENABLE_CUDA_LAMBDA #ifdef KOKKOS_ENABLE_CUDA_LAMBDA
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: "; msg << " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: ";
#ifdef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC #ifdef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: "; msg << " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: ";
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE #ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << " KOKKOS_ENABLE_CUDA_UVM: "; msg << " KOKKOS_ENABLE_CUDA_UVM: ";
#ifdef KOKKOS_ENABLE_CUDA_UVM #ifdef KOKKOS_ENABLE_CUDA_UVM
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << " KOKKOS_ENABLE_CUSPARSE: "; msg << " KOKKOS_ENABLE_CUSPARSE: ";
#ifdef KOKKOS_ENABLE_CUSPARSE #ifdef KOKKOS_ENABLE_CUSPARSE
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << " KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: "; msg << " KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: ";
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA #ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
msg << "yes" << std::endl; msg << "yes\n";
#else #else
msg << "no" << std::endl; msg << "no\n";
#endif #endif
msg << "\nCuda Runtime Configuration:" << std::endl; msg << "\nCuda Runtime Configuration:" << std::endl;

View File

@ -17,30 +17,24 @@ namespace Kokkos {
namespace Impl { namespace Impl {
struct CudaTraits { struct CudaTraits {
enum : CudaSpace::size_type { WarpSize = 32 /* 0x0020 */ }; static constexpr CudaSpace::size_type WarpSize = 32 /* 0x0020 */;
enum : CudaSpace::size_type { static constexpr CudaSpace::size_type WarpIndexMask =
WarpIndexMask = 0x001f /* Mask for warpindex */ 0x001f; /* Mask for warpindex */
}; static constexpr CudaSpace::size_type WarpIndexShift =
enum : CudaSpace::size_type { 5; /* WarpSize == 1 << WarpShift */
WarpIndexShift = 5 /* WarpSize == 1 << WarpShift */
};
enum : CudaSpace::size_type { static constexpr CudaSpace::size_type ConstantMemoryUsage =
ConstantMemoryUsage = 0x008000 /* 32k bytes */ 0x008000; /* 32k bytes */
}; static constexpr CudaSpace::size_type ConstantMemoryCache =
enum : CudaSpace::size_type { 0x002000; /* 8k bytes */
ConstantMemoryCache = 0x002000 /* 8k bytes */ static constexpr CudaSpace::size_type KernelArgumentLimit =
}; 0x001000; /* 4k bytes */
enum : CudaSpace::size_type { static constexpr CudaSpace::size_type MaxHierarchicalParallelism =
KernelArgumentLimit = 0x001000 /* 4k bytes */ 1024; /* team_size * vector_length */
};
enum : CudaSpace::size_type {
MaxHierarchicalParallelism = 1024 /* team_size * vector_length */
};
using ConstantGlobalBufferType = using ConstantGlobalBufferType =
unsigned long[ConstantMemoryUsage / sizeof(unsigned long)]; unsigned long[ConstantMemoryUsage / sizeof(unsigned long)];
enum { ConstantMemoryUseThreshold = 0x000200 /* 512 bytes */ }; static constexpr int ConstantMemoryUseThreshold = 0x000200 /* 512 bytes */;
KOKKOS_INLINE_FUNCTION static CudaSpace::size_type warp_count( KOKKOS_INLINE_FUNCTION static CudaSpace::size_type warp_count(
CudaSpace::size_type i) { CudaSpace::size_type i) {

View File

@ -158,6 +158,9 @@ inline void check_shmem_request(CudaInternal const* cuda_instance, int shmem) {
} }
} }
// This function needs to be templated on DriverType and LaunchBounds
// so that the static bool is unique for each type combo
// KernelFuncPtr does not necessarily contain that type information.
template <class DriverType, class LaunchBounds, class KernelFuncPtr> template <class DriverType, class LaunchBounds, class KernelFuncPtr>
inline void configure_shmem_preference(KernelFuncPtr const& func, inline void configure_shmem_preference(KernelFuncPtr const& func,
bool prefer_shmem) { bool prefer_shmem) {
@ -355,8 +358,7 @@ struct CudaParallelLaunchKernelInvoker<
if (!Impl::is_empty_launch(grid, block)) { if (!Impl::is_empty_launch(grid, block)) {
Impl::check_shmem_request(cuda_instance, shmem); Impl::check_shmem_request(cuda_instance, shmem);
Impl::configure_shmem_preference<DriverType, LaunchBounds, Impl::configure_shmem_preference<DriverType, LaunchBounds>(
decltype(base_t::get_kernel_func())>(
base_t::get_kernel_func(), prefer_shmem); base_t::get_kernel_func(), prefer_shmem);
void const* args[] = {&driver}; void const* args[] = {&driver};
@ -449,8 +451,7 @@ struct CudaParallelLaunchKernelInvoker<
if (!Impl::is_empty_launch(grid, block)) { if (!Impl::is_empty_launch(grid, block)) {
Impl::check_shmem_request(cuda_instance, shmem); Impl::check_shmem_request(cuda_instance, shmem);
Impl::configure_shmem_preference<DriverType, LaunchBounds, Impl::configure_shmem_preference<DriverType, LaunchBounds>(
decltype(base_t::get_kernel_func())>(
base_t::get_kernel_func(), prefer_shmem); base_t::get_kernel_func(), prefer_shmem);
auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver); auto* driver_ptr = Impl::allocate_driver_storage_for_kernel(driver);
@ -627,9 +628,8 @@ struct CudaParallelLaunchImpl<
get_cuda_func_attributes(), block, shmem, prefer_shmem); get_cuda_func_attributes(), block, shmem, prefer_shmem);
Impl::configure_shmem_preference< Impl::configure_shmem_preference<
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>, DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>>(
decltype(base_t::get_kernel_func())>(base_t::get_kernel_func(), base_t::get_kernel_func(), prefer_shmem);
prefer_shmem);
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE(); KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
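The comment added near the top of this file explains why configure_shmem_preference is now instantiated on DriverType and LaunchBounds: its one-time work is cached in a function-local static, and that static must be distinct per kernel type rather than per function-pointer type. A small self-contained sketch of the idiom, with names that are ours:

#include <cstdio>

// Each <Driver, Bounds> combination gets its own copy of `configured`,
// so the setup runs once per kernel type, not once per function-pointer type.
template <class Driver, class Bounds, class FuncPtr>
void configure_once(FuncPtr /*func*/) {
  static bool configured = false;
  if (!configured) {
    configured = true;
    std::puts("configuring shared-memory preference for this kernel type");
  }
}

struct KernelA {};
struct KernelB {};
struct DefaultBounds {};

int main() {
  void (*p)() = nullptr;
  configure_once<KernelA, DefaultBounds>(p);  // prints
  configure_once<KernelA, DefaultBounds>(p);  // cached, silent
  configure_once<KernelB, DefaultBounds>(p);  // new instantiation, prints again
}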

View File

@ -0,0 +1,37 @@
#ifndef KOKKOS_CUDA_MDRANGEPOLICY_HPP_
#define KOKKOS_CUDA_MDRANGEPOLICY_HPP_
#include <KokkosExp_MDRangePolicy.hpp>
namespace Kokkos {
template <>
struct default_outer_direction<Kokkos::Cuda> {
using type = Iterate;
static constexpr Iterate value = Iterate::Left;
};
template <>
struct default_inner_direction<Kokkos::Cuda> {
using type = Iterate;
static constexpr Iterate value = Iterate::Left;
};
namespace Impl {
// Settings for MDRangePolicy
template <>
inline TileSizeProperties get_tile_size_properties<Kokkos::Cuda>(
const Kokkos::Cuda& space) {
TileSizeProperties properties;
properties.max_threads =
space.impl_internal_space_instance()->m_maxThreadsPerSM;
properties.default_largest_tile_size = 16;
properties.default_tile_size = 2;
properties.max_total_tile_size = 512;
return properties;
}
} // namespace Impl
} // namespace Kokkos
#endif
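This new header wires Cuda into the generic MDRangePolicy machinery: both iteration directions default to Iterate::Left, and deduced tile sizes are bounded by the device thread limit with a 512-thread cap on the tile product. A hedged usage sketch, with a view and kernel name that are ours rather than from the diff:

#include <Kokkos_Core.hpp>

void fill(Kokkos::View<double**, Kokkos::CudaSpace> a) {
  const int n0 = static_cast<int>(a.extent(0));
  const int n1 = static_cast<int>(a.extent(1));
  using policy_t = Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2>>;
  // Tiles are deduced from the properties above when omitted; an explicit
  // {32, 16} tile stays within the 512-thread product limit.
  Kokkos::parallel_for(
      "fill", policy_t({0, 0}, {n0, n1}, {32, 16}),
      KOKKOS_LAMBDA(int i, int j) { a(i, j) = i + 0.25 * j; });
}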

View File

@ -60,6 +60,7 @@
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp> #include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
#include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp> #include <Cuda/Kokkos_Cuda_BlockSize_Deduction.hpp>
#include <Cuda/Kokkos_Cuda_Locks.hpp> #include <Cuda/Kokkos_Cuda_Locks.hpp>
#include <Cuda/Kokkos_Cuda_Team.hpp>
#include <Kokkos_Vectorization.hpp> #include <Kokkos_Vectorization.hpp>
#include <Cuda/Kokkos_Cuda_Version_9_8_Compatibility.hpp> #include <Cuda/Kokkos_Cuda_Version_9_8_Compatibility.hpp>
@ -67,6 +68,7 @@
#include <typeinfo> #include <typeinfo>
#include <KokkosExp_MDRangePolicy.hpp> #include <KokkosExp_MDRangePolicy.hpp>
#include <impl/KokkosExp_IterateTileGPU.hpp>
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
@ -474,7 +476,7 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
Policy const& get_policy() const { return m_policy; } Policy const& get_policy() const { return m_policy; }
inline __device__ void operator()(void) const { inline __device__ void operator()() const {
const Member work_stride = blockDim.y * gridDim.x; const Member work_stride = blockDim.y * gridDim.x;
const Member work_end = m_policy.end(); const Member work_end = m_policy.end();
@ -537,9 +539,23 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda> {
const Policy m_rp; const Policy m_rp;
public: public:
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
cudaFuncAttributes attr =
CudaParallelLaunch<ParallelFor,
LaunchBounds>::get_cuda_func_attributes();
auto const& prop = pol.space().cuda_device_prop();
// Limits due to registers/SM; MDRange doesn't have
// shared memory constraints
int const regs_per_sm = prop.regsPerMultiprocessor;
int const regs_per_thread = attr.numRegs;
int const max_threads_per_sm = regs_per_sm / regs_per_thread;
return std::min(
max_threads_per_sm,
static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
}
Policy const& get_policy() const { return m_rp; } Policy const& get_policy() const { return m_rp; }
inline __device__ void operator()() const {
inline __device__ void operator()(void) const {
Kokkos::Impl::DeviceIterateTile<Policy::rank, Policy, FunctorType, Kokkos::Impl::DeviceIterateTile<Policy::rank, Policy, FunctorType,
typename Policy::work_tag>(m_rp, m_functor) typename Policy::work_tag>(m_rp, m_functor)
.exec_range(); .exec_range();
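The new max_tile_size_product hook (also added below for the MDRange ParallelReduce) bounds the product of tile extents by register occupancy alone: registers per SM divided by the kernel's registers per thread, capped at CudaTraits::MaxHierarchicalParallelism from earlier in this change set. A worked example with assumed register counts, not values queried from a real device:

#include <algorithm>

constexpr int max_hierarchical_parallelism = 1024;  // CudaTraits value shown earlier

constexpr int tile_size_limit(int regs_per_sm, int regs_per_thread) {
  return std::min(regs_per_sm / regs_per_thread, max_hierarchical_parallelism);
}

// Assumed numbers, for illustration only:
static_assert(tile_size_limit(65536, 40) == 1024, "light kernel is capped by the 1024 limit");
static_assert(tile_size_limit(65536, 128) == 512, "register-heavy kernel allows at most 512-thread tiles");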
@ -689,7 +705,7 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
public: public:
Policy const& get_policy() const { return m_policy; } Policy const& get_policy() const { return m_policy; }
__device__ inline void operator()(void) const { __device__ inline void operator()() const {
// Iterate this block through the league // Iterate this block through the league
int64_t threadid = 0; int64_t threadid = 0;
if (m_scratch_size[1] > 0) { if (m_scratch_size[1] > 0) {
@ -1248,8 +1264,21 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
using DummySHMEMReductionType = int; using DummySHMEMReductionType = int;
public: public:
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
cudaFuncAttributes attr =
CudaParallelLaunch<ParallelReduce,
LaunchBounds>::get_cuda_func_attributes();
auto const& prop = pol.space().cuda_device_prop();
// Limits due to registers/SM
int const regs_per_sm = prop.regsPerMultiprocessor;
int const regs_per_thread = attr.numRegs;
int const max_threads_per_sm = regs_per_sm / regs_per_thread;
return std::min(
max_threads_per_sm,
static_cast<int>(Kokkos::Impl::CudaTraits::MaxHierarchicalParallelism));
}
Policy const& get_policy() const { return m_policy; } Policy const& get_policy() const { return m_policy; }
inline __device__ void exec_range(reference_type update) const { inline __device__ void exec_range(reference_type update) const {
Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType, Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType,
typename Policy::work_tag, typename Policy::work_tag,
@ -1258,7 +1287,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
.exec_range(); .exec_range();
} }
inline __device__ void operator()(void) const { inline __device__ void operator()() const {
/* run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, /* run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType,
DummySHMEMReductionType>::select(1,1.0) ); DummySHMEMReductionType>::select(1,1.0) );
} }
@ -2074,7 +2103,7 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
//---------------------------------------- //----------------------------------------
__device__ inline void initial(void) const { __device__ inline void initial() const {
const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize / const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
sizeof(size_type)> sizeof(size_type)>
word_count(ValueTraits::value_size(m_functor) / sizeof(size_type)); word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -2110,7 +2139,7 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
//---------------------------------------- //----------------------------------------
__device__ inline void final(void) const { __device__ inline void final() const {
const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize / const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
sizeof(size_type)> sizeof(size_type)>
word_count(ValueTraits::value_size(m_functor) / sizeof(size_type)); word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -2195,7 +2224,7 @@ class ParallelScan<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda> {
//---------------------------------------- //----------------------------------------
__device__ inline void operator()(void) const { __device__ inline void operator()() const {
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
if (m_run_serial) { if (m_run_serial) {
typename ValueTraits::value_type value; typename ValueTraits::value_type value;
@ -2364,7 +2393,7 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
//---------------------------------------- //----------------------------------------
__device__ inline void initial(void) const { __device__ inline void initial() const {
const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize / const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
sizeof(size_type)> sizeof(size_type)>
word_count(ValueTraits::value_size(m_functor) / sizeof(size_type)); word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -2400,7 +2429,7 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
//---------------------------------------- //----------------------------------------
__device__ inline void final(void) const { __device__ inline void final() const {
const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize / const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
sizeof(size_type)> sizeof(size_type)>
word_count(ValueTraits::value_size(m_functor) / sizeof(size_type)); word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -2487,7 +2516,7 @@ class ParallelScanWithTotal<FunctorType, Kokkos::RangePolicy<Traits...>,
//---------------------------------------- //----------------------------------------
__device__ inline void operator()(void) const { __device__ inline void operator()() const {
#ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION #ifdef KOKKOS_IMPL_DEBUG_CUDA_SERIAL_EXECUTION
if (m_run_serial) { if (m_run_serial) {
typename ValueTraits::value_type value; typename ValueTraits::value_type value;

View File

@ -661,13 +661,14 @@ KOKKOS_INLINE_FUNCTION
thread, count); thread, count);
} }
template <typename iType> template <typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION Impl::ThreadVectorRangeBoundariesStruct<
Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::CudaTeamMember> typename std::common_type<iType1, iType2>::type, Impl::CudaTeamMember>
ThreadVectorRange(const Impl::CudaTeamMember& thread, iType arg_begin, ThreadVectorRange(const Impl::CudaTeamMember& thread, iType1 arg_begin,
iType arg_end) { iType2 arg_end) {
using iType = typename std::common_type<iType1, iType2>::type;
return Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::CudaTeamMember>( return Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::CudaTeamMember>(
thread, arg_begin, arg_end); thread, iType(arg_begin), iType(arg_end));
} }
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION
@ -983,7 +984,7 @@ KOKKOS_INLINE_FUNCTION void parallel_scan(
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel exclusive prefix sum. /** \brief Intra-thread vector parallel scan with reducer.
* *
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N) * Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
* *
@ -991,25 +992,25 @@ KOKKOS_INLINE_FUNCTION void parallel_scan(
* thread and a scan operation is performed. * thread and a scan operation is performed.
* The last call to closure has final == true. * The last call to closure has final == true.
*/ */
template <typename iType, class Closure> template <typename iType, class Closure, typename ReducerType>
KOKKOS_INLINE_FUNCTION void parallel_scan( KOKKOS_INLINE_FUNCTION
const Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::CudaTeamMember>& typename std::enable_if<Kokkos::is_reducer<ReducerType>::value>::type
loop_boundaries, parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<
const Closure& closure) { iType, Impl::CudaTeamMember>& loop_boundaries,
const Closure& closure, const ReducerType& reducer) {
(void)loop_boundaries; (void)loop_boundaries;
(void)closure; (void)closure;
(void)reducer;
#ifdef __CUDA_ARCH__ #ifdef __CUDA_ARCH__
// Extract value_type from closure using value_type = typename ReducerType::value_type;
value_type accum;
using value_type = typename Kokkos::Impl::FunctorAnalysis< reducer.init(accum);
Kokkos::Impl::FunctorPatternInterface::SCAN, void, Closure>::value_type; const value_type identity = accum;
// Loop through boundaries by vector-length chunks // Loop through boundaries by vector-length chunks
// must scan at each iteration // must scan at each iteration
value_type accum = 0;
// All thread "lanes" must loop the same number of times. // All thread "lanes" must loop the same number of times.
// Determine an loop end for all thread "lanes." // Determine an loop end for all thread "lanes."
// Requires: // Requires:
@ -1026,44 +1027,68 @@ KOKKOS_INLINE_FUNCTION void parallel_scan(
  const int end = loop_boundaries.end + (rem ? blockDim.x - rem : 0);

  for (int i = threadIdx.x; i < end; i += blockDim.x) {
    value_type val = 0;
    value_type val = identity;

    // First acquire per-lane contributions:
    if (i < loop_boundaries.end) closure(i, val, false);
    // First acquire per-lane contributions.
    // This sets i's val to i-1's contribution
    // to make the latter in_place_shfl_up an
    // exclusive scan -- the final accumulation
    // of i's val will be included in the second
    // closure call later.
    if (i < loop_boundaries.end && threadIdx.x > 0) closure(i - 1, val, false);

    value_type sval = val;
    // Bottom up inclusive scan in triangular pattern
    // Bottom up exclusive scan in triangular pattern
    // where each CUDA thread is the root of a reduction tree
    // from the zeroth "lane" to itself.
    //  [t] += [t-1] if t >= 1
    //  [t] += [t-2] if t >= 2
    //  [t] += [t-4] if t >= 4
    // ...
    // This differs from the non-reducer overload, where an inclusive scan was
    // implemented, because in general the binary operator cannot be inverted
    // and we would not be able to remove the inclusive contribution by
    // inversion.
    for (int j = 1; j < (int)blockDim.x; j <<= 1) {
      value_type tmp = 0;
      value_type tmp = identity;
      Impl::in_place_shfl_up(tmp, sval, j, blockDim.x, active_mask);
      Impl::in_place_shfl_up(tmp, val, j, blockDim.x, active_mask);
      if (j <= (int)threadIdx.x) {
        sval += tmp;
        reducer.join(val, tmp);
      }
    }

    // Include accumulation and remove value for exclusive scan:
    val = accum + sval - val;
    // Include accumulation
    reducer.join(val, accum);

    // Provide exclusive scan value:
    // Update i's contribution into the val
    // and add it to accum for next round
    if (i < loop_boundaries.end) closure(i, val, true);
    Impl::in_place_shfl(accum, val, mask, blockDim.x, active_mask);

    // Accumulate the last value in the inclusive scan:
    Impl::in_place_shfl(sval, sval, mask, blockDim.x, active_mask);
    accum += sval;
  }
#endif
}
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel exclusive prefix sum.
*
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
*
* The range [0..N) is mapped to all vector lanes in the
* thread and a scan operation is performed.
* The last call to closure has final == true.
*/
template <typename iType, class Closure>
KOKKOS_INLINE_FUNCTION void parallel_scan(
const Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::CudaTeamMember>&
loop_boundaries,
const Closure& closure) {
using value_type = typename Kokkos::Impl::FunctorAnalysis<
Kokkos::Impl::FunctorPatternInterface::SCAN, void, Closure>::value_type;
value_type dummy;
parallel_scan(loop_boundaries, closure, Kokkos::Sum<value_type>(dummy));
}
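For context, a minimal usage sketch of the reducer overload added above (the `parallel_scan` over a `ThreadVectorRange` with a reducer). The view names, extents, and vector length below are illustrative and not part of this commit:

```cpp
#include <Kokkos_Core.hpp>

// Per team, compute the exclusive prefix sum of one row of 'values' into
// 'offsets', using a Kokkos::Sum reducer to supply identity and join.
void scan_rows(Kokkos::View<int**> values, Kokkos::View<int**> offsets) {
  using team_policy = Kokkos::TeamPolicy<>;
  using team_member = team_policy::member_type;
  const int nrows = static_cast<int>(values.extent(0));
  const int ncols = static_cast<int>(values.extent(1));

  Kokkos::parallel_for(
      "scan_rows", team_policy(nrows, Kokkos::AUTO, 4),
      KOKKOS_LAMBDA(const team_member& team) {
        const int row = team.league_rank();
        int total     = 0;  // reducer target (not inspected in this sketch)
        Kokkos::Sum<int> reducer(total);
        Kokkos::parallel_scan(
            Kokkos::ThreadVectorRange(team, ncols),
            [=](const int i, int& partial, const bool final) {
              if (final) offsets(row, i) = partial;  // exclusive prefix
              partial += values(row, i);
            },
            reducer);
      });
}
```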
}  // namespace Kokkos

namespace Kokkos {

View File

@ -139,7 +139,7 @@ struct CudaLDGFetch {
  template <typename iType>
  KOKKOS_INLINE_FUNCTION ValueType operator[](const iType& i) const {
#ifdef __CUDA_ARCH__
#if defined(__CUDA_ARCH__) && (350 <= __CUDA_ARCH__)
    AliasType v = __ldg(reinterpret_cast<const AliasType*>(&m_ptr[i]));
    return *(reinterpret_cast<ValueType*>(&v));
#else

View File

@ -46,6 +46,7 @@
#define KOKKOS_CUDA_WORKGRAPHPOLICY_HPP

#include <Kokkos_Cuda.hpp>
#include <Cuda/Kokkos_Cuda_KernelLaunch.hpp>

namespace Kokkos {
namespace Impl {

View File

@ -75,17 +75,6 @@ void hipOccupancy(int *numBlocks, int blockSize, int sharedmem) {
  hipOccupancy<DriverType, constant, HIPTraits::MaxThreadsPerBlock, 1>(
      numBlocks, blockSize, sharedmem);
}
template <typename DriverType, typename LaunchBounds, bool Large>
struct HIPGetMaxBlockSize;
template <typename DriverType, typename LaunchBounds>
int hip_get_max_block_size(typename DriverType::functor_type const &f,
size_t const vector_length,
size_t const shmem_extra_block,
size_t const shmem_extra_thread) {
return HIPGetMaxBlockSize<DriverType, LaunchBounds, true>::get_block_size(
f, vector_length, shmem_extra_block, shmem_extra_thread);
}
template <class FunctorType, class LaunchBounds, typename F>
int hip_internal_get_block_size(const F &condition_check,
@ -131,10 +120,6 @@ int hip_internal_get_block_size(const F &condition_check,
    int opt_block_size =
        (blocks_per_sm >= min_blocks_per_sm) ? block_size : min_blocks_per_sm;
    int opt_threads_per_sm = threads_per_sm;
// printf("BlockSizeMax: %i Shmem: %i %i %i %i Regs: %i %i Blocks: %i %i
// Achieved: %i %i Opt: %i %i\n",block_size,
// shmem_per_sm,max_shmem_per_block,functor_shmem,total_shmem,
// regs_per_sm,regs_per_wavefront,max_blocks_shmem,max_blocks_regs,blocks_per_sm,threads_per_sm,opt_block_size,opt_threads_per_sm);
    block_size -= HIPTraits::WarpSize;
    while (condition_check(blocks_per_sm) &&
           (block_size >= HIPTraits::WarpSize)) {
@ -160,10 +145,6 @@ int hip_internal_get_block_size(const F &condition_check,
        opt_threads_per_sm = threads_per_sm;
      }
    }
// printf("BlockSizeMax: %i Shmem: %i %i %i %i Regs: %i %i Blocks: %i %i
// Achieved: %i %i Opt: %i %i\n",block_size,
// shmem_per_sm,max_shmem_per_block,functor_shmem,total_shmem,
// regs_per_sm,regs_per_wavefront,max_blocks_shmem,max_blocks_regs,blocks_per_sm,threads_per_sm,opt_block_size,opt_threads_per_sm);
    block_size -= HIPTraits::WarpSize;
  }
  return opt_block_size;
@ -178,62 +159,6 @@ int hip_get_max_block_size(const HIPInternal *hip_instance,
      [](int x) { return x == 0; }, hip_instance, attr, f, vector_length,
      shmem_block, shmem_thread);
}
template <typename DriverType, class LaunchBounds>
struct HIPGetMaxBlockSize<DriverType, LaunchBounds, true> {
static int get_block_size(typename DriverType::functor_type const &f,
size_t const vector_length,
size_t const shmem_extra_block,
size_t const shmem_extra_thread) {
int numBlocks = 0;
int blockSize = LaunchBounds::maxTperB == 0 ? 1024 : LaunchBounds::maxTperB;
int sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
if (numBlocks > 0) return blockSize;
while (blockSize > HIPTraits::WarpSize && numBlocks == 0) {
blockSize /= 2;
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
}
int blockSizeUpperBound = blockSize * 2;
while (blockSize < blockSizeUpperBound && numBlocks > 0) {
blockSize += HIPTraits::WarpSize;
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
}
return blockSize - HIPTraits::WarpSize;
}
};
template <typename DriverType, typename LaunchBounds, bool Large>
struct HIPGetOptBlockSize;
template <typename DriverType, typename LaunchBounds>
int hip_get_opt_block_size(typename DriverType::functor_type const &f,
size_t const vector_length,
size_t const shmem_extra_block,
size_t const shmem_extra_thread) {
return HIPGetOptBlockSize<
DriverType, LaunchBounds,
(HIPTraits::ConstantMemoryUseThreshold <
sizeof(DriverType))>::get_block_size(f, vector_length, shmem_extra_block,
shmem_extra_thread);
}
template <typename FunctorType, typename LaunchBounds>
int hip_get_opt_block_size(HIPInternal const *hip_instance,
@ -245,157 +170,6 @@ int hip_get_opt_block_size(HIPInternal const *hip_instance,
      shmem_block, shmem_thread);
}
// FIXME_HIP the code is identical to the false struct except for
// hip_parallel_launch_constant_memory
template <typename DriverType>
struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, true> {
static int get_block_size(typename DriverType::functor_type const &f,
size_t const vector_length,
size_t const shmem_extra_block,
size_t const shmem_extra_thread) {
int blockSize = HIPTraits::WarpSize / 2;
int numBlocks;
int sharedmem;
int maxOccupancy = 0;
int bestBlockSize = 0;
while (blockSize < HIPTraits::MaxThreadsPerBlock) {
blockSize *= 2;
// calculate the occupancy with that optBlockSize and check whether its
// larger than the largest one found so far
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, true>(&numBlocks, blockSize, sharedmem);
if (maxOccupancy < numBlocks * blockSize) {
maxOccupancy = numBlocks * blockSize;
bestBlockSize = blockSize;
}
}
return bestBlockSize;
}
};
template <typename DriverType>
struct HIPGetOptBlockSize<DriverType, Kokkos::LaunchBounds<0, 0>, false> {
static int get_block_size(const typename DriverType::functor_type &f,
const size_t vector_length,
const size_t shmem_extra_block,
const size_t shmem_extra_thread) {
int blockSize = HIPTraits::WarpSize / 2;
int numBlocks;
int sharedmem;
int maxOccupancy = 0;
int bestBlockSize = 0;
while (blockSize < HIPTraits::MaxThreadsPerBlock) {
blockSize *= 2;
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, false>(&numBlocks, blockSize, sharedmem);
if (maxOccupancy < numBlocks * blockSize) {
maxOccupancy = numBlocks * blockSize;
bestBlockSize = blockSize;
}
}
return bestBlockSize;
}
};
// FIXME_HIP the code is identical to the false struct except for
// hip_parallel_launch_constant_memory
template <typename DriverType, unsigned int MaxThreadsPerBlock,
unsigned int MinBlocksPerSM>
struct HIPGetOptBlockSize<
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
true> {
static int get_block_size(const typename DriverType::functor_type &f,
const size_t vector_length,
const size_t shmem_extra_block,
const size_t shmem_extra_thread) {
int blockSize = HIPTraits::WarpSize / 2;
int numBlocks;
int sharedmem;
int maxOccupancy = 0;
int bestBlockSize = 0;
int max_threads_per_block =
std::min(MaxThreadsPerBlock,
hip_internal_maximum_warp_count() * HIPTraits::WarpSize);
while (blockSize < max_threads_per_block) {
blockSize *= 2;
// calculate the occupancy with that optBlockSize and check whether its
// larger than the largest one found so far
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, true, MaxThreadsPerBlock, MinBlocksPerSM>(
&numBlocks, blockSize, sharedmem);
if (numBlocks >= static_cast<int>(MinBlocksPerSM) &&
blockSize <= static_cast<int>(MaxThreadsPerBlock)) {
if (maxOccupancy < numBlocks * blockSize) {
maxOccupancy = numBlocks * blockSize;
bestBlockSize = blockSize;
}
}
}
if (maxOccupancy > 0) return bestBlockSize;
return -1;
}
};
template <typename DriverType, unsigned int MaxThreadsPerBlock,
unsigned int MinBlocksPerSM>
struct HIPGetOptBlockSize<
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
false> {
static int get_block_size(const typename DriverType::functor_type &f,
const size_t vector_length,
const size_t shmem_extra_block,
const size_t shmem_extra_thread) {
int blockSize = HIPTraits::WarpSize / 2;
int numBlocks;
int sharedmem;
int maxOccupancy = 0;
int bestBlockSize = 0;
int max_threads_per_block =
std::min(MaxThreadsPerBlock,
hip_internal_maximum_warp_count() * HIPTraits::WarpSize);
while (blockSize < max_threads_per_block) {
blockSize *= 2;
sharedmem =
shmem_extra_block + shmem_extra_thread * (blockSize / vector_length) +
::Kokkos::Impl::FunctorTeamShmemSize<
typename DriverType::functor_type>::value(f, blockSize /
vector_length);
hipOccupancy<DriverType, false, MaxThreadsPerBlock, MinBlocksPerSM>(
&numBlocks, blockSize, sharedmem);
if (numBlocks >= int(MinBlocksPerSM) &&
blockSize <= int(MaxThreadsPerBlock)) {
if (maxOccupancy < numBlocks * blockSize) {
maxOccupancy = numBlocks * blockSize;
bestBlockSize = blockSize;
}
}
}
if (maxOccupancy > 0) return bestBlockSize;
return -1;
}
};
}  // namespace Impl
}  // namespace Experimental
}  // namespace Kokkos

View File

@ -164,6 +164,8 @@ HIPInternal &HIPInternal::singleton() {
void HIPInternal::fence() const {
  HIP_SAFE_CALL(hipStreamSynchronize(m_stream));
  // can reset our cycle id now as well
  m_cycleId = 0;
}

void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
@ -256,7 +258,7 @@ void HIPInternal::initialize(int hip_device_id, hipStream_t stream) {
                                         void>;
    Record *const r = Record::allocate(Kokkos::Experimental::HIPSpace(),
                                       "InternalScratchBitset",
                                       "Kokkos::InternalScratchBitset",
                                       sizeof(uint32_t) * buffer_bound);

    Record::increment(r);
@ -303,8 +305,10 @@ Kokkos::Experimental::HIP::size_type *HIPInternal::scratch_space(
        Kokkos::Impl::SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
                                             void>;

    static Record *const r = Record::allocate(
        Kokkos::Experimental::HIPSpace(), "InternalScratchSpace",
    if (m_scratchSpace) Record::decrement(Record::get_record(m_scratchSpace));

    Record *const r = Record::allocate(
        Kokkos::Experimental::HIPSpace(), "Kokkos::InternalScratchSpace",
        (sizeScratchGrain * m_scratchSpaceCount));

    Record::increment(r);
@ -325,8 +329,10 @@ Kokkos::Experimental::HIP::size_type *HIPInternal::scratch_flags(
        Kokkos::Impl::SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
                                             void>;

    if (m_scratchFlags) Record::decrement(Record::get_record(m_scratchFlags));

    Record *const r = Record::allocate(
        Kokkos::Experimental::HIPSpace(), "InternalScratchFlags",
        Kokkos::Experimental::HIPSpace(), "Kokkos::InternalScratchFlags",
        (sizeScratchGrain * m_scratchFlagsCount));

    Record::increment(r);
@ -345,7 +351,7 @@ void *HIPInternal::resize_team_scratch_space(std::int64_t bytes,
  if (m_team_scratch_current_size == 0) {
    m_team_scratch_current_size = bytes;
    m_team_scratch_ptr = Kokkos::kokkos_malloc<Kokkos::Experimental::HIPSpace>(
        "HIPSpace::ScratchMemory", m_team_scratch_current_size);
        "Kokkos::HIPSpace::TeamScratchMemory", m_team_scratch_current_size);
  }
  if ((bytes > m_team_scratch_current_size) ||
      ((bytes < m_team_scratch_current_size) && (force_shrink))) {
@ -388,6 +394,40 @@ void HIPInternal::finalize() {
    m_team_scratch_current_size = 0;
    m_team_scratch_ptr = nullptr;
  }
if (nullptr != d_driverWorkArray) {
HIP_SAFE_CALL(hipHostFree(d_driverWorkArray));
d_driverWorkArray = nullptr;
}
}
char *HIPInternal::get_next_driver(size_t driverTypeSize) const {
std::lock_guard<std::mutex> const lock(m_mutexWorkArray);
if (d_driverWorkArray == nullptr) {
HIP_SAFE_CALL(
hipHostMalloc(&d_driverWorkArray,
m_maxDriverCycles * m_maxDriverTypeSize * sizeof(char),
hipHostMallocNonCoherent));
}
if (driverTypeSize > m_maxDriverTypeSize) {
// fence handles the cycle id reset for us
fence();
HIP_SAFE_CALL(hipHostFree(d_driverWorkArray));
m_maxDriverTypeSize = driverTypeSize;
if (m_maxDriverTypeSize % 128 != 0)
m_maxDriverTypeSize =
m_maxDriverTypeSize + 128 - m_maxDriverTypeSize % 128;
HIP_SAFE_CALL(
hipHostMalloc(&d_driverWorkArray,
m_maxDriverCycles * m_maxDriverTypeSize * sizeof(char),
hipHostMallocNonCoherent));
} else {
m_cycleId = (m_cycleId + 1) % m_maxDriverCycles;
if (m_cycleId == 0) {
// ensure any outstanding kernels are completed before we wrap around
fence();
}
}
return &d_driverWorkArray[m_maxDriverTypeSize * m_cycleId];
}

//----------------------------------------------------------------------------
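The `get_next_driver` routine above cycles through a fixed number of host-pinned slots so that kernel-argument structs can be staged without a device allocation per launch, fencing only when a slot must grow or the pool wraps around. Below is a standalone sketch of the same idea; class and parameter names are illustrative, error handling is omitted, and this is not the Kokkos API:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>

// Round-robin pool of host-pinned slots for staging per-launch argument
// structs on a single stream.
class SlotPool {
  char* pool_ = nullptr;
  std::size_t slot_size_;
  int num_slots_;
  int current_ = 0;
  hipStream_t stream_;

 public:
  SlotPool(std::size_t slot_size, int num_slots, hipStream_t stream)
      : slot_size_(slot_size), num_slots_(num_slots), stream_(stream) {
    hipHostMalloc(reinterpret_cast<void**>(&pool_), slot_size_ * num_slots_);
  }
  ~SlotPool() { hipHostFree(pool_); }

  // Returns host-pinned storage for one launch. Synchronizes only when the
  // pool must be reallocated or has wrapped, i.e. when an in-flight kernel
  // might still be reading a slot.
  char* next(std::size_t bytes) {
    if (bytes > slot_size_) {
      hipStreamSynchronize(stream_);  // nothing may still reference the pool
      hipHostFree(pool_);
      slot_size_ = bytes;
      hipHostMalloc(reinterpret_cast<void**>(&pool_), slot_size_ * num_slots_);
      current_ = 0;
    } else {
      current_ = (current_ + 1) % num_slots_;
      if (current_ == 0) hipStreamSynchronize(stream_);  // wrapped around
    }
    return pool_ + slot_size_ * current_;
  }
};
```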

View File

@ -49,6 +49,8 @@
#include <Kokkos_HIP_Space.hpp>

#include <mutex>

namespace Kokkos {
namespace Experimental {
namespace Impl {
@ -83,33 +85,46 @@ class HIPInternal {
 public:
  using size_type = ::Kokkos::Experimental::HIP::size_type;

  int m_hipDev;
  int m_hipArch;
  unsigned m_multiProcCount;
  unsigned m_maxWarpCount;
  unsigned m_maxBlock;
  unsigned m_maxBlocksPerSM;
  unsigned m_maxSharedWords;
  int m_hipDev = -1;
  int m_hipArch = -1;
  unsigned m_multiProcCount = 0;
  unsigned m_maxWarpCount = 0;
  unsigned m_maxBlock = 0;
  unsigned m_maxBlocksPerSM = 0;
  unsigned m_maxSharedWords = 0;
  int m_regsPerSM;
  int m_shmemPerSM;
  int m_maxShmemPerBlock;
  int m_maxThreadsPerSM;
  int m_shmemPerSM = 0;
  int m_maxShmemPerBlock = 0;
  int m_maxThreadsPerSM = 0;
// array of DriverTypes to be allocated in host-pinned memory for async
// kernel launches
mutable char *d_driverWorkArray = nullptr;
// number of kernel launches that can be in-flight w/o synchronization
const int m_maxDriverCycles = 100;
// max size of a DriverType [bytes]
mutable size_t m_maxDriverTypeSize = 1024 * 10;
// the current index in the driverWorkArray
mutable int m_cycleId = 0;
// mutex to access d_driverWorkArray
mutable std::mutex m_mutexWorkArray;
  // Scratch Spaces for Reductions
  size_type m_scratchSpaceCount;
  size_type m_scratchFlagsCount;
  size_type *m_scratchSpace;
  size_type *m_scratchFlags;
  size_type m_scratchSpaceCount = 0;
  size_type m_scratchFlagsCount = 0;
  size_type *m_scratchSpace = nullptr;
  size_type *m_scratchFlags = nullptr;
  uint32_t *m_scratchConcurrentBitset = nullptr;

  hipDeviceProp_t m_deviceProp;

  hipStream_t m_stream;
  hipStream_t m_stream = nullptr;

  // Team Scratch Level 1 Space
  mutable int64_t m_team_scratch_current_size;
  mutable void *m_team_scratch_ptr;
  mutable int64_t m_team_scratch_current_size = 0;
  mutable void *m_team_scratch_ptr = nullptr;
  mutable std::mutex m_team_scratch_mutex;

  bool was_finalized = false;
@ -117,9 +132,7 @@ class HIPInternal {
  int verify_is_initialized(const char *const label) const;

  int is_initialized() const {
    return m_hipDev >= 0;
  }  // 0 != m_scratchSpace && 0 != m_scratchFlags ;
  int is_initialized() const { return m_hipDev >= 0; }

  void initialize(int hip_device_id, hipStream_t stream = nullptr);
  void finalize();
@ -128,25 +141,12 @@ class HIPInternal {
  void fence() const;
// returns the next driver type pointer in our work array
char *get_next_driver(size_t driverTypeSize) const;
  ~HIPInternal();

  HIPInternal()
  HIPInternal() = default;
: m_hipDev(-1),
m_hipArch(-1),
m_multiProcCount(0),
m_maxWarpCount(0),
m_maxBlock(0),
m_maxSharedWords(0),
m_shmemPerSM(0),
m_maxShmemPerBlock(0),
m_maxThreadsPerSM(0),
m_scratchSpaceCount(0),
m_scratchFlagsCount(0),
m_scratchSpace(nullptr),
m_scratchFlags(nullptr),
m_stream(nullptr),
m_team_scratch_current_size(0),
m_team_scratch_ptr(nullptr) {}
  // Resizing of reduction related scratch spaces
  size_type *scratch_space(const size_type size);

View File

@ -49,9 +49,9 @@
#if defined(__HIPCC__)

#include <Kokkos_HIP_Space.hpp>

#include <HIP/Kokkos_HIP_Error.hpp>
#include <HIP/Kokkos_HIP_Instance.hpp>
#include <Kokkos_HIP_Space.hpp>

// Must use global variable on the device with HIP-Clang
#ifdef __HIP__
@ -127,16 +127,66 @@ struct HIPDispatchProperties {
  HIPLaunchMechanism launch_mechanism = l;
};
template <class DriverType, class LaunchBounds = Kokkos::LaunchBounds<>,
template <typename DriverType, typename LaunchBounds,
          HIPLaunchMechanism LaunchMechanism>
struct HIPParallelLaunchKernelFunc;
template <typename DriverType, unsigned int MaxThreadsPerBlock,
unsigned int MinBlocksPerSM>
struct HIPParallelLaunchKernelFunc<
DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
HIPLaunchMechanism::LocalMemory> {
static auto get_kernel_func() {
return hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
MinBlocksPerSM>;
}
};
template <typename DriverType>
struct HIPParallelLaunchKernelFunc<DriverType, Kokkos::LaunchBounds<0, 0>,
HIPLaunchMechanism::LocalMemory> {
static auto get_kernel_func() {
return hip_parallel_launch_local_memory<DriverType, 1024, 1>;
}
};
template <typename DriverType, typename LaunchBounds,
HIPLaunchMechanism LaunchMechanism>
struct HIPParallelLaunchKernelInvoker;
template <typename DriverType, typename LaunchBounds>
struct HIPParallelLaunchKernelInvoker<DriverType, LaunchBounds,
HIPLaunchMechanism::LocalMemory>
: HIPParallelLaunchKernelFunc<DriverType, LaunchBounds,
HIPLaunchMechanism::LocalMemory> {
using base_t = HIPParallelLaunchKernelFunc<DriverType, LaunchBounds,
HIPLaunchMechanism::LocalMemory>;
static void invoke_kernel(DriverType const *driver, dim3 const &grid,
dim3 const &block, int shmem,
HIPInternal const *hip_instance) {
(base_t::get_kernel_func())<<<grid, block, shmem, hip_instance->m_stream>>>(
driver);
}
};
template <typename DriverType, typename LaunchBounds = Kokkos::LaunchBounds<>,
          HIPLaunchMechanism LaunchMechanism = HIPLaunchMechanism::LocalMemory>
struct HIPParallelLaunch;

template <class DriverType, unsigned int MaxThreadsPerBlock,
template <typename DriverType, unsigned int MaxThreadsPerBlock,
          unsigned int MinBlocksPerSM>
struct HIPParallelLaunch<
    DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
    HIPLaunchMechanism::LocalMemory>
    : HIPParallelLaunchKernelInvoker<
          DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
          HIPLaunchMechanism::LocalMemory> {
  inline HIPParallelLaunch(const DriverType &driver, const dim3 &grid,
  using base_t = HIPParallelLaunchKernelInvoker<
      DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>,
      HIPLaunchMechanism::LocalMemory>;

  HIPParallelLaunch(const DriverType &driver, const dim3 &grid,
                    const dim3 &block, const int shmem,
                    const HIPInternal *hip_instance,
                    const bool /*prefer_shmem*/) {
@ -148,72 +198,16 @@ struct HIPParallelLaunch<
      KOKKOS_ENSURE_HIP_LOCK_ARRAYS_ON_DEVICE();
// FIXME_HIP -- there is currently an error copying (some) structs
// by value to the device in HIP-Clang / VDI
// As a workaround, we can malloc the DriverType and explictly copy over.
// To remove once solved in HIP
DriverType *d_driver;
HIP_SAFE_CALL(hipMalloc(&d_driver, sizeof(DriverType)));
HIP_SAFE_CALL(hipMemcpyAsync(d_driver, &driver, sizeof(DriverType),
hipMemcpyHostToDevice,
hip_instance->m_stream));
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
MinBlocksPerSM>
<<<grid, block, shmem, hip_instance->m_stream>>>(d_driver);
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
HIP_SAFE_CALL(hipGetLastError());
hip_instance->fence();
#endif
HIP_SAFE_CALL(hipFree(d_driver));
}
}
static hipFuncAttributes get_hip_func_attributes() {
static hipFuncAttributes attr = []() {
hipFuncAttributes attr;
HIP_SAFE_CALL(hipFuncGetAttributes(
&attr,
reinterpret_cast<void const *>(
hip_parallel_launch_local_memory<DriverType, MaxThreadsPerBlock,
MinBlocksPerSM>)));
return attr;
}();
return attr;
}
};
template <class DriverType>
struct HIPParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
HIPLaunchMechanism::LocalMemory> {
inline HIPParallelLaunch(const DriverType &driver, const dim3 &grid,
const dim3 &block, const int shmem,
const HIPInternal *hip_instance,
const bool /*prefer_shmem*/) {
if ((grid.x != 0) && ((block.x * block.y * block.z) != 0)) {
if (hip_instance->m_maxShmemPerBlock < shmem) {
Kokkos::Impl::throw_runtime_exception(std::string(
"HIPParallelLaunch FAILED: shared memory request is too large"));
}
KOKKOS_ENSURE_HIP_LOCK_ARRAYS_ON_DEVICE();
      // Invoke the driver function on the device

      // FIXME_HIP -- see note about struct copy by value above
      DriverType *d_driver;
      HIP_SAFE_CALL(hipMalloc(&d_driver, sizeof(DriverType)));
      HIP_SAFE_CALL(hipMemcpyAsync(d_driver, &driver, sizeof(DriverType),
                                   hipMemcpyHostToDevice,
                                   hip_instance->m_stream));
      hip_parallel_launch_local_memory<DriverType, 1024, 1>
          <<<grid, block, shmem, hip_instance->m_stream>>>(d_driver);

      DriverType *d_driver = reinterpret_cast<DriverType *>(
          hip_instance->get_next_driver(sizeof(DriverType)));
      std::memcpy((void *)d_driver, (void *)&driver, sizeof(DriverType));
      base_t::invoke_kernel(d_driver, grid, block, shmem, hip_instance);

#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
      HIP_SAFE_CALL(hipGetLastError());
      hip_instance->fence();
#endif
      HIP_SAFE_CALL(hipFree(d_driver));
    }
  }
@ -221,8 +215,7 @@ struct HIPParallelLaunch<DriverType, Kokkos::LaunchBounds<0, 0>,
    static hipFuncAttributes attr = []() {
      hipFuncAttributes attr;
      HIP_SAFE_CALL(hipFuncGetAttributes(
          &attr, reinterpret_cast<void const *>(
                     hip_parallel_launch_local_memory<DriverType, 1024, 1>)));
          &attr, reinterpret_cast<void const *>(base_t::get_kernel_func())));
      return attr;
    }();
    return attr;

View File

@ -0,0 +1,37 @@
#ifndef KOKKOS_HIP_MDRANGEPOLICY_HPP_
#define KOKKOS_HIP_MDRANGEPOLICY_HPP_
#include <KokkosExp_MDRangePolicy.hpp>
namespace Kokkos {
template <>
struct default_outer_direction<Kokkos::Experimental::HIP> {
using type = Iterate;
static constexpr Iterate value = Iterate::Left;
};
template <>
struct default_inner_direction<Kokkos::Experimental::HIP> {
using type = Iterate;
static constexpr Iterate value = Iterate::Left;
};
namespace Impl {
// Settings for MDRangePolicy
template <>
inline TileSizeProperties get_tile_size_properties<Kokkos::Experimental::HIP>(
const Kokkos::Experimental::HIP& space) {
TileSizeProperties properties;
properties.max_threads =
space.impl_internal_space_instance()->m_maxThreadsPerSM;
properties.default_largest_tile_size = 16;
properties.default_tile_size = 4;
properties.max_total_tile_size = 1024;
return properties;
}
} // Namespace Impl
} // Namespace Kokkos
#endif
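The new header above supplies HIP-specific tile-size defaults that MDRangePolicy picks up through `get_tile_size_properties`. A hedged usage sketch follows; the view, label, and explicit tile sizes are illustrative, and omitting the tile argument would fall back to the backend defaults shown above:

```cpp
#include <Kokkos_Core.hpp>

void fill(Kokkos::View<double**> a) {
  using policy2d = Kokkos::MDRangePolicy<Kokkos::Rank<2>>;
  const int n0 = static_cast<int>(a.extent(0));
  const int n1 = static_cast<int>(a.extent(1));

  // Explicit {16, 4} tiles; dropping the third argument lets the backend's
  // default_tile_size / max_total_tile_size settings apply.
  Kokkos::parallel_for(
      "fill", policy2d({0, 0}, {n0, n1}, {16, 4}),
      KOKKOS_LAMBDA(const int i, const int j) { a(i, j) = 1.0 * i + j; });
}
```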

View File

@ -49,6 +49,7 @@
#include <HIP/Kokkos_HIP_KernelLaunch.hpp>
#include <HIP/Kokkos_HIP_ReduceScan.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
#include <impl/KokkosExp_IterateTileGPU.hpp>
#include <Kokkos_Parallel.hpp>

namespace Kokkos {
@ -72,7 +73,7 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>,
  ParallelFor& operator=(ParallelFor const&) = delete;

 public:
  inline __device__ void operator()(void) const {
  inline __device__ void operator()() const {
    Kokkos::Impl::DeviceIterateTile<Policy::rank, Policy, FunctorType,
                                    typename Policy::work_tag>(m_policy,
                                                               m_functor)
@ -175,6 +176,25 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>,
  ParallelFor(FunctorType const& arg_functor, Policy const& arg_policy)
      : m_functor(arg_functor), m_policy(arg_policy) {}
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
using closure_type =
ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>,
Kokkos::Experimental::HIP>;
hipFuncAttributes attr = Kokkos::Experimental::Impl::HIPParallelLaunch<
closure_type, LaunchBounds>::get_hip_func_attributes();
auto const& prop = pol.space().hip_device_prop();
// Limits due to registers/SM, MDRange doesn't have
// shared memory constraints
int const regs_per_sm = prop.regsPerMultiprocessor;
int const regs_per_thread = attr.numRegs;
int const max_threads_per_sm = regs_per_sm / regs_per_thread;
return std::min(
max_threads_per_sm,
static_cast<int>(
Kokkos::Experimental::Impl::HIPTraits::MaxThreadsPerBlock));
}
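The `max_tile_size_product` helper above caps the MDRange tile product by the register-limited thread count per multiprocessor. A small worked sketch of that arithmetic, with assumed (illustrative) hardware numbers:

```cpp
#include <algorithm>

// Example: a kernel compiled to 64 registers per thread on a device with
// 65536 registers per multiprocessor supports at most 65536 / 64 = 1024
// resident threads per SM, so the tile product is capped at
// min(1024, MaxThreadsPerBlock).
int max_tile_size_product_example(int regs_per_sm = 65536,
                                  int regs_per_thread = 64,
                                  int max_threads_per_block = 1024) {
  const int max_threads_per_sm = regs_per_sm / regs_per_thread;
  return std::min(max_threads_per_sm, max_threads_per_block);
}
```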
};

// ParallelReduce
@ -231,7 +251,7 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
    DeviceIteratePattern(m_policy, m_functor, update).exec_range();
  }

  inline __device__ void operator()(void) const {
  inline __device__ void operator()() const {
    const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
                                                   sizeof(size_type)>
        word_count(ValueTraits::value_size(
@ -291,13 +311,19 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
        ::Kokkos::Experimental::Impl::HIPTraits::MaxThreadsPerBlock;
    int shmem_size = ::Kokkos::Impl::hip_single_inter_block_reduce_scan_shmem<
        false, FunctorType, WorkTag>(f, n);
    using closure_type = Impl::ParallelReduce<FunctorType, Policy, ReducerType>;
    hipFuncAttributes attr = ::Kokkos::Experimental::Impl::HIPParallelLaunch<
        closure_type, LaunchBounds>::get_hip_func_attributes();
    while (
        (n &&
         (m_policy.space().impl_internal_space_instance()->m_maxShmemPerBlock <
          shmem_size)) ||
        (n > static_cast<unsigned>(
                 ::Kokkos::Experimental::Impl::hip_get_max_block_size<
                     ParallelReduce, LaunchBounds>(f, 1, shmem_size, 0)))) {
        (n >
         static_cast<unsigned>(
             ::Kokkos::Experimental::Impl::hip_get_max_block_size<FunctorType,
                                                                  LaunchBounds>(
                 m_policy.space().impl_internal_space_instance(), attr, f, 1,
                 shmem_size, 0)))) {
      n >>= 1;
      shmem_size = ::Kokkos::Impl::hip_single_inter_block_reduce_scan_shmem<
          false, FunctorType, WorkTag>(f, n);
@ -391,6 +417,23 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
                        memory_space>::accessible),
        m_scratch_space(nullptr),
        m_scratch_flags(nullptr) {}
template <typename Policy, typename Functor>
static int max_tile_size_product(const Policy& pol, const Functor&) {
using closure_type =
ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>,
ReducerType, Kokkos::Experimental::HIP>;
hipFuncAttributes attr = Kokkos::Experimental::Impl::HIPParallelLaunch<
closure_type, LaunchBounds>::get_hip_func_attributes();
auto const& prop = pol.space().hip_device_prop();
    // Limits due to registers/SM
int const regs_per_sm = prop.regsPerMultiprocessor;
int const regs_per_thread = attr.numRegs;
int const max_threads_per_sm = regs_per_sm / regs_per_thread;
return std::min(
max_threads_per_sm,
static_cast<int>(
Kokkos::Experimental::Impl::HIPTraits::MaxThreadsPerBlock));
}
};
}  // namespace Impl
}  // namespace Kokkos

View File

@ -92,7 +92,7 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>,
 public:
  using functor_type = FunctorType;

  inline __device__ void operator()(void) const {
  inline __device__ void operator()() const {
    const Member work_stride = blockDim.y * gridDim.x;
    const Member work_end = m_policy.end();
@ -174,11 +174,14 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
  size_type* m_scratch_space = nullptr;
  size_type* m_scratch_flags = nullptr;

  // FIXME_HIP_PERFORMANCE Need a rule to choose when to use shared memory and
  // when to use shuffle
#if HIP_VERSION < 401
  static bool constexpr UseShflReduction =
      ((sizeof(value_type) > 2 * sizeof(double)) &&
       static_cast<bool>(ValueTraits::StaticValueSize));
#else
  static bool constexpr UseShflReduction =
      static_cast<bool>(ValueTraits::StaticValueSize);
#endif

 private:
  struct ShflReductionTag {};
@ -330,13 +333,19 @@ class ParallelReduce<FunctorType, Kokkos::RangePolicy<Traits...>, ReducerType,
    int shmem_size =
        hip_single_inter_block_reduce_scan_shmem<false, FunctorType, WorkTag>(
            f, n);
    using closure_type = Impl::ParallelReduce<FunctorType, Policy, ReducerType>;
    hipFuncAttributes attr = ::Kokkos::Experimental::Impl::HIPParallelLaunch<
        closure_type, LaunchBounds>::get_hip_func_attributes();
    while (
        (n &&
         (m_policy.space().impl_internal_space_instance()->m_maxShmemPerBlock <
          shmem_size)) ||
        (n > static_cast<unsigned int>(
                 Kokkos::Experimental::Impl::hip_get_max_block_size<
                     ParallelReduce, LaunchBounds>(f, 1, shmem_size, 0)))) {
        (n >
         static_cast<unsigned int>(
             ::Kokkos::Experimental::Impl::hip_get_max_block_size<FunctorType,
                                                                  LaunchBounds>(
                 m_policy.space().impl_internal_space_instance(), attr, f, 1,
                 shmem_size, 0)))) {
      n >>= 1;
      shmem_size =
          hip_single_inter_block_reduce_scan_shmem<false, FunctorType, WorkTag>(
@ -493,7 +502,7 @@ class ParallelScanHIPBase {
  //----------------------------------------

  __device__ inline void initial(void) const {
  __device__ inline void initial() const {
    const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
                                                   sizeof(size_type)>
        word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -529,7 +538,7 @@ class ParallelScanHIPBase {
  //----------------------------------------

  __device__ inline void final(void) const {
  __device__ inline void final() const {
    const integral_nonzero_constant<size_type, ValueTraits::StaticValueSize /
                                                   sizeof(size_type)>
        word_count(ValueTraits::value_size(m_functor) / sizeof(size_type));
@ -606,7 +615,7 @@ class ParallelScanHIPBase {
 public:
  //----------------------------------------

  __device__ inline void operator()(void) const {
  __device__ inline void operator()() const {
    if (!m_final) {
      initial();
    } else {

View File

@ -433,6 +433,9 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
  int m_shmem_size;
  void* m_scratch_ptr[2];
  int m_scratch_size[2];
  // Only let one ParallelFor/Reduce modify the team scratch memory. The
  // constructor acquires the mutex which is released in the destructor.
  std::unique_lock<std::mutex> m_scratch_lock;

  template <typename TagType>
  __device__ inline
@ -449,7 +452,7 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
  }

 public:
  __device__ inline void operator()(void) const {
  __device__ inline void operator()() const {
    // Iterate this block through the league
    int64_t threadid = 0;
    if (m_scratch_size[1] > 0) {
@ -513,7 +516,10 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
        m_policy(arg_policy),
        m_league_size(arg_policy.league_size()),
        m_team_size(arg_policy.team_size()),
        m_vector_size(arg_policy.impl_vector_length()) {
        m_vector_size(arg_policy.impl_vector_length()),
        m_scratch_lock(m_policy.space()
                           .impl_internal_space_instance()
                           ->m_team_scratch_mutex) {
    hipFuncAttributes attr = ::Kokkos::Experimental::Impl::HIPParallelLaunch<
        ParallelFor, launch_bounds>::get_hip_func_attributes();
    m_team_size =
@ -640,6 +646,9 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
  const size_type m_league_size;
  int m_team_size;
  const size_type m_vector_size;
  // Only let one ParallelFor/Reduce modify the team scratch memory. The
  // constructor acquires the mutex which is released in the destructor.
  std::unique_lock<std::mutex> m_scratch_lock;

  template <class TagType>
  __device__ inline
@ -877,7 +886,10 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
        m_scratch_ptr{nullptr, nullptr},
        m_league_size(arg_policy.league_size()),
        m_team_size(arg_policy.team_size()),
        m_vector_size(arg_policy.impl_vector_length()) {
        m_vector_size(arg_policy.impl_vector_length()),
        m_scratch_lock(m_policy.space()
                           .impl_internal_space_instance()
                           ->m_team_scratch_mutex) {
    hipFuncAttributes attr = Kokkos::Experimental::Impl::HIPParallelLaunch<
        ParallelReduce, launch_bounds>::get_hip_func_attributes();
    m_team_size =
@ -976,7 +988,10 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
        m_scratch_ptr{nullptr, nullptr},
        m_league_size(arg_policy.league_size()),
        m_team_size(arg_policy.team_size()),
        m_vector_size(arg_policy.impl_vector_length()) {
        m_vector_size(arg_policy.impl_vector_length()),
        m_scratch_lock(m_policy.space()
                           .impl_internal_space_instance()
                           ->m_team_scratch_mutex) {
    hipFuncAttributes attr = Kokkos::Experimental::Impl::HIPParallelLaunch<
        ParallelReduce, launch_bounds>::get_hip_func_attributes();
    m_team_size =

View File

@ -42,12 +42,6 @@
//@HEADER
*/

#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
#include <atomic>

#include <Kokkos_Macros.hpp>
#include <Kokkos_Core.hpp>
@ -57,6 +51,13 @@
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_MemorySpace.hpp>

#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
#include <atomic>

/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
@ -172,14 +173,14 @@ void DeepCopyAsyncHIP(void* dst, void const* src, size_t n) {
namespace Kokkos {

void Experimental::HIPSpace::access_error() {
KOKKOS_DEPRECATED void Experimental::HIPSpace::access_error() {
  const std::string msg(
      "Kokkos::Experimental::HIPSpace::access_error attempt to execute "
      "Experimental::HIP function from non-HIP space");
  Kokkos::Impl::throw_runtime_exception(msg);
}

void Experimental::HIPSpace::access_error(const void* const) {
KOKKOS_DEPRECATED void Experimental::HIPSpace::access_error(const void* const) {
  const std::string msg(
      "Kokkos::Experimental::HIPSpace::access_error attempt to execute "
      "Experimental::HIP function from non-HIP space");
@ -326,45 +327,6 @@ SharedAllocationRecord<void, void> SharedAllocationRecord<
    Kokkos::Experimental::HIPHostPinnedSpace, void>::s_root_record;
#endif
std::string SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
void>::get_label() const {
SharedAllocationHeader header;
Kokkos::Impl::DeepCopy<Kokkos::HostSpace, Kokkos::Experimental::HIPSpace>(
&header, RecordBase::head(), sizeof(SharedAllocationHeader));
return std::string(header.m_label);
}
std::string SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace,
void>::get_label() const {
return std::string(RecordBase::head()->m_label);
}
SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>*
SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::allocate(
const Kokkos::Experimental::HIPSpace& arg_space,
const std::string& arg_label, const size_t arg_alloc_size) {
return new SharedAllocationRecord(arg_space, arg_label, arg_alloc_size);
}
SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>*
SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
allocate(const Kokkos::Experimental::HIPHostPinnedSpace& arg_space,
const std::string& arg_label, const size_t arg_alloc_size) {
return new SharedAllocationRecord(arg_space, arg_label, arg_alloc_size);
}
void SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::deallocate(
SharedAllocationRecord<void, void>* arg_rec) {
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
deallocate(SharedAllocationRecord<void, void>* arg_rec) {
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
                       void>::~SharedAllocationRecord() {
  const char* label = nullptr;
@ -393,7 +355,7 @@ SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
    const SharedAllocationRecord<void, void>::function_type arg_dealloc)
    // Pass through allocated [ SharedAllocationHeader , user_memory ]
    // Pass through deallocation function
    : SharedAllocationRecord<void, void>(
    : base_t(
#ifdef KOKKOS_ENABLE_DEBUG
          &SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
                                  void>::s_root_record,
@ -405,13 +367,7 @@ SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
  SharedAllocationHeader header;

  // Fill in the Header information
  header.m_record = static_cast<SharedAllocationRecord<void, void>*>(this);
  strncpy(header.m_label, arg_label.c_str(),
          SharedAllocationHeader::maximum_label_length);
  // Set last element zero, in case c_str is too long
  header.m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
  this->base_t::_fill_host_accessible_header_info(header, arg_label);

  // Copy to device memory
  Kokkos::Impl::DeepCopy<Kokkos::Experimental::HIPSpace, HostSpace>(
@ -425,7 +381,7 @@ SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
    const SharedAllocationRecord<void, void>::function_type arg_dealloc)
    // Pass through allocated [ SharedAllocationHeader , user_memory ]
    // Pass through deallocation function
    : SharedAllocationRecord<void, void>(
    : base_t(
#ifdef KOKKOS_ENABLE_DEBUG
          &SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace,
                                  void>::s_root_record,
@ -435,223 +391,8 @@ SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
          sizeof(SharedAllocationHeader) + arg_alloc_size, arg_dealloc),
      m_space(arg_space) {
  // Fill in the Header information, directly accessible via host pinned memory
  RecordBase::m_alloc_ptr->m_record = this;
  strncpy(RecordBase::m_alloc_ptr->m_label, arg_label.c_str(),
          SharedAllocationHeader::maximum_label_length);
  // Set last element zero, in case c_str is too long
  RecordBase::m_alloc_ptr
      ->m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
  this->base_t::_fill_host_accessible_header_info(*RecordBase::m_alloc_ptr,
                                                  arg_label);
//----------------------------------------------------------------------------
void* SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
allocate_tracked(const Kokkos::Experimental::HIPSpace& arg_space,
const std::string& arg_alloc_label,
const size_t arg_alloc_size) {
if (!arg_alloc_size) return nullptr;
SharedAllocationRecord* const r =
allocate(arg_space, arg_alloc_label, arg_alloc_size);
RecordBase::increment(r);
return r->data();
}
void SharedAllocationRecord<Kokkos::Experimental::HIPSpace,
void>::deallocate_tracked(void* const
arg_alloc_ptr) {
if (arg_alloc_ptr != nullptr) {
SharedAllocationRecord* const r = get_record(arg_alloc_ptr);
RecordBase::decrement(r);
}
}
void* SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
reallocate_tracked(void* const arg_alloc_ptr, const size_t arg_alloc_size) {
SharedAllocationRecord* const r_old = get_record(arg_alloc_ptr);
SharedAllocationRecord* const r_new =
allocate(r_old->m_space, r_old->get_label(), arg_alloc_size);
Kokkos::Impl::DeepCopy<Kokkos::Experimental::HIPSpace,
Kokkos::Experimental::HIPSpace>(
r_new->data(), r_old->data(), std::min(r_old->size(), r_new->size()));
RecordBase::increment(r_new);
RecordBase::decrement(r_old);
return r_new->data();
}
void* SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
allocate_tracked(const Kokkos::Experimental::HIPHostPinnedSpace& arg_space,
const std::string& arg_alloc_label,
const size_t arg_alloc_size) {
if (!arg_alloc_size) return nullptr;
SharedAllocationRecord* const r =
allocate(arg_space, arg_alloc_label, arg_alloc_size);
RecordBase::increment(r);
return r->data();
}
void SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace,
void>::deallocate_tracked(void* const
arg_alloc_ptr) {
if (arg_alloc_ptr) {
SharedAllocationRecord* const r = get_record(arg_alloc_ptr);
RecordBase::decrement(r);
}
}
void* SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>::
reallocate_tracked(void* const arg_alloc_ptr, const size_t arg_alloc_size) {
SharedAllocationRecord* const r_old = get_record(arg_alloc_ptr);
SharedAllocationRecord* const r_new =
allocate(r_old->m_space, r_old->get_label(), arg_alloc_size);
using HIPHostPinnedSpace = Kokkos::Experimental::HIPHostPinnedSpace;
Kokkos::Impl::DeepCopy<HIPHostPinnedSpace, HIPHostPinnedSpace>(
r_new->data(), r_old->data(), std::min(r_old->size(), r_new->size()));
RecordBase::increment(r_new);
RecordBase::decrement(r_old);
return r_new->data();
}
//----------------------------------------------------------------------------
SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>*
SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::get_record(
void* alloc_ptr) {
using Header = SharedAllocationHeader;
using RecordHIP =
SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>;
// Copy the header from the allocation
Header head;
Header const* const head_hip =
alloc_ptr ? Header::get_header(alloc_ptr) : nullptr;
if (alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, Kokkos::Experimental::HIPSpace>(
&head, head_hip, sizeof(SharedAllocationHeader));
}
RecordHIP* const record =
alloc_ptr ? static_cast<RecordHIP*>(head.m_record) : nullptr;
if (!alloc_ptr || record->m_alloc_ptr != head_hip) {
Kokkos::Impl::throw_runtime_exception(std::string(
"Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HIPSpace "
", void >::get_record ERROR"));
}
return record;
}
SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>*
SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace,
void>::get_record(void* alloc_ptr) {
using Header = SharedAllocationHeader;
using RecordHIP =
SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>;
Header* const h =
alloc_ptr ? reinterpret_cast<Header*>(alloc_ptr) - 1 : nullptr;
if (!alloc_ptr || h->m_record->m_alloc_ptr != h) {
Kokkos::Impl::throw_runtime_exception(std::string(
"Kokkos::Impl::SharedAllocationRecord< "
"Kokkos::Experimental::HIPHostPinnedSpace , void >::get_record ERROR"));
}
return static_cast<RecordHIP*>(h->m_record);
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
print_records(std::ostream& s, const Kokkos::Experimental::HIPSpace&,
bool detail) {
#ifdef KOKKOS_ENABLE_DEBUG
SharedAllocationRecord<void, void>* r = &s_root_record;
char buffer[256];
SharedAllocationHeader head;
if (detail) {
do {
if (r->m_alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, Kokkos::Experimental::HIPSpace>(
&head, r->m_alloc_ptr, sizeof(SharedAllocationHeader));
} else {
head.m_label[0] = 0;
}
// Formatting dependent on sizeof(uintptr_t)
const char* format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string =
"HIP addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx + "
"%.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
} else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string =
"HIP addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ "
"0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
snprintf(buffer, 256, format_string, reinterpret_cast<uintptr_t>(r),
reinterpret_cast<uintptr_t>(r->m_prev),
reinterpret_cast<uintptr_t>(r->m_next),
reinterpret_cast<uintptr_t>(r->m_alloc_ptr), r->m_alloc_size,
r->m_count, reinterpret_cast<uintptr_t>(r->m_dealloc),
head.m_label);
s << buffer;
r = r->m_next;
} while (r != &s_root_record);
} else {
do {
if (r->m_alloc_ptr) {
Kokkos::Impl::DeepCopy<HostSpace, Kokkos::Experimental::HIPSpace>(
&head, r->m_alloc_ptr, sizeof(SharedAllocationHeader));
// Formatting dependent on sizeof(uintptr_t)
const char* format_string;
if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "HIP [ 0x%.12lx + %ld ] %s\n";
} else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "HIP [ 0x%.12llx + %ld ] %s\n";
}
snprintf(buffer, 256, format_string,
reinterpret_cast<uintptr_t>(r->data()), r->size(),
head.m_label);
} else {
snprintf(buffer, 256, "HIP [ 0 + 0 ]\n");
}
s << buffer;
r = r->m_next;
} while (r != &s_root_record);
}
#else
(void)s;
(void)detail;
throw_runtime_exception(
"Kokkos::Impl::SharedAllocationRecord<HIPSpace>::print_records"
" only works with KOKKOS_ENABLE_DEBUG enabled");
#endif
}

}  // namespace Impl
@ -680,63 +421,22 @@ void HIP::impl_initialize(const HIP::SelectDevice config) {
void HIP::impl_finalize() { Impl::HIPInternal::singleton().finalize(); }

HIP::HIP()
    : m_space_instance(&Impl::HIPInternal::singleton()), m_counter(nullptr) {
    : m_space_instance(&Impl::HIPInternal::singleton(),
                       [](Impl::HIPInternal*) {}) {
  Impl::HIPInternal::singleton().verify_is_initialized(
      "HIP instance constructor");
}

HIP::HIP(hipStream_t const stream)
    : m_space_instance(new Impl::HIPInternal), m_counter(new int(1)) {
    : m_space_instance(new Impl::HIPInternal, [](Impl::HIPInternal* ptr) {
        ptr->finalize();
        delete ptr;
      }) {
  Impl::HIPInternal::singleton().verify_is_initialized(
      "HIP instance constructor");
  m_space_instance->initialize(Impl::HIPInternal::singleton().m_hipDev, stream);
}
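The constructors above now hand the internal instance to a reference-counted handle with a per-case deleter (a no-op for the singleton, finalize-and-delete for stream instances), replacing the hand-rolled `m_counter` copy/assign/destructor logic removed below. A hedged sketch of the same ownership pattern, using `std::shared_ptr` as a stand-in for Kokkos' internal `HostSharedPtr`; the `Internal` type and its members are illustrative:

```cpp
#include <memory>

struct Internal {
  void initialize(/* device id, stream */) {}
  void finalize() {}
  static Internal& singleton() {
    static Internal s;
    return s;
  }
};

// Singleton-backed handle: never destroyed, so the deleter is a no-op.
std::shared_ptr<Internal> make_singleton_handle() {
  return std::shared_ptr<Internal>(&Internal::singleton(), [](Internal*) {});
}

// Per-stream handle: finalized and deleted when the last copy goes away,
// so copy constructors and destructors need no hand-written counting.
std::shared_ptr<Internal> make_stream_handle() {
  auto handle = std::shared_ptr<Internal>(new Internal, [](Internal* p) {
    p->finalize();
    delete p;
  });
  handle->initialize();
  return handle;
}
```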
KOKKOS_FUNCTION HIP::HIP(HIP&& other) noexcept {
m_space_instance = other.m_space_instance;
other.m_space_instance = nullptr;
m_counter = other.m_counter;
other.m_counter = nullptr;
}
KOKKOS_FUNCTION HIP::HIP(HIP const& other)
: m_space_instance(other.m_space_instance), m_counter(other.m_counter) {
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HIP_GPU
if (m_counter) Kokkos::atomic_add(m_counter, 1);
#endif
}
KOKKOS_FUNCTION HIP& HIP::operator=(HIP&& other) noexcept {
m_space_instance = other.m_space_instance;
other.m_space_instance = nullptr;
m_counter = other.m_counter;
other.m_counter = nullptr;
return *this;
}
KOKKOS_FUNCTION HIP& HIP::operator=(HIP const& other) {
m_space_instance = other.m_space_instance;
m_counter = other.m_counter;
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HIP_GPU
if (m_counter) Kokkos::atomic_add(m_counter, 1);
#endif
return *this;
}
KOKKOS_FUNCTION HIP::~HIP() noexcept {
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HIP_GPU
if (m_counter == nullptr) return;
int const count = Kokkos::atomic_fetch_sub(m_counter, 1);
if (count == 1) {
delete m_counter;
m_space_instance->finalize();
delete m_space_instance;
}
#endif
}
void HIP::print_configuration(std::ostream& s, const bool) {
  Impl::HIPInternal::singleton().print_configuration(s);
}
@ -810,3 +510,26 @@ void HIPSpaceInitializer::print_configuration(std::ostream& msg,
}  // namespace Impl
}  // namespace Kokkos
//==============================================================================
// <editor-fold desc="Explicit instantiations of CRTP Base classes"> {{{1
#include <impl/Kokkos_SharedAlloc_timpl.hpp>
namespace Kokkos {
namespace Impl {
// To avoid additional compilation cost for something that's (mostly?) not
// performance sensitive, we explicitly instantiate these CRTP base classes here,
// where we have access to the associated *_timpl.hpp header files.
template class HostInaccessibleSharedAllocationRecordCommon<
Kokkos::Experimental::HIPSpace>;
template class SharedAllocationRecordCommon<Kokkos::Experimental::HIPSpace>;
template class SharedAllocationRecordCommon<
Kokkos::Experimental::HIPHostPinnedSpace>;
} // end namespace Impl
} // end namespace Kokkos
// </editor-fold> end Explicit instantiations of CRTP Base classes }}}1
//==============================================================================
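The block above compiles the shared-allocation-record CRTP bases once, in this translation unit, rather than implicitly in every file that uses them. A generic, self-contained sketch of the explicit-instantiation technique; class and function names are illustrative only:

```cpp
#include <iostream>

template <class Space>
class RecordBase {
 public:
  // In the real library this definition would live in a *_timpl.hpp-style
  // header included only by the instantiating .cpp file.
  void log() { std::cout << "record for " << Space::name() << '\n'; }
};

struct DeviceSpace {
  static const char* name() { return "DeviceSpace"; }
};

// Explicit instantiation: all members are compiled here, once, instead of in
// every translation unit that touches RecordBase<DeviceSpace>.
template class RecordBase<DeviceSpace>;

int main() {
  RecordBase<DeviceSpace> r;
  r.log();
}
```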

View File

@ -644,13 +644,14 @@ KOKKOS_INLINE_FUNCTION
      thread, count);
}

template <typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION Impl::ThreadVectorRangeBoundariesStruct<
    typename std::common_type<iType1, iType2>::type, Impl::HIPTeamMember>
ThreadVectorRange(const Impl::HIPTeamMember& thread, iType1 arg_begin,
                  iType2 arg_end) {
  using iType = typename std::common_type<iType1, iType2>::type;
  return Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::HIPTeamMember>(
      thread, iType(arg_begin), iType(arg_end));
}
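A short usage sketch of the relaxed signature (any backend with vector ranges; the extents are arbitrary): the begin and end arguments may now have different integer types and are converted to their common_type.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using policy_type = Kokkos::TeamPolicy<>;
    using member_type = policy_type::member_type;
    Kokkos::View<long*> out("out", 16);

    Kokkos::parallel_for(
        "mixed_index_types", policy_type(1, Kokkos::AUTO),
        KOKKOS_LAMBDA(const member_type& team) {
          int begin = 2;   // int ...
          long end  = 16;  // ... and long: common_type is long
          Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, begin, end),
                               [&](const long i) { out(i) = 2 * i; });
        });
  }
  Kokkos::finalize();
}
```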
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION
@ -961,7 +962,7 @@ KOKKOS_INLINE_FUNCTION
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel scan with reducer.
 *
 * Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
 *
@ -969,22 +970,21 @@ KOKKOS_INLINE_FUNCTION
 * thread and a scan operation is performed.
 * The last call to closure has final == true.
 */
template <typename iType, class Closure, typename ReducerType>
KOKKOS_INLINE_FUNCTION
    typename std::enable_if<Kokkos::is_reducer<ReducerType>::value>::type
    parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<
                      iType, Impl::HIPTeamMember>& loop_boundaries,
                  const Closure& closure, const ReducerType& reducer) {
#ifdef __HIP_DEVICE_COMPILE__
  using value_type = typename ReducerType::value_type;
  value_type accum;
  reducer.init(accum);
  const value_type identity = accum;

  // Loop through boundaries by vector-length chunks
  // must scan at each iteration

  // All thread "lanes" must loop the same number of times.
  // Determine an loop end for all thread "lanes."
  // Requires:
@ -997,47 +997,72 @@ KOKKOS_INLINE_FUNCTION void parallel_scan(
  const int end = loop_boundaries.end + (rem ? blockDim.x - rem : 0);

  for (int i = threadIdx.x; i < end; i += blockDim.x) {
    value_type val = identity;

    // First acquire per-lane contributions.
    // This sets i's val to i-1's contribution
    // to make the latter in_place_shfl_up an
    // exclusive scan -- the final accumulation
    // of i's val will be included in the second
    // closure call later.
    if (i < loop_boundaries.end && threadIdx.x > 0) closure(i - 1, val, false);

    // Bottom up exclusive scan in triangular pattern
    // where each HIP thread is the root of a reduction tree
    // from the zeroth "lane" to itself.
    //  [t] += [t-1] if t >= 1
    //  [t] += [t-2] if t >= 2
    //  [t] += [t-4] if t >= 4
    //  ...
    // This differs from the non-reducer overload, where an inclusive scan was
    // implemented, because in general the binary operator cannot be inverted
    // and we would not be able to remove the inclusive contribution by
    // inversion.
    for (int j = 1; j < static_cast<int>(blockDim.x); j <<= 1) {
      value_type tmp = identity;
      ::Kokkos::Experimental::Impl::in_place_shfl_up(tmp, val, j, blockDim.x);
      if (j <= static_cast<int>(threadIdx.x)) {
        reducer.join(val, tmp);
      }
    }

    // Include accumulation
    reducer.join(val, accum);

    // Update i's contribution into the val
    // and add it to accum for next round
    if (i < loop_boundaries.end) closure(i, val, true);
    ::Kokkos::Experimental::Impl::in_place_shfl(accum, val, blockDim.x - 1,
                                                blockDim.x);
  }
#else
  (void)loop_boundaries;
  (void)closure;
  (void)reducer;
#endif
}
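A usage sketch of the reducer overload, assuming a backend that provides it (the changelog lists parallel_scan with ThreadVectorRange and a Reducer as a 3.4.00 feature). Here a Max reducer turns the scan into a running prefix maximum; the per-lane values and extents are arbitrary.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using policy_type = Kokkos::TeamPolicy<>;
    using member_type = policy_type::member_type;
    const int n = 8;
    Kokkos::View<int*> prefix_max("prefix_max", n);

    Kokkos::parallel_for(
        "scan_with_reducer", policy_type(1, 1, n),
        KOKKOS_LAMBDA(const member_type& team) {
          int unused = 0;  // the reducer only supplies init()/join() here
          Kokkos::parallel_scan(
              Kokkos::ThreadVectorRange(team, n),
              [&](const int i, int& partial, const bool final) {
                const int contribution = (i * 7) % 5;  // pure function of i
                partial = partial > contribution ? partial : contribution;
                if (final) prefix_max(i) = partial;  // inclusive prefix max
              },
              Kokkos::Max<int>(unused));
        });
  }
  Kokkos::finalize();
}
```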
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel exclusive prefix sum.
*
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
*
* The range [0..N) is mapped to all vector lanes in the
* thread and a scan operation is performed.
* The last call to closure has final == true.
*/
template <typename iType, class Closure>
KOKKOS_INLINE_FUNCTION void parallel_scan(
const Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::HIPTeamMember>&
loop_boundaries,
const Closure& closure) {
using value_type = typename Kokkos::Impl::FunctorAnalysis<
Kokkos::Impl::FunctorPatternInterface::SCAN, void, Closure>::value_type;
value_type dummy;
parallel_scan(loop_boundaries, closure, Kokkos::Sum<value_type>(dummy));
}
} // namespace Kokkos } // namespace Kokkos
namespace Kokkos { namespace Kokkos {

View File

@ -48,17 +48,11 @@
#include <initializer_list> #include <initializer_list>
#include <Kokkos_Layout.hpp> #include <Kokkos_Layout.hpp>
#include <Kokkos_Array.hpp>
#include <impl/KokkosExp_Host_IterateTile.hpp> #include <impl/KokkosExp_Host_IterateTile.hpp>
#include <Kokkos_ExecPolicy.hpp> #include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
#include <type_traits> #include <type_traits>
#if defined(KOKKOS_ENABLE_CUDA) || \
(defined(__HIPCC__) && defined(KOKKOS_ENABLE_HIP))
#include <impl/KokkosExp_IterateTileGPU.hpp>
#endif
namespace Kokkos { namespace Kokkos {
// ------------------------------------------------------------------ // // ------------------------------------------------------------------ //
@ -75,21 +69,13 @@ enum class Iterate
template <typename ExecSpace> template <typename ExecSpace>
struct default_outer_direction { struct default_outer_direction {
using type = Iterate; using type = Iterate;
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_HIP)
static constexpr Iterate value = Iterate::Left;
#else
static constexpr Iterate value = Iterate::Right; static constexpr Iterate value = Iterate::Right;
#endif
}; };
template <typename ExecSpace> template <typename ExecSpace>
struct default_inner_direction { struct default_inner_direction {
using type = Iterate; using type = Iterate;
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_HIP)
static constexpr Iterate value = Iterate::Left;
#else
static constexpr Iterate value = Iterate::Right; static constexpr Iterate value = Iterate::Right;
#endif
}; };
// Iteration Pattern // Iteration Pattern
@ -179,6 +165,25 @@ constexpr NVCC_WONT_LET_ME_CALL_YOU_Array to_array_potentially_narrowing(
} }
return a; return a;
} }
struct TileSizeProperties {
int max_threads;
int default_largest_tile_size;
int default_tile_size;
int max_total_tile_size;
};
template <typename ExecutionSpace>
TileSizeProperties get_tile_size_properties(const ExecutionSpace&) {
// Host settings
TileSizeProperties properties;
properties.max_threads = std::numeric_limits<int>::max();
properties.default_largest_tile_size = 0;
properties.default_tile_size = 2;
properties.max_total_tile_size = std::numeric_limits<int>::max();
return properties;
}
} // namespace Impl } // namespace Impl
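The struct above is the knob a backend uses to influence default tile selection; MDRangePolicy calls `Impl::get_tile_size_properties(work_space)` in its constructor. A hypothetical overload for an imaginary device space could look like the sketch below; `MyDeviceSpace` and every number in it are illustrative, not Kokkos API.

```cpp
#include <Kokkos_Core.hpp>

struct MyDeviceSpace {};  // hypothetical execution space tag

namespace Kokkos {
namespace Impl {
// Overload picked up by the qualified call in MDRangePolicy's constructor
// (all values are made up for illustration).
inline TileSizeProperties get_tile_size_properties(const MyDeviceSpace&) {
  TileSizeProperties properties;
  properties.max_threads               = 1024;  // threads per block
  properties.default_largest_tile_size = 16;    // innermost dimension
  properties.default_tile_size         = 2;     // remaining dimensions
  properties.max_total_tile_size       = 512;   // cap on tile volume
  return properties;
}
}  // namespace Impl
}  // namespace Kokkos

int main() {
  return Kokkos::Impl::get_tile_size_properties(MyDeviceSpace{}).max_threads ==
                 1024
             ? 0
             : 1;
}
```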
// multi-dimensional iteration pattern // multi-dimensional iteration pattern
@ -208,7 +213,7 @@ struct MDRangePolicy : public Kokkos::Impl::PolicyTraits<Properties...> {
using launch_bounds = typename traits::launch_bounds; using launch_bounds = typename traits::launch_bounds;
using member_type = typename range_policy::member_type; using member_type = typename range_policy::member_type;
  static constexpr int rank = iteration_pattern::rank;
using index_type = typename traits::index_type; using index_type = typename traits::index_type;
using array_index_type = std::int64_t; using array_index_type = std::int64_t;
@ -231,37 +236,20 @@ struct MDRangePolicy : public Kokkos::Impl::PolicyTraits<Properties...> {
  point_type m_tile_end       = {};
  index_type m_num_tiles      = 1;
  index_type m_prod_tile_dims = 1;
  bool m_tune_tile_size       = false;

  static constexpr auto outer_direction =
      (iteration_pattern::outer_direction != Iterate::Default)
          ? iteration_pattern::outer_direction
          : default_outer_direction<typename traits::execution_space>::value;

  static constexpr auto inner_direction =
      iteration_pattern::inner_direction != Iterate::Default
          ? iteration_pattern::inner_direction
          : default_inner_direction<typename traits::execution_space>::value;

  static constexpr auto Right = Iterate::Right;
  static constexpr auto Left  = Iterate::Left;
// static constexpr int rank = iteration_pattern::rank;
static constexpr int outer_direction = static_cast<int>(
(iteration_pattern::outer_direction != Iterate::Default)
? iteration_pattern::outer_direction
: default_outer_direction<typename traits::execution_space>::value);
static constexpr int inner_direction = static_cast<int>(
iteration_pattern::inner_direction != Iterate::Default
? iteration_pattern::inner_direction
: default_inner_direction<typename traits::execution_space>::value);
// Ugly ugly workaround intel 14 not handling scoped enum correctly
static constexpr int Right = static_cast<int>(Iterate::Right);
static constexpr int Left = static_cast<int>(Iterate::Left);
KOKKOS_INLINE_FUNCTION const typename traits::execution_space& space() const { KOKKOS_INLINE_FUNCTION const typename traits::execution_space& space() const {
return m_space; return m_space;
@ -320,7 +308,7 @@ struct MDRangePolicy : public Kokkos::Impl::PolicyTraits<Properties...> {
point_type const& lower, point_type const& upper, point_type const& lower, point_type const& upper,
tile_type const& tile = tile_type{}) tile_type const& tile = tile_type{})
: m_space(work_space), m_lower(lower), m_upper(upper), m_tile(tile) { : m_space(work_space), m_lower(lower), m_upper(upper), m_tile(tile) {
init(); init_helper(Impl::get_tile_size_properties(work_space));
} }
template <typename T, std::size_t NT = rank, template <typename T, std::size_t NT = rank,
@ -354,94 +342,57 @@ struct MDRangePolicy : public Kokkos::Impl::PolicyTraits<Properties...> {
m_tile(p.m_tile), m_tile(p.m_tile),
m_tile_end(p.m_tile_end), m_tile_end(p.m_tile_end),
m_num_tiles(p.m_num_tiles), m_num_tiles(p.m_num_tiles),
m_prod_tile_dims(p.m_prod_tile_dims) {} m_prod_tile_dims(p.m_prod_tile_dims),
m_tune_tile_size(p.m_tune_tile_size) {}
void impl_change_tile_size(const point_type& tile) {
m_tile = tile;
init_helper(Impl::get_tile_size_properties(m_space));
}
bool impl_tune_tile_size() const { return m_tune_tile_size; }
 private:
  void init_helper(Impl::TileSizeProperties properties) {
    m_prod_tile_dims = 1;
    int increment    = 1;
    int rank_start   = 0;
    int rank_end     = rank;
    if (inner_direction == Iterate::Right) {
      increment  = -1;
      rank_start = rank - 1;
      rank_end   = -1;
    }
    for (int i = rank_start; i != rank_end; i += increment) {
      const index_type length = m_upper[i] - m_lower[i];
      if (m_tile[i] <= 0) {
        m_tune_tile_size = true;
        if ((inner_direction == Iterate::Right && (i < rank - 1)) ||
            (inner_direction == Iterate::Left && (i > 0))) {
          if (m_prod_tile_dims * properties.default_tile_size <
              static_cast<index_type>(properties.max_total_tile_size)) {
            m_tile[i] = properties.default_tile_size;
          } else {
            m_tile[i] = 1;
          }
        } else {
          m_tile[i] = properties.default_largest_tile_size == 0
                          ? std::max<int>(length, 1)
                          : properties.default_largest_tile_size;
        }
      }
      m_tile_end[i] =
          static_cast<index_type>((length + m_tile[i] - 1) / m_tile[i]);
      m_num_tiles *= m_tile_end[i];
      m_prod_tile_dims *= m_tile[i];
    }
    if (m_prod_tile_dims > static_cast<index_type>(properties.max_threads)) {
      printf(" Product of tile dimensions exceed maximum limit: %d\n",
             static_cast<int>(properties.max_threads));
      Kokkos::abort(
          "ExecSpace Error: MDRange tile dims exceed maximum number "
          "of threads per block - choose smaller tile dims");
    }
  }
};
} // namespace Kokkos } // namespace Kokkos
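At the user level the new tile-size logic only kicks in when tile extents are left unspecified; passing tiles explicitly bypasses the defaults. A brief sketch with arbitrary extents:

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Tiles chosen from the backend's TileSizeProperties defaults:
    Kokkos::MDRangePolicy<Kokkos::Rank<2>> defaulted({0, 0}, {128, 128});
    (void)defaulted;  // silence unused-variable warnings in this sketch

    // Tiles requested explicitly (8 x 8):
    Kokkos::MDRangePolicy<Kokkos::Rank<2>> tiled({0, 0}, {128, 128}, {8, 8});

    double sum = 0.0;
    Kokkos::parallel_reduce(
        "tile_sum", tiled,
        KOKKOS_LAMBDA(const int i, const int j, double& update) {
          update += static_cast<double>(i) * j;
        },
        sum);
  }
  Kokkos::finalize();
}
```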

View File

@ -104,20 +104,6 @@ struct MemorySpaceAccess<Kokkos::AnonymousSpace, Kokkos::AnonymousSpace> {
enum : bool { deepcopy = true }; enum : bool { deepcopy = true };
}; };
template <typename OtherSpace>
struct VerifyExecutionCanAccessMemorySpace<OtherSpace, Kokkos::AnonymousSpace> {
enum { value = 1 };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void *) {}
};
template <typename OtherSpace>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::AnonymousSpace, OtherSpace> {
enum { value = 1 };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void *) {}
};
} // namespace Impl } // namespace Impl
} // namespace Kokkos } // namespace Kokkos

View File

@ -45,14 +45,13 @@
#define KOKKOS_COMPLEX_HPP #define KOKKOS_COMPLEX_HPP
#include <Kokkos_Atomic.hpp> #include <Kokkos_Atomic.hpp>
#include <Kokkos_MathematicalFunctions.hpp>
#include <Kokkos_NumericTraits.hpp> #include <Kokkos_NumericTraits.hpp>
#include <impl/Kokkos_Error.hpp>
#include <complex> #include <complex>
#include <type_traits>
#include <iosfwd> #include <iosfwd>
#ifdef KOKKOS_ENABLE_SYCL
#include <CL/sycl.hpp>
#endif
namespace Kokkos { namespace Kokkos {
/// \class complex /// \class complex
@ -220,10 +219,11 @@ class
// Conditional noexcept, just in case RType throws on divide-by-zero // Conditional noexcept, just in case RType throws on divide-by-zero
KOKKOS_CONSTEXPR_14 KOKKOS_INLINE_FUNCTION complex& operator/=( KOKKOS_CONSTEXPR_14 KOKKOS_INLINE_FUNCTION complex& operator/=(
const complex<RealType>& y) noexcept(noexcept(RealType{} / RealType{})) { const complex<RealType>& y) noexcept(noexcept(RealType{} / RealType{})) {
using Kokkos::Experimental::fabs;
// Scale (by the "1-norm" of y) to avoid unwarranted overflow. // Scale (by the "1-norm" of y) to avoid unwarranted overflow.
// If the real part is +/-Inf and the imaginary part is -/+Inf, // If the real part is +/-Inf and the imaginary part is -/+Inf,
// this won't change the result. // this won't change the result.
const RealType s = std::fabs(y.real()) + std::fabs(y.imag()); const RealType s = fabs(y.real()) + fabs(y.imag());
// If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0. // If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0.
// In that case, the relation x/y == (x/s) / (y/s) doesn't hold, // In that case, the relation x/y == (x/s) / (y/s) doesn't hold,
@ -248,10 +248,11 @@ class
KOKKOS_INLINE_FUNCTION complex& operator/=( KOKKOS_INLINE_FUNCTION complex& operator/=(
const std::complex<RealType>& y) noexcept(noexcept(RealType{} / const std::complex<RealType>& y) noexcept(noexcept(RealType{} /
RealType{})) { RealType{})) {
using Kokkos::Experimental::fabs;
// Scale (by the "1-norm" of y) to avoid unwarranted overflow. // Scale (by the "1-norm" of y) to avoid unwarranted overflow.
// If the real part is +/-Inf and the imaginary part is -/+Inf, // If the real part is +/-Inf and the imaginary part is -/+Inf,
// this won't change the result. // this won't change the result.
const RealType s = std::fabs(y.real()) + std::fabs(y.imag()); const RealType s = fabs(y.real()) + fabs(y.imag());
// If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0. // If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0.
// In that case, the relation x/y == (x/s) / (y/s) doesn't hold, // In that case, the relation x/y == (x/s) / (y/s) doesn't hold,
@ -693,35 +694,96 @@ KOKKOS_INLINE_FUNCTION RealType real(const complex<RealType>& x) noexcept {
return x.real(); return x.real();
} }
//! Constructs a complex number from magnitude and phase angle
template <class T>
KOKKOS_INLINE_FUNCTION complex<T> polar(const T& r, const T& theta = T()) {
using Kokkos::Experimental::cos;
using Kokkos::Experimental::sin;
KOKKOS_EXPECTS(r >= 0);
return complex<T>(r * cos(theta), r * sin(theta));
}
//! Absolute value (magnitude) of a complex number.
template <class RealType>
KOKKOS_INLINE_FUNCTION RealType abs(const complex<RealType>& x) {
  using Kokkos::Experimental::hypot;
  return hypot(x.real(), x.imag());
}
//! Power of a complex number
template <class T>
KOKKOS_INLINE_FUNCTION complex<T> pow(const complex<T>& x, const T& y) {
  using Kokkos::Experimental::atan2;
  using Kokkos::Experimental::pow;
  T r     = abs(x);
  T theta = atan2(x.imag(), x.real());
  return polar(pow(r, y), y * theta);
}

template <class T>
KOKKOS_INLINE_FUNCTION complex<T> pow(const T& x, const complex<T>& y) {
  return pow(complex<T>(x), y);
}

template <class T>
KOKKOS_INLINE_FUNCTION complex<T> pow(const complex<T>& x,
                                      const complex<T>& y) {
  using Kokkos::Experimental::log;
  return x == T() ? T() : exp(y * log(x));
}
namespace Impl {
// NOTE promote would also be useful for math functions
template <class T, bool = std::is_integral<T>::value>
struct promote {
using type = double;
};
template <class T>
struct promote<T, false> {};
template <>
struct promote<long double> {
using type = long double;
};
template <>
struct promote<double> {
using type = double;
};
template <>
struct promote<float> {
using type = float;
};
template <class T>
using promote_t = typename promote<T>::type;
template <class T, class U>
struct promote_2 {
using type = decltype(promote_t<T>() + promote_t<U>());
};
template <class T, class U>
using promote_2_t = typename promote_2<T, U>::type;
} // namespace Impl
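For orientation, the promotion trait mirrors the usual arithmetic conversions. The following standalone compile-time checks (not part of the diff) illustrate what it yields:

```cpp
#include <Kokkos_Core.hpp>
#include <type_traits>

// Illustrative checks of the promotion rules sketched above.
static_assert(
    std::is_same<Kokkos::Impl::promote_2_t<float, int>, double>::value,
    "integral types promote to double");
static_assert(
    std::is_same<Kokkos::Impl::promote_2_t<float, float>, float>::value,
    "float stays float");
static_assert(std::is_same<Kokkos::Impl::promote_2_t<double, long double>,
                           long double>::value,
              "long double wins");

int main() { return 0; }
```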
template <class T, class U,
class = std::enable_if_t<std::is_arithmetic<T>::value>>
KOKKOS_INLINE_FUNCTION complex<Impl::promote_2_t<T, U>> pow(
const T& x, const complex<U>& y) {
using type = Impl::promote_2_t<T, U>;
return pow(type(x), complex<type>(y));
}
template <class T, class U,
class = std::enable_if_t<std::is_arithmetic<U>::value>>
KOKKOS_INLINE_FUNCTION complex<Impl::promote_2_t<T, U>> pow(const complex<T>& x,
const U& y) {
using type = Impl::promote_2_t<T, U>;
return pow(complex<type>(x), type(y));
}
template <class T, class U>
KOKKOS_INLINE_FUNCTION complex<Impl::promote_2_t<T, U>> pow(
const complex<T>& x, const complex<U>& y) {
using type = Impl::promote_2_t<T, U>;
return pow(complex<type>(x), complex<type>(y));
} }
//! Square root of a complex number. This is intended to match the stdc++ //! Square root of a complex number. This is intended to match the stdc++
@ -729,26 +791,21 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> pow(const complex<RealType>& x,
template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sqrt(
    const complex<RealType>& x) {
  using Kokkos::Experimental::fabs;
  using Kokkos::Experimental::sqrt;

  RealType r = x.real();
  RealType i = x.imag();

  if (r == RealType()) {
    RealType t = sqrt(fabs(i) / 2);
    return Kokkos::complex<RealType>(t, i < RealType() ? -t : t);
  } else {
    RealType t = sqrt(2 * (abs(x) + fabs(r)));
    RealType u = t / 2;
    return r > RealType() ? Kokkos::complex<RealType>(u, i / t)
                          : Kokkos::complex<RealType>(fabs(i) / t,
                                                      i < RealType() ? -u : u);
  }
}
@ -762,15 +819,9 @@ KOKKOS_INLINE_FUNCTION complex<RealType> conj(
//! Exponential of a complex number. //! Exponential of a complex number.
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION complex<RealType> exp(const complex<RealType>& x) { KOKKOS_INLINE_FUNCTION complex<RealType> exp(const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::cos;
using cl::sycl::cos; using Kokkos::Experimental::exp;
using cl::sycl::exp; using Kokkos::Experimental::sin;
using cl::sycl::sin;
#else
using std::cos;
using std::exp;
using std::sin;
#endif
return exp(x.real()) * complex<RealType>(cos(x.imag()), sin(x.imag())); return exp(x.real()) * complex<RealType>(cos(x.imag()), sin(x.imag()));
} }
@ -778,14 +829,9 @@ KOKKOS_INLINE_FUNCTION complex<RealType> exp(const complex<RealType>& x) {
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> log( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> log(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::atan2;
using cl::sycl::atan; using Kokkos::Experimental::log;
using cl::sycl::log; RealType phi = atan2(x.imag(), x.real());
#else
using std::atan;
using std::log;
#endif
RealType phi = atan(x.imag() / x.real());
return Kokkos::complex<RealType>(log(abs(x)), phi); return Kokkos::complex<RealType>(log(abs(x)), phi);
} }
@ -793,17 +839,10 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> log(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sin( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sin(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::cos;
using cl::sycl::cos; using Kokkos::Experimental::cosh;
using cl::sycl::cosh; using Kokkos::Experimental::sin;
using cl::sycl::sin; using Kokkos::Experimental::sinh;
using cl::sycl::sinh;
#else
using std::cos;
using std::cosh;
using std::sin;
using std::sinh;
#endif
return Kokkos::complex<RealType>(sin(x.real()) * cosh(x.imag()), return Kokkos::complex<RealType>(sin(x.real()) * cosh(x.imag()),
cos(x.real()) * sinh(x.imag())); cos(x.real()) * sinh(x.imag()));
} }
@ -812,17 +851,10 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sin(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> cos( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> cos(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::cos;
using cl::sycl::cos; using Kokkos::Experimental::cosh;
using cl::sycl::cosh; using Kokkos::Experimental::sin;
using cl::sycl::sin; using Kokkos::Experimental::sinh;
using cl::sycl::sinh;
#else
using std::cos;
using std::cosh;
using std::sin;
using std::sinh;
#endif
return Kokkos::complex<RealType>(cos(x.real()) * cosh(x.imag()), return Kokkos::complex<RealType>(cos(x.real()) * cosh(x.imag()),
-sin(x.real()) * sinh(x.imag())); -sin(x.real()) * sinh(x.imag()));
} }
@ -838,17 +870,10 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> tan(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sinh( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sinh(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::cos;
using cl::sycl::cos; using Kokkos::Experimental::cosh;
using cl::sycl::cosh; using Kokkos::Experimental::sin;
using cl::sycl::sin; using Kokkos::Experimental::sinh;
using cl::sycl::sinh;
#else
using std::cos;
using std::cosh;
using std::sin;
using std::sinh;
#endif
return Kokkos::complex<RealType>(sinh(x.real()) * cos(x.imag()), return Kokkos::complex<RealType>(sinh(x.real()) * cos(x.imag()),
cosh(x.real()) * sin(x.imag())); cosh(x.real()) * sin(x.imag()));
} }
@ -857,17 +882,10 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> sinh(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> cosh( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> cosh(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::cos;
using cl::sycl::cos; using Kokkos::Experimental::cosh;
using cl::sycl::cosh; using Kokkos::Experimental::sin;
using cl::sycl::sin; using Kokkos::Experimental::sinh;
using cl::sycl::sinh;
#else
using std::cos;
using std::cosh;
using std::sin;
using std::sinh;
#endif
return Kokkos::complex<RealType>(cosh(x.real()) * cos(x.imag()), return Kokkos::complex<RealType>(cosh(x.real()) * cos(x.imag()),
sinh(x.real()) * sin(x.imag())); sinh(x.real()) * sin(x.imag()));
} }
@ -898,13 +916,8 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> acosh(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> atanh( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> atanh(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::atan2;
using cl::sycl::atan2; using Kokkos::Experimental::log;
using cl::sycl::log;
#else
using std::atan2;
using std::log;
#endif
const RealType i2 = x.imag() * x.imag(); const RealType i2 = x.imag() * x.imag();
const RealType r = RealType(1.0) - i2 - x.real() * x.real(); const RealType r = RealType(1.0) - i2 - x.real() * x.real();
@ -933,12 +946,7 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> asin(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> acos( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> acos(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::acos;
using cl::sycl::acos;
#else
using std::acos;
#endif
Kokkos::complex<RealType> t = asin(x); Kokkos::complex<RealType> t = asin(x);
RealType pi_2 = acos(RealType(0.0)); RealType pi_2 = acos(RealType(0.0));
return Kokkos::complex<RealType>(pi_2 - t.real(), -t.imag()); return Kokkos::complex<RealType>(pi_2 - t.real(), -t.imag());
@ -948,13 +956,8 @@ KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> acos(
template <class RealType> template <class RealType>
KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> atan( KOKKOS_INLINE_FUNCTION Kokkos::complex<RealType> atan(
const complex<RealType>& x) { const complex<RealType>& x) {
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL using Kokkos::Experimental::atan2;
using cl::sycl::atan2; using Kokkos::Experimental::log;
using cl::sycl::log;
#else
using std::atan2;
using std::log;
#endif
const RealType r2 = x.real() * x.real(); const RealType r2 = x.real() * x.real();
const RealType i = RealType(1.0) - r2 - x.imag() * x.imag(); const RealType i = RealType(1.0) - r2 - x.imag() * x.imag();
@ -996,12 +999,13 @@ KOKKOS_INLINE_FUNCTION
operator/(const complex<RealType1>& x, operator/(const complex<RealType1>& x,
const complex<RealType2>& y) noexcept(noexcept(RealType1{} / const complex<RealType2>& y) noexcept(noexcept(RealType1{} /
RealType2{})) { RealType2{})) {
using Kokkos::Experimental::fabs;
// Scale (by the "1-norm" of y) to avoid unwarranted overflow. // Scale (by the "1-norm" of y) to avoid unwarranted overflow.
// If the real part is +/-Inf and the imaginary part is -/+Inf, // If the real part is +/-Inf and the imaginary part is -/+Inf,
// this won't change the result. // this won't change the result.
using common_real_type = using common_real_type =
typename std::common_type<RealType1, RealType2>::type; typename std::common_type<RealType1, RealType2>::type;
const common_real_type s = std::fabs(real(y)) + std::fabs(imag(y)); const common_real_type s = fabs(real(y)) + fabs(imag(y));
// If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0. // If s is 0, then y is zero, so x/y == real(x)/0 + i*imag(x)/0.
// In that case, the relation x/y == (x/s) / (y/s) doesn't hold, // In that case, the relation x/y == (x/s) / (y/s) doesn't hold,

View File

@ -58,6 +58,7 @@
#include <Kokkos_AnonymousSpace.hpp> #include <Kokkos_AnonymousSpace.hpp>
#include <Kokkos_LogicalSpaces.hpp> #include <Kokkos_LogicalSpaces.hpp>
#include <Kokkos_Pair.hpp> #include <Kokkos_Pair.hpp>
#include <Kokkos_MathematicalFunctions.hpp>
#include <Kokkos_MemoryPool.hpp> #include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Array.hpp> #include <Kokkos_Array.hpp>
#include <Kokkos_View.hpp> #include <Kokkos_View.hpp>
@ -86,6 +87,10 @@ struct InitArguments {
int skip_device; int skip_device;
bool disable_warnings; bool disable_warnings;
bool tune_internals; bool tune_internals;
bool tool_help = false;
std::string tool_lib = {};
std::string tool_args = {};
InitArguments(int nt = -1, int nn = -1, int dv = -1, bool dw = false, InitArguments(int nt = -1, int nn = -1, int dv = -1, bool dw = false,
bool ti = false) bool ti = false)
: num_threads{nt}, : num_threads{nt},
@ -139,6 +144,10 @@ void pre_initialize(const InitArguments& args);
void post_initialize(const InitArguments& args); void post_initialize(const InitArguments& args);
void declare_configuration_metadata(const std::string& category,
const std::string& key,
const std::string& value);
} // namespace Impl } // namespace Impl
bool is_initialized() noexcept; bool is_initialized() noexcept;
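The new InitArguments fields correspond to the tool-related command-line flags and can be filled in programmatically before initialization. A hedged sketch follows; the tool library path is purely illustrative.

```cpp
#include <Kokkos_Core.hpp>

int main() {
  Kokkos::InitArguments args;
  args.num_threads = 4;
  // New in this update: profiling/tool settings travel with InitArguments.
  args.tool_lib  = "/path/to/libkp_kernel_timer.so";  // illustrative path
  args.tool_args = "";
  Kokkos::initialize(args);
  // ... application code ...
  Kokkos::finalize();
  return 0;
}
```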

View File

@ -50,6 +50,7 @@
// and compiler environment then sets a collection of #define macros. // and compiler environment then sets a collection of #define macros.
#include <Kokkos_Macros.hpp> #include <Kokkos_Macros.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Utilities.hpp> #include <impl/Kokkos_Utilities.hpp>
#include <Kokkos_MasterLock.hpp> #include <Kokkos_MasterLock.hpp>
@ -180,7 +181,6 @@ using DefaultHostExecutionSpace KOKKOS_IMPL_DEFAULT_HOST_EXEC_SPACE_ANNOTATION =
// a given memory space. // a given memory space.
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
#if defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA) && \ #if defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA) && \
@ -196,16 +196,22 @@ using ActiveExecutionMemorySpace = Kokkos::HostSpace;
using ActiveExecutionMemorySpace = void; using ActiveExecutionMemorySpace = void;
#endif #endif
template <typename DstMemorySpace, typename SrcMemorySpace>
struct MemorySpaceAccess;

template <typename DstMemorySpace, typename SrcMemorySpace,
          bool = Kokkos::Impl::MemorySpaceAccess<DstMemorySpace,
                                                 SrcMemorySpace>::accessible>
struct verify_space {
  KOKKOS_FUNCTION static void check() {}
};

template <typename DstMemorySpace, typename SrcMemorySpace>
struct verify_space<DstMemorySpace, SrcMemorySpace, false> {
  KOKKOS_FUNCTION static void check() {
    Kokkos::abort(
        "Kokkos::View ERROR: attempt to access inaccessible memory space");
  }
};
// Base class for exec space initializer factories // Base class for exec space initializer factories
@ -221,12 +227,12 @@ class LogicalMemorySpace;
} // namespace Kokkos } // namespace Kokkos
#define KOKKOS_RESTRICT_EXECUTION_TO_DATA(DATA_SPACE, DATA_PTR) \ #define KOKKOS_RESTRICT_EXECUTION_TO_DATA(DATA_SPACE, DATA_PTR) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \ Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE>::verify(DATA_PTR) DATA_SPACE>::check();
#define KOKKOS_RESTRICT_EXECUTION_TO_(DATA_SPACE) \ #define KOKKOS_RESTRICT_EXECUTION_TO_(DATA_SPACE) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \ Kokkos::Impl::verify_space<Kokkos::Impl::ActiveExecutionMemorySpace, \
Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE>::verify() DATA_SPACE>::check();
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
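The check above keys off `MemorySpaceAccess::accessible` rather than per-space `VerifyExecutionCanAccessMemorySpace` specializations. A trivially safe host-side illustration (not from the diff):

```cpp
#include <Kokkos_Core.hpp>

// Host code can always reach HostSpace, so verify_space compiles to a no-op
// check() for this combination.
static_assert(Kokkos::Impl::MemorySpaceAccess<Kokkos::HostSpace,
                                              Kokkos::HostSpace>::accessible,
              "HostSpace is accessible from HostSpace");

int main() {
  Kokkos::Impl::verify_space<Kokkos::HostSpace, Kokkos::HostSpace>::check();
  return 0;
}
```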
@ -256,8 +262,7 @@ template <class ViewTypeA, class ViewTypeB, class Layout, class ExecSpace,
int Rank, typename iType> int Rank, typename iType>
struct ViewCopy; struct ViewCopy;
template <class Functor, class Policy, class EnableFunctor = void, template <class Functor, class Policy>
class EnablePolicy = void>
struct FunctorPolicyExecutionSpace; struct FunctorPolicyExecutionSpace;
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------

View File

@ -199,7 +199,7 @@ class CrsRowMapFromCounts {
public: public:
KOKKOS_INLINE_FUNCTION KOKKOS_INLINE_FUNCTION
void operator()(index_type i, value_type& update, bool final_pass) const { void operator()(index_type i, value_type& update, bool final_pass) const {
    if (i < static_cast<index_type>(m_in.size())) {
update += m_in(i); update += m_in(i);
if (final_pass) m_out(i + 1) = update; if (final_pass) m_out(i + 1) = update;
} else if (final_pass) { } else if (final_pass) {

View File

@ -63,6 +63,7 @@
#include <Kokkos_MemoryTraits.hpp> #include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp> #include <impl/Kokkos_Tags.hpp>
#include <impl/Kokkos_ExecSpaceInitializer.hpp> #include <impl/Kokkos_ExecSpaceInitializer.hpp>
#include <impl/Kokkos_HostSharedPtr.hpp>
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
@ -198,16 +199,6 @@ class Cuda {
Cuda(); Cuda();
KOKKOS_FUNCTION Cuda(Cuda&& other) noexcept;
KOKKOS_FUNCTION Cuda(const Cuda& other);
KOKKOS_FUNCTION Cuda& operator=(Cuda&& other) noexcept;
KOKKOS_FUNCTION Cuda& operator=(const Cuda& other);
KOKKOS_FUNCTION ~Cuda() noexcept;
Cuda(cudaStream_t stream); Cuda(cudaStream_t stream);
//-------------------------------------------------------------------------- //--------------------------------------------------------------------------
@ -253,13 +244,12 @@ class Cuda {
static const char* name(); static const char* name();
  inline Impl::CudaInternal* impl_internal_space_instance() const {
    return m_space_instance.get();
  }
  uint32_t impl_instance_id() const noexcept { return 0; }

 private:
  Kokkos::Impl::HostSharedPtr<Impl::CudaInternal> m_space_instance;
}; };
namespace Tools { namespace Tools {
@ -319,38 +309,8 @@ struct MemorySpaceAccess<Kokkos::CudaUVMSpace,
#endif #endif
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::CudaSpace,
Kokkos::Cuda::scratch_memory_space> {
enum : bool { value = true };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
};
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
Kokkos::Cuda::scratch_memory_space> {
enum : bool { value = false };
inline static void verify(void) { CudaSpace::access_error(); }
inline static void verify(const void* p) { CudaSpace::access_error(p); }
};
} // namespace Impl } // namespace Impl
} // namespace Kokkos } // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <Cuda/Kokkos_Cuda_KernelLaunch.hpp>
#include <Cuda/Kokkos_Cuda_Instance.hpp>
#include <Cuda/Kokkos_Cuda_View.hpp>
#include <Cuda/Kokkos_Cuda_Team.hpp>
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
#include <Cuda/Kokkos_Cuda_Task.hpp>
#include <Cuda/Kokkos_Cuda_UniqueToken.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */ #endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_HPP */ #endif /* #ifndef KOKKOS_CUDA_HPP */

View File

@ -53,8 +53,10 @@
#include <iosfwd> #include <iosfwd>
#include <typeinfo> #include <typeinfo>
#include <string> #include <string>
#include <memory>
#include <Kokkos_HostSpace.hpp> #include <Kokkos_HostSpace.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp> #include <impl/Kokkos_Profiling_Interface.hpp>
@ -119,8 +121,8 @@ class CudaSpace {
/*--------------------------------*/ /*--------------------------------*/
/** \brief Error reporting for HostSpace attempt to access CudaSpace */ /** \brief Error reporting for HostSpace attempt to access CudaSpace */
static void access_error(); KOKKOS_DEPRECATED static void access_error();
static void access_error(const void* const); KOKKOS_DEPRECATED static void access_error(const void* const);
private: private:
int m_device; ///< Which Cuda device int m_device; ///< Which Cuda device
@ -128,42 +130,6 @@ class CudaSpace {
static constexpr const char* m_name = "Cuda"; static constexpr const char* m_name = "Cuda";
friend class Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>; friend class Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>;
}; };
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_arrays_cuda_space();
/// \brief Retrieve the pointer to the lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* atomic_lock_array_cuda_space_ptr(bool deallocate = false);
/// \brief Retrieve the pointer to the scratch array for team and thread private
/// global memory.
///
/// Team and Thread private scratch allocations in
/// global memory are acquired via locks.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* scratch_lock_array_cuda_space_ptr(bool deallocate = false);
/// \brief Retrieve the pointer to the scratch array for unique identifiers.
///
/// Unique identifiers in the range 0-Cuda::concurrency
/// are provided via locks.
/// This function retrieves the lock array pointer.
/// If the array is not yet allocated it will do so.
int* threadid_lock_array_cuda_space_ptr(bool deallocate = false);
} // namespace Impl
} // namespace Kokkos } // namespace Kokkos
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
@ -313,6 +279,11 @@ class CudaHostPinnedSpace {
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
cudaStream_t cuda_get_deep_copy_stream();
const std::unique_ptr<Kokkos::Cuda>& cuda_get_deep_copy_space(
bool initialize = true);
static_assert(Kokkos::Impl::MemorySpaceAccess<Kokkos::CudaSpace, static_assert(Kokkos::Impl::MemorySpaceAccess<Kokkos::CudaSpace,
Kokkos::CudaSpace>::assignable, Kokkos::CudaSpace>::assignable,
""); "");
@ -784,104 +755,21 @@ struct DeepCopy<HostSpace, CudaHostPinnedSpace, ExecutionSpace> {
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
/** Running in CudaSpace attempting to access HostSpace: error */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::CudaSpace,
Kokkos::HostSpace> {
enum : bool { value = false };
KOKKOS_INLINE_FUNCTION static void verify(void) {
Kokkos::abort("Cuda code attempted to access HostSpace memory");
}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {
Kokkos::abort("Cuda code attempted to access HostSpace memory");
}
};
/** Running in CudaSpace accessing CudaUVMSpace: ok */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::CudaSpace,
Kokkos::CudaUVMSpace> {
enum : bool { value = true };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
};
/** Running in CudaSpace accessing CudaHostPinnedSpace: ok */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::CudaSpace,
Kokkos::CudaHostPinnedSpace> {
enum : bool { value = true };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
};
/** Running in CudaSpace attempting to access an unknown space: error */
template <class OtherSpace>
struct VerifyExecutionCanAccessMemorySpace<
typename std::enable_if<!std::is_same<Kokkos::CudaSpace, OtherSpace>::value,
Kokkos::CudaSpace>::type,
OtherSpace> {
enum : bool { value = false };
KOKKOS_INLINE_FUNCTION static void verify(void) {
Kokkos::abort("Cuda code attempted to access unknown Space memory");
}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {
Kokkos::abort("Cuda code attempted to access unknown Space memory");
}
};
//----------------------------------------------------------------------------
/** Running in HostSpace attempting to access CudaSpace */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
Kokkos::CudaSpace> {
enum : bool { value = false };
inline static void verify(void) { CudaSpace::access_error(); }
inline static void verify(const void* p) { CudaSpace::access_error(p); }
};
/** Running in HostSpace accessing CudaUVMSpace is OK */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
Kokkos::CudaUVMSpace> {
enum : bool { value = true };
inline static void verify(void) {}
inline static void verify(const void*) {}
};
/** Running in HostSpace accessing CudaHostPinnedSpace is OK */
template <>
struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
Kokkos::CudaHostPinnedSpace> {
enum : bool { value = true };
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template <> template <>
class SharedAllocationRecord<Kokkos::CudaSpace, void> class SharedAllocationRecord<Kokkos::CudaSpace, void>
: public SharedAllocationRecord<void, void> { : public HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace> {
private: private:
friend class SharedAllocationRecord<Kokkos::CudaUVMSpace, void>; friend class SharedAllocationRecord<Kokkos::CudaUVMSpace, void>;
friend class SharedAllocationRecordCommon<Kokkos::CudaSpace>;
friend class HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>;
using RecordBase = SharedAllocationRecord<void, void>; using RecordBase = SharedAllocationRecord<void, void>;
using base_t =
HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>;
SharedAllocationRecord(const SharedAllocationRecord&) = delete; SharedAllocationRecord(const SharedAllocationRecord&) = delete;
SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete; SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
static void deallocate(RecordBase*);
static ::cudaTextureObject_t attach_texture_object( static ::cudaTextureObject_t attach_texture_object(
const unsigned sizeof_alias, void* const alloc_ptr, const unsigned sizeof_alias, void* const alloc_ptr,
const size_t alloc_size); const size_t alloc_size);
@ -890,39 +778,19 @@ class SharedAllocationRecord<Kokkos::CudaSpace, void>
static RecordBase s_root_record; static RecordBase s_root_record;
#endif #endif
::cudaTextureObject_t m_tex_obj; ::cudaTextureObject_t m_tex_obj = 0;
const Kokkos::CudaSpace m_space; const Kokkos::CudaSpace m_space;
protected: protected:
~SharedAllocationRecord(); ~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_tex_obj(0), m_space() {} SharedAllocationRecord() = default;
SharedAllocationRecord( SharedAllocationRecord(
const Kokkos::CudaSpace& arg_space, const std::string& arg_label, const Kokkos::CudaSpace& arg_space, const std::string& arg_label,
const size_t arg_alloc_size, const size_t arg_alloc_size,
const RecordBase::function_type arg_dealloc = &deallocate); const RecordBase::function_type arg_dealloc = &base_t::deallocate);
public: public:
std::string get_label() const;
static SharedAllocationRecord* allocate(const Kokkos::CudaSpace& arg_space,
const std::string& arg_label,
const size_t arg_alloc_size);
/**\brief Allocate tracked memory in the space */
static void* allocate_tracked(const Kokkos::CudaSpace& arg_space,
const std::string& arg_label,
const size_t arg_alloc_size);
/**\brief Reallocate tracked memory in the space */
static void* reallocate_tracked(void* const arg_alloc_ptr,
const size_t arg_alloc_size);
/**\brief Deallocate tracked memory in the space */
static void deallocate_tracked(void* const arg_alloc_ptr);
static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
template <typename AliasType> template <typename AliasType>
inline ::cudaTextureObject_t attach_texture_object() { inline ::cudaTextureObject_t attach_texture_object() {
static_assert((std::is_same<AliasType, int>::value || static_assert((std::is_same<AliasType, int>::value ||
@ -945,57 +813,35 @@ class SharedAllocationRecord<Kokkos::CudaSpace, void>
// Texture object is attached to the entire allocation range // Texture object is attached to the entire allocation range
return ptr - reinterpret_cast<AliasType*>(RecordBase::m_alloc_ptr); return ptr - reinterpret_cast<AliasType*>(RecordBase::m_alloc_ptr);
} }
static void print_records(std::ostream&, const Kokkos::CudaSpace&,
bool detail = false);
}; };
template <> template <>
class SharedAllocationRecord<Kokkos::CudaUVMSpace, void> class SharedAllocationRecord<Kokkos::CudaUVMSpace, void>
: public SharedAllocationRecord<void, void> { : public SharedAllocationRecordCommon<Kokkos::CudaUVMSpace> {
private: private:
friend class SharedAllocationRecordCommon<Kokkos::CudaUVMSpace>;
using base_t = SharedAllocationRecordCommon<Kokkos::CudaUVMSpace>;
using RecordBase = SharedAllocationRecord<void, void>; using RecordBase = SharedAllocationRecord<void, void>;
SharedAllocationRecord(const SharedAllocationRecord&) = delete; SharedAllocationRecord(const SharedAllocationRecord&) = delete;
SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete; SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
static void deallocate(RecordBase*);
static RecordBase s_root_record; static RecordBase s_root_record;
::cudaTextureObject_t m_tex_obj; ::cudaTextureObject_t m_tex_obj = 0;
const Kokkos::CudaUVMSpace m_space; const Kokkos::CudaUVMSpace m_space;
protected: protected:
~SharedAllocationRecord(); ~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_tex_obj(0), m_space() {} SharedAllocationRecord() = default;
SharedAllocationRecord( SharedAllocationRecord(
const Kokkos::CudaUVMSpace& arg_space, const std::string& arg_label, const Kokkos::CudaUVMSpace& arg_space, const std::string& arg_label,
const size_t arg_alloc_size, const size_t arg_alloc_size,
const RecordBase::function_type arg_dealloc = &deallocate); const RecordBase::function_type arg_dealloc = &base_t::deallocate);
public: public:
std::string get_label() const;
static SharedAllocationRecord* allocate(const Kokkos::CudaUVMSpace& arg_space,
const std::string& arg_label,
const size_t arg_alloc_size);
/**\brief Allocate tracked memory in the space */
static void* allocate_tracked(const Kokkos::CudaUVMSpace& arg_space,
const std::string& arg_label,
const size_t arg_alloc_size);
/**\brief Reallocate tracked memory in the space */
static void* reallocate_tracked(void* const arg_alloc_ptr,
const size_t arg_alloc_size);
/**\brief Deallocate tracked memory in the space */
static void deallocate_tracked(void* const arg_alloc_ptr);
static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
template <typename AliasType> template <typename AliasType>
inline ::cudaTextureObject_t attach_texture_object() { inline ::cudaTextureObject_t attach_texture_object() {
static_assert((std::is_same<AliasType, int>::value || static_assert((std::is_same<AliasType, int>::value ||
@ -1019,57 +865,32 @@ class SharedAllocationRecord<Kokkos::CudaUVMSpace, void>
// Texture object is attached to the entire allocation range // Texture object is attached to the entire allocation range
return ptr - reinterpret_cast<AliasType*>(RecordBase::m_alloc_ptr); return ptr - reinterpret_cast<AliasType*>(RecordBase::m_alloc_ptr);
} }
static void print_records(std::ostream&, const Kokkos::CudaUVMSpace&,
bool detail = false);
}; };
template <> template <>
class SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void> class SharedAllocationRecord<Kokkos::CudaHostPinnedSpace, void>
: public SharedAllocationRecord<void, void> { : public SharedAllocationRecordCommon<Kokkos::CudaHostPinnedSpace> {
private: private:
friend class SharedAllocationRecordCommon<Kokkos::CudaHostPinnedSpace>;
using RecordBase = SharedAllocationRecord<void, void>; using RecordBase = SharedAllocationRecord<void, void>;
using base_t = SharedAllocationRecordCommon<Kokkos::CudaHostPinnedSpace>;
SharedAllocationRecord(const SharedAllocationRecord&) = delete; SharedAllocationRecord(const SharedAllocationRecord&) = delete;
SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete; SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
static void deallocate(RecordBase*);
static RecordBase s_root_record; static RecordBase s_root_record;
const Kokkos::CudaHostPinnedSpace m_space; const Kokkos::CudaHostPinnedSpace m_space;
protected: protected:
~SharedAllocationRecord(); ~SharedAllocationRecord();
SharedAllocationRecord() : RecordBase(), m_space() {} SharedAllocationRecord() = default;
SharedAllocationRecord( SharedAllocationRecord(
const Kokkos::CudaHostPinnedSpace& arg_space, const Kokkos::CudaHostPinnedSpace& arg_space,
const std::string& arg_label, const size_t arg_alloc_size, const std::string& arg_label, const size_t arg_alloc_size,
const RecordBase::function_type arg_dealloc = &deallocate); const RecordBase::function_type arg_dealloc = &deallocate);
public:
std::string get_label() const;
static SharedAllocationRecord* allocate(
const Kokkos::CudaHostPinnedSpace& arg_space,
const std::string& arg_label, const size_t arg_alloc_size);
/**\brief Allocate tracked memory in the space */
static void* allocate_tracked(const Kokkos::CudaHostPinnedSpace& arg_space,
const std::string& arg_label,
const size_t arg_alloc_size);
/**\brief Reallocate tracked memory in the space */
static void* reallocate_tracked(void* const arg_alloc_ptr,
const size_t arg_alloc_size);
/**\brief Deallocate tracked memory in the space */
static void deallocate_tracked(void* const arg_alloc_ptr);
static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
static void print_records(std::ostream&, const Kokkos::CudaHostPinnedSpace&,
bool detail = false);
}; };
} // namespace Impl } // namespace Impl

View File

@ -856,11 +856,12 @@ KOKKOS_INLINE_FUNCTION_DELETED
Impl::ThreadVectorRangeBoundariesStruct<iType, TeamMemberType> Impl::ThreadVectorRangeBoundariesStruct<iType, TeamMemberType>
ThreadVectorRange(const TeamMemberType&, const iType& count) = delete; ThreadVectorRange(const TeamMemberType&, const iType& count) = delete;
template <typename iType1, typename iType2, class TeamMemberType,
          class _never_use_this_overload>
KOKKOS_INLINE_FUNCTION_DELETED Impl::ThreadVectorRangeBoundariesStruct<
    typename std::common_type<iType1, iType2>::type, TeamMemberType>
ThreadVectorRange(const TeamMemberType&, const iType1& arg_begin,
                  const iType2& arg_end) = delete;
namespace Impl { namespace Impl {
@@ -902,85 +903,6 @@ struct ParallelConstructName<FunctorType, TagType, false> {
 } // namespace Kokkos
 namespace Kokkos {
-namespace Experimental {
-namespace Impl {
-template <class Property, class Policy>
-struct PolicyPropertyAdaptor;
-template <unsigned long P, template <class...> class Policy,
-          class... Properties>
-struct PolicyPropertyAdaptor<WorkItemProperty::ImplWorkItemProperty<P>,
-                             Policy<Properties...>> {
-  using policy_in_t = Policy<Properties...>;
-  static_assert(is_execution_policy<policy_in_t>::value, "");
-  using policy_out_t = Policy<typename policy_in_t::traits::execution_space,
-                              typename policy_in_t::traits::schedule_type,
-                              typename policy_in_t::traits::work_tag,
-                              typename policy_in_t::traits::index_type,
-                              typename policy_in_t::traits::iteration_pattern,
-                              typename policy_in_t::traits::launch_bounds,
-                              WorkItemProperty::ImplWorkItemProperty<P>,
-                              typename policy_in_t::traits::occupancy_control>;
-};
-template <template <class...> class Policy, class... Properties>
-struct PolicyPropertyAdaptor<DesiredOccupancy, Policy<Properties...>> {
-  using policy_in_t = Policy<Properties...>;
-  static_assert(is_execution_policy<policy_in_t>::value, "");
-  using policy_out_t = Policy<typename policy_in_t::traits::execution_space,
-                              typename policy_in_t::traits::schedule_type,
-                              typename policy_in_t::traits::work_tag,
-                              typename policy_in_t::traits::index_type,
-                              typename policy_in_t::traits::iteration_pattern,
-                              typename policy_in_t::traits::launch_bounds,
-                              typename policy_in_t::traits::work_item_property,
-                              DesiredOccupancy>;
-  static_assert(policy_out_t::experimental_contains_desired_occupancy, "");
-};
-template <template <class...> class Policy, class... Properties>
-struct PolicyPropertyAdaptor<MaximizeOccupancy, Policy<Properties...>> {
-  using policy_in_t = Policy<Properties...>;
-  static_assert(is_execution_policy<policy_in_t>::value, "");
-  using policy_out_t = Policy<typename policy_in_t::traits::execution_space,
-                              typename policy_in_t::traits::schedule_type,
-                              typename policy_in_t::traits::work_tag,
-                              typename policy_in_t::traits::index_type,
-                              typename policy_in_t::traits::iteration_pattern,
-                              typename policy_in_t::traits::launch_bounds,
-                              typename policy_in_t::traits::work_item_property,
-                              MaximizeOccupancy>;
-  static_assert(!policy_out_t::experimental_contains_desired_occupancy, "");
-};
-} // namespace Impl
-template <class PolicyType, unsigned long P>
-constexpr typename Impl::PolicyPropertyAdaptor<
-    WorkItemProperty::ImplWorkItemProperty<P>, PolicyType>::policy_out_t
-require(const PolicyType p, WorkItemProperty::ImplWorkItemProperty<P>) {
-  return typename Impl::PolicyPropertyAdaptor<
-      WorkItemProperty::ImplWorkItemProperty<P>, PolicyType>::policy_out_t(p);
-}
-template <typename Policy>
-/*constexpr*/ typename Impl::PolicyPropertyAdaptor<DesiredOccupancy,
-                                                   Policy>::policy_out_t
-prefer(Policy const& p, DesiredOccupancy occ) {
-  typename Impl::PolicyPropertyAdaptor<DesiredOccupancy, Policy>::policy_out_t
-      pwo{p};
-  pwo.impl_set_desired_occupancy(occ);
-  return pwo;
-}
-template <typename Policy>
-constexpr typename Impl::PolicyPropertyAdaptor<MaximizeOccupancy,
-                                               Policy>::policy_out_t
-prefer(Policy const& p, MaximizeOccupancy) {
-  return {p};
-}
-} // namespace Experimental
 namespace Impl {
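The PolicyPropertyAdaptor machinery behind Kokkos::Experimental::require and prefer leaves this header; the diff only shows the removal, so its new location is not visible here. For orientation, a sketch of the caller-facing interface these adaptors implement, under the assumption that it remains available to user code:

```cpp
#include <Kokkos_Core.hpp>

// Ask the backend to target roughly 33% occupancy for one kernel; the
// kernel body is a placeholder.
void run(int n) {
  using Kokkos::Experimental::DesiredOccupancy;
  using Kokkos::Experimental::prefer;

  auto policy = prefer(Kokkos::RangePolicy<>(0, n), DesiredOccupancy{33});
  Kokkos::parallel_for(
      "occupancy_hint", policy, KOKKOS_LAMBDA(const int i) { (void)i; });
}
```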

View File

@@ -316,29 +316,5 @@ struct DeepCopy<Kokkos::Experimental::HBWSpace, HostSpace, ExecutionSpace> {
 } // namespace Kokkos
-namespace Kokkos {
-namespace Impl {
-template <>
-struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
-                                           Kokkos::Experimental::HBWSpace> {
-  enum : bool { value = true };
-  inline static void verify(void) {}
-  inline static void verify(const void*) {}
-};
-template <>
-struct VerifyExecutionCanAccessMemorySpace<Kokkos::Experimental::HBWSpace,
-                                           Kokkos::HostSpace> {
-  enum : bool { value = true };
-  inline static void verify(void) {}
-  inline static void verify(const void*) {}
-};
-} // namespace Impl
-} // namespace Kokkos
 #endif
 #endif  // #define KOKKOS_HBWSPACE_HPP
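Several backends lose their VerifyExecutionCanAccessMemorySpace specializations in this update. Code that queried accessibility through that internal trait can use the public Kokkos::SpaceAccessibility trait instead; a small sketch (the HBWSpace check assumes a memkind-enabled build):

```cpp
#include <Kokkos_Core.hpp>

// Host execution spaces can always reach HostSpace.
static_assert(Kokkos::SpaceAccessibility<Kokkos::DefaultHostExecutionSpace,
                                         Kokkos::HostSpace>::accessible,
              "host execution must see HostSpace");

#ifdef KOKKOS_ENABLE_HBWSPACE
// High-bandwidth memory is host accessible when the memkind backend is on.
static_assert(
    Kokkos::SpaceAccessibility<Kokkos::DefaultHostExecutionSpace,
                               Kokkos::Experimental::HBWSpace>::accessible,
    "HBWSpace is host accessible");
#endif
```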

View File

@@ -57,6 +57,7 @@
 #include <impl/Kokkos_Tags.hpp>
 #include <HIP/Kokkos_HIP_Instance.hpp>
+#include <HIP/Kokkos_HIP_MDRangePolicy.hpp>
 #include <HIP/Kokkos_HIP_Parallel_Range.hpp>
 #include <HIP/Kokkos_HIP_Parallel_MDRange.hpp>
 #include <HIP/Kokkos_HIP_Parallel_Team.hpp>

View File

@@ -61,6 +61,7 @@
 #include <impl/Kokkos_Profiling_Interface.hpp>
 #include <impl/Kokkos_ExecSpaceInitializer.hpp>
+#include <impl/Kokkos_HostSharedPtr.hpp>
 #include <hip/hip_runtime_api.h>
 /*--------------------------------------------------------------------------*/
@@ -117,8 +118,8 @@ class HIPSpace {
   /*--------------------------------*/
   /** \brief Error reporting for HostSpace attempt to access HIPSpace */
-  static void access_error();
-  static void access_error(const void* const);
+  KOKKOS_DEPRECATED static void access_error();
+  KOKKOS_DEPRECATED static void access_error(const void* const);
  private:
   int m_device;  ///< Which HIP device
@@ -128,43 +129,6 @@ class HIPSpace {
 };
 } // namespace Experimental
-namespace Impl {
-/// \brief Initialize lock array for arbitrary size atomics.
-///
-/// Arbitrary atomics are implemented using a hash table of locks
-/// where the hash value is derived from the address of the
-/// object for which an atomic operation is performed.
-/// This function initializes the locks to zero (unset).
-void init_lock_arrays_hip_space();
-/// \brief Retrieve the pointer to the lock array for arbitrary size atomics.
-///
-/// Arbitrary atomics are implemented using a hash table of locks
-/// where the hash value is derived from the address of the
-/// object for which an atomic operation is performed.
-/// This function retrieves the lock array pointer.
-/// If the array is not yet allocated it will do so.
-int* atomic_lock_array_hip_space_ptr(bool deallocate = false);
-/// \brief Retrieve the pointer to the scratch array for team and thread private
-/// global memory.
-///
-/// Team and Thread private scratch allocations in
-/// global memory are acquired via locks.
-/// This function retrieves the lock array pointer.
-/// If the array is not yet allocated it will do so.
-int* scratch_lock_array_hip_space_ptr(bool deallocate = false);
-/// \brief Retrieve the pointer to the scratch array for unique identifiers.
-///
-/// Unique identifiers in the range 0-HIP::concurrency
-/// are provided via locks.
-/// This function retrieves the lock array pointer.
-/// If the array is not yet allocated it will do so.
-int* threadid_lock_array_hip_space_ptr(bool deallocate = false);
-} // namespace Impl
 } // namespace Kokkos
 /*--------------------------------------------------------------------------*/
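The HIP lock-array declarations move out of this header; the diff does not show their new location. From user code the relevant behavior, atomics on types without a native hardware atomic, is expected to stay the same. An illustrative sketch (the struct, function, and view names are made up):

```cpp
#include <Kokkos_Core.hpp>

// A 16-byte payload with no dedicated hardware atomic on most GPUs; atomic
// updates on it are expected to go through Kokkos' generic fallback, which
// is lock based when no sufficiently wide compare-and-swap is available.
struct Moments {
  double sum   = 0.0;
  double sumsq = 0.0;
  KOKKOS_INLINE_FUNCTION Moments operator+(const Moments& o) const {
    Moments r;
    r.sum   = sum + o.sum;
    r.sumsq = sumsq + o.sumsq;
    return r;
  }
};

void accumulate(Kokkos::View<Moments*> bins, Kokkos::View<const int*> bin_of,
                Kokkos::View<const double*> x) {
  Kokkos::parallel_for(
      "accumulate", x.extent(0), KOKKOS_LAMBDA(const int i) {
        Moments m;
        m.sum   = x(i);
        m.sumsq = x(i) * x(i);
        Kokkos::atomic_add(&bins(bin_of(i)), m);
      });
}
```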
@@ -483,88 +447,21 @@ struct DeepCopy<HostSpace, Kokkos::Experimental::HIPHostPinnedSpace,
 namespace Kokkos {
 namespace Impl {
-/** Running in HIPSpace attempting to access HostSpace: error */
-template <>
-struct VerifyExecutionCanAccessMemorySpace<Kokkos::Experimental::HIPSpace,
-                                           Kokkos::HostSpace> {
-  enum : bool { value = false };
-  KOKKOS_INLINE_FUNCTION static void verify(void) {
-    Kokkos::abort("HIP code attempted to access HostSpace memory");
-  }
-  KOKKOS_INLINE_FUNCTION static void verify(const void*) {
-    Kokkos::abort("HIP code attempted to access HostSpace memory");
-  }
-};
-/** Running in HIPSpace accessing HIPHostPinnedSpace: ok */
-template <>
-struct VerifyExecutionCanAccessMemorySpace<
-    Kokkos::Experimental::HIPSpace, Kokkos::Experimental::HIPHostPinnedSpace> {
-  enum : bool { value = true };
-  KOKKOS_INLINE_FUNCTION static void verify(void) {}
-  KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
-};
-/** Running in HIPSpace attempting to access an unknown space: error */
-template <class OtherSpace>
-struct VerifyExecutionCanAccessMemorySpace<
-    typename std::enable_if<
-        !std::is_same<Kokkos::Experimental::HIPSpace, OtherSpace>::value,
-        Kokkos::Experimental::HIPSpace>::type,
-    OtherSpace> {
-  enum : bool { value = false };
-  KOKKOS_INLINE_FUNCTION static void verify(void) {
-    Kokkos::abort("HIP code attempted to access unknown Space memory");
-  }
-  KOKKOS_INLINE_FUNCTION static void verify(const void*) {
-    Kokkos::abort("HIP code attempted to access unknown Space memory");
-  }
-};
-//----------------------------------------------------------------------------
-/** Running in HostSpace attempting to access HIPSpace */
-template <>
-struct VerifyExecutionCanAccessMemorySpace<Kokkos::HostSpace,
-                                           Kokkos::Experimental::HIPSpace> {
-  enum : bool { value = false };
-  inline static void verify(void) {
-    Kokkos::Experimental::HIPSpace::access_error();
-  }
-  inline static void verify(const void* p) {
-    Kokkos::Experimental::HIPSpace::access_error(p);
-  }
-};
-/** Running in HostSpace accessing HIPHostPinnedSpace is OK */
-template <>
-struct VerifyExecutionCanAccessMemorySpace<
-    Kokkos::HostSpace, Kokkos::Experimental::HIPHostPinnedSpace> {
-  enum : bool { value = true };
-  KOKKOS_INLINE_FUNCTION static void verify(void) {}
-  KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
-};
-} // namespace Impl
-} // namespace Kokkos
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-namespace Kokkos {
-namespace Impl {
 template <>
 class SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>
-    : public SharedAllocationRecord<void, void> {
+    : public HostInaccessibleSharedAllocationRecordCommon<
+          Kokkos::Experimental::HIPSpace> {
  private:
+  friend class SharedAllocationRecordCommon<Kokkos::Experimental::HIPSpace>;
+  friend class HostInaccessibleSharedAllocationRecordCommon<
+      Kokkos::Experimental::HIPSpace>;
+  using base_t = HostInaccessibleSharedAllocationRecordCommon<
+      Kokkos::Experimental::HIPSpace>;
   using RecordBase = SharedAllocationRecord<void, void>;
   SharedAllocationRecord(const SharedAllocationRecord&) = delete;
   SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
-  static void deallocate(RecordBase*);
 #ifdef KOKKOS_ENABLE_DEBUG
   static RecordBase s_root_record;
 #endif
@@ -577,45 +474,23 @@ class SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>
   SharedAllocationRecord(
       const Kokkos::Experimental::HIPSpace& arg_space,
       const std::string& arg_label, const size_t arg_alloc_size,
-      const RecordBase::function_type arg_dealloc = &deallocate);
+      const RecordBase::function_type arg_dealloc = &base_t::deallocate);
- public:
-  std::string get_label() const;
-  static SharedAllocationRecord* allocate(
-      const Kokkos::Experimental::HIPSpace& arg_space,
-      const std::string& arg_label, const size_t arg_alloc_size);
-  /**\brief Allocate tracked memory in the space */
-  static void* allocate_tracked(const Kokkos::Experimental::HIPSpace& arg_space,
-                                const std::string& arg_label,
-                                const size_t arg_alloc_size);
-  /**\brief Reallocate tracked memory in the space */
-  static void* reallocate_tracked(void* const arg_alloc_ptr,
-                                  const size_t arg_alloc_size);
-  /**\brief Deallocate tracked memory in the space */
-  static void deallocate_tracked(void* const arg_alloc_ptr);
-  static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
-  static void print_records(std::ostream&,
-                            const Kokkos::Experimental::HIPSpace&,
-                            bool detail = false);
 };
 template <>
 class SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>
-    : public SharedAllocationRecord<void, void> {
+    : public SharedAllocationRecordCommon<
+          Kokkos::Experimental::HIPHostPinnedSpace> {
  private:
+  friend class SharedAllocationRecordCommon<
+      Kokkos::Experimental::HIPHostPinnedSpace>;
+  using base_t =
+      SharedAllocationRecordCommon<Kokkos::Experimental::HIPHostPinnedSpace>;
   using RecordBase = SharedAllocationRecord<void, void>;
   SharedAllocationRecord(const SharedAllocationRecord&) = delete;
   SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
-  static void deallocate(RecordBase*);
 #ifdef KOKKOS_ENABLE_DEBUG
   static RecordBase s_root_record;
 #endif
@@ -624,36 +499,12 @@ class SharedAllocationRecord<Kokkos::Experimental::HIPHostPinnedSpace, void>
  protected:
   ~SharedAllocationRecord();
-  SharedAllocationRecord() : RecordBase(), m_space() {}
+  SharedAllocationRecord() = default;
   SharedAllocationRecord(
       const Kokkos::Experimental::HIPHostPinnedSpace& arg_space,
       const std::string& arg_label, const size_t arg_alloc_size,
-      const RecordBase::function_type arg_dealloc = &deallocate);
+      const RecordBase::function_type arg_dealloc = &base_t::deallocate);
- public:
-  std::string get_label() const;
-  static SharedAllocationRecord* allocate(
-      const Kokkos::Experimental::HIPHostPinnedSpace& arg_space,
-      const std::string& arg_label, const size_t arg_alloc_size);
-  /**\brief Allocate tracked memory in the space */
-  static void* allocate_tracked(
-      const Kokkos::Experimental::HIPHostPinnedSpace& arg_space,
-      const std::string& arg_label, const size_t arg_alloc_size);
-  /**\brief Reallocate tracked memory in the space */
-  static void* reallocate_tracked(void* const arg_alloc_ptr,
-                                  const size_t arg_alloc_size);
-  /**\brief Deallocate tracked memory in the space */
-  static void deallocate_tracked(void* const arg_alloc_ptr);
-  static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
-  static void print_records(std::ostream&,
-                            const Kokkos::Experimental::HIPHostPinnedSpace&,
-                            bool detail = false);
 };
 } // namespace Impl
 } // namespace Kokkos
@@ -687,13 +538,6 @@ class HIP {
   HIP();
   HIP(hipStream_t stream);
-  KOKKOS_FUNCTION HIP(HIP&& other) noexcept;
-  KOKKOS_FUNCTION HIP(HIP const& other);
-  KOKKOS_FUNCTION HIP& operator=(HIP&&) noexcept;
-  KOKKOS_FUNCTION HIP& operator=(HIP const&);
-  KOKKOS_FUNCTION ~HIP() noexcept;
   //@}
   //------------------------------------
   //! \name Functions that all Kokkos devices must implement.
@@ -749,14 +593,13 @@ class HIP {
   static const char* name();
   inline Impl::HIPInternal* impl_internal_space_instance() const {
-    return m_space_instance;
+    return m_space_instance.get();
   }
   uint32_t impl_instance_id() const noexcept { return 0; }
  private:
-  Impl::HIPInternal* m_space_instance;
-  int* m_counter;
+  Kokkos::Impl::HostSharedPtr<Impl::HIPInternal> m_space_instance;
 };
 } // namespace Experimental
 namespace Tools {
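The HIP execution space now holds its internal state in a HostSharedPtr, replacing the raw pointer plus manual reference counter and the hand-written copy/move special members removed above. A sketch of stream-based instances, assuming a HIP-enabled build:

```cpp
#include <Kokkos_Core.hpp>
#include <hip/hip_runtime_api.h>

void launch_on_stream(int n) {
  hipStream_t stream;
  hipStreamCreate(&stream);
  {
    Kokkos::Experimental::HIP hip_instance(stream);
    Kokkos::Experimental::HIP copy = hip_instance;  // shares one HIPInternal

    Kokkos::parallel_for(
        "on_stream",
        Kokkos::RangePolicy<Kokkos::Experimental::HIP>(hip_instance, 0, n),
        KOKKOS_LAMBDA(const int i) { (void)i; });
    copy.fence();  // fences the same underlying stream
  }
  hipStreamDestroy(stream);
}
```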
@@ -794,27 +637,6 @@ struct MemorySpaceAccess<Kokkos::Experimental::HIPSpace,
   enum : bool { deepcopy = false };
 };
-template <>
-struct VerifyExecutionCanAccessMemorySpace<
-    Kokkos::Experimental::HIP::memory_space,
-    Kokkos::Experimental::HIP::scratch_memory_space> {
-  enum : bool { value = true };
-  KOKKOS_INLINE_FUNCTION static void verify(void) {}
-  KOKKOS_INLINE_FUNCTION static void verify(const void*) {}
-};
-template <>
-struct VerifyExecutionCanAccessMemorySpace<
-    Kokkos::HostSpace, Kokkos::Experimental::HIP::scratch_memory_space> {
-  enum : bool { value = false };
-  inline static void verify(void) {
-    Kokkos::Experimental::HIPSpace::access_error();
-  }
-  inline static void verify(const void* p) {
-    Kokkos::Experimental::HIPSpace::access_error(p);
-  }
-};
 } // namespace Impl
 } // namespace Kokkos

View File

@@ -523,14 +523,6 @@ struct MemorySpaceAccess<Kokkos::Experimental::HPX::memory_space,
   enum : bool { deepcopy = false };
 };
-template <>
-struct VerifyExecutionCanAccessMemorySpace<
-    Kokkos::Experimental::HPX::memory_space,
-    Kokkos::Experimental::HPX::scratch_memory_space> {
-  enum : bool { value = true };
-  inline static void verify(void) {}
-  inline static void verify(const void *) {}
-};
 } // namespace Impl
 } // namespace Kokkos
@@ -1172,6 +1164,15 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>,
       : m_functor(arg_functor),
         m_mdr_policy(arg_policy),
         m_policy(Policy(0, m_mdr_policy.m_num_tiles).set_chunk_size(1)) {}
+  template <typename Policy, typename Functor>
+  static int max_tile_size_product(const Policy &, const Functor &) {
+    /**
+     * 1024 here is just our guess for a reasonable max tile size,
+     * it isn't a hardware constraint. If people see a use for larger
+     * tile size products, we're happy to change this.
+     */
+    return 1024;
+  }
 };
 } // namespace Impl
 } // namespace Kokkos
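max_tile_size_product reports 1024 as a conservative default rather than a hardware limit; it bounds the product of the tile extents a user may request on an MDRangePolicy. An illustrative sketch (extents and tile sizes are arbitrary):

```cpp
#include <Kokkos_Core.hpp>

using policy2d = Kokkos::MDRangePolicy<Kokkos::Rank<2>>;

// Explicit tile sizes {8, 8}: their product (64) must stay within the
// backend's reported maximum tile-size product.
void fill(Kokkos::View<double**> a) {
  const int n0 = a.extent_int(0);
  const int n1 = a.extent_int(1);
  Kokkos::parallel_for(
      "fill", policy2d({0, 0}, {n0, n1}, {8, 8}),
      KOKKOS_LAMBDA(const int i, const int j) { a(i, j) = i + 0.5 * j; });
}
```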
@@ -1715,6 +1716,15 @@ class ParallelReduce<FunctorType, Kokkos::MDRangePolicy<Traits...>, ReducerType,
         m_reducer(reducer),
         m_result_ptr(reducer.view().data()),
         m_force_synchronous(!reducer.view().impl_track().has_record()) {}
+  template <typename Policy, typename Functor>
+  static int max_tile_size_product(const Policy &, const Functor &) {
+    /**
+     * 1024 here is just our guess for a reasonable max tile size,
+     * it isn't a hardware constraint. If people see a use for larger
+     * tile size products, we're happy to change this.
+     */
+    return 1024;
+  }
 };
 } // namespace Impl
 } // namespace Kokkos
@@ -2438,13 +2448,14 @@ KOKKOS_INLINE_FUNCTION
       thread, count);
 }
-template <typename iType>
-KOKKOS_INLINE_FUNCTION
-    Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::HPXTeamMember>
-    ThreadVectorRange(const Impl::HPXTeamMember &thread, const iType &i_begin,
-                      const iType &i_end) {
+template <typename iType1, typename iType2>
+KOKKOS_INLINE_FUNCTION Impl::ThreadVectorRangeBoundariesStruct<
+    typename std::common_type<iType1, iType2>::type, Impl::HPXTeamMember>
+ThreadVectorRange(const Impl::HPXTeamMember &thread, const iType1 &i_begin,
+                  const iType2 &i_end) {
+  using iType = typename std::common_type<iType1, iType2>::type;
   return Impl::ThreadVectorRangeBoundariesStruct<iType, Impl::HPXTeamMember>(
-      thread, i_begin, i_end);
+      thread, iType(i_begin), iType(i_end));
 }
 KOKKOS_INLINE_FUNCTION
@@ -2615,6 +2626,27 @@ KOKKOS_INLINE_FUNCTION void parallel_scan(
   }
 }
+/** \brief  Intra-thread vector parallel scan with reducer
+ *
+ */
+template <typename iType, class FunctorType, typename ReducerType>
+KOKKOS_INLINE_FUNCTION
+    typename std::enable_if<Kokkos::is_reducer<ReducerType>::value>::type
+    parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<
+                      iType, Impl::HPXTeamMember> &loop_boundaries,
+                  const FunctorType &lambda, const ReducerType &reducer) {
+  typename ReducerType::value_type scan_val;
+  reducer.init(scan_val);
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+  for (iType i = loop_boundaries.start; i < loop_boundaries.end;
+       i += loop_boundaries.increment) {
+    lambda(i, scan_val, true);
+  }
+}
 template <class FunctorType>
 KOKKOS_INLINE_FUNCTION void single(
     const Impl::VectorSingleStruct<Impl::HPXTeamMember> &,
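The overload added above lets a vector-level parallel_scan take a reducer, so the scan's identity and join come from the reducer rather than being hard-coded to a sum. A hedged usage sketch with a Prod reducer (view and function names are illustrative):

```cpp
#include <Kokkos_Core.hpp>

using member_type = Kokkos::TeamPolicy<>::member_type;

// Running product along one row, written out on the final pass; Prod supplies
// the identity (1) and the join (multiplication) for the scan.
KOKKOS_INLINE_FUNCTION void row_products(const member_type& team,
                                         const Kokkos::View<double**>& in,
                                         const Kokkos::View<double**>& out) {
  const int row = team.league_rank();
  double total  = 1.0;
  Kokkos::parallel_scan(
      Kokkos::ThreadVectorRange(team, in.extent_int(1)),
      [&](const int j, double& partial, const bool final_pass) {
        partial *= in(row, j);
        if (final_pass) out(row, j) = partial;
      },
      Kokkos::Prod<double>(total));
}
```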

View File

@@ -242,17 +242,17 @@ namespace Impl {
 template <>
 class SharedAllocationRecord<Kokkos::HostSpace, void>
-    : public SharedAllocationRecord<void, void> {
+    : public SharedAllocationRecordCommon<Kokkos::HostSpace> {
  private:
   friend Kokkos::HostSpace;
+  friend class SharedAllocationRecordCommon<Kokkos::HostSpace>;
+  using base_t = SharedAllocationRecordCommon<Kokkos::HostSpace>;
   using RecordBase = SharedAllocationRecord<void, void>;
   SharedAllocationRecord(const SharedAllocationRecord&) = delete;
   SharedAllocationRecord& operator=(const SharedAllocationRecord&) = delete;
-  static void deallocate(RecordBase*);
 #ifdef KOKKOS_ENABLE_DEBUG
   /**\brief Root record for tracked allocations from this HostSpace instance */
   static RecordBase s_root_record;
@@ -275,10 +275,6 @@ class SharedAllocationRecord<Kokkos::HostSpace, void>
       const RecordBase::function_type arg_dealloc = &deallocate);
  public:
-  inline std::string get_label() const {
-    return std::string(RecordBase::head()->m_label);
-  }
   KOKKOS_INLINE_FUNCTION static SharedAllocationRecord* allocate(
       const Kokkos::HostSpace& arg_space, const std::string& arg_label,
       const size_t arg_alloc_size) {
@@ -291,23 +287,6 @@ class SharedAllocationRecord<Kokkos::HostSpace, void>
     return (SharedAllocationRecord*)0;
 #endif
   }
-  /**\brief Allocate tracked memory in the space */
-  static void* allocate_tracked(const Kokkos::HostSpace& arg_space,
-                                const std::string& arg_label,
-                                const size_t arg_alloc_size);
-  /**\brief Reallocate tracked memory in the space */
-  static void* reallocate_tracked(void* const arg_alloc_ptr,
-                                  const size_t arg_alloc_size);
-  /**\brief Deallocate tracked memory in the space */
-  static void deallocate_tracked(void* const arg_alloc_ptr);
-  static SharedAllocationRecord* get_record(void* arg_alloc_ptr);
-  static void print_records(std::ostream&, const Kokkos::HostSpace&,
-                            bool detail = false);
 };
 } // namespace Impl

View File

@@ -264,10 +264,10 @@ class SharedAllocationRecord<Kokkos::Experimental::LogicalMemorySpace<
         static_cast<SharedAllocationRecord<void, void>*>(this);
     strncpy(RecordBase::m_alloc_ptr->m_label, arg_label.c_str(),
-            SharedAllocationHeader::maximum_label_length);
+            SharedAllocationHeader::maximum_label_length - 1);
     // Set last element zero, in case c_str is too long
     RecordBase::m_alloc_ptr
-        ->m_label[SharedAllocationHeader::maximum_label_length - 1] = (char)0;
+        ->m_label[SharedAllocationHeader::maximum_label_length - 1] = '\0';
   }
  public:
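The old call passed the full buffer length to strncpy and relied on the following statement to write the terminator; copying maximum_label_length - 1 bytes instead is the conventional safe-strncpy pattern and silences compiler warnings about possibly unterminated copies. The '\0' literal is a cosmetic replacement for (char)0. A minimal sketch of the same pattern outside Kokkos (buffer size and function name are made up):

```cpp
#include <cstring>

// Copy at most sizeof(dst) - 1 characters and terminate explicitly, so the
// label is always NUL-terminated even when 'src' is too long.
void set_label(char (&dst)[64], const char* src) {
  std::strncpy(dst, src, sizeof(dst) - 1);
  dst[sizeof(dst) - 1] = '\0';
}
```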

View File

@@ -382,6 +382,12 @@
 #define KOKKOS_IMPL_DEVICE_FUNCTION
 #endif
+// Temporary solution for SYCL not supporting printf in kernels.
+// Might disappear at any point once we have found another solution.
+#if !defined(KOKKOS_IMPL_DO_NOT_USE_PRINTF)
+#define KOKKOS_IMPL_DO_NOT_USE_PRINTF(...) printf(__VA_ARGS__)
+#endif
 //----------------------------------------------------------------------------
 // Define final version of functions. This is so that clang tidy can find these
 // macros more easily
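KOKKOS_IMPL_DO_NOT_USE_PRINTF is an internal macro, as its name warns: on backends with device-side printf it simply forwards to printf, while SYCL builds are expected to define it differently elsewhere. A sketch of how a Kokkos-internal kernel might route diagnostics through it (the wrapper function is hypothetical):

```cpp
#include <Kokkos_Core.hpp>

// Device-side diagnostic written against the wrapper instead of calling
// printf directly, so the same line also builds for SYCL.
void debug_print() {
  Kokkos::parallel_for(
      "debug_print", Kokkos::RangePolicy<>(0, 4), KOKKOS_LAMBDA(const int i) {
        KOKKOS_IMPL_DO_NOT_USE_PRINTF("iteration %d\n", i);
      });
}
```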

Some files were not shown because too many files have changed in this diff.