Update Kokkos library in LAMMPS to v4.4.0

Stan Moore
2024-09-11 09:20:36 -06:00
parent 16b19c71c1
commit a44955dd2e
254 changed files with 14227 additions and 9881 deletions


@ -1,12 +1,88 @@
# CHANGELOG # CHANGELOG
## [4.4.00](https://github.com/kokkos/kokkos/tree/4.4.00)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.3.01...4.4.00)
### Features:
* Add `Kokkos::View` conversions from and to [`std::mdspan`](https://en.cppreference.com/w/cpp/container/mdspan) [\#6830](https://github.com/kokkos/kokkos/pull/6830) [\#7069](https://github.com/kokkos/kokkos/pull/7069)
### Backend and Architecture Enhancements:
#### CUDA:
* `nvcc_wrapper`: Adding ability to process `--disable-warnings` flag [\#6936](https://github.com/kokkos/kokkos/issues/6936)
* Use recommended/max team size functions in Cuda ParallelFor and Reduce constructors [\#6891](https://github.com/kokkos/kokkos/issues/6891)
* Improve compile-times when building with `Kokkos_ENABLE_DEBUG_BOUNDS_CHECK` in Cuda [\#7013](https://github.com/kokkos/kokkos/pull/7013)
#### HIP:
* Use HIP builtin atomics [\#6882](https://github.com/kokkos/kokkos/pull/6882) [\#7000](https://github.com/kokkos/kokkos/pull/7000)
* Enable user-specified compiler and linker flags for AMD GPUs [\#7127](https://github.com/kokkos/kokkos/pull/7127)
#### SYCL:
* Add support for Graphs [\#6912](https://github.com/kokkos/kokkos/pull/6912)
* Fix multi-GPU support [\#6887](https://github.com/kokkos/kokkos/pull/6887)
* Improve performance of reduction and scan operations [\#6562](https://github.com/kokkos/kokkos/pull/6562), [\#6750](https://github.com/kokkos/kokkos/pull/6750)
* Fix lock for guarding scratch space in `TeamPolicy` `parallel_reduce` [\#6988](https://github.com/kokkos/kokkos/pull/6988)
* Include submission command queue property information into `SYCL::print_configuration()` [\#7004](https://github.com/kokkos/kokkos/pull/7004)
#### OpenACC:
* Make `TeamPolicy` `parallel_for` execute on the correct async queue [\#7012](https://github.com/kokkos/kokkos/pull/7012)
#### OpenMPTarget:
* Honor user requested loop ordering in `MDRange` policy [\#6925](https://github.com/kokkos/kokkos/pull/6925)
* Prevent data races by guarding the scratch space used in `parallel_scan` [\#6998](https://github.com/kokkos/kokkos/pull/6998)
#### HPX:
* Workaround issue with template argument deduction to support compilation with NVCC [\#7015](https://github.com/kokkos/kokkos/pull/7015)
### General Enhancements
* Improve performance of view copies in host parallel regions [\#6730](https://github.com/kokkos/kokkos/pull/6730)
* Harmonize convertibility rules of `Kokkos::RandomAccessIterator` with `View`s [\#6929](https://github.com/kokkos/kokkos/pull/6929)
* Add a precondition check for non-overlapping ranges to the `adjacent_difference` algorithm in debug mode [\#6922](https://github.com/kokkos/kokkos/pull/6922)
* Add deduction guides for `TeamPolicy` [\#7030](https://github.com/kokkos/kokkos/pull/7030)
* SIMD: Allow flexible vector width for 32 bit types [\#6802](https://github.com/kokkos/kokkos/pull/6802)
* Updates for `Kokkos::Array`: add `kokkos_swap(Array<T, N>)` specialization [\#6943](https://github.com/kokkos/kokkos/pull/6943), add `Kokkos::to_array` [\#6375](https://github.com/kokkos/kokkos/pull/6375), make `Kokkos::Array` equality-comparable [\#7148](https://github.com/kokkos/kokkos/pull/7148)
* Structured binding support for `Kokkos::complex` [\#7040](https://github.com/kokkos/kokkos/pull/7040)
### Build System Changes
* Do not require OpenMP support for languages other than CXX [\#6965](https://github.com/kokkos/kokkos/pull/6965)
* Update Intel GPU architectures in Makefile [\#6895](https://github.com/kokkos/kokkos/pull/6895)
* Fix use of OpenMP with Cuda or HIP as compile language [\#6972](https://github.com/kokkos/kokkos/pull/6972)
* Define and enforce new minimum compiler versions for C++20 support [\#7128](https://github.com/kokkos/kokkos/pull/7128), [\#7123](https://github.com/kokkos/kokkos/pull/7123)
* Add nvidia Grace CPU architecture: `Kokkos_ARCH_ARMV9_GRACE` [\#7158](https://github.com/kokkos/kokkos/pull/7158)
* Fix Makefile.kokkos for Threads [\#6896](https://github.com/kokkos/kokkos/pull/6896)
* Remove support for NVHPC as CUDA device compiler [\#6987](https://github.com/kokkos/kokkos/pull/6987)
* Fix using CUDAToolkit for CMake 3.28.4 and higher [\#7062](https://github.com/kokkos/kokkos/pull/7062)
### Incompatibilities (i.e. breaking changes)
* Drop `Kokkos::Array` special treatment in `View`s [\#6906](https://github.com/kokkos/kokkos/pull/6906)
* Drop `Experimental::RawMemoryAllocationFailure` [\#7145](https://github.com/kokkos/kokkos/pull/7145)
### Deprecations
* Remove `Experimental::LayoutTiled` class template and deprecate `is_layouttiled` trait [\#6907](https://github.com/kokkos/kokkos/pull/6907)
* Deprecate `Kokkos::layout_iterate_type_selector` [\#7076](https://github.com/kokkos/kokkos/pull/7076)
* Deprecate specialization of `Kokkos::pair` for a single element [\#6947](https://github.com/kokkos/kokkos/pull/6947)
* Deprecate `deep_copy` of `UnorderedMap` of different size [\#6812](https://github.com/kokkos/kokkos/pull/6812)
* Deprecate trailing `Proxy` template argument of `Kokkos::Array` [\#6934](https://github.com/kokkos/kokkos/pull/6934)
* Deprecate implicit conversions of integers to `ChunkSize` [\#7151](https://github.com/kokkos/kokkos/pull/7151)
* Deprecate implicit conversions to execution spaces [\#7156](https://github.com/kokkos/kokkos/pull/7156)
### Bug Fixes
* Do not return a copy of the input functor in `Experimental::for_each` [\#6910](https://github.com/kokkos/kokkos/pull/6910)
* Fix `realloc` on views of non-default constructible element types [\#6993](https://github.com/kokkos/kokkos/pull/6993)
* Fix undefined behavior in `View` initialization or fill with zeros [\#7014](https://github.com/kokkos/kokkos/pull/7014)
* Fix `sort_by_key` on host execution spaces when building with NVCC [\#7059](https://github.com/kokkos/kokkos/pull/7059)
* Fix using shared libraries and -fvisibility=hidden [\#7065](https://github.com/kokkos/kokkos/pull/7065)
* Fix view reference counting when functor copy constructor throws in parallel dispatch [\#6289](https://github.com/kokkos/kokkos/pull/6289)
* Fix `initialize(InitializationSetting)` for handling `print_configuration` setting [\#7098](https://github.com/kokkos/kokkos/pull/7098)
* Thread safety fixes for the Serial and OpenMP backend [\#7080](https://github.com/kokkos/kokkos/pull/7080), [\#6151](https://github.com/kokkos/kokkos/pull/6151)
## [4.3.01](https://github.com/kokkos/kokkos/tree/4.3.01) ## [4.3.01](https://github.com/kokkos/kokkos/tree/4.3.01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.3.00...4.3.01) [Full Changelog](https://github.com/kokkos/kokkos/compare/4.3.00...4.3.01)
### Backend and Architecture Enhancements: ### Backend and Architecture Enhancements:
#### HIP: #### HIP:
* MI300 support unified memory support [\#6877](https://github.com/kokkos/kokkos/pull/6877) * MI300 support unified memory [\#6877](https://github.com/kokkos/kokkos/pull/6877)
### Bug Fixes ### Bug Fixes
* Serial: Use the provided execution space instance in TeamPolicy [\#6951](https://github.com/kokkos/kokkos/pull/6951) * Serial: Use the provided execution space instance in TeamPolicy [\#6951](https://github.com/kokkos/kokkos/pull/6951)

lib/kokkos/CITATION.cff Normal file

@ -0,0 +1,65 @@
cff-version: 1.2.0
title: Kokkos
message: >-
  If you use this software, please cite the overview paper
type: software
authors:
  - name: The Kokkos authors
    website: https://kokkos.org/community/team/
identifiers:
  - type: url
    website: https://kokkos.org/kokkos-core-wiki/citation.html
repository-code: 'https://github.com/kokkos/kokkos'
url: 'https://kokkos.org/'
license: Apache-2.0
preferred-citation:
  type: article
  authors:
    - given-names: Christian R.
      family-names: Trott
    - given-names: Damien
      family-names: Lebrun-Grandié
    - given-names: Daniel
      family-names: Arndt
    - family-names: Ciesko
      given-names: Jan
    - given-names: Vinh
      family-names: Dang
    - family-names: Ellingwood
      given-names: Nathan
    - given-names: Rahulkumar
      family-names: Gayatri
    - given-names: Evan
      family-names: Harvey
    - given-names: Daisy S.
      family-names: Hollman
    - given-names: Dan
      family-names: Ibanez
    - given-names: Nevin
      family-names: Liber
    - given-names: Jonathan
      family-names: Madsen
    - given-names: Jeff
      family-names: Miles
    - given-names: David
      family-names: Poliakoff
    - given-names: Amy
      family-names: Powell
    - given-names: Sivasankaran
      family-names: Rajamanickam
    - given-names: Mikael
      family-names: Simberg
    - given-names: Dan
      family-names: Sunderland
    - given-names: Bruno
      family-names: Turcksin
    - given-names: Jeremiah
      family-names: Wilke
  doi: 10.1109/TPDS.2021.3097283
  journal: IEEE Transactions on Parallel and Distributed Systems
  start: 805
  end: 817
  title: "Kokkos 3: Programming Model Extensions for the Exascale Era"
  volume: 33
  issue: 4
  year: 2022


@ -150,8 +150,8 @@ ENDIF()
 set(Kokkos_VERSION_MAJOR 4)
-set(Kokkos_VERSION_MINOR 3)
-set(Kokkos_VERSION_PATCH 1)
+set(Kokkos_VERSION_MINOR 4)
+set(Kokkos_VERSION_PATCH 0)
 set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
 message(STATUS "Kokkos version: ${Kokkos_VERSION}")
 math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")


@ -11,8 +11,8 @@ CXXFLAGS += $(SHFLAGS)
 endif
 KOKKOS_VERSION_MAJOR = 4
-KOKKOS_VERSION_MINOR = 3
-KOKKOS_VERSION_PATCH = 1
+KOKKOS_VERSION_MINOR = 4
+KOKKOS_VERSION_PATCH = 0
 KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
 # Options: Cuda,HIP,SYCL,OpenMPTarget,OpenMP,Threads,Serial
@ -21,11 +21,11 @@ KOKKOS_DEVICES ?= "OpenMP"
 # Options:
 # Intel: KNC,KNL,SNB,HSW,BDW,SKL,SKX,ICL,ICX,SPR
 # NVIDIA: Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Ada89,Hopper90
-# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
+# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX,ARMv9-Grace
 # IBM: Power8,Power9
-# AMD-GPUS: AMD_GFX906,AMD_GFX908,AMD_GFX90A,AMD_GFX940,AMD_GFX942,AMD_GFX1030,AMD_GFX1100,AMD_GFX1103
+# AMD-GPUS: AMD_GFX906,AMD_GFX908,AMD_GFX90A,AMD_GFX940,AMD_GFX942,AMD_GFX1030,AMD_GFX1100
 # AMD-CPUS: AMDAVX,Zen,Zen2,Zen3
-# Intel-GPUs: Gen9,Gen11,Gen12LP,DG1,XeHP,PVC
+# Intel-GPUs: Intel_Gen,Intel_Gen9,Intel_Gen11,Intel_Gen12LP,Intel_DG1,Intel_XeHP,Intel_PVC
 KOKKOS_ARCH ?= ""
 # Options: yes,no
 KOKKOS_DEBUG ?= "no"
@ -41,7 +41,7 @@ KOKKOS_STANDALONE_CMAKE ?= "no"
 # Default settings specific options.
 # Options: force_uvm,use_ldg,rdc,enable_lambda,enable_constexpr,disable_malloc_async
-KOKKOS_CUDA_OPTIONS ?= "enable_lambda"
+KOKKOS_CUDA_OPTIONS ?= "enable_lambda,disable_malloc_async"
 # Options: rdc
 KOKKOS_HIP_OPTIONS ?= ""
@ -328,12 +328,43 @@ KOKKOS_INTERNAL_USE_ARCH_ICL := $(call kokkos_has_string,$(KOKKOS_ARCH),ICL)
 KOKKOS_INTERNAL_USE_ARCH_ICX := $(call kokkos_has_string,$(KOKKOS_ARCH),ICX)
 KOKKOS_INTERNAL_USE_ARCH_SPR := $(call kokkos_has_string,$(KOKKOS_ARCH),SPR)
-KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen)
+# Traditionally, we supported, e.g., IntelGen9 instead of Intel_Gen9. The latter
+# matches the CMake option but we also accept the former for backward-compatibility.
 KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9 := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen9)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9 := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_Gen9)
+endif
 KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN11 := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen11)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN11), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN11 := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_Gen11)
+endif
 KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN12LP := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen12LP)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN12LP), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN12LP := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_Gen12LP)
+endif
+KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen9)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9 := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_Gen9)
+endif
+KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN_SET := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN9) \
+    + $(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN11) \
+    + $(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN12LP))
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN_SET), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelGen)
+  ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN), 0)
+    KOKKOS_INTERNAL_USE_ARCH_INTEL_GEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_Gen)
+  endif
+endif
 KOKKOS_INTERNAL_USE_ARCH_INTEL_DG1 := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelDG1)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_DG1), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_DG1 := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_DG1)
+endif
 KOKKOS_INTERNAL_USE_ARCH_INTEL_XEHP := $(call kokkos_has_string,$(KOKKOS_ARCH),IntelXeHP)
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_INTEL_XEHP), 0)
+  KOKKOS_INTERNAL_USE_ARCH_INTEL_XEHP := $(call kokkos_has_string,$(KOKKOS_ARCH),Intel_XeHP)
+endif
+# Traditionally the architecture was called PVC instead of Intel_PVC. This
+# version makes us accept IntelPVC and Intel_PVC as well.
 KOKKOS_INTERNAL_USE_ARCH_INTEL_PVC := $(call kokkos_has_string,$(KOKKOS_ARCH),PVC)
 # NVIDIA based.
@ -394,7 +425,8 @@ KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8
 KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-ThunderX)
 KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-TX2)
 KOKKOS_INTERNAL_USE_ARCH_A64FX := $(call kokkos_has_string,$(KOKKOS_ARCH),A64FX)
-KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2)+$(KOKKOS_INTERNAL_USE_ARCH_A64FX) | bc))
+KOKKOS_INTERNAL_USE_ARCH_ARMV9_GRACE := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv9-Grace)
+KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX2)+$(KOKKOS_INTERNAL_USE_ARCH_A64FX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV9_GRACE) | bc))
 # IBM based.
 KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(call kokkos_has_string,$(KOKKOS_ARCH),Power8)
@ -433,7 +465,6 @@ KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1100 := $(call kokkos_has_string,$(KOKKOS_ARCH),
 ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1100), 0)
   KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1100 := $(call kokkos_has_string,$(KOKKOS_ARCH),NAVI1100)
 endif
-KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1103 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX1103)
 # Any AVX?
 KOKKOS_INTERNAL_USE_ARCH_AVX := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_SNB) + $(KOKKOS_INTERNAL_USE_ARCH_AMDAVX))
@ -758,6 +789,14 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_A64FX), 1)
   endif
 endif
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV9_GRACE), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARMV9_GRACE")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARM_NEON")
+  KOKKOS_CXXFLAGS += -mcpu=neoverse-v2 -msve-vector-bits=128
+  KOKKOS_LDFLAGS += -mcpu=neoverse-v2 -msve-vector-bits=128
+endif
 ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN), 1)
   tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_ZEN")
   tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AVX2")
@ -1119,11 +1158,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1100), 1)
   tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
   KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx1100
 endif
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1103), 1)
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX1103")
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
-  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx1103
-endif
 ifeq ($(KOKKOS_INTERNAL_USE_HIP), 1)
@ -1216,6 +1250,8 @@ ifeq ($(KOKKOS_INTERNAL_DISABLE_BUNDLED_MDSPAN), 0)
 endif
 tmp := $(call kokkos_append_header,"$H""define KOKKOS_ENABLE_IMPL_MDSPAN")
+tmp := $(call kokkos_append_header,"$H""define KOKKOS_ENABLE_IMPL_REF_COUNT_BRANCH_UNLIKELY")
 KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h 2>&1)
 ifeq ($(KOKKOS_INTERNAL_LS_CONFIG), KokkosCore_config.h)


@ -81,7 +81,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_THREADS), 1)
 Kokkos_Threads_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_Instance.cpp
 	$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_Instance.cpp
 Kokkos_Threads_Spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_Spinwait.cpp
-	$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_Spinwait.cpp
+	$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_Spinwait.cpp
 endif
 ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)


@ -1,4 +1,4 @@
-![Kokkos](https://avatars2.githubusercontent.com/u/10199860?s=200&v=4)
+[![Kokkos](https://avatars2.githubusercontent.com/u/10199860?s=200&v=4)](https://kokkos.org)
 # Kokkos: Core Libraries
@ -10,43 +10,66 @@ hierarchies and multiple types of execution resources. It currently can use
 CUDA, HIP, SYCL, HPX, OpenMP and C++ threads as backend programming models with several other
 backends in development.
-**Kokkos Core is part of the Kokkos C++ Performance Portability Programming EcoSystem.**
+**Kokkos Core is part of the [Kokkos C++ Performance Portability Programming Ecosystem](https://kokkos.org/about/abstract/).**
-For the complete documentation, click below:
+Kokkos is a [Linux Foundation](https://linuxfoundation.org) project.
-# [kokkos.github.io/kokkos-core-wiki](https://kokkos.github.io/kokkos-core-wiki)
-# Learning about Kokkos
+## Learning about Kokkos
 To start learning about Kokkos:
-- [Kokkos Lectures](https://kokkos.github.io/kokkos-core-wiki/videolectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important Kokkos Ecosystem capabilities.
+- [Kokkos Lectures](https://kokkos.org/kokkos-core-wiki/videolectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
-- [Programming guide](https://kokkos.github.io/kokkos-core-wiki/programmingguide.html): contains in "narrative" form a technical description of the programming model, machine model, and the main building blocks like the Views and parallel dispatch.
+- [Programming guide](https://kokkos.org/kokkos-core-wiki/programmingguide.html): contains in "narrative" form a technical description of the programming model, machine model, and the main building blocks like the Views and parallel dispatch.
-- [API reference](https://kokkos.github.io/kokkos-core-wiki/): organized by category, i.e., [core](https://kokkos.github.io/kokkos-core-wiki/API/core-index.html), [algorithms](https://kokkos.github.io/kokkos-core-wiki/API/algorithms-index.html) and [containers](https://kokkos.github.io/kokkos-core-wiki/API/containers-index.html) or, if you prefer, in [alphabetical order](https://kokkos.github.io/kokkos-core-wiki/API/alphabetical.html).
+- [API reference](https://kokkos.org/kokkos-core-wiki/): organized by category, i.e., [core](https://kokkos.org/kokkos-core-wiki/API/core-index.html), [algorithms](https://kokkos.org/kokkos-core-wiki/API/algorithms-index.html) and [containers](https://kokkos.org/kokkos-core-wiki/API/containers-index.html) or, if you prefer, in [alphabetical order](https://kokkos.org/kokkos-core-wiki/API/alphabetical.html).
-- [Use cases and Examples](https://kokkos.github.io/kokkos-core-wiki/usecases.html): a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
+- [Use cases and Examples](https://kokkos.org/kokkos-core-wiki/usecases.html): a serie of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
+## Obtaining Kokkos
+The latest release of Kokkos can be obtained from the [GitHub releases page](https://github.com/kokkos/kokkos/releases/latest).
+The current release is [4.3.01](https://github.com/kokkos/kokkos/releases/tag/4.3.01).
+```bash
+curl -OJ -L https://github.com/kokkos/kokkos/archive/refs/tags/4.3.01.tar.gz
+# Or with wget
+wget https://github.com/kokkos/kokkos/archive/refs/tags/4.3.01.tar.gz
+```
+To clone the latest development version of Kokkos from GitHub:
+```bash
+git clone -b develop https://github.com/kokkos/kokkos.git
+```
+### Building Kokkos
+To build Kokkos, you will need to have a C++ compiler that supports C++17 or later.
+All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.org/kokkos-core-wiki/requirements.html).
+Building and installation instructions are described [here](https://kokkos.org/kokkos-core-wiki/building.html).
+You can also install Kokkos using [Spack](https://spack.io/): `spack install kokkos`. [Available configuration options](https://packages.spack.io/package.html?name=kokkos) can be displayed using `spack info kokkos`.
+## For the complete documentation: [kokkos.org/kokkos-core-wiki/](https://kokkos.org/kokkos-core-wiki/)
+## Support
 For questions find us on Slack: https://kokkosteam.slack.com or open a GitHub issue.
 For non-public questions send an email to: *crtrott(at)sandia.gov*
-# Contributing to Kokkos
+## Contributing
-Please see [this page](https://kokkos.github.io/kokkos-core-wiki/contributing.html) for details on how to contribute.
+Please see [this page](https://kokkos.org/kokkos-core-wiki/contributing.html) for details on how to contribute.
-# Requirements, Building and Installing
-All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.github.io/kokkos-core-wiki/requirements.html).
-Building and installation instructions are described [here](https://kokkos.github.io/kokkos-core-wiki/building.html).
-# Citing Kokkos
+## Citing Kokkos
-Please see the [following page](https://kokkos.github.io/kokkos-core-wiki/citation.html).
+Please see the [following page](https://kokkos.org/kokkos-core-wiki/citation.html).
-# License
+## License
 [![License](https://img.shields.io/badge/License-Apache--2.0_WITH_LLVM--exception-blue)](https://spdx.org/licenses/LLVM-exception.html)


@ -189,6 +189,33 @@ void applyPermutation(const ExecutionSpace& space,
       KOKKOS_LAMBDA(int i) { view(i) = view_copy(permutation(i)); });
 }
+// FIXME_NVCC: nvcc has trouble compiling lambdas inside a function with
+// variadic templates (sort_by_key_via_sort). Switch to using functors instead.
+template <typename Permute>
+struct IotaFunctor {
+  Permute _permute;
+  KOKKOS_FUNCTION void operator()(int i) const { _permute(i) = i; }
+};
+template <typename Keys>
+struct LessFunctor {
+  Keys _keys;
+  KOKKOS_FUNCTION bool operator()(int i, int j) const {
+    return _keys(i) < _keys(j);
+  }
+};
+// FIXME_NVCC+MSVC: We can't use a lambda instead of a functor which gave us
+// "For this host platform/dialect, an extended lambda cannot be defined inside
+// the 'if' or 'else' block of a constexpr if statement"
+template <typename Keys, typename Comparator>
+struct KeyComparisonFunctor {
+  Keys m_keys;
+  Comparator m_comparator;
+  KOKKOS_FUNCTION bool operator()(int i, int j) const {
+    return m_comparator(m_keys(i), m_keys(j));
+  }
+};
 template <class ExecutionSpace, class KeysDataType, class... KeysProperties,
           class ValuesDataType, class... ValuesProperties,
           class... MaybeComparator>
@ -207,10 +234,9 @@ void sort_by_key_via_sort(
       n);
   // iota
-  Kokkos::parallel_for(
-      "Kokkos::sort_by_key_via_sort::iota",
-      Kokkos::RangePolicy<ExecutionSpace>(exec, 0, n),
-      KOKKOS_LAMBDA(int i) { permute(i) = i; });
+  Kokkos::parallel_for("Kokkos::sort_by_key_via_sort::iota",
+                       Kokkos::RangePolicy<ExecutionSpace>(exec, 0, n),
+                       IotaFunctor<decltype(permute)>{permute});
   using Layout =
       typename Kokkos::View<unsigned int*, ExecutionSpace>::array_layout;
@ -228,16 +254,15 @@ void sort_by_key_via_sort(
     Kokkos::DefaultHostExecutionSpace host_exec;
     if constexpr (sizeof...(MaybeComparator) == 0) {
-      Kokkos::sort(
-          host_exec, host_permute,
-          KOKKOS_LAMBDA(int i, int j) { return host_keys(i) < host_keys(j); });
+      Kokkos::sort(host_exec, host_permute,
+                   LessFunctor<decltype(host_keys)>{host_keys});
     } else {
       auto keys_comparator =
           std::get<0>(std::tuple<MaybeComparator...>(maybeComparator...));
       Kokkos::sort(
-          host_exec, host_permute, KOKKOS_LAMBDA(int i, int j) {
-            return keys_comparator(host_keys(i), host_keys(j));
-          });
+          host_exec, host_permute,
+          KeyComparisonFunctor<decltype(host_keys), decltype(keys_comparator)>{
+              host_keys, keys_comparator});
     }
     host_exec.fence("Kokkos::Impl::sort_by_key_via_sort: after host sort");
     Kokkos::deep_copy(exec, permute, host_permute);
@ -262,16 +287,14 @@ void sort_by_key_via_sort(
   }
 #else
   if constexpr (sizeof...(MaybeComparator) == 0) {
-    Kokkos::sort(
-        exec, permute,
-        KOKKOS_LAMBDA(int i, int j) { return keys(i) < keys(j); });
+    Kokkos::sort(exec, permute, LessFunctor<decltype(keys)>{keys});
   } else {
     auto keys_comparator =
         std::get<0>(std::tuple<MaybeComparator...>(maybeComparator...));
     Kokkos::sort(
-        exec, permute, KOKKOS_LAMBDA(int i, int j) {
-          return keys_comparator(keys(i), keys(j));
-        });
+        exec, permute,
+        KeyComparisonFunctor<decltype(keys), decltype(keys_comparator)>{
+            keys, keys_comparator});
   }
 #endif
 }


@ -29,49 +29,46 @@ namespace Experimental {
 template <
     class ExecutionSpace, class IteratorType, class UnaryFunctorType,
     std::enable_if_t<Kokkos::is_execution_space_v<ExecutionSpace>, int> = 0>
-UnaryFunctorType for_each(const std::string& label, const ExecutionSpace& ex,
-                          IteratorType first, IteratorType last,
-                          UnaryFunctorType functor) {
-  return Impl::for_each_exespace_impl(label, ex, first, last,
-                                      std::move(functor));
+void for_each(const std::string& label, const ExecutionSpace& ex,
+              IteratorType first, IteratorType last, UnaryFunctorType functor) {
+  Impl::for_each_exespace_impl(label, ex, first, last, std::move(functor));
 }
 template <
     class ExecutionSpace, class IteratorType, class UnaryFunctorType,
     std::enable_if_t<Kokkos::is_execution_space_v<ExecutionSpace>, int> = 0>
-UnaryFunctorType for_each(const ExecutionSpace& ex, IteratorType first,
-                          IteratorType last, UnaryFunctorType functor) {
-  return Impl::for_each_exespace_impl("Kokkos::for_each_iterator_api_default",
-                                      ex, first, last, std::move(functor));
+void for_each(const ExecutionSpace& ex, IteratorType first, IteratorType last,
+              UnaryFunctorType functor) {
+  Impl::for_each_exespace_impl("Kokkos::for_each_iterator_api_default", ex,
+                               first, last, std::move(functor));
 }
 template <
     class ExecutionSpace, class DataType, class... Properties,
     class UnaryFunctorType,
     std::enable_if_t<Kokkos::is_execution_space_v<ExecutionSpace>, int> = 0>
-UnaryFunctorType for_each(const std::string& label, const ExecutionSpace& ex,
-                          const ::Kokkos::View<DataType, Properties...>& v,
-                          UnaryFunctorType functor) {
+void for_each(const std::string& label, const ExecutionSpace& ex,
+              const ::Kokkos::View<DataType, Properties...>& v,
+              UnaryFunctorType functor) {
   Impl::static_assert_is_admissible_to_kokkos_std_algorithms(v);
   namespace KE = ::Kokkos::Experimental;
-  return Impl::for_each_exespace_impl(label, ex, KE::begin(v), KE::end(v),
-                                      std::move(functor));
+  Impl::for_each_exespace_impl(label, ex, KE::begin(v), KE::end(v),
+                               std::move(functor));
 }
 template <
     class ExecutionSpace, class DataType, class... Properties,
     class UnaryFunctorType,
     std::enable_if_t<Kokkos::is_execution_space_v<ExecutionSpace>, int> = 0>
-UnaryFunctorType for_each(const ExecutionSpace& ex,
-                          const ::Kokkos::View<DataType, Properties...>& v,
-                          UnaryFunctorType functor) {
+void for_each(const ExecutionSpace& ex,
+              const ::Kokkos::View<DataType, Properties...>& v,
+              UnaryFunctorType functor) {
   Impl::static_assert_is_admissible_to_kokkos_std_algorithms(v);
   namespace KE = ::Kokkos::Experimental;
-  return Impl::for_each_exespace_impl("Kokkos::for_each_view_api_default", ex,
-                                      KE::begin(v), KE::end(v),
-                                      std::move(functor));
+  Impl::for_each_exespace_impl("Kokkos::for_each_view_api_default", ex,
+                               KE::begin(v), KE::end(v), std::move(functor));
 }
 //
@ -82,24 +79,23 @@ UnaryFunctorType for_each(const ExecutionSpace& ex,
template <class TeamHandleType, class IteratorType, class UnaryFunctorType, template <class TeamHandleType, class IteratorType, class UnaryFunctorType,
std::enable_if_t<Kokkos::is_team_handle_v<TeamHandleType>, int> = 0> std::enable_if_t<Kokkos::is_team_handle_v<TeamHandleType>, int> = 0>
KOKKOS_FUNCTION UnaryFunctorType for_each(const TeamHandleType& teamHandle, KOKKOS_FUNCTION void for_each(const TeamHandleType& teamHandle,
IteratorType first, IteratorType last, IteratorType first, IteratorType last,
UnaryFunctorType functor) { UnaryFunctorType functor) {
return Impl::for_each_team_impl(teamHandle, first, last, std::move(functor)); Impl::for_each_team_impl(teamHandle, first, last, std::move(functor));
} }
template <class TeamHandleType, class DataType, class... Properties, template <class TeamHandleType, class DataType, class... Properties,
class UnaryFunctorType, class UnaryFunctorType,
std::enable_if_t<Kokkos::is_team_handle_v<TeamHandleType>, int> = 0> std::enable_if_t<Kokkos::is_team_handle_v<TeamHandleType>, int> = 0>
KOKKOS_FUNCTION UnaryFunctorType KOKKOS_FUNCTION void for_each(const TeamHandleType& teamHandle,
for_each(const TeamHandleType& teamHandle, const ::Kokkos::View<DataType, Properties...>& v,
const ::Kokkos::View<DataType, Properties...>& v, UnaryFunctorType functor) {
UnaryFunctorType functor) {
Impl::static_assert_is_admissible_to_kokkos_std_algorithms(v); Impl::static_assert_is_admissible_to_kokkos_std_algorithms(v);
namespace KE = ::Kokkos::Experimental; namespace KE = ::Kokkos::Experimental;
return Impl::for_each_team_impl(teamHandle, KE::begin(v), KE::end(v), Impl::for_each_team_impl(teamHandle, KE::begin(v), KE::end(v),
std::move(functor)); std::move(functor));
} }
} // namespace Experimental } // namespace Experimental
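
The hunks above change every `for_each` overload to return `void` instead of the functor. The sketch below shows one way calling code could adapt, under the assumption that it previously read state back from the returned functor. It is a host-only analogue in plain C++, not Kokkos code; `for_each_void`, `SumInto`, and `sum_with_for_each` are hypothetical names introduced for illustration.

```cpp
#include <vector>

// Hypothetical host-only analogue of the new signature:
// applies f to every element and returns nothing.
template <class Iterator, class UnaryFunctor>
void for_each_void(Iterator first, Iterator last, UnaryFunctor f) {
  for (; first != last; ++first) f(*first);
}

// Functor that records its result through a pointer to caller-owned
// storage instead of keeping it in its own (copied) state.
struct SumInto {
  long* total;
  void operator()(int x) const { *total += x; }
};

// The result is read from the caller-owned variable,
// not from a functor handed back by the algorithm.
long sum_with_for_each(const std::vector<int>& v) {
  long total = 0;
  for_each_void(v.begin(), v.end(), SumInto{&total});
  return total;
}
```

Because the functor is taken by value, any state kept inside it lives in a copy; writing through a pointer (or, in Kokkos code, a View) keeps the result visible to the caller regardless of the return type.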


@ -82,6 +82,11 @@ OutputIteratorType adjacent_difference_exespace_impl(
return first_dest; return first_dest;
} }
#ifdef KOKKOS_ENABLE_DEBUG
// check for overlapping iterators
Impl::expect_no_overlap(first_from, last_from, first_dest);
#endif
// run // run
const auto num_elements = const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from); Kokkos::Experimental::distance(first_from, last_from);
@ -114,6 +119,11 @@ KOKKOS_FUNCTION OutputIteratorType adjacent_difference_team_impl(
return first_dest; return first_dest;
} }
#ifdef KOKKOS_ENABLE_DEBUG
// check for overlapping iterators
Impl::expect_no_overlap(first_from, last_from, first_dest);
#endif
// run // run
const auto num_elements = const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from); Kokkos::Experimental::distance(first_from, last_from);


@ -24,18 +24,21 @@ namespace Kokkos {
namespace Experimental { namespace Experimental {
namespace Impl { namespace Impl {
template <class T>
class RandomAccessIterator;
template <typename T, typename enable = void> template <typename T, typename enable = void>
struct is_admissible_to_kokkos_std_algorithms : std::false_type {}; struct is_admissible_to_kokkos_std_algorithms : std::false_type {};
template <typename T> template <typename T>
struct is_admissible_to_kokkos_std_algorithms< struct is_admissible_to_kokkos_std_algorithms<
T, std::enable_if_t< ::Kokkos::is_view<T>::value && T::rank() == 1 && T, std::enable_if_t<::Kokkos::is_view<T>::value && T::rank() == 1 &&
(std::is_same<typename T::traits::array_layout, (std::is_same<typename T::traits::array_layout,
Kokkos::LayoutLeft>::value || Kokkos::LayoutLeft>::value ||
std::is_same<typename T::traits::array_layout, std::is_same<typename T::traits::array_layout,
Kokkos::LayoutRight>::value || Kokkos::LayoutRight>::value ||
std::is_same<typename T::traits::array_layout, std::is_same<typename T::traits::array_layout,
Kokkos::LayoutStride>::value)> > Kokkos::LayoutStride>::value)>>
: std::true_type {}; : std::true_type {};
template <class ViewType> template <class ViewType>
@ -58,6 +61,18 @@ using is_iterator = Kokkos::is_detected<iterator_category_t, T>;
template <class T> template <class T>
inline constexpr bool is_iterator_v = is_iterator<T>::value; inline constexpr bool is_iterator_v = is_iterator<T>::value;
template <typename ViewType>
struct is_kokkos_iterator : std::false_type {};
template <typename ViewType>
struct is_kokkos_iterator<RandomAccessIterator<ViewType>> {
static constexpr bool value =
is_admissible_to_kokkos_std_algorithms<ViewType>::value;
};
template <class T>
inline constexpr bool is_kokkos_iterator_v = is_kokkos_iterator<T>::value;
// //
// are_iterators // are_iterators
// //
@ -215,6 +230,38 @@ KOKKOS_INLINE_FUNCTION void expect_valid_range(IteratorType first,
(void)last; (void)last;
} }
//
// Check if kokkos iterators are overlapping
//
template <typename IteratorType1, typename IteratorType2>
KOKKOS_INLINE_FUNCTION void expect_no_overlap(
[[maybe_unused]] IteratorType1 first, [[maybe_unused]] IteratorType1 last,
[[maybe_unused]] IteratorType2 s_first) {
if constexpr (is_kokkos_iterator_v<IteratorType1> &&
is_kokkos_iterator_v<IteratorType2>) {
auto const view1 = first.view();
auto const view2 = s_first.view();
std::size_t stride1 = view1.stride(0);
std::size_t stride2 = view2.stride(0);
ptrdiff_t first_diff = view1.data() - view2.data();
// FIXME If strides are not identical, checks may not be made
// with the cost of O(1)
// Currently, checks are made only if strides are identical
// If first_diff == 0, there is already an overlap
if (stride1 == stride2 || first_diff == 0) {
[[maybe_unused]] bool is_no_overlap = (first_diff % stride1);
auto* first_pointer1 = view1.data();
auto* first_pointer2 = view2.data();
[[maybe_unused]] auto* last_pointer1 = first_pointer1 + (last - first);
[[maybe_unused]] auto* last_pointer2 = first_pointer2 + (last - first);
KOKKOS_EXPECTS(first_pointer1 >= last_pointer2 ||
last_pointer1 <= first_pointer2 || is_no_overlap);
}
}
}
} // namespace Impl } // namespace Impl
} // namespace Experimental } // namespace Experimental
} // namespace Kokkos } // namespace Kokkos
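
The condition asserted by `expect_no_overlap` in the hunk above can be restated outside Kokkos. The following is a minimal sketch, assuming a rank-1 view can be modelled as a base pointer plus an element stride; `Span1D` and `passes_overlap_check` are hypothetical names, and the modulo is done in signed arithmetic here as a simplification, whereas the diff mixes `ptrdiff_t` and `std::size_t`.

```cpp
#include <cstddef>

// Hypothetical stand-in for a rank-1 view: base pointer plus element stride.
struct Span1D {
  const double* data;
  std::size_t stride;
};

// Returns true when the check from the diff would accept the two ranges,
// either because the asserted condition holds or because the check is skipped.
// Both spans are assumed to point into the same allocation.
bool passes_overlap_check(Span1D src, std::ptrdiff_t count, Span1D dst) {
  const std::ptrdiff_t first_diff = src.data - dst.data;

  // As in the diff: the check runs only if the strides match
  // or both ranges start at the same address.
  if (src.stride != dst.stride && first_diff != 0) return true;

  // Start offsets that are not a multiple of the stride are treated as
  // interleaved, hence non-overlapping (signed modulo: simplification).
  const bool interleaved =
      (first_diff % static_cast<std::ptrdiff_t>(src.stride)) != 0;

  // As in the diff, the end pointers advance by the element count only.
  const double* last_src = src.data + count;
  const double* last_dst = dst.data + count;

  return src.data >= last_dst || last_src <= dst.data || interleaved;
}
```

With a contiguous buffer, identical ranges and `[0, 6)` against `[3, 9)` are rejected, while two interleaved rows of a 2 x N column-major array (stride 2, start offset 1) are accepted. These are the same kinds of cases the `expect_no_overlap` test in this commit exercises.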


@ -150,8 +150,9 @@ KOKKOS_FUNCTION OutputIterator copy_if_team_impl(
return d_first + count; return d_first + count;
} }
#if defined KOKKOS_COMPILER_INTEL || \ #if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130) (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable(); __builtin_unreachable();
#endif #endif
} }


@ -42,10 +42,9 @@ struct StdForEachFunctor {
}; };
template <class HandleType, class IteratorType, class UnaryFunctorType> template <class HandleType, class IteratorType, class UnaryFunctorType>
UnaryFunctorType for_each_exespace_impl(const std::string& label, void for_each_exespace_impl(const std::string& label, const HandleType& handle,
const HandleType& handle, IteratorType first, IteratorType last,
IteratorType first, IteratorType last, UnaryFunctorType functor) {
UnaryFunctorType functor) {
// checks // checks
Impl::static_assert_random_access_and_accessible(handle, first); Impl::static_assert_random_access_and_accessible(handle, first);
Impl::expect_valid_range(first, last); Impl::expect_valid_range(first, last);
@ -56,8 +55,6 @@ UnaryFunctorType for_each_exespace_impl(const std::string& label,
label, RangePolicy<HandleType>(handle, 0, num_elements), label, RangePolicy<HandleType>(handle, 0, num_elements),
StdForEachFunctor<IteratorType, UnaryFunctorType>(first, functor)); StdForEachFunctor<IteratorType, UnaryFunctorType>(first, functor));
handle.fence("Kokkos::for_each: fence after operation"); handle.fence("Kokkos::for_each: fence after operation");
return functor;
} }
template <class ExecutionSpace, class IteratorType, class SizeType, template <class ExecutionSpace, class IteratorType, class SizeType,
@ -75,7 +72,7 @@ IteratorType for_each_n_exespace_impl(const std::string& label,
} }
for_each_exespace_impl(label, ex, first, last, std::move(functor)); for_each_exespace_impl(label, ex, first, last, std::move(functor));
// no neeed to fence since for_each_exespace_impl fences already // no need to fence since for_each_exespace_impl fences already
return last; return last;
} }
@ -84,9 +81,9 @@ IteratorType for_each_n_exespace_impl(const std::string& label,
// team impl // team impl
// //
template <class TeamHandleType, class IteratorType, class UnaryFunctorType> template <class TeamHandleType, class IteratorType, class UnaryFunctorType>
KOKKOS_FUNCTION UnaryFunctorType KOKKOS_FUNCTION void for_each_team_impl(const TeamHandleType& teamHandle,
for_each_team_impl(const TeamHandleType& teamHandle, IteratorType first, IteratorType first, IteratorType last,
IteratorType last, UnaryFunctorType functor) { UnaryFunctorType functor) {
// checks // checks
Impl::static_assert_random_access_and_accessible(teamHandle, first); Impl::static_assert_random_access_and_accessible(teamHandle, first);
Impl::expect_valid_range(first, last); Impl::expect_valid_range(first, last);
@ -96,7 +93,6 @@ for_each_team_impl(const TeamHandleType& teamHandle, IteratorType first,
TeamThreadRange(teamHandle, 0, num_elements), TeamThreadRange(teamHandle, 0, num_elements),
StdForEachFunctor<IteratorType, UnaryFunctorType>(first, functor)); StdForEachFunctor<IteratorType, UnaryFunctorType>(first, functor));
teamHandle.team_barrier(); teamHandle.team_barrier();
return functor;
} }
template <class TeamHandleType, class IteratorType, class SizeType, template <class TeamHandleType, class IteratorType, class SizeType,
@ -113,7 +109,7 @@ for_each_n_team_impl(const TeamHandleType& teamHandle, IteratorType first,
} }
for_each_team_impl(teamHandle, first, last, std::move(functor)); for_each_team_impl(teamHandle, first, last, std::move(functor));
// no neeed to fence since for_each_team_impl fences already // no need to fence since for_each_team_impl fences already
return last; return last;
} }


@ -59,6 +59,30 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
ptrdiff_t current_index) ptrdiff_t current_index)
: m_view(view), m_current_index(current_index) {} : m_view(view), m_current_index(current_index) {}
#ifndef KOKKOS_ENABLE_CXX17 // C++20 and beyond
template <class OtherViewType>
requires(std::is_constructible_v<view_type, OtherViewType>) KOKKOS_FUNCTION
explicit(!std::is_convertible_v<OtherViewType, view_type>)
RandomAccessIterator(const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
#else
template <
class OtherViewType,
std::enable_if_t<std::is_constructible_v<view_type, OtherViewType> &&
!std::is_convertible_v<OtherViewType, view_type>,
int> = 0>
KOKKOS_FUNCTION explicit RandomAccessIterator(
const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
template <class OtherViewType,
std::enable_if_t<std::is_convertible_v<OtherViewType, view_type>,
int> = 0>
KOKKOS_FUNCTION RandomAccessIterator(
const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
#endif
KOKKOS_FUNCTION KOKKOS_FUNCTION
iterator_type& operator++() { iterator_type& operator++() {
++m_current_index; ++m_current_index;
@ -152,9 +176,16 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
KOKKOS_FUNCTION KOKKOS_FUNCTION
reference operator*() const { return m_view(m_current_index); } reference operator*() const { return m_view(m_current_index); }
KOKKOS_FUNCTION
view_type view() const { return m_view; }
private: private:
view_type m_view; view_type m_view;
ptrdiff_t m_current_index = 0; ptrdiff_t m_current_index = 0;
// Needed for the converting constructor accepting another iterator
template <class>
friend class RandomAccessIterator;
}; };
} // namespace Impl } // namespace Impl
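
The C++17 branch of the new converting constructor emulates a conditional `explicit` with two SFINAE-constrained overloads. The sketch below reproduces that pattern in isolation, as an illustration rather than the Kokkos implementation. `Iter`, `MutHandle`, `ConstHandle`, and `TaggedHandle` are hypothetical types standing in for the iterator and its view types.

```cpp
#include <cstddef>
#include <type_traits>

struct MutHandle { int* p; };

// Implicitly constructible from MutHandle (like a const view from a non-const one).
struct ConstHandle {
  const int* p;
  ConstHandle(const MutHandle& h) : p(h.p) {}
};

// Only explicitly constructible from MutHandle.
struct TaggedHandle {
  const int* p;
  explicit TaggedHandle(const MutHandle& h) : p(h.p) {}
};

template <class Handle>
class Iter {
 public:
  Iter(Handle h, std::ptrdiff_t i) : m_h(h), m_i(i) {}

  // Explicit overload: Handle is constructible from Other,
  // but Other does not convert to Handle implicitly.
  template <class Other,
            std::enable_if_t<std::is_constructible_v<Handle, Other> &&
                                 !std::is_convertible_v<Other, Handle>,
                             int> = 0>
  explicit Iter(const Iter<Other>& o) : m_h(o.m_h), m_i(o.m_i) {}

  // Implicit overload: Other converts to Handle implicitly.
  template <class Other,
            std::enable_if_t<std::is_convertible_v<Other, Handle>, int> = 0>
  Iter(const Iter<Other>& o) : m_h(o.m_h), m_i(o.m_i) {}

  std::ptrdiff_t index() const { return m_i; }

 private:
  Handle m_h;
  std::ptrdiff_t m_i;

  // Lets the converting constructors read another specialization's members.
  template <class>
  friend class Iter;
};
```

In C++20 the two overloads collapse into a single constructor declared `explicit(!std::is_convertible_v<Other, Handle>)` with a `requires` clause, which is the shape the `#ifndef KOKKOS_ENABLE_CXX17` branch of the diff takes.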


@ -175,8 +175,9 @@ KOKKOS_FUNCTION OutputIterator unique_copy_team_impl(
d_first + count); d_first + count);
} }
#if defined KOKKOS_COMPILER_INTEL || \ #if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130) (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable(); __builtin_unreachable();
#endif #endif
} }


@ -46,6 +46,44 @@ TEST_F(random_access_iterator_test, constructor) {
EXPECT_TRUE(true); EXPECT_TRUE(true);
} }
TEST_F(random_access_iterator_test, constructiblity) {
auto first_d = KE::begin(m_dynamic_view);
auto cfirst_d = KE::cbegin(m_dynamic_view);
static_assert(std::is_constructible_v<decltype(cfirst_d), decltype(first_d)>);
static_assert(
!std::is_constructible_v<decltype(first_d), decltype(cfirst_d)>);
[[maybe_unused]] decltype(cfirst_d) tmp_cfirst_d(first_d);
auto first_s = KE::begin(m_static_view);
auto cfirst_s = KE::cbegin(m_static_view);
static_assert(std::is_constructible_v<decltype(cfirst_s), decltype(first_s)>);
static_assert(
!std::is_constructible_v<decltype(first_s), decltype(cfirst_s)>);
[[maybe_unused]] decltype(cfirst_s) tmp_cfirst_s(first_s);
auto first_st = KE::begin(m_strided_view);
auto cfirst_st = KE::cbegin(m_strided_view);
static_assert(
std::is_constructible_v<decltype(cfirst_st), decltype(first_st)>);
static_assert(
!std::is_constructible_v<decltype(first_st), decltype(cfirst_st)>);
[[maybe_unused]] decltype(cfirst_st) tmp_cfirst_st(first_st);
// [FIXME] Better to have tests for the explicit specifier with an expression.
// As soon as View converting constructors are re-implemented with a
// conditional explicit, we may add those tests.
static_assert(std::is_constructible_v<decltype(first_s), decltype(first_d)>);
static_assert(std::is_constructible_v<decltype(first_st), decltype(first_d)>);
static_assert(std::is_constructible_v<decltype(first_d), decltype(first_s)>);
static_assert(std::is_constructible_v<decltype(first_st), decltype(first_s)>);
static_assert(std::is_constructible_v<decltype(first_d), decltype(first_st)>);
static_assert(std::is_constructible_v<decltype(first_s), decltype(first_st)>);
EXPECT_TRUE(true);
}
template <class IteratorType, class ValueType> template <class IteratorType, class ValueType>
void test_random_access_it_verify(IteratorType it, ValueType gold_value) { void test_random_access_it_verify(IteratorType it, ValueType gold_value) {
using view_t = Kokkos::View<typename IteratorType::value_type>; using view_t = Kokkos::View<typename IteratorType::value_type>;


@ -69,7 +69,7 @@ void iota(ExecutionSpace const &space, ViewType const &v,
typename ViewType::value_type value = 0) { typename ViewType::value_type value = 0) {
using ValueType = typename ViewType::value_type; using ValueType = typename ViewType::value_type;
Kokkos::parallel_for( Kokkos::parallel_for(
"ArborX::Algorithms::iota", "Kokkos::Algorithms::iota",
Kokkos::RangePolicy<ExecutionSpace>(space, 0, v.extent(0)), Kokkos::RangePolicy<ExecutionSpace>(space, 0, v.extent(0)),
KOKKOS_LAMBDA(int i) { v(i) = value + (ValueType)i; }); KOKKOS_LAMBDA(int i) { v(i) = value + (ValueType)i; });
} }
@ -87,6 +87,18 @@ TEST(TEST_CATEGORY, SortByKeyEmptyView) {
Kokkos::Experimental::sort_by_key(ExecutionSpace(), keys, values)); Kokkos::Experimental::sort_by_key(ExecutionSpace(), keys, values));
} }
// Test #7036
TEST(TEST_CATEGORY, SortByKeyEmptyViewHost) {
using ExecutionSpace = Kokkos::DefaultHostExecutionSpace;
// does not matter if we use int or something else
Kokkos::View<int *, ExecutionSpace> keys("keys", 0);
Kokkos::View<float *, ExecutionSpace> values("values", 0);
ASSERT_NO_THROW(
Kokkos::Experimental::sort_by_key(ExecutionSpace(), keys, values));
}
TEST(TEST_CATEGORY, SortByKey) { TEST(TEST_CATEGORY, SortByKey) {
using ExecutionSpace = TEST_EXECSPACE; using ExecutionSpace = TEST_EXECSPACE;
using MemorySpace = typename ExecutionSpace::memory_space; using MemorySpace = typename ExecutionSpace::memory_space;


@ -81,5 +81,114 @@ TEST(std_algorithms, is_admissible_to_std_algorithms) {
strided_view_3d_t>::value); strided_view_3d_t>::value);
} }
TEST(std_algorithms, expect_no_overlap) {
namespace KE = Kokkos::Experimental;
using value_type = double;
static constexpr size_t extent0 = 13;
//-------------
// 1d views
//-------------
using static_view_1d_t = Kokkos::View<value_type[extent0]>;
[[maybe_unused]] static_view_1d_t static_view_1d{
"std-algo-test-1d-contiguous-view-static"};
using dyn_view_1d_t = Kokkos::View<value_type*>;
[[maybe_unused]] dyn_view_1d_t dynamic_view_1d{
"std-algo-test-1d-contiguous-view-dynamic", extent0};
using strided_view_1d_t = Kokkos::View<value_type*, Kokkos::LayoutStride>;
Kokkos::LayoutStride layout1d{extent0, 2};
strided_view_1d_t strided_view_1d{"std-algo-test-1d-strided-view", layout1d};
// Overlapping because iterators are identical
#if defined(KOKKOS_ENABLE_DEBUG)
auto first_s = KE::begin(static_view_1d);
auto last_s = first_s + extent0;
EXPECT_DEATH({ KE::Impl::expect_no_overlap(first_s, last_s, first_s); },
"Kokkos contract violation:.*");
auto first_d = KE::begin(dynamic_view_1d);
auto last_d = first_d + extent0;
EXPECT_DEATH({ KE::Impl::expect_no_overlap(first_d, last_d, first_d); },
"Kokkos contract violation:.*");
auto first_st = KE::begin(strided_view_1d);
auto last_st = first_st + extent0;
EXPECT_DEATH({ KE::Impl::expect_no_overlap(first_st, last_st, first_st); },
"Kokkos contract violation:.*");
#endif
// Ranges are overlapped
static constexpr size_t sub_extent0 = 6, offset0 = 3;
std::pair<size_t, size_t> range0(0, sub_extent0),
range1(offset0, offset0 + sub_extent0);
#if defined(KOKKOS_ENABLE_DEBUG)
auto static_view_1d_0 = Kokkos::subview(static_view_1d, range0);
auto static_view_1d_1 = Kokkos::subview(static_view_1d, range1);
auto first_s0 = KE::begin(static_view_1d_0); // [0, 6)
auto last_s0 = first_s0 + static_view_1d_0.extent(0);
auto first_s1 = KE::begin(static_view_1d_1); // [3, 9)
EXPECT_DEATH({ KE::Impl::expect_no_overlap(first_s0, last_s0, first_s1); },
"Kokkos contract violation:.*");
auto dynamic_view_1d_0 = Kokkos::subview(dynamic_view_1d, range0);
auto dynamic_view_1d_1 = Kokkos::subview(dynamic_view_1d, range1);
auto first_d0 = KE::begin(dynamic_view_1d_0); // [0, 6)
auto last_d0 = first_d0 + dynamic_view_1d_0.extent(0);
auto first_d1 = KE::begin(dynamic_view_1d_1); // [3, 9)
EXPECT_DEATH({ KE::Impl::expect_no_overlap(first_d0, last_d0, first_d1); },
"Kokkos contract violation:.*");
#endif
auto strided_view_1d_0 = Kokkos::subview(strided_view_1d, range0);
auto strided_view_1d_1 = Kokkos::subview(strided_view_1d, range1);
auto first_st0 = KE::begin(strided_view_1d_0); // [0, 12)
auto last_st0 = first_st0 + strided_view_1d_0.extent(0);
auto first_st1 = KE::begin(strided_view_1d_1); // [3, 15)
// Does not overlap since offset (=3) is not divisible by stride (=2)
EXPECT_NO_THROW(
{ KE::Impl::expect_no_overlap(first_st0, last_st0, first_st1); });
// Iterating over the same range without overlapping
Kokkos::View<value_type[2][extent0], Kokkos::LayoutLeft> static_view_2d{
"std-algo-test-2d-contiguous-view-static"};
auto sub_static_view_1d_0 = Kokkos::subview(static_view_2d, 0, Kokkos::ALL);
auto sub_static_view_1d_1 = Kokkos::subview(static_view_2d, 1, Kokkos::ALL);
auto sub_first_s0 = KE::begin(sub_static_view_1d_0); // 0, 2, 4, ...
auto sub_last_s0 = sub_first_s0 + sub_static_view_1d_0.extent(0);
auto sub_first_s1 = KE::begin(sub_static_view_1d_1); // 1, 3, 5, ...
EXPECT_NO_THROW({
KE::Impl::expect_no_overlap(sub_first_s0, sub_last_s0, sub_first_s1);
});
Kokkos::View<value_type**, Kokkos::LayoutLeft> dynamic_view_2d{
"std-algo-test-2d-contiguous-view-dynamic", 2, extent0};
auto sub_dynamic_view_1d_0 = Kokkos::subview(dynamic_view_2d, 0, Kokkos::ALL);
auto sub_dynamic_view_1d_1 = Kokkos::subview(dynamic_view_2d, 1, Kokkos::ALL);
auto sub_first_d0 = KE::begin(sub_dynamic_view_1d_0); // 0, 2, 4, ...
auto sub_last_d0 = sub_first_d0 + sub_dynamic_view_1d_0.extent(0);
auto sub_first_d1 = KE::begin(sub_dynamic_view_1d_1); // 1, 3, 5, ...
EXPECT_NO_THROW({
KE::Impl::expect_no_overlap(sub_first_d0, sub_last_d0, sub_first_d1);
});
Kokkos::LayoutStride layout2d{2, 3, extent0, 2 * 3};
Kokkos::View<value_type**, Kokkos::LayoutStride> strided_view_2d{
"std-algo-test-2d-contiguous-view-strided", layout2d};
auto sub_strided_view_1d_0 = Kokkos::subview(strided_view_2d, 0, Kokkos::ALL);
auto sub_strided_view_1d_1 = Kokkos::subview(strided_view_2d, 1, Kokkos::ALL);
auto sub_first_st0 = KE::begin(sub_strided_view_1d_0); // 0, 6, 12, ...
auto sub_last_st0 = sub_first_st0 + sub_strided_view_1d_0.extent(0);
auto sub_first_st1 = KE::begin(sub_strided_view_1d_1); // 1, 7, 13, ...
EXPECT_NO_THROW({
KE::Impl::expect_no_overlap(sub_first_st0, sub_last_st0, sub_first_st1);
});
}
} // namespace stdalgos } // namespace stdalgos
} // namespace Test } // namespace Test


@ -85,7 +85,7 @@ struct TestFunctorA {
break; break;
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
case 2: { case 2: {
auto it = KE::exclusive_scan( auto it = KE::exclusive_scan(
@ -213,7 +213,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break; break;
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
case 2: case 2:
case 3: { case 3: {
auto it = exclusive_scan(KE::cbegin(rowFrom), KE::cend(rowFrom), auto it = exclusive_scan(KE::cbegin(rowFrom), KE::cend(rowFrom),
@ -242,7 +242,7 @@ template <class LayoutTag, class ValueType, class InPlaceOrVoid = void>
void run_all_scenarios() { void run_all_scenarios() {
for (int numTeams : teamSizesToTest) { for (int numTeams : teamSizesToTest) {
for (const auto& numCols : {0, 1, 2, 13, 101, 1444, 8153}) { for (const auto& numCols : {0, 1, 2, 13, 101, 1444, 8153}) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
for (int apiId : {0, 1, 2, 3}) { for (int apiId : {0, 1, 2, 3}) {
#else #else
for (int apiId : {0, 1}) { for (int apiId : {0, 1}) {


@ -52,7 +52,7 @@ struct TestFunctorA {
Kokkos::single(Kokkos::PerTeam(member), Kokkos::single(Kokkos::PerTeam(member),
[=, *this]() { m_returnsView(myRowIndex) = result; }); [=, *this]() { m_returnsView(myRowIndex) = result; });
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
else if (m_apiPick == 2) { else if (m_apiPick == 2) {
using value_type = typename ViewType::value_type; using value_type = typename ViewType::value_type;
result = KE::is_sorted(member, KE::cbegin(myRowView), KE::cend(myRowView), result = KE::is_sorted(member, KE::cbegin(myRowView), KE::cend(myRowView),
@ -179,7 +179,7 @@ template <class LayoutTag, class ValueType>
void run_all_scenarios(bool makeDataSortedOnPurpose) { void run_all_scenarios(bool makeDataSortedOnPurpose) {
for (int numTeams : teamSizesToTest) { for (int numTeams : teamSizesToTest) {
for (const auto& numCols : {0, 1, 2, 13, 101, 1444, 5153}) { for (const auto& numCols : {0, 1, 2, 13, 101, 1444, 5153}) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
for (int apiId : {0, 1, 2, 3}) { for (int apiId : {0, 1, 2, 3}) {
#else #else
for (int apiId : {0, 1}) { for (int apiId : {0, 1}) {


@ -73,7 +73,7 @@ struct TestFunctorA {
m_distancesView(myRowIndex) = resultDist; m_distancesView(myRowIndex) = resultDist;
}); });
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
else if (m_apiPick == 2) { else if (m_apiPick == 2) {
using value_type = typename ViewType::value_type; using value_type = typename ViewType::value_type;
auto it = KE::is_sorted_until(member, KE::cbegin(myRowView), auto it = KE::is_sorted_until(member, KE::cbegin(myRowView),
@ -226,7 +226,7 @@ template <class LayoutTag, class ValueType>
void run_all_scenarios(const std::string& name, const std::vector<int>& cols) { void run_all_scenarios(const std::string& name, const std::vector<int>& cols) {
for (int numTeams : teamSizesToTest) { for (int numTeams : teamSizesToTest) {
for (const auto& numCols : cols) { for (const auto& numCols : cols) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
for (int apiId : {0, 1, 2, 3}) { for (int apiId : {0, 1, 2, 3}) {
#else #else
for (int apiId : {0, 1}) { for (int apiId : {0, 1}) {


@ -59,7 +59,7 @@ struct TestFunctorA {
m_distancesView(myRowIndex) = resultDist; m_distancesView(myRowIndex) = resultDist;
}); });
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
else if (m_apiPick == 2) { else if (m_apiPick == 2) {
using value_type = typename ViewType::value_type; using value_type = typename ViewType::value_type;
auto it = auto it =
@ -170,7 +170,7 @@ void run_all_scenarios() {
} }
TEST(std_algorithms_max_element_team_test, test) { TEST(std_algorithms_max_element_team_test, test) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
run_all_scenarios<DynamicTag, int>(); run_all_scenarios<DynamicTag, int>();
run_all_scenarios<StridedTwoRowsTag, double>(); run_all_scenarios<StridedTwoRowsTag, double>();
run_all_scenarios<StridedThreeRowsTag, int>(); run_all_scenarios<StridedThreeRowsTag, int>();


@ -59,7 +59,7 @@ struct TestFunctorA {
m_distancesView(myRowIndex) = resultDist; m_distancesView(myRowIndex) = resultDist;
}); });
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
else if (m_apiPick == 2) { else if (m_apiPick == 2) {
using value_type = typename ViewType::value_type; using value_type = typename ViewType::value_type;
auto it = auto it =
@ -169,7 +169,7 @@ void run_all_scenarios() {
} }
TEST(std_algorithms_min_element_team_test, test) { TEST(std_algorithms_min_element_team_test, test) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
run_all_scenarios<DynamicTag, int>(); run_all_scenarios<DynamicTag, int>();
run_all_scenarios<StridedTwoRowsTag, double>(); run_all_scenarios<StridedTwoRowsTag, double>();
run_all_scenarios<StridedThreeRowsTag, int>(); run_all_scenarios<StridedThreeRowsTag, int>();


@ -66,7 +66,7 @@ struct TestFunctorA {
m_distancesView(myRowIndex, 1) = resultDist2; m_distancesView(myRowIndex, 1) = resultDist2;
}); });
} }
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
else if (m_apiPick == 2) { else if (m_apiPick == 2) {
using value_type = typename ViewType::value_type; using value_type = typename ViewType::value_type;
auto itPair = auto itPair =
@ -188,7 +188,7 @@ void run_all_scenarios() {
} }
TEST(std_algorithms_minmax_element_team_test, test) { TEST(std_algorithms_minmax_element_team_test, test) {
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
run_all_scenarios<DynamicTag, int>(); run_all_scenarios<DynamicTag, int>();
run_all_scenarios<StridedTwoRowsTag, double>(); run_all_scenarios<StridedTwoRowsTag, double>();
run_all_scenarios<StridedThreeRowsTag, int>(); run_all_scenarios<StridedThreeRowsTag, int>();


@ -16,7 +16,7 @@
#include <TestStdAlgorithmsCommon.hpp> #include <TestStdAlgorithmsCommon.hpp>
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
namespace Test { namespace Test {
namespace stdalgos { namespace stdalgos {


@ -16,7 +16,7 @@
#include <TestStdAlgorithmsCommon.hpp> #include <TestStdAlgorithmsCommon.hpp>
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
namespace Test { namespace Test {
namespace stdalgos { namespace stdalgos {


@ -16,7 +16,7 @@
#include <TestStdAlgorithmsCommon.hpp> #include <TestStdAlgorithmsCommon.hpp>
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
namespace Test { namespace Test {
namespace stdalgos { namespace stdalgos {


@ -16,7 +16,7 @@
#include <TestStdAlgorithmsCommon.hpp> #include <TestStdAlgorithmsCommon.hpp>
#if not defined KOKKOS_ENABLE_OPENMPTARGET #ifndef KOKKOS_ENABLE_OPENMPTARGET
namespace Test { namespace Test {
namespace stdalgos { namespace stdalgos {


@ -5,6 +5,6 @@ build_script:
- cmd: >- - cmd: >-
mkdir build && mkdir build &&
cd build && cd build &&
cmake c:\projects\source -DKokkos_ENABLE_TESTS=ON -DCMAKE_CXX_FLAGS="/W0 /EHsc" -DKokkos_ENABLE_DEPRECATED_CODE_4=ON -DKokkos_ENABLE_DEPRECATION_WARNINGS=OFF && cmake c:\projects\source -DKokkos_ENABLE_IMPL_MDSPAN=OFF -DKokkos_ENABLE_TESTS=ON -DCMAKE_CXX_FLAGS="/W0 /EHsc" -DKokkos_ENABLE_DEPRECATED_CODE_4=ON -DKokkos_ENABLE_DEPRECATION_WARNINGS=OFF &&
cmake --build . --target install && cmake --build . --target install &&
ctest -C Debug --output-on-failure ctest -C Debug --output-on-failure


@ -4,7 +4,7 @@ KOKKOS_ADD_BENCHMARK_DIRECTORIES(gather)
KOKKOS_ADD_BENCHMARK_DIRECTORIES(gups) KOKKOS_ADD_BENCHMARK_DIRECTORIES(gups)
KOKKOS_ADD_BENCHMARK_DIRECTORIES(launch_latency) KOKKOS_ADD_BENCHMARK_DIRECTORIES(launch_latency)
KOKKOS_ADD_BENCHMARK_DIRECTORIES(stream) KOKKOS_ADD_BENCHMARK_DIRECTORIES(stream)
KOKKOS_ADD_BENCHMARK_DIRECTORIES(view_copy_constructor)
#FIXME_OPENMPTARGET - These two benchmarks cause ICE. Commenting them for now but a deeper analysis on the cause and a possible fix will follow. #FIXME_OPENMPTARGET - These two benchmarks cause ICE. Commenting them for now but a deeper analysis on the cause and a possible fix will follow.
IF(NOT Kokkos_ENABLE_OPENMPTARGET) IF(NOT Kokkos_ENABLE_OPENMPTARGET)
KOKKOS_ADD_BENCHMARK_DIRECTORIES(policy_performance) KOKKOS_ADD_BENCHMARK_DIRECTORIES(policy_performance)


@ -0,0 +1,4 @@
KOKKOS_ADD_EXECUTABLE(
view_copy_constructor
SOURCES view_copy_constructor.cpp
)


@ -0,0 +1,46 @@
KOKKOS_DEVICES=Serial
KOKKOS_ARCH = ""
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
ifndef KOKKOS_PATH
KOKKOS_PATH = $(MAKEFILE_PATH)../..
endif
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
CXX = clang++
EXE = view_copy_constructor.exe
CXXFLAGS ?= -Ofast
override CXXFLAGS += -I$(MAKEFILE_PATH)
DEPFLAGS = -M
LINK = ${CXX}
LINKFLAGS = -Ofast
KOKKOS_CXX_STANDARD=c++20
OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o view_copy_constructor.cuda view_copy_constructor.exe
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)

View File

@ -0,0 +1,310 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 4.0
// Copyright (2022) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
// See https://kokkos.org/LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//@HEADER
// The function "test_view_collection" exposes the copy constructor
// and destructor overheads in Kokkos View objects
// Please see the lines marked by "NOTE".
#include <limits>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/time.h>
#include <Kokkos_Core.hpp>
#include <iostream>
#include <cmath>  // std::fabs, used in the success checks below
// NVIEWS is the number of Kokkos View objects in our ViewCollection object
// We have chosen a large value of 40 to make it easier to see performance
// differences when using the likelihood attribute
#define NVIEWS 40
class ViewCollection {
public:
Kokkos::View<double*> v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28,
v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40;
double m_expected_sum;
double m_side_effect;
int m_N;
ViewCollection(int N)
: v1("v1", N),
v2("v2", N),
v3("v3", N),
v4("v4", N),
v5("v5", N),
v6("v6", N),
v7("v7", N),
v8("v8", N),
v9("v9", N),
v10("v10", N),
v11("v11", N),
v12("v12", N),
v13("v13", N),
v14("v14", N),
v15("v15", N),
v16("v16", N),
v17("v17", N),
v18("v18", N),
v19("v19", N),
v20("v20", N),
v21("v21", N),
v22("v22", N),
v23("v23", N),
v24("v24", N),
v25("v25", N),
v26("v26", N),
v27("v27", N),
v28("v28", N),
v29("v29", N),
v30("v30", N),
v31("v31", N),
v32("v32", N),
v33("v33", N),
v34("v34", N),
v35("v35", N),
v36("v36", N),
v37("v37", N),
v38("v38", N),
v39("v39", N),
v40("v40", N),
m_expected_sum(N * NVIEWS),
m_side_effect(0.0),
m_N(N) {
for (int i = 0; i < N; ++i) {
v1(i) = 1;
v2(i) = 1;
v3(i) = 1;
v4(i) = 1;
v5(i) = 1;
v6(i) = 1;
v7(i) = 1;
v8(i) = 1;
v9(i) = 1;
v10(i) = 1;
v11(i) = 1;
v12(i) = 1;
v13(i) = 1;
v14(i) = 1;
v15(i) = 1;
v16(i) = 1;
v17(i) = 1;
v18(i) = 1;
v19(i) = 1;
v20(i) = 1;
v21(i) = 1;
v22(i) = 1;
v23(i) = 1;
v24(i) = 1;
v25(i) = 1;
v26(i) = 1;
v27(i) = 1;
v28(i) = 1;
v29(i) = 1;
v30(i) = 1;
v31(i) = 1;
v32(i) = 1;
v33(i) = 1;
v34(i) = 1;
v35(i) = 1;
v36(i) = 1;
v37(i) = 1;
v38(i) = 1;
v39(i) = 1;
v40(i) = 1;
}
}
// The ADD_COPY_CONSTRUCTOR macro is helpful to compare time in the copy
// constructor between compilers. We have found that the GNU compiler
// is sometimes able to inline the default copy constructor.
#ifdef ADD_COPY_CONSTRUCTOR
__attribute__((noinline)) ViewCollection(const ViewCollection& other)
: v1(other.v1),
v2(other.v2),
v3(other.v3),
v4(other.v4),
v5(other.v5),
v6(other.v6),
v7(other.v7),
v8(other.v8),
v9(other.v9),
v10(other.v10),
v11(other.v11),
v12(other.v12),
v13(other.v13),
v14(other.v14),
v15(other.v15),
v16(other.v16),
v17(other.v17),
v18(other.v18),
v19(other.v19),
v20(other.v20),
v21(other.v21),
v22(other.v22),
v23(other.v23),
v24(other.v24),
v25(other.v25),
v26(other.v26),
v27(other.v27),
v28(other.v28),
v29(other.v29),
v30(other.v30),
v31(other.v31),
v32(other.v32),
v33(other.v33),
v34(other.v34),
v35(other.v35),
v36(other.v36),
v37(other.v37),
v38(other.v38),
v39(other.v39),
v40(other.v40),
m_expected_sum(other.m_expected_sum),
m_side_effect(other.m_side_effect),
m_N(other.m_N) {}
#endif
KOKKOS_INLINE_FUNCTION
double sum_views(int ii, bool execute_kernel) {
double result = 0.0;
if (execute_kernel) {
// This code is only executed when using the command line option -k
// The computation references all Kokkos views. This may help our
// effort to stop compilers from optimizing away the Kokkos views
for (int i = 0; i < m_N; ++i) {
result += v1(i) + v2(i) + v3(i) + v4(i) + v5(i) + v6(i) + v7(i) +
v8(i) + v9(i) + v10(i) + v11(i) + v12(i) + v13(i) + v14(i) +
v15(i) + v16(i) + v17(i) + v18(i) + v19(i) + v20(i) + v21(i) +
v22(i) + v23(i) + v24(i) + v25(i) + v26(i) + v27(i) + v28(i) +
v29(i) + v30(i) + v31(i) + v32(i) + v33(i) + v34(i) + v35(i) +
v36(i) + v37(i) + v38(i) + v39(i) + v40(i);
}
} else {
result = m_expected_sum;
}
// This statement introduces a side effect that may help our effort to
// stop compilers from optimizing away the temporary ViewCollection object
m_side_effect = result * (ii + 1);
return result;
}
};
void test_view_collection_kk(int N, int num_iter, bool execute_kernel) {
ViewCollection view_collection(N);
Kokkos::Timer view_collection_timer;
double max_value = 0.0;
// Max Reduction boilerplate code taken from slide 53 of
// kokkos-tutorials/LectureSeries/KokkosTutorial_02_ViewsAndSpaces.pdf
Kokkos::parallel_reduce(
"collection-reduction", num_iter,
KOKKOS_LAMBDA(int i, double& valueToUpdate) {
// NOTE: The following lines expose the Kokkos View overheads
ViewCollection tmp_view_collection = view_collection;
double my_value = tmp_view_collection.sum_views(i, execute_kernel);
if (my_value > valueToUpdate) valueToUpdate = my_value;
},
Kokkos::Max<double>(max_value));
double view_collection_time = view_collection_timer.seconds();
bool success = std::fabs(max_value - N * NVIEWS) < 1.E-6;
std::cout << "View Time = " << view_collection_time << " seconds"
<< std::endl;
if (success) {
std::cout << "Kokkos run:" << std::endl;
std::cout << "SUCCESS" << std::endl;
} else {
std::cout << "FAILURE" << std::endl;
}
}
void test_view_collection_serial(int N, int num_iter, bool execute_kernel) {
ViewCollection view_collection(N);
Kokkos::Timer view_collection_timer;
double max_value = 0.0;
// Max Reduction boilerplate code taken from slide 53 of
// kokkos-tutorials/LectureSeries/KokkosTutorial_02_ViewsAndSpaces.pdf
for (int i = 0; i < num_iter; ++i) {
// NOTE: The following lines expose the Kokkos View overheads
ViewCollection tmp_view_collection = view_collection;
double my_value = tmp_view_collection.sum_views(i, execute_kernel);
if (my_value > max_value) max_value = my_value;
}
double view_collection_time = view_collection_timer.seconds();
bool success = std::fabs(max_value - N * NVIEWS) < 1.E-6;
std::cout << "View Time 2 = " << view_collection_time << " seconds"
<< std::endl;
if (success) {
std::cout << "Serial run:" << std::endl;
std::cout << "SUCCESS" << std::endl;
} else {
std::cout << "FAILURE" << std::endl;
}
}
int main(int argc, char* argv[]) {
// The benchmark is only testing reference counting for views on host.
#if defined(KOKKOS_ENABLE_OPENMP) || defined(KOKKOS_ENABLE_SERIAL) || \
defined(KOKKOS_ENABLE_THREADS) || defined(KOKKOS_ENABLE_HPX)
int N = 1;
int num_iter = 1 << 27;
bool execute_kernel = false;
for (int i = 0; i < argc; i++) {
// Only consume a value when one actually follows the flag
if ((strcmp(argv[i], "-N") == 0) && (i + 1 < argc)) {
N = atoi(argv[++i]);
if (N < 1) {
std::cout << "Array extent must be >= 1" << std::endl;
exit(1);
}
} else if ((strcmp(argv[i], "-i") == 0) && (i + 1 < argc)) {
num_iter = atoi(argv[++i]);
if (num_iter < 1) {
std::cout << "Number of iterations must be >= 1" << std::endl;
exit(1);
}
} else if (strcmp(argv[i], "-k") == 0) {
execute_kernel = true;
} else if ((strcmp(argv[i], "-h") == 0)) {
printf(" Options:\n");
printf(" -N <int>: Array extent\n");
printf(" -i <int>: Number of iterations\n");
printf(" -k: Execute the summation kernel\n");
printf(" -h: Print this message\n\n");
exit(1);
}
}
std::cout << "Array extent = " << N << std::endl;
std::cout << "Iterations = " << num_iter << std::endl;
std::cout << "Execute summation kernel = " << std::boolalpha << execute_kernel
<< std::noboolalpha << std::endl;
// Test inside a Kokkos kernel.
Kokkos::initialize(argc, argv);
{ test_view_collection_kk(N, num_iter, execute_kernel); }
// Test outside Kokkos kernel.
test_view_collection_serial(N, num_iter, execute_kernel);
Kokkos::finalize();
#endif
return 0;
}
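
The benchmark above times how long it takes to copy and destroy an object holding 40 reference-counted views. The following is a minimal sketch of the same pattern without Kokkos, assuming `std::shared_ptr` as a stand-in for a reference-counted handle. `HandleCollection`, `copy_and_sum`, and `owners_of_first` are hypothetical names introduced for illustration, and nothing here reflects how `Kokkos::View` is implemented.

```cpp
// Stand-in sketch: std::shared_ptr plays the role of a reference-counted
// handle. This is NOT Kokkos::View and makes no claim about its internals.
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t kHandles = 40;  // mirrors NVIEWS in the benchmark

struct HandleCollection {
  std::array<std::shared_ptr<double>, kHandles> handles;

  // Each handle owns a single double initialized to 1.0
  HandleCollection() {
    for (auto& h : handles) h = std::make_shared<double>(1.0);
  }

  double sum() const {
    double result = 0.0;
    for (const auto& h : handles) result += *h;
    return result;
  }
};

// Copies the collection num_iter times and returns the largest sum seen.
// Every copy increments kHandles reference counts, and every destruction
// decrements them again; that bookkeeping is the overhead of interest.
double copy_and_sum(const HandleCollection& original, int num_iter) {
  assert(num_iter >= 0);
  double max_value = 0.0;
  for (int i = 0; i < num_iter; ++i) {
    HandleCollection tmp = original;  // copy: reference counts go up
    double value = tmp.sum();
    if (value > max_value) max_value = value;
    // tmp is destroyed at the end of each iteration: counts go back down
  }
  return max_value;
}

// Number of owners of the first handle
long owners_of_first(const HandleCollection& c) {
  return c.handles[0].use_count();
}
```

Calling `copy_and_sum` on a default-constructed `HandleCollection` performs the copy and destroy cycle once per iteration, and `owners_of_first` can be used afterwards to check that no temporary copies are still alive.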

View File

@ -233,7 +233,7 @@ do
cuda_args="$cuda_args $1" cuda_args="$cuda_args $1"
;; ;;
#Handle more known nvcc args #Handle more known nvcc args
--extended-lambda|--expt-extended-lambda|--expt-relaxed-constexpr|--Wno-deprecated-gpu-targets|-Wno-deprecated-gpu-targets|-allow-unsupported-compiler|--allow-unsupported-compiler) --extended-lambda|--expt-extended-lambda|--expt-relaxed-constexpr|--Wno-deprecated-gpu-targets|-Wno-deprecated-gpu-targets|-allow-unsupported-compiler|--allow-unsupported-compiler|--disable-warnings)
cuda_args="$cuda_args $1" cuda_args="$cuda_args $1"
;; ;;
#Handle known nvcc args that have an argument #Handle known nvcc args that have an argument

View File

@ -1,6 +1,5 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES( TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC DLlib LIB_OPTIONAL_TPLS Pthread CUDA HWLOC DLlib
TEST_OPTIONAL_TPLS CUSPARSE
) )
TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib) TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)

View File

@ -225,8 +225,13 @@ FUNCTION(kokkos_compilation)
# if built w/o CUDA support, we want to basically make this a no-op # if built w/o CUDA support, we want to basically make this a no-op
SET(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@) SET(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)
IF(CMAKE_VERSION VERSION_GREATER_EQUAL 3.17)
SET(MAYBE_CURRENT_INSTALLATION_ROOT "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/../../..")
ENDIF()
# search relative first and then absolute # search relative first and then absolute
SET(_HINTS "${CMAKE_CURRENT_LIST_DIR}/../.." "@CMAKE_INSTALL_PREFIX@") SET(_HINTS "${MAYBE_CURRENT_INSTALLATION_ROOT}" "@CMAKE_INSTALL_PREFIX@")
# find kokkos_launch_compiler # find kokkos_launch_compiler
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER

View File

@ -52,6 +52,8 @@
#cmakedefine KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION // deprecated #cmakedefine KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION // deprecated
#cmakedefine KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION #cmakedefine KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION
#cmakedefine KOKKOS_ENABLE_IMPL_MDSPAN #cmakedefine KOKKOS_ENABLE_IMPL_MDSPAN
#cmakedefine KOKKOS_ENABLE_IMPL_REF_COUNT_BRANCH_UNLIKELY
#cmakedefine KOKKOS_ENABLE_IMPL_VIEW_OF_VIEWS_DESTRUCTOR_PRECONDITION_VIOLATION_WORKAROUND
#cmakedefine KOKKOS_ENABLE_ATOMICS_BYPASS #cmakedefine KOKKOS_ENABLE_ATOMICS_BYPASS
/* TPL Settings */ /* TPL Settings */
@ -65,6 +67,7 @@
#cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX #cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX
#cmakedefine KOKKOS_ARCH_ARMV81 #cmakedefine KOKKOS_ARCH_ARMV81
#cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX2 #cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX2
#cmakedefine KOKKOS_ARCH_ARMV9_GRACE
#cmakedefine KOKKOS_ARCH_A64FX #cmakedefine KOKKOS_ARCH_A64FX
#cmakedefine KOKKOS_ARCH_AVX #cmakedefine KOKKOS_ARCH_AVX
#cmakedefine KOKKOS_ARCH_AVX2 #cmakedefine KOKKOS_ARCH_AVX2
@ -116,7 +119,6 @@
#cmakedefine KOKKOS_ARCH_AMD_GFX942 #cmakedefine KOKKOS_ARCH_AMD_GFX942
#cmakedefine KOKKOS_ARCH_AMD_GFX1030 #cmakedefine KOKKOS_ARCH_AMD_GFX1030
#cmakedefine KOKKOS_ARCH_AMD_GFX1100 #cmakedefine KOKKOS_ARCH_AMD_GFX1100
#cmakedefine KOKKOS_ARCH_AMD_GFX1103
#cmakedefine KOKKOS_ARCH_AMD_GPU #cmakedefine KOKKOS_ARCH_AMD_GPU
#cmakedefine KOKKOS_ARCH_VEGA // deprecated #cmakedefine KOKKOS_ARCH_VEGA // deprecated
#cmakedefine KOKKOS_ARCH_VEGA906 // deprecated #cmakedefine KOKKOS_ARCH_VEGA906 // deprecated
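
The new `KOKKOS_ENABLE_IMPL_REF_COUNT_BRANCH_UNLIKELY` define suggests a configure-time switch for a branch hint in reference counting, but this diff does not show how Kokkos consumes it. The following is a hypothetical sketch of how a macro of this kind could gate the C++20 `[[unlikely]]` attribute. `REF_COUNT_BRANCH_UNLIKELY`, `SKETCH_UNLIKELY`, and `TrackedHandle` are illustrative names, not Kokkos identifiers, and the choice of which branch to mark is an assumption.

```cpp
// Hypothetical sketch: none of these names come from Kokkos.
#include <cassert>
#include <cstdint>

// Expand to [[unlikely]] only when the (assumed) switch is defined and the
// compiler reports support for the attribute; otherwise expand to nothing.
#if defined(REF_COUNT_BRANCH_UNLIKELY) && defined(__has_cpp_attribute)
#if __has_cpp_attribute(unlikely)
#define SKETCH_UNLIKELY [[unlikely]]
#endif
#endif
#ifndef SKETCH_UNLIKELY
#define SKETCH_UNLIKELY
#endif

struct TrackedHandle {
  std::int64_t* count = nullptr;  // nullptr means the handle is untracked

  // Increment the shared counter when the handle is tracked
  void acquire() const {
    if (count != nullptr) SKETCH_UNLIKELY { ++*count; }
  }

  // Decrement the shared counter when the handle is tracked
  void release() const {
    if (count != nullptr) SKETCH_UNLIKELY {
      assert(*count > 0);
      --*count;
    }
  }
};
```

With the switch undefined, the macro expands to nothing and the code behaves identically; the hint only affects how the compiler lays out the branch.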

View File

@ -7,37 +7,38 @@ IF (NOT CUDAToolkit_ROOT)
ENDIF() ENDIF()
ENDIF() ENDIF()
# FIXME CMake 3.28.4 creates more targets than we export IF(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC AND CMAKE_VERSION VERSION_LESS "3.20.1")
IF(CMAKE_VERSION VERSION_GREATER_EQUAL "3.17.0" AND CMAKE_VERSION VERSION_LESS "3.28.4") MESSAGE(FATAL_ERROR "Using NVHPC as host compiler requires at least CMake 3.20.1")
find_package(CUDAToolkit)
ELSE()
include(${CMAKE_CURRENT_LIST_DIR}/CudaToolkit.cmake)
ENDIF() ENDIF()
IF(CMAKE_VERSION VERSION_GREATER_EQUAL "3.17.0")
IF (TARGET CUDA::cudart) find_package(CUDAToolkit REQUIRED)
SET(FOUND_CUDART TRUE)
KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cudart)
ELSE()
SET(FOUND_CUDART FALSE)
ENDIF()
IF (TARGET CUDA::cuda_driver)
SET(FOUND_CUDA_DRIVER TRUE)
KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cuda_driver)
ELSE()
SET(FOUND_CUDA_DRIVER FALSE)
ENDIF()
include(FindPackageHandleStandardArgs)
IF(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC)
SET(KOKKOS_CUDA_ERROR "Using NVHPC as host compiler requires at least CMake 3.20.1")
ELSE()
SET(KOKKOS_CUDA_ERROR DEFAULT_MSG)
ENDIF()
FIND_PACKAGE_HANDLE_STANDARD_ARGS(TPLCUDA ${KOKKOS_CUDA_ERROR} FOUND_CUDART FOUND_CUDA_DRIVER)
IF (FOUND_CUDA_DRIVER AND FOUND_CUDART)
KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE
LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart
) )
KOKKOS_EXPORT_CMAKE_TPL(CUDAToolkit REQUIRED)
ELSE()
include(${CMAKE_CURRENT_LIST_DIR}/CudaToolkit.cmake)
IF (TARGET CUDA::cudart)
SET(FOUND_CUDART TRUE)
KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cudart)
ELSE()
SET(FOUND_CUDART FALSE)
ENDIF()
IF (TARGET CUDA::cuda_driver)
SET(FOUND_CUDA_DRIVER TRUE)
KOKKOS_EXPORT_IMPORTED_TPL(CUDA::cuda_driver)
ELSE()
SET(FOUND_CUDA_DRIVER FALSE)
ENDIF()
include(FindPackageHandleStandardArgs)
FIND_PACKAGE_HANDLE_STANDARD_ARGS(TPLCUDA ${DEFAULT_MSG} FOUND_CUDART FOUND_CUDA_DRIVER)
IF (FOUND_CUDA_DRIVER AND FOUND_CUDART)
KOKKOS_CREATE_IMPORTED_TPL(CUDA INTERFACE
LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart
)
ENDIF()
ENDIF() ENDIF()

View File

@ -35,7 +35,6 @@ IF(NOT _CUDA_FAILURE)
GLOBAL_SET(TPL_CUDA_LIBRARY_DIRS) GLOBAL_SET(TPL_CUDA_LIBRARY_DIRS)
GLOBAL_SET(TPL_CUDA_INCLUDE_DIRS ${CUDA_TOOLKIT_INCLUDE}) GLOBAL_SET(TPL_CUDA_INCLUDE_DIRS ${CUDA_TOOLKIT_INCLUDE})
GLOBAL_SET(TPL_CUDA_LIBRARIES ${CUDA_CUDART_LIBRARY} ${CUDA_cublas_LIBRARY} ${CUDA_cufft_LIBRARY}) GLOBAL_SET(TPL_CUDA_LIBRARIES ${CUDA_CUDART_LIBRARY} ${CUDA_cublas_LIBRARY} ${CUDA_cufft_LIBRARY})
KOKKOS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
ELSE() ELSE()
SET(TPL_ENABLE_CUDA OFF) SET(TPL_ENABLE_CUDA OFF)
ENDIF() ENDIF()

View File

@ -1,26 +0,0 @@
#@HEADER
# ************************************************************************
#
# Kokkos v. 4.0
# Copyright (2022) National Technology & Engineering
# Solutions of Sandia, LLC (NTESS).
#
# Under the terms of Contract DE-NA0003525 with NTESS,
# the U.S. Government retains certain rights in this software.
#
# Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# ************************************************************************
# @HEADER
#include(${TRIBITS_DEPS_DIR}/CUDA.cmake)
#IF (TPL_ENABLE_CUDA)
# GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
# GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
# GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
# KOKKOS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
#ENDIF()

View File

@ -118,14 +118,6 @@ FUNCTION(KOKKOS_ADD_TEST)
ENDIF() ENDIF()
ENDFUNCTION() ENDFUNCTION()
FUNCTION(KOKKOS_ADD_ADVANCED_TEST)
if (KOKKOS_HAS_TRILINOS)
TRIBITS_ADD_ADVANCED_TEST(${ARGN})
else()
# TODO Write this
endif()
ENDFUNCTION()
MACRO(KOKKOS_CREATE_IMPORTED_TPL_LIBRARY TPL_NAME) MACRO(KOKKOS_CREATE_IMPORTED_TPL_LIBRARY TPL_NAME)
ADD_INTERFACE_LIBRARY(TPL_LIB_${TPL_NAME}) ADD_INTERFACE_LIBRARY(TPL_LIB_${TPL_NAME})
TARGET_LINK_LIBRARIES(TPL_LIB_${TPL_NAME} LINK_PUBLIC ${TPL_${TPL_NAME}_LIBRARIES}) TARGET_LINK_LIBRARIES(TPL_LIB_${TPL_NAME} LINK_PUBLIC ${TPL_${TPL_NAME}_LIBRARIES})

View File

@ -28,6 +28,7 @@ KOKKOS_CHECK_DEPRECATED_OPTIONS(
#------------------------------------------------------------------------------- #-------------------------------------------------------------------------------
SET(KOKKOS_ARCH_LIST) SET(KOKKOS_ARCH_LIST)
include(CheckCXXCompilerFlag)
KOKKOS_DEPRECATED_LIST(ARCH ARCH) KOKKOS_DEPRECATED_LIST(ARCH ARCH)
@ -49,6 +50,7 @@ DECLARE_AND_CHECK_HOST_ARCH(ARMV81 "ARMv8.1 Compatible CPU")
DECLARE_AND_CHECK_HOST_ARCH(ARMV8_THUNDERX "ARMv8 Cavium ThunderX CPU") DECLARE_AND_CHECK_HOST_ARCH(ARMV8_THUNDERX "ARMv8 Cavium ThunderX CPU")
DECLARE_AND_CHECK_HOST_ARCH(ARMV8_THUNDERX2 "ARMv8 Cavium ThunderX2 CPU") DECLARE_AND_CHECK_HOST_ARCH(ARMV8_THUNDERX2 "ARMv8 Cavium ThunderX2 CPU")
DECLARE_AND_CHECK_HOST_ARCH(A64FX "ARMv8.2 with SVE Support") DECLARE_AND_CHECK_HOST_ARCH(A64FX "ARMv8.2 with SVE Support")
DECLARE_AND_CHECK_HOST_ARCH(ARMV9_GRACE "ARMv9 NVIDIA Grace CPU")
DECLARE_AND_CHECK_HOST_ARCH(SNB "Intel Sandy/Ivy Bridge CPUs") DECLARE_AND_CHECK_HOST_ARCH(SNB "Intel Sandy/Ivy Bridge CPUs")
DECLARE_AND_CHECK_HOST_ARCH(HSW "Intel Haswell CPUs") DECLARE_AND_CHECK_HOST_ARCH(HSW "Intel Haswell CPUs")
DECLARE_AND_CHECK_HOST_ARCH(BDW "Intel Broadwell Xeon E-class CPUs") DECLARE_AND_CHECK_HOST_ARCH(BDW "Intel Broadwell Xeon E-class CPUs")
@ -101,9 +103,9 @@ LIST(APPEND CORRESPONDING_AMD_FLAGS gfx90a gfx90a gfx908 gfx908)
LIST(APPEND SUPPORTED_AMD_GPUS MI50/60 MI50/60) LIST(APPEND SUPPORTED_AMD_GPUS MI50/60 MI50/60)
LIST(APPEND SUPPORTED_AMD_ARCHS VEGA906 AMD_GFX906) LIST(APPEND SUPPORTED_AMD_ARCHS VEGA906 AMD_GFX906)
LIST(APPEND CORRESPONDING_AMD_FLAGS gfx906 gfx906) LIST(APPEND CORRESPONDING_AMD_FLAGS gfx906 gfx906)
LIST(APPEND SUPPORTED_AMD_GPUS PHOENIX RX7900XTX V620/W6800 V620/W6800) LIST(APPEND SUPPORTED_AMD_GPUS RX7900XTX RX7900XTX V620/W6800 V620/W6800)
LIST(APPEND SUPPORTED_AMD_ARCHS AMD_GFX1103 AMD_GFX1100 NAVI1030 AMD_GFX1030) LIST(APPEND SUPPORTED_AMD_ARCHS NAVI1100 AMD_GFX1100 NAVI1030 AMD_GFX1030)
LIST(APPEND CORRESPONDING_AMD_FLAGS gfx1103 gfx1100 gfx1030 gfx1030) LIST(APPEND CORRESPONDING_AMD_FLAGS gfx1100 gfx1100 gfx1030 gfx1030)
#FIXME CAN BE REPLACED WITH LIST_ZIP IN CMAKE 3.17 #FIXME CAN BE REPLACED WITH LIST_ZIP IN CMAKE 3.17
FOREACH(ARCH IN LISTS SUPPORTED_AMD_ARCHS) FOREACH(ARCH IN LISTS SUPPORTED_AMD_ARCHS)
@ -189,12 +191,6 @@ IF (KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
ELSEIF(CUDAToolkit_BIN_DIR) ELSEIF(CUDAToolkit_BIN_DIR)
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS --cuda-path=${CUDAToolkit_BIN_DIR}/..) GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS --cuda-path=${CUDAToolkit_BIN_DIR}/..)
ENDIF() ENDIF()
ELSEIF (KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
SET(CUDA_ARCH_FLAG "-gpu")
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS -cuda)
IF (KOKKOS_ENABLE_CUDA) # FIXME ideally unreachable when CUDA not enabled
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS -cuda)
ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
SET(CUDA_ARCH_FLAG "-arch") SET(CUDA_ARCH_FLAG "-arch")
ENDIF() ENDIF()
@ -209,6 +205,11 @@ ENDIF()
#------------------------------- KOKKOS_HIP_OPTIONS --------------------------- #------------------------------- KOKKOS_HIP_OPTIONS ---------------------------
KOKKOS_OPTION(IMPL_AMDGPU_FLAGS "" STRING "Set compiler flags for AMD GPUs")
KOKKOS_OPTION(IMPL_AMDGPU_LINK "" STRING "Set linker flags for AMD GPUs")
MARK_AS_ADVANCED(Kokkos_IMPL_AMDGPU_FLAGS)
MARK_AS_ADVANCED(Kokkos_IMPL_AMDGPU_LINK)
#clear anything that might be in the cache #clear anything that might be in the cache
GLOBAL_SET(KOKKOS_AMDGPU_OPTIONS) GLOBAL_SET(KOKKOS_AMDGPU_OPTIONS)
IF(KOKKOS_ENABLE_HIP) IF(KOKKOS_ENABLE_HIP)
@ -301,6 +302,20 @@ IF (KOKKOS_ARCH_A64FX)
) )
ENDIF() ENDIF()
IF (KOKKOS_ARCH_ARMV9_GRACE)
SET(KOKKOS_ARCH_ARM_NEON ON)
check_cxx_compiler_flag("-mcpu=neoverse-n2" COMPILER_SUPPORTS_NEOVERSE_N2)
check_cxx_compiler_flag("-msve-vector-bits=128" COMPILER_SUPPORTS_SVE_VECTOR_BITS)
IF (COMPILER_SUPPORTS_NEOVERSE_N2 AND COMPILER_SUPPORTS_SVE_VECTOR_BITS)
COMPILER_SPECIFIC_FLAGS(
COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
DEFAULT -mcpu=neoverse-n2 -msve-vector-bits=128
)
ELSE()
MESSAGE(WARNING "Compiler does not support ARMv9 Grace architecture")
ENDIF()
ENDIF()
IF (KOKKOS_ARCH_ZEN) IF (KOKKOS_ARCH_ZEN)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
@ -535,17 +550,17 @@ IF (KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC)
SET(KOKKOS_ARCH_AVX512XEON OFF) SET(KOKKOS_ARCH_AVX512XEON OFF)
ENDIF() ENDIF()
# FIXME_NVCC nvcc doesn't seem to support Arm Neon.
IF(KOKKOS_ARCH_ARM_NEON AND KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
UNSET(KOKKOS_ARCH_ARM_NEON)
ENDIF()
IF (NOT KOKKOS_COMPILE_LANGUAGE STREQUAL CUDA) IF (NOT KOKKOS_COMPILE_LANGUAGE STREQUAL CUDA)
IF (KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE) IF (KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
Clang -fcuda-rdc Clang -fcuda-rdc
NVIDIA --relocatable-device-code=true NVIDIA --relocatable-device-code=true
NVHPC -gpu=rdc
) )
ELSEIF(KOKKOS_ENABLE_CUDA)
COMPILER_SPECIFIC_FLAGS(
NVHPC -gpu=nordc
)
ENDIF() ENDIF()
ENDIF() ENDIF()
@ -571,7 +586,7 @@ IF (KOKKOS_ENABLE_HIP)
COMPILER_SPECIFIC_FLAGS( COMPILER_SPECIFIC_FLAGS(
DEFAULT -fgpu-rdc DEFAULT -fgpu-rdc
) )
IF (NOT KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC) IF (NOT KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC AND NOT KOKKOS_IMPL_AMDGPU_FLAGS)
COMPILER_SPECIFIC_LINK_OPTIONS( COMPILER_SPECIFIC_LINK_OPTIONS(
DEFAULT --hip-link DEFAULT --hip-link
) )
@ -654,15 +669,9 @@ FUNCTION(CHECK_CUDA_ARCH ARCH FLAG)
IF(KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE) IF(KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE)
SET(CMAKE_CUDA_ARCHITECTURES ${KOKKOS_CUDA_ARCHITECTURES} PARENT_SCOPE) SET(CMAKE_CUDA_ARCHITECTURES ${KOKKOS_CUDA_ARCHITECTURES} PARENT_SCOPE)
ELSE() ELSE()
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS "${CUDA_ARCH_FLAG}=${FLAG}")
STRING(REPLACE "sm_" "cc" NVHPC_CUDA_ARCH ${FLAG}) IF(KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE OR KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS "${CUDA_ARCH_FLAG}=${NVHPC_CUDA_ARCH}") GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${CUDA_ARCH_FLAG}=${FLAG}")
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${CUDA_ARCH_FLAG}=${NVHPC_CUDA_ARCH}")
ELSE()
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS "${CUDA_ARCH_FLAG}=${FLAG}")
IF(KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE OR KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${CUDA_ARCH_FLAG}=${FLAG}")
ENDIF()
ENDIF() ENDIF()
ENDIF() ENDIF()
ENDIF() ENDIF()
@ -704,14 +713,16 @@ FUNCTION(CHECK_AMDGPU_ARCH ARCH FLAG)
MESSAGE(WARNING "Given AMD GPU architecture ${ARCH}, but Kokkos_ENABLE_HIP, Kokkos_ENABLE_SYCL, Kokkos_ENABLE_OPENACC, and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.") MESSAGE(WARNING "Given AMD GPU architecture ${ARCH}, but Kokkos_ENABLE_HIP, Kokkos_ENABLE_SYCL, Kokkos_ENABLE_OPENACC, and Kokkos_ENABLE_OPENMPTARGET are OFF. Option will be ignored.")
UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE) UNSET(KOKKOS_ARCH_${ARCH} PARENT_SCOPE)
ELSE() ELSE()
IF(KOKKOS_ENABLE_HIP) IF(KOKKOS_ENABLE_HIP)
SET(KOKKOS_HIP_ARCHITECTURES ${FLAG} PARENT_SCOPE) SET(KOKKOS_HIP_ARCHITECTURES ${FLAG} PARENT_SCOPE)
ENDIF() ENDIF()
SET(KOKKOS_AMDGPU_ARCH_FLAG ${FLAG} PARENT_SCOPE) IF(NOT KOKKOS_IMPL_AMDGPU_FLAGS)
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}") SET(KOKKOS_AMDGPU_ARCH_FLAG ${FLAG} PARENT_SCOPE)
IF(KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE) GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}")
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}") ENDIF()
ENDIF() IF(KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE)
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${AMDGPU_ARCH_FLAG}=${FLAG}")
ENDIF()
ENDIF() ENDIF()
ENDIF() ENDIF()
ENDFUNCTION() ENDFUNCTION()
@ -724,6 +735,15 @@ FOREACH(ARCH IN LISTS SUPPORTED_AMD_ARCHS)
CHECK_AMDGPU_ARCH(${ARCH} ${FLAG}) CHECK_AMDGPU_ARCH(${ARCH} ${FLAG})
ENDFOREACH() ENDFOREACH()
IF(KOKKOS_IMPL_AMDGPU_FLAGS)
IF (NOT AMDGPU_ARCH_ALREADY_SPECIFIED)
MESSAGE(FATAL_ERROR "When IMPL_AMDGPU_FLAGS is set the architecture autodetection is disabled. "
"Please explicitly set the GPU architecture.")
ENDIF()
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS "${KOKKOS_IMPL_AMDGPU_FLAGS}")
GLOBAL_APPEND(KOKKOS_LINK_OPTIONS "${KOKKOS_IMPL_AMDGPU_LINK}")
ENDIF()
MACRO(SET_AND_CHECK_AMD_ARCH ARCH FLAG) MACRO(SET_AND_CHECK_AMD_ARCH ARCH FLAG)
KOKKOS_SET_OPTION(ARCH_${ARCH} ON) KOKKOS_SET_OPTION(ARCH_${ARCH} ON)
CHECK_AMDGPU_ARCH(${ARCH} ${FLAG}) CHECK_AMDGPU_ARCH(${ARCH} ${FLAG})
@ -984,7 +1004,7 @@ IF (KOKKOS_ARCH_HOPPER90)
ENDIF() ENDIF()
#HIP detection of gpu arch #HIP detection of gpu arch
IF(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED) IF(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED AND NOT KOKKOS_IMPL_AMDGPU_FLAGS)
FIND_PROGRAM(ROCM_ENUMERATOR rocm_agent_enumerator) FIND_PROGRAM(ROCM_ENUMERATOR rocm_agent_enumerator)
IF(NOT ROCM_ENUMERATOR) IF(NOT ROCM_ENUMERATOR)
MESSAGE(FATAL_ERROR "Autodetection of AMD GPU architecture not possible as " MESSAGE(FATAL_ERROR "Autodetection of AMD GPU architecture not possible as "

View File

@ -42,12 +42,8 @@ IF(Kokkos_ENABLE_CUDA)
# If launcher was found and nvcc_wrapper was not specified as # If launcher was found and nvcc_wrapper was not specified as
# compiler and `CMAKE_CXX_COMPILER_LAUNCHER` is not set, set to use launcher. # compiler and `CMAKE_CXX_COMPILER_LAUNCHER` is not set, set to use launcher.
# Will ensure CMAKE_CXX_COMPILER is replaced by nvcc_wrapper # Will ensure CMAKE_CXX_COMPILER is replaced by nvcc_wrapper
IF(Kokkos_COMPILE_LAUNCHER AND NOT INTERNAL_HAVE_COMPILER_NVCC AND NOT KOKKOS_CXX_COMPILER_ID STREQUAL Clang IF(Kokkos_COMPILE_LAUNCHER AND NOT INTERNAL_HAVE_COMPILER_NVCC AND NOT KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
AND NOT (Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC))
IF(CMAKE_CXX_COMPILER_LAUNCHER) IF(CMAKE_CXX_COMPILER_LAUNCHER)
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
MESSAGE(STATUS "Using nvc++ as device compiler requires Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON!")
ENDIF()
MESSAGE(FATAL_ERROR "Cannot use CMAKE_CXX_COMPILER_LAUNCHER if the CMAKE_CXX_COMPILER is not able to compile CUDA code, i.e. nvcc_wrapper or clang++!") MESSAGE(FATAL_ERROR "Cannot use CMAKE_CXX_COMPILER_LAUNCHER if the CMAKE_CXX_COMPILER is not able to compile CUDA code, i.e. nvcc_wrapper or clang++!")
ENDIF() ENDIF()
# the first argument to launcher is always the C++ compiler defined by cmake # the first argument to launcher is always the C++ compiler defined by cmake
@ -149,56 +145,85 @@ IF(KOKKOS_CXX_COMPILER_ID STREQUAL Fujitsu)
ENDIF() ENDIF()
# Enforce the minimum compilers supported by Kokkos. # Enforce the minimum compilers supported by Kokkos.
SET(KOKKOS_MESSAGE_TEXT "Compiler not supported by Kokkos. Required compiler versions:") IF(NOT CMAKE_CXX_STANDARD)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CPU) 8.0.0 or higher") SET(CMAKE_CXX_STANDARD 17)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CUDA) 10.0.0 or higher") ENDIF()
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(OpenMPTarget) 15.0.0 or higher") IF(CMAKE_CXX_STANDARD EQUAL 17)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC 8.2.0 or higher") SET(KOKKOS_CLANG_CPU_MINIMUM 8.0.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel 19.0.5 or higher") SET(KOKKOS_CLANG_CUDA_MINIMUM 10.0.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(CPU) 2021.1.1 or higher") SET(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(SYCL) 2023.0.0 or higher") SET(KOKKOS_GCC_MINIMUM 8.2.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC 11.0.0 or higher") SET(KOKKOS_INTEL_MINIMUM 19.0.5)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n HIPCC 5.2.0 or higher") SET(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2021.1.1)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVHPC/PGI 22.3 or higher") SET(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n MSVC 19.29 or higher") SET(KOKKOS_NVCC_MINIMUM 11.0.0)
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n XL/XLClang not supported") SET(KOKKOS_HIPCC_MINIMUM 5.2.0)
SET(KOKKOS_NVHPC_MINIMUM 22.3)
SET(KOKKOS_MSVC_MINIMUM 19.29)
ELSE()
SET(KOKKOS_CLANG_CPU_MINIMUM 14.0.0)
SET(KOKKOS_CLANG_CUDA_MINIMUM 14.0.0)
SET(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
SET(KOKKOS_GCC_MINIMUM 10.1.0)
SET(KOKKOS_INTEL_MINIMUM "not supported")
SET(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2022.0.0)
SET(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
SET(KOKKOS_NVCC_MINIMUM 12.0.0)
SET(KOKKOS_HIPCC_MINIMUM 5.2.0)
SET(KOKKOS_NVHPC_MINIMUM 22.3)
SET(KOKKOS_MSVC_MINIMUM 19.30)
ENDIF()
SET(KOKKOS_MESSAGE_TEXT "Compiler not supported by Kokkos for C++${CMAKE_CXX_STANDARD}. Required minimum compiler versions:")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CPU) ${KOKKOS_CLANG_CPU_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CUDA) ${KOKKOS_CLANG_CUDA_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(OpenMPTarget) ${KOKKOS_CLANG_OPENMPTARGET_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC ${KOKKOS_GCC_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel ${KOKKOS_INTEL_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(CPU) ${KOKKOS_INTEL_LLVM_CPU_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(SYCL) ${KOKKOS_INTEL_LLVM_SYCL_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC ${KOKKOS_NVCC_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n HIPCC ${KOKKOS_HIPCC_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVHPC/PGI ${KOKKOS_NVHPC_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n MSVC ${KOKKOS_MSVC_MINIMUM}")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n XL/XLClang not supported")
SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\nCompiler: ${KOKKOS_CXX_COMPILER_ID} ${KOKKOS_CXX_COMPILER_VERSION}\n") SET(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\nCompiler: ${KOKKOS_CXX_COMPILER_ID} ${KOKKOS_CXX_COMPILER_VERSION}\n")
IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND NOT Kokkos_ENABLE_CUDA) IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND NOT Kokkos_ENABLE_CUDA)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 8.0.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_CLANG_CPU_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND Kokkos_ENABLE_CUDA) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND Kokkos_ENABLE_CUDA)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 10.0.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_CLANG_CUDA_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL GNU) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 8.2.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_GCC_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Intel) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 19.0.5) IF((NOT CMAKE_CXX_STANDARD EQUAL 17) OR (KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_MINIMUM}))
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND NOT Kokkos_ENABLE_SYCL) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND NOT Kokkos_ENABLE_SYCL)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 2021.1.1) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_LLVM_CPU_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND Kokkos_ENABLE_SYCL) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND Kokkos_ENABLE_SYCL)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 2023.0.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_LLVM_SYCL_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 11.0.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_NVCC_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Kokkos turns off CXX extensions" FORCE) SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "Kokkos turns off CXX extensions" FORCE)
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL HIPCC)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 5.2.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_HIPCC_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL PGI OR KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL PGI OR KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 22.3) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_NVHPC_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
# Treat PGI internally as NVHPC to simplify handling both compilers. # Treat PGI internally as NVHPC to simplify handling both compilers.
@ -206,13 +231,13 @@ ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL PGI OR KOKKOS_CXX_COMPILER_ID STREQUAL NV
# backward-compatible to pgc++. # backward-compatible to pgc++.
SET(KOKKOS_CXX_COMPILER_ID NVHPC CACHE STRING INTERNAL FORCE) SET(KOKKOS_CXX_COMPILER_ID NVHPC CACHE STRING INTERNAL FORCE)
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC") ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC")
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 19.29) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_MSVC_MINIMUM})
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL XL OR KOKKOS_CXX_COMPILER_ID STREQUAL XLClang) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL XL OR KOKKOS_CXX_COMPILER_ID STREQUAL XLClang)
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND Kokkos_ENABLE_OPENMPTARGET) ELSEIF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang AND Kokkos_ENABLE_OPENMPTARGET)
IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 15.0.0) IF(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS KOKKOS_CLANG_OPENMPTARGET_MINIMUM)
MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}") MESSAGE(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
ENDIF() ENDIF()
ENDIF() ENDIF()


@ -75,8 +75,12 @@ KOKKOS_ENABLE_OPTION(IMPL_HIP_UNIFIED_MEMORY OFF "Whether to leverage unified me
# This option will go away eventually, but allows fallback to old implementation when needed. # This option will go away eventually, but allows fallback to old implementation when needed.
KOKKOS_ENABLE_OPTION(DESUL_ATOMICS_EXTERNAL OFF "Whether to use an external desul installation") KOKKOS_ENABLE_OPTION(DESUL_ATOMICS_EXTERNAL OFF "Whether to use an external desul installation")
KOKKOS_ENABLE_OPTION(ATOMICS_BYPASS OFF "**NOT RECOMMENDED** Whether to make atomics non-atomic for non-threaded MPI-only use cases") KOKKOS_ENABLE_OPTION(ATOMICS_BYPASS OFF "**NOT RECOMMENDED** Whether to make atomics non-atomic for non-threaded MPI-only use cases")
KOKKOS_ENABLE_OPTION(IMPL_REF_COUNT_BRANCH_UNLIKELY ON "Whether to use the C++20 `[[unlikely]]` attribute in the view reference counting")
mark_as_advanced(Kokkos_ENABLE_IMPL_REF_COUNT_BRANCH_UNLIKELY)
KOKKOS_ENABLE_OPTION(IMPL_VIEW_OF_VIEWS_DESTRUCTOR_PRECONDITION_VIOLATION_WORKAROUND OFF "Whether to enable a workaround for invalid use of View of Views that causes program hang on destruction.")
mark_as_advanced(Kokkos_ENABLE_IMPL_VIEW_OF_VIEWS_DESTRUCTOR_PRECONDITION_VIOLATION_WORKAROUND)
KOKKOS_ENABLE_OPTION(IMPL_MDSPAN OFF "Whether to enable experimental mdspan support") KOKKOS_ENABLE_OPTION(IMPL_MDSPAN ON "Whether to enable experimental mdspan support")
KOKKOS_ENABLE_OPTION(MDSPAN_EXTERNAL OFF BOOL "Whether to use an external version of mdspan") KOKKOS_ENABLE_OPTION(MDSPAN_EXTERNAL OFF BOOL "Whether to use an external version of mdspan")
KOKKOS_ENABLE_OPTION(IMPL_SKIP_COMPILER_MDSPAN ON BOOL "Whether to use an internal version of mdspan even if the compiler supports mdspan") KOKKOS_ENABLE_OPTION(IMPL_SKIP_COMPILER_MDSPAN ON BOOL "Whether to use an internal version of mdspan even if the compiler supports mdspan")
mark_as_advanced(Kokkos_ENABLE_IMPL_MDSPAN) mark_as_advanced(Kokkos_ENABLE_IMPL_MDSPAN)


@ -709,7 +709,12 @@ MACRO(kokkos_find_imported NAME)
ENDIF() ENDIF()
IF (NOT TPL_LIBRARY_SUFFIXES) IF (NOT TPL_LIBRARY_SUFFIXES)
SET(TPL_LIBRARY_SUFFIXES lib lib64) SET(TPL_LIBRARY_SUFFIXES lib)
IF(KOKKOS_IMPL_32BIT)
LIST(APPEND TPL_LIBRARY_SUFFIXES lib32)
ELSE()
LIST(APPEND TPL_LIBRARY_SUFFIXES lib64)
ENDIF()
ENDIF() ENDIF()
SET(${NAME}_INCLUDE_DIRS) SET(${NAME}_INCLUDE_DIRS)


@ -124,12 +124,8 @@ IF(KOKKOS_ENABLE_CUDA)
ELSEIF(CMAKE_CXX_EXTENSIONS) ELSEIF(CMAKE_CXX_EXTENSIONS)
MESSAGE(FATAL_ERROR "Compiling CUDA code with clang doesn't support C++ extensions. Set -DCMAKE_CXX_EXTENSIONS=OFF") MESSAGE(FATAL_ERROR "Compiling CUDA code with clang doesn't support C++ extensions. Set -DCMAKE_CXX_EXTENSIONS=OFF")
ENDIF() ENDIF()
ELSEIF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA AND NOT (Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)) ELSEIF(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
IF(KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) MESSAGE(FATAL_ERROR "Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang or use kokkos_launch_compiler, but compiler ID was ${KOKKOS_CXX_COMPILER_ID}")
MESSAGE(FATAL_ERROR "Invalid compiler for CUDA. To allow nvc++ as Cuda compiler, Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON must be set!")
ELSE()
MESSAGE(FATAL_ERROR "Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang or NVC++ or use kokkos_launch_compiler, but compiler ID was ${KOKKOS_CXX_COMPILER_ID}")
ENDIF()
ENDIF() ENDIF()
ENDIF() ENDIF()


@ -103,13 +103,19 @@ if (Kokkos_ENABLE_IMPL_MDSPAN AND Kokkos_ENABLE_MDSPAN_EXTERNAL)
endif() endif()
IF (Kokkos_ENABLE_OPENMP) IF (Kokkos_ENABLE_OPENMP)
find_package(OpenMP REQUIRED) find_package(OpenMP REQUIRED COMPONENTS CXX)
# FIXME_TRILINOS Trilinos doesn't allow for Kokkos to use find_dependency # FIXME_TRILINOS Trilinos doesn't allow for Kokkos to use find_dependency
# so we just append the flags here instead of linking with the OpenMP target. # so we just append the flags here instead of linking with the OpenMP target.
IF(KOKKOS_HAS_TRILINOS) IF(KOKKOS_HAS_TRILINOS)
COMPILER_SPECIFIC_FLAGS(DEFAULT ${OpenMP_CXX_FLAGS}) COMPILER_SPECIFIC_FLAGS(DEFAULT ${OpenMP_CXX_FLAGS})
ELSE() ELSE()
KOKKOS_EXPORT_CMAKE_TPL(OpenMP REQUIRED) KOKKOS_EXPORT_CMAKE_TPL(OpenMP REQUIRED COMPONENTS CXX)
ENDIF()
IF(Kokkos_ENABLE_HIP AND KOKKOS_COMPILE_LANGUAGE STREQUAL HIP)
GLOBAL_APPEND(KOKKOS_AMDGPU_OPTIONS ${OpenMP_CXX_FLAGS})
ENDIF()
IF(Kokkos_ENABLE_CUDA AND KOKKOS_COMPILE_LANGUAGE STREQUAL CUDA)
GLOBAL_APPEND(KOKKOS_CUDA_OPTIONS -Xcompiler ${OpenMP_CXX_FLAGS})
ENDIF() ENDIF()
ENDIF() ENDIF()


@ -160,6 +160,12 @@ FUNCTION(KOKKOS_ADD_EXECUTABLE_AND_TEST ROOT_NAME)
) )
ENDIF() ENDIF()
ENDIF() ENDIF()
# We noticed problems with -fvisibility=hidden for inline static variables
# if Kokkos was built as shared library.
IF(BUILD_SHARED_LIBS)
SET_PROPERTY(TARGET ${PACKAGE_NAME}_${ROOT_NAME} PROPERTY VISIBILITY_INLINES_HIDDEN ON)
SET_PROPERTY(TARGET ${PACKAGE_NAME}_${ROOT_NAME} PROPERTY CXX_VISIBILITY_PRESET hidden)
ENDIF()
ENDFUNCTION() ENDFUNCTION()
FUNCTION(KOKKOS_SET_EXE_PROPERTY ROOT_NAME) FUNCTION(KOKKOS_SET_EXE_PROPERTY ROOT_NAME)
@ -241,34 +247,6 @@ MACRO(KOKKOS_CONFIGURE_CORE)
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_FwdBackend.hpp "KOKKOS_FWD" "fwd/Kokkos_Fwd" "${KOKKOS_ENABLED_DEVICES}") KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_FwdBackend.hpp "KOKKOS_FWD" "fwd/Kokkos_Fwd" "${KOKKOS_ENABLED_DEVICES}")
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_SetupBackend.hpp "KOKKOS_SETUP" "setup/Kokkos_Setup" "${DEVICE_SETUP_LIST}") KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_SetupBackend.hpp "KOKKOS_SETUP" "setup/Kokkos_Setup" "${DEVICE_SETUP_LIST}")
KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_DeclareBackend.hpp "KOKKOS_DECLARE" "decl/Kokkos_Declare" "${KOKKOS_ENABLED_DEVICES}") KOKKOS_CONFIG_HEADER( KokkosCore_Config_HeaderSet.in KokkosCore_Config_DeclareBackend.hpp "KOKKOS_DECLARE" "decl/Kokkos_Declare" "${KOKKOS_ENABLED_DEVICES}")
SET(_DEFAULT_HOST_MEMSPACE "::Kokkos::HostSpace")
KOKKOS_OPTION(DEFAULT_DEVICE_MEMORY_SPACE "" STRING "Override default device memory space")
KOKKOS_OPTION(DEFAULT_HOST_MEMORY_SPACE "" STRING "Override default host memory space")
KOKKOS_OPTION(DEFAULT_DEVICE_EXECUTION_SPACE "" STRING "Override default device execution space")
KOKKOS_OPTION(DEFAULT_HOST_PARALLEL_EXECUTION_SPACE "" STRING "Override default host parallel execution space")
IF (NOT Kokkos_DEFAULT_DEVICE_EXECUTION_SPACE STREQUAL "")
SET(_DEVICE_PARALLEL ${Kokkos_DEFAULT_DEVICE_EXECUTION_SPACE})
MESSAGE(STATUS "Override default device execution space: ${_DEVICE_PARALLEL}")
SET(KOKKOS_DEVICE_SPACE_ACTIVE ON)
ELSE()
IF (_DEVICE_PARALLEL STREQUAL "NoTypeDefined")
SET(KOKKOS_DEVICE_SPACE_ACTIVE OFF)
ELSE()
SET(KOKKOS_DEVICE_SPACE_ACTIVE ON)
ENDIF()
ENDIF()
IF (NOT Kokkos_DEFAULT_HOST_PARALLEL_EXECUTION_SPACE STREQUAL "")
SET(_HOST_PARALLEL ${Kokkos_DEFAULT_HOST_PARALLEL_EXECUTION_SPACE})
MESSAGE(STATUS "Override default host parallel execution space: ${_HOST_PARALLEL}")
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE ON)
ELSE()
IF (_HOST_PARALLEL STREQUAL "NoTypeDefined")
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE OFF)
ELSE()
SET(KOKKOS_HOSTPARALLEL_SPACE_ACTIVE ON)
ENDIF()
ENDIF()
#We are ready to configure the header
CONFIGURE_FILE(cmake/KokkosCore_config.h.in KokkosCore_config.h @ONLY) CONFIGURE_FILE(cmake/KokkosCore_config.h.in KokkosCore_config.h @ONLY)
ENDMACRO() ENDMACRO()
@ -484,15 +462,10 @@ ENDFUNCTION()
FUNCTION(KOKKOS_LIB_INCLUDE_DIRECTORIES TARGET) FUNCTION(KOKKOS_LIB_INCLUDE_DIRECTORIES TARGET)
IF(KOKKOS_HAS_TRILINOS) KOKKOS_LIB_TYPE(${TARGET} INCTYPE)
#ignore the target, tribits doesn't do anything directly with targets FOREACH(DIR ${ARGN})
TRIBITS_INCLUDE_DIRECTORIES(${ARGN}) TARGET_INCLUDE_DIRECTORIES(${TARGET} ${INCTYPE} $<BUILD_INTERFACE:${DIR}>)
ELSE() #append to a list for later ENDFOREACH()
KOKKOS_LIB_TYPE(${TARGET} INCTYPE)
FOREACH(DIR ${ARGN})
TARGET_INCLUDE_DIRECTORIES(${TARGET} ${INCTYPE} $<BUILD_INTERFACE:${DIR}>)
ENDFOREACH()
ENDIF()
ENDFUNCTION() ENDFUNCTION()
FUNCTION(KOKKOS_LIB_COMPILE_OPTIONS TARGET) FUNCTION(KOKKOS_LIB_COMPILE_OPTIONS TARGET)


@ -1,26 +0,0 @@
#@HEADER
# ************************************************************************
#
# Kokkos v. 4.0
# Copyright (2022) National Technology & Engineering
# Solutions of Sandia, LLC (NTESS).
#
# Under the terms of Contract DE-NA0003525 with NTESS,
# the U.S. Government retains certain rights in this software.
#
# Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
#@HEADER
# Check for CUDA support
IF (NOT TPL_ENABLE_CUDA)
MESSAGE(FATAL_ERROR "\nCUSPARSE requires CUDA")
ELSE()
GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
ENDIF()


@ -944,13 +944,13 @@ class DualView : public ViewTraits<DataType, Properties...> {
if (sizeMismatch) { if (sizeMismatch) {
::Kokkos::realloc(arg_prop, d_view, n0, n1, n2, n3, n4, n5, n6, n7); ::Kokkos::realloc(arg_prop, d_view, n0, n1, n2, n3, n4, n5, n6, n7);
if (alloc_prop_input::initialize) { if constexpr (alloc_prop_input::initialize) {
h_view = create_mirror_view(typename t_host::memory_space(), d_view); h_view = create_mirror_view(typename t_host::memory_space(), d_view);
} else { } else {
h_view = create_mirror_view(Kokkos::WithoutInitializing, h_view = create_mirror_view(Kokkos::WithoutInitializing,
typename t_host::memory_space(), d_view); typename t_host::memory_space(), d_view);
} }
} else if (alloc_prop_input::initialize) { } else if constexpr (alloc_prop_input::initialize) {
if constexpr (alloc_prop_input::has_execution_space) { if constexpr (alloc_prop_input::has_execution_space) {
const auto& exec_space = const auto& exec_space =
Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop); Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop);
@ -1038,12 +1038,10 @@ class DualView : public ViewTraits<DataType, Properties...> {
/* Resize on Device */ /* Resize on Device */
if (sizeMismatch) { if (sizeMismatch) {
::Kokkos::resize(properties, d_view, n0, n1, n2, n3, n4, n5, n6, n7); ::Kokkos::resize(properties, d_view, n0, n1, n2, n3, n4, n5, n6, n7);
if (alloc_prop_input::initialize) { // this part of the lambda was relocated in a method as it contains a
h_view = create_mirror_view(typename t_host::memory_space(), d_view); // `if constexpr`. In some cases, both branches were evaluated
} else { // leading to a compile error
h_view = create_mirror_view(Kokkos::WithoutInitializing, resync_host(properties);
typename t_host::memory_space(), d_view);
}
/* Mark Device copy as modified */ /* Mark Device copy as modified */
++modified_flags(1); ++modified_flags(1);
@ -1054,13 +1052,10 @@ class DualView : public ViewTraits<DataType, Properties...> {
/* Resize on Host */ /* Resize on Host */
if (sizeMismatch) { if (sizeMismatch) {
::Kokkos::resize(properties, h_view, n0, n1, n2, n3, n4, n5, n6, n7); ::Kokkos::resize(properties, h_view, n0, n1, n2, n3, n4, n5, n6, n7);
if (alloc_prop_input::initialize) { // this part of the lambda was relocated in a method as it contains a
d_view = create_mirror_view(typename t_dev::memory_space(), h_view); // `if constexpr`. In some cases, both branches were evaluated
// leading to a compile error
} else { resync_device(properties);
d_view = create_mirror_view(Kokkos::WithoutInitializing,
typename t_dev::memory_space(), h_view);
}
/* Mark Host copy as modified */ /* Mark Host copy as modified */
++modified_flags(0); ++modified_flags(0);
@ -1099,6 +1094,39 @@ class DualView : public ViewTraits<DataType, Properties...> {
} }
} }
private:
// resync host mirror from device
// this code was relocated from a lambda as it contains a `if constexpr`.
// In some cases, both branches were evaluated, leading to a compile error
template <class... ViewCtorArgs>
inline void resync_host(Impl::ViewCtorProp<ViewCtorArgs...> const&) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
if constexpr (alloc_prop_input::initialize) {
h_view = create_mirror_view(typename t_host::memory_space(), d_view);
} else {
h_view = create_mirror_view(Kokkos::WithoutInitializing,
typename t_host::memory_space(), d_view);
}
}
// resync device mirror from host
// this code was relocated from a lambda as it contains a `if constexpr`
// In some cases, both branches were evaluated leading to a compile error
template <class... ViewCtorArgs>
inline void resync_device(Impl::ViewCtorProp<ViewCtorArgs...> const&) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
if constexpr (alloc_prop_input::initialize) {
d_view = create_mirror_view(typename t_dev::memory_space(), h_view);
} else {
d_view = create_mirror_view(Kokkos::WithoutInitializing,
typename t_dev::memory_space(), h_view);
}
}
public:
void resize(const size_t n0 = KOKKOS_IMPL_CTOR_DEFAULT_ARG, void resize(const size_t n0 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
const size_t n1 = KOKKOS_IMPL_CTOR_DEFAULT_ARG, const size_t n1 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
const size_t n2 = KOKKOS_IMPL_CTOR_DEFAULT_ARG, const size_t n2 = KOKKOS_IMPL_CTOR_DEFAULT_ARG,
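
Review note on the `resync_host` / `resync_device` hunk above: the in-code comments say the `if constexpr` was moved out of a lambda into a private member template because both branches were sometimes evaluated. The diff does not say which compilers were affected. The following is a minimal sketch of the pattern only, not the Kokkos code: `InitProps`, `NoInitProps`, `Holder`, and the `fill` member are hypothetical names invented for illustration. It shows why the branch has to be discarded at compile time: the first branch is ill-formed for a property type that has no `fill` member.

```cpp
#include <cassert>

// Hypothetical property types carrying a compile-time `initialize` flag,
// loosely analogous to alloc_prop_input::initialize in the diff.
struct InitProps {
  static constexpr bool initialize = true;
  int fill = 7;  // only this property type has a fill value
};
struct NoInitProps {
  static constexpr bool initialize = false;
};

// Hypothetical holder; resize() forwards to a private member template so
// that the `if constexpr` sits directly in a templated member function.
class Holder {
 public:
  template <class Props>
  void resize(Props const& props) {
    resync(props);
  }
  int value() const { return value_; }

 private:
  template <class Props>
  void resync([[maybe_unused]] Props const& props) {
    if constexpr (Props::initialize) {
      // Discarded for NoInitProps, which has no `fill` member.
      value_ = props.fill;
    } else {
      value_ = -1;
    }
  }
  int value_ = 0;
};
```

With a plain `if`, calling `resize(NoInitProps{})` would fail to compile because `props.fill` would still be instantiated. With `if constexpr` in the member template, only the selected branch is instantiated for each property type.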


@ -1657,8 +1657,7 @@ KOKKOS_FUNCTION auto as_view_of_rank_n(
if constexpr (std::is_same_v<decltype(layout), Kokkos::LayoutLeft> || if constexpr (std::is_same_v<decltype(layout), Kokkos::LayoutLeft> ||
std::is_same_v<decltype(layout), Kokkos::LayoutRight> || std::is_same_v<decltype(layout), Kokkos::LayoutRight> ||
std::is_same_v<decltype(layout), Kokkos::LayoutStride> || std::is_same_v<decltype(layout), Kokkos::LayoutStride>) {
is_layouttiled<decltype(layout)>::value) {
for (int i = N; i < 7; ++i) for (int i = N; i < 7; ++i)
layout.dimension[i] = KOKKOS_IMPL_CTOR_DEFAULT_ARG; layout.dimension[i] = KOKKOS_IMPL_CTOR_DEFAULT_ARG;
} }
@ -1933,254 +1932,155 @@ struct MirrorDRVType {
} // namespace Impl } // namespace Impl
namespace Impl { namespace Impl {
// create a mirror
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline typename DynRankView<T, P...>::HostMirror create_mirror( inline auto create_mirror(const DynRankView<T, P...>& src,
const DynRankView<T, P...>& src, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, check_view_ctor_args_create_mirror<ViewCtorArgs...>();
std::enable_if_t<!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* =
nullptr) {
using src_type = DynRankView<T, P...>;
using dst_type = typename src_type::HostMirror;
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
!alloc_prop_input::has_label,
"The view constructor arguments passed to Kokkos::create_mirror "
"must not include a label!");
static_assert(
!alloc_prop_input::has_pointer,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not include a pointer!");
static_assert(
!alloc_prop_input::allow_padding,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not explicitly allow padding!");
auto prop_copy = Impl::with_properties_if_unset( auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror")); arg_prop, std::string(src.label()).append("_mirror"));
return dst_type(prop_copy, Impl::reconstructLayout(src.layout(), src.rank())); if constexpr (Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
} using dst_type = typename Impl::MirrorDRVType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type;
template <class T, class... P, class... ViewCtorArgs> return dst_type(prop_copy,
inline auto create_mirror( Impl::reconstructLayout(src.layout(), src.rank()));
const DynRankView<T, P...>& src, } else {
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, using src_type = DynRankView<T, P...>;
std::enable_if_t<Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* = using dst_type = typename src_type::HostMirror;
nullptr) {
using dst_type = typename Impl::MirrorDRVType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type;
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; return dst_type(prop_copy,
Impl::reconstructLayout(src.layout(), src.rank()));
static_assert( }
!alloc_prop_input::has_label, #if defined(KOKKOS_COMPILER_INTEL) || \
"The view constructor arguments passed to Kokkos::create_mirror " (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
"must not include a label!"); !defined(KOKKOS_COMPILER_MSVC))
static_assert( __builtin_unreachable();
!alloc_prop_input::has_pointer, #endif
"The view constructor arguments passed to Kokkos::create_mirror must "
"not include a pointer!");
static_assert(
!alloc_prop_input::allow_padding,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not explicitly allow padding!");
auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror"));
return dst_type(prop_copy, Impl::reconstructLayout(src.layout(), src.rank()));
} }
} // namespace Impl } // namespace Impl
// Create a mirror in host space // public interface
template <class T, class... P> template <class T, class... P,
inline typename DynRankView<T, P...>::HostMirror create_mirror( class Enable = std::enable_if_t<
const DynRankView<T, P...>& src, std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
std::enable_if_t<std::is_same<typename ViewTraits<T, P...>::specialize, inline auto create_mirror(const DynRankView<T, P...>& src) {
void>::value>* = nullptr) { return Impl::create_mirror(src, Kokkos::view_alloc());
return Impl::create_mirror(src, Kokkos::Impl::ViewCtorProp<>{});
} }
template <class T, class... P> // public interface that accepts a without initializing flag
inline typename DynRankView<T, P...>::HostMirror create_mirror( template <class T, class... P,
Kokkos::Impl::WithoutInitializing_t wi, const DynRankView<T, P...>& src, class Enable = std::enable_if_t<
std::enable_if_t<std::is_same<typename ViewTraits<T, P...>::specialize, std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
void>::value>* = nullptr) { inline auto create_mirror(Kokkos::Impl::WithoutInitializing_t wi,
const DynRankView<T, P...>& src) {
return Impl::create_mirror(src, Kokkos::view_alloc(wi)); return Impl::create_mirror(src, Kokkos::view_alloc(wi));
} }
template <class T, class... P, class... ViewCtorArgs> // public interface that accepts a space
inline typename DynRankView<T, P...>::HostMirror create_mirror(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const DynRankView<T, P...>& src,
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* = nullptr) {
return Impl::create_mirror(src, arg_prop);
}
// Create a mirror in a new space
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t< class Enable = std::enable_if_t<
Kokkos::is_space<Space>::value && Kokkos::is_space<Space>::value &&
std::is_void<typename ViewTraits<T, P...>::specialize>::value>> std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
typename Impl::MirrorDRVType<Space, T, P...>::view_type create_mirror( auto create_mirror(const Space&, const Kokkos::DynRankView<T, P...>& src) {
const Space&, const Kokkos::DynRankView<T, P...>& src) {
return Impl::create_mirror( return Impl::create_mirror(
src, Kokkos::view_alloc(typename Space::memory_space{})); src, Kokkos::view_alloc(typename Space::memory_space{}));
} }
template <class Space, class T, class... P> // public interface that accepts a space and a without initializing flag
typename Impl::MirrorDRVType<Space, T, P...>::view_type create_mirror( template <class Space, class T, class... P,
Kokkos::Impl::WithoutInitializing_t wi, const Space&, class Enable = std::enable_if_t<
const Kokkos::DynRankView<T, P...>& src, Kokkos::is_space<Space>::value &&
std::enable_if_t<std::is_same<typename ViewTraits<T, P...>::specialize, std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
void>::value>* = nullptr) { auto create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&,
const Kokkos::DynRankView<T, P...>& src) {
return Impl::create_mirror( return Impl::create_mirror(
src, Kokkos::view_alloc(wi, typename Space::memory_space{})); src, Kokkos::view_alloc(wi, typename Space::memory_space{}));
} }
template <class T, class... P, class... ViewCtorArgs> // public interface that accepts arbitrary view constructor args passed by a
inline auto create_mirror( // view_alloc
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, template <class T, class... P, class... ViewCtorArgs,
const DynRankView<T, P...>& src, typename Enable = std::enable_if_t<
std::enable_if_t< std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
std::is_void<typename ViewTraits<T, P...>::specialize>::value && inline auto create_mirror(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* = nullptr) { const DynRankView<T, P...>& src) {
using ReturnType = typename Impl::MirrorDRVType< return Impl::create_mirror(src, arg_prop);
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type;
return ReturnType{Impl::create_mirror(src, arg_prop)};
} }
namespace Impl { namespace Impl {
template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t<
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space &&
std::is_same<
typename DynRankView<T, P...>::memory_space,
typename DynRankView<T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename DynRankView<T, P...>::data_type,
typename DynRankView<T, P...>::HostMirror::data_type>::value,
typename DynRankView<T, P...>::HostMirror>
create_mirror_view(const DynRankView<T, P...>& src,
const typename Impl::ViewCtorProp<ViewCtorArgs...>&) {
return src;
}
// create a mirror view
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t< inline auto create_mirror_view(
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space &&
!(std::is_same<
typename DynRankView<T, P...>::memory_space,
typename DynRankView<T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename DynRankView<T, P...>::data_type,
typename DynRankView<T, P...>::HostMirror::data_type>::value),
typename DynRankView<T, P...>::HostMirror>
create_mirror_view(
const DynRankView<T, P...>& src, const DynRankView<T, P...>& src,
const typename Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { [[maybe_unused]] const typename Impl::ViewCtorProp<ViewCtorArgs...>&
return Kokkos::Impl::create_mirror(src, arg_prop); arg_prop) {
if constexpr (!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
if constexpr (std::is_same<typename DynRankView<T, P...>::memory_space,
typename DynRankView<
T, P...>::HostMirror::memory_space>::value &&
std::is_same<typename DynRankView<T, P...>::data_type,
typename DynRankView<
T, P...>::HostMirror::data_type>::value) {
return typename DynRankView<T, P...>::HostMirror(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
} else {
if constexpr (Impl::MirrorDRViewType<typename Impl::ViewCtorProp<
ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace) {
return typename Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
} }
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
inline std::enable_if_t<
Kokkos::is_space<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space>::value &&
Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace,
typename Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type>
create_mirror_view(const Kokkos::DynRankView<T, P...>& src,
const typename Impl::ViewCtorProp<ViewCtorArgs...>&) {
return src;
}
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
inline std::enable_if_t<
Kokkos::is_space<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space>::value &&
!Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace,
typename Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type>
create_mirror_view(
const Kokkos::DynRankView<T, P...>& src,
const typename Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::Impl::create_mirror(src, arg_prop);
}
} // namespace Impl } // namespace Impl
// Create a mirror view in host space // public interface
template <class T, class... P> template <class T, class... P>
inline std::enable_if_t< inline auto create_mirror_view(const Kokkos::DynRankView<T, P...>& src) {
(std::is_same< return Impl::create_mirror_view(src, Kokkos::view_alloc());
typename DynRankView<T, P...>::memory_space,
typename DynRankView<T, P...>::HostMirror::memory_space>::value &&
std::is_same<typename DynRankView<T, P...>::data_type,
typename DynRankView<T, P...>::HostMirror::data_type>::value),
typename DynRankView<T, P...>::HostMirror>
create_mirror_view(const Kokkos::DynRankView<T, P...>& src) {
return src;
}
template <class T, class... P>
inline std::enable_if_t<
!(std::is_same<
typename DynRankView<T, P...>::memory_space,
typename DynRankView<T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename DynRankView<T, P...>::data_type,
typename DynRankView<T, P...>::HostMirror::data_type>::value),
typename DynRankView<T, P...>::HostMirror>
create_mirror_view(const Kokkos::DynRankView<T, P...>& src) {
return Kokkos::create_mirror(src);
} }
// public interface that accepts a without initializing flag
template <class T, class... P> template <class T, class... P>
inline auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi, inline auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi,
const DynRankView<T, P...>& src) { const DynRankView<T, P...>& src) {
return Impl::create_mirror_view(src, Kokkos::view_alloc(wi)); return Impl::create_mirror_view(src, Kokkos::view_alloc(wi));
} }
// Create a mirror view in a new space // public interface that accepts a space
// FIXME_C++17 Improve SFINAE here.
template <class Space, class T, class... P, template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline typename Impl::MirrorDRViewType<Space, T, P...>::view_type inline auto create_mirror_view(const Space&,
create_mirror_view( const Kokkos::DynRankView<T, P...>& src) {
const Space&, const Kokkos::DynRankView<T, P...>& src, return Impl::create_mirror_view(
std::enable_if_t< src, Kokkos::view_alloc(typename Space::memory_space()));
Impl::MirrorDRViewType<Space, T, P...>::is_same_memspace>* = nullptr) {
return src;
} }
// FIXME_C++17 Improve SFINAE here. // public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P, template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline typename Impl::MirrorDRViewType<Space, T, P...>::view_type
create_mirror_view(
const Space& space, const Kokkos::DynRankView<T, P...>& src,
std::enable_if_t<
!Impl::MirrorDRViewType<Space, T, P...>::is_same_memspace>* = nullptr) {
return Kokkos::create_mirror(space, src);
}
template <class Space, class T, class... P>
inline auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi, inline auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi,
const Space&, const Space&,
const Kokkos::DynRankView<T, P...>& src) { const Kokkos::DynRankView<T, P...>& src) {
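
Review note on the hunk above: the refactor replaces pairs of `std::enable_if_t` overloads of `create_mirror` and `create_mirror_view` with single functions that return `auto` and branch with `if constexpr`. A minimal sketch of that transformation under stated assumptions: `HostSpace`, `DeviceSpace`, `Buffer`, `CtorProps`, `mirror_sfinae`, and `mirror` are hypothetical stand-ins, not Kokkos types or functions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical memory-space tags and a trivial container tagged by space.
struct HostSpace {};
struct DeviceSpace {};

template <class Space>
struct Buffer {
  int size;
};

// Hypothetical property bag; MemSpace = void means "no space requested".
template <class MemSpace = void>
struct CtorProps {
  static constexpr bool has_memory_space = !std::is_void_v<MemSpace>;
  using memory_space = MemSpace;
};

// Before: two overloads, selected by SFINAE on has_memory_space.
template <class Src, class M>
Buffer<HostSpace> mirror_sfinae(
    const Buffer<Src>& src, CtorProps<M>,
    std::enable_if_t<!CtorProps<M>::has_memory_space>* = nullptr) {
  return {src.size};
}
template <class Src, class M>
Buffer<M> mirror_sfinae(
    const Buffer<Src>& src, CtorProps<M>,
    std::enable_if_t<CtorProps<M>::has_memory_space>* = nullptr) {
  return {src.size};
}

// After: one function; the deduced return type depends on the branch taken.
template <class Src, class M>
auto mirror(const Buffer<Src>& src, CtorProps<M>) {
  if constexpr (CtorProps<M>::has_memory_space) {
    return Buffer<M>{src.size};
  } else {
    return Buffer<HostSpace>{src.size};
  }
}
```

Both forms yield the same result types. The single-function form keeps the selection logic in one body, at the cost of a return type that is no longer visible in the signature.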
@ -2188,6 +2088,8 @@ inline auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi,
src, Kokkos::view_alloc(typename Space::memory_space{}, wi)); src, Kokkos::view_alloc(typename Space::memory_space{}, wi));
} }
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror_view( inline auto create_mirror_view(
const typename Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, const typename Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
@ -2195,75 +2097,51 @@ inline auto create_mirror_view(
return Impl::create_mirror_view(src, arg_prop); return Impl::create_mirror_view(src, arg_prop);
} }
template <class... ViewCtorArgs, class T, class... P> // create a mirror view and deep copy it
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class... ViewCtorArgs, class T, class... P,
class Enable = std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value>>
auto create_mirror_view_and_copy( auto create_mirror_view_and_copy(
const Impl::ViewCtorProp<ViewCtorArgs...>&, [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::DynRankView<T, P...>& src, const Kokkos::DynRankView<T, P...>& src) {
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
Impl::MirrorDRViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace>* = nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
alloc_prop_input::has_memory_space,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must include a memory space!");
static_assert(!alloc_prop_input::has_pointer,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not include a pointer!");
static_assert(!alloc_prop_input::allow_padding,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!");
// same behavior as deep_copy(src, src) Impl::check_view_ctor_args_create_mirror_view_and_copy<ViewCtorArgs...>();
if (!alloc_prop_input::has_execution_space)
fence(
"Kokkos::create_mirror_view_and_copy: fence before returning src view");
return src;
}
template <class... ViewCtorArgs, class T, class... P> if constexpr (Impl::MirrorDRViewType<
auto create_mirror_view_and_copy( typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, T, P...>::is_same_memspace) {
const Kokkos::DynRankView<T, P...>& src, // same behavior as deep_copy(src, src)
std::enable_if_t< if constexpr (!alloc_prop_input::has_execution_space)
std::is_void<typename ViewTraits<T, P...>::specialize>::value && fence(
!Impl::MirrorDRViewType< "Kokkos::create_mirror_view_and_copy: fence before returning src "
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T, "view");
P...>::is_same_memspace>* = nullptr) { return src;
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; } else {
static_assert( using Space = typename alloc_prop_input::memory_space;
alloc_prop_input::has_memory_space, using Mirror = typename Impl::MirrorDRViewType<Space, T, P...>::view_type;
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must include a memory space!");
static_assert(!alloc_prop_input::has_pointer,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not include a pointer!");
static_assert(!alloc_prop_input::allow_padding,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!");
using Space = typename alloc_prop_input::memory_space;
using Mirror = typename Impl::MirrorDRViewType<Space, T, P...>::view_type;
auto arg_prop_copy = Impl::with_properties_if_unset( auto arg_prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string{}, WithoutInitializing, arg_prop, std::string{}, WithoutInitializing,
typename Space::execution_space{}); typename Space::execution_space{});
std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy); std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy);
if (label.empty()) label = src.label(); if (label.empty()) label = src.label();
auto mirror = typename Mirror::non_const_type{ auto mirror = typename Mirror::non_const_type{
arg_prop_copy, Impl::reconstructLayout(src.layout(), src.rank())}; arg_prop_copy, Impl::reconstructLayout(src.layout(), src.rank())};
if constexpr (alloc_prop_input::has_execution_space) { if constexpr (alloc_prop_input::has_execution_space) {
deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy), deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy),
mirror, src); mirror, src);
} else } else
deep_copy(mirror, src); deep_copy(mirror, src);
return mirror; return mirror;
}
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
} }
template <class Space, class T, class... P> template <class Space, class T, class... P>
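
The hunks above replace pairs of `std::enable_if_t`-constrained overloads with a single function that branches on `if constexpr`. The pattern can be sketched in isolation as follows. This is a minimal illustration, not Kokkos code: `ToyView`, `is_same_memspace_v`, and `toy_mirror_view` are hypothetical names that stand in for `DynRankView`, `MirrorDRViewType<...>::is_same_memspace`, and `create_mirror_view_and_copy`, and the "copy" is reduced to carrying over an integer id.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for Kokkos memory spaces.
struct HostSpace {};
struct DeviceSpace {};

// Hypothetical toy view: records only its memory space and an id.
template <class Space>
struct ToyView {
  using memory_space = Space;
  int id;
};

// Stand-in for a trait such as MirrorDRViewType<...>::is_same_memspace.
template <class Space, class View>
inline constexpr bool is_same_memspace_v =
    std::is_same_v<Space, typename View::memory_space>;

// One function instead of two SFINAE overloads: the branch that does not
// apply is discarded at compile time, so the deduced return type may
// differ between the two cases.
template <class Space, class View>
constexpr auto toy_mirror_view(const Space&, const View& src) {
  if constexpr (is_same_memspace_v<Space, View>) {
    return src;  // same memory space: hand back the source unchanged
  } else {
    return ToyView<Space>{src.id};  // different space: build a new view
  }
}
```

Because only one branch is instantiated, checks that each overload previously repeated (the `static_assert` blocks removed in the diff) can live once at the top of the merged function.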

@ -590,96 +590,81 @@ struct MirrorDynamicViewType {
} // namespace Impl } // namespace Impl
namespace Impl { namespace Impl {
template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror(
const Kokkos::Experimental::DynamicView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
std::enable_if_t<!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* =
nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert( // create a mirror
!alloc_prop_input::has_label, // private interface that accepts arbitrary view constructor args passed by a
"The view constructor arguments passed to Kokkos::create_mirror " // view_alloc
"must not include a label!"); template <class T, class... P, class... ViewCtorArgs>
static_assert( inline auto create_mirror(const Kokkos::Experimental::DynamicView<T, P...>& src,
!alloc_prop_input::has_pointer, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
"The view constructor arguments passed to Kokkos::create_mirror must " using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
"not include a pointer!"); check_view_ctor_args_create_mirror<ViewCtorArgs...>();
static_assert(
!alloc_prop_input::allow_padding,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not explicitly allow padding!");
auto prop_copy = Impl::with_properties_if_unset( auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror")); arg_prop, std::string(src.label()).append("_mirror"));
auto ret = typename Kokkos::Experimental::DynamicView<T, P...>::HostMirror( if constexpr (Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
prop_copy, src.chunk_size(), src.chunk_max() * src.chunk_size()); using MemorySpace = typename alloc_prop_input::memory_space;
ret.resize_serial(src.extent(0)); auto ret = typename Kokkos::Impl::MirrorDynamicViewType<
MemorySpace, T, P...>::view_type(prop_copy, src.chunk_size(),
src.chunk_max() * src.chunk_size());
return ret; ret.resize_serial(src.extent(0));
return ret;
} else {
auto ret = typename Kokkos::Experimental::DynamicView<T, P...>::HostMirror(
prop_copy, src.chunk_size(), src.chunk_max() * src.chunk_size());
ret.resize_serial(src.extent(0));
return ret;
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
} }
template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror(
const Kokkos::Experimental::DynamicView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
std::enable_if_t<Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>* =
nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
!alloc_prop_input::has_label,
"The view constructor arguments passed to Kokkos::create_mirror "
"must not include a label!");
static_assert(
!alloc_prop_input::has_pointer,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not include a pointer!");
static_assert(
!alloc_prop_input::allow_padding,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not explicitly allow padding!");
using MemorySpace = typename alloc_prop_input::memory_space;
auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror"));
auto ret = typename Kokkos::Impl::MirrorDynamicViewType<
MemorySpace, T, P...>::view_type(prop_copy, src.chunk_size(),
src.chunk_max() * src.chunk_size());
ret.resize_serial(src.extent(0));
return ret;
}
} // namespace Impl } // namespace Impl
// Create a mirror in host space // public interface
template <class T, class... P> template <class T, class... P,
typename Enable = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Kokkos::Experimental::DynamicView<T, P...>& src) { const Kokkos::Experimental::DynamicView<T, P...>& src) {
return Impl::create_mirror(src, Impl::ViewCtorProp<>{}); return Impl::create_mirror(src, Impl::ViewCtorProp<>{});
} }
template <class T, class... P> // public interface that accepts a without initializing flag
template <class T, class... P,
typename Enable = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
Kokkos::Impl::WithoutInitializing_t wi, Kokkos::Impl::WithoutInitializing_t wi,
const Kokkos::Experimental::DynamicView<T, P...>& src) { const Kokkos::Experimental::DynamicView<T, P...>& src) {
return Impl::create_mirror(src, Kokkos::view_alloc(wi)); return Impl::create_mirror(src, Kokkos::view_alloc(wi));
} }
// Create a mirror in a new space // public interface that accepts a space
template <class Space, class T, class... P> template <class Space, class T, class... P,
typename Enable = std::enable_if_t<
Kokkos::is_space<Space>::value &&
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src) { const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src) {
return Impl::create_mirror( return Impl::create_mirror(
src, Kokkos::view_alloc(typename Space::memory_space{})); src, Kokkos::view_alloc(typename Space::memory_space{}));
} }
template <class Space, class T, class... P> // public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P,
typename Enable = std::enable_if_t<
Kokkos::is_space<Space>::value &&
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
typename Kokkos::Impl::MirrorDynamicViewType<Space, T, P...>::view_type typename Kokkos::Impl::MirrorDynamicViewType<Space, T, P...>::view_type
create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&, create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&,
const Kokkos::Experimental::DynamicView<T, P...>& src) { const Kokkos::Experimental::DynamicView<T, P...>& src) {
@ -687,7 +672,11 @@ create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&,
src, Kokkos::view_alloc(wi, typename Space::memory_space{})); src, Kokkos::view_alloc(wi, typename Space::memory_space{}));
} }
template <class T, class... P, class... ViewCtorArgs> // public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs,
typename Enable = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::Experimental::DynamicView<T, P...>& src) { const Kokkos::Experimental::DynamicView<T, P...>& src) {
@ -696,76 +685,56 @@ inline auto create_mirror(
namespace Impl { namespace Impl {
// create a mirror view
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t< inline auto create_mirror_view(
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space && const Kokkos::Experimental::DynamicView<T, P...>& src,
(std::is_same< [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
typename Kokkos::Experimental::DynamicView<T, P...>::memory_space, if constexpr (!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
typename Kokkos::Experimental::DynamicView< if constexpr (std::is_same<typename Kokkos::Experimental::DynamicView<
T, P...>::HostMirror::memory_space>::value && T, P...>::memory_space,
std::is_same< typename Kokkos::Experimental::DynamicView<
typename Kokkos::Experimental::DynamicView<T, P...>::data_type, T, P...>::HostMirror::memory_space>::value &&
typename Kokkos::Experimental::DynamicView< std::is_same<typename Kokkos::Experimental::DynamicView<
T, P...>::HostMirror::data_type>::value), T, P...>::data_type,
typename Kokkos::Experimental::DynamicView<T, P...>::HostMirror> typename Kokkos::Experimental::DynamicView<
create_mirror_view(const Kokkos::Experimental::DynamicView<T, P...>& src, T, P...>::HostMirror::data_type>::value) {
const Impl::ViewCtorProp<ViewCtorArgs...>&) { return
return src; typename Kokkos::Experimental::DynamicView<T, P...>::HostMirror(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
} else {
if constexpr (Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<
ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace) {
return typename Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
} }
template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t<
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space &&
!(std::is_same<
typename Kokkos::Experimental::DynamicView<T, P...>::memory_space,
typename Kokkos::Experimental::DynamicView<
T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename Kokkos::Experimental::DynamicView<T, P...>::data_type,
typename Kokkos::Experimental::DynamicView<
T, P...>::HostMirror::data_type>::value),
typename Kokkos::Experimental::DynamicView<T, P...>::HostMirror>
create_mirror_view(const Kokkos::Experimental::DynamicView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::create_mirror(arg_prop, src);
}
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::Experimental::DynamicView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>&) {
return src;
}
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<!Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::Experimental::DynamicView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::Impl::create_mirror(src, arg_prop);
}
} // namespace Impl } // namespace Impl
// Create a mirror view in host space // public interface
template <class T, class... P> template <class T, class... P>
inline auto create_mirror_view( inline auto create_mirror_view(
const typename Kokkos::Experimental::DynamicView<T, P...>& src) { const typename Kokkos::Experimental::DynamicView<T, P...>& src) {
return Impl::create_mirror_view(src, Impl::ViewCtorProp<>{}); return Impl::create_mirror_view(src, Impl::ViewCtorProp<>{});
} }
// public interface that accepts a without initializing flag
template <class T, class... P> template <class T, class... P>
inline auto create_mirror_view( inline auto create_mirror_view(
Kokkos::Impl::WithoutInitializing_t wi, Kokkos::Impl::WithoutInitializing_t wi,
@ -773,15 +742,18 @@ inline auto create_mirror_view(
return Impl::create_mirror_view(src, Kokkos::view_alloc(wi)); return Impl::create_mirror_view(src, Kokkos::view_alloc(wi));
} }
// Create a mirror in a new space // public interface that accepts a space
template <class Space, class T, class... P> template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline auto create_mirror_view( inline auto create_mirror_view(
const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src) { const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src) {
return Impl::create_mirror_view(src, return Impl::create_mirror_view(src,
view_alloc(typename Space::memory_space{})); view_alloc(typename Space::memory_space{}));
} }
template <class Space, class T, class... P> // public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline auto create_mirror_view( inline auto create_mirror_view(
Kokkos::Impl::WithoutInitializing_t wi, const Space&, Kokkos::Impl::WithoutInitializing_t wi, const Space&,
const Kokkos::Experimental::DynamicView<T, P...>& src) { const Kokkos::Experimental::DynamicView<T, P...>& src) {
@ -789,6 +761,8 @@ inline auto create_mirror_view(
src, Kokkos::view_alloc(wi, typename Space::memory_space{})); src, Kokkos::view_alloc(wi, typename Space::memory_space{}));
} }
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror_view( inline auto create_mirror_view(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
@ -985,80 +959,57 @@ struct ViewCopy<Kokkos::Experimental::DynamicView<DP...>,
} // namespace Impl } // namespace Impl
template <class... ViewCtorArgs, class T, class... P> // create a mirror view and deep copy it
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class... ViewCtorArgs, class T, class... P,
class Enable = std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value>>
auto create_mirror_view_and_copy( auto create_mirror_view_and_copy(
const Impl::ViewCtorProp<ViewCtorArgs...>&, [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::Experimental::DynamicView<T, P...>& src, const Kokkos::Experimental::DynamicView<T, P...>& src) {
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace>* = nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
alloc_prop_input::has_memory_space,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must include a memory space!");
static_assert(!alloc_prop_input::has_pointer,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not include a pointer!");
static_assert(!alloc_prop_input::allow_padding,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!");
// same behavior as deep_copy(src, src) Impl::check_view_ctor_args_create_mirror_view_and_copy<ViewCtorArgs...>();
if (!alloc_prop_input::has_execution_space)
fence( if constexpr (Impl::MirrorDynamicViewType<
"Kokkos::create_mirror_view_and_copy: fence before returning src view"); typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
return src; T, P...>::is_same_memspace) {
// same behavior as deep_copy(src, src)
if constexpr (!alloc_prop_input::has_execution_space)
fence(
"Kokkos::create_mirror_view_and_copy: fence before returning src "
"view");
return src;
} else {
using Space = typename alloc_prop_input::memory_space;
using Mirror =
typename Impl::MirrorDynamicViewType<Space, T, P...>::view_type;
auto arg_prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string{}, WithoutInitializing,
typename Space::execution_space{});
std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy);
if (label.empty()) label = src.label();
auto mirror = typename Mirror::non_const_type(
arg_prop_copy, src.chunk_size(), src.chunk_max() * src.chunk_size());
mirror.resize_serial(src.extent(0));
if constexpr (alloc_prop_input::has_execution_space) {
deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy),
mirror, src);
} else
deep_copy(mirror, src);
return mirror;
}
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
} }
template <class... ViewCtorArgs, class T, class... P> template <class Space, class T, class... P,
auto create_mirror_view_and_copy( typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::Experimental::DynamicView<T, P...>& src,
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
!Impl::MirrorDynamicViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace>* = nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
alloc_prop_input::has_memory_space,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must include a memory space!");
static_assert(!alloc_prop_input::has_pointer,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not include a pointer!");
static_assert(!alloc_prop_input::allow_padding,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!");
using Space = typename alloc_prop_input::memory_space;
using Mirror =
typename Impl::MirrorDynamicViewType<Space, T, P...>::view_type;
auto arg_prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string{}, WithoutInitializing,
typename Space::execution_space{});
std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy);
if (label.empty()) label = src.label();
auto mirror = typename Mirror::non_const_type(
arg_prop_copy, src.chunk_size(), src.chunk_max() * src.chunk_size());
mirror.resize_serial(src.extent(0));
if constexpr (alloc_prop_input::has_execution_space) {
deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy),
mirror, src);
} else
deep_copy(mirror, src);
return mirror;
}
template <class Space, class T, class... P>
auto create_mirror_view_and_copy( auto create_mirror_view_and_copy(
const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src, const Space&, const Kokkos::Experimental::DynamicView<T, P...>& src,
std::string const& name = "") { std::string const& name = "") {

@ -471,62 +471,31 @@ class OffsetView : public ViewTraits<DataType, Properties...> {
template <typename I0, typename I1> template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::are_integral<I0, I1>::value && (2 == Rank) && (Kokkos::Impl::are_integral<I0, I1>::value && (2 == Rank) &&
is_default_map && is_layout_left && (traits::rank_dynamic == 0)), is_default_map &&
(is_layout_left || is_layout_right || is_layout_stride)),
reference_type> reference_type>
operator()(const I0& i0, const I1& i1) const { operator()(const I0& i0, const I1& i1) const {
KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1)) KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1))
const size_t j0 = i0 - m_begins[0]; const size_t j0 = i0 - m_begins[0];
const size_t j1 = i1 - m_begins[1]; const size_t j1 = i1 - m_begins[1];
return m_map.m_impl_handle[j0 + m_map.m_impl_offset.m_dim.N0 * j1]; if constexpr (is_layout_left) {
} if constexpr (traits::rank_dynamic == 0)
return m_map.m_impl_handle[j0 + m_map.m_impl_offset.m_dim.N0 * j1];
template <typename I0, typename I1> else
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< return m_map.m_impl_handle[j0 + m_map.m_impl_offset.m_stride * j1];
(Kokkos::Impl::are_integral<I0, I1>::value && (2 == Rank) && } else if constexpr (is_layout_right) {
is_default_map && is_layout_left && (traits::rank_dynamic != 0)), if constexpr (traits::rank_dynamic == 0)
reference_type> return m_map.m_impl_handle[j1 + m_map.m_impl_offset.m_dim.N1 * j0];
operator()(const I0& i0, const I1& i1) const { else
KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1)) return m_map.m_impl_handle[j1 + m_map.m_impl_offset.m_stride * j0];
const size_t j0 = i0 - m_begins[0]; } else {
const size_t j1 = i1 - m_begins[1]; static_assert(is_layout_stride);
return m_map.m_impl_handle[j0 + m_map.m_impl_offset.m_stride * j1]; return m_map.m_impl_handle[j0 * m_map.m_impl_offset.m_stride.S0 +
} j1 * m_map.m_impl_offset.m_stride.S1];
}
template <typename I0, typename I1> #if defined(KOKKOS_COMPILER_INTEL)
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< __builtin_unreachable();
(Kokkos::Impl::are_integral<I0, I1>::value && (2 == Rank) && #endif
is_default_map && is_layout_right && (traits::rank_dynamic == 0)),
reference_type>
operator()(const I0& i0, const I1& i1) const {
KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1))
const size_t j0 = i0 - m_begins[0];
const size_t j1 = i1 - m_begins[1];
return m_map.m_impl_handle[j1 + m_map.m_impl_offset.m_dim.N1 * j0];
}
template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::are_integral<I0, I1>::value && (2 == Rank) &&
is_default_map && is_layout_right && (traits::rank_dynamic != 0)),
reference_type>
operator()(const I0& i0, const I1& i1) const {
KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1))
const size_t j0 = i0 - m_begins[0];
const size_t j1 = i1 - m_begins[1];
return m_map.m_impl_handle[j1 + m_map.m_impl_offset.m_stride * j0];
}
template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION
std::enable_if_t<(Kokkos::Impl::are_integral<I0, I1>::value &&
(2 == Rank) && is_default_map && is_layout_stride),
reference_type>
operator()(const I0& i0, const I1& i1) const {
KOKKOS_IMPL_OFFSETVIEW_OPERATOR_VERIFY((m_track, m_map, m_begins, i0, i1))
const size_t j0 = i0 - m_begins[0];
const size_t j1 = i1 - m_begins[1];
return m_map.m_impl_handle[j0 * m_map.m_impl_offset.m_stride.S0 +
j1 * m_map.m_impl_offset.m_stride.S1];
} }
//------------------------------ //------------------------------
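
The merged rank-2 `operator()` above selects one of three index formulas by layout after subtracting the per-dimension begin offsets. The arithmetic can be sketched on its own as below. This is an illustration under stated assumptions: `Layout`, `Extents2D`, and `flat_index` are hypothetical names, and the sketch assumes no padding, so the extent is used where the real code switches between `m_dim.N0`/`m_dim.N1` and `m_stride` depending on `rank_dynamic`.

```cpp
#include <cstddef>

// Hypothetical layout selector mirroring is_layout_left / right / stride.
enum class Layout { Left, Right, Stride };

// Hypothetical extents and strides for a rank-2 array.
struct Extents2D {
  std::size_t n0, n1;  // extents of dimensions 0 and 1
  std::size_t s0, s1;  // explicit strides, used only by Layout::Stride
};

// Map user-facing indices (i0, i1), which start at (begin0, begin1),
// to a flat offset into the underlying allocation.
template <Layout L>
constexpr std::size_t flat_index(long i0, long i1, long begin0, long begin1,
                                 const Extents2D& e) {
  // Shift into zero-based coordinates, as the diff does with m_begins.
  const std::size_t j0 = static_cast<std::size_t>(i0 - begin0);
  const std::size_t j1 = static_cast<std::size_t>(i1 - begin1);
  if constexpr (L == Layout::Left) {
    return j0 + e.n0 * j1;  // dimension 0 is contiguous
  } else if constexpr (L == Layout::Right) {
    return j1 + e.n1 * j0;  // dimension 1 is contiguous
  } else {
    return j0 * e.s0 + j1 * e.s1;  // arbitrary strides
  }
}
```

For a 3 x 4 array whose indices begin at (-1, 2), the element at (0, 3) has zero-based coordinates (1, 1), so the left layout yields offset 1 + 3 * 1 and the right layout yields 1 + 4 * 1.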
@ -1841,71 +1810,73 @@ struct MirrorOffsetType {
} // namespace Impl } // namespace Impl
namespace Impl { namespace Impl {
template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t<
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space,
typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror>
create_mirror(const Kokkos::Experimental::OffsetView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror(
Kokkos::create_mirror(arg_prop, src.view()), src.begins());
}
template <class T, class... P, class... ViewCtorArgs, // create a mirror
class = std::enable_if_t< // private interface that accepts arbitrary view constructor args passed by a
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>> // view_alloc
template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror(const Kokkos::Experimental::OffsetView<T, P...>& src, inline auto create_mirror(const Kokkos::Experimental::OffsetView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; check_view_ctor_args_create_mirror<ViewCtorArgs...>();
using Space = typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space;
static_assert( if constexpr (Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
!alloc_prop_input::has_label, using Space = typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space;
"The view constructor arguments passed to Kokkos::create_mirror "
"must not include a label!");
static_assert(
!alloc_prop_input::has_pointer,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not include a pointer!");
static_assert(
!alloc_prop_input::allow_padding,
"The view constructor arguments passed to Kokkos::create_mirror must "
"not explicitly allow padding!");
auto prop_copy = Impl::with_properties_if_unset( auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror")); arg_prop, std::string(src.label()).append("_mirror"));
return typename Kokkos::Impl::MirrorOffsetType<Space, T, P...>::view_type( return typename Kokkos::Impl::MirrorOffsetType<Space, T, P...>::view_type(
prop_copy, src.layout(), prop_copy, src.layout(),
{src.begin(0), src.begin(1), src.begin(2), src.begin(3), src.begin(4), {src.begin(0), src.begin(1), src.begin(2), src.begin(3), src.begin(4),
src.begin(5), src.begin(6), src.begin(7)}); src.begin(5), src.begin(6), src.begin(7)});
} else {
return typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror(
Kokkos::create_mirror(arg_prop, src.view()), src.begins());
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
} }
} // namespace Impl } // namespace Impl
// Create a mirror in host space // public interface
template <class T, class... P> template <class T, class... P,
typename = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Kokkos::Experimental::OffsetView<T, P...>& src) { const Kokkos::Experimental::OffsetView<T, P...>& src) {
return Impl::create_mirror(src, Impl::ViewCtorProp<>{}); return Impl::create_mirror(src, Impl::ViewCtorProp<>{});
} }
template <class T, class... P> // public interface that accepts a without initializing flag
template <class T, class... P,
typename = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
Kokkos::Impl::WithoutInitializing_t wi, Kokkos::Impl::WithoutInitializing_t wi,
const Kokkos::Experimental::OffsetView<T, P...>& src) { const Kokkos::Experimental::OffsetView<T, P...>& src) {
return Impl::create_mirror(src, Kokkos::view_alloc(wi)); return Impl::create_mirror(src, Kokkos::view_alloc(wi));
} }
// Create a mirror in a new space // public interface that accepts a space
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> typename Enable = std::enable_if_t<
Kokkos::is_space<Space>::value &&
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Space&, const Kokkos::Experimental::OffsetView<T, P...>& src) { const Space&, const Kokkos::Experimental::OffsetView<T, P...>& src) {
return Impl::create_mirror( return Impl::create_mirror(
src, Kokkos::view_alloc(typename Space::memory_space{})); src, Kokkos::view_alloc(typename Space::memory_space{}));
} }
template <class Space, class T, class... P> // public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P,
typename Enable = std::enable_if_t<
Kokkos::is_space<Space>::value &&
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
typename Kokkos::Impl::MirrorOffsetType<Space, T, P...>::view_type typename Kokkos::Impl::MirrorOffsetType<Space, T, P...>::view_type
create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&, create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&,
const Kokkos::Experimental::OffsetView<T, P...>& src) { const Kokkos::Experimental::OffsetView<T, P...>& src) {
@ -1913,7 +1884,11 @@ create_mirror(Kokkos::Impl::WithoutInitializing_t wi, const Space&,
src, Kokkos::view_alloc(typename Space::memory_space{}, wi)); src, Kokkos::view_alloc(typename Space::memory_space{}, wi));
} }
template <class T, class... P, class... ViewCtorArgs> // public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs,
typename = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
inline auto create_mirror( inline auto create_mirror(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::Experimental::OffsetView<T, P...>& src) { const Kokkos::Experimental::OffsetView<T, P...>& src) {
@ -1921,76 +1896,56 @@ inline auto create_mirror(
} }
namespace Impl { namespace Impl {
// create a mirror view
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t< inline auto create_mirror_view(
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space && const Kokkos::Experimental::OffsetView<T, P...>& src,
(std::is_same< [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
typename Kokkos::Experimental::OffsetView<T, P...>::memory_space, if constexpr (!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
typename Kokkos::Experimental::OffsetView< if constexpr (std::is_same<typename Kokkos::Experimental::OffsetView<
T, P...>::HostMirror::memory_space>::value && T, P...>::memory_space,
std::is_same< typename Kokkos::Experimental::OffsetView<
typename Kokkos::Experimental::OffsetView<T, P...>::data_type, T, P...>::HostMirror::memory_space>::value &&
typename Kokkos::Experimental::OffsetView< std::is_same<typename Kokkos::Experimental::OffsetView<
T, P...>::HostMirror::data_type>::value), T, P...>::data_type,
typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror> typename Kokkos::Experimental::OffsetView<
create_mirror_view(const Kokkos::Experimental::OffsetView<T, P...>& src, T, P...>::HostMirror::data_type>::value) {
const Impl::ViewCtorProp<ViewCtorArgs...>&) { return
return src; typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
} else {
if constexpr (Impl::MirrorOffsetViewType<typename Impl::ViewCtorProp<
ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace) {
return typename Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::view_type(src);
} else {
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
} }
template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t<
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space &&
!(std::is_same<
typename Kokkos::Experimental::OffsetView<T, P...>::memory_space,
typename Kokkos::Experimental::OffsetView<
T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename Kokkos::Experimental::OffsetView<T, P...>::data_type,
typename Kokkos::Experimental::OffsetView<
T, P...>::HostMirror::data_type>::value),
typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror>
create_mirror_view(const Kokkos::Experimental::OffsetView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::create_mirror(arg_prop, src);
}
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::Experimental::OffsetView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>&) {
return src;
}
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<!Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::Experimental::OffsetView<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::Impl::create_mirror(src, arg_prop);
}
} // namespace Impl } // namespace Impl
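
The hunk above folds four `std::enable_if_t`-constrained overloads of `Impl::create_mirror_view` into a single function whose cases are selected with nested `if constexpr`. The pattern can be sketched without Kokkos as follows. `HostTag`, `DeviceTag`, `Buffer`, and `mirror_or_alias` are hypothetical names invented for this illustration, and the `aliased` flag only simulates the "return the source itself" case, because copying a `std::vector` is a deep copy whereas copying a Kokkos view is shallow.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for memory spaces; these are not Kokkos types.
struct HostTag {};
struct DeviceTag {};

template <class Space>
struct Buffer {
  using space = Space;
  std::vector<int> data;
  bool aliased = false;  // set when the source was handed back, no new space
};

// One function with compile-time branches instead of one SFINAE overload
// per case. Only the branch that is taken contributes to the deduced
// return type, so the two branches may return different types.
template <class TargetSpace, class SrcSpace>
auto mirror_or_alias(const Buffer<SrcSpace>& src) {
  if constexpr (std::is_same_v<TargetSpace, SrcSpace>) {
    Buffer<SrcSpace> out = src;  // same space: hand back the source
    out.aliased = true;
    return out;
  } else {
    Buffer<TargetSpace> out;     // different space: allocate a new buffer
    out.data.resize(src.data.size());
    return out;
  }
}
```

Calling `mirror_or_alias<HostTag>(host_buffer)` takes the first branch, and `mirror_or_alias<DeviceTag>(host_buffer)` takes the second and yields a `Buffer<DeviceTag>` of the same length.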
// Create a mirror view in host space // public interface
template <class T, class... P> template <class T, class... P>
inline auto create_mirror_view( inline auto create_mirror_view(
const typename Kokkos::Experimental::OffsetView<T, P...>& src) { const typename Kokkos::Experimental::OffsetView<T, P...>& src) {
return Impl::create_mirror_view(src, Impl::ViewCtorProp<>{}); return Impl::create_mirror_view(src, Impl::ViewCtorProp<>{});
} }
// public interface that accepts a without initializing flag
template <class T, class... P> template <class T, class... P>
inline auto create_mirror_view( inline auto create_mirror_view(
Kokkos::Impl::WithoutInitializing_t wi, Kokkos::Impl::WithoutInitializing_t wi,
@ -1998,7 +1953,7 @@ inline auto create_mirror_view(
return Impl::create_mirror_view(src, Kokkos::view_alloc(wi)); return Impl::create_mirror_view(src, Kokkos::view_alloc(wi));
} }
// Create a mirror view in a new space // public interface that accepts a space
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline auto create_mirror_view( inline auto create_mirror_view(
@ -2007,7 +1962,9 @@ inline auto create_mirror_view(
src, Kokkos::view_alloc(typename Space::memory_space{})); src, Kokkos::view_alloc(typename Space::memory_space{}));
} }
template <class Space, class T, class... P> // public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
inline auto create_mirror_view( inline auto create_mirror_view(
Kokkos::Impl::WithoutInitializing_t wi, const Space&, Kokkos::Impl::WithoutInitializing_t wi, const Space&,
const Kokkos::Experimental::OffsetView<T, P...>& src) { const Kokkos::Experimental::OffsetView<T, P...>& src) {
@ -2015,6 +1972,8 @@ inline auto create_mirror_view(
src, Kokkos::view_alloc(typename Space::memory_space{}, wi)); src, Kokkos::view_alloc(typename Space::memory_space{}, wi));
} }
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline auto create_mirror_view( inline auto create_mirror_view(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
@ -2022,7 +1981,9 @@ inline auto create_mirror_view(
return Impl::create_mirror_view(src, arg_prop); return Impl::create_mirror_view(src, arg_prop);
} }
// Create a mirror view and deep_copy in a new space // create a mirror view and deep copy it
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class... ViewCtorArgs, class T, class... P> template <class... ViewCtorArgs, class T, class... P>
typename Kokkos::Impl::MirrorOffsetViewType< typename Kokkos::Impl::MirrorOffsetViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T, typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,


@ -805,56 +805,94 @@ class UnorderedMap {
return *this; return *this;
} }
// Re-allocate the views of the calling UnorderedMap according to src
// capacity, and deep copy the src data.
template <typename SKey, typename SValue, typename SDevice> template <typename SKey, typename SValue, typename SDevice>
std::enable_if_t<std::is_same<std::remove_const_t<SKey>, key_type>::value && std::enable_if_t<std::is_same<std::remove_const_t<SKey>, key_type>::value &&
std::is_same<std::remove_const_t<SValue>, value_type>::value> std::is_same<std::remove_const_t<SValue>, value_type>::value>
create_copy_view( create_copy_view(
UnorderedMap<SKey, SValue, SDevice, Hasher, EqualTo> const &src) { UnorderedMap<SKey, SValue, SDevice, Hasher, EqualTo> const &src) {
if (m_hash_lists.data() != src.m_hash_lists.data()) { if (m_hash_lists.data() != src.m_hash_lists.data()) {
insertable_map_type tmp; allocate_view(src);
deep_copy_view(src);
}
}
tmp.m_bounded_insert = src.m_bounded_insert; // Allocate views of the calling UnorderedMap with the same capacity as the
tmp.m_hasher = src.m_hasher; // src.
tmp.m_equal_to = src.m_equal_to; template <typename SKey, typename SValue, typename SDevice>
tmp.m_size() = src.m_size(); std::enable_if_t<std::is_same<std::remove_const_t<SKey>, key_type>::value &&
tmp.m_available_indexes = bitset_type(src.capacity()); std::is_same<std::remove_const_t<SValue>, value_type>::value>
tmp.m_hash_lists = size_type_view( allocate_view(
view_alloc(WithoutInitializing, "UnorderedMap hash list"), UnorderedMap<SKey, SValue, SDevice, Hasher, EqualTo> const &src) {
src.m_hash_lists.extent(0)); insertable_map_type tmp;
tmp.m_next_index = size_type_view(
view_alloc(WithoutInitializing, "UnorderedMap next index"),
src.m_next_index.extent(0));
tmp.m_keys =
key_type_view(view_alloc(WithoutInitializing, "UnorderedMap keys"),
src.m_keys.extent(0));
tmp.m_values = value_type_view(
view_alloc(WithoutInitializing, "UnorderedMap values"),
src.m_values.extent(0));
tmp.m_scalars = scalars_view("UnorderedMap scalars");
Kokkos::deep_copy(tmp.m_available_indexes, src.m_available_indexes); tmp.m_bounded_insert = src.m_bounded_insert;
tmp.m_hasher = src.m_hasher;
tmp.m_equal_to = src.m_equal_to;
tmp.m_size() = src.m_size();
tmp.m_available_indexes = bitset_type(src.capacity());
tmp.m_hash_lists = size_type_view(
view_alloc(WithoutInitializing, "UnorderedMap hash list"),
src.m_hash_lists.extent(0));
tmp.m_next_index = size_type_view(
view_alloc(WithoutInitializing, "UnorderedMap next index"),
src.m_next_index.extent(0));
tmp.m_keys =
key_type_view(view_alloc(WithoutInitializing, "UnorderedMap keys"),
src.m_keys.extent(0));
tmp.m_values =
value_type_view(view_alloc(WithoutInitializing, "UnorderedMap values"),
src.m_values.extent(0));
tmp.m_scalars = scalars_view("UnorderedMap scalars");
*this = tmp;
}
// Deep copy view data from src. This requires that the src capacity is
// identical to the capacity of the calling UnorderedMap.
template <typename SKey, typename SValue, typename SDevice>
std::enable_if_t<std::is_same<std::remove_const_t<SKey>, key_type>::value &&
std::is_same<std::remove_const_t<SValue>, value_type>::value>
deep_copy_view(
UnorderedMap<SKey, SValue, SDevice, Hasher, EqualTo> const &src) {
#ifndef KOKKOS_ENABLE_DEPRECATED_CODE_4
// To deep copy UnorderedMap, capacity must be identical
KOKKOS_EXPECTS(capacity() == src.capacity());
#else
if (capacity() != src.capacity()) {
allocate_view(src);
#ifdef KOKKOS_ENABLE_DEPRECATION_WARNINGS
Kokkos::Impl::log_warning(
"Warning: deep_copy_view() allocating views is deprecated. Must call "
"with UnorderedMaps of identical capacity, or use "
"create_copy_view().\n");
#endif
}
#endif
if (m_hash_lists.data() != src.m_hash_lists.data()) {
Kokkos::deep_copy(m_available_indexes, src.m_available_indexes);
using raw_deep_copy = using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space, Kokkos::Impl::DeepCopy<typename device_type::memory_space,
typename SDevice::memory_space>; typename SDevice::memory_space>;
raw_deep_copy(tmp.m_hash_lists.data(), src.m_hash_lists.data(), raw_deep_copy(m_hash_lists.data(), src.m_hash_lists.data(),
sizeof(size_type) * src.m_hash_lists.extent(0)); sizeof(size_type) * src.m_hash_lists.extent(0));
raw_deep_copy(tmp.m_next_index.data(), src.m_next_index.data(), raw_deep_copy(m_next_index.data(), src.m_next_index.data(),
sizeof(size_type) * src.m_next_index.extent(0)); sizeof(size_type) * src.m_next_index.extent(0));
raw_deep_copy(tmp.m_keys.data(), src.m_keys.data(), raw_deep_copy(m_keys.data(), src.m_keys.data(),
sizeof(key_type) * src.m_keys.extent(0)); sizeof(key_type) * src.m_keys.extent(0));
if (!is_set) { if (!is_set) {
raw_deep_copy(tmp.m_values.data(), src.m_values.data(), raw_deep_copy(m_values.data(), src.m_values.data(),
sizeof(impl_value_type) * src.m_values.extent(0)); sizeof(impl_value_type) * src.m_values.extent(0));
} }
raw_deep_copy(tmp.m_scalars.data(), src.m_scalars.data(), raw_deep_copy(m_scalars.data(), src.m_scalars.data(),
sizeof(int) * num_scalars); sizeof(int) * num_scalars);
Kokkos::fence( Kokkos::fence(
"Kokkos::UnorderedMap::create_copy_view: fence after copy to tmp"); "Kokkos::UnorderedMap::deep_copy_view: fence after copy to dst.");
*this = tmp;
} }
} }
@ -932,13 +970,25 @@ class UnorderedMap {
friend struct Impl::UnorderedMapPrint; friend struct Impl::UnorderedMapPrint;
}; };
// Specialization of deep_copy for two UnorderedMap objects. // Specialization of deep_copy() for two UnorderedMap objects.
template <typename DKey, typename DT, typename DDevice, typename SKey, template <typename DKey, typename DT, typename DDevice, typename SKey,
typename ST, typename SDevice, typename Hasher, typename EqualTo> typename ST, typename SDevice, typename Hasher, typename EqualTo>
inline void deep_copy( inline void deep_copy(
UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo> &dst, UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo> &dst,
const UnorderedMap<SKey, ST, SDevice, Hasher, EqualTo> &src) { const UnorderedMap<SKey, ST, SDevice, Hasher, EqualTo> &src) {
dst.create_copy_view(src); dst.deep_copy_view(src);
}
// Specialization of create_mirror() for an UnorderedMap object.
template <typename Key, typename ValueType, typename Device, typename Hasher,
typename EqualTo>
typename UnorderedMap<Key, ValueType, Device, Hasher, EqualTo>::HostMirror
create_mirror(
const UnorderedMap<Key, ValueType, Device, Hasher, EqualTo> &src) {
typename UnorderedMap<Key, ValueType, Device, Hasher, EqualTo>::HostMirror
dst;
dst.allocate_view(src);
return dst;
} }
} // namespace Kokkos } // namespace Kokkos
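
In the hunks above, `deep_copy()` on two `UnorderedMap` objects now forwards to `deep_copy_view()`, which expects the destination to have the same capacity as the source, while `allocate_view()` and `create_mirror()` handle allocation. The split can be sketched without Kokkos as follows. `Table`, `allocate_like`, `copy_from`, `create_copy`, and `make_mirror` are hypothetical names, and the sketch throws on a capacity mismatch only so the precondition is easy to observe; the diff uses `KOKKOS_EXPECTS` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container; it only mimics the allocate/copy split.
class Table {
 public:
  Table() = default;
  explicit Table(std::size_t capacity) : slots_(capacity, 0) {}

  std::size_t capacity() const { return slots_.size(); }
  int& operator[](std::size_t i) { return slots_[i]; }
  int operator[](std::size_t i) const { return slots_[i]; }

  // Re-allocate to the capacity of src; contents are zeroed in this sketch.
  void allocate_like(const Table& src) { slots_.assign(src.capacity(), 0); }

  // Copy contents only. Precondition: capacities are identical.
  void copy_from(const Table& src) {
    if (capacity() != src.capacity())
      throw std::invalid_argument("copy_from: capacity mismatch");
    slots_ = src.slots_;
  }

  // Allocate and then copy, in the spirit of create_copy_view().
  void create_copy(const Table& src) {
    allocate_like(src);
    copy_from(src);
  }

 private:
  std::vector<int> slots_;
};

// In the spirit of create_mirror(): allocated to match src, not filled.
inline Table make_mirror(const Table& src) {
  Table dst;
  dst.allocate_like(src);
  return dst;
}
```

A caller that previously default-constructed the destination and relied on the copy to size it would, under this split, call `make_mirror(src)` first and `copy_from(src)` second, which matches the updated tests later in this diff.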


@ -55,8 +55,8 @@ struct test_dualview_alloc {
bool result = false; bool result = false;
test_dualview_alloc(unsigned int size) { test_dualview_alloc(unsigned int size) {
result = run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >( result =
size, 3); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(size, 3);
} }
}; };
@ -154,7 +154,7 @@ struct test_dualview_combinations {
} }
test_dualview_combinations(unsigned int size, bool with_init) { test_dualview_combinations(unsigned int size, bool with_init) {
result = run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >( result = run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(
size, 3, with_init); size, 3, with_init);
} }
}; };
@ -253,21 +253,18 @@ struct test_dual_view_deep_copy {
} // end run_me } // end run_me
test_dual_view_deep_copy() { test_dual_view_deep_copy() {
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(10, 5, run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5, true);
true); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5,
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(10, 5, false);
false);
// Test zero length but allocated (a.d_view.data!=nullptr but // Test zero length but allocated (a.d_view.data!=nullptr but
// a.d_view.span()==0) // a.d_view.span()==0)
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(0, 5, true); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, true);
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(0, 5, run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, false);
false);
// Test default constructed view // Test default constructed view
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(-1, 5, run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(-1, 5, true);
true); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(-1, 5,
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(-1, 5, false);
false);
} }
}; };
@ -282,15 +279,20 @@ struct test_dualview_resize {
const unsigned int m = 5; const unsigned int m = 5;
const unsigned int factor = 2; const unsigned int factor = 2;
ViewType a("A", n, m); ViewType a;
if constexpr (Initialize)
a = ViewType("A", n, m);
else
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
Kokkos::deep_copy(a.d_view, 1); Kokkos::deep_copy(a.d_view, 1);
/* Covers case "Resize on Device" */ /* Covers case "Resize on Device" */
a.modify_device(); a.modify_device();
if (Initialize) if constexpr (Initialize)
Kokkos::resize(Kokkos::WithoutInitializing, a, factor * n, factor * m);
else
Kokkos::resize(a, factor * n, factor * m); Kokkos::resize(a, factor * n, factor * m);
else
Kokkos::resize(Kokkos::WithoutInitializing, a, factor * n, factor * m);
ASSERT_EQ(a.extent(0), n * factor); ASSERT_EQ(a.extent(0), n * factor);
ASSERT_EQ(a.extent(1), m * factor); ASSERT_EQ(a.extent(1), m * factor);
@ -298,33 +300,38 @@ struct test_dualview_resize {
a.sync_host(); a.sync_host();
// Check device view is initialized as expected // Check device view is initialized as expected
scalar_type a_d_sum = 0;
// Execute on the execution_space associated with t_dev's memory space // Execute on the execution_space associated with t_dev's memory space
using t_dev_exec_space = using t_dev_exec_space =
typename ViewType::t_dev::memory_space::execution_space; typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_reduce( Kokkos::View<int, typename ViewType::t_dev::memory_space> errors_d(
Kokkos::RangePolicy<t_dev_exec_space>(0, a.d_view.extent(0)), "errors");
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(a.d_view), Kokkos::parallel_for(
a_d_sum); Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected // Check host view is synced as expected
scalar_type a_h_sum = 0; int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i) for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) { for (size_t j = 0; j < a.h_view.extent(1); ++j) {
a_h_sum += a.h_view(i, j); if (a.h_view(i, j) != 1) ++errors_h_scalar;
} }
// Check // Check
ASSERT_EQ(a_h_sum, a_d_sum); ASSERT_EQ(errors_d_scalar, 0);
ASSERT_EQ(a_h_sum, scalar_type(a.extent(0) * a.extent(1))); ASSERT_EQ(errors_h_scalar, 0);
/* Covers case "Resize on Host" */ /* Covers case "Resize on Host" */
a.modify_host(); a.modify_host();
if (Initialize) if constexpr (Initialize)
Kokkos::resize(Kokkos::WithoutInitializing, a, n / factor, m / factor);
else
Kokkos::resize(a, n / factor, m / factor); Kokkos::resize(a, n / factor, m / factor);
else
Kokkos::resize(Kokkos::WithoutInitializing, a, n / factor, m / factor);
ASSERT_EQ(a.extent(0), n / factor); ASSERT_EQ(a.extent(0), n / factor);
ASSERT_EQ(a.extent(1), m / factor); ASSERT_EQ(a.extent(1), m / factor);
@ -332,30 +339,33 @@ struct test_dualview_resize {
a.sync_device(Kokkos::DefaultExecutionSpace{}); a.sync_device(Kokkos::DefaultExecutionSpace{});
// Check device view is initialized as expected // Check device view is initialized as expected
a_d_sum = 0; Kokkos::deep_copy(errors_d, 0);
// Execute on the execution_space associated with t_dev's memory space // Execute on the execution_space associated with t_dev's memory space
using t_dev_exec_space = using t_dev_exec_space =
typename ViewType::t_dev::memory_space::execution_space; typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_reduce( Kokkos::parallel_for(
Kokkos::RangePolicy<t_dev_exec_space>(0, a.d_view.extent(0)), Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(a.d_view), {0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
a_d_sum); KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected // Check host view is synced as expected
a_h_sum = 0; errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i) for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) { for (size_t j = 0; j < a.h_view.extent(1); ++j) {
a_h_sum += a.h_view(i, j); if (a.h_view(i, j) != 1) ++errors_h_scalar;
} }
// Check // Check
ASSERT_EQ(a_h_sum, scalar_type(a.extent(0) * a.extent(1))); ASSERT_EQ(errors_d_scalar, 0);
ASSERT_EQ(a_h_sum, a_d_sum); ASSERT_EQ(errors_h_scalar, 0);
} // end run_me } // end run_me
test_dualview_resize() { test_dualview_resize() {
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>();
} }
}; };
@ -369,40 +379,51 @@ struct test_dualview_realloc {
const unsigned int n = 10; const unsigned int n = 10;
const unsigned int m = 5; const unsigned int m = 5;
ViewType a("A", n, m); ViewType a;
if (Initialize) if constexpr (Initialize) {
Kokkos::realloc(Kokkos::WithoutInitializing, a, n, m); a = ViewType("A", n, m);
else
Kokkos::realloc(a, n, m); Kokkos::realloc(a, n, m);
} else {
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
Kokkos::realloc(Kokkos::WithoutInitializing, a, n, m);
}
ASSERT_EQ(a.extent(0), n);
ASSERT_EQ(a.extent(1), m);
Kokkos::deep_copy(a.d_view, 1); Kokkos::deep_copy(a.d_view, 1);
a.modify_device(); a.modify_device();
a.sync_host(); a.sync_host();
// Check device view is initialized as expected // Check device view is initialized as expected
scalar_type a_d_sum = 0;
// Execute on the execution_space associated with t_dev's memory space // Execute on the execution_space associated with t_dev's memory space
using t_dev_exec_space = using t_dev_exec_space =
typename ViewType::t_dev::memory_space::execution_space; typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_reduce( Kokkos::View<int, typename ViewType::t_dev::memory_space> errors_d(
Kokkos::RangePolicy<t_dev_exec_space>(0, a.d_view.extent(0)), "errors");
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(a.d_view), Kokkos::parallel_for(
a_d_sum); Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected // Check host view is synced as expected
scalar_type a_h_sum = 0; int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i) for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) { for (size_t j = 0; j < a.h_view.extent(1); ++j) {
a_h_sum += a.h_view(i, j); if (a.h_view(i, j) != 1) ++errors_h_scalar;
} }
// Check // Check
ASSERT_EQ(a_h_sum, scalar_type(a.extent(0) * a.extent(1))); ASSERT_EQ(errors_d_scalar, 0);
ASSERT_EQ(a_h_sum, a_d_sum); ASSERT_EQ(errors_h_scalar, 0);
} // end run_me } // end run_me
test_dualview_realloc() { test_dualview_realloc() {
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device> >(); run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>();
} }
}; };
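
The resize and realloc tests above stop comparing sums and instead count entries that differ from the expected value. One plausible reason, given the `NoDefaultConstructor` type added further down, is that counting needs only a comparison, whereas summing needs arithmetic on the scalar type; a sum can also hide errors that cancel out. A host-only sketch of the difference follows. `Wrapped`, `count_mismatches`, and `sum_matches` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Modeled on the NoDefaultConstructor test type: convertible to int and
// constructible from int, but with no arithmetic operators of its own.
struct Wrapped {
  Wrapped(int i_) : i(i_) {}
  operator int() const { return i; }
  int i;
};

// Count entries that differ from `expected`. Only a comparison is needed,
// so this works for any type convertible to int.
template <class T>
int count_mismatches(const std::vector<T>& values, int expected) {
  int errors = 0;
  for (const T& v : values)
    if (v != expected) ++errors;
  return errors;
}

// Sum-based check, shown for contrast. It passes whenever the total
// happens to match, even if individual entries are wrong.
inline bool sum_matches(const std::vector<int>& values, int expected) {
  long long sum = 0;
  for (int v : values) sum += v;
  return sum ==
         static_cast<long long>(expected) *
             static_cast<long long>(values.size());
}
```

For the values `{0, 2, 1}` and an expected entry of `1`, the sum check passes because the total is 3, while the mismatch count reports two wrong entries.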
@ -463,12 +484,23 @@ TEST(TEST_CATEGORY, dualview_deep_copy) {
test_dualview_deep_copy<double, TEST_EXECSPACE>(); test_dualview_deep_copy<double, TEST_EXECSPACE>();
} }
struct NoDefaultConstructor {
NoDefaultConstructor(int i_) : i(i_) {}
KOKKOS_FUNCTION operator int() const { return i; }
int i;
};
TEST(TEST_CATEGORY, dualview_realloc) { TEST(TEST_CATEGORY, dualview_realloc) {
test_dualview_realloc<int, TEST_EXECSPACE>(); test_dualview_realloc<int, TEST_EXECSPACE>();
Impl::test_dualview_realloc<NoDefaultConstructor, TEST_EXECSPACE,
/* Initialize */ false>();
} }
TEST(TEST_CATEGORY, dualview_resize) { TEST(TEST_CATEGORY, dualview_resize) {
test_dualview_resize<int, TEST_EXECSPACE>(); test_dualview_resize<int, TEST_EXECSPACE>();
Impl::test_dualview_resize<NoDefaultConstructor, TEST_EXECSPACE,
/* Initialize */ false>();
} }
namespace { namespace {


@ -68,7 +68,7 @@ struct TestInsert {
} while (rehash_on_fail && failed_count > 0u); } while (rehash_on_fail && failed_count > 0u);
// Trigger the m_size mutable bug. // Trigger the m_size mutable bug.
typename map_type::HostMirror map_h; auto map_h = create_mirror(map);
execution_space().fence(); execution_space().fence();
Kokkos::deep_copy(map_h, map); Kokkos::deep_copy(map_h, map);
execution_space().fence(); execution_space().fence();
@ -367,7 +367,7 @@ void test_deep_copy(uint32_t num_nodes) {
} }
} }
host_map_type hmap; auto hmap = create_mirror(map);
Kokkos::deep_copy(hmap, map); Kokkos::deep_copy(hmap, map);
ASSERT_EQ(map.size(), hmap.size()); ASSERT_EQ(map.size(), hmap.size());
@ -380,6 +380,7 @@ void test_deep_copy(uint32_t num_nodes) {
} }
map_type mmap; map_type mmap;
mmap.allocate_view(hmap);
Kokkos::deep_copy(mmap, hmap); Kokkos::deep_copy(mmap, hmap);
const_map_type cmap = mmap; const_map_type cmap = mmap;
@ -424,7 +425,7 @@ TEST(TEST_CATEGORY, UnorderedMap_valid_empty) {
Map n{}; Map n{};
n = Map{m.capacity()}; n = Map{m.capacity()};
n.rehash(m.capacity()); n.rehash(m.capacity());
Kokkos::deep_copy(n, m); n.create_copy_view(m);
ASSERT_TRUE(m.is_allocated()); ASSERT_TRUE(m.is_allocated());
ASSERT_TRUE(n.is_allocated()); ASSERT_TRUE(n.is_allocated());
} }


@ -21,6 +21,8 @@
#include <iostream> #include <iostream>
#include <cstdlib> #include <cstdlib>
#include <cstdio> #include <cstdio>
#include <Kokkos_Macros.hpp>
KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH()
#include <Kokkos_Vector.hpp> #include <Kokkos_Vector.hpp>
namespace Test { namespace Test {
@ -231,7 +233,7 @@ void test_vector_allocate(unsigned int size) {
TEST(TEST_CATEGORY, vector_combination) { TEST(TEST_CATEGORY, vector_combination) {
test_vector_allocate<int, TEST_EXECSPACE>(10); test_vector_allocate<int, TEST_EXECSPACE>(10);
test_vector_combinations<int, TEST_EXECSPACE>(10); test_vector_combinations<int, TEST_EXECSPACE>(10);
test_vector_combinations<int, TEST_EXECSPACE>(3057); test_vector_combinations<long long int, TEST_EXECSPACE>(3057);
} }
TEST(TEST_CATEGORY, vector_insert) { TEST(TEST_CATEGORY, vector_insert) {


@ -390,7 +390,7 @@ static void Test_Atomic(benchmark::State& state) {
static constexpr int LOOP = 100'000; static constexpr int LOOP = 100'000;
BENCHMARK(Test_Atomic<int>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<int>)->Arg(30'000)->Iterations(10);
BENCHMARK(Test_Atomic<long int>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<long int>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<long long int>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<long long int>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<unsigned int>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<unsigned int>)->Arg(LOOP)->Iterations(10);
@ -398,4 +398,3 @@ BENCHMARK(Test_Atomic<unsigned long int>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<unsigned long long int>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<unsigned long long int>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<float>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<float>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<double>)->Arg(LOOP)->Iterations(10); BENCHMARK(Test_Atomic<double>)->Arg(LOOP)->Iterations(10);
BENCHMARK(Test_Atomic<int>)->Arg(LOOP)->Iterations(10);


@ -183,7 +183,8 @@ double atomic_contentious_max_replacement(benchmark::State& state,
Kokkos::parallel_reduce( Kokkos::parallel_reduce(
con_length, con_length,
KOKKOS_LAMBDA(const int i, T& inner) { KOKKOS_LAMBDA(const int i, T& inner) {
inner = Kokkos::atomic_max_fetch(&(input(0)), inner + 1); inner = Kokkos::atomic_max_fetch(&(input(0)),
Kokkos::min(inner, max - 1) + 1);
if (i == con_length - 1) { if (i == con_length - 1) {
Kokkos::atomic_max_fetch(&(input(0)), max); Kokkos::atomic_max_fetch(&(input(0)), max);
inner = max; inner = max;
@ -223,7 +224,8 @@ double atomic_contentious_min_replacement(benchmark::State& state,
Kokkos::parallel_reduce( Kokkos::parallel_reduce(
con_length, con_length,
KOKKOS_LAMBDA(const int i, T& inner) { KOKKOS_LAMBDA(const int i, T& inner) {
inner = Kokkos::atomic_min_fetch(&(input(0)), inner - 1); inner = Kokkos::atomic_min_fetch(&(input(0)),
Kokkos::max(inner, min + 1) - 1);
if (i == con_length - 1) { if (i == con_length - 1) {
Kokkos::atomic_min_fetch(&(input(0)), min); Kokkos::atomic_min_fetch(&(input(0)), min);
inner = min; inner = min;
@ -246,7 +248,7 @@ static void Atomic_ContentiousMinReplacements(benchmark::State& state) {
auto inp = prepare_input(1, std::numeric_limits<T>::max()); auto inp = prepare_input(1, std::numeric_limits<T>::max());
for (auto _ : state) { for (auto _ : state) {
const auto time = atomic_contentious_max_replacement(state, inp, length); const auto time = atomic_contentious_min_replacement(state, inp, length);
state.SetIterationTime(time); state.SetIterationTime(time);
} }
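
The benchmark hunks above replace `inner + 1` with `Kokkos::min(inner, max - 1) + 1`, and `inner - 1` with `Kokkos::max(inner, min + 1) - 1`. The apparent intent is to clamp before stepping so the result can never pass the bound, which for a signed type at its limit would otherwise overflow. A standalone sketch using `std::min` and `std::max` in place of the Kokkos functions follows. `saturating_next` and `saturating_prev` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Step up by one, but never past `max`. Clamping to max - 1 first keeps
// the addition inside the representable range.
template <class T>
T saturating_next(T value, T max) {
  return std::min(value, static_cast<T>(max - 1)) + 1;
}

// Step down by one, but never past `min`.
template <class T>
T saturating_prev(T value, T min) {
  return std::max(value, static_cast<T>(min + 1)) - 1;
}
```

With `max` set to `std::numeric_limits<int>::max()`, `saturating_next(max, max)` returns `max` rather than evaluating `max + 1`.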


@ -166,8 +166,17 @@ class Cuda {
Cuda(); Cuda();
Cuda(cudaStream_t stream, explicit Cuda(cudaStream_t stream) : Cuda(stream, Impl::ManageStream::no) {}
Impl::ManageStream manage_stream = Impl::ManageStream::no);
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT(
"Cuda execution space should be constructed explicitly.")
Cuda(cudaStream_t stream)
: Cuda(stream) {}
#endif
Cuda(cudaStream_t stream, Impl::ManageStream manage_stream);
KOKKOS_DEPRECATED Cuda(cudaStream_t stream, bool manage_stream); KOKKOS_DEPRECATED Cuda(cudaStream_t stream, bool manage_stream);
@ -186,7 +195,7 @@ class Cuda {
/// ///
/// This matches the __CUDA_ARCH__ specification. /// This matches the __CUDA_ARCH__ specification.
KOKKOS_DEPRECATED static size_type device_arch() { KOKKOS_DEPRECATED static size_type device_arch() {
const cudaDeviceProp& cudaProp = Cuda().cuda_device_prop(); const cudaDeviceProp cudaProp = Cuda().cuda_device_prop();
return cudaProp.major * 100 + cudaProp.minor; return cudaProp.major * 100 + cudaProp.minor;
} }
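
The hunk above marks the single-argument `Cuda(cudaStream_t)` constructor `explicit` and keeps an implicit overload only under `KOKKOS_ENABLE_DEPRECATED_CODE_4`. The effect of `explicit` on a stream-taking constructor can be sketched without CUDA as follows. `StreamHandle` and `ExecSpace` are hypothetical stand-ins for `cudaStream_t` and the execution space class.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a raw stream handle such as cudaStream_t.
using StreamHandle = void*;

class ExecSpace {
 public:
  ExecSpace() = default;
  // explicit: a raw handle no longer converts silently into an ExecSpace.
  explicit ExecSpace(StreamHandle s) : stream_(s) {}
  StreamHandle stream() const { return stream_; }

 private:
  StreamHandle stream_ = nullptr;
};

// Direct construction is still allowed...
static_assert(std::is_constructible_v<ExecSpace, StreamHandle>);
// ...but implicit conversion, for example when passing a raw handle to a
// function that takes an ExecSpace, is rejected at compile time.
static_assert(!std::is_convertible_v<StreamHandle, ExecSpace>);
```

Under this sketch, code that passed a raw handle where an `ExecSpace` was expected has to write `ExecSpace(handle)` at the call site.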


@ -59,12 +59,6 @@ const std::unique_ptr<Kokkos::Cuda> &Kokkos::Impl::cuda_get_deep_copy_space(
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
namespace {
static std::atomic<int> num_uvm_allocations(0);
} // namespace
void DeepCopyCuda(void *dst, const void *src, size_t n) { void DeepCopyCuda(void *dst, const void *src, size_t n) {
KOKKOS_IMPL_CUDA_SAFE_CALL((CudaInternal::singleton().cuda_memcpy_wrapper( KOKKOS_IMPL_CUDA_SAFE_CALL((CudaInternal::singleton().cuda_memcpy_wrapper(
dst, src, n, cudaMemcpyDefault))); dst, src, n, cudaMemcpyDefault)));
@ -204,10 +198,7 @@ void *impl_allocate_common(const int device_id,
// we should do here since we're turning it into an // we should do here since we're turning it into an
// exception here // exception here
cudaGetLastError(); cudaGetLastError();
throw Experimental::CudaRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(arg_handle.name, arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
CudaMalloc);
} }
if (Kokkos::Profiling::profileLibraryLoaded()) { if (Kokkos::Profiling::profileLibraryLoaded()) {
@ -252,8 +243,6 @@ void *CudaUVMSpace::impl_allocate(
Cuda::impl_static_fence( Cuda::impl_static_fence(
"Kokkos::CudaUVMSpace::impl_allocate: Pre UVM Allocation"); "Kokkos::CudaUVMSpace::impl_allocate: Pre UVM Allocation");
if (arg_alloc_size > 0) { if (arg_alloc_size > 0) {
Kokkos::Impl::num_uvm_allocations++;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
cudaError_t error_code = cudaError_t error_code =
cudaMallocManaged(&ptr, arg_alloc_size, cudaMemAttachGlobal); cudaMallocManaged(&ptr, arg_alloc_size, cudaMemAttachGlobal);
@ -263,10 +252,7 @@ void *CudaUVMSpace::impl_allocate(
// we should do here since we're turning it into an // we should do here since we're turning it into an
// exception here // exception here
cudaGetLastError(); cudaGetLastError();
throw Experimental::CudaRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
CudaMallocManaged);
} }
#ifdef KOKKOS_IMPL_DEBUG_CUDA_PIN_UVM_TO_HOST #ifdef KOKKOS_IMPL_DEBUG_CUDA_PIN_UVM_TO_HOST
@ -307,10 +293,7 @@ void *CudaHostPinnedSpace::impl_allocate(
// we should do here since we're turning it into an // we should do here since we're turning it into an
// exception here // exception here
cudaGetLastError(); cudaGetLastError();
throw Experimental::CudaRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
CudaHostAlloc);
} }
if (Kokkos::Profiling::profileLibraryLoaded()) { if (Kokkos::Profiling::profileLibraryLoaded()) {
const size_t reported_size = const size_t reported_size =
@ -341,27 +324,24 @@ void CudaSpace::impl_deallocate(
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr, Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
reported_size); reported_size);
} }
try {
#ifndef CUDART_VERSION #ifndef CUDART_VERSION
#error CUDART_VERSION undefined! #error CUDART_VERSION undefined!
#elif (defined(KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC) && CUDART_VERSION >= 11020) #elif (defined(KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC) && CUDART_VERSION >= 11020)
if (arg_alloc_size >= memory_threshold_g) { if (arg_alloc_size >= memory_threshold_g) {
Impl::cuda_device_synchronize( Impl::cuda_device_synchronize(
"Kokkos::Cuda: backend fence before async free"); "Kokkos::Cuda: backend fence before async free");
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFreeAsync(arg_alloc_ptr, m_stream)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFreeAsync(arg_alloc_ptr, m_stream));
Impl::cuda_device_synchronize( Impl::cuda_device_synchronize(
"Kokkos::Cuda: backend fence after async free"); "Kokkos::Cuda: backend fence after async free");
} else { } else {
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
}
#else
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
#endif
} catch (...) {
} }
#else
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
#endif
} }
void CudaUVMSpace::deallocate(void *const arg_alloc_ptr, void CudaUVMSpace::deallocate(void *const arg_alloc_ptr,
const size_t arg_alloc_size) const { const size_t arg_alloc_size) const {
@ -387,13 +367,9 @@ void CudaUVMSpace::impl_deallocate(
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr, Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
reported_size); reported_size);
} }
try { if (arg_alloc_ptr != nullptr) {
if (arg_alloc_ptr != nullptr) { KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
Kokkos::Impl::num_uvm_allocations--; KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFree(arg_alloc_ptr));
}
} catch (...) {
} }
Cuda::impl_static_fence( Cuda::impl_static_fence(
"Kokkos::CudaUVMSpace::impl_deallocate: Post UVM Deallocation"); "Kokkos::CudaUVMSpace::impl_deallocate: Post UVM Deallocation");
@ -420,11 +396,8 @@ void CudaHostPinnedSpace::impl_deallocate(
Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr, Kokkos::Profiling::deallocateData(arg_handle, arg_label, arg_alloc_ptr,
reported_size); reported_size);
} }
try { KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(m_device)); KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFreeHost(arg_alloc_ptr));
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaFreeHost(arg_alloc_ptr));
} catch (...) {
}
} }
} // namespace Kokkos } // namespace Kokkos
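
The hunks above replace the backend-specific `CudaRawMemoryAllocationFailure` throw with a single call of the form `Kokkos::Impl::throw_bad_alloc(name, size, label)`, and drop the `try { ... } catch (...) {}` wrappers around deallocation. A minimal sketch of that pattern in plain C++ follows. Everything in namespace `demo` is hypothetical and is not the Kokkos API; in particular, the exception type that Kokkos actually throws is not visible in this diff.

```cpp
#include <cstddef>
#include <new>
#include <sstream>
#include <string>
#include <utility>

namespace demo {

// Hypothetical exception type. Deriving from std::bad_alloc lets generic
// out-of-memory handlers catch it while still carrying a readable message.
class AllocationFailure : public std::bad_alloc {
 public:
  explicit AllocationFailure(std::string msg) : m_msg(std::move(msg)) {}
  const char* what() const noexcept override { return m_msg.c_str(); }

 private:
  std::string m_msg;
};

// One shared helper builds the message from the memory space name, the
// requested size and the allocation label, then throws.
[[noreturn]] inline void throw_bad_alloc(const std::string& space_name,
                                         std::size_t size,
                                         const std::string& label) {
  std::ostringstream msg;
  msg << space_name << " failed to allocate " << size << " bytes (label=\""
      << label << "\")";
  throw AllocationFailure(msg.str());
}

// Stand-in for a backend allocation; simulate_failure forces the error path.
inline void* allocate(const std::string& label, std::size_t size,
                      bool simulate_failure) {
  void* ptr = simulate_failure ? nullptr : ::operator new(size, std::nothrow);
  if (ptr == nullptr) throw_bad_alloc("DemoSpace", size, label);
  return ptr;
}

// Deallocation no longer swallows errors; here it simply cannot fail.
inline void deallocate(void* ptr) noexcept { ::operator delete(ptr); }

}  // namespace demo
```

Callers then need only one `catch (const std::bad_alloc&)` regardless of which memory space failed.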


@ -22,7 +22,6 @@
#include <impl/Kokkos_Error.hpp> #include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Profiling.hpp> #include <impl/Kokkos_Profiling.hpp>
#include <iosfwd>
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
@ -69,52 +68,6 @@ inline void cuda_internal_safe_call(cudaError e, const char* name,
Kokkos::Impl::cuda_internal_safe_call(call, #call, __FILE__, __LINE__) Kokkos::Impl::cuda_internal_safe_call(call, #call, __FILE__, __LINE__)
} // namespace Impl } // namespace Impl
namespace Experimental {
class CudaRawMemoryAllocationFailure : public RawMemoryAllocationFailure {
private:
using base_t = RawMemoryAllocationFailure;
cudaError_t m_error_code = cudaSuccess;
static FailureMode get_failure_mode(cudaError_t error_code) {
switch (error_code) {
case cudaErrorMemoryAllocation: return FailureMode::OutOfMemoryError;
case cudaErrorInvalidValue: return FailureMode::InvalidAllocationSize;
// TODO handle cudaErrorNotSupported for cudaMallocManaged
default: return FailureMode::Unknown;
}
}
public:
// using base_t::base_t;
// would trigger
//
// error: cannot determine the exception specification of the default
// constructor due to a circular dependency
//
// using NVCC 9.1 and gcc 7.4
CudaRawMemoryAllocationFailure(
size_t arg_attempted_size, size_t arg_attempted_alignment,
FailureMode arg_failure_mode = FailureMode::OutOfMemoryError,
AllocationMechanism arg_mechanism =
AllocationMechanism::StdMalloc) noexcept
: base_t(arg_attempted_size, arg_attempted_alignment, arg_failure_mode,
arg_mechanism) {}
CudaRawMemoryAllocationFailure(size_t arg_attempted_size,
cudaError_t arg_error_code,
AllocationMechanism arg_mechanism) noexcept
: base_t(arg_attempted_size, /* CudaSpace doesn't handle alignment? */ 1,
get_failure_mode(arg_error_code), arg_mechanism),
m_error_code(arg_error_code) {}
void append_additional_error_information(std::ostream& o) const override;
};
} // end namespace Experimental
} // namespace Kokkos } // namespace Kokkos
#endif // KOKKOS_ENABLE_CUDA #endif // KOKKOS_ENABLE_CUDA


@ -72,7 +72,7 @@ struct GraphImpl<Kokkos::Cuda> {
GraphNodeImpl<Kokkos::Cuda, aggregate_kernel_impl_t, GraphNodeImpl<Kokkos::Cuda, aggregate_kernel_impl_t,
Kokkos::Experimental::TypeErasedTag>; Kokkos::Experimental::TypeErasedTag>;
// Not moveable or copyable; it spends its whole life as a shared_ptr in the // Not movable or copyable; it spends its whole life as a shared_ptr in the
// Graph object // Graph object
GraphImpl() = delete; GraphImpl() = delete;
GraphImpl(GraphImpl const&) = delete; GraphImpl(GraphImpl const&) = delete;
@ -115,12 +115,9 @@ struct GraphImpl<Kokkos::Cuda> {
template <class NodeImpl> template <class NodeImpl>
// requires NodeImplPtr is a shared_ptr to specialization of GraphNodeImpl // requires NodeImplPtr is a shared_ptr to specialization of GraphNodeImpl
// Also requires that the kernel has the graph node tag in it's policy // Also requires that the kernel has the graph node tag in its policy
void add_node(std::shared_ptr<NodeImpl> const& arg_node_ptr) { void add_node(std::shared_ptr<NodeImpl> const& arg_node_ptr) {
static_assert( static_assert(NodeImpl::kernel_type::Policy::is_graph_kernel::value);
NodeImpl::kernel_type::Policy::is_graph_kernel::value,
"Something has gone horribly wrong, but it's too complicated to "
"explain here. Buy Daisy a coffee and she'll explain it to you.");
KOKKOS_EXPECTS(bool(arg_node_ptr)); KOKKOS_EXPECTS(bool(arg_node_ptr));
// The Kernel launch from the execute() method has been shimmed to insert // The Kernel launch from the execute() method has been shimmed to insert
// the node into the graph // the node into the graph


@ -737,6 +737,14 @@ namespace Impl {
int g_cuda_space_factory_initialized = int g_cuda_space_factory_initialized =
initialize_space_factory<Cuda>("150_Cuda"); initialize_space_factory<Cuda>("150_Cuda");
int CudaInternal::m_cudaArch = -1;
cudaDeviceProp CudaInternal::m_deviceProp;
std::set<int> CudaInternal::cuda_devices = {};
std::map<int, unsigned long *> CudaInternal::constantMemHostStagingPerDevice =
{};
std::map<int, cudaEvent_t> CudaInternal::constantMemReusablePerDevice = {};
std::map<int, std::mutex> CudaInternal::constantMemMutexPerDevice = {};
} // namespace Impl } // namespace Impl
} // namespace Kokkos } // namespace Kokkos
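
The definitions added above pair with a header change that turns `inline static` data members of `CudaInternal` into plain `static` declarations. The diff does not state the motivation. The sketch below only illustrates the mechanics of the declare-in-class, define-once-out-of-class form; `DeviceRegistry` and its members are hypothetical names.

```cpp
#include <map>
#include <set>

namespace demo {

// "Header" part: static data members are declared without initializers.
class DeviceRegistry {
 public:
  static int arch;
  static std::set<int> devices;
  static std::map<int, unsigned long*> staging_per_device;

  static void register_device(int id) { devices.insert(id); }
};

// "Source file" part: each member is defined exactly once in the program.
// Repeating these definitions in a second translation unit would violate
// the one-definition rule, which is why they live in a .cpp file.
int DeviceRegistry::arch = -1;
std::set<int> DeviceRegistry::devices = {};
std::map<int, unsigned long*> DeviceRegistry::staging_per_device = {};

}  // namespace demo
```

With `inline static` the initializer sits in the class body and every translation unit that includes the header sees a definition; with the form shown here the single definition lives in one object file.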


@ -91,10 +91,10 @@ class CudaInternal {
int m_cudaDev = -1; int m_cudaDev = -1;
// Device Properties // Device Properties
inline static int m_cudaArch = -1; static int m_cudaArch;
static int concurrency(); static int concurrency();
inline static cudaDeviceProp m_deviceProp; static cudaDeviceProp m_deviceProp;
// Scratch Spaces for Reductions // Scratch Spaces for Reductions
mutable std::size_t m_scratchSpaceCount; mutable std::size_t m_scratchSpaceCount;
@ -120,11 +120,10 @@ class CudaInternal {
bool was_initialized = false; bool was_initialized = false;
bool was_finalized = false; bool was_finalized = false;
inline static std::set<int> cuda_devices = {}; static std::set<int> cuda_devices;
inline static std::map<int, unsigned long*> constantMemHostStagingPerDevice = static std::map<int, unsigned long*> constantMemHostStagingPerDevice;
{}; static std::map<int, cudaEvent_t> constantMemReusablePerDevice;
inline static std::map<int, cudaEvent_t> constantMemReusablePerDevice = {}; static std::map<int, std::mutex> constantMemMutexPerDevice;
inline static std::map<int, std::mutex> constantMemMutexPerDevice = {};
static CudaInternal& singleton(); static CudaInternal& singleton();
@ -421,23 +420,6 @@ class CudaInternal {
return cudaStreamSynchronize(stream); return cudaStreamSynchronize(stream);
} }
// The following are only available for cuda 11.2 and greater
#if (defined(KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC) && CUDART_VERSION >= 11020)
template <bool setCudaDevice = true>
cudaError_t cuda_malloc_async_wrapper(void** devPtr, size_t size,
cudaStream_t hStream = nullptr) const {
if constexpr (setCudaDevice) set_cuda_device();
return cudaMallocAsync(devPtr, size, get_input_stream(hStream));
}
template <bool setCudaDevice = true>
cudaError_t cuda_free_async_wrapper(void* devPtr,
cudaStream_t hStream = nullptr) const {
if constexpr (setCudaDevice) set_cuda_device();
return cudaFreeAsync(devPtr, get_input_stream(hStream));
}
#endif
// C++ API routines // C++ API routines
template <typename T, bool setCudaDevice = true> template <typename T, bool setCudaDevice = true>
cudaError_t cuda_func_get_attributes_wrapper(cudaFuncAttributes* attr, cudaError_t cuda_func_get_attributes_wrapper(cudaFuncAttributes* attr,


@ -539,17 +539,9 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
m_vector_size(arg_policy.impl_vector_length()) { m_vector_size(arg_policy.impl_vector_length()) {
auto internal_space_instance = auto internal_space_instance =
m_policy.space().impl_internal_space_instance(); m_policy.space().impl_internal_space_instance();
cudaFuncAttributes attr = m_team_size = m_team_size >= 0 ? m_team_size
CudaParallelLaunch<ParallelFor, LaunchBounds>::get_cuda_func_attributes( : arg_policy.team_size_recommended(
internal_space_instance->m_cudaDev); arg_functor, ParallelForTag());
m_team_size =
m_team_size >= 0
? m_team_size
: Kokkos::Impl::cuda_get_opt_block_size<FunctorType, LaunchBounds>(
internal_space_instance, attr, m_functor, m_vector_size,
m_policy.team_scratch_size(0),
m_policy.thread_scratch_size(0)) /
m_vector_size;
m_shmem_begin = (sizeof(double) * (m_team_size + 2)); m_shmem_begin = (sizeof(double) * (m_team_size + 2));
m_shmem_size = m_shmem_size =
@ -585,13 +577,7 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
"Kokkos::Impl::ParallelFor< Cuda > insufficient shared memory")); "Kokkos::Impl::ParallelFor< Cuda > insufficient shared memory"));
} }
if (int(m_team_size) > if (m_team_size > arg_policy.team_size_max(arg_functor, ParallelForTag())) {
int(Kokkos::Impl::cuda_get_max_block_size<FunctorType, LaunchBounds>(
internal_space_instance, attr, arg_functor,
arg_policy.impl_vector_length(),
arg_policy.team_scratch_size(0),
arg_policy.thread_scratch_size(0)) /
arg_policy.impl_vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string( Kokkos::Impl::throw_runtime_exception(std::string(
"Kokkos::Impl::ParallelFor< Cuda > requested too large team size.")); "Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
} }
@ -909,17 +895,11 @@ class ParallelReduce<CombinedFunctorReducerType,
m_vector_size(arg_policy.impl_vector_length()) { m_vector_size(arg_policy.impl_vector_length()) {
auto internal_space_instance = auto internal_space_instance =
m_policy.space().impl_internal_space_instance(); m_policy.space().impl_internal_space_instance();
cudaFuncAttributes attr = CudaParallelLaunch<ParallelReduce, LaunchBounds>:: m_team_size = m_team_size >= 0 ? m_team_size
get_cuda_func_attributes(internal_space_instance->m_cudaDev); : arg_policy.team_size_recommended(
m_team_size = arg_functor_reducer.get_functor(),
m_team_size >= 0 arg_functor_reducer.get_reducer(),
? m_team_size ParallelReduceTag());
: Kokkos::Impl::cuda_get_opt_block_size<FunctorType, LaunchBounds>(
internal_space_instance, attr,
m_functor_reducer.get_functor(), m_vector_size,
m_policy.team_scratch_size(0),
m_policy.thread_scratch_size(0)) /
m_vector_size;
m_team_begin = m_team_begin =
UseShflReduction UseShflReduction


@ -28,35 +28,20 @@ extern "C" {
/* Cuda runtime function, declared in <crt/device_runtime.h> /* Cuda runtime function, declared in <crt/device_runtime.h>
* Requires capability 2.x or better. * Requires capability 2.x or better.
*/ */
extern __device__ void __assertfail(const void *message, const void *file, [[noreturn]] __device__ void __assertfail(const void *message, const void *file,
unsigned int line, const void *function, unsigned int line,
size_t charsize); const void *function,
size_t charsize);
} }
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
// required to workaround failures in random number generator unit tests with [[noreturn]] __device__ static void cuda_abort(const char *const message) {
// pre-volta architectures
#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
__device__ inline void cuda_abort(const char *const message) {
#else
[[noreturn]] __device__ inline void cuda_abort(const char *const message) {
#endif
const char empty[] = ""; const char empty[] = "";
__assertfail((const void *)message, (const void *)empty, (unsigned int)0, __assertfail((const void *)message, (const void *)empty, (unsigned int)0,
(const void *)empty, sizeof(char)); (const void *)empty, sizeof(char));
// This loop is never executed. It's intended to suppress warnings that the
// function returns, even though it does not. This is necessary because
// __assertfail is not marked as [[noreturn]], even though it does not return.
// Disable with KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK to workaround failures
// in random number generator unit tests with pre-volta architectures
#if !defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
while (true)
;
#endif
} }
} // namespace Impl } // namespace Impl
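
In this hunk the `__assertfail` declaration itself gains `[[noreturn]]`, so `cuda_abort` can be marked `[[noreturn]]` unconditionally and the trailing `while (true);` loop is no longer needed. The effect of the attribute can be sketched on the host; `abort_with` and `checked_index` are hypothetical names, and throwing an exception stands in for the device-side abort.

```cpp
#include <stdexcept>

namespace demo {

// Hypothetical host-side stand-in for an abort routine. The attribute tells
// the compiler that control never returns to the caller.
[[noreturn]] inline void abort_with(const char* message) {
  throw std::runtime_error(message);
}

// Because abort_with is [[noreturn]], the error branch needs no dummy
// return statement or infinite loop to keep the compiler from warning
// that a non-void function may fall off its end.
inline int checked_index(int i, int extent) {
  if (i >= 0 && i < extent) return i;
  abort_with("index out of bounds");
}

}  // namespace demo
```

If the attribute were missing from the innermost function, callers would again need a workaround such as the removed loop.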


@ -48,8 +48,19 @@ class HIP {
using scratch_memory_space = ScratchMemorySpace<HIP>; using scratch_memory_space = ScratchMemorySpace<HIP>;
HIP(); HIP();
HIP(hipStream_t stream,
Impl::ManageStream manage_stream = Impl::ManageStream::no); explicit HIP(hipStream_t stream) : HIP(stream, Impl::ManageStream::no) {}
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT(
"HIP execution space should be constructed explicitly.")
HIP(hipStream_t stream)
: HIP(stream) {}
#endif
HIP(hipStream_t stream, Impl::ManageStream manage_stream);
KOKKOS_DEPRECATED HIP(hipStream_t stream, bool manage_stream); KOKKOS_DEPRECATED HIP(hipStream_t stream, bool manage_stream);
//@} //@}
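
The constructor change above makes `HIP(hipStream_t)` explicit while keeping a deprecated, templated overload with the same parameter list for implicit conversions. The following sketch shows how overload resolution separates the two paths. `Space`, `stream_t` and the tracking flag are hypothetical and exist only to make the selected constructor observable.

```cpp
namespace demo {

using stream_t = int;  // hypothetical stand-in for hipStream_t

class Space {
 public:
  // Preferred path: usable only in direct-initialization, e.g. Space s(x).
  explicit Space(stream_t stream) : m_stream(stream) {}

  // Deprecated compatibility path. As a template it loses to the
  // non-template constructor whenever both are viable, so it is chosen
  // only for implicit conversions, where explicit constructors are not
  // candidates. The delegation below is direct-initialization and thus
  // reaches the explicit constructor, not this template again.
  template <typename T = void>
  [[deprecated("Space should be constructed explicitly.")]]
  Space(stream_t stream) : Space(stream) {
    m_used_implicit_path = true;
  }

  stream_t stream() const { return m_stream; }
  bool used_implicit_path() const { return m_used_implicit_path; }

 private:
  stream_t m_stream;
  bool m_used_implicit_path = false;
};

}  // namespace demo
```

Writing `demo::Space s(3);` selects the explicit constructor without a diagnostic, whereas `demo::Space s = 3;` still compiles but goes through the deprecated template and produces a deprecation warning.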


@ -22,8 +22,6 @@
#include <hip/hip_runtime.h> #include <hip/hip_runtime.h>
#include <ostream>
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
@ -44,39 +42,4 @@ inline void hip_internal_safe_call(hipError_t e, const char* name,
#define KOKKOS_IMPL_HIP_SAFE_CALL(call) \ #define KOKKOS_IMPL_HIP_SAFE_CALL(call) \
Kokkos::Impl::hip_internal_safe_call(call, #call, __FILE__, __LINE__) Kokkos::Impl::hip_internal_safe_call(call, #call, __FILE__, __LINE__)
namespace Kokkos {
namespace Experimental {
class HIPRawMemoryAllocationFailure : public RawMemoryAllocationFailure {
private:
hipError_t m_error_code = hipSuccess;
static FailureMode get_failure_mode(hipError_t error_code) {
switch (error_code) {
case hipErrorMemoryAllocation: return FailureMode::OutOfMemoryError;
case hipErrorInvalidValue: return FailureMode::InvalidAllocationSize;
default: return FailureMode::Unknown;
}
}
public:
HIPRawMemoryAllocationFailure(size_t arg_attempted_size,
hipError_t arg_error_code,
AllocationMechanism arg_mechanism) noexcept
: RawMemoryAllocationFailure(
arg_attempted_size, /* HIPSpace doesn't handle alignment? */ 1,
get_failure_mode(arg_error_code), arg_mechanism),
m_error_code(arg_error_code) {}
void append_additional_error_information(std::ostream& o) const override {
if (m_error_code != hipSuccess) {
o << " The HIP allocation returned the error code \""
<< hipGetErrorName(m_error_code) << "\".";
}
}
};
} // namespace Experimental
} // namespace Kokkos
#endif #endif

View File

@ -40,7 +40,7 @@ class GraphImpl<Kokkos::HIP> {
GraphNodeImpl<Kokkos::HIP, aggregate_kernel_impl_t, GraphNodeImpl<Kokkos::HIP, aggregate_kernel_impl_t,
Kokkos::Experimental::TypeErasedTag>; Kokkos::Experimental::TypeErasedTag>;
// Not moveable or copyable; it spends its whole life as a shared_ptr in the // Not movable or copyable; it spends its whole life as a shared_ptr in the
// Graph object. // Graph object.
GraphImpl() = delete; GraphImpl() = delete;
GraphImpl(GraphImpl const&) = delete; GraphImpl(GraphImpl const&) = delete;
@ -108,7 +108,7 @@ inline void GraphImpl<Kokkos::HIP>::add_node(
} }
// Requires NodeImplPtr is a shared_ptr to specialization of GraphNodeImpl // Requires NodeImplPtr is a shared_ptr to specialization of GraphNodeImpl
// Also requires that the kernel has the graph node tag in it's policy // Also requires that the kernel has the graph node tag in its policy
template <class NodeImpl> template <class NodeImpl>
inline void GraphImpl<Kokkos::HIP>::add_node( inline void GraphImpl<Kokkos::HIP>::add_node(
std::shared_ptr<NodeImpl> const& arg_node_ptr) { std::shared_ptr<NodeImpl> const& arg_node_ptr) {


@ -353,6 +353,22 @@ void HIPInternal::finalize() {
m_num_scratch_locks = 0; m_num_scratch_locks = 0;
} }
int HIPInternal::m_hipDev = -1;
unsigned HIPInternal::m_multiProcCount = 0;
unsigned HIPInternal::m_maxWarpCount = 0;
std::array<HIPInternal::size_type, 3> HIPInternal::m_maxBlock = {0, 0, 0};
unsigned HIPInternal::m_maxWavesPerCU = 0;
int HIPInternal::m_shmemPerSM = 0;
int HIPInternal::m_maxShmemPerBlock = 0;
int HIPInternal::m_maxThreadsPerSM = 0;
hipDeviceProp_t HIPInternal::m_deviceProp;
std::mutex HIPInternal::scratchFunctorMutex;
unsigned long *HIPInternal::constantMemHostStaging = nullptr;
hipEvent_t HIPInternal::constantMemReusable = nullptr;
std::mutex HIPInternal::constantMemMutex;
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
Kokkos::HIP::size_type hip_internal_multiprocessor_count() { Kokkos::HIP::size_type hip_internal_multiprocessor_count() {


@ -35,8 +35,7 @@ struct HIPTraits {
static constexpr int WarpSize = 64; static constexpr int WarpSize = 64;
static constexpr int WarpIndexMask = 0x003f; /* hexadecimal for 63 */ static constexpr int WarpIndexMask = 0x003f; /* hexadecimal for 63 */
static constexpr int WarpIndexShift = 6; /* WarpSize == 1 << WarpShift*/ static constexpr int WarpIndexShift = 6; /* WarpSize == 1 << WarpShift*/
#elif defined(KOKKOS_ARCH_AMD_GFX1030) || defined(KOKKOS_ARCH_AMD_GFX1100) || \ #elif defined(KOKKOS_ARCH_AMD_GFX1030) || defined(KOKKOS_ARCH_AMD_GFX1100)
defined(KOKKOS_ARCH_AMD_GFX1103)
static constexpr int WarpSize = 32; static constexpr int WarpSize = 32;
static constexpr int WarpIndexMask = 0x001f; /* hexadecimal for 31 */ static constexpr int WarpIndexMask = 0x001f; /* hexadecimal for 31 */
static constexpr int WarpIndexShift = 5; /* WarpSize == 1 << WarpShift*/ static constexpr int WarpIndexShift = 5; /* WarpSize == 1 << WarpShift*/
@ -71,16 +70,16 @@ class HIPInternal {
public: public:
using size_type = ::Kokkos::HIP::size_type; using size_type = ::Kokkos::HIP::size_type;
inline static int m_hipDev = -1; static int m_hipDev;
inline static unsigned m_multiProcCount = 0; static unsigned m_multiProcCount;
inline static unsigned m_maxWarpCount = 0; static unsigned m_maxWarpCount;
inline static std::array<size_type, 3> m_maxBlock = {0, 0, 0}; static std::array<size_type, 3> m_maxBlock;
inline static unsigned m_maxWavesPerCU = 0; static unsigned m_maxWavesPerCU;
inline static int m_shmemPerSM = 0; static int m_shmemPerSM;
inline static int m_maxShmemPerBlock = 0; static int m_maxShmemPerBlock;
inline static int m_maxThreadsPerSM = 0; static int m_maxThreadsPerSM;
inline static hipDeviceProp_t m_deviceProp; static hipDeviceProp_t m_deviceProp;
static int concurrency(); static int concurrency();
@ -93,7 +92,7 @@ class HIPInternal {
size_type *m_scratchFlags = nullptr; size_type *m_scratchFlags = nullptr;
mutable size_type *m_scratchFunctor = nullptr; mutable size_type *m_scratchFunctor = nullptr;
mutable size_type *m_scratchFunctorHost = nullptr; mutable size_type *m_scratchFunctorHost = nullptr;
inline static std::mutex scratchFunctorMutex; static std::mutex scratchFunctorMutex;
hipStream_t m_stream = nullptr; hipStream_t m_stream = nullptr;
uint32_t m_instance_id = uint32_t m_instance_id =
@ -112,9 +111,9 @@ class HIPInternal {
// FIXME_HIP: these want to be per-device, not per-stream... use of 'static' // FIXME_HIP: these want to be per-device, not per-stream... use of 'static'
// here will break once there are multiple devices though // here will break once there are multiple devices though
inline static unsigned long *constantMemHostStaging = nullptr; static unsigned long *constantMemHostStaging;
inline static hipEvent_t constantMemReusable = nullptr; static hipEvent_t constantMemReusable;
inline static std::mutex constantMemMutex; static std::mutex constantMemMutex;
static HIPInternal &singleton(); static HIPInternal &singleton();


@ -50,6 +50,7 @@ class ParallelReduce<CombinedFunctorReducerType,
using value_type = typename ReducerType::value_type; using value_type = typename ReducerType::value_type;
using reference_type = typename ReducerType::reference_type; using reference_type = typename ReducerType::reference_type;
using functor_type = FunctorType; using functor_type = FunctorType;
using reducer_type = ReducerType;
using size_type = HIP::size_type; using size_type = HIP::size_type;
// Conditionally set word_size_type to int16_t or int8_t if value_type is // Conditionally set word_size_type to int16_t or int8_t if value_type is


@ -31,7 +31,7 @@ template <class CombinedFunctorReducerType, class... Properties>
class ParallelReduce<CombinedFunctorReducerType, class ParallelReduce<CombinedFunctorReducerType,
Kokkos::TeamPolicy<Properties...>, HIP> { Kokkos::TeamPolicy<Properties...>, HIP> {
public: public:
using Policy = TeamPolicyInternal<HIP, Properties...>; using Policy = TeamPolicy<Properties...>;
using FunctorType = typename CombinedFunctorReducerType::functor_type; using FunctorType = typename CombinedFunctorReducerType::functor_type;
using ReducerType = typename CombinedFunctorReducerType::reducer_type; using ReducerType = typename CombinedFunctorReducerType::reducer_type;
@ -46,6 +46,7 @@ class ParallelReduce<CombinedFunctorReducerType,
public: public:
using functor_type = FunctorType; using functor_type = FunctorType;
using reducer_type = ReducerType;
using size_type = HIP::size_type; using size_type = HIP::size_type;
// static int constexpr UseShflReduction = false; // static int constexpr UseShflReduction = false;


@ -39,6 +39,7 @@
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/ /*--------------------------------------------------------------------------*/
namespace { namespace {
static std::atomic<bool> is_first_hip_managed_allocation(true); static std::atomic<bool> is_first_hip_managed_allocation(true);
@ -66,7 +67,6 @@ void* HIPSpace::allocate(
return impl_allocate(arg_label, arg_alloc_size, arg_logical_size); return impl_allocate(arg_label, arg_alloc_size, arg_logical_size);
} }
void* HIPSpace::impl_allocate( void* HIPSpace::impl_allocate(
const char* arg_label, const size_t arg_alloc_size, const char* arg_label, const size_t arg_alloc_size,
const size_t arg_logical_size, const size_t arg_logical_size,
const Kokkos::Tools::SpaceHandle arg_handle) const { const Kokkos::Tools::SpaceHandle arg_handle) const {
@ -77,10 +77,7 @@ void* HIPSpace::impl_allocate(
// This is the only way to clear the last error, which we should do here // This is the only way to clear the last error, which we should do here
// since we're turning it into an exception here // since we're turning it into an exception here
(void)hipGetLastError(); (void)hipGetLastError();
throw Experimental::HIPRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
HIPMalloc);
} }
if (Kokkos::Profiling::profileLibraryLoaded()) { if (Kokkos::Profiling::profileLibraryLoaded()) {
const size_t reported_size = const size_t reported_size =
@ -111,10 +108,7 @@ void* HIPHostPinnedSpace::impl_allocate(
// This is the only way to clear the last error, which we should do here // This is the only way to clear the last error, which we should do here
// since we're turning it into an exception here // since we're turning it into an exception here
(void)hipGetLastError(); (void)hipGetLastError();
throw Experimental::HIPRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
HIPHostMalloc);
} }
if (Kokkos::Profiling::profileLibraryLoaded()) { if (Kokkos::Profiling::profileLibraryLoaded()) {
const size_t reported_size = const size_t reported_size =
@ -178,10 +172,7 @@ Kokkos::HIP::runtime WARNING: Kokkos did not find an environment variable 'HSA_X
// This is the only way to clear the last error, which we should do here // This is the only way to clear the last error, which we should do here
// since we're turning it into an exception here // since we're turning it into an exception here
(void)hipGetLastError(); (void)hipGetLastError();
throw Experimental::HIPRawMemoryAllocationFailure( Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
arg_alloc_size, error_code,
Experimental::RawMemoryAllocationFailure::AllocationMechanism::
HIPMallocManaged);
} }
KOKKOS_IMPL_HIP_SAFE_CALL(hipMemAdvise( KOKKOS_IMPL_HIP_SAFE_CALL(hipMemAdvise(
ptr, arg_alloc_size, hipMemAdviseSetCoarseGrain, m_device)); ptr, arg_alloc_size, hipMemAdviseSetCoarseGrain, m_device));


@ -153,7 +153,7 @@ void HPX::impl_instance_fence_locked(const std::string &name) const {
auto &s = impl_get_sender(); auto &s = impl_get_sender();
hpx::this_thread::experimental::sync_wait(std::move(s)); hpx::this_thread::experimental::sync_wait(std::move(s));
s = hpx::execution::experimental::unique_any_sender( s = hpx::execution::experimental::unique_any_sender<>(
hpx::execution::experimental::just()); hpx::execution::experimental::just());
}); });
} }
@ -184,7 +184,7 @@ void HPX::impl_static_fence(const std::string &name) {
} }
hpx::this_thread::experimental::sync_wait(std::move(s)); hpx::this_thread::experimental::sync_wait(std::move(s));
s = hpx::execution::experimental::unique_any_sender( s = hpx::execution::experimental::unique_any_sender<>(
hpx::execution::experimental::just()); hpx::execution::experimental::just());
}); });
} }


@ -168,17 +168,31 @@ class HPX {
: m_instance_data(Kokkos::Impl::HostSharedPtr<instance_data>( : m_instance_data(Kokkos::Impl::HostSharedPtr<instance_data>(
&m_default_instance_data, &default_instance_deleter)) {} &m_default_instance_data, &default_instance_deleter)) {}
~HPX() = default; ~HPX() = default;
HPX(instance_mode mode) explicit HPX(instance_mode mode)
: m_instance_data( : m_instance_data(
mode == instance_mode::independent mode == instance_mode::independent
? (Kokkos::Impl::HostSharedPtr<instance_data>( ? (Kokkos::Impl::HostSharedPtr<instance_data>(
new instance_data(m_next_instance_id++))) new instance_data(m_next_instance_id++)))
: Kokkos::Impl::HostSharedPtr<instance_data>( : Kokkos::Impl::HostSharedPtr<instance_data>(
&m_default_instance_data, &default_instance_deleter)) {} &m_default_instance_data, &default_instance_deleter)) {}
HPX(hpx::execution::experimental::unique_any_sender<> &&sender) explicit HPX(hpx::execution::experimental::unique_any_sender<> &&sender)
: m_instance_data(Kokkos::Impl::HostSharedPtr<instance_data>( : m_instance_data(Kokkos::Impl::HostSharedPtr<instance_data>(
new instance_data(m_next_instance_id++, std::move(sender)))) {} new instance_data(m_next_instance_id++, std::move(sender)))) {}
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT(
"HPX execution space should be constructed explicitly.")
HPX(instance_mode mode)
: HPX(mode) {}
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT(
"HPX execution space should be constructed explicitly.")
HPX(hpx::execution::experimental::unique_any_sender<> &&sender)
: HPX(std::move(sender)) {}
#endif
HPX(HPX &&other) = default; HPX(HPX &&other) = default;
HPX(const HPX &other) = default; HPX(const HPX &other) = default;


@ -29,7 +29,6 @@
#include <type_traits> #include <type_traits>
#include <algorithm> #include <algorithm>
#include <utility> #include <utility>
#include <limits>
#include <cstddef> #include <cstddef>
namespace Kokkos { namespace Kokkos {
@ -80,7 +79,11 @@ struct ArrayBoundsCheck<Integral, false> {
/**\brief Derived from the C++17 'std::array'. /**\brief Derived from the C++17 'std::array'.
* Dropping the iterator interface. * Dropping the iterator interface.
*/ */
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <class T = void, size_t N = KOKKOS_INVALID_INDEX, class Proxy = void> template <class T = void, size_t N = KOKKOS_INVALID_INDEX, class Proxy = void>
#else
template <class T, size_t N>
#endif
struct Array { struct Array {
public: public:
/** /**
@ -129,10 +132,38 @@ struct Array {
KOKKOS_INLINE_FUNCTION constexpr const_pointer data() const { KOKKOS_INLINE_FUNCTION constexpr const_pointer data() const {
return &m_internal_implementation_private_member_data[0]; return &m_internal_implementation_private_member_data[0];
} }
friend KOKKOS_FUNCTION constexpr bool operator==(Array const& lhs,
Array const& rhs) noexcept {
for (size_t i = 0; i != N; ++i)
if (lhs[i] != rhs[i]) return false;
return true;
}
friend KOKKOS_FUNCTION constexpr bool operator!=(Array const& lhs,
Array const& rhs) noexcept {
return !(lhs == rhs);
}
private:
template <class U = T>
friend KOKKOS_INLINE_FUNCTION constexpr std::enable_if_t<
Impl::is_swappable<U>::value>
kokkos_swap(Array<T, N>& a,
Array<T, N>& b) noexcept(Impl::is_nothrow_swappable_v<U>) {
for (std::size_t i = 0; i < N; ++i) {
kokkos_swap(a[i], b[i]);
}
}
}; };
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <class T, class Proxy> template <class T, class Proxy>
struct Array<T, 0, Proxy> { struct Array<T, 0, Proxy> {
#else
template <class T>
struct Array<T, 0> {
#endif
public: public:
using reference = T&; using reference = T&;
using const_reference = std::add_const_t<T>&; using const_reference = std::add_const_t<T>&;
@ -167,25 +198,35 @@ struct Array<T, 0, Proxy> {
KOKKOS_INLINE_FUNCTION pointer data() { return nullptr; } KOKKOS_INLINE_FUNCTION pointer data() { return nullptr; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return nullptr; } KOKKOS_INLINE_FUNCTION const_pointer data() const { return nullptr; }
KOKKOS_DEFAULTED_FUNCTION ~Array() = default; friend KOKKOS_FUNCTION constexpr bool operator==(Array const&,
KOKKOS_DEFAULTED_FUNCTION Array() = default; Array const&) noexcept {
KOKKOS_DEFAULTED_FUNCTION Array(const Array&) = default; return true;
KOKKOS_DEFAULTED_FUNCTION Array& operator=(const Array&) = default; }
friend KOKKOS_FUNCTION constexpr bool operator!=(Array const&,
Array const&) noexcept {
return false;
}
// Some supported compilers are not sufficiently C++11 compliant private:
// for default move constructor and move assignment operator. friend KOKKOS_INLINE_FUNCTION constexpr void kokkos_swap(
// Array( Array && ) = default ; Array<T, 0>&, Array<T, 0>&) noexcept {}
// Array & operator = ( Array && ) = default ;
}; };
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
namespace Impl {
struct KokkosArrayContiguous {};
struct KokkosArrayStrided {};
} // namespace Impl
template <> template <>
struct Array<void, KOKKOS_INVALID_INDEX, void> { struct KOKKOS_DEPRECATED Array<void, KOKKOS_INVALID_INDEX, void> {
struct contiguous {}; using contiguous = Impl::KokkosArrayContiguous;
struct strided {}; using strided = Impl::KokkosArrayStrided;
}; };
template <class T> template <class T>
struct Array<T, KOKKOS_INVALID_INDEX, Array<>::contiguous> { struct KOKKOS_DEPRECATED
Array<T, KOKKOS_INVALID_INDEX, Impl::KokkosArrayContiguous> {
private: private:
T* m_elem; T* m_elem;
size_t m_size; size_t m_size;
@ -253,7 +294,8 @@ struct Array<T, KOKKOS_INVALID_INDEX, Array<>::contiguous> {
}; };
template <class T> template <class T>
struct Array<T, KOKKOS_INVALID_INDEX, Array<>::strided> { struct KOKKOS_DEPRECATED
Array<T, KOKKOS_INVALID_INDEX, Impl::KokkosArrayStrided> {
private: private:
T* m_elem; T* m_elem;
size_t m_size; size_t m_size;
@ -320,10 +362,37 @@ struct Array<T, KOKKOS_INVALID_INDEX, Array<>::strided> {
size_type arg_stride) size_type arg_stride)
: m_elem(arg_ptr), m_size(arg_size), m_stride(arg_stride) {} : m_elem(arg_ptr), m_size(arg_size), m_stride(arg_stride) {}
}; };
#endif
template <typename T, typename... Us> template <typename T, typename... Us>
Array(T, Us...)->Array<T, 1 + sizeof...(Us)>; Array(T, Us...)->Array<T, 1 + sizeof...(Us)>;
namespace Impl {
template <typename T, size_t N, size_t... I>
KOKKOS_FUNCTION constexpr Array<std::remove_cv_t<T>, N> to_array_impl(
T (&a)[N], std::index_sequence<I...>) {
return {{a[I]...}};
}
template <typename T, size_t N, size_t... I>
KOKKOS_FUNCTION constexpr Array<std::remove_cv_t<T>, N> to_array_impl(
T(&&a)[N], std::index_sequence<I...>) {
return {{std::move(a[I])...}};
}
} // namespace Impl
template <typename T, size_t N>
KOKKOS_FUNCTION constexpr auto to_array(T (&a)[N]) {
return Impl::to_array_impl(a, std::make_index_sequence<N>{});
}
template <typename T, size_t N>
KOKKOS_FUNCTION constexpr auto to_array(T(&&a)[N]) {
return Impl::to_array_impl(std::move(a), std::make_index_sequence<N>{});
}
} // namespace Kokkos } // namespace Kokkos
//<editor-fold desc="Support for structured binding"> //<editor-fold desc="Support for structured binding">
@ -333,6 +402,7 @@ struct std::tuple_size<Kokkos::Array<T, N>>
template <std::size_t I, class T, std::size_t N> template <std::size_t I, class T, std::size_t N>
struct std::tuple_element<I, Kokkos::Array<T, N>> { struct std::tuple_element<I, Kokkos::Array<T, N>> {
static_assert(I < N);
using type = T; using type = T;
}; };
@ -340,21 +410,25 @@ namespace Kokkos {
template <std::size_t I, class T, std::size_t N> template <std::size_t I, class T, std::size_t N>
KOKKOS_FUNCTION constexpr T& get(Array<T, N>& a) noexcept { KOKKOS_FUNCTION constexpr T& get(Array<T, N>& a) noexcept {
static_assert(I < N);
return a[I]; return a[I];
} }
template <std::size_t I, class T, std::size_t N> template <std::size_t I, class T, std::size_t N>
KOKKOS_FUNCTION constexpr T const& get(Array<T, N> const& a) noexcept { KOKKOS_FUNCTION constexpr T const& get(Array<T, N> const& a) noexcept {
static_assert(I < N);
return a[I]; return a[I];
} }
template <std::size_t I, class T, std::size_t N> template <std::size_t I, class T, std::size_t N>
KOKKOS_FUNCTION constexpr T&& get(Array<T, N>&& a) noexcept { KOKKOS_FUNCTION constexpr T&& get(Array<T, N>&& a) noexcept {
static_assert(I < N);
return std::move(a[I]); return std::move(a[I]);
} }
template <std::size_t I, class T, std::size_t N> template <std::size_t I, class T, std::size_t N>
KOKKOS_FUNCTION constexpr T const&& get(Array<T, N> const&& a) noexcept { KOKKOS_FUNCTION constexpr T const&& get(Array<T, N> const&& a) noexcept {
static_assert(I < N);
return std::move(a[I]); return std::move(a[I]);
} }
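
The `to_array` helpers added in this file convert a built-in array into an `Array` by expanding an index pack. The sketch below illustrates the same technique in standalone C++, with `std::array` standing in for `Kokkos::Array` so that it compiles without Kokkos. The namespace `sketch` and the function `sum_of_converted` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of Kokkos

// Copy each element of the built-in array by expanding the index pack I...
template <class T, std::size_t N, std::size_t... I>
constexpr std::array<std::remove_cv_t<T>, N> to_array_impl(
    T (&a)[N], std::index_sequence<I...>) {
  return {{a[I]...}};
}

// Entry point: N is deduced from the argument, then 0..N-1 is generated.
template <class T, std::size_t N>
constexpr auto to_array(T (&a)[N]) {
  return to_array_impl(a, std::make_index_sequence<N>{});
}

// Illustrative use: convert a built-in array and add up its elements.
constexpr int sum_of_converted() {
  constexpr int raw[] = {1, 2, 3};
  const auto arr = to_array(raw);  // std::array<int, 3>; const is stripped
  int total = 0;
  for (std::size_t i = 0; i < arr.size(); ++i) total += arr[i];
  return total;
}

}  // namespace sketch
```

Only the lvalue overload is shown; the rvalue overload in the diff differs by moving each element instead of copying it.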


@ -22,7 +22,6 @@ static_assert(false,
#ifndef KOKKOS_DESUL_ATOMICS_VOLATILE_WRAPPER_HPP_ #ifndef KOKKOS_DESUL_ATOMICS_VOLATILE_WRAPPER_HPP_
#define KOKKOS_DESUL_ATOMICS_VOLATILE_WRAPPER_HPP_ #define KOKKOS_DESUL_ATOMICS_VOLATILE_WRAPPER_HPP_
#include <Kokkos_Macros.hpp> #include <Kokkos_Macros.hpp>
#include <Kokkos_Atomics_Desul_Config.hpp>
#include <desul/atomics.hpp> #include <desul/atomics.hpp>
#ifdef KOKKOS_ENABLE_ATOMICS_BYPASS #ifdef KOKKOS_ENABLE_ATOMICS_BYPASS


@ -22,8 +22,6 @@ static_assert(false,
#ifndef KOKKOS_DESUL_ATOMICS_WRAPPER_HPP_ #ifndef KOKKOS_DESUL_ATOMICS_WRAPPER_HPP_
#define KOKKOS_DESUL_ATOMICS_WRAPPER_HPP_ #define KOKKOS_DESUL_ATOMICS_WRAPPER_HPP_
#include <Kokkos_Macros.hpp> #include <Kokkos_Macros.hpp>
#include <Kokkos_Atomics_Desul_Config.hpp>
#include <desul/atomics.hpp> #include <desul/atomics.hpp>
#include <impl/Kokkos_Volatile_Load.hpp> #include <impl/Kokkos_Volatile_Load.hpp>


@ -28,6 +28,7 @@
#include <complex> #include <complex>
#include <type_traits> #include <type_traits>
#include <iosfwd> #include <iosfwd>
#include <tuple>
namespace Kokkos { namespace Kokkos {
@ -256,6 +257,12 @@ class
return *this; return *this;
} }
template <size_t I, typename RT>
friend constexpr const RT& get(const complex<RT>&) noexcept;
template <size_t I, typename RT>
friend constexpr const RT&& get(const complex<RT>&&) noexcept;
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4 #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
//! Copy constructor from volatile. //! Copy constructor from volatile.
template < template <
@ -423,6 +430,75 @@ class
#endif // KOKKOS_ENABLE_DEPRECATED_CODE_4 #endif // KOKKOS_ENABLE_DEPRECATED_CODE_4
}; };
} // namespace Kokkos
// Tuple protocol for complex based on https://wg21.link/P2819R2 (voted into
// the C++26 working draft on 2023-11)
template <typename RealType>
struct std::tuple_size<Kokkos::complex<RealType>>
: std::integral_constant<size_t, 2> {};
template <size_t I, typename RealType>
struct std::tuple_element<I, Kokkos::complex<RealType>> {
static_assert(I < 2);
using type = RealType;
};
namespace Kokkos {
// get<...>(...) defined here so as not to be hidden friends, as per P2819R2
template <size_t I, typename RealType>
KOKKOS_FUNCTION constexpr RealType& get(complex<RealType>& z) noexcept {
static_assert(I < 2);
if constexpr (I == 0)
return z.real();
else
return z.imag();
#ifdef KOKKOS_COMPILER_INTEL
__builtin_unreachable();
#endif
}
template <size_t I, typename RealType>
KOKKOS_FUNCTION constexpr RealType&& get(complex<RealType>&& z) noexcept {
static_assert(I < 2);
if constexpr (I == 0)
return std::move(z.real());
else
return std::move(z.imag());
#ifdef KOKKOS_COMPILER_INTEL
__builtin_unreachable();
#endif
}
template <size_t I, typename RealType>
KOKKOS_FUNCTION constexpr const RealType& get(
const complex<RealType>& z) noexcept {
static_assert(I < 2);
if constexpr (I == 0)
return z.re_;
else
return z.im_;
#ifdef KOKKOS_COMPILER_INTEL
__builtin_unreachable();
#endif
}
template <size_t I, typename RealType>
KOKKOS_FUNCTION constexpr const RealType&& get(
const complex<RealType>&& z) noexcept {
static_assert(I < 2);
if constexpr (I == 0)
return std::move(z.re_);
else
return std::move(z.im_);
#ifdef KOKKOS_COMPILER_INTEL
__builtin_unreachable();
#endif
}
//============================================================================== //==============================================================================
// <editor-fold desc="Equality and inequality"> {{{1 // <editor-fold desc="Equality and inequality"> {{{1
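
The tuple protocol added for `Kokkos::complex` in this file rests on three pieces: a `std::tuple_size` specialization, a `std::tuple_element` specialization, and free `get<I>` overloads that argument-dependent lookup can find. The following is a minimal standalone sketch of that protocol for a hypothetical type `sketch::cplx`; it is not the Kokkos implementation, and all names in it are assumptions made for illustration.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of Kokkos

template <class Real>
struct cplx {
  Real re;
  Real im;
};

// Free get<I> overloads, found by argument-dependent lookup.
template <std::size_t I, class Real>
constexpr Real& get(cplx<Real>& z) noexcept {
  static_assert(I < 2, "index out of range");
  if constexpr (I == 0) return z.re; else return z.im;
}

template <std::size_t I, class Real>
constexpr const Real& get(const cplx<Real>& z) noexcept {
  static_assert(I < 2, "index out of range");
  if constexpr (I == 0) return z.re; else return z.im;
}

template <std::size_t I, class Real>
constexpr Real&& get(cplx<Real>&& z) noexcept {
  static_assert(I < 2, "index out of range");
  if constexpr (I == 0) return std::move(z.re); else return std::move(z.im);
}

}  // namespace sketch

namespace std {
// Two components: real and imaginary part.
template <class Real>
struct tuple_size<sketch::cplx<Real>> : integral_constant<size_t, 2> {};

template <size_t I, class Real>
struct tuple_element<I, sketch::cplx<Real>> {
  static_assert(I < 2, "index out of range");
  using type = Real;
};
}  // namespace std

namespace sketch {
// Structured bindings now decompose cplx through the tuple protocol.
inline double real_plus_imag() {
  cplx<double> z{3.0, 4.0};
  auto [re, im] = z;  // binds to a copy of z
  return re + im;
}
}  // namespace sketch
```

With `auto& [re, im] = z;` the bindings would refer to the members of `z` itself through the lvalue overload.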


@ -221,10 +221,12 @@ struct ViewFill<ViewType, Layout, ExecSpace, 7, iType> {
ViewFill(const ViewType& a_, typename ViewType::const_value_type& val_, ViewFill(const ViewType& a_, typename ViewType::const_value_type& val_,
const ExecSpace& space) const ExecSpace& space)
: a(a_), val(val_) { : a(a_), val(val_) {
// MDRangePolicy is not supported for 7D views
// Iterate separately over extent(2)
Kokkos::parallel_for("Kokkos::ViewFill-7D", Kokkos::parallel_for("Kokkos::ViewFill-7D",
policy_type(space, {0, 0, 0, 0, 0, 0}, policy_type(space, {0, 0, 0, 0, 0, 0},
{a.extent(0), a.extent(1), a.extent(2), {a.extent(0), a.extent(1), a.extent(3),
a.extent(3), a.extent(5), a.extent(6)}), a.extent(4), a.extent(5), a.extent(6)}),
*this); *this);
} }
@ -249,6 +251,8 @@ struct ViewFill<ViewType, Layout, ExecSpace, 8, iType> {
ViewFill(const ViewType& a_, typename ViewType::const_value_type& val_, ViewFill(const ViewType& a_, typename ViewType::const_value_type& val_,
const ExecSpace& space) const ExecSpace& space)
: a(a_), val(val_) { : a(a_), val(val_) {
// MDRangePolicy is not supported for 8D views
// Iterate separately over extent(2) and extent(4)
Kokkos::parallel_for("Kokkos::ViewFill-8D", Kokkos::parallel_for("Kokkos::ViewFill-8D",
policy_type(space, {0, 0, 0, 0, 0, 0}, policy_type(space, {0, 0, 0, 0, 0, 0},
{a.extent(0), a.extent(1), a.extent(3), {a.extent(0), a.extent(1), a.extent(3),
@ -293,9 +297,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 2, iType> {
ViewTypeA a; ViewTypeA a;
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<2, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<2, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -323,9 +329,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 3, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<3, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<3, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -354,9 +362,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 4, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<4, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<4, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -386,9 +396,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 5, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<5, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<5, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -418,9 +430,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 6, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -450,9 +464,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 7, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -461,6 +477,8 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 7, iType> {
ViewCopy(const ViewTypeA& a_, const ViewTypeB& b_, ViewCopy(const ViewTypeA& a_, const ViewTypeB& b_,
const ExecSpace space = ExecSpace()) const ExecSpace space = ExecSpace())
: a(a_), b(b_) { : a(a_), b(b_) {
// MDRangePolicy is not supported for 7D views
// Iterate separately over extent(2)
Kokkos::parallel_for("Kokkos::ViewCopy-7D", Kokkos::parallel_for("Kokkos::ViewCopy-7D",
policy_type(space, {0, 0, 0, 0, 0, 0}, policy_type(space, {0, 0, 0, 0, 0, 0},
{a.extent(0), a.extent(1), a.extent(3), {a.extent(0), a.extent(1), a.extent(3),
@ -483,9 +501,11 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 8, iType> {
ViewTypeB b; ViewTypeB b;
static const Kokkos::Iterate outer_iteration_pattern = static const Kokkos::Iterate outer_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::outer_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::outer_iteration_pattern;
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::layout_iterate_type_selector<Layout>::inner_iteration_pattern; Kokkos::Impl::layout_iterate_type_selector<
Layout>::inner_iteration_pattern;
using iterate_type = using iterate_type =
Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>; Kokkos::Rank<6, outer_iteration_pattern, inner_iteration_pattern>;
using policy_type = using policy_type =
@ -494,6 +514,8 @@ struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 8, iType> {
ViewCopy(const ViewTypeA& a_, const ViewTypeB& b_, ViewCopy(const ViewTypeA& a_, const ViewTypeB& b_,
const ExecSpace space = ExecSpace()) const ExecSpace space = ExecSpace())
: a(a_), b(b_) { : a(a_), b(b_) {
// MDRangePolicy is not supported for 8D views
// Iterate separately over extent(2) and extent(4)
Kokkos::parallel_for("Kokkos::ViewCopy-8D", Kokkos::parallel_for("Kokkos::ViewCopy-8D",
policy_type(space, {0, 0, 0, 0, 0, 0}, policy_type(space, {0, 0, 0, 0, 0, 0},
{a.extent(0), a.extent(1), a.extent(3), {a.extent(0), a.extent(1), a.extent(3),
@ -539,11 +561,8 @@ void view_copy(const ExecutionSpace& space, const DstType& dst,
int64_t strides[DstType::rank + 1]; int64_t strides[DstType::rank + 1];
dst.stride(strides); dst.stride(strides);
Kokkos::Iterate iterate; Kokkos::Iterate iterate;
if (Kokkos::is_layouttiled<typename DstType::array_layout>::value) { if (std::is_same<typename DstType::array_layout,
iterate = Kokkos::layout_iterate_type_selector< Kokkos::LayoutRight>::value) {
typename DstType::array_layout>::outer_iteration_pattern;
} else if (std::is_same<typename DstType::array_layout,
Kokkos::LayoutRight>::value) {
iterate = Kokkos::Iterate::Right; iterate = Kokkos::Iterate::Right;
} else if (std::is_same<typename DstType::array_layout, } else if (std::is_same<typename DstType::array_layout,
Kokkos::LayoutLeft>::value) { Kokkos::LayoutLeft>::value) {
@ -630,11 +649,8 @@ void view_copy(const DstType& dst, const SrcType& src) {
int64_t strides[DstType::rank + 1]; int64_t strides[DstType::rank + 1];
dst.stride(strides); dst.stride(strides);
Kokkos::Iterate iterate; Kokkos::Iterate iterate;
if (Kokkos::is_layouttiled<typename DstType::array_layout>::value) { if (std::is_same<typename DstType::array_layout,
iterate = Kokkos::layout_iterate_type_selector< Kokkos::LayoutRight>::value) {
typename DstType::array_layout>::outer_iteration_pattern;
} else if (std::is_same<typename DstType::array_layout,
Kokkos::LayoutRight>::value) {
iterate = Kokkos::Iterate::Right; iterate = Kokkos::Iterate::Right;
} else if (std::is_same<typename DstType::array_layout, } else if (std::is_same<typename DstType::array_layout,
Kokkos::LayoutLeft>::value) { Kokkos::LayoutLeft>::value) {
@ -3092,8 +3108,7 @@ inline std::enable_if_t<
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutRight>::value || Kokkos::LayoutRight>::value ||
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutStride>::value || Kokkos::LayoutStride>::value>
is_layouttiled<typename Kokkos::View<T, P...>::array_layout>::value>
impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
Kokkos::View<T, P...>& v, Kokkos::View<T, P...>& v,
const typename Kokkos::View<T, P...>::array_layout& layout) { const typename Kokkos::View<T, P...>::array_layout& layout) {
@ -3139,8 +3154,7 @@ inline std::enable_if_t<
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutRight>::value || Kokkos::LayoutRight>::value ||
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutStride>::value || Kokkos::LayoutStride>::value)>
is_layouttiled<typename Kokkos::View<T, P...>::array_layout>::value)>
impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
Kokkos::View<T, P...>& v, Kokkos::View<T, P...>& v,
const typename Kokkos::View<T, P...>::array_layout& layout) { const typename Kokkos::View<T, P...>::array_layout& layout) {
@ -3235,7 +3249,10 @@ impl_realloc(Kokkos::View<T, P...>& v, const size_t n0, const size_t n1,
v = view_type(); // Best effort to deallocate in case no other view refers v = view_type(); // Best effort to deallocate in case no other view refers
// to the shared allocation // to the shared allocation
v = view_type(arg_prop_copy, n0, n1, n2, n3, n4, n5, n6, n7); v = view_type(arg_prop_copy, n0, n1, n2, n3, n4, n5, n6, n7);
} else if (alloc_prop_input::initialize) { return;
}
if constexpr (alloc_prop_input::initialize) {
if constexpr (alloc_prop_input::has_execution_space) { if constexpr (alloc_prop_input::has_execution_space) {
const auto& exec_space = const auto& exec_space =
Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop); Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop);
@ -3308,8 +3325,7 @@ inline std::enable_if_t<
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutRight>::value || Kokkos::LayoutRight>::value ||
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutStride>::value || Kokkos::LayoutStride>::value>
is_layouttiled<typename Kokkos::View<T, P...>::array_layout>::value>
impl_realloc(Kokkos::View<T, P...>& v, impl_realloc(Kokkos::View<T, P...>& v,
const typename Kokkos::View<T, P...>::array_layout& layout, const typename Kokkos::View<T, P...>::array_layout& layout,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
@ -3331,7 +3347,10 @@ impl_realloc(Kokkos::View<T, P...>& v,
if (v.layout() != layout) { if (v.layout() != layout) {
v = view_type(); // Deallocate first, if the only view to allocation v = view_type(); // Deallocate first, if the only view to allocation
v = view_type(arg_prop, layout); v = view_type(arg_prop, layout);
} else if (alloc_prop_input::initialize) { return;
}
if constexpr (alloc_prop_input::initialize) {
if constexpr (alloc_prop_input::has_execution_space) { if constexpr (alloc_prop_input::has_execution_space) {
const auto& exec_space = const auto& exec_space =
Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop); Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop);
@ -3351,8 +3370,7 @@ inline std::enable_if_t<
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutRight>::value || Kokkos::LayoutRight>::value ||
std::is_same<typename Kokkos::View<T, P...>::array_layout, std::is_same<typename Kokkos::View<T, P...>::array_layout,
Kokkos::LayoutStride>::value || Kokkos::LayoutStride>::value)>
is_layouttiled<typename Kokkos::View<T, P...>::array_layout>::value)>
impl_realloc(Kokkos::View<T, P...>& v, impl_realloc(Kokkos::View<T, P...>& v,
const typename Kokkos::View<T, P...>::array_layout& layout, const typename Kokkos::View<T, P...>::array_layout& layout,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
@ -3452,6 +3470,7 @@ struct MirrorType {
using view_type = Kokkos::View<data_type, array_layout, Space>; using view_type = Kokkos::View<data_type, array_layout, Space>;
}; };
// collection of static asserts for create_mirror and create_mirror_view
template <class... ViewCtorArgs> template <class... ViewCtorArgs>
void check_view_ctor_args_create_mirror() { void check_view_ctor_args_create_mirror() {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
@ -3470,232 +3489,231 @@ void check_view_ctor_args_create_mirror() {
"not explicitly allow padding!"); "not explicitly allow padding!");
} }
// create a mirror
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t<!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space, inline auto create_mirror(const Kokkos::View<T, P...>& src,
typename Kokkos::View<T, P...>::HostMirror> const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
create_mirror(const Kokkos::View<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
using src_type = View<T, P...>;
using dst_type = typename src_type::HostMirror;
check_view_ctor_args_create_mirror<ViewCtorArgs...>(); check_view_ctor_args_create_mirror<ViewCtorArgs...>();
auto prop_copy = Impl::with_properties_if_unset( auto prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string(src.label()).append("_mirror")); arg_prop, std::string(src.label()).append("_mirror"));
return dst_type(prop_copy, src.layout()); if constexpr (Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
} using memory_space = typename decltype(prop_copy)::memory_space;
using dst_type =
// Create a mirror in a new space (specialization for different space) typename Impl::MirrorType<memory_space, T, P...>::view_type;
template <class T, class... P, class... ViewCtorArgs, return dst_type(prop_copy, src.layout());
class Enable = std::enable_if_t< } else {
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>> using dst_type = typename View<T, P...>::HostMirror;
auto create_mirror(const Kokkos::View<T, P...>& src, return dst_type(prop_copy, src.layout());
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { }
check_view_ctor_args_create_mirror<ViewCtorArgs...>(); #if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
auto prop_copy = Impl::with_properties_if_unset( !defined(KOKKOS_COMPILER_MSVC))
arg_prop, std::string(src.label()).append("_mirror")); __builtin_unreachable();
using alloc_prop = decltype(prop_copy); #endif
return typename Impl::MirrorType<typename alloc_prop::memory_space, T,
P...>::view_type(prop_copy, src.layout());
} }
} // namespace Impl } // namespace Impl
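
The rewrite of `Impl::create_mirror` above folds two `std::enable_if_t`-constrained overloads into one function template whose deduced return type depends on the `if constexpr` branch taken. The sketch below shows that pattern in isolation. `Props`, `Mirror`, `HostSpace`, `DeviceSpace`, and `make_mirror` are hypothetical stand-ins chosen for illustration; they are not Kokkos types or functions.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of Kokkos

struct HostSpace {};
struct DeviceSpace {};

// Hypothetical property bag: has_memory_space is true only when a space
// was supplied as the single template argument.
template <class... Args>
struct Props {
  static constexpr bool has_memory_space = false;
  using memory_space = void;
};

template <class Space>
struct Props<Space> {
  static constexpr bool has_memory_space = true;
  using memory_space = Space;
};

// Hypothetical result type tagged with the space it lives in.
template <class Space>
struct Mirror {
  using space = Space;
};

// One function template instead of two SFINAE overloads. Only the selected
// branch is instantiated, so each instantiation has a single return type.
template <class... Args>
constexpr auto make_mirror(const Props<Args...>&) {
  if constexpr (Props<Args...>::has_memory_space) {
    return Mirror<typename Props<Args...>::memory_space>{};
  } else {
    return Mirror<HostSpace>{};
  }
}

}  // namespace sketch
```

A design consequence of this pattern is that callers can no longer read the return type from the signature; it has to be obtained with `decltype` or `auto`.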
template <class T, class... P> // public interface
std::enable_if_t<std::is_void<typename ViewTraits<T, P...>::specialize>::value, template <class T, class... P,
typename Kokkos::View<T, P...>::HostMirror> typename = std::enable_if_t<
create_mirror(Kokkos::View<T, P...> const& v) { std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
return Impl::create_mirror(v, Impl::ViewCtorProp<>{}); auto create_mirror(Kokkos::View<T, P...> const& src) {
return Impl::create_mirror(src, Impl::ViewCtorProp<>{});
} }
template <class T, class... P> // public interface that accepts a without initializing flag
std::enable_if_t<std::is_void<typename ViewTraits<T, P...>::specialize>::value, template <class T, class... P,
typename Kokkos::View<T, P...>::HostMirror> typename = std::enable_if_t<
create_mirror(Kokkos::Impl::WithoutInitializing_t wi, std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
Kokkos::View<T, P...> const& v) { auto create_mirror(Kokkos::Impl::WithoutInitializing_t wi,
return Impl::create_mirror(v, view_alloc(wi)); Kokkos::View<T, P...> const& src) {
return Impl::create_mirror(src, view_alloc(wi));
} }
// public interface that accepts a space
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
std::enable_if_t<std::is_void<typename ViewTraits<T, P...>::specialize>::value,
typename Impl::MirrorType<Space, T, P...>::view_type>
create_mirror(Space const&, Kokkos::View<T, P...> const& v) {
return Impl::create_mirror(v, view_alloc(typename Space::memory_space{}));
}
template <class T, class... P, class... ViewCtorArgs,
typename Enable = std::enable_if_t< typename Enable = std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value && Kokkos::is_space<Space>::value &&
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>> std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
auto create_mirror(Space const&, Kokkos::View<T, P...> const& src) {
return Impl::create_mirror(src, view_alloc(typename Space::memory_space{}));
}
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs,
typename = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
auto create_mirror(Impl::ViewCtorProp<ViewCtorArgs...> const& arg_prop, auto create_mirror(Impl::ViewCtorProp<ViewCtorArgs...> const& arg_prop,
Kokkos::View<T, P...> const& v) { Kokkos::View<T, P...> const& src) {
return Impl::create_mirror(v, arg_prop); return Impl::create_mirror(src, arg_prop);
}
template <class T, class... P, class... ViewCtorArgs>
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space,
typename Kokkos::View<T, P...>::HostMirror>
create_mirror(Impl::ViewCtorProp<ViewCtorArgs...> const& arg_prop,
Kokkos::View<T, P...> const& v) {
return Impl::create_mirror(v, arg_prop);
} }
// public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> typename Enable = std::enable_if_t<
std::enable_if_t<std::is_void<typename ViewTraits<T, P...>::specialize>::value, Kokkos::is_space<Space>::value &&
typename Impl::MirrorType<Space, T, P...>::view_type> std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
create_mirror(Kokkos::Impl::WithoutInitializing_t wi, Space const&, auto create_mirror(Kokkos::Impl::WithoutInitializing_t wi, Space const&,
Kokkos::View<T, P...> const& v) { Kokkos::View<T, P...> const& src) {
return Impl::create_mirror(v, view_alloc(typename Space::memory_space{}, wi)); return Impl::create_mirror(src,
view_alloc(typename Space::memory_space{}, wi));
} }
namespace Impl { namespace Impl {
// choose a `Kokkos::create_mirror` adapted for the provided view and the
// provided arguments
template <class View, class... ViewCtorArgs>
inline auto choose_create_mirror(
const View& src, const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
// Due to the fact that users can overload `Kokkos::create_mirror`, but also
// that they may not have implemented all of its different possible
// variations, this function chooses the correct private or public version of
// it to call.
// This helper should be used by any overload of
// `Kokkos::Impl::create_mirror_view`.
if constexpr (std::is_void_v<typename View::traits::specialize>) {
// if the view is not specialized, just call the Impl function
// using ADL to find the later defined overload of the function
using namespace Kokkos::Impl;
return create_mirror(src, arg_prop);
} else {
// otherwise, recreate the public call
using ViewProp = Impl::ViewCtorProp<ViewCtorArgs...>;
// using ADL to find the later defined overload of the function
using namespace Kokkos;
if constexpr (sizeof...(ViewCtorArgs) == 0) {
// if there are no view constructor args, call the specific public
// function
return create_mirror(src);
} else if constexpr (sizeof...(ViewCtorArgs) == 1 &&
ViewProp::has_memory_space) {
// if there is one view constructor arg and it has a memory space, call
// the specific public function
return create_mirror(typename ViewProp::memory_space{}, src);
} else if constexpr (sizeof...(ViewCtorArgs) == 1 &&
!ViewProp::initialize) {
// if there is one view constructor arg and it has a without initializing
// mark, call the specific public function
return create_mirror(typename Kokkos::Impl::WithoutInitializing_t{}, src);
} else if constexpr (sizeof...(ViewCtorArgs) == 2 &&
ViewProp::has_memory_space && !ViewProp::initialize) {
// if there are two view constructor args and they have a memory space and
// a without initializing mark, call the specific public function
return create_mirror(typename Kokkos::Impl::WithoutInitializing_t{},
typename ViewProp::memory_space{}, src);
} else {
// if there are other constructor args, call the generic public function
// Beware, there are some libraries using Kokkos that don't implement
// this overload (hence the reason for this present function to exist).
return create_mirror(arg_prop, src);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
__builtin_unreachable();
#endif
}
// create a mirror view
// private interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs> template <class T, class... P, class... ViewCtorArgs>
inline std::enable_if_t< inline auto create_mirror_view(
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space && const Kokkos::View<T, P...>& src,
(std::is_same< [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
typename Kokkos::View<T, P...>::memory_space, if constexpr (!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space) {
typename Kokkos::View<T, P...>::HostMirror::memory_space>::value && if constexpr (std::is_same<typename Kokkos::View<T, P...>::memory_space,
std::is_same< typename Kokkos::View<
typename Kokkos::View<T, P...>::data_type, T, P...>::HostMirror::memory_space>::value &&
typename Kokkos::View<T, P...>::HostMirror::data_type>::value), std::is_same<typename Kokkos::View<T, P...>::data_type,
typename Kokkos::View<T, P...>::HostMirror> typename Kokkos::View<
create_mirror_view(const Kokkos::View<T, P...>& src, T, P...>::HostMirror::data_type>::value) {
const Impl::ViewCtorProp<ViewCtorArgs...>&) { check_view_ctor_args_create_mirror<ViewCtorArgs...>();
check_view_ctor_args_create_mirror<ViewCtorArgs...>(); return typename Kokkos::View<T, P...>::HostMirror(src);
return src; } else {
} return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
template <class T, class... P, class... ViewCtorArgs> } else {
inline std::enable_if_t< if constexpr (Impl::MirrorViewType<typename Impl::ViewCtorProp<
!Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space && ViewCtorArgs...>::memory_space,
!(std::is_same<typename Kokkos::View<T, P...>::memory_space, T, P...>::is_same_memspace) {
typename Kokkos::View< check_view_ctor_args_create_mirror<ViewCtorArgs...>();
T, P...>::HostMirror::memory_space>::value && return typename Impl::MirrorViewType<
std::is_same< typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
typename Kokkos::View<T, P...>::data_type, P...>::view_type(src);
typename Kokkos::View<T, P...>::HostMirror::data_type>::value), } else {
typename Kokkos::View<T, P...>::HostMirror> return Kokkos::Impl::choose_create_mirror(src, arg_prop);
create_mirror_view(const Kokkos::View<T, P...>& src, }
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) { }
return Kokkos::Impl::create_mirror(src, arg_prop); #if defined(KOKKOS_COMPILER_INTEL) || \
} (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
// Create a mirror view in a new space (specialization for same space) __builtin_unreachable();
template <class T, class... P, class... ViewCtorArgs, #endif
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::View<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>&) {
check_view_ctor_args_create_mirror<ViewCtorArgs...>();
return src;
}
// Create a mirror view in a new space (specialization for different space)
template <class T, class... P, class... ViewCtorArgs,
class = std::enable_if_t<
Impl::ViewCtorProp<ViewCtorArgs...>::has_memory_space>>
std::enable_if_t<!Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::is_same_memspace,
typename Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space,
T, P...>::view_type>
create_mirror_view(const Kokkos::View<T, P...>& src,
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop) {
return Kokkos::Impl::create_mirror(src, arg_prop);
} }
} // namespace Impl } // namespace Impl
// public interface
template <class T, class... P> template <class T, class... P>
std::enable_if_t< auto create_mirror_view(const Kokkos::View<T, P...>& src) {
std::is_same< return Impl::create_mirror_view(src, view_alloc());
typename Kokkos::View<T, P...>::memory_space,
typename Kokkos::View<T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename Kokkos::View<T, P...>::data_type,
typename Kokkos::View<T, P...>::HostMirror::data_type>::value,
typename Kokkos::View<T, P...>::HostMirror>
create_mirror_view(const Kokkos::View<T, P...>& src) {
return src;
} }
// public interface that accepts a without initializing flag
template <class T, class... P> template <class T, class... P>
std::enable_if_t< auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi,
!(std::is_same< Kokkos::View<T, P...> const& src) {
typename Kokkos::View<T, P...>::memory_space, return Impl::create_mirror_view(src, view_alloc(wi));
typename Kokkos::View<T, P...>::HostMirror::memory_space>::value &&
std::is_same<
typename Kokkos::View<T, P...>::data_type,
typename Kokkos::View<T, P...>::HostMirror::data_type>::value),
typename Kokkos::View<T, P...>::HostMirror>
create_mirror_view(const Kokkos::View<T, P...>& src) {
return Kokkos::create_mirror(src);
} }
template <class T, class... P> // public interface that accepts a space
typename Kokkos::View<T, P...>::HostMirror create_mirror_view(
Kokkos::Impl::WithoutInitializing_t wi, Kokkos::View<T, P...> const& v) {
return Impl::create_mirror_view(v, view_alloc(wi));
}
// FIXME_C++17 Improve SFINAE here.
template <class Space, class T, class... P, template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
typename Impl::MirrorViewType<Space, T, P...>::view_type create_mirror_view( auto create_mirror_view(const Space&, const Kokkos::View<T, P...>& src) {
const Space&, const Kokkos::View<T, P...>& src, return Impl::create_mirror_view(src,
std::enable_if_t<Impl::MirrorViewType<Space, T, P...>::is_same_memspace>* = view_alloc(typename Space::memory_space()));
nullptr) {
return src;
}
// FIXME_C++17 Improve SFINAE here.
template <class Space, class T, class... P,
class Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
typename Impl::MirrorViewType<Space, T, P...>::view_type create_mirror_view(
const Space& space, const Kokkos::View<T, P...>& src,
std::enable_if_t<!Impl::MirrorViewType<Space, T, P...>::is_same_memspace>* =
nullptr) {
return Kokkos::create_mirror(space, src);
} }
// public interface that accepts a space and a without initializing flag
template <class Space, class T, class... P, template <class Space, class T, class... P,
typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>> typename Enable = std::enable_if_t<Kokkos::is_space<Space>::value>>
typename Impl::MirrorViewType<Space, T, P...>::view_type create_mirror_view( auto create_mirror_view(Kokkos::Impl::WithoutInitializing_t wi, Space const&,
Kokkos::Impl::WithoutInitializing_t wi, Space const&, Kokkos::View<T, P...> const& src) {
Kokkos::View<T, P...> const& v) {
return Impl::create_mirror_view( return Impl::create_mirror_view(
v, view_alloc(typename Space::memory_space{}, wi)); src, view_alloc(typename Space::memory_space{}, wi));
} }
template <class T, class... P, class... ViewCtorArgs> // public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class T, class... P, class... ViewCtorArgs,
typename = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
auto create_mirror_view(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, auto create_mirror_view(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::View<T, P...>& v) { const Kokkos::View<T, P...>& src) {
return Impl::create_mirror_view(v, arg_prop); return Impl::create_mirror_view(src, arg_prop);
} }
template <class... ViewCtorArgs, class T, class... P> namespace Impl {
auto create_mirror_view_and_copy(
const Impl::ViewCtorProp<ViewCtorArgs...>&, // collection of static asserts for create_mirror_view_and_copy
const Kokkos::View<T, P...>& src, template <class... ViewCtorArgs>
std::enable_if_t< void check_view_ctor_args_create_mirror_view_and_copy() {
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace>* = nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert( static_assert(
alloc_prop_input::has_memory_space, alloc_prop_input::has_memory_space,
"The view constructor arguments passed to " "The view constructor arguments passed to "
@ -3708,52 +3726,53 @@ auto create_mirror_view_and_copy(
"The view constructor arguments passed to " "The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must " "Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!"); "not explicitly allow padding!");
// same behavior as deep_copy(src, src)
if (!alloc_prop_input::has_execution_space)
fence(
"Kokkos::create_mirror_view_and_copy: fence before returning src view");
return src;
} }
template <class... ViewCtorArgs, class T, class... P> } // namespace Impl
// create a mirror view and deep copy it
// public interface that accepts arbitrary view constructor args passed by a
// view_alloc
template <class... ViewCtorArgs, class T, class... P,
class Enable = std::enable_if_t<
std::is_void_v<typename ViewTraits<T, P...>::specialize>>>
auto create_mirror_view_and_copy( auto create_mirror_view_and_copy(
const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop, [[maybe_unused]] const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
const Kokkos::View<T, P...>& src, const Kokkos::View<T, P...>& src) {
std::enable_if_t<
std::is_void<typename ViewTraits<T, P...>::specialize>::value &&
!Impl::MirrorViewType<
typename Impl::ViewCtorProp<ViewCtorArgs...>::memory_space, T,
P...>::is_same_memspace>* = nullptr) {
using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>; using alloc_prop_input = Impl::ViewCtorProp<ViewCtorArgs...>;
static_assert(
alloc_prop_input::has_memory_space,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must include a memory space!");
static_assert(!alloc_prop_input::has_pointer,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not include a pointer!");
static_assert(!alloc_prop_input::allow_padding,
"The view constructor arguments passed to "
"Kokkos::create_mirror_view_and_copy must "
"not explicitly allow padding!");
using Space = typename alloc_prop_input::memory_space;
using Mirror = typename Impl::MirrorViewType<Space, T, P...>::view_type;
auto arg_prop_copy = Impl::with_properties_if_unset( Impl::check_view_ctor_args_create_mirror_view_and_copy<ViewCtorArgs...>();
arg_prop, std::string{}, WithoutInitializing,
typename Space::execution_space{});
std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy); if constexpr (Impl::MirrorViewType<typename alloc_prop_input::memory_space, T,
if (label.empty()) label = src.label(); P...>::is_same_memspace) {
auto mirror = typename Mirror::non_const_type{arg_prop_copy, src.layout()}; // same behavior as deep_copy(src, src)
if constexpr (alloc_prop_input::has_execution_space) { if constexpr (!alloc_prop_input::has_execution_space)
deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy), fence(
mirror, src); "Kokkos::create_mirror_view_and_copy: fence before returning src "
} else "view");
deep_copy(mirror, src); return src;
return mirror; } else {
using Space = typename alloc_prop_input::memory_space;
using Mirror = typename Impl::MirrorViewType<Space, T, P...>::view_type;
auto arg_prop_copy = Impl::with_properties_if_unset(
arg_prop, std::string{}, WithoutInitializing,
typename Space::execution_space{});
std::string& label = Impl::get_property<Impl::LabelTag>(arg_prop_copy);
if (label.empty()) label = src.label();
auto mirror = typename Mirror::non_const_type{arg_prop_copy, src.layout()};
if constexpr (alloc_prop_input::has_execution_space) {
deep_copy(Impl::get_property<Impl::ExecutionSpaceTag>(arg_prop_copy),
mirror, src);
} else
deep_copy(mirror, src);
return mirror;
}
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
} }
// Previously when using auto here, the intel compiler 19.3 would // Previously when using auto here, the intel compiler 19.3 would
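
Note on the hunk above: the pairs of `std::enable_if_t` overloads (same memory space / different memory space) are folded into single function templates that return `auto` and branch with `if constexpr`. The following is a minimal standalone sketch of that pattern, not Kokkos code: `HostMem`, `DeviceMem`, `Buffer`, and `mirror_in` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for memory spaces and a view type (not Kokkos types).
struct HostMem {};
struct DeviceMem {};

template <class MemSpace>
struct Buffer {
  using memory_space = MemSpace;
  std::string label;
};

// A single template with a deduced return type replaces two enable_if
// overloads. Only the branch selected at compile time is instantiated, so
// the two branches may return different types.
template <class TargetSpace, class MemSpace>
auto mirror_in(const Buffer<MemSpace>& src) {
  if constexpr (std::is_same_v<TargetSpace, MemSpace>) {
    // Same memory space: no new allocation, return the source as-is.
    return src;
  } else {
    // Different memory space: build a new buffer in the target space.
    return Buffer<TargetSpace>{src.label + "_mirror"};
  }
}
```

Under these assumptions, `mirror_in<HostMem>(host_buffer)` yields a `Buffer<HostMem>` and `mirror_in<DeviceMem>(host_buffer)` yields a `Buffer<DeviceMem>`, with the return type following the branch taken.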


@ -40,7 +40,12 @@ struct ParallelReduceTag {};
struct ChunkSize { struct ChunkSize {
int value; int value;
explicit ChunkSize(int value_) : value(value_) {}
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT("ChunkSize should be constructed explicitly.")
ChunkSize(int value_) : value(value_) {} ChunkSize(int value_) : value(value_) {}
#endif
}; };
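
Note on the `ChunkSize` change above: an `explicit` constructor is added, and the implicit one is kept only as a deprecated template. A plausible reading is that direct initialization prefers the non-template constructor, while copy initialization can only reach the deprecated template. The sketch below reproduces that overload trick on a hypothetical `Chunk` type and uses the standard `[[deprecated]]` attribute directly, which is an assumption about what the Kokkos macro expands to.

```cpp
#include <cassert>

// Hypothetical type mirroring the constructor layout shown in the diff.
struct Chunk {
  int value;

  // Preferred path: Chunk(4) is direct initialization. Both constructors are
  // viable, and the non-template one wins the overload resolution tie.
  explicit Chunk(int v) : value(v) {}

  // Legacy path: `Chunk c = 4;` is copy initialization, which ignores
  // explicit constructors, so it falls back to this deprecated template.
  template <typename T = void>
  [[deprecated("Chunk should be constructed explicitly.")]]
  Chunk(int v) : value(v) {}
};
```

With this layout, code that already writes `Chunk(4)` compiles without a warning, while code that relies on the implicit conversion still compiles but should trigger a deprecation diagnostic.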
/** \brief Execution policy for work over a range of an integral type. /** \brief Execution policy for work over a range of an integral type.
@ -714,6 +719,58 @@ class TeamPolicy
} }
}; };
// Execution space not provided deduces to TeamPolicy<>
TeamPolicy()->TeamPolicy<>;
TeamPolicy(int, int)->TeamPolicy<>;
TeamPolicy(int, int, int)->TeamPolicy<>;
TeamPolicy(int, Kokkos::AUTO_t const&)->TeamPolicy<>;
TeamPolicy(int, Kokkos::AUTO_t const&, int)->TeamPolicy<>;
TeamPolicy(int, Kokkos::AUTO_t const&, Kokkos::AUTO_t const&)->TeamPolicy<>;
TeamPolicy(int, int, Kokkos::AUTO_t const&)->TeamPolicy<>;
// DefaultExecutionSpace deduces to TeamPolicy<>
TeamPolicy(DefaultExecutionSpace const&, int, int)->TeamPolicy<>;
TeamPolicy(DefaultExecutionSpace const&, int, int, int)->TeamPolicy<>;
TeamPolicy(DefaultExecutionSpace const&, int, Kokkos::AUTO_t const&)
->TeamPolicy<>;
TeamPolicy(DefaultExecutionSpace const&, int, Kokkos::AUTO_t const&, int)
->TeamPolicy<>;
TeamPolicy(DefaultExecutionSpace const&, int, Kokkos::AUTO_t const&,
Kokkos::AUTO_t const&)
->TeamPolicy<>;
TeamPolicy(DefaultExecutionSpace const&, int, int, Kokkos::AUTO_t const&)
->TeamPolicy<>;
// ES != DefaultExecutionSpace deduces to TeamPolicy<ES>
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, int)->TeamPolicy<ES>;
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, int, int)->TeamPolicy<ES>;
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, Kokkos::AUTO_t const&)->TeamPolicy<ES>;
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, Kokkos::AUTO_t const&, int)->TeamPolicy<ES>;
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, Kokkos::AUTO_t const&, Kokkos::AUTO_t const&)
->TeamPolicy<ES>;
template <typename ES,
typename = std::enable_if_t<Kokkos::is_execution_space_v<ES>>>
TeamPolicy(ES const&, int, int, Kokkos::AUTO_t const&)->TeamPolicy<ES>;
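
Note on the deduction guides above: non-template guides map the default execution space to `TeamPolicy<>`, and a constrained template guide maps any other execution space `ES` to `TeamPolicy<ES>`. The sketch below shows the same arrangement on a hypothetical `Policy` class; `DefaultExec`, `OtherExec`, `is_exec`, and `first_or` are illustrative names, not Kokkos entities.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical execution spaces and trait.
struct DefaultExec {};
struct OtherExec {};

template <class T> struct is_exec : std::false_type {};
template <> struct is_exec<DefaultExec> : std::true_type {};
template <> struct is_exec<OtherExec> : std::true_type {};

// Picks the first property if one is given, otherwise the default.
template <class Default, class... Ts>
struct first_or { using type = Default; };
template <class Default, class T, class... Ts>
struct first_or<Default, T, Ts...> { using type = T; };

template <class... Props>
struct Policy {
  using execution_space = typename first_or<DefaultExec, Props...>::type;
  int league_size;
  int team_size;
  Policy(int league, int team) : league_size(league), team_size(team) {}
  Policy(const execution_space&, int league, int team)
      : league_size(league), team_size(team) {}
};

// No execution space given: deduce Policy<>.
Policy(int, int)->Policy<>;
// The default execution space also deduces Policy<>; this non-template guide
// is preferred over the template guide below.
Policy(const DefaultExec&, int, int)->Policy<>;
// Any other execution space deduces Policy<ES>.
template <class ES, class = std::enable_if_t<is_exec<ES>::value>>
Policy(const ES&, int, int)->Policy<ES>;
```

Under these assumptions, `Policy p(OtherExec{}, 10, 4);` deduces `Policy<OtherExec>`, while `Policy p(DefaultExec{}, 10, 4);` and `Policy p(10, 4);` both deduce `Policy<>`.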
namespace Impl { namespace Impl {
template <typename iType, class TeamMemberType> template <typename iType, class TeamMemberType>
@ -968,9 +1025,9 @@ struct TeamThreadMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
static constexpr auto par_vector = Impl::TeamMDRangeParVector::NotParVector; static constexpr auto par_vector = Impl::TeamMDRangeParVector::NotParVector;
static constexpr Iterate direction = static constexpr Iterate direction =
OuterDir == Iterate::Default OuterDir == Iterate::Default ? Impl::layout_iterate_type_selector<
? layout_iterate_type_selector<ArrayLayout>::outer_iteration_pattern ArrayLayout>::outer_iteration_pattern
: iter; : iter;
template <class... Args> template <class... Args>
KOKKOS_FUNCTION TeamThreadMDRange(TeamHandleType const& team_, Args&&... args) KOKKOS_FUNCTION TeamThreadMDRange(TeamHandleType const& team_, Args&&... args)
@ -983,7 +1040,7 @@ struct TeamThreadMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
}; };
template <typename TeamHandle, typename... Args> template <typename TeamHandle, typename... Args>
TeamThreadMDRange(TeamHandle const&, Args&&...) KOKKOS_DEDUCTION_GUIDE TeamThreadMDRange(TeamHandle const&, Args&&...)
->TeamThreadMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>; ->TeamThreadMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>;
template <typename Rank, typename TeamHandle> template <typename Rank, typename TeamHandle>
@ -1004,9 +1061,9 @@ struct ThreadVectorMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
static constexpr auto par_vector = Impl::TeamMDRangeParVector::ParVector; static constexpr auto par_vector = Impl::TeamMDRangeParVector::ParVector;
static constexpr Iterate direction = static constexpr Iterate direction =
OuterDir == Iterate::Default OuterDir == Iterate::Default ? Impl::layout_iterate_type_selector<
? layout_iterate_type_selector<ArrayLayout>::outer_iteration_pattern ArrayLayout>::outer_iteration_pattern
: iter; : iter;
template <class... Args> template <class... Args>
KOKKOS_INLINE_FUNCTION ThreadVectorMDRange(TeamHandleType const& team_, KOKKOS_INLINE_FUNCTION ThreadVectorMDRange(TeamHandleType const& team_,
@ -1020,7 +1077,7 @@ struct ThreadVectorMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
}; };
template <typename TeamHandle, typename... Args> template <typename TeamHandle, typename... Args>
ThreadVectorMDRange(TeamHandle const&, Args&&...) KOKKOS_DEDUCTION_GUIDE ThreadVectorMDRange(TeamHandle const&, Args&&...)
->ThreadVectorMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>; ->ThreadVectorMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>;
template <typename Rank, typename TeamHandle> template <typename Rank, typename TeamHandle>
@ -1041,9 +1098,9 @@ struct TeamVectorMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
static constexpr auto par_vector = Impl::TeamMDRangeParVector::ParVector; static constexpr auto par_vector = Impl::TeamMDRangeParVector::ParVector;
static constexpr Iterate direction = static constexpr Iterate direction =
iter == Iterate::Default iter == Iterate::Default ? Impl::layout_iterate_type_selector<
? layout_iterate_type_selector<ArrayLayout>::outer_iteration_pattern ArrayLayout>::outer_iteration_pattern
: iter; : iter;
template <class... Args> template <class... Args>
KOKKOS_INLINE_FUNCTION TeamVectorMDRange(TeamHandleType const& team_, KOKKOS_INLINE_FUNCTION TeamVectorMDRange(TeamHandleType const& team_,
@ -1057,7 +1114,7 @@ struct TeamVectorMDRange<Rank<N, OuterDir, InnerDir>, TeamHandle> {
}; };
template <typename TeamHandle, typename... Args> template <typename TeamHandle, typename... Args>
TeamVectorMDRange(TeamHandle const&, Args&&...) KOKKOS_DEDUCTION_GUIDE TeamVectorMDRange(TeamHandle const&, Args&&...)
->TeamVectorMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>; ->TeamVectorMDRange<Rank<sizeof...(Args), Iterate::Default>, TeamHandle>;
template <typename Rank, typename TeamHandle, typename Lambda, template <typename Rank, typename TeamHandle, typename Lambda,


@ -25,33 +25,40 @@ static_assert(false,
#include <cstddef> #include <cstddef>
#include <type_traits> #include <type_traits>
#include <Kokkos_Macros.hpp> #include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_IMPL_MDSPAN
#include <mdspan/mdspan.hpp>
#else
#include <limits>
#endif
namespace Kokkos { namespace Kokkos {
#ifndef KOKKOS_ENABLE_IMPL_MDSPAN
constexpr size_t dynamic_extent = std::numeric_limits<size_t>::max();
#endif
namespace Experimental { namespace Experimental {
constexpr ptrdiff_t dynamic_extent = -1; template <size_t... ExtentSpecs>
template <ptrdiff_t... ExtentSpecs>
struct Extents { struct Extents {
/* TODO @enhancement flesh this out more */ /* TODO @enhancement flesh this out more */
}; };
template <class Exts, ptrdiff_t NewExtent> template <class Exts, size_t NewExtent>
struct PrependExtent; struct PrependExtent;
template <ptrdiff_t... Exts, ptrdiff_t NewExtent> template <size_t... Exts, size_t NewExtent>
struct PrependExtent<Extents<Exts...>, NewExtent> { struct PrependExtent<Extents<Exts...>, NewExtent> {
using type = Extents<NewExtent, Exts...>; using type = Extents<NewExtent, Exts...>;
}; };
template <class Exts, ptrdiff_t NewExtent> template <class Exts, size_t NewExtent>
struct AppendExtent; struct AppendExtent;
template <ptrdiff_t... Exts, ptrdiff_t NewExtent> template <size_t... Exts, size_t NewExtent>
struct AppendExtent<Extents<Exts...>, NewExtent> { struct AppendExtent<Extents<Exts...>, NewExtent> {
using type = Extents<Exts..., NewExtent>; using type = Extents<Exts..., NewExtent>;
}; };
} // end namespace Experimental } // end namespace Experimental
namespace Impl { namespace Impl {
@ -75,33 +82,32 @@ struct _parse_impl {
// We have to treat the case of int**[x] specially, since it *doesn't* go // We have to treat the case of int**[x] specially, since it *doesn't* go
// backwards // backwards
template <class T, ptrdiff_t... ExtentSpec> template <class T, size_t... ExtentSpec>
struct _parse_impl<T*, Kokkos::Experimental::Extents<ExtentSpec...>, struct _parse_impl<T*, Kokkos::Experimental::Extents<ExtentSpec...>,
std::enable_if_t<_all_remaining_extents_dynamic<T>::value>> std::enable_if_t<_all_remaining_extents_dynamic<T>::value>>
: _parse_impl<T, Kokkos::Experimental::Extents< : _parse_impl<T, Kokkos::Experimental::Extents<Kokkos::dynamic_extent,
Kokkos::Experimental::dynamic_extent, ExtentSpec...>> { ExtentSpec...>> {};
};
// int*(*[x])[y] should still work also (meaning int[][x][][y]) // int*(*[x])[y] should still work also (meaning int[][x][][y])
template <class T, ptrdiff_t... ExtentSpec> template <class T, size_t... ExtentSpec>
struct _parse_impl< struct _parse_impl<
T*, Kokkos::Experimental::Extents<ExtentSpec...>, T*, Kokkos::Experimental::Extents<ExtentSpec...>,
std::enable_if_t<!_all_remaining_extents_dynamic<T>::value>> { std::enable_if_t<!_all_remaining_extents_dynamic<T>::value>> {
using _next = Kokkos::Experimental::AppendExtent< using _next = Kokkos::Experimental::AppendExtent<
typename _parse_impl<T, Kokkos::Experimental::Extents<ExtentSpec...>, typename _parse_impl<T, Kokkos::Experimental::Extents<ExtentSpec...>,
void>::type, void>::type,
Kokkos::Experimental::dynamic_extent>; Kokkos::dynamic_extent>;
using type = typename _next::type; using type = typename _next::type;
}; };
template <class T, ptrdiff_t... ExtentSpec, unsigned N> template <class T, size_t... ExtentSpec, unsigned N>
struct _parse_impl<T[N], Kokkos::Experimental::Extents<ExtentSpec...>, void> struct _parse_impl<T[N], Kokkos::Experimental::Extents<ExtentSpec...>, void>
: _parse_impl< : _parse_impl<T,
T, Kokkos::Experimental::Extents<ExtentSpec..., Kokkos::Experimental::Extents<ExtentSpec...,
ptrdiff_t(N)> // TODO @pedantic this size_t(N)> // TODO @pedantic
// could be a // this could be a
// narrowing cast // narrowing cast
> {}; > {};
} // end namespace _parse_view_extents_impl } // end namespace _parse_view_extents_impl
@ -111,38 +117,34 @@ struct ParseViewExtents {
DataType, Kokkos::Experimental::Extents<>>::type; DataType, Kokkos::Experimental::Extents<>>::type;
}; };
template <class ValueType, ptrdiff_t Ext> template <class ValueType, size_t Ext>
struct ApplyExtent { struct ApplyExtent {
using type = ValueType[Ext]; using type = ValueType[Ext];
}; };
template <class ValueType> template <class ValueType>
struct ApplyExtent<ValueType, Kokkos::Experimental::dynamic_extent> { struct ApplyExtent<ValueType, Kokkos::dynamic_extent> {
using type = ValueType*; using type = ValueType*;
}; };
template <class ValueType, unsigned N, ptrdiff_t Ext> template <class ValueType, unsigned N, size_t Ext>
struct ApplyExtent<ValueType[N], Ext> { struct ApplyExtent<ValueType[N], Ext> {
using type = typename ApplyExtent<ValueType, Ext>::type[N]; using type = typename ApplyExtent<ValueType, Ext>::type[N];
}; };
template <class ValueType, ptrdiff_t Ext> template <class ValueType, size_t Ext>
struct ApplyExtent<ValueType*, Ext> { struct ApplyExtent<ValueType*, Ext> {
using type = ValueType * [Ext]; using type = ValueType * [Ext];
}; };
template <class ValueType> template <class ValueType>
struct ApplyExtent<ValueType*, Kokkos::Experimental::dynamic_extent> { struct ApplyExtent<ValueType*, dynamic_extent> {
using type = using type = typename ApplyExtent<ValueType, dynamic_extent>::type*;
typename ApplyExtent<ValueType,
Kokkos::Experimental::dynamic_extent>::type*;
}; };
template <class ValueType, unsigned N> template <class ValueType, unsigned N>
struct ApplyExtent<ValueType[N], Kokkos::Experimental::dynamic_extent> { struct ApplyExtent<ValueType[N], dynamic_extent> {
using type = using type = typename ApplyExtent<ValueType, dynamic_extent>::type[N];
typename ApplyExtent<ValueType,
Kokkos::Experimental::dynamic_extent>::type[N];
}; };
} // end namespace Impl } // end namespace Impl
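
Note on the extent changes above: the sentinel moves from `ptrdiff_t(-1)` in `Kokkos::Experimental` to `size_t` max in `Kokkos`, which is the same convention as `std::dynamic_extent`. The sketch below shows how such a sentinel can drive the append and apply helpers; `dyn`, `Extents`, `Append`, and `Apply` are simplified hypothetical versions, not the Kokkos definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Sentinel for a run-time extent, using the same value as std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// Compile-time list of extents.
template <std::size_t... Es>
struct Extents {};

// Append one extent to the end of the list.
template <class Exts, std::size_t E>
struct Append;
template <std::size_t... Es, std::size_t E>
struct Append<Extents<Es...>, E> {
  using type = Extents<Es..., E>;
};

// Map an extent back onto a value type: a static extent becomes an array
// dimension, the sentinel becomes a pointer.
template <class T, std::size_t E>
struct Apply {
  using type = T[E];
};
template <class T>
struct Apply<T, dyn> {
  using type = T*;
};
```

For example, `Apply<int, 4>::type` is `int[4]` and `Apply<int, dyn>::type` is `int*` in this sketch.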


@ -167,6 +167,9 @@ Graph<ExecutionSpace> create_graph(Closure&& arg_closure) {
#include <HIP/Kokkos_HIP_Graph_Impl.hpp> #include <HIP/Kokkos_HIP_Graph_Impl.hpp>
#endif #endif
#endif #endif
#ifdef SYCL_EXT_ONEAPI_GRAPH
#include <SYCL/Kokkos_SYCL_Graph_Impl.hpp>
#endif
#ifdef KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_GRAPH #ifdef KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_GRAPH
#undef KOKKOS_IMPL_PUBLIC_INCLUDE #undef KOKKOS_IMPL_PUBLIC_INCLUDE
#undef KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_GRAPH #undef KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_GRAPH


@ -113,7 +113,6 @@ class HostSpace {
const size_t arg_alloc_size, const size_t arg_alloc_size,
const size_t arg_logical_size = 0) const; const size_t arg_logical_size = 0) const;
private:
void* impl_allocate(const char* arg_label, const size_t arg_alloc_size, void* impl_allocate(const char* arg_label, const size_t arg_alloc_size,
const size_t arg_logical_size = 0, const size_t arg_logical_size = 0,
const Kokkos::Tools::SpaceHandle = const Kokkos::Tools::SpaceHandle =
@ -124,7 +123,6 @@ class HostSpace {
const Kokkos::Tools::SpaceHandle = const Kokkos::Tools::SpaceHandle =
Kokkos::Tools::make_space_handle(name())) const; Kokkos::Tools::make_space_handle(name())) const;
public:
/**\brief Return Name of the MemorySpace */ /**\brief Return Name of the MemorySpace */
static constexpr const char* name() { return m_name; } static constexpr const char* name() { return m_name; }


@ -217,81 +217,12 @@ enum class Iterate {
Right // Right indices stride fastest Right // Right indices stride fastest
}; };
// To check for LayoutTiled #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
// This is to hide extra compile-time 'identifier' info within the LayoutTiled template <typename Layout, class Enable = void>
// class by not relying on template specialization to include the ArgN*'s struct KOKKOS_DEPRECATED is_layouttiled : std::false_type {};
template <typename LayoutTiledCheck, class Enable = void> #endif
struct is_layouttiled : std::false_type {};
template <typename LayoutTiledCheck>
struct is_layouttiled<LayoutTiledCheck,
std::enable_if_t<LayoutTiledCheck::is_array_layout_tiled>>
: std::true_type {};
namespace Experimental {
/// LayoutTiled
// Must have Rank >= 2
template <
Kokkos::Iterate OuterP, Kokkos::Iterate InnerP, unsigned ArgN0,
unsigned ArgN1, unsigned ArgN2 = 0, unsigned ArgN3 = 0, unsigned ArgN4 = 0,
unsigned ArgN5 = 0, unsigned ArgN6 = 0, unsigned ArgN7 = 0,
bool IsPowerOfTwo =
(Kokkos::Impl::is_integral_power_of_two(ArgN0) &&
Kokkos::Impl::is_integral_power_of_two(ArgN1) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN2) || (ArgN2 == 0)) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN3) || (ArgN3 == 0)) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN4) || (ArgN4 == 0)) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN5) || (ArgN5 == 0)) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN6) || (ArgN6 == 0)) &&
(Kokkos::Impl::is_integral_power_of_two(ArgN7) || (ArgN7 == 0)))>
struct LayoutTiled {
static_assert(IsPowerOfTwo,
"LayoutTiled must be given power-of-two tile dimensions");
using array_layout = LayoutTiled<OuterP, InnerP, ArgN0, ArgN1, ArgN2, ArgN3,
ArgN4, ArgN5, ArgN6, ArgN7, IsPowerOfTwo>;
static constexpr Iterate outer_pattern = OuterP;
static constexpr Iterate inner_pattern = InnerP;
enum { N0 = ArgN0 };
enum { N1 = ArgN1 };
enum { N2 = ArgN2 };
enum { N3 = ArgN3 };
enum { N4 = ArgN4 };
enum { N5 = ArgN5 };
enum { N6 = ArgN6 };
enum { N7 = ArgN7 };
size_t dimension[ARRAY_LAYOUT_MAX_RANK];
enum : bool { is_extent_constructible = true };
LayoutTiled(LayoutTiled const&) = default;
LayoutTiled(LayoutTiled&&) = default;
LayoutTiled& operator=(LayoutTiled const&) = default;
LayoutTiled& operator=(LayoutTiled&&) = default;
KOKKOS_INLINE_FUNCTION
explicit constexpr LayoutTiled(size_t argN0 = 0, size_t argN1 = 0,
size_t argN2 = 0, size_t argN3 = 0,
size_t argN4 = 0, size_t argN5 = 0,
size_t argN6 = 0, size_t argN7 = 0)
: dimension{argN0, argN1, argN2, argN3, argN4, argN5, argN6, argN7} {}
friend bool operator==(const LayoutTiled& left, const LayoutTiled& right) {
for (unsigned int rank = 0; rank < ARRAY_LAYOUT_MAX_RANK; ++rank)
if (left.dimension[rank] != right.dimension[rank]) return false;
return true;
}
friend bool operator!=(const LayoutTiled& left, const LayoutTiled& right) {
return !(left == right);
}
};
} // namespace Experimental
namespace Impl {
// For use with view_copy // For use with view_copy
template <typename... Layout> template <typename... Layout>
struct layout_iterate_type_selector { struct layout_iterate_type_selector {
@ -320,42 +251,13 @@ struct layout_iterate_type_selector<Kokkos::LayoutStride> {
static const Kokkos::Iterate inner_iteration_pattern = static const Kokkos::Iterate inner_iteration_pattern =
Kokkos::Iterate::Default; Kokkos::Iterate::Default;
}; };
} // namespace Impl
template <unsigned ArgN0, unsigned ArgN1, unsigned ArgN2, unsigned ArgN3, #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
unsigned ArgN4, unsigned ArgN5, unsigned ArgN6, unsigned ArgN7> template <typename... Layout>
struct layout_iterate_type_selector<Kokkos::Experimental::LayoutTiled< using layout_iterate_type_selector KOKKOS_DEPRECATED =
Kokkos::Iterate::Left, Kokkos::Iterate::Left, ArgN0, ArgN1, ArgN2, ArgN3, Impl::layout_iterate_type_selector<Layout...>;
ArgN4, ArgN5, ArgN6, ArgN7, true>> { #endif
static const Kokkos::Iterate outer_iteration_pattern = Kokkos::Iterate::Left;
static const Kokkos::Iterate inner_iteration_pattern = Kokkos::Iterate::Left;
};
template <unsigned ArgN0, unsigned ArgN1, unsigned ArgN2, unsigned ArgN3,
unsigned ArgN4, unsigned ArgN5, unsigned ArgN6, unsigned ArgN7>
struct layout_iterate_type_selector<Kokkos::Experimental::LayoutTiled<
Kokkos::Iterate::Right, Kokkos::Iterate::Left, ArgN0, ArgN1, ArgN2, ArgN3,
ArgN4, ArgN5, ArgN6, ArgN7, true>> {
static const Kokkos::Iterate outer_iteration_pattern = Kokkos::Iterate::Right;
static const Kokkos::Iterate inner_iteration_pattern = Kokkos::Iterate::Left;
};
template <unsigned ArgN0, unsigned ArgN1, unsigned ArgN2, unsigned ArgN3,
unsigned ArgN4, unsigned ArgN5, unsigned ArgN6, unsigned ArgN7>
struct layout_iterate_type_selector<Kokkos::Experimental::LayoutTiled<
Kokkos::Iterate::Left, Kokkos::Iterate::Right, ArgN0, ArgN1, ArgN2, ArgN3,
ArgN4, ArgN5, ArgN6, ArgN7, true>> {
static const Kokkos::Iterate outer_iteration_pattern = Kokkos::Iterate::Left;
static const Kokkos::Iterate inner_iteration_pattern = Kokkos::Iterate::Right;
};
template <unsigned ArgN0, unsigned ArgN1, unsigned ArgN2, unsigned ArgN3,
unsigned ArgN4, unsigned ArgN5, unsigned ArgN6, unsigned ArgN7>
struct layout_iterate_type_selector<Kokkos::Experimental::LayoutTiled<
Kokkos::Iterate::Right, Kokkos::Iterate::Right, ArgN0, ArgN1, ArgN2, ArgN3,
ArgN4, ArgN5, ArgN6, ArgN7, true>> {
static const Kokkos::Iterate outer_iteration_pattern = Kokkos::Iterate::Right;
static const Kokkos::Iterate inner_iteration_pattern = Kokkos::Iterate::Right;
};
} // namespace Kokkos } // namespace Kokkos


@ -55,9 +55,22 @@
#ifndef KOKKOS_DONT_INCLUDE_CORE_CONFIG_H #ifndef KOKKOS_DONT_INCLUDE_CORE_CONFIG_H
#include <KokkosCore_config.h> #include <KokkosCore_config.h>
#include <impl/Kokkos_DesulAtomicsConfig.hpp>
#include <impl/Kokkos_NvidiaGpuArchitectures.hpp> #include <impl/Kokkos_NvidiaGpuArchitectures.hpp>
#endif #endif
#if !defined(KOKKOS_ENABLE_CXX17)
#if __has_include(<version>)
#include <version>
#else
#include <ciso646>
#endif
#if defined(_GLIBCXX_RELEASE) && _GLIBCXX_RELEASE < 10
#error \
"Compiling with support for C++20 or later requires a libstdc++ version later than 9"
#endif
#endif
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
/** Pick up compiler specific #define macros: /** Pick up compiler specific #define macros:
* *
@ -332,6 +345,10 @@
#define KOKKOS_DEFAULTED_FUNCTION #define KOKKOS_DEFAULTED_FUNCTION
#endif #endif
#if !defined(KOKKOS_DEDUCTION_GUIDE)
#define KOKKOS_DEDUCTION_GUIDE
#endif
#if !defined(KOKKOS_IMPL_HOST_FUNCTION) #if !defined(KOKKOS_IMPL_HOST_FUNCTION)
#define KOKKOS_IMPL_HOST_FUNCTION #define KOKKOS_IMPL_HOST_FUNCTION
#endif #endif
@ -562,8 +579,44 @@ static constexpr bool kokkos_omp_on_host() { return false; }
#define KOKKOS_IMPL_WARNING(desc) KOKKOS_IMPL_DO_PRAGMA(message(#desc)) #define KOKKOS_IMPL_WARNING(desc) KOKKOS_IMPL_DO_PRAGMA(message(#desc))
#endif #endif
// clang-format off
#if defined(__NVCOMPILER)
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH() \
_Pragma("diag_suppress 1216")
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP() \
_Pragma("diag_default 1216")
#elif defined(__EDG__)
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH() \
_Pragma("warning push") \
_Pragma("warning disable 1478")
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP() \
_Pragma("warning pop")
#elif defined(__GNUC__) || defined(__clang__)
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH() \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP() \
_Pragma("GCC diagnostic pop")
#elif defined(_MSC_VER)
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH() \
_Pragma("warning(push)") \
_Pragma("warning(disable: 4996)")
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP() \
_Pragma("warning(pop)")
#else
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH()
#define KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP()
#endif
// clang-format on
#define KOKKOS_ATTRIBUTE_NODISCARD [[nodiscard]] #define KOKKOS_ATTRIBUTE_NODISCARD [[nodiscard]]
#ifndef KOKKOS_ENABLE_CXX17
#define KOKKOS_IMPL_ATTRIBUTE_UNLIKELY [[unlikely]]
#else
#define KOKKOS_IMPL_ATTRIBUTE_UNLIKELY
#endif
#if (defined(KOKKOS_COMPILER_GNU) || defined(KOKKOS_COMPILER_CLANG) || \ #if (defined(KOKKOS_COMPILER_GNU) || defined(KOKKOS_COMPILER_CLANG) || \
defined(KOKKOS_COMPILER_INTEL) || defined(KOKKOS_COMPILER_INTEL_LLVM) || \ defined(KOKKOS_COMPILER_INTEL) || defined(KOKKOS_COMPILER_INTEL_LLVM) || \
defined(KOKKOS_COMPILER_NVHPC)) && \ defined(KOKKOS_COMPILER_NVHPC)) && \
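The `KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH()` / `_POP()` pair added in the hunk above wraps compiler-specific `_Pragma` directives so that a deliberate use of a deprecated entity can be bracketed without a warning. The sketch below reproduces only the GCC/Clang branch. The `DEMO_` macro names, `old_api`, and `call_old_api_quietly` are hypothetical and are not part of Kokkos. Whether the warning is actually silenced depends on how the compiler handles `_Pragma` inside macro expansions.

```cpp
#include <cassert>

// Hypothetical macros mirroring the GCC/Clang branch of the hunk above.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DISABLE_DEPRECATED_WARNINGS_PUSH() \
  _Pragma("GCC diagnostic push")                \
  _Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
#define DEMO_DISABLE_DEPRECATED_WARNINGS_POP() _Pragma("GCC diagnostic pop")
#else
// Unknown compiler: expand to nothing, as the final #else branch above does.
#define DEMO_DISABLE_DEPRECATED_WARNINGS_PUSH()
#define DEMO_DISABLE_DEPRECATED_WARNINGS_POP()
#endif

namespace demo {

// Hypothetical deprecated function, used only for illustration.
[[deprecated("use a newer API instead")]] inline int old_api() { return 42; }

// Calls the deprecated function inside a push/pop bracket so that the
// deprecation diagnostic is suppressed only for this one call site.
inline int call_old_api_quietly() {
  DEMO_DISABLE_DEPRECATED_WARNINGS_PUSH()
  int result = old_api();
  DEMO_DISABLE_DEPRECATED_WARNINGS_POP()
  return result;
}

}  // namespace demo
```

The same bracket appears later in this commit around the deprecated `pair<T1, void>` relational operators for older GCC versions.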


@ -277,12 +277,20 @@ KOKKOS_INLINE_FUNCTION long long abs(long long n) {
#endif #endif
} }
KOKKOS_INLINE_FUNCTION float abs(float x) { KOKKOS_INLINE_FUNCTION float abs(float x) {
#ifdef KOKKOS_ENABLE_SYCL
return sycl::fabs(x); // sycl::abs is only provided for integral types
#else
using KOKKOS_IMPL_MATH_FUNCTIONS_NAMESPACE::abs; using KOKKOS_IMPL_MATH_FUNCTIONS_NAMESPACE::abs;
return abs(x); return abs(x);
#endif
} }
KOKKOS_INLINE_FUNCTION double abs(double x) { KOKKOS_INLINE_FUNCTION double abs(double x) {
#ifdef KOKKOS_ENABLE_SYCL
return sycl::fabs(x); // sycl::abs is only provided for integral types
#else
using KOKKOS_IMPL_MATH_FUNCTIONS_NAMESPACE::abs; using KOKKOS_IMPL_MATH_FUNCTIONS_NAMESPACE::abs;
return abs(x); return abs(x);
#endif
} }
inline long double abs(long double x) { inline long double abs(long double x) {
using std::abs; using std::abs;


@ -413,12 +413,13 @@ KOKKOS_FORCEINLINE_FUNCTION pair<T1&, T2&> tie(T1& x, T2& y) {
return (pair<T1&, T2&>(x, y)); return (pair<T1&, T2&>(x, y));
} }
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
// //
// Specialization of Kokkos::pair for a \c void second argument. This // Specialization of Kokkos::pair for a \c void second argument. This
// is not actually a "pair"; it only contains one element, the first. // is not actually a "pair"; it only contains one element, the first.
// //
template <class T1> template <class T1>
struct pair<T1, void> { struct KOKKOS_DEPRECATED pair<T1, void> {
using first_type = T1; using first_type = T1;
using second_type = void; using second_type = void;
@ -448,41 +449,48 @@ struct pair<T1, void> {
// Specialization of relational operators for Kokkos::pair<T1,void>. // Specialization of relational operators for Kokkos::pair<T1,void>.
// //
#if defined(KOKKOS_COMPILER_GNU) && (KOKKOS_COMPILER_GNU < 1110)
KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_PUSH()
#endif
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator==( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator==(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return lhs.first == rhs.first; return lhs.first == rhs.first;
} }
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator!=( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator!=(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return !(lhs == rhs); return !(lhs == rhs);
} }
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator<( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator<(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return lhs.first < rhs.first; return lhs.first < rhs.first;
} }
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator<=( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator<=(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return !(rhs < lhs); return !(rhs < lhs);
} }
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator>( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator>(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return rhs < lhs; return rhs < lhs;
} }
template <class T1> template <class T1>
KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator>=( KOKKOS_DEPRECATED KOKKOS_FORCEINLINE_FUNCTION constexpr bool operator>=(
const pair<T1, void>& lhs, const pair<T1, void>& rhs) { const pair<T1, void>& lhs, const pair<T1, void>& rhs) {
return !(lhs < rhs); return !(lhs < rhs);
} }
#if defined(KOKKOS_COMPILER_GNU) && (KOKKOS_COMPILER_GNU < 1110)
KOKKOS_IMPL_DISABLE_DEPRECATED_WARNINGS_POP()
#endif
#endif
namespace Impl { namespace Impl {
template <class T> template <class T>


@ -137,9 +137,9 @@ inline void parallel_for(const std::string& str, const ExecPolicy& policy,
ExecPolicy inner_policy = policy; ExecPolicy inner_policy = policy;
Kokkos::Tools::Impl::begin_parallel_for(inner_policy, functor, str, kpID); Kokkos::Tools::Impl::begin_parallel_for(inner_policy, functor, str, kpID);
Kokkos::Impl::shared_allocation_tracking_disable(); auto closure =
Impl::ParallelFor<FunctorType, ExecPolicy> closure(functor, inner_policy); Kokkos::Impl::construct_with_shared_allocation_tracking_disabled<
Kokkos::Impl::shared_allocation_tracking_enable(); Impl::ParallelFor<FunctorType, ExecPolicy>>(functor, inner_policy);
closure.execute(); closure.execute();
@ -352,10 +352,10 @@ inline void parallel_scan(const std::string& str, const ExecutionPolicy& policy,
ExecutionPolicy inner_policy = policy; ExecutionPolicy inner_policy = policy;
Kokkos::Tools::Impl::begin_parallel_scan(inner_policy, functor, str, kpID); Kokkos::Tools::Impl::begin_parallel_scan(inner_policy, functor, str, kpID);
Kokkos::Impl::shared_allocation_tracking_disable(); auto closure =
Impl::ParallelScan<FunctorType, ExecutionPolicy> closure(functor, Kokkos::Impl::construct_with_shared_allocation_tracking_disabled<
inner_policy); Impl::ParallelScan<FunctorType, ExecutionPolicy>>(functor,
Kokkos::Impl::shared_allocation_tracking_enable(); inner_policy);
closure.execute(); closure.execute();
@ -398,18 +398,19 @@ inline void parallel_scan(const std::string& str, const ExecutionPolicy& policy,
Kokkos::Tools::Impl::begin_parallel_scan(inner_policy, functor, str, kpID); Kokkos::Tools::Impl::begin_parallel_scan(inner_policy, functor, str, kpID);
if constexpr (Kokkos::is_view<ReturnType>::value) { if constexpr (Kokkos::is_view<ReturnType>::value) {
Kokkos::Impl::shared_allocation_tracking_disable(); auto closure =
Impl::ParallelScanWithTotal<FunctorType, ExecutionPolicy, Kokkos::Impl::construct_with_shared_allocation_tracking_disabled<
typename ReturnType::value_type> Impl::ParallelScanWithTotal<FunctorType, ExecutionPolicy,
closure(functor, inner_policy, return_value); typename ReturnType::value_type>>(
Kokkos::Impl::shared_allocation_tracking_enable(); functor, inner_policy, return_value);
closure.execute(); closure.execute();
} else { } else {
Kokkos::Impl::shared_allocation_tracking_disable();
Kokkos::View<ReturnType, Kokkos::HostSpace> view(&return_value); Kokkos::View<ReturnType, Kokkos::HostSpace> view(&return_value);
Impl::ParallelScanWithTotal<FunctorType, ExecutionPolicy, ReturnType> auto closure =
closure(functor, inner_policy, view); Kokkos::Impl::construct_with_shared_allocation_tracking_disabled<
Kokkos::Impl::shared_allocation_tracking_enable(); Impl::ParallelScanWithTotal<FunctorType, ExecutionPolicy,
ReturnType>>(functor, inner_policy,
view);
closure.execute(); closure.execute();
} }
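The hunks above replace the manual disable / construct / enable sequence with a single call to `construct_with_shared_allocation_tracking_disabled<T>(args...)`. The diff does not show that helper's definition, so the following is only a minimal sketch of how such a factory could work, assuming a global on/off switch. Every name in the `demo` namespace is hypothetical. The sketch relies on C++17 guaranteed copy elision, so the object is built directly in the caller's storage while tracking is off.

```cpp
#include <cassert>
#include <utility>

namespace demo {

// Hypothetical stand-in for a global allocation-tracking switch.
inline bool tracking_enabled = true;
inline void tracking_disable() { tracking_enabled = false; }
inline void tracking_enable() { tracking_enabled = true; }

// Re-enables tracking when it goes out of scope, including on exceptions.
struct TrackingReenabler {
  ~TrackingReenabler() { tracking_enable(); }
};

// Constructs a T with tracking switched off. The returned prvalue is
// materialized in the caller's storage before the guard is destroyed.
template <class T, class... Args>
T construct_with_tracking_disabled(Args&&... args) {
  tracking_disable();
  TrackingReenabler guard;
  return T(std::forward<Args>(args)...);
}

// Toy closure that records whether tracking was on while it was constructed.
struct Closure {
  bool tracked_at_construction;
  int payload;
  explicit Closure(int p)
      : tracked_at_construction(tracking_enabled), payload(p) {}
};

}  // namespace demo
```

Compared with the removed three-statement pattern, a factory of this shape cannot leave tracking disabled if the constructor throws, and the closure can be declared with `auto` at the call site.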


@ -72,7 +72,7 @@ struct Sum {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
Sum(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE Sum(View<Scalar, Properties...> const&)
->Sum<Scalar, typename View<Scalar, Properties...>::memory_space>; ->Sum<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -117,7 +117,7 @@ struct Prod {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
Prod(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE Prod(View<Scalar, Properties...> const&)
->Prod<Scalar, typename View<Scalar, Properties...>::memory_space>; ->Prod<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -164,7 +164,7 @@ struct Min {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
Min(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE Min(View<Scalar, Properties...> const&)
->Min<Scalar, typename View<Scalar, Properties...>::memory_space>; ->Min<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -212,7 +212,7 @@ struct Max {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
Max(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE Max(View<Scalar, Properties...> const&)
->Max<Scalar, typename View<Scalar, Properties...>::memory_space>; ->Max<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -258,7 +258,7 @@ struct LAnd {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
LAnd(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE LAnd(View<Scalar, Properties...> const&)
->LAnd<Scalar, typename View<Scalar, Properties...>::memory_space>; ->LAnd<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -305,7 +305,7 @@ struct LOr {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
LOr(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE LOr(View<Scalar, Properties...> const&)
->LOr<Scalar, typename View<Scalar, Properties...>::memory_space>; ->LOr<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -352,7 +352,7 @@ struct BAnd {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
BAnd(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE BAnd(View<Scalar, Properties...> const&)
->BAnd<Scalar, typename View<Scalar, Properties...>::memory_space>; ->BAnd<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Space> template <class Scalar, class Space>
@ -399,7 +399,7 @@ struct BOr {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
BOr(View<Scalar, Properties...> const&) KOKKOS_DEDUCTION_GUIDE BOr(View<Scalar, Properties...> const&)
->BOr<Scalar, typename View<Scalar, Properties...>::memory_space>; ->BOr<Scalar, typename View<Scalar, Properties...>::memory_space>;
template <class Scalar, class Index> template <class Scalar, class Index>
@ -458,7 +458,8 @@ struct MinLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MinLoc(View<ValLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MinLoc(
View<ValLocScalar<Scalar, Index>, Properties...> const&)
->MinLoc<Scalar, Index, ->MinLoc<Scalar, Index,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -513,7 +514,8 @@ struct MaxLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MaxLoc(View<ValLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MaxLoc(
View<ValLocScalar<Scalar, Index>, Properties...> const&)
->MaxLoc<Scalar, Index, ->MaxLoc<Scalar, Index,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -577,7 +579,7 @@ struct MinMax {
}; };
template <typename Scalar, typename... Properties> template <typename Scalar, typename... Properties>
MinMax(View<MinMaxScalar<Scalar>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MinMax(View<MinMaxScalar<Scalar>, Properties...> const&)
->MinMax<Scalar, ->MinMax<Scalar,
typename View<MinMaxScalar<Scalar>, Properties...>::memory_space>; typename View<MinMaxScalar<Scalar>, Properties...>::memory_space>;
@ -646,7 +648,8 @@ struct MinMaxLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MinMaxLoc(View<MinMaxLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MinMaxLoc(
View<MinMaxLocScalar<Scalar, Index>, Properties...> const&)
->MinMaxLoc<Scalar, Index, ->MinMaxLoc<Scalar, Index,
typename View<MinMaxLocScalar<Scalar, Index>, typename View<MinMaxLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -713,7 +716,8 @@ struct MaxFirstLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MaxFirstLoc(View<ValLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MaxFirstLoc(
View<ValLocScalar<Scalar, Index>, Properties...> const&)
->MaxFirstLoc<Scalar, Index, ->MaxFirstLoc<Scalar, Index,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -782,7 +786,7 @@ struct MaxFirstLocCustomComparator {
template <typename Scalar, typename Index, typename ComparatorType, template <typename Scalar, typename Index, typename ComparatorType,
typename... Properties> typename... Properties>
MaxFirstLocCustomComparator( KOKKOS_DEDUCTION_GUIDE MaxFirstLocCustomComparator(
View<ValLocScalar<Scalar, Index>, Properties...> const&, ComparatorType) View<ValLocScalar<Scalar, Index>, Properties...> const&, ComparatorType)
->MaxFirstLocCustomComparator<Scalar, Index, ComparatorType, ->MaxFirstLocCustomComparator<Scalar, Index, ComparatorType,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
@ -846,7 +850,8 @@ struct MinFirstLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MinFirstLoc(View<ValLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MinFirstLoc(
View<ValLocScalar<Scalar, Index>, Properties...> const&)
->MinFirstLoc<Scalar, Index, ->MinFirstLoc<Scalar, Index,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -915,7 +920,7 @@ struct MinFirstLocCustomComparator {
template <typename Scalar, typename Index, typename ComparatorType, template <typename Scalar, typename Index, typename ComparatorType,
typename... Properties> typename... Properties>
MinFirstLocCustomComparator( KOKKOS_DEDUCTION_GUIDE MinFirstLocCustomComparator(
View<ValLocScalar<Scalar, Index>, Properties...> const&, ComparatorType) View<ValLocScalar<Scalar, Index>, Properties...> const&, ComparatorType)
->MinFirstLocCustomComparator<Scalar, Index, ComparatorType, ->MinFirstLocCustomComparator<Scalar, Index, ComparatorType,
typename View<ValLocScalar<Scalar, Index>, typename View<ValLocScalar<Scalar, Index>,
@ -990,7 +995,8 @@ struct MinMaxFirstLastLoc {
}; };
template <typename Scalar, typename Index, typename... Properties> template <typename Scalar, typename Index, typename... Properties>
MinMaxFirstLastLoc(View<MinMaxLocScalar<Scalar, Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE MinMaxFirstLastLoc(
View<MinMaxLocScalar<Scalar, Index>, Properties...> const&)
->MinMaxFirstLastLoc<Scalar, Index, ->MinMaxFirstLastLoc<Scalar, Index,
typename View<MinMaxLocScalar<Scalar, Index>, typename View<MinMaxLocScalar<Scalar, Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -1069,7 +1075,7 @@ struct MinMaxFirstLastLocCustomComparator {
template <typename Scalar, typename Index, typename ComparatorType, template <typename Scalar, typename Index, typename ComparatorType,
typename... Properties> typename... Properties>
MinMaxFirstLastLocCustomComparator( KOKKOS_DEDUCTION_GUIDE MinMaxFirstLastLocCustomComparator(
View<MinMaxLocScalar<Scalar, Index>, Properties...> const&, ComparatorType) View<MinMaxLocScalar<Scalar, Index>, Properties...> const&, ComparatorType)
->MinMaxFirstLastLocCustomComparator< ->MinMaxFirstLastLocCustomComparator<
Scalar, Index, ComparatorType, Scalar, Index, ComparatorType,
@ -1133,7 +1139,8 @@ struct FirstLoc {
}; };
template <typename Index, typename... Properties> template <typename Index, typename... Properties>
FirstLoc(View<FirstLocScalar<Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE FirstLoc(
View<FirstLocScalar<Index>, Properties...> const&)
->FirstLoc<Index, typename View<FirstLocScalar<Index>, ->FirstLoc<Index, typename View<FirstLocScalar<Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -1194,7 +1201,7 @@ struct LastLoc {
}; };
template <typename Index, typename... Properties> template <typename Index, typename... Properties>
LastLoc(View<LastLocScalar<Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE LastLoc(View<LastLocScalar<Index>, Properties...> const&)
->LastLoc<Index, ->LastLoc<Index,
typename View<LastLocScalar<Index>, Properties...>::memory_space>; typename View<LastLocScalar<Index>, Properties...>::memory_space>;
@ -1261,7 +1268,8 @@ struct StdIsPartitioned {
}; };
template <typename Index, typename... Properties> template <typename Index, typename... Properties>
StdIsPartitioned(View<StdIsPartScalar<Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE StdIsPartitioned(
View<StdIsPartScalar<Index>, Properties...> const&)
->StdIsPartitioned<Index, typename View<StdIsPartScalar<Index>, ->StdIsPartitioned<Index, typename View<StdIsPartScalar<Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -1323,7 +1331,8 @@ struct StdPartitionPoint {
}; };
template <typename Index, typename... Properties> template <typename Index, typename... Properties>
StdPartitionPoint(View<StdPartPointScalar<Index>, Properties...> const&) KOKKOS_DEDUCTION_GUIDE StdPartitionPoint(
View<StdPartPointScalar<Index>, Properties...> const&)
->StdPartitionPoint<Index, typename View<StdPartPointScalar<Index>, ->StdPartitionPoint<Index, typename View<StdPartPointScalar<Index>,
Properties...>::memory_space>; Properties...>::memory_space>;
@ -1502,18 +1511,18 @@ struct ParallelReduceAdaptor {
using Analysis = FunctorAnalysis<FunctorPatternInterface::REDUCE, using Analysis = FunctorAnalysis<FunctorPatternInterface::REDUCE,
PolicyType, typename ReducerSelector::type, PolicyType, typename ReducerSelector::type,
typename return_value_adapter::value_type>; typename return_value_adapter::value_type>;
Kokkos::Impl::shared_allocation_tracking_disable();
CombinedFunctorReducer functor_reducer(
functor, typename Analysis::Reducer(
ReducerSelector::select(functor, return_value)));
// FIXME Remove "Wrapper" once all backends implement the new interface using CombinedFunctorReducerType =
Impl::ParallelReduce<decltype(functor_reducer), PolicyType, CombinedFunctorReducer<FunctorType, typename Analysis::Reducer>;
typename Impl::FunctorPolicyExecutionSpace< auto closure = construct_with_shared_allocation_tracking_disabled<
FunctorType, PolicyType>::execution_space> Impl::ParallelReduce<CombinedFunctorReducerType, PolicyType,
closure(functor_reducer, inner_policy, typename Impl::FunctorPolicyExecutionSpace<
return_value_adapter::return_value(return_value, functor)); FunctorType, PolicyType>::execution_space>>(
Kokkos::Impl::shared_allocation_tracking_enable(); CombinedFunctorReducerType(
functor, typename Analysis::Reducer(
ReducerSelector::select(functor, return_value))),
inner_policy,
return_value_adapter::return_value(return_value, functor));
closure.execute(); closure.execute();
Kokkos::Tools::Impl::end_parallel_reduce<PassedReducerType>( Kokkos::Tools::Impl::end_parallel_reduce<PassedReducerType>(


@ -38,6 +38,8 @@ static_assert(false,
#ifdef KOKKOS_ENABLE_IMPL_MDSPAN #ifdef KOKKOS_ENABLE_IMPL_MDSPAN
#include <View/MDSpan/Kokkos_MDSpan_Extents.hpp> #include <View/MDSpan/Kokkos_MDSpan_Extents.hpp>
#include <View/MDSpan/Kokkos_MDSpan_Layout.hpp>
#include <View/MDSpan/Kokkos_MDSpan_Accessor.hpp>
#endif #endif
#include <Kokkos_MinMax.hpp> #include <Kokkos_MinMax.hpp>
@ -372,6 +374,35 @@ struct ViewTraits {
//------------------------------------ //------------------------------------
}; };
#ifdef KOKKOS_ENABLE_IMPL_MDSPAN
namespace Impl {
struct UnsupportedKokkosArrayLayout;
template <class Traits, class Enabled = void>
struct MDSpanViewTraits {
using mdspan_type = UnsupportedKokkosArrayLayout;
};
// "Natural" mdspan for a view if the View's ArrayLayout is supported.
template <class Traits>
struct MDSpanViewTraits<Traits,
std::void_t<typename Impl::LayoutFromArrayLayout<
typename Traits::array_layout>::type>> {
using index_type = std::size_t;
using extents_type =
typename Impl::ExtentsFromDataType<index_type,
typename Traits::data_type>::type;
using mdspan_layout_type =
typename Impl::LayoutFromArrayLayout<typename Traits::array_layout>::type;
using accessor_type = Impl::SpaceAwareAccessor<
typename Traits::memory_space,
Kokkos::default_accessor<typename Traits::value_type>>;
using mdspan_type = mdspan<typename Traits::value_type, extents_type,
mdspan_layout_type, accessor_type>;
};
} // namespace Impl
#endif // KOKKOS_ENABLE_IMPL_MDSPAN
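`MDSpanViewTraits` above uses the `std::void_t` detection idiom. The primary template reports `UnsupportedKokkosArrayLayout`, and the partial specialization is selected only when `LayoutFromArrayLayout<...>::type` exists. The sketch below isolates that pattern with hypothetical `demo` types. It assumes the mapping trait has no `type` member for unsupported layouts, which this diff does not show.

```cpp
#include <cassert>
#include <type_traits>

namespace demo {

// Hypothetical array layouts and the mdspan-style layouts they map to.
struct LayoutLeft {};
struct LayoutRight {};
struct LayoutTiled {};
struct layout_left_mapping {};
struct layout_right_mapping {};

// Mapping trait: the primary template deliberately has no ::type member.
template <class ArrayLayout>
struct LayoutFromArrayLayout {};
template <>
struct LayoutFromArrayLayout<LayoutLeft> {
  using type = layout_left_mapping;
};
template <>
struct LayoutFromArrayLayout<LayoutRight> {
  using type = layout_right_mapping;
};

struct Unsupported;  // marker type, never defined

// Fallback: chosen when the specialization below is not viable.
template <class ArrayLayout, class Enabled = void>
struct SpanTraits {
  using span_layout = Unsupported;
};

// Chosen only when LayoutFromArrayLayout<ArrayLayout>::type is well-formed.
template <class ArrayLayout>
struct SpanTraits<ArrayLayout,
                  std::void_t<typename LayoutFromArrayLayout<ArrayLayout>::type>> {
  using span_layout = typename LayoutFromArrayLayout<ArrayLayout>::type;
};

// Convenience flag built on top of the trait.
template <class ArrayLayout>
constexpr bool has_span_layout =
    !std::is_same_v<typename SpanTraits<ArrayLayout>::span_layout, Unsupported>;

}  // namespace demo
```

Code that needs the conversion can then test the resulting type against the marker instead of failing with a hard error deep inside the trait.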
/** \class View /** \class View
* \brief View to an array of data. * \brief View to an array of data.
* *
@ -522,7 +553,6 @@ constexpr bool is_assignable(const Kokkos::View<ViewTDst...>& dst,
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
#include <impl/Kokkos_ViewMapping.hpp> #include <impl/Kokkos_ViewMapping.hpp>
#include <impl/Kokkos_ViewArray.hpp>
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
//---------------------------------------------------------------------------- //----------------------------------------------------------------------------
@ -923,57 +953,30 @@ class View : public ViewTraits<DataType, Properties...> {
template <typename I0, typename I1> template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::always_true<I0, I1>::value && // (Kokkos::Impl::always_true<I0, I1>::value && //
(2 == rank) && is_default_map && is_layout_left && (rank_dynamic == 0)), (2 == rank) && is_default_map &&
(is_layout_left || is_layout_right || is_layout_stride)),
reference_type> reference_type>
operator()(I0 i0, I1 i1) const { operator()(I0 i0, I1 i1) const {
check_operator_parens_valid_args(i0, i1); check_operator_parens_valid_args(i0, i1);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1) KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1)
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_dim.N0 * i1]; if constexpr (is_layout_left) {
} if constexpr (rank_dynamic == 0)
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_dim.N0 * i1];
template <typename I0, typename I1> else
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_stride * i1];
(Kokkos::Impl::always_true<I0, I1>::value && // } else if constexpr (is_layout_right) {
(2 == rank) && is_default_map && is_layout_left && (rank_dynamic != 0)), if constexpr (rank_dynamic == 0)
reference_type> return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_dim.N1 * i0];
operator()(I0 i0, I1 i1) const { else
check_operator_parens_valid_args(i0, i1); return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_stride * i0];
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1) } else {
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_stride * i1]; static_assert(is_layout_stride);
} return m_map.m_impl_handle[i0 * m_map.m_impl_offset.m_stride.S0 +
i1 * m_map.m_impl_offset.m_stride.S1];
template <typename I0, typename I1> }
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< #if defined KOKKOS_COMPILER_INTEL
(Kokkos::Impl::always_true<I0, I1>::value && // __builtin_unreachable();
(2 == rank) && is_default_map && is_layout_right && (rank_dynamic == 0)), #endif
reference_type>
operator()(I0 i0, I1 i1) const {
check_operator_parens_valid_args(i0, i1);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1)
return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_dim.N1 * i0];
}
template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::always_true<I0, I1>::value && //
(2 == rank) && is_default_map && is_layout_right && (rank_dynamic != 0)),
reference_type>
operator()(I0 i0, I1 i1) const {
check_operator_parens_valid_args(i0, i1);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1)
return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_stride * i0];
}
template <typename I0, typename I1>
KOKKOS_FORCEINLINE_FUNCTION
std::enable_if_t<(Kokkos::Impl::always_true<I0, I1>::value && //
(2 == rank) && is_default_map && is_layout_stride),
reference_type>
operator()(I0 i0, I1 i1) const {
check_operator_parens_valid_args(i0, i1);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1)
return m_map.m_impl_handle[i0 * m_map.m_impl_offset.m_stride.S0 +
i1 * m_map.m_impl_offset.m_stride.S1];
} }
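The hunk above collapses five `enable_if`-constrained rank-2 overloads of `operator()` into one function that branches with `if constexpr`. The following sketch shows the same consolidation on a toy offset calculator. All names in `demo` are hypothetical, and the padded-stride case for dynamic ranks (`m_stride` in the diff) is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace demo {

struct LayoutLeft {};
struct LayoutRight {};
struct LayoutStride {};

// Toy rank-2 offset calculator selected by a layout tag.
template <class Layout>
struct Offset2D {
  std::size_t n0, n1;  // extents
  std::size_t s0, s1;  // strides, read only for LayoutStride

  constexpr std::size_t operator()(std::size_t i0, std::size_t i1) const {
    if constexpr (std::is_same_v<Layout, LayoutLeft>) {
      return i0 + n0 * i1;  // column-major: first index is contiguous
    } else if constexpr (std::is_same_v<Layout, LayoutRight>) {
      return i1 + n1 * i0;  // row-major: last index is contiguous
    } else {
      // Any other layout tag is rejected at compile time.
      static_assert(std::is_same_v<Layout, LayoutStride>);
      return i0 * s0 + i1 * s1;  // arbitrary strides
    }
  }
};

}  // namespace demo
```

Only the branch matching the layout is instantiated, so the single function produces the same code per layout as the separate overloads did while leaving one signature to maintain.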
// Rank 0 -> 8 operator() except for rank-1 and rank-2 with default map which // Rank 0 -> 8 operator() except for rank-1 and rank-2 with default map which
@ -1066,57 +1069,30 @@ class View : public ViewTraits<DataType, Properties...> {
template <typename I0, typename I1, typename... Is> template <typename I0, typename I1, typename... Is>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::always_true<I0, I1, Is...>::value && (2 == rank) && (Kokkos::Impl::always_true<I0, I1, Is...>::value && (2 == rank) &&
is_default_map && is_layout_left && (rank_dynamic == 0)), is_default_map &&
(is_layout_left || is_layout_right || is_layout_stride)),
reference_type> reference_type>
access(I0 i0, I1 i1, Is... extra) const { access(I0 i0, I1 i1, Is... extra) const {
check_access_member_function_valid_args(i0, i1, extra...); check_access_member_function_valid_args(i0, i1, extra...);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...) KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...)
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_dim.N0 * i1]; if constexpr (is_layout_left) {
} if constexpr (rank_dynamic == 0)
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_dim.N0 * i1];
template <typename I0, typename I1, typename... Is> else
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_stride * i1];
(Kokkos::Impl::always_true<I0, I1, Is...>::value && (2 == rank) && } else if constexpr (is_layout_right) {
is_default_map && is_layout_left && (rank_dynamic != 0)), if constexpr (rank_dynamic == 0)
reference_type> return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_dim.N1 * i0];
access(I0 i0, I1 i1, Is... extra) const { else
check_access_member_function_valid_args(i0, i1, extra...); return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_stride * i0];
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...) } else {
return m_map.m_impl_handle[i0 + m_map.m_impl_offset.m_stride * i1]; static_assert(is_layout_stride);
} return m_map.m_impl_handle[i0 * m_map.m_impl_offset.m_stride.S0 +
i1 * m_map.m_impl_offset.m_stride.S1];
template <typename I0, typename I1, typename... Is> }
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t< #if defined KOKKOS_COMPILER_INTEL
(Kokkos::Impl::always_true<I0, I1, Is...>::value && (2 == rank) && __builtin_unreachable();
is_default_map && is_layout_right && (rank_dynamic == 0)), #endif
reference_type>
access(I0 i0, I1 i1, Is... extra) const {
check_access_member_function_valid_args(i0, i1, extra...);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...)
return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_dim.N1 * i0];
}
template <typename I0, typename I1, typename... Is>
KOKKOS_FORCEINLINE_FUNCTION std::enable_if_t<
(Kokkos::Impl::always_true<I0, I1, Is...>::value && (2 == rank) &&
is_default_map && is_layout_right && (rank_dynamic != 0)),
reference_type>
access(I0 i0, I1 i1, Is... extra) const {
check_access_member_function_valid_args(i0, i1, extra...);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...)
return m_map.m_impl_handle[i1 + m_map.m_impl_offset.m_stride * i0];
}
template <typename I0, typename I1, typename... Is>
KOKKOS_FORCEINLINE_FUNCTION
std::enable_if_t<(Kokkos::Impl::always_true<I0, I1, Is...>::value &&
(2 == rank) && is_default_map && is_layout_stride),
reference_type>
access(I0 i0, I1 i1, Is... extra) const {
check_access_member_function_valid_args(i0, i1, extra...);
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY(m_track, m_map, i0, i1, extra...)
return m_map.m_impl_handle[i0 * m_map.m_impl_offset.m_stride.S0 +
i1 * m_map.m_impl_offset.m_stride.S1];
} }
//------------------------------ //------------------------------
@ -1442,8 +1418,7 @@ class View : public ViewTraits<DataType, Properties...> {
std::is_same_v<typename traits::array_layout, std::is_same_v<typename traits::array_layout,
Kokkos::LayoutRight> || Kokkos::LayoutRight> ||
std::is_same_v<typename traits::array_layout, std::is_same_v<typename traits::array_layout,
Kokkos::LayoutStride> || Kokkos::LayoutStride>) {
is_layouttiled<typename traits::array_layout>::value) {
size_t i0 = arg_layout.dimension[0]; size_t i0 = arg_layout.dimension[0];
size_t i1 = arg_layout.dimension[1]; size_t i1 = arg_layout.dimension[1];
size_t i2 = arg_layout.dimension[2]; size_t i2 = arg_layout.dimension[2];
@ -1495,8 +1470,7 @@ class View : public ViewTraits<DataType, Properties...> {
std::is_same_v<typename traits::array_layout, std::is_same_v<typename traits::array_layout,
Kokkos::LayoutRight> || Kokkos::LayoutRight> ||
std::is_same_v<typename traits::array_layout, std::is_same_v<typename traits::array_layout,
Kokkos::LayoutStride> || Kokkos::LayoutStride>) {
is_layouttiled<typename traits::array_layout>::value) {
size_t i0 = arg_layout.dimension[0]; size_t i0 = arg_layout.dimension[0];
size_t i1 = arg_layout.dimension[1]; size_t i1 = arg_layout.dimension[1];
size_t i2 = arg_layout.dimension[2]; size_t i2 = arg_layout.dimension[2];
@ -1725,6 +1699,79 @@ class View : public ViewTraits<DataType, Properties...> {
"Layout is not constructible from extent arguments. Use " "Layout is not constructible from extent arguments. Use "
"overload taking a layout object instead."); "overload taking a layout object instead.");
} }
//----------------------------------------
// MDSpan converting constructors
#ifdef KOKKOS_ENABLE_IMPL_MDSPAN
template <typename U = typename Impl::MDSpanViewTraits<traits>::mdspan_type>
KOKKOS_INLINE_FUNCTION
#ifndef KOKKOS_ENABLE_CXX17
explicit(traits::is_managed)
#endif
View(const typename Impl::MDSpanViewTraits<traits>::mdspan_type& mds,
std::enable_if_t<
!std::is_same_v<Impl::UnsupportedKokkosArrayLayout, U>>* =
nullptr)
: View(mds.data_handle(),
Impl::array_layout_from_mapping<
typename traits::array_layout,
typename Impl::MDSpanViewTraits<traits>::mdspan_type>(
mds.mapping())) {
}
template <class ElementType, class ExtentsType, class LayoutType,
class AccessorType>
KOKKOS_INLINE_FUNCTION
#ifndef KOKKOS_ENABLE_CXX17
explicit(!std::is_convertible_v<
Kokkos::mdspan<ElementType, ExtentsType, LayoutType,
AccessorType>,
typename Impl::MDSpanViewTraits<traits>::mdspan_type>)
#endif
View(const Kokkos::mdspan<ElementType, ExtentsType, LayoutType,
AccessorType>& mds)
: View(typename Impl::MDSpanViewTraits<traits>::mdspan_type(mds)) {
}
//----------------------------------------
// Conversion to MDSpan
template <class OtherElementType, class OtherExtents, class OtherLayoutPolicy,
class OtherAccessor,
class ImplNaturalMDSpanType =
typename Impl::MDSpanViewTraits<traits>::mdspan_type,
typename = std::enable_if_t<std::conditional_t<
std::is_same_v<Impl::UnsupportedKokkosArrayLayout,
ImplNaturalMDSpanType>,
std::false_type,
std::is_assignable<mdspan<OtherElementType, OtherExtents,
OtherLayoutPolicy, OtherAccessor>,
ImplNaturalMDSpanType>>::value>>
KOKKOS_INLINE_FUNCTION constexpr operator mdspan<
OtherElementType, OtherExtents, OtherLayoutPolicy, OtherAccessor>() {
using mdspan_type = typename Impl::MDSpanViewTraits<traits>::mdspan_type;
return mdspan_type{data(),
Impl::mapping_from_view_mapping<mdspan_type>(m_map)};
}
template <class OtherAccessorType = Impl::SpaceAwareAccessor<
typename traits::memory_space,
Kokkos::default_accessor<typename traits::value_type>>,
typename = std::enable_if_t<std::is_assignable_v<
typename traits::value_type*&,
typename OtherAccessorType::data_handle_type>>>
KOKKOS_INLINE_FUNCTION constexpr auto to_mdspan(
const OtherAccessorType& other_accessor =
typename Impl::MDSpanViewTraits<traits>::accessor_type()) {
using mdspan_type = typename Impl::MDSpanViewTraits<traits>::mdspan_type;
using ret_mdspan_type =
mdspan<typename mdspan_type::element_type,
typename mdspan_type::extents_type,
typename mdspan_type::layout_type, OtherAccessorType>;
return ret_mdspan_type{data(),
Impl::mapping_from_view_mapping<mdspan_type>(m_map),
other_accessor};
}
#endif // KOKKOS_ENABLE_IMPL_MDSPAN
}; };
template <typename D, class... P> template <typename D, class... P>
@ -1878,23 +1925,6 @@ KOKKOS_INLINE_FUNCTION bool operator!=(const View<LT, LP...>& lhs,
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
inline void shared_allocation_tracking_disable() {
Kokkos::Impl::SharedAllocationRecord<void, void>::tracking_disable();
}
inline void shared_allocation_tracking_enable() {
Kokkos::Impl::SharedAllocationRecord<void, void>::tracking_enable();
}
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template <class Specialize, typename A, typename B> template <class Specialize, typename A, typename B>
struct CommonViewValueType; struct CommonViewValueType;
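
Review note: the rank-2 `access()` hunk above folds five `enable_if_t` overloads into one body that picks the index formula with `if constexpr`. A minimal standalone sketch of that dispatch follows, assuming C++17. `Layout`, `Map2D`, and `offset_2d` are hypothetical stand-ins invented for illustration and are not Kokkos types; the `UsePaddedStride` flag plays the role of the `rank_dynamic != 0` branch, which reads `m_stride` instead of the extent.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not the Kokkos layout or mapping types.
enum class Layout { Left, Right, Stride };

struct Map2D {
  std::size_t n0, n1;   // extents of dimension 0 and 1
  std::size_t stride;   // (possibly padded) leading stride
  std::size_t s0, s1;   // per-dimension strides for the strided layout
};

// One function body replaces one overload per (layout, stride source) pair.
// Only the selected branch is instantiated, as with the enable_if overloads.
template <Layout L, bool UsePaddedStride>
constexpr std::size_t offset_2d(const Map2D& m, std::size_t i0,
                                std::size_t i1) {
  if constexpr (L == Layout::Left) {
    if constexpr (!UsePaddedStride)
      return i0 + m.n0 * i1;      // column-major, stride taken from extent
    else
      return i0 + m.stride * i1;  // column-major, stored stride
  } else if constexpr (L == Layout::Right) {
    if constexpr (!UsePaddedStride)
      return i1 + m.n1 * i0;      // row-major, stride taken from extent
    else
      return i1 + m.stride * i0;  // row-major, stored stride
  } else {
    static_assert(L == Layout::Stride);
    return i0 * m.s0 + i1 * m.s1; // fully general strides
  }
}
```

For a 3x4 map with a padded stride of 8, `offset_2d<Layout::Left, false>(m, 2, 1)` evaluates `2 + 3 * 1`, while the padded variant evaluates `2 + 8 * 1`.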


@ -67,16 +67,7 @@ void *Kokkos::Experimental::OpenACCSpace::impl_allocate(
ptr = acc_malloc(arg_alloc_size); ptr = acc_malloc(arg_alloc_size);
if (!ptr) { if (!ptr) {
size_t alignment = 1; // OpenACC does not handle alignment Kokkos::Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
using Kokkos::Experimental::RawMemoryAllocationFailure;
auto failure_mode =
arg_alloc_size > 0
? RawMemoryAllocationFailure::FailureMode::OutOfMemoryError
: RawMemoryAllocationFailure::FailureMode::InvalidAllocationSize;
auto alloc_mechanism =
RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc;
throw RawMemoryAllocationFailure(arg_alloc_size, alignment, failure_mode,
alloc_mechanism);
} }
if (Kokkos::Profiling::profileLibraryLoaded()) { if (Kokkos::Profiling::profileLibraryLoaded()) {


@ -44,10 +44,12 @@ class Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
auto team_size = m_policy.team_size(); auto team_size = m_policy.team_size();
auto vector_length = m_policy.impl_vector_length(); auto vector_length = m_policy.impl_vector_length();
int const async_arg = m_policy.space().acc_async_queue();
auto const a_functor(m_functor); auto const a_functor(m_functor);
#pragma acc parallel loop gang vector num_gangs(league_size) \ #pragma acc parallel loop gang vector num_gangs(league_size) \
vector_length(team_size* vector_length) copyin(a_functor) vector_length(team_size* vector_length) copyin(a_functor) async(async_arg)
for (int i = 0; i < league_size * team_size * vector_length; i++) { for (int i = 0; i < league_size * team_size * vector_length; i++) {
int league_id = i / (team_size * vector_length); int league_id = i / (team_size * vector_length);
typename Policy::member_type team(league_id, league_size, team_size, typename Policy::member_type team(league_id, league_size, team_size,
@ -145,10 +147,12 @@ class Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
auto team_size = m_policy.team_size(); auto team_size = m_policy.team_size();
auto vector_length = m_policy.impl_vector_length(); auto vector_length = m_policy.impl_vector_length();
int const async_arg = m_policy.space().acc_async_queue();
auto const a_functor(m_functor); auto const a_functor(m_functor);
#pragma acc parallel loop gang num_gangs(league_size) num_workers(team_size) \ #pragma acc parallel loop gang num_gangs(league_size) num_workers(team_size) \
vector_length(vector_length) copyin(a_functor) vector_length(vector_length) copyin(a_functor) async(async_arg)
for (int i = 0; i < league_size; i++) { for (int i = 0; i < league_size; i++) {
int league_id = i; int league_id = i;
typename Policy::member_type team(league_id, league_size, team_size, typename Policy::member_type team(league_id, league_size, team_size,


@ -72,9 +72,28 @@ int OpenMP::concurrency(OpenMP const &instance) {
int OpenMP::concurrency() const { return impl_thread_pool_size(); } int OpenMP::concurrency() const { return impl_thread_pool_size(); }
#endif #endif
void OpenMP::impl_static_fence(std::string const &name) {
Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP>(
name,
Kokkos::Tools::Experimental::SpecialSynchronizationCases::
GlobalDeviceSynchronization,
[]() {
std::lock_guard<std::mutex> lock_all_instances(
Impl::OpenMPInternal::all_instances_mutex);
for (auto *instance_ptr : Impl::OpenMPInternal::all_instances) {
std::lock_guard<std::mutex> lock_instance(
instance_ptr->m_instance_mutex);
}
});
}
void OpenMP::fence(const std::string &name) const { void OpenMP::fence(const std::string &name) const {
Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP>( Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP>(
name, Kokkos::Tools::Experimental::Impl::DirectFenceIDHandle{1}, []() {}); name, Kokkos::Tools::Experimental::Impl::DirectFenceIDHandle{1},
[this]() {
auto *internal_instance = this->impl_internal_space_instance();
std::lock_guard<std::mutex> lock(internal_instance->m_instance_mutex);
});
} }
bool OpenMP::impl_is_initialized() noexcept { bool OpenMP::impl_is_initialized() noexcept {


@ -67,7 +67,15 @@ class OpenMP {
OpenMP(); OpenMP();
OpenMP(int pool_size); explicit OpenMP(int pool_size);
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
template <typename T = void>
KOKKOS_DEPRECATED_WITH_COMMENT(
"OpenMP execution space should be constructed explicitly.")
OpenMP(int pool_size)
: OpenMP(pool_size) {}
#endif
/// \brief Print configuration information to the given output stream. /// \brief Print configuration information to the given output stream.
void print_configuration(std::ostream& os, bool verbose = false) const; void print_configuration(std::ostream& os, bool verbose = false) const;
@ -146,14 +154,6 @@ inline int OpenMP::impl_thread_pool_rank() noexcept {
KOKKOS_IF_ON_DEVICE((return -1;)) KOKKOS_IF_ON_DEVICE((return -1;))
} }
inline void OpenMP::impl_static_fence(std::string const& name) {
Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP>(
name,
Kokkos::Tools::Experimental::SpecialSynchronizationCases::
GlobalDeviceSynchronization,
[]() {});
}
inline bool OpenMP::is_asynchronous(OpenMP const& /*instance*/) noexcept { inline bool OpenMP::is_asynchronous(OpenMP const& /*instance*/) noexcept {
return false; return false;
} }


@ -34,18 +34,8 @@
namespace Kokkos { namespace Kokkos {
namespace Impl { namespace Impl {
void OpenMPInternal::acquire_lock() { std::vector<OpenMPInternal *> OpenMPInternal::all_instances;
while (1 == desul::atomic_compare_exchange(&m_pool_mutex, 0, 1, std::mutex OpenMPInternal::all_instances_mutex;
desul::MemoryOrderAcquire(),
desul::MemoryScopeDevice())) {
// do nothing
}
}
void OpenMPInternal::release_lock() {
desul::atomic_store(&m_pool_mutex, 0, desul::MemoryOrderRelease(),
desul::MemoryScopeDevice());
}
void OpenMPInternal::clear_thread_data() { void OpenMPInternal::clear_thread_data() {
const size_t member_bytes = const size_t member_bytes =
@ -123,17 +113,11 @@ void OpenMPInternal::resize_thread_data(size_t pool_reduce_bytes,
if (nullptr != m_pool[rank]) { if (nullptr != m_pool[rank]) {
m_pool[rank]->disband_pool(); m_pool[rank]->disband_pool();
space.deallocate(m_pool[rank], old_alloc_bytes); // impl_deallocate to not fence here
space.impl_deallocate("[unlabeled]", m_pool[rank], old_alloc_bytes);
} }
void *ptr = nullptr; void *ptr = space.allocate("Kokkos::OpenMP::scratch_mem", alloc_bytes);
try {
ptr = space.allocate(alloc_bytes);
} catch (
Kokkos::Experimental::RawMemoryAllocationFailure const &failure) {
// For now, just rethrow the error message the existing way
Kokkos::Impl::throw_runtime_exception(failure.get_error_message());
}
m_pool[rank] = new (ptr) HostThreadTeamData(); m_pool[rank] = new (ptr) HostThreadTeamData();
@ -304,6 +288,18 @@ void OpenMPInternal::finalize() {
} }
m_initialized = false; m_initialized = false;
// guard erasing from all_instances
{
std::scoped_lock lock(all_instances_mutex);
auto it = std::find(all_instances.begin(), all_instances.end(), this);
if (it == all_instances.end())
Kokkos::abort(
"Execution space instance to be removed couldn't be found!");
*it = all_instances.back();
all_instances.pop_back();
}
} }
void OpenMPInternal::print_configuration(std::ostream &s) const { void OpenMPInternal::print_configuration(std::ostream &s) const {
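
Review note: the hunks above register each `OpenMPInternal` in a mutex-guarded static vector, erase it in `finalize()` by overwriting the found slot with the last element and popping, and let the static fence lock every instance mutex in turn. A minimal standalone sketch of that pattern follows, assuming C++17. `Instance` and `Registry` are hypothetical names for illustration only, and `remove` returns `false` where the original calls `Kokkos::abort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an execution space instance.
struct Instance {
  std::mutex instance_mutex;  // held while a kernel runs on this instance
};

class Registry {
  std::vector<Instance*> all_;
  std::mutex mutex_;  // guards all_

 public:
  void add(Instance* p) {
    std::scoped_lock lock(mutex_);
    all_.push_back(p);
  }

  // Swap-and-pop erase: O(1) removal once found; element order is not kept.
  bool remove(Instance* p) {
    std::scoped_lock lock(mutex_);
    auto it = std::find(all_.begin(), all_.end(), p);
    if (it == all_.end()) return false;
    *it = all_.back();
    all_.pop_back();
    return true;
  }

  // Global fence: acquiring each instance mutex waits for any kernel
  // currently holding it; the lock is released again immediately.
  void fence_all() {
    std::lock_guard<std::mutex> lock_all(mutex_);
    for (Instance* p : all_) {
      std::lock_guard<std::mutex> lock(p->instance_mutex);
    }
  }

  std::size_t size() {
    std::scoped_lock lock(mutex_);
    return all_.size();
  }
};
```

The swap-and-pop step is safe when the found element is itself the last one: the self-assignment is a no-op and `pop_back()` removes it.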


@ -56,7 +56,13 @@ struct OpenMPTraits {
class OpenMPInternal { class OpenMPInternal {
private: private:
OpenMPInternal(int arg_pool_size) OpenMPInternal(int arg_pool_size)
: m_pool_size{arg_pool_size}, m_level{omp_get_level()}, m_pool() {} : m_pool_size{arg_pool_size}, m_level{omp_get_level()}, m_pool() {
// guard pushing to all_instances
{
std::scoped_lock lock(all_instances_mutex);
all_instances.push_back(this);
}
}
~OpenMPInternal() { clear_thread_data(); } ~OpenMPInternal() { clear_thread_data(); }
@ -66,7 +72,6 @@ class OpenMPInternal {
int m_pool_size; int m_pool_size;
int m_level; int m_level;
int m_pool_mutex = 0;
HostThreadTeamData* m_pool[OpenMPTraits::MAX_THREAD_COUNT]; HostThreadTeamData* m_pool[OpenMPTraits::MAX_THREAD_COUNT];
@ -83,12 +88,6 @@ class OpenMPInternal {
int thread_pool_size() const { return m_pool_size; } int thread_pool_size() const { return m_pool_size; }
// Acquire lock used to protect access to m_pool
void acquire_lock();
// Release lock used to protect access to m_pool
void release_lock();
void resize_thread_data(size_t pool_reduce_bytes, size_t team_reduce_bytes, void resize_thread_data(size_t pool_reduce_bytes, size_t team_reduce_bytes,
size_t team_shared_bytes, size_t thread_local_bytes); size_t team_shared_bytes, size_t thread_local_bytes);
@ -107,6 +106,11 @@ class OpenMPInternal {
bool verify_is_initialized(const char* const label) const; bool verify_is_initialized(const char* const label) const;
void print_configuration(std::ostream& s) const; void print_configuration(std::ostream& s) const;
std::mutex m_instance_mutex;
static std::vector<OpenMPInternal*> all_instances;
static std::mutex all_instances_mutex;
}; };
inline bool execute_in_serial(OpenMP const& space = OpenMP()) { inline bool execute_in_serial(OpenMP const& space = OpenMP()) {
@ -157,7 +161,7 @@ inline std::vector<OpenMP> create_OpenMP_instances(
"Kokkos::abort: Partition not enough resources left to create the last " "Kokkos::abort: Partition not enough resources left to create the last "
"instance."); "instance.");
} }
instances[weights.size() - 1] = resources_left; instances[weights.size() - 1] = OpenMP(resources_left);
return instances; return instances;
} }


@ -108,6 +108,8 @@ class ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::OpenMP> {
public: public:
inline void execute() const { inline void execute() const {
// Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
if (execute_in_serial(m_policy.space())) { if (execute_in_serial(m_policy.space())) {
exec_range(m_functor, m_policy.begin(), m_policy.end()); exec_range(m_functor, m_policy.begin(), m_policy.end());
return; return;
@ -202,6 +204,9 @@ class ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>,
public: public:
inline void execute() const { inline void execute() const {
// Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
#ifndef KOKKOS_COMPILER_INTEL #ifndef KOKKOS_COMPILER_INTEL
if (execute_in_serial(m_iter.m_rp.space())) { if (execute_in_serial(m_iter.m_rp.space())) {
exec_range(0, m_iter.m_rp.m_num_tiles); exec_range(0, m_iter.m_rp.m_num_tiles);
@ -333,7 +338,8 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
const size_t team_shared_size = m_shmem_size; const size_t team_shared_size = m_shmem_size;
const size_t thread_local_size = 0; // Never shrinks const size_t thread_local_size = 0; // Never shrinks
m_instance->acquire_lock(); // Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
m_instance->resize_thread_data(pool_reduce_size, team_reduce_size, m_instance->resize_thread_data(pool_reduce_size, team_reduce_size,
team_shared_size, thread_local_size); team_shared_size, thread_local_size);
@ -343,8 +349,6 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
m_functor, *(m_instance->get_thread_data()), 0, m_functor, *(m_instance->get_thread_data()), 0,
m_policy.league_size(), m_policy.league_size()); m_policy.league_size(), m_policy.league_size());
m_instance->release_lock();
return; return;
} }
@ -383,8 +387,6 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
data.disband_team(); data.disband_team();
} }
m_instance->release_lock();
} }
inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy) inline ParallelFor(const FunctorType& arg_functor, const Policy& arg_policy)
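
Review note: the hunks above replace paired `acquire_lock()` / `release_lock()` calls with a single `std::lock_guard` at the top of `execute()`, which is why every `release_lock()` before a `return` disappears. A minimal standalone sketch of the idea follows. `FakeInstance` and `execute` are hypothetical names for illustration and do not mirror the Kokkos classes.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an execution space instance.
struct FakeInstance {
  std::mutex instance_mutex;
  int kernels_run = 0;
};

// The guard's destructor unlocks on every exit path, so the early return
// needs no explicit release call.
void execute(FakeInstance& inst, bool serial_path) {
  // Serialize kernels on the same instance.
  std::lock_guard<std::mutex> lock(inst.instance_mutex);

  if (serial_path) {
    ++inst.kernels_run;  // stands in for the serial execution branch
    return;              // mutex released here by ~lock_guard
  }

  ++inst.kernels_run;    // stands in for the parallel region
}                        // mutex released here as well
```

Calling `execute` twice on the same instance from one thread only works because the first call released the mutex on its early-return path.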


@ -83,7 +83,8 @@ class ParallelReduce<CombinedFunctorReducerType, Kokkos::RangePolicy<Traits...>,
const size_t pool_reduce_bytes = reducer.value_size(); const size_t pool_reduce_bytes = reducer.value_size();
m_instance->acquire_lock(); // Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
m_instance->resize_thread_data(pool_reduce_bytes, 0 // team_reduce_bytes m_instance->resize_thread_data(pool_reduce_bytes, 0 // team_reduce_bytes
, ,
@ -106,6 +107,7 @@ class ParallelReduce<CombinedFunctorReducerType, Kokkos::RangePolicy<Traits...>,
update); update);
reducer.final(ptr); reducer.final(ptr);
return; return;
} }
const int pool_size = m_instance->thread_pool_size(); const int pool_size = m_instance->thread_pool_size();
@ -157,8 +159,6 @@ class ParallelReduce<CombinedFunctorReducerType, Kokkos::RangePolicy<Traits...>,
m_result_ptr[j] = ptr[j]; m_result_ptr[j] = ptr[j];
} }
} }
m_instance->release_lock();
} }
//---------------------------------------- //----------------------------------------
@ -218,7 +218,8 @@ class ParallelReduce<CombinedFunctorReducerType,
const ReducerType& reducer = m_iter.m_func.get_reducer(); const ReducerType& reducer = m_iter.m_func.get_reducer();
const size_t pool_reduce_bytes = reducer.value_size(); const size_t pool_reduce_bytes = reducer.value_size();
m_instance->acquire_lock(); // Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
m_instance->resize_thread_data(pool_reduce_bytes, 0 // team_reduce_bytes m_instance->resize_thread_data(pool_reduce_bytes, 0 // team_reduce_bytes
, ,
@ -241,8 +242,6 @@ class ParallelReduce<CombinedFunctorReducerType,
reducer.final(ptr); reducer.final(ptr);
m_instance->release_lock();
return; return;
} }
#endif #endif
@ -299,8 +298,6 @@ class ParallelReduce<CombinedFunctorReducerType,
m_result_ptr[j] = ptr[j]; m_result_ptr[j] = ptr[j];
} }
} }
m_instance->release_lock();
} }
//---------------------------------------- //----------------------------------------
@ -415,7 +412,8 @@ class ParallelReduce<CombinedFunctorReducerType,
const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1); const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1);
const size_t thread_local_size = 0; // Never shrinks const size_t thread_local_size = 0; // Never shrinks
m_instance->acquire_lock(); // Serialize kernels on the same execution space instance
std::lock_guard<std::mutex> lock(m_instance->m_instance_mutex);
m_instance->resize_thread_data(pool_reduce_size, team_reduce_size, m_instance->resize_thread_data(pool_reduce_size, team_reduce_size,
team_shared_size, thread_local_size); team_shared_size, thread_local_size);
@ -433,8 +431,6 @@ class ParallelReduce<CombinedFunctorReducerType,
reducer.final(ptr); reducer.final(ptr);
m_instance->release_lock();
return; return;
} }
@ -510,8 +506,6 @@ class ParallelReduce<CombinedFunctorReducerType,
m_result_ptr[j] = ptr[j]; m_result_ptr[j] = ptr[j];
} }
} }
m_instance->release_lock();
} }
//---------------------------------------- //----------------------------------------

Some files were not shown because too many files have changed in this diff