Update Kokkos library in LAMMPS to v4.6.0
@ -1,5 +1,72 @@
# CHANGELOG

## 4.6.00

[Full Changelog](https://github.com/kokkos/kokkos/compare/4.5.01...4.6.00)

### Features:

* Kokkos::Graph: Allow adding tasks to the graph via a `then`-node [\#7629](https://github.com/kokkos/kokkos/pull/7629)
* Kokkos::Graph: Allow construction from CUDA/HIP graph [\#7664](https://github.com/kokkos/kokkos/pull/7664)
* HIP: Add experimental support for using multiple GPUs from one process [\#7130](https://github.com/kokkos/kokkos/pull/7130)

### Backend and Architecture Enhancements:

#### CUDA:
* Improved reduction performance, in particular on H100 and newer [\#7823](https://github.com/kokkos/kokkos/pull/7823)

#### HIP:
* Change block size deduction to prefer smaller blocks/teams [\#7509](https://github.com/kokkos/kokkos/pull/7509)
* Allocate memory with stream ordered semantics (i.e. use `hipMallocAsync`) [\#7659](https://github.com/kokkos/kokkos/pull/7659)
* Fix a segfault when a virtual function called inside a kernel requires too many registers [\#7660](https://github.com/kokkos/kokkos/pull/7660)

#### SYCL:
* Improve sorting performance for non-contiguous views [\#7502](https://github.com/kokkos/kokkos/pull/7502)

#### Serial:
* Reduce fence overhead when using `Kokkos_ENABLE_ATOMICS_BYPASS` [\#7821](https://github.com/kokkos/kokkos/pull/7821)

### General Enhancements
* Allow use of `kokkos_check` in `<PackageName>Config.cmake` without warnings [\#7669](https://github.com/kokkos/kokkos/pull/7669)
* Add simd compound assignments and update simd reductions [\#7486](https://github.com/kokkos/kokkos/pull/7486)
* Improve performance of the `inclusive_scan` algorithm with Cuda and HIP [\#7542](https://github.com/kokkos/kokkos/pull/7542)
* Reduce tooling interface overhead (don't pay for what you don't use) [\#7817](https://github.com/kokkos/kokkos/pull/7817)
* Avoid storing the view in `RandomAccessIterator` to increase performance [\#7304](https://github.com/kokkos/kokkos/pull/7304)
* Make `RandomAccessIterator` fulfill the `std::random_access_iterator` concept [\#7451](https://github.com/kokkos/kokkos/pull/7451)
* Include information about support for system allocated memory in `print_configuration` (Cuda and HIP) [\#7673](https://github.com/kokkos/kokkos/pull/7673)

### Build System Changes
* Add support for Zen 4 AMD microarchitecture [\#7550](https://github.com/kokkos/kokkos/pull/7550)
* Enable NVIDIA Grace architecture with NVHPC [\#7858](https://github.com/kokkos/kokkos/pull/7858)
* Support static library builds when using CUDA as CMake language [\#7830](https://github.com/kokkos/kokkos/pull/7830)

### Incompatibilities (i.e. breaking changes)
* Change SIMD comparison operator to return `simd_mask` instead of `bool` [\#7781](https://github.com/kokkos/kokkos/pull/7781)
* Remove classic Intel compiler (icpc) support [\#7737](https://github.com/kokkos/kokkos/pull/7737)
* Remove `operator[]` overloads of Kokkos `basic_simd` and `basic_simd_mask` that return a reference [\#7630](https://github.com/kokkos/kokkos/pull/7630)

### Deprecations
* Deprecate `StaticCrsGraph` and move it to Kokkos Kernels into `KokkosSparse::` [\#7516](https://github.com/kokkos/kokkos/pull/7516)
* Deprecate `native_simd` and hide `simd_abi` [\#7472](https://github.com/kokkos/kokkos/pull/7472)
* Deprecate Makefile support [\#7613](https://github.com/kokkos/kokkos/pull/7613)
* DualView: Deprecate direct access to `d_view` and `h_view` [\#7716](https://github.com/kokkos/kokkos/pull/7716)

### Bug Fixes
* Fix performance bug affecting `atomic_fetch_{add,sub,min,max,and,or,xor}` on integral types `long` and `unsigned long` with HIP [\#7816](https://github.com/kokkos/kokkos/pull/7816)
* Fix execution of ranges with more than 2B elements [\#7797](https://github.com/kokkos/kokkos/pull/7797)
* Fix clean target when embedding Kokkos in another project [\#7557](https://github.com/kokkos/kokkos/pull/7557)
* Fix Zen3 flag for NVHPC [\#7558](https://github.com/kokkos/kokkos/pull/7558)
* graph: nodes must be stored by the graph [\#7619](https://github.com/kokkos/kokkos/pull/7619)
* Make sure lock arrays are on device before launching a graph [\#7685](https://github.com/kokkos/kokkos/pull/7685)
* Performance bug in `RangePolicy`: construct error message if and only if the precondition is violated [\#7809](https://github.com/kokkos/kokkos/pull/7809)
* simd: fix a bug in scalar min/max [\#7813](https://github.com/kokkos/kokkos/pull/7813)
* simd: fix a bug in non-masked reductions [\#7845](https://github.com/kokkos/kokkos/pull/7845)
* Cuda: fix incorrect iteration in `MDRangePolicy` of rank > 4 for high iteration counts [\#7724](https://github.com/kokkos/kokkos/pull/7724)
* Cuda: ignore gcc assembler options in `nvcc-wrapper` [\#7492](https://github.com/kokkos/kokkos/pull/7492)
* Build system: hint to `ARCH_NATIVE` if ARMv9 Grace arch is not explicitly supported by the compiler [\#7862](https://github.com/kokkos/kokkos/pull/7862)
* Use right arch for MI300A in makefiles [\#7786](https://github.com/kokkos/kokkos/pull/7786)
* Fix compiling BasicView on MSVC [\#7751](https://github.com/kokkos/kokkos/pull/7751)

## 4.5.01

[Full Changelog](https://github.com/kokkos/kokkos/compare/4.5.00...4.5.01)

@ -148,8 +148,8 @@ elseif(NOT CMAKE_SIZEOF_VOID_P EQUAL 8)
|
||||
endif()
|
||||
|
||||
set(Kokkos_VERSION_MAJOR 4)
|
||||
set(Kokkos_VERSION_MINOR 5)
|
||||
set(Kokkos_VERSION_PATCH 1)
|
||||
set(Kokkos_VERSION_MINOR 6)
|
||||
set(Kokkos_VERSION_PATCH 0)
|
||||
set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
|
||||
message(STATUS "Kokkos version: ${Kokkos_VERSION}")
|
||||
math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
|
||||
|
||||
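The version variables in the hunk above are folded into one integer as `MAJOR * 10000 + MINOR * 100 + PATCH`, so a version check reduces to a plain integer comparison. The following is a minimal C++ sketch of that encoding. `encode_version` and `version_at_least` are hypothetical helpers written for illustration only; they are not part of Kokkos.

```cpp
#include <cstdint>

// Hypothetical helper (not a Kokkos API): packs a version triple using the
// same formula as the build files, MAJOR * 10000 + MINOR * 100 + PATCH.
constexpr std::uint32_t encode_version(std::uint32_t major_num,
                                       std::uint32_t minor_num,
                                       std::uint32_t patch_num) {
  return major_num * 10000 + minor_num * 100 + patch_num;
}

// Hypothetical helper: true if `current` (already encoded) is at least the
// given version triple.
constexpr bool version_at_least(std::uint32_t current, std::uint32_t major_num,
                                std::uint32_t minor_num,
                                std::uint32_t patch_num) {
  return current >= encode_version(major_num, minor_num, patch_num);
}

// The old and new versions touched by this commit.
static_assert(encode_version(4, 5, 1) == 40501, "4.5.01 encodes to 40501");
static_assert(encode_version(4, 6, 0) == 40600, "4.6.00 encodes to 40600");
static_assert(version_at_least(encode_version(4, 6, 0), 4, 5, 1),
              "4.6.00 is newer than 4.5.01");
```

The encoding assumes the minor and patch numbers each stay below 100; larger values would overlap with the next field.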
4
lib/kokkos/CTestConfig.cmake
Normal file
@ -0,0 +1,4 @@
|
||||
set(CTEST_PROJECT_NAME Kokkos)
|
||||
set(CTEST_NIGHTLY_START_TIME 01:00:00 UTC)
|
||||
set(CTEST_SUBMIT_URL https://my.cdash.org/submit.php?project=Kokkos)
|
||||
set(CTEST_DROP_SITE_CDASH TRUE)
|
||||
@ -1,18 +1,26 @@
|
||||
# Default settings common options.
|
||||
|
||||
#SPARTA specific settings:
|
||||
#LAMMPS specific settings:
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
ifndef KOKKOS_PATH
|
||||
KOKKOS_PATH=../../lib/kokkos
|
||||
endif
|
||||
|
||||
CXXFLAGS=$(CCFLAGS)
|
||||
ifeq ($(mode),shared)
|
||||
CXXFLAGS += $(SHFLAGS)
|
||||
CXXFLAGS += $(SHFLAGS)
|
||||
endif
|
||||
|
||||
|
||||
ifneq ($(KOKKOS_USE_DEPRECATED_MAKEFILES), 1)
|
||||
$(error Makefile support is deprecated. Only CMake builds will be supported from Kokkos 5 on. Set KOKKOS_USE_DEPRECATED_MAKEFILES=1 to silence this error.)
|
||||
endif
|
||||
|
||||
KOKKOS_VERSION_MAJOR = 4
|
||||
KOKKOS_VERSION_MINOR = 5
|
||||
KOKKOS_VERSION_PATCH = 1
|
||||
KOKKOS_VERSION_MINOR = 6
|
||||
KOKKOS_VERSION_PATCH = 0
|
||||
KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
|
||||
|
||||
# Options: Cuda,HIP,SYCL,OpenMPTarget,OpenMP,Threads,Serial
|
||||
@ -24,7 +32,7 @@ KOKKOS_DEVICES ?= "OpenMP"
|
||||
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX,ARMv9-Grace
|
||||
# IBM: Power8,Power9
|
||||
# AMD-GPUS: AMD_GFX906,AMD_GFX908,AMD_GFX90A,AMD_GFX940,AMD_GFX942,AMD_GFX942_APU,AMD_GFX1030,AMD_GFX1100,AMD_GFX1103
|
||||
# AMD-CPUS: AMDAVX,Zen,Zen2,Zen3
|
||||
# AMD-CPUS: AMDAVX,Zen,Zen2,Zen3,Zen4
|
||||
# Intel-GPUs: Intel_Gen,Intel_Gen9,Intel_Gen11,Intel_Gen12LP,Intel_DG1,Intel_XeHP,Intel_PVC
|
||||
KOKKOS_ARCH ?= ""
|
||||
# Options: yes,no
|
||||
@ -442,11 +450,14 @@ KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_
|
||||
|
||||
# AMD based.
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(call kokkos_has_string,$(KOKKOS_ARCH),AMDAVX)
|
||||
KOKKOS_INTERNAL_USE_ARCH_ZEN4 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen4)
|
||||
KOKKOS_INTERNAL_USE_ARCH_ZEN3 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen3)
|
||||
KOKKOS_INTERNAL_USE_ARCH_ZEN2 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen2)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN2), 0)
|
||||
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN4), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN2), 0)
|
||||
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
|
||||
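The nested `ifeq` checks above test the most specific name first and only look for plain `Zen` when `Zen4`, `Zen3` and `Zen2` did not match. The likely reason is that `Zen` is a substring of the longer names. The sketch below illustrates that ordering in C++ under the assumption that `kokkos_has_string` behaves like a substring search; `has_string`, `detect_zen` and `ZenArch` are hypothetical names, and returning a single enum value is a simplification of the independent Makefile flags.

```cpp
#include <string_view>

enum class ZenArch { None, Zen, Zen2, Zen3, Zen4 };

// Assumed behaviour of the Makefile helper: a plain substring test.
constexpr bool has_string(std::string_view haystack, std::string_view needle) {
  return haystack.find(needle) != std::string_view::npos;
}

// Check the most specific name first so that "Zen4" is never reported as
// plain "Zen".
constexpr ZenArch detect_zen(std::string_view arch) {
  if (has_string(arch, "Zen4")) return ZenArch::Zen4;
  if (has_string(arch, "Zen3")) return ZenArch::Zen3;
  if (has_string(arch, "Zen2")) return ZenArch::Zen2;
  if (has_string(arch, "Zen")) return ZenArch::Zen;
  return ZenArch::None;
}

// Without the ordering, a bare substring test would also match "Zen".
static_assert(has_string("Zen4", "Zen"), "Zen is a substring of Zen4");
static_assert(detect_zen("Zen4") == ZenArch::Zen4, "specific name wins");
static_assert(detect_zen("Zen") == ZenArch::Zen, "plain Zen still detected");
```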
@ -463,8 +474,10 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX90A), 0)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX90A := $(call kokkos_has_string,$(KOKKOS_ARCH),VEGA90A)
|
||||
endif
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX940 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX940)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942_APU := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942_APU)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942_APU), 0)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942)
|
||||
endif
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX1030)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030), 0)
|
||||
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030 := $(call kokkos_has_string,$(KOKKOS_ARCH),NAVI1030)
|
||||
@ -857,6 +870,19 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 1)
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN4), 1)
|
||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_ZEN4")
|
||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AVX512XEON")
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
|
||||
KOKKOS_CXXFLAGS += -xCORE-AVX512
|
||||
KOKKOS_LDFLAGS += -xCORE-AVX512
|
||||
else
|
||||
KOKKOS_CXXFLAGS += -march=znver4 -mtune=znver4
|
||||
KOKKOS_LDFLAGS += -march=znver4 -mtune=znver4
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
|
||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARMV80")
|
||||
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARMV8_THUNDERX")
|
||||
|
||||
@ -18,24 +18,24 @@ Kokkos is a [Linux Foundation](https://linuxfoundation.org) project.
|
||||
|
||||
To start learning about Kokkos:
|
||||
|
||||
- [Kokkos Lectures](https://kokkos.org/kokkos-core-wiki/videolectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
|
||||
- [Kokkos Lectures](https://kokkos.org/kokkos-core-wiki/tutorials-and-examples/video-lectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
|
||||
|
||||
- [Programming guide](https://kokkos.org/kokkos-core-wiki/programmingguide.html): contains in "narrative" form a technical description of the programming model, machine model, and the main building blocks like the Views and parallel dispatch.
|
||||
|
||||
- [API reference](https://kokkos.org/kokkos-core-wiki/): organized by category, i.e., [core](https://kokkos.org/kokkos-core-wiki/API/core-index.html), [algorithms](https://kokkos.org/kokkos-core-wiki/API/algorithms-index.html) and [containers](https://kokkos.org/kokkos-core-wiki/API/containers-index.html) or, if you prefer, in [alphabetical order](https://kokkos.org/kokkos-core-wiki/API/alphabetical.html).
|
||||
|
||||
- [Use cases and Examples](https://kokkos.org/kokkos-core-wiki/usecases.html): a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
|
||||
- [Use cases and Examples](https://kokkos.org/kokkos-core-wiki/tutorials-and-examples/use-cases-and-examples.html): a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
|
||||
|
||||
## Obtaining Kokkos
|
||||
|
||||
The latest release of Kokkos can be obtained from the [GitHub releases page](https://github.com/kokkos/kokkos/releases/latest).
|
||||
|
||||
The current release is [4.5.01](https://github.com/kokkos/kokkos/releases/tag/4.5.01).
|
||||
The current release is [4.6.00](https://github.com/kokkos/kokkos/releases/tag/4.6.00).
|
||||
|
||||
```bash
|
||||
curl -OJ -L https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz
|
||||
curl -OJ -L https://github.com/kokkos/kokkos/releases/download/4.6.00/kokkos-4.6.00.tar.gz
|
||||
# Or with wget
|
||||
wget https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz
|
||||
wget https://github.com/kokkos/kokkos/releases/download/4.6.00/kokkos-4.6.00.tar.gz
|
||||
```
|
||||
|
||||
To clone the latest development version of Kokkos from GitHub:
|
||||
@ -47,7 +47,7 @@ git clone -b develop https://github.com/kokkos/kokkos.git
|
||||
### Building Kokkos
|
||||
|
||||
To build Kokkos, you will need to have a C++ compiler that supports C++17 or later.
|
||||
All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.org/kokkos-core-wiki/requirements.html).
|
||||
All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.org/kokkos-core-wiki/get-started/requirements.html).
|
||||
|
||||
Building and installation instructions are described [here](https://kokkos.org/kokkos-core-wiki/building.html).
|
||||
|
||||
|
||||
@ -5,3 +5,7 @@ endif()
|
||||
if(NOT ((KOKKOS_ENABLE_OPENMPTARGET AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) OR KOKKOS_ENABLE_OPENACC))
|
||||
kokkos_add_test_directories(unit_tests)
|
||||
endif()
|
||||
|
||||
if(Kokkos_ENABLE_BENCHMARKS)
|
||||
add_subdirectory(perf_test)
|
||||
endif()
|
||||
|
||||
63
lib/kokkos/algorithms/perf_test/CMakeLists.txt
Normal file
@ -0,0 +1,63 @@
|
||||
# FIXME: The following logic should be moved from here and also from `core/perf_test/CMakeLists.txt` to
|
||||
# the root `CMakeLists.txt` in the form of a macro
|
||||
# Find or download google/benchmark library
|
||||
find_package(benchmark QUIET 1.5.6)
|
||||
if(benchmark_FOUND)
|
||||
message(STATUS "Using google benchmark found in ${benchmark_DIR}")
|
||||
else()
|
||||
message(STATUS "No installed google benchmark found, fetching from GitHub")
|
||||
include(FetchContent)
|
||||
set(BENCHMARK_ENABLE_TESTING OFF)
|
||||
|
||||
list(APPEND CMAKE_MESSAGE_INDENT "[benchmark] ")
|
||||
FetchContent_Declare(
|
||||
googlebenchmark
|
||||
DOWNLOAD_EXTRACT_TIMESTAMP FALSE
|
||||
URL https://github.com/google/benchmark/archive/refs/tags/v1.7.1.tar.gz
|
||||
URL_HASH MD5=0459a6c530df9851bee6504c3e37c2e7
|
||||
)
|
||||
FetchContent_MakeAvailable(googlebenchmark)
|
||||
list(POP_BACK CMAKE_MESSAGE_INDENT)
|
||||
|
||||
# Suppress clang-tidy diagnostics on code that we do not have control over
|
||||
if(CMAKE_CXX_CLANG_TIDY)
|
||||
set_target_properties(benchmark PROPERTIES CXX_CLANG_TIDY "")
|
||||
endif()
|
||||
|
||||
# FIXME: Check whether the following target_compile_options are needed.
|
||||
# If so, clarify why.
|
||||
target_compile_options(benchmark PRIVATE -w)
|
||||
target_compile_options(benchmark_main PRIVATE -w)
|
||||
endif()
|
||||
|
||||
# FIXME: This function should be moved from here and also from `core/perf_test/CMakeLists.txt` to
|
||||
# the root `CMakeLists.txt`
|
||||
# FIXME: Could NAME be a one_value_keyword specified in cmake_parse_arguments?
|
||||
function(KOKKOS_ADD_BENCHMARK NAME)
|
||||
cmake_parse_arguments(BENCHMARK "" "" "SOURCES" ${ARGN})
|
||||
if(DEFINED BENCHMARK_UNPARSED_ARGUMENTS)
|
||||
message(WARNING "Unexpected arguments when adding a benchmark: " ${BENCHMARK_UNPARSED_ARGUMENTS})
|
||||
endif()
|
||||
|
||||
set(BENCHMARK_NAME Kokkos_${NAME})
|
||||
# FIXME: BenchmarkMain.cpp and Benchmark_Context.cpp should be moved to a common location from which
|
||||
# they can be used by all performance tests.
|
||||
list(APPEND BENCHMARK_SOURCES ../../core/perf_test/BenchmarkMain.cpp ../../core/perf_test/Benchmark_Context.cpp)
|
||||
|
||||
add_executable(${BENCHMARK_NAME} ${BENCHMARK_SOURCES})
|
||||
target_link_libraries(${BENCHMARK_NAME} PRIVATE benchmark::benchmark Kokkos::kokkos impl_git_version)
|
||||
target_include_directories(${BENCHMARK_NAME} SYSTEM PRIVATE ${benchmark_SOURCE_DIR}/include)
|
||||
|
||||
# FIXME: This alone will not work. It might need an architecture and standard which need to be defined on target level.
|
||||
# It will potentially go away with #7582.
|
||||
foreach(SOURCE_FILE ${BENCHMARK_SOURCES})
|
||||
set_source_files_properties(${SOURCE_FILE} PROPERTIES LANGUAGE ${KOKKOS_COMPILE_LANGUAGE})
|
||||
endforeach()
|
||||
|
||||
string(TIMESTAMP BENCHMARK_TIME "%Y-%m-%d_T%H-%M-%S" UTC)
|
||||
set(BENCHMARK_ARGS --benchmark_counters_tabular=true --benchmark_out=${BENCHMARK_NAME}_${BENCHMARK_TIME}.json)
|
||||
|
||||
add_test(NAME ${BENCHMARK_NAME} COMMAND ${BENCHMARK_NAME} ${BENCHMARK_ARGS})
|
||||
endfunction()
|
||||
|
||||
kokkos_add_benchmark(PerformanceTest_InclusiveScan SOURCES test_inclusive_scan.cpp)
|
||||
191
lib/kokkos/algorithms/perf_test/test_inclusive_scan.cpp
Normal file
@ -0,0 +1,191 @@
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
// Kokkos v. 4.0
|
||||
// Copyright (2022) National Technology & Engineering
|
||||
// Solutions of Sandia, LLC (NTESS).
|
||||
//
|
||||
// Under the terms of Contract DE-NA0003525 with NTESS,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
// Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
|
||||
// See https://kokkos.org/LICENSE for license information.
|
||||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
||||
//
|
||||
//@HEADER
|
||||
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <tuple>
|
||||
#include <type_traits>
|
||||
|
||||
#include <benchmark/benchmark.h>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <Kokkos_Timer.hpp>
|
||||
#include <Kokkos_StdAlgorithms.hpp>
|
||||
// FIXME: Benchmark_Context.hpp should be moved to a common location
|
||||
#include "../../core/perf_test/Benchmark_Context.hpp"
|
||||
|
||||
namespace {
|
||||
|
||||
namespace KE = Kokkos::Experimental;
|
||||
|
||||
using ExecSpace = Kokkos::DefaultExecutionSpace;
|
||||
using HostExecSpace = Kokkos::DefaultHostExecutionSpace;
|
||||
|
||||
// A tag struct to identify when inclusive scan with the implicit sum
|
||||
// based binary operation needs to be called.
|
||||
template <class ValueType>
|
||||
struct ImpSumBinOp;
|
||||
|
||||
template <class ValueType>
|
||||
struct SumFunctor {
|
||||
KOKKOS_FUNCTION
|
||||
ValueType operator()(const ValueType& a, const ValueType& b) const {
|
||||
return (a + b);
|
||||
}
|
||||
};
|
||||
|
||||
template <class ValueType>
|
||||
struct MaxFunctor {
|
||||
KOKKOS_FUNCTION
|
||||
ValueType operator()(const ValueType& a, const ValueType& b) const {
|
||||
if (a > b)
|
||||
return a;
|
||||
else
|
||||
return b;
|
||||
}
|
||||
};
|
||||
|
||||
// Helper to obtain last element of a view
|
||||
template <class T>
|
||||
T obtain_last_elem(const Kokkos::View<T*, ExecSpace>& v) {
|
||||
T last_element;
|
||||
Kokkos::deep_copy(last_element, Kokkos::subview(v, v.extent(0) - 1));
|
||||
return last_element;
|
||||
}
|
||||
|
||||
// Helper to allocate input and output views
|
||||
template <class T>
|
||||
auto prepare_views(const std::size_t kProbSize) {
|
||||
Kokkos::View<T*, ExecSpace> in{"input", kProbSize};
|
||||
Kokkos::View<T*, ExecSpace> out{"output", kProbSize};
|
||||
|
||||
auto h_in = Kokkos::create_mirror_view(in);
|
||||
|
||||
for (std::size_t i = 0; i < kProbSize; ++i) {
|
||||
h_in(i) = i;
|
||||
}
|
||||
|
||||
Kokkos::deep_copy(in, h_in);
|
||||
|
||||
return std::make_tuple(in, out, h_in);
|
||||
}
|
||||
|
||||
// Perform scan with a reference implementation
|
||||
template <class T, class ViewType, class ScanFunctor = SumFunctor<T>>
|
||||
T ref_scan(const ViewType& h_in, ScanFunctor scan_functor = ScanFunctor()) {
|
||||
std::size_t view_size = h_in.extent(0);
|
||||
|
||||
Kokkos::View<T*, HostExecSpace> h_out("output", view_size);
|
||||
|
||||
// FIXME: Our ORNL Jenkins CI includes a GCC 8.4.0 based check, so
// std::inclusive_scan is not used here: it is available only from GCC 9.3.
// The overload that takes an execution policy is available since GCC 9.1,
// but the <execution> header has errors before GCC 10.1.
|
||||
h_out(0) = h_in(0);
|
||||
|
||||
for (std::size_t i = 1; i < view_size; ++i) {
|
||||
h_out(i) = scan_functor(h_in(i), h_out(i - 1));
|
||||
}
|
||||
|
||||
return h_out(view_size - 1);
|
||||
}
|
||||
|
||||
// Inclusive Scan with default binary operation (sum) or user provided functor
|
||||
// Note: The nature of the functor must be compatible with the
|
||||
// elements in the input and output views
|
||||
template <class T, template <class> class ScanFunctor = ImpSumBinOp>
|
||||
auto inclusive_scan(const Kokkos::View<T*, ExecSpace>& in,
|
||||
const Kokkos::View<T*, ExecSpace>& out, T res_check) {
|
||||
ExecSpace().fence();
|
||||
Kokkos::Timer timer;
|
||||
|
||||
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
|
||||
KE::inclusive_scan("Default scan", ExecSpace(), KE::cbegin(in),
|
||||
KE::cend(in), KE::begin(out));
|
||||
} else {
|
||||
KE::inclusive_scan("Scan using a functor", ExecSpace(), KE::cbegin(in),
|
||||
KE::cend(in), KE::begin(out), ScanFunctor<T>());
|
||||
}
|
||||
|
||||
ExecSpace().fence();
|
||||
double time_scan = timer.seconds();
|
||||
|
||||
T res_scan = obtain_last_elem(out);
|
||||
bool passed = (res_check == res_scan);
|
||||
|
||||
return std::make_tuple(time_scan, passed);
|
||||
}
|
||||
|
||||
// Benchmark: Inclusive Scan with default binary operation (sum)
|
||||
// or user provided functor
|
||||
template <class T, template <class> class ScanFunctor = ImpSumBinOp>
|
||||
void BM_inclusive_scan(benchmark::State& state) {
|
||||
const std::size_t kProbSize = state.range(0);
|
||||
|
||||
auto [in, out, h_in] = prepare_views<T>(kProbSize);
|
||||
|
||||
T res_check;
|
||||
|
||||
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
|
||||
res_check = ref_scan<T>(h_in);
|
||||
} else {
|
||||
res_check = ref_scan<T>(h_in, ScanFunctor<T>());
|
||||
}
|
||||
|
||||
double time_scan = 0.;
|
||||
bool passed = false;
|
||||
|
||||
for (auto _ : state) {
|
||||
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
|
||||
std::tie(time_scan, passed) = inclusive_scan<T>(in, out, res_check);
|
||||
} else {
|
||||
std::tie(time_scan, passed) =
|
||||
inclusive_scan<T, ScanFunctor>(in, out, res_check);
|
||||
}
|
||||
|
||||
KokkosBenchmark::report_results(state, in, 2, time_scan);
|
||||
state.counters["Passed"] = passed;
|
||||
}
|
||||
}
|
||||
|
||||
constexpr std::size_t PROB_SIZE = 100'000'000;
|
||||
|
||||
} // anonymous namespace
|
||||
|
||||
// FIXME: Add logic to pass min. warm-up time. Also, the value should be set
|
||||
// by the user. Say, via the environment variable BENCHMARK_MIN_WARMUP_TIME.
|
||||
|
||||
BENCHMARK(BM_inclusive_scan<std::uint64_t>)->Arg(PROB_SIZE)->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<std::int64_t>)->Arg(PROB_SIZE)->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<double>)->Arg(PROB_SIZE)->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<std::uint64_t, SumFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<std::int64_t, SumFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<double, SumFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<std::uint64_t, MaxFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<std::int64_t, MaxFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
BENCHMARK(BM_inclusive_scan<double, MaxFunctor>)
|
||||
->Arg(PROB_SIZE)
|
||||
->UseManualTime();
|
||||
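The benchmark above validates each device scan against a sequential reference (`ref_scan`) by comparing the last output element. As a rough, Kokkos-free illustration of that reference loop, the sketch below runs the same recurrence, `out[i] = op(in[i], out[i - 1])`, over a `std::array`. `inclusive_scan_ref`, `kInput`, `kSums` and `kMaxes` are hypothetical names, and the functors are simplified `constexpr` copies of the ones in the test file.

```cpp
#include <array>
#include <cstddef>

// Simplified, host-only stand-ins for the functors in the benchmark.
template <class ValueType>
struct SumFunctor {
  constexpr ValueType operator()(const ValueType& a, const ValueType& b) const {
    return a + b;
  }
};

template <class ValueType>
struct MaxFunctor {
  constexpr ValueType operator()(const ValueType& a, const ValueType& b) const {
    return a > b ? a : b;
  }
};

// Sequential inclusive scan: out[i] combines in[i] with the running result.
// Unlike the benchmark's ref_scan, this sketch also guards the empty case.
template <class T, std::size_t N, class BinaryOp>
constexpr std::array<T, N> inclusive_scan_ref(const std::array<T, N>& in,
                                              BinaryOp op) {
  std::array<T, N> out{};
  if (N == 0) return out;
  out[0] = in[0];
  for (std::size_t i = 1; i < N; ++i) {
    out[i] = op(in[i], out[i - 1]);
  }
  return out;
}

constexpr std::array<int, 5> kInput{3, 1, 4, 1, 5};
constexpr auto kSums  = inclusive_scan_ref(kInput, SumFunctor<int>{});
constexpr auto kMaxes = inclusive_scan_ref(kInput, MaxFunctor<int>{});

// Running sums: 3, 4, 8, 9, 14. Running maxima: 3, 3, 4, 4, 5.
static_assert(kSums[4] == 14, "last element of the sum scan");
static_assert(kMaxes[4] == 5, "last element of the max scan");
```

Comparing only the last element, as the benchmark does, is a cheap check; for a sum scan it catches any dropped or duplicated input, while for a max scan it only confirms the global maximum.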
@ -587,11 +587,13 @@ struct Random_XorShift1024_State<false> {
|
||||
int state_idx)
|
||||
: state_(&v(state_idx, 0)), stride_(v.stride_1()) {}
|
||||
|
||||
// NOLINTBEGIN(bugprone-implicit-widening-of-multiplication-result)
|
||||
KOKKOS_FUNCTION
|
||||
uint64_t operator[](const int i) const { return state_[i * stride_]; }
|
||||
|
||||
KOKKOS_FUNCTION
|
||||
uint64_t& operator[](const int i) { return state_[i * stride_]; }
|
||||
// NOLINTEND(bugprone-implicit-widening-of-multiplication-result)
|
||||
};
|
||||
|
||||
template <class ExecutionSpace>
|
||||
@ -670,7 +672,12 @@ struct Random_UniqueIndex<Kokkos::Device<Kokkos::SYCL, MemorySpace>> {
|
||||
View<int**, Kokkos::Device<Kokkos::SYCL, MemorySpace>>;
|
||||
KOKKOS_FUNCTION
|
||||
static int get_state_idx(const locks_view_type& locks_) {
|
||||
#if defined(KOKKOS_COMPILER_INTEL_LLVM) && \
|
||||
KOKKOS_COMPILER_INTEL_LLVM >= 20250000
|
||||
auto item = sycl::ext::oneapi::this_work_item::get_nd_item<3>();
|
||||
#else
|
||||
auto item = sycl::ext::oneapi::experimental::this_nd_item<3>();
|
||||
#endif
|
||||
std::size_t threadIdx[3] = {item.get_local_id(2), item.get_local_id(1),
|
||||
item.get_local_id(0)};
|
||||
std::size_t blockIdx[3] = {item.get_group(2), item.get_group(1),
|
||||
|
||||
@ -45,7 +45,7 @@ struct BinOp1D {
|
||||
// For integral types the number of bins may be larger than the range
|
||||
// in which case we can exactly have one unique value per bin
|
||||
// and then don't need to sort bins.
|
||||
if (std::is_integral<typename KeyViewType::const_value_type>::value &&
|
||||
if (std::is_integral_v<typename KeyViewType::const_value_type> &&
|
||||
(static_cast<double>(max) - static_cast<double>(min)) <=
|
||||
static_cast<double>(max_bins)) {
|
||||
mul_ = 1.;
|
||||
|
||||
@ -53,13 +53,9 @@ void sort(const ExecutionSpace& exec,
|
||||
|
||||
if constexpr (Impl::better_off_calling_std_sort_v<ExecutionSpace>) {
|
||||
exec.fence("Kokkos::sort without comparator use std::sort");
|
||||
if (view.span_is_contiguous()) {
|
||||
std::sort(view.data(), view.data() + view.size());
|
||||
} else {
|
||||
auto first = ::Kokkos::Experimental::begin(view);
|
||||
auto last = ::Kokkos::Experimental::end(view);
|
||||
std::sort(first, last);
|
||||
}
|
||||
auto first = ::Kokkos::Experimental::begin(view);
|
||||
auto last = ::Kokkos::Experimental::end(view);
|
||||
std::sort(first, last);
|
||||
} else {
|
||||
Impl::sort_device_view_without_comparator(exec, view);
|
||||
}
|
||||
@ -111,13 +107,9 @@ void sort(const ExecutionSpace& exec,
|
||||
|
||||
if constexpr (Impl::better_off_calling_std_sort_v<ExecutionSpace>) {
|
||||
exec.fence("Kokkos::sort with comparator use std::sort");
|
||||
if (view.span_is_contiguous()) {
|
||||
std::sort(view.data(), view.data() + view.size(), comparator);
|
||||
} else {
|
||||
auto first = ::Kokkos::Experimental::begin(view);
|
||||
auto last = ::Kokkos::Experimental::end(view);
|
||||
std::sort(first, last, comparator);
|
||||
}
|
||||
auto first = ::Kokkos::Experimental::begin(view);
|
||||
auto last = ::Kokkos::Experimental::end(view);
|
||||
std::sort(first, last, comparator);
|
||||
} else {
|
||||
Impl::sort_device_view_with_comparator(exec, view, comparator);
|
||||
}
|
||||
|
||||
@ -47,6 +47,7 @@
|
||||
#ifdef _CubLog
|
||||
#undef _CubLog
|
||||
#endif
|
||||
// NOLINTNEXTLINE(bugprone-reserved-identifier)
|
||||
#define _CubLog
|
||||
#include <thrust/device_ptr.h>
|
||||
#include <thrust/sort.h>
|
||||
@ -65,12 +66,24 @@
|
||||
#include <thrust/sort.h>
|
||||
#endif
|
||||
|
||||
#if defined(KOKKOS_ENABLE_ONEDPL) && \
|
||||
(ONEDPL_VERSION_MAJOR > 2022 || \
|
||||
(ONEDPL_VERSION_MAJOR == 2022 && ONEDPL_VERSION_MINOR >= 2))
|
||||
#define KOKKOS_ONEDPL_HAS_SORT_BY_KEY
|
||||
#ifdef KOKKOS_ENABLE_ONEDPL
|
||||
#define KOKKOS_IMPL_ONEDPL_VERSION \
|
||||
ONEDPL_VERSION_MAJOR * 10000 + ONEDPL_VERSION_MINOR * 100 + \
|
||||
ONEDPL_VERSION_PATCH
|
||||
#define KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(MAJOR, MINOR, PATCH) \
|
||||
(KOKKOS_IMPL_ONEDPL_VERSION >= ((MAJOR)*10000 + (MINOR)*100 + (PATCH)))
|
||||
|
||||
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 2, 0)
|
||||
#define KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
|
||||
#pragma GCC diagnostic push
|
||||
#pragma GCC diagnostic ignored "-Wshadow"
|
||||
#pragma GCC diagnostic ignored "-Wunused-local-typedef"
|
||||
#pragma GCC diagnostic ignored "-Wunused-parameter"
|
||||
#pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
#include <oneapi/dpl/execution>
|
||||
#include <oneapi/dpl/algorithm>
|
||||
#pragma GCC diagnostic pop
|
||||
#endif
|
||||
#endif
|
||||
|
||||
namespace Kokkos::Impl {
|
||||
@ -141,12 +154,18 @@ void sort_by_key_rocthrust(
|
||||
#endif
|
||||
|
||||
#if defined(KOKKOS_ENABLE_ONEDPL)
|
||||
|
||||
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
|
||||
template <class Layout>
|
||||
inline constexpr bool sort_on_device_v<Kokkos::SYCL, Layout> = true;
|
||||
#else
|
||||
template <class Layout>
|
||||
inline constexpr bool sort_on_device_v<Kokkos::SYCL, Layout> =
|
||||
std::is_same_v<Layout, Kokkos::LayoutLeft> ||
|
||||
std::is_same_v<Layout, Kokkos::LayoutRight>;
|
||||
#endif
|
||||
|
||||
#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
|
||||
#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
|
||||
template <class KeysDataType, class... KeysProperties, class ValuesDataType,
|
||||
class... ValuesProperties, class... MaybeComparator>
|
||||
void sort_by_key_onedpl(
|
||||
@ -154,6 +173,14 @@ void sort_by_key_onedpl(
|
||||
const Kokkos::View<KeysDataType, KeysProperties...>& keys,
|
||||
const Kokkos::View<ValuesDataType, ValuesProperties...>& values,
|
||||
MaybeComparator&&... maybeComparator) {
|
||||
auto queue = exec.sycl_queue();
|
||||
auto policy = oneapi::dpl::execution::make_device_policy(queue);
|
||||
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
|
||||
oneapi::dpl::sort_by_key(policy, ::Kokkos::Experimental::begin(keys),
|
||||
::Kokkos::Experimental::end(keys),
|
||||
::Kokkos::Experimental::begin(values),
|
||||
std::forward<MaybeComparator>(maybeComparator)...);
|
||||
#else
|
||||
if (keys.stride(0) != 1 && values.stride(0) != 1) {
|
||||
Kokkos::abort(
|
||||
"SYCL sort_by_key only supports rank-1 Views with stride(0) = 1.");
|
||||
@ -161,11 +188,10 @@ void sort_by_key_onedpl(
|
||||
|
||||
// Can't use Experimental::begin/end here since the oneDPL then assumes that
|
||||
// the data is on the host.
|
||||
auto queue = exec.sycl_queue();
|
||||
auto policy = oneapi::dpl::execution::make_device_policy(queue);
|
||||
const int n = keys.extent(0);
|
||||
oneapi::dpl::sort_by_key(policy, keys.data(), keys.data() + n, values.data(),
|
||||
std::forward<MaybeComparator>(maybeComparator)...);
|
||||
#endif
|
||||
}
|
||||
#endif
|
||||
#endif
|
||||
@@ -336,12 +362,18 @@ void sort_by_key_device_view_without_comparator(
     const Kokkos::SYCL& exec,
     const Kokkos::View<KeysDataType, KeysProperties...>& keys,
     const Kokkos::View<ValuesDataType, ValuesProperties...>& values) {
-#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
+#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  sort_by_key_onedpl(exec, keys, values);
+#else
   if (keys.stride(0) == 1 && values.stride(0) == 1)
     sort_by_key_onedpl(exec, keys, values);
   else
-#endif
     sort_by_key_via_sort(exec, keys, values);
+#endif
+#else
+  sort_by_key_via_sort(exec, keys, values);
+#endif
 }
 #endif

@@ -394,12 +426,18 @@ void sort_by_key_device_view_with_comparator(
     const Kokkos::View<KeysDataType, KeysProperties...>& keys,
     const Kokkos::View<ValuesDataType, ValuesProperties...>& values,
     const ComparatorType& comparator) {
-#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
+#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  sort_by_key_onedpl(exec, keys, values, comparator);
+#else
   if (keys.stride(0) == 1 && values.stride(0) == 1)
     sort_by_key_onedpl(exec, keys, values, comparator);
   else
-#endif
     sort_by_key_via_sort(exec, keys, values, comparator);
+#endif
+#else
+  sort_by_key_via_sort(exec, keys, values, comparator);
+#endif
 }
 #endif

@@ -416,7 +454,9 @@ sort_by_key_device_view_with_comparator(
   sort_by_key_via_sort(exec, keys, values, comparator);
 }

-#undef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
+#undef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY

 } // namespace Kokkos::Impl
+#undef KOKKOS_IMPL_ONEDPL_VERSION
+#undef KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL
 #endif

@@ -51,6 +51,7 @@
 #ifdef _CubLog
 #undef _CubLog
 #endif
+// NOLINTNEXTLINE(bugprone-reserved-identifier)
 #define _CubLog
 #include <thrust/device_ptr.h>
 #include <thrust/sort.h>
@@ -70,8 +71,20 @@
 #endif

 #if defined(KOKKOS_ENABLE_ONEDPL)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wshadow"
+#pragma GCC diagnostic ignored "-Wunused-local-typedef"
+#pragma GCC diagnostic ignored "-Wunused-parameter"
+#pragma GCC diagnostic ignored "-Wunused-variable"
 #include <oneapi/dpl/execution>
 #include <oneapi/dpl/algorithm>
+#pragma GCC diagnostic pop
+
+#define KOKKOS_IMPL_ONEDPL_VERSION \
+  ONEDPL_VERSION_MAJOR * 10000 + ONEDPL_VERSION_MINOR * 100 + \
+      ONEDPL_VERSION_PATCH
+#define KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(MAJOR, MINOR, PATCH) \
+  (KOKKOS_IMPL_ONEDPL_VERSION >= ((MAJOR)*10000 + (MINOR)*100 + (PATCH)))
 #endif

 namespace Kokkos {
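
The hunk above packs the three oneDPL version components into a single integer so that one `#if` can select between the iterator-based and the pointer-based call. A minimal standalone sketch of that encoding is shown below. The `EXAMPLE_LIB_*` macros, `encode_version` and `uses_iterator_path` are hypothetical names for illustration only; they are not part of Kokkos or oneDPL, and the sketch assumes minor and patch numbers stay below 100.

```cpp
#include <cassert>

// Hypothetical stand-ins for a library's version macros (illustration only).
#define EXAMPLE_LIB_VERSION_MAJOR 2022
#define EXAMPLE_LIB_VERSION_MINOR 7
#define EXAMPLE_LIB_VERSION_PATCH 1

// Pack major/minor/patch into one integer, e.g. 2022.7.1 -> 20220701.
// The outer parentheses keep the expansion safe inside larger expressions.
#define EXAMPLE_LIB_VERSION                                               \
  (EXAMPLE_LIB_VERSION_MAJOR * 10000 + EXAMPLE_LIB_VERSION_MINOR * 100 + \
   EXAMPLE_LIB_VERSION_PATCH)
#define EXAMPLE_LIB_VERSION_GREATER_EQUAL(MAJOR, MINOR, PATCH) \
  (EXAMPLE_LIB_VERSION >= ((MAJOR) * 10000 + (MINOR) * 100 + (PATCH)))

// The same encoding as a constexpr function, convenient for unit checks.
constexpr int encode_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Compile-time branch selection, mirroring the #if/#else used in the hunk.
inline bool uses_iterator_path() {
#if EXAMPLE_LIB_VERSION_GREATER_EQUAL(2022, 7, 1)
  return true;   // new enough: iterators can be passed directly
#else
  return false;  // older release: fall back to raw pointers
#endif
}

// Small usage example.
inline void version_example() {
  assert(encode_version(2022, 7, 1) == EXAMPLE_LIB_VERSION);
  assert(uses_iterator_path());
}
```

Because the comparison is an ordinary integer comparison, the same macro works in `#if` directives and in regular expressions.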
@@ -221,6 +234,10 @@ void sort_onedpl(const Kokkos::SYCL& space,
                 "SYCL execution space is not able to access the memory space "
                 "of the View argument!");

+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  static_assert(ViewType::rank == 1,
+                "Kokkos::sort currently only supports rank-1 Views.");
+#else
   static_assert(
       (ViewType::rank == 1) &&
           (std::is_same_v<typename ViewType::array_layout, LayoutRight> ||
@@ -234,18 +251,26 @@ void sort_onedpl(const Kokkos::SYCL& space,
   if (view.stride(0) != 1) {
     Kokkos::abort("SYCL sort only supports rank-1 Views with stride(0) = 1.");
   }
+#endif

   if (view.extent(0) <= 1) {
     return;
   }

-  // Can't use Experimental::begin/end here since the oneDPL then assumes that
-  // the data is on the host.
   auto queue = space.sycl_queue();
   auto policy = oneapi::dpl::execution::make_device_policy(queue);
+
+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  oneapi::dpl::sort(policy, ::Kokkos::Experimental::begin(view),
+                    ::Kokkos::Experimental::end(view),
+                    std::forward<MaybeComparator>(maybeComparator)...);
+#else
+  // Can't use Experimental::begin/end here since the oneDPL then assumes that
+  // the data is on the host.
   const int n = view.extent(0);
   oneapi::dpl::sort(policy, view.data(), view.data() + n,
                     std::forward<MaybeComparator>(maybeComparator)...);
+#endif
 }
 #endif

@@ -269,29 +294,19 @@ void copy_to_host_run_stdsort_copy_back(
     KE::copy(exec, view, view_dc);

     // run sort on the mirror of view_dc
-    auto mv_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view_dc);
-    if (view.span_is_contiguous()) {
-      std::sort(mv_h.data(), mv_h.data() + mv_h.size(),
-                std::forward<MaybeComparator>(maybeComparator)...);
-    } else {
-      auto first = KE::begin(mv_h);
-      auto last = KE::end(mv_h);
-      std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
-    }
+    auto mv_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view_dc);
+    auto first = KE::begin(mv_h);
+    auto last = KE::end(mv_h);
+    std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
     Kokkos::deep_copy(exec, view_dc, mv_h);

     // copy back to argument view
     KE::copy(exec, KE::cbegin(view_dc), KE::cend(view_dc), KE::begin(view));
   } else {
     auto view_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view);
-    if (view.span_is_contiguous()) {
-      std::sort(view_h.data(), view_h.data() + view_h.size(),
-                std::forward<MaybeComparator>(maybeComparator)...);
-    } else {
-      auto first = KE::begin(view_h);
-      auto last = KE::end(view_h);
-      std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
-    }
+    auto first = KE::begin(view_h);
+    auto last = KE::end(view_h);
+    std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
     Kokkos::deep_copy(exec, view, view_h);
   }
 }
@@ -332,11 +347,15 @@ void sort_device_view_without_comparator(
       "sort_device_view_without_comparator: supports rank-1 Views "
       "with LayoutLeft, LayoutRight or LayoutStride");

+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  sort_onedpl(exec, view);
+#else
   if (view.stride(0) == 1) {
     sort_onedpl(exec, view);
   } else {
     copy_to_host_run_stdsort_copy_back(exec, view);
   }
+#endif
 }
 #endif

@@ -387,11 +406,15 @@ void sort_device_view_with_comparator(
       "sort_device_view_with_comparator: supports rank-1 Views "
       "with LayoutLeft, LayoutRight or LayoutStride");

+#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
+  sort_onedpl(exec, view, comparator);
+#else
   if (view.stride(0) == 1) {
     sort_onedpl(exec, view, comparator);
   } else {
     copy_to_host_run_stdsort_copy_back(exec, view, comparator);
   }
+#endif
 }
 #endif

@@ -423,4 +446,7 @@ sort_device_view_with_comparator(

 } // namespace Impl
 } // namespace Kokkos
+
+#undef KOKKOS_IMPL_ONEDPL_VERSION
+#undef KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL
 #endif

@@ -238,12 +238,9 @@ KOKKOS_INLINE_FUNCTION void expect_no_overlap(
     [[maybe_unused]] IteratorType2 s_first) {
   if constexpr (is_kokkos_iterator_v<IteratorType1> &&
                 is_kokkos_iterator_v<IteratorType2>) {
-    auto const view1 = first.view();
-    auto const view2 = s_first.view();
-
-    std::size_t stride1 = view1.stride(0);
-    std::size_t stride2 = view2.stride(0);
-    ptrdiff_t first_diff = view1.data() - view2.data();
+    std::size_t stride1 = first.stride();
+    std::size_t stride2 = s_first.stride();
+    ptrdiff_t first_diff = first.data() - s_first.data();

     // FIXME If strides are not identical, checks may not be made
     // with the cost of O(1)
@@ -251,8 +248,8 @@ KOKKOS_INLINE_FUNCTION void expect_no_overlap(
     // If first_diff == 0, there is already an overlap
     if (stride1 == stride2 || first_diff == 0) {
       [[maybe_unused]] bool is_no_overlap = (first_diff % stride1);
-      auto* first_pointer1 = view1.data();
-      auto* first_pointer2 = view2.data();
+      auto* first_pointer1 = first.data();
+      auto* first_pointer2 = s_first.data();
       [[maybe_unused]] auto* last_pointer1 = first_pointer1 + (last - first);
       [[maybe_unused]] auto* last_pointer2 = first_pointer2 + (last - first);
       KOKKOS_EXPECTS(first_pointer1 >= last_pointer2 ||

@@ -150,9 +150,8 @@ KOKKOS_FUNCTION OutputIterator copy_if_team_impl(
     return d_first + count;
   }

-#if defined KOKKOS_COMPILER_INTEL || \
-    (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
-     !defined(KOKKOS_COMPILER_MSVC))
+#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
+    !defined(KOKKOS_COMPILER_MSVC)
   __builtin_unreachable();
 #endif
 }

@@ -103,7 +103,7 @@ OutputIteratorType exclusive_scan_custom_op_exespace_impl(

   // aliases
   using index_type = typename InputIteratorType::difference_type;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using func_type = TransformExclusiveScanFunctorWithValueWrapper<
       ExecutionSpace, index_type, ValueType, InputIteratorType,
       OutputIteratorType, BinaryOpType, unary_op_type>;
@@ -177,7 +177,7 @@ KOKKOS_FUNCTION OutputIteratorType exclusive_scan_custom_op_team_impl(

   // aliases
   using exe_space = typename TeamHandleType::execution_space;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using index_type = typename InputIteratorType::difference_type;
   using func_type = TransformExclusiveScanFunctorWithoutValueWrapper<
       exe_space, index_type, ValueType, InputIteratorType, OutputIteratorType,

@@ -23,10 +23,11 @@ namespace Kokkos {
 namespace Experimental {
 namespace Impl {

-template <class ValueType>
 struct StdNumericScanIdentityReferenceUnaryFunctor {
-  KOKKOS_FUNCTION
-  constexpr const ValueType& operator()(const ValueType& a) const { return a; }
+  template <class T>
+  KOKKOS_FUNCTION constexpr T&& operator()(T&& t) const {
+    return static_cast<T&&>(t);
+  }
 };

 } // namespace Impl

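
The hunk above turns the identity functor from a class template into a plain struct with a templated, perfectly forwarding call operator, which is why the call sites in this commit drop the `<ValueType>` argument. The following standalone sketch shows the same pattern without Kokkos. `IdentityForwardingFunctor` and `identity_returns_same_object` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Non-template class with a templated call operator: it forwards any argument
// unchanged, so callers no longer have to name the value type.
struct IdentityForwardingFunctor {
  template <class T>
  constexpr T&& operator()(T&& t) const noexcept {
    return static_cast<T&&>(t);  // same effect as std::forward<T>(t)
  }
};

// Lvalues come back as lvalue references (constness preserved) ...
static_assert(
    std::is_same_v<decltype(IdentityForwardingFunctor{}(std::declval<int&>())),
                   int&>);
static_assert(std::is_same_v<decltype(IdentityForwardingFunctor{}(
                                 std::declval<const double&>())),
                             const double&>);
// ... and rvalues come back as rvalue references.
static_assert(
    std::is_same_v<decltype(IdentityForwardingFunctor{}(std::declval<int>())),
                   int&&>);

// Returns true if the functor hands back the very same object it was given.
inline bool identity_returns_same_object() {
  int x = 42;
  IdentityForwardingFunctor id;
  return &id(x) == &x;
}
```

One design consequence of this form is that a single functor type can be reused for any element type, at the cost of the call operator no longer documenting which type it expects.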
@@ -18,12 +18,60 @@
 #define KOKKOS_STD_ALGORITHMS_INCLUSIVE_SCAN_IMPL_HPP

 #include <Kokkos_Core.hpp>
+#include <Kokkos_Profiling_ScopedRegion.hpp>
 #include "Kokkos_Constraints.hpp"
 #include "Kokkos_HelperPredicates.hpp"
 #include <std_algorithms/Kokkos_TransformInclusiveScan.hpp>
 #include <std_algorithms/Kokkos_Distance.hpp>
 #include <string>

+#if defined(KOKKOS_ENABLE_CUDA)
+
+// Workaround for `Instruction 'shfl' without '.sync' is not supported on
+// .target sm_70 and higher from PTX ISA version 6.4`.
+// Also see https://github.com/NVIDIA/cub/pull/170.
+#if !defined(CUB_USE_COOPERATIVE_GROUPS)
+#define CUB_USE_COOPERATIVE_GROUPS
+#endif
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wshadow"
+#pragma GCC diagnostic ignored "-Wsuggest-override"
+
+#if defined(KOKKOS_COMPILER_CLANG)
+// Some versions of Clang fail to compile Thrust, failing with errors like
+// this:
+// <snip>/thrust/system/cuda/detail/core/agent_launcher.h:557:11:
+// error: use of undeclared identifier 'va_printf'
+// The exact combination of versions for Clang and Thrust (or CUDA) for this
+// failure was not investigated, however even very recent version combination
+// (Clang 10.0.0 and Cuda 10.0) demonstrated failure.
+//
+// Defining _CubLog here locally allows us to avoid that code path, however
+// disabling some debugging diagnostics
+#pragma push_macro("_CubLog")
+#ifdef _CubLog
+#undef _CubLog
+#endif
+// NOLINTNEXTLINE(bugprone-reserved-identifier)
+#define _CubLog
+#include <thrust/distance.h>
+#include <thrust/scan.h>
+#pragma pop_macro("_CubLog")
+#else
+#include <thrust/distance.h>
+#include <thrust/scan.h>
+#endif
+
+#pragma GCC diagnostic pop
+
+#endif
+
+#if defined(KOKKOS_ENABLE_ROCTHRUST)
+#include <thrust/distance.h>
+#include <thrust/scan.h>
+#endif
+
 namespace Kokkos {
 namespace Experimental {
 namespace Impl {
@@ -101,9 +149,48 @@ struct InclusiveScanDefaultFunctor {
   }
 };

-//
-// exespace impl
-//
+// -------------------------------------------------------------
+// inclusive_scan_default_op_exespace_impl
+// -------------------------------------------------------------
+
+#if defined(KOKKOS_ENABLE_CUDA)
+template <class InputIteratorType, class OutputIteratorType>
+OutputIteratorType inclusive_scan_default_op_exespace_impl(
+    const std::string& label, const Cuda& ex, InputIteratorType first_from,
+    InputIteratorType last_from, OutputIteratorType first_dest) {
+  const auto thrust_ex = thrust::cuda::par.on(ex.cuda_stream());
+
+  Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
+
+  thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest);
+
+  Kokkos::Profiling::popRegion();
+
+  const auto num_elements = thrust::distance(first_from, last_from);
+
+  return first_dest + num_elements;
+}
+#endif
+
+#if defined(KOKKOS_ENABLE_ROCTHRUST)
+template <class InputIteratorType, class OutputIteratorType>
+OutputIteratorType inclusive_scan_default_op_exespace_impl(
+    const std::string& label, const HIP& ex, InputIteratorType first_from,
+    InputIteratorType last_from, OutputIteratorType first_dest) {
+  const auto thrust_ex = thrust::hip::par.on(ex.hip_stream());
+
+  Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
+
+  thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest);
+
+  Kokkos::Profiling::popRegion();
+
+  const auto num_elements = thrust::distance(first_from, last_from);
+
+  return first_dest + num_elements;
+}
+#endif
+
 template <class ExecutionSpace, class InputIteratorType,
           class OutputIteratorType>
 OutputIteratorType inclusive_scan_default_op_exespace_impl(
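
The hunk above adds overloads whose execution-space parameter is a concrete type (`Cuda`, `HIP`) next to the existing generic template, so that the vendor scan is picked whenever the execution space matches. The sketch below reproduces that overload-selection mechanism in plain C++. `VendorSpace`, `GenericSpace`, `inclusive_scan_impl` and `scan_path` are hypothetical names, and `std::partial_sum` merely stands in for a vendor call such as `thrust::inclusive_scan`.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical execution-space tags (stand-ins for e.g. Cuda and Serial).
struct VendorSpace {};
struct GenericSpace {};

// Backend-specific overload: the execution-space type is fixed, so partial
// ordering of function templates ranks it above the generic overload below.
template <class InputIt, class OutputIt>
OutputIt inclusive_scan_impl(std::string& path, const VendorSpace&,
                             InputIt first, InputIt last, OutputIt d_first) {
  path = "vendor";
  // std::partial_sum stands in for the vendor library call.
  return std::partial_sum(first, last, d_first);
}

// Generic fallback used for every other execution space.
template <class ExecutionSpace, class InputIt, class OutputIt>
OutputIt inclusive_scan_impl(std::string& path, const ExecutionSpace&,
                             InputIt first, InputIt last, OutputIt d_first) {
  path = "generic";
  if (first == last) return d_first;
  auto running = *first;
  *d_first++ = running;
  while (++first != last) {
    running = running + *first;
    *d_first++ = running;
  }
  return d_first;
}

// Runs the scan on {1, 2, 3, 4} and reports which overload was selected.
template <class Space>
std::string scan_path(std::vector<int>& out) {
  const std::vector<int> in{1, 2, 3, 4};
  out.assign(in.size(), 0);
  std::string path;
  inclusive_scan_impl(path, Space{}, in.begin(), in.end(), out.begin());
  return path;
}
```

Calling `scan_path<VendorSpace>(out)` selects the first overload and `scan_path<GenericSpace>(out)` the second; both fill `out` with the same prefix sums.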
@@ -132,11 +219,16 @@ OutputIteratorType inclusive_scan_default_op_exespace_impl(
   // run
   const auto num_elements =
       Kokkos::Experimental::distance(first_from, last_from);
+
+  Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
+
   ::Kokkos::parallel_scan(label,
                           RangePolicy<ExecutionSpace>(ex, 0, num_elements),
                           func_type(first_from, first_dest));
   ex.fence("Kokkos::inclusive_scan_default_op: fence after operation");

+  Kokkos::Profiling::popRegion();
+
   // return
   return first_dest + num_elements;
 }
@@ -144,6 +236,49 @@ OutputIteratorType inclusive_scan_default_op_exespace_impl(
 // -------------------------------------------------------------
 // inclusive_scan_custom_binary_op_impl
 // -------------------------------------------------------------
+
+#if defined(KOKKOS_ENABLE_CUDA)
+template <class InputIteratorType, class OutputIteratorType, class BinaryOpType>
+OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
+    const std::string& label, const Cuda& ex, InputIteratorType first_from,
+    InputIteratorType last_from, OutputIteratorType first_dest,
+    BinaryOpType binary_op) {
+  const auto thrust_ex = thrust::cuda::par.on(ex.cuda_stream());
+
+  Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
+
+  thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest,
+                         binary_op);
+
+  Kokkos::Profiling::popRegion();
+
+  const auto num_elements = thrust::distance(first_from, last_from);
+
+  return first_dest + num_elements;
+}
+#endif
+
+#if defined(KOKKOS_ENABLE_ROCTHRUST)
+template <class InputIteratorType, class OutputIteratorType, class BinaryOpType>
+OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
+    const std::string& label, const HIP& ex, InputIteratorType first_from,
+    InputIteratorType last_from, OutputIteratorType first_dest,
+    BinaryOpType binary_op) {
+  const auto thrust_ex = thrust::hip::par.on(ex.hip_stream());
+
+  Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
+
+  thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest,
+                         binary_op);
+
+  Kokkos::Profiling::popRegion();
+
+  const auto num_elements = thrust::distance(first_from, last_from);
+
+  return first_dest + num_elements;
+}
+#endif
+
 template <class ExecutionSpace, class InputIteratorType,
           class OutputIteratorType, class BinaryOpType>
 OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
@@ -160,7 +295,7 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
   using index_type = typename InputIteratorType::difference_type;
   using value_type =
       std::remove_const_t<typename InputIteratorType::value_type>;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<value_type>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using func_type = ExeSpaceTransformInclusiveScanNoInitValueFunctor<
       ExecutionSpace, index_type, value_type, InputIteratorType,
       OutputIteratorType, BinaryOpType, unary_op_type>;
@@ -168,11 +303,16 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
   // run
   const auto num_elements =
       Kokkos::Experimental::distance(first_from, last_from);
+
+  Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
+
   ::Kokkos::parallel_scan(
       label, RangePolicy<ExecutionSpace>(ex, 0, num_elements),
       func_type(first_from, first_dest, binary_op, unary_op_type()));
   ex.fence("Kokkos::inclusive_scan_custom_binary_op: fence after operation");

+  Kokkos::Profiling::popRegion();
+
   // return
   return first_dest + num_elements;
 }
@@ -195,7 +335,7 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(

   // aliases
   using index_type = typename InputIteratorType::difference_type;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using func_type = ExeSpaceTransformInclusiveScanWithInitValueFunctor<
       ExecutionSpace, index_type, ValueType, InputIteratorType,
       OutputIteratorType, BinaryOpType, unary_op_type>;
@@ -203,12 +343,17 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
   // run
   const auto num_elements =
       Kokkos::Experimental::distance(first_from, last_from);
+
+  Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
+
   ::Kokkos::parallel_scan(label,
                           RangePolicy<ExecutionSpace>(ex, 0, num_elements),
                           func_type(first_from, first_dest, binary_op,
                                     unary_op_type(), std::move(init_value)));
   ex.fence("Kokkos::inclusive_scan_custom_binary_op: fence after operation");

+  Kokkos::Profiling::popRegion();
+
   // return
   return first_dest + num_elements;
 }
@@ -283,7 +428,7 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(

   // aliases
   using exe_space = typename TeamHandleType::execution_space;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<value_type>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using func_type = TeamTransformInclusiveScanNoInitValueFunctor<
       exe_space, value_type, InputIteratorType, OutputIteratorType,
       BinaryOpType, unary_op_type>;
@@ -291,7 +436,6 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(
   // run
   const auto num_elements =
       Kokkos::Experimental::distance(first_from, last_from);
-
   ::Kokkos::parallel_scan(
       TeamThreadRange(teamHandle, 0, num_elements),
       func_type(first_from, first_dest, binary_op, unary_op_type()));
@@ -325,7 +469,7 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(

   // aliases
   using exe_space = typename TeamHandleType::execution_space;
-  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
+  using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
   using func_type = TeamTransformInclusiveScanWithInitValueFunctor<
       exe_space, ValueType, InputIteratorType, OutputIteratorType, BinaryOpType,
       unary_op_type>;

@@ -18,6 +18,7 @@
 #define KOKKOS_RANDOM_ACCESS_ITERATOR_IMPL_HPP

 #include <iterator>
+#include <utility> // declval
 #include <Kokkos_Macros.hpp>
 #include <Kokkos_View.hpp>
 #include "Kokkos_Constraints.hpp"
@@ -29,8 +30,29 @@ namespace Impl {
 template <class T>
 class RandomAccessIterator;

+namespace {
+
+template <typename ViewType>
+struct is_always_strided {
+  static_assert(is_view_v<ViewType>);
+
+  constexpr static bool value =
+#ifdef KOKKOS_ENABLE_IMPL_MDSPAN
+      decltype(std::declval<ViewType>().to_mdspan())::is_always_strided();
+#else
+      (std::is_same_v<typename ViewType::traits::array_layout,
+                      Kokkos::LayoutLeft> ||
+       std::is_same_v<typename ViewType::traits::array_layout,
+                      Kokkos::LayoutRight> ||
+       std::is_same_v<typename ViewType::traits::array_layout,
+                      Kokkos::LayoutStride>);
+#endif
+};
+
+} // namespace
+
 template <class DataType, class... Args>
-class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
+class RandomAccessIterator<::Kokkos::View<DataType, Args...>> {
  public:
   using view_type = ::Kokkos::View<DataType, Args...>;
   using iterator_type = RandomAccessIterator<view_type>;
@@ -41,30 +63,31 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
   using pointer = typename view_type::pointer_type;
   using reference = typename view_type::reference_type;

+// oneDPL needs this alias in order not to assume the data is on the host but on
+// the device, see
+// https://github.com/uxlfoundation/oneDPL/blob/a045eac689f9107f50ba7b42235e9e927118e483/include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h#L210-L214
+#ifdef KOKKOS_ENABLE_ONEDPL
+  using is_passed_directly = std::true_type;
+#endif
+
   static_assert(view_type::rank == 1 &&
-                    (std::is_same_v<typename view_type::traits::array_layout,
-                                    Kokkos::LayoutLeft> ||
-                     std::is_same_v<typename view_type::traits::array_layout,
-                                    Kokkos::LayoutRight> ||
-                     std::is_same_v<typename view_type::traits::array_layout,
-                                    Kokkos::LayoutStride>),
-                "RandomAccessIterator only supports 1D Views with LayoutLeft, "
-                "LayoutRight, LayoutStride.");
+                is_always_strided<::Kokkos::View<DataType, Args...>>::value);

   KOKKOS_DEFAULTED_FUNCTION RandomAccessIterator() = default;

   explicit KOKKOS_FUNCTION RandomAccessIterator(const view_type view)
-      : m_view(view) {}
+      : m_data(view.data()), m_stride(view.stride_0()) {}
   explicit KOKKOS_FUNCTION RandomAccessIterator(const view_type view,
                                                 ptrdiff_t current_index)
-      : m_view(view), m_current_index(current_index) {}
+      : m_data(view.data() + current_index * view.stride_0()),
+        m_stride(view.stride_0()) {}

 #ifndef KOKKOS_ENABLE_CXX17 // C++20 and beyond
   template <class OtherViewType>
     requires(std::is_constructible_v<view_type, OtherViewType>)
   KOKKOS_FUNCTION explicit(!std::is_convertible_v<OtherViewType, view_type>)
       RandomAccessIterator(const RandomAccessIterator<OtherViewType>& other)
-      : m_view(other.m_view), m_current_index(other.m_current_index) {}
+      : m_data(other.m_data), m_stride(other.m_stride) {}
 #else
   template <
       class OtherViewType,
@@ -73,19 +96,22 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
                        int> = 0>
   KOKKOS_FUNCTION explicit RandomAccessIterator(
       const RandomAccessIterator<OtherViewType>& other)
-      : m_view(other.m_view), m_current_index(other.m_current_index) {}
+      : m_data(other.m_data), m_stride(other.m_stride) {}

   template <class OtherViewType,
             std::enable_if_t<std::is_convertible_v<OtherViewType, view_type>,
                              int> = 0>
   KOKKOS_FUNCTION RandomAccessIterator(
       const RandomAccessIterator<OtherViewType>& other)
-      : m_view(other.m_view), m_current_index(other.m_current_index) {}
+      : m_data(other.m_data), m_stride(other.m_stride) {}
 #endif

   KOKKOS_FUNCTION
   iterator_type& operator++() {
-    ++m_current_index;
+    if constexpr (is_always_contiguous)
+      m_data++;
+    else
+      m_data += m_stride;
     return *this;
   }

@@ -98,7 +124,10 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {

   KOKKOS_FUNCTION
   iterator_type& operator--() {
-    --m_current_index;
+    if constexpr (is_always_contiguous)
+      m_data--;
+    else
+      m_data -= m_stride;
     return *this;
   }

@@ -111,77 +140,95 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {

   KOKKOS_FUNCTION
   reference operator[](difference_type n) const {
-    return m_view(m_current_index + n);
+    if constexpr (is_always_contiguous)
+      return *(m_data + n);
+    else
+      return *(m_data + n * m_stride);
   }

   KOKKOS_FUNCTION
   iterator_type& operator+=(difference_type n) {
-    m_current_index += n;
+    if constexpr (is_always_contiguous)
+      m_data += n;
+    else
+      m_data += n * m_stride;
     return *this;
   }

   KOKKOS_FUNCTION
   iterator_type& operator-=(difference_type n) {
-    m_current_index -= n;
+    if constexpr (is_always_contiguous)
+      m_data -= n;
+    else
+      m_data -= n * m_stride;
     return *this;
   }

   KOKKOS_FUNCTION
   iterator_type operator+(difference_type n) const {
-    return iterator_type(m_view, m_current_index + n);
+    auto it = *this;
+    it += n;
+    return it;
   }

   friend iterator_type operator+(difference_type n, iterator_type other) {
     return other + n;
   }

   KOKKOS_FUNCTION
   iterator_type operator-(difference_type n) const {
-    return iterator_type(m_view, m_current_index - n);
+    auto it = *this;
+    it -= n;
+    return it;
   }

   KOKKOS_FUNCTION
   difference_type operator-(iterator_type it) const {
-    return m_current_index - it.m_current_index;
+    if constexpr (is_always_contiguous)
+      return m_data - it.m_data;
+    else
+      return (m_data - it.m_data) / m_stride;
   }

   KOKKOS_FUNCTION
   bool operator==(iterator_type other) const {
-    return m_current_index == other.m_current_index &&
-           m_view.data() == other.m_view.data();
+    return m_data == other.m_data && m_stride == other.m_stride;
   }

   KOKKOS_FUNCTION
   bool operator!=(iterator_type other) const {
-    return m_current_index != other.m_current_index ||
-           m_view.data() != other.m_view.data();
+    return m_data != other.m_data || m_stride != other.m_stride;
   }

   KOKKOS_FUNCTION
-  bool operator<(iterator_type other) const {
-    return m_current_index < other.m_current_index;
-  }
+  bool operator<(iterator_type other) const { return m_data < other.m_data; }

   KOKKOS_FUNCTION
-  bool operator<=(iterator_type other) const {
-    return m_current_index <= other.m_current_index;
-  }
+  bool operator<=(iterator_type other) const { return m_data <= other.m_data; }

   KOKKOS_FUNCTION
-  bool operator>(iterator_type other) const {
-    return m_current_index > other.m_current_index;
-  }
+  bool operator>(iterator_type other) const { return m_data > other.m_data; }

   KOKKOS_FUNCTION
-  bool operator>=(iterator_type other) const {
-    return m_current_index >= other.m_current_index;
-  }
+  bool operator>=(iterator_type other) const { return m_data >= other.m_data; }

   KOKKOS_FUNCTION
-  reference operator*() const { return m_view(m_current_index); }
+  reference operator*() const { return *m_data; }

   KOKKOS_FUNCTION
-  view_type view() const { return m_view; }
+  pointer data() const { return m_data; }
+
+  KOKKOS_FUNCTION
+  int stride() const { return m_stride; }

  private:
-  view_type m_view;
-  ptrdiff_t m_current_index = 0;
+  pointer m_data;
+  int m_stride;
+  static constexpr bool is_always_contiguous =
+      (std::is_same_v<typename view_type::traits::array_layout,
+                      Kokkos::LayoutLeft> ||
+       std::is_same_v<typename view_type::traits::array_layout,
+                      Kokkos::LayoutRight>);

   // Needed for the converting constructor accepting another iterator
   template <class>
@@ -192,4 +239,10 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
 } // namespace Experimental
 } // namespace Kokkos

+#ifdef KOKKOS_ENABLE_SYCL
+template <class T>
+struct sycl::is_device_copyable<
+    Kokkos::Experimental::Impl::RandomAccessIterator<T>> : std::true_type {};
+#endif
+
 #endif

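
The hunks above replace the stored View plus index with a raw pointer and a stride, so that advancing the iterator is plain pointer arithmetic. A minimal host-only sketch of such a pointer-plus-stride random-access iterator follows. `StridedIterator` and `sort_even_slots` are hypothetical names for illustration, and unlike the Kokkos class the sketch always multiplies by the stride instead of special-casing contiguous layouts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Random-access iterator that stores only a pointer and a stride.
template <class T>
class StridedIterator {
 public:
  using iterator_category = std::random_access_iterator_tag;
  using value_type = T;
  using difference_type = std::ptrdiff_t;
  using pointer = T*;
  using reference = T&;

  StridedIterator() = default;
  StridedIterator(T* data, difference_type stride)
      : m_data(data), m_stride(stride) {}

  reference operator*() const { return *m_data; }
  reference operator[](difference_type n) const { return m_data[n * m_stride]; }

  StridedIterator& operator++() { m_data += m_stride; return *this; }
  StridedIterator operator++(int) { auto t = *this; ++*this; return t; }
  StridedIterator& operator--() { m_data -= m_stride; return *this; }
  StridedIterator operator--(int) { auto t = *this; --*this; return t; }
  StridedIterator& operator+=(difference_type n) { m_data += n * m_stride; return *this; }
  StridedIterator& operator-=(difference_type n) { m_data -= n * m_stride; return *this; }

  friend StridedIterator operator+(StridedIterator it, difference_type n) { return it += n; }
  friend StridedIterator operator+(difference_type n, StridedIterator it) { return it += n; }
  friend StridedIterator operator-(StridedIterator it, difference_type n) { return it -= n; }
  // Distance in elements: pointer difference divided by the stride.
  friend difference_type operator-(StridedIterator a, StridedIterator b) {
    return (a.m_data - b.m_data) / a.m_stride;
  }

  friend bool operator==(StridedIterator a, StridedIterator b) { return a.m_data == b.m_data; }
  friend bool operator!=(StridedIterator a, StridedIterator b) { return !(a == b); }
  friend bool operator<(StridedIterator a, StridedIterator b) { return a.m_data < b.m_data; }
  friend bool operator>(StridedIterator a, StridedIterator b) { return b < a; }
  friend bool operator<=(StridedIterator a, StridedIterator b) { return !(b < a); }
  friend bool operator>=(StridedIterator a, StridedIterator b) { return !(a < b); }

 private:
  T* m_data = nullptr;
  difference_type m_stride = 1;
};

// Sorts the elements at indices 0, 2, 4, ... in place and returns the buffer.
// Uses size / 2 elements, so the end iterator never points past the buffer.
inline std::vector<int> sort_even_slots(std::vector<int> buffer) {
  const std::ptrdiff_t stride = 2;
  const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(buffer.size()) / 2;
  StridedIterator<int> first(buffer.data(), stride);
  std::sort(first, first + count);
  return buffer;
}
```

With such an iterator a strided range can be handed straight to `std::sort`, which matches the simplification in `copy_to_host_run_stdsort_copy_back` where the separate contiguous branch was removed.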
@@ -52,13 +52,10 @@ struct StdUniqueFunctor {
     auto& val_i = m_first_from[i];
     const auto& val_ip1 = m_first_from[i + 1];

-    if (final_pass) {
-      if (!m_pred(val_i, val_ip1)) {
+    if (!m_pred(val_i, val_ip1)) {
+      if (final_pass) {
         m_first_dest[update] = std::move(val_i);
       }
-    }
-
-    if (!m_pred(val_i, val_ip1)) {
       update += 1;
     }
   }
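
The hunk above merges the two predicate checks of the scan functor into one, so the predicate runs once per element in each pass and the write is guarded by `final_pass`. The serial sketch below emulates the two passes of a scan with the restructured body. `unique_step`, `unique_adjacent` and the `pred_calls` counter are hypothetical helpers for illustration; appending the last element separately is an assumption of this sketch, not something shown in the hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Body of the restructured functor: one predicate evaluation per element,
// a write only in the final pass, and the running count updated in both.
template <class Pred>
void unique_step(const std::vector<int>& from, std::vector<int>& dest,
                 std::size_t i, std::size_t& update, bool final_pass,
                 Pred pred, std::size_t& pred_calls) {
  ++pred_calls;
  if (!pred(from[i], from[i + 1])) {
    if (final_pass) {
      dest[update] = from[i];
    }
    update += 1;
  }
}

// Serial emulation of the two scan passes: a counting pass
// (final_pass == false) followed by a writing pass (final_pass == true).
// Only elements with a right neighbour are visited; the last element is
// appended afterwards.
inline std::vector<int> unique_adjacent(const std::vector<int>& from,
                                        std::size_t& pred_calls) {
  if (from.empty()) return {};
  std::vector<int> dest;
  std::size_t count = 0;
  for (std::size_t i = 0; i + 1 < from.size(); ++i)
    unique_step(from, dest, i, count, false, std::equal_to<int>{}, pred_calls);

  dest.resize(count + 1);
  std::size_t update = 0;
  for (std::size_t i = 0; i + 1 < from.size(); ++i)
    unique_step(from, dest, i, update, true, std::equal_to<int>{}, pred_calls);

  dest[update] = from.back();
  return dest;
}
```

For an input of eight elements the sketch calls the predicate seven times per pass, where the pre-change body would have called it up to twice per element in the final pass.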
@@ -188,6 +185,7 @@ KOKKOS_FUNCTION IteratorType unique_team_impl(const TeamHandleType& teamHandle,
     IteratorType result = first;
     IteratorType lfirst = first;
     while (++lfirst != last) {
+      // NOLINTNEXTLINE(bugprone-inc-dec-in-conditions)
       if (!pred(*result, *lfirst) && ++result != lfirst) {
         *result = std::move(*lfirst);
       }

@@ -175,9 +175,8 @@ KOKKOS_FUNCTION OutputIterator unique_copy_team_impl(
         d_first + count);
   }

-#if defined KOKKOS_COMPILER_INTEL || \
-    (defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
-     !defined(KOKKOS_COMPILER_MSVC))
+#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
+    !defined(KOKKOS_COMPILER_MSVC)
   __builtin_unreachable();
 #endif
 }

@@ -18,6 +18,8 @@ LINK ?= $(CXX)
 LDFLAGS ?=
 override LDFLAGS += -lpthread

+KOKKOS_USE_DEPRECATED_MAKEFILES=1
+
 include $(KOKKOS_PATH)/Makefile.kokkos

 KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files

@ -281,7 +281,7 @@ struct test_random_scalar {
|
||||
double covariance_eps =
|
||||
result.covariance / num_draws / 2 / variance_expect;
|
||||
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
|
||||
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
|
||||
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
|
||||
#endif
|
||||
EXPECT_LT(std::abs(mean_eps), tolerance);
|
||||
EXPECT_LT(std::abs(variance_eps), 1.5 * tolerance);
|
||||
@ -312,7 +312,7 @@ struct test_random_scalar {
|
||||
(result.covariance / HIST_DIM1D - covariance_expect) / mean_expect;
|
||||
|
||||
#if defined(KOKKOS_HALF_T_IS_FLOAT) && !KOKKOS_HALF_T_IS_FLOAT
|
||||
if (std::is_same<Scalar, Kokkos::Experimental::half_t>::value) {
|
||||
if (std::is_same_v<Scalar, Kokkos::Experimental::half_t>) {
|
||||
mean_eps_expect = 0.0003;
|
||||
variance_eps_expect = 1.0;
|
||||
covariance_eps_expect = 5.0e4;
|
||||
@ -320,7 +320,7 @@ struct test_random_scalar {
|
||||
#endif
|
||||
|
||||
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
|
||||
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
|
||||
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
|
||||
#endif
|
||||
EXPECT_LT(std::abs(mean_eps), mean_eps_expect);
|
||||
EXPECT_LT(std::abs(variance_eps), variance_eps_expect);
|
||||
@ -358,13 +358,13 @@ struct test_random_scalar {
|
||||
(result.covariance / HIST_DIM1D - covariance_expect) / mean_expect;
|
||||
|
||||
#if defined(KOKKOS_HALF_T_IS_FLOAT) && !KOKKOS_HALF_T_IS_FLOAT
|
||||
if (std::is_same<Scalar, Kokkos::Experimental::half_t>::value) {
|
||||
if (std::is_same_v<Scalar, Kokkos::Experimental::half_t>) {
|
||||
variance_factor = 7;
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
|
||||
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
|
||||
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
|
||||
#endif
|
||||
EXPECT_LT(std::abs(mean_eps), tolerance);
|
||||
EXPECT_LT(std::abs(variance_eps), variance_factor);
|
||||
|
||||
@ -37,12 +37,18 @@ struct random_access_iterator_test : std_algorithms_test {
|
||||
|
||||
TEST_F(random_access_iterator_test, constructor) {
|
||||
// just tests that constructor works
|
||||
auto it1 = KE::Impl::RandomAccessIterator<static_view_t>(m_static_view);
|
||||
auto it2 = KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view);
|
||||
auto it3 = KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view);
|
||||
auto it4 = KE::Impl::RandomAccessIterator<static_view_t>(m_static_view, 3);
|
||||
auto it5 = KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view, 3);
|
||||
auto it6 = KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view, 3);
|
||||
[[maybe_unused]] auto it1 =
|
||||
KE::Impl::RandomAccessIterator<static_view_t>(m_static_view);
|
||||
[[maybe_unused]] auto it2 =
|
||||
KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view);
|
||||
[[maybe_unused]] auto it3 =
|
||||
KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view);
|
||||
[[maybe_unused]] auto it4 =
|
||||
KE::Impl::RandomAccessIterator<static_view_t>(m_static_view, 3);
|
||||
[[maybe_unused]] auto it5 =
|
||||
KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view, 3);
|
||||
[[maybe_unused]] auto it6 =
|
||||
KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view, 3);
|
||||
EXPECT_TRUE(true);
|
||||
}
|
||||
|
||||
|
||||
@ -99,6 +99,7 @@ void test_dynamic_view_sort_impl(unsigned int n) {
|
||||
Kokkos::Experimental::DynamicView<KeyType*, ExecutionSpace>;
|
||||
using KeyViewType = Kokkos::View<KeyType*, ExecutionSpace>;
|
||||
|
||||
// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
|
||||
const size_t upper_bound = 2 * n;
|
||||
const size_t min_chunk_size = 1024;
|
||||
|
||||
|
||||
@ -198,9 +198,8 @@ auto create_deep_copyable_compatible_view_with_same_extent(ViewType view) {
|
||||
|
||||
// this is needed for intel to avoid
|
||||
// error #1011: missing return statement at end of non-void function
|
||||
#if defined KOKKOS_COMPILER_INTEL || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
|
||||
@ -507,6 +507,20 @@ struct TestStruct {
|
||||
}
|
||||
};
|
||||
|
||||
#ifndef KOKKOS_ENABLE_CXX17
|
||||
template <typename ViewType>
|
||||
constexpr bool
|
||||
test_kokkos_iterator_satify_std_random_access_iterator_concept() {
|
||||
return std::random_access_iterator<
|
||||
Kokkos::Experimental::Impl::RandomAccessIterator<ViewType>>;
|
||||
}
|
||||
|
||||
static_assert(test_kokkos_iterator_satify_std_random_access_iterator_concept<
|
||||
Kokkos::View<int *>>());
|
||||
static_assert(test_kokkos_iterator_satify_std_random_access_iterator_concept<
|
||||
Kokkos::View<const int *>>());
|
||||
#endif
|
||||
|
||||
} // namespace compileonly
|
||||
} // namespace stdalgos
|
||||
} // namespace Test
|
||||
|
||||
@ -173,6 +173,7 @@ TEST(std_algorithms_DeathTest, expect_no_overlap) {
|
||||
|
||||
KE::Impl::expect_no_overlap(sub_first_d0, sub_last_d0, sub_first_d1);
|
||||
|
||||
// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
|
||||
Kokkos::LayoutStride layout2d{2, 3, extent0, 2 * 3};
|
||||
Kokkos::View<value_type**, Kokkos::LayoutStride> strided_view_2d{
|
||||
"std-algo-test-2d-contiguous-view-strided", layout2d};
|
||||
|
||||
@ -171,7 +171,7 @@ struct VerifyData {
|
||||
create_mirror_view_and_copy(Kokkos::HostSpace(), test_view_dc);
|
||||
if (test_view_h.extent(0) > 0) {
|
||||
for (std::size_t i = 0; i < test_view_h.extent(0); ++i) {
|
||||
if (std::is_same<gold_view_value_type, int>::value) {
|
||||
if (std::is_same_v<gold_view_value_type, int>) {
|
||||
ASSERT_EQ(gold_h(i), test_view_h(i));
|
||||
} else {
|
||||
const auto error =
|
||||
|
||||
@ -184,7 +184,7 @@ struct VerifyData {
|
||||
const auto ext = test_view_h.extent(0);
|
||||
if (ext > 0) {
|
||||
for (std::size_t i = 0; i < ext; ++i) {
|
||||
if (std::is_same<gold_view_value_type, int>::value) {
|
||||
if (std::is_same_v<gold_view_value_type, int>) {
|
||||
ASSERT_EQ(gold_h(i), test_view_h(i));
|
||||
} else {
|
||||
const auto error =
|
||||
|
||||
@ -153,12 +153,13 @@ void run_single_scenario(const InfoType& scenario_info) {
|
||||
|
||||
#if !defined KOKKOS_ENABLE_OPENMPTARGET
|
||||
CustomLessThanComparator<ValueType, ValueType> comp;
|
||||
auto r5 =
|
||||
[[maybe_unused]] auto r5 =
|
||||
KE::is_sorted_until(exespace(), KE::cbegin(view), KE::cend(view), comp);
|
||||
auto r6 = KE::is_sorted_until("label", exespace(), KE::cbegin(view),
|
||||
KE::cend(view), comp);
|
||||
auto r7 = KE::is_sorted_until(exespace(), view, comp);
|
||||
auto r8 = KE::is_sorted_until("label", exespace(), view, comp);
|
||||
[[maybe_unused]] auto r6 = KE::is_sorted_until(
|
||||
"label", exespace(), KE::cbegin(view), KE::cend(view), comp);
|
||||
[[maybe_unused]] auto r7 = KE::is_sorted_until(exespace(), view, comp);
|
||||
[[maybe_unused]] auto r8 =
|
||||
KE::is_sorted_until("label", exespace(), view, comp);
|
||||
#endif
|
||||
|
||||
ASSERT_EQ(r1, gold) << name << ", " << view_tag_to_string(Tag{});
|
||||
|
||||
@ -53,13 +53,13 @@ TEST(std_algorithms_mod_ops_test, move) {
|
||||
// move constr
|
||||
MyMovableType b(std::move(a));
|
||||
ASSERT_EQ(b.m_value, 11);
|
||||
ASSERT_EQ(a.m_value, -2);
|
||||
ASSERT_EQ(a.m_value, -2); // NOLINT(bugprone-use-after-move)
|
||||
|
||||
// move assign
|
||||
MyMovableType c;
|
||||
c = std::move(b);
|
||||
ASSERT_EQ(c.m_value, 11);
|
||||
ASSERT_EQ(b.m_value, -4);
|
||||
ASSERT_EQ(b.m_value, -4); // NOLINT(bugprone-use-after-move)
|
||||
}
|
||||
|
||||
template <class ViewType>
|
||||
@ -70,7 +70,7 @@ struct StdAlgoModSeqOpsTestMove {
|
||||
void operator()(const int index) const {
|
||||
typename ViewType::value_type a{11};
|
||||
using move_t = decltype(std::move(a));
|
||||
static_assert(std::is_rvalue_reference<move_t>::value);
|
||||
static_assert(std::is_rvalue_reference_v<move_t>);
|
||||
m_view(index) = std::move(a);
|
||||
}
|
||||
|
||||
|
||||
@ -243,16 +243,15 @@ void run_and_check_transform_reduce_overloadA(ViewType1 first_view,
|
||||
ViewType2 second_view,
|
||||
ValueType init_value,
|
||||
ValueType result_value,
|
||||
Args&&... args) {
|
||||
Args const&... args) {
|
||||
// trivial cases
|
||||
const auto r1 = KE::transform_reduce(
|
||||
ExecutionSpace(), KE::cbegin(first_view), KE::cbegin(first_view),
|
||||
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
|
||||
KE::cbegin(second_view), init_value, args...);
|
||||
|
||||
const auto r2 =
|
||||
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(first_view),
|
||||
KE::cbegin(first_view), KE::cbegin(second_view),
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r2 = KE::transform_reduce(
|
||||
"MYLABEL", ExecutionSpace(), KE::cbegin(first_view),
|
||||
KE::cbegin(first_view), KE::cbegin(second_view), init_value, args...);
|
||||
|
||||
ASSERT_EQ(r1, init_value);
|
||||
ASSERT_EQ(r2, init_value);
|
||||
@ -260,18 +259,16 @@ void run_and_check_transform_reduce_overloadA(ViewType1 first_view,
|
||||
// non trivial cases
|
||||
const auto r3 = KE::transform_reduce(
|
||||
ExecutionSpace(), KE::cbegin(first_view), KE::cend(first_view),
|
||||
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
|
||||
KE::cbegin(second_view), init_value, args...);
|
||||
|
||||
const auto r4 = KE::transform_reduce(
|
||||
"MYLABEL", ExecutionSpace(), KE::cbegin(first_view), KE::cend(first_view),
|
||||
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
|
||||
KE::cbegin(second_view), init_value, args...);
|
||||
|
||||
const auto r5 =
|
||||
KE::transform_reduce(ExecutionSpace(), first_view, second_view,
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r6 =
|
||||
KE::transform_reduce("MYLABEL", ExecutionSpace(), first_view, second_view,
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r5 = KE::transform_reduce(ExecutionSpace(), first_view,
|
||||
second_view, init_value, args...);
|
||||
const auto r6 = KE::transform_reduce("MYLABEL", ExecutionSpace(), first_view,
|
||||
second_view, init_value, args...);
|
||||
|
||||
ASSERT_EQ(r3, result_value);
|
||||
ASSERT_EQ(r4, result_value);
|
||||
@ -363,32 +360,30 @@ template <class ExecutionSpace, class ViewType, class ValueType, class... Args>
|
||||
void run_and_check_transform_reduce_overloadB(ViewType view,
|
||||
ValueType init_value,
|
||||
ValueType result_value,
|
||||
Args&&... args) {
|
||||
Args const&... args) {
|
||||
// trivial
|
||||
const auto r1 =
|
||||
KE::transform_reduce(ExecutionSpace(), KE::cbegin(view), KE::cbegin(view),
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r1 = KE::transform_reduce(ExecutionSpace(), KE::cbegin(view),
|
||||
KE::cbegin(view), init_value, args...);
|
||||
|
||||
const auto r2 = KE::transform_reduce("MYLABEL", ExecutionSpace(),
|
||||
KE::cbegin(view), KE::cbegin(view),
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r2 =
|
||||
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(view),
|
||||
KE::cbegin(view), init_value, args...);
|
||||
|
||||
ASSERT_EQ(r1, init_value);
|
||||
ASSERT_EQ(r2, init_value);
|
||||
|
||||
// non trivial
|
||||
const auto r3 =
|
||||
KE::transform_reduce(ExecutionSpace(), KE::cbegin(view), KE::cend(view),
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r3 = KE::transform_reduce(ExecutionSpace(), KE::cbegin(view),
|
||||
KE::cend(view), init_value, args...);
|
||||
|
||||
const auto r4 = KE::transform_reduce("MYLABEL", ExecutionSpace(),
|
||||
KE::cbegin(view), KE::cend(view),
|
||||
init_value, std::forward<Args>(args)...);
|
||||
const auto r5 = KE::transform_reduce(ExecutionSpace(), view, init_value,
|
||||
std::forward<Args>(args)...);
|
||||
const auto r4 =
|
||||
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(view),
|
||||
KE::cend(view), init_value, args...);
|
||||
const auto r5 =
|
||||
KE::transform_reduce(ExecutionSpace(), view, init_value, args...);
|
||||
|
||||
const auto r6 = KE::transform_reduce("MYLABEL", ExecutionSpace(), view,
|
||||
init_value, std::forward<Args>(args)...);
|
||||
init_value, args...);
|
||||
|
||||
ASSERT_EQ(r3, result_value);
|
||||
ASSERT_EQ(r4, result_value);
|
||||
|
||||
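Note on the `run_and_check_transform_reduce_overload{A,B}` hunks above: each helper passes the same parameter pack to six `KE::transform_reduce` calls. With `Args&&...` and `std::forward`, an rvalue argument may be moved from by the first call and then reused in a moved-from state. Taking `Args const&...` and passing `args...` sidesteps that. The sketch below illustrates the hazard in isolation. `consume`, `call_twice_forward` and `call_twice_const_ref` are hypothetical names, and `std::string` stands in for the functor arguments used by the tests.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical callee that takes its argument by value, so an rvalue
// argument is moved into `s` and the caller's object is left moved-from.
inline std::size_t consume(std::string s) { return s.size(); }

// Forwards the same pack twice: the second call may observe a moved-from
// argument when the caller passed an rvalue.
template <class... Args>
std::pair<std::size_t, std::size_t> call_twice_forward(Args&&... args) {
  const auto a = consume(std::forward<Args>(args)...);
  const auto b = consume(std::forward<Args>(args)...);  // use-after-move risk
  return {a, b};
}

// Takes the pack by const reference and copies on every call, so both
// calls see the same value.
template <class... Args>
std::pair<std::size_t, std::size_t> call_twice_const_ref(Args const&... args) {
  const auto a = consume(args...);
  const auto b = consume(args...);
  return {a, b};
}
```

With an lvalue argument both variants behave the same. With a temporary, only the const-reference variant guarantees that the second call sees the original value. The cost is one copy per call, which is negligible for the small functors used in these tests.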
@ -196,7 +196,7 @@ void run_single_scenario(const InfoType& scenario_info,
|
||||
// create host copy BEFORE rotate or view will be modified
|
||||
auto view_h = create_host_space_copy(view);
|
||||
auto rit = KE::rotate(exespace(), view, rotation_point);
|
||||
// verify_data(rit, view, view_h, rotation_point);
|
||||
verify_data(rit, view, view_h, rotation_point);
|
||||
}
|
||||
|
||||
{
|
||||
|
||||
@ -191,6 +191,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
ASSERT_EQ(stdDistance, distancesView_h(i));
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@ -217,6 +217,7 @@ void test_A(const bool ensureAdjacentFindCanFind, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -244,6 +244,7 @@ void test_A(const bool viewsAreEqual, std::size_t numTeams, std::size_t numCols,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -224,6 +224,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
break;
|
||||
}
|
||||
#endif
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
#undef exclusive_scan
|
||||
|
||||
@ -227,6 +227,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
if (sequencesExist) {
|
||||
|
||||
@ -244,6 +244,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -57,14 +57,7 @@ struct TestFunctorA {
|
||||
const auto myRowIndex = member.league_rank();
|
||||
auto myRowViewFrom = Kokkos::subview(m_dataView, myRowIndex, Kokkos::ALL());
|
||||
const auto val = m_greaterThanValuesView(myRowIndex);
|
||||
// FIXME_INTEL
|
||||
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
|
||||
GreaterEqualFunctor<
|
||||
typename GreaterThanValuesViewType::non_const_value_type>
|
||||
unaryPred{val};
|
||||
#else
|
||||
GreaterEqualFunctor unaryPred{val};
|
||||
#endif
|
||||
ptrdiff_t resultDist = 0;
|
||||
|
||||
switch (m_apiPick) {
|
||||
@ -185,12 +178,7 @@ void test_A(const bool predicatesReturnTrue, std::size_t numTeams,
|
||||
const auto rowFromBegin = KE::cbegin(rowFrom);
|
||||
const auto rowFromEnd = KE::cend(rowFrom);
|
||||
const auto val = greaterEqualValuesView_h(i);
|
||||
// FIXME_INTEL
|
||||
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
|
||||
const GreaterEqualFunctor<ValueType> unaryPred{val};
|
||||
#else
|
||||
const GreaterEqualFunctor unaryPred{val};
|
||||
#endif
|
||||
|
||||
auto it = std::find_if(rowFromBegin, rowFromEnd, unaryPred);
|
||||
|
||||
|
||||
@ -57,14 +57,7 @@ struct TestFunctorA {
|
||||
const auto myRowIndex = member.league_rank();
|
||||
auto myRowViewFrom = Kokkos::subview(m_dataView, myRowIndex, Kokkos::ALL());
|
||||
const auto val = m_greaterThanValuesView(myRowIndex);
|
||||
// FIXME_INTEL
|
||||
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
|
||||
GreaterEqualFunctor<
|
||||
typename GreaterThanValuesViewType::non_const_value_type>
|
||||
unaryPred{val};
|
||||
#else
|
||||
GreaterEqualFunctor unaryPred{val};
|
||||
#endif
|
||||
ptrdiff_t resultDist = 0;
|
||||
|
||||
switch (m_apiPick) {
|
||||
@ -180,12 +173,7 @@ void test_A(const bool predicatesReturnTrue, std::size_t numTeams,
|
||||
const auto rowFromBegin = KE::cbegin(rowFrom);
|
||||
const auto rowFromEnd = KE::cend(rowFrom);
|
||||
const auto val = greaterEqualValuesView_h(i);
|
||||
// FIXME_INTEL
|
||||
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
|
||||
const GreaterEqualFunctor<ValueType> unaryPred{val};
|
||||
#else
|
||||
const GreaterEqualFunctor unaryPred{val};
|
||||
#endif
|
||||
|
||||
auto it = std::find_if_not(rowFromBegin, rowFromEnd, unaryPred);
|
||||
|
||||
|
||||
@ -253,6 +253,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
#undef inclusive_scan
|
||||
|
||||
@ -245,6 +245,7 @@ void test_A(const TestCaseType testCase, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -249,6 +249,7 @@ void test_A(const bool viewsAreEqual, std::size_t numTeams, std::size_t numCols,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -242,6 +242,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
#undef reduce
|
||||
|
||||
@ -243,6 +243,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -258,6 +258,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -203,6 +203,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
ASSERT_EQ(stdDistance, distancesView_h(i));
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
#undef transform_exclusive_scan
|
||||
|
||||
@ -240,6 +240,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
}
|
||||
#undef transform_inclusive_scan
|
||||
|
||||
@ -293,6 +293,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
|
||||
|
||||
break;
|
||||
}
|
||||
default: Kokkos::abort("unreachable");
|
||||
}
|
||||
|
||||
#undef transform_reduce
|
||||
|
||||
@ -344,8 +344,7 @@ TEST(std_algorithms_numeric_ops_test, transform_exclusive_scan_functor) {
|
||||
using view_type = Kokkos::View<int*, exespace>;
|
||||
view_type dummy_view("dummy_view", 0);
|
||||
using unary_op_type =
|
||||
Kokkos::Experimental::Impl::StdNumericScanIdentityReferenceUnaryFunctor<
|
||||
int>;
|
||||
Kokkos::Experimental::Impl::StdNumericScanIdentityReferenceUnaryFunctor;
|
||||
using functor_type =
|
||||
Kokkos::Experimental::Impl::TransformExclusiveScanFunctorWithValueWrapper<
|
||||
exespace, int, int, view_type, view_type, MultiplyFunctor<int>,
|
||||
|
||||
@ -390,8 +390,7 @@ TEST(std_algorithms_numeric_ops_test, transform_inclusive_scan_functor) {
|
||||
int dummy = 0;
|
||||
using view_type = Kokkos::View<int*, exespace>;
|
||||
view_type dummy_view("dummy_view", 0);
|
||||
using unary_op_type =
|
||||
KE::Impl::StdNumericScanIdentityReferenceUnaryFunctor<int>;
|
||||
using unary_op_type = KE::Impl::StdNumericScanIdentityReferenceUnaryFunctor;
|
||||
{
|
||||
using functor_type =
|
||||
KE::Impl::ExeSpaceTransformInclusiveScanNoInitValueFunctor<
|
||||
|
||||
@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
|
||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||
KOKKOS_ARCH = "SNB,Volta70"
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
|
||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||
KOKKOS_ARCH = "SNB,Volta70"
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
|
||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||
KOKKOS_ARCH = "SNB,Volta70"
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -37,7 +37,7 @@
|
||||
|
||||
template <int V>
|
||||
struct TestFunctor {
|
||||
double values[V];
|
||||
double values[V] = {};
|
||||
Kokkos::View<double*> a;
|
||||
int K;
|
||||
TestFunctor(Kokkos::View<double*> a_, int K_) : a(a_), K(K_) {}
|
||||
@ -50,7 +50,7 @@ struct TestFunctor {
|
||||
|
||||
template <int V>
|
||||
struct TestRFunctor {
|
||||
double values[V];
|
||||
double values[V] = {};
|
||||
Kokkos::View<double*> a;
|
||||
int K;
|
||||
TestRFunctor(Kokkos::View<double*> a_, int K_) : a(a_), K(K_) {}
|
||||
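Note on the `TestFunctor` / `TestRFunctor` hunks above: `double values[V] = {};` adds a default member initializer that zero-initializes the array. Because the constructors do not mention `values`, the array was previously left with indeterminate contents when the functor was copied into a kernel. The reduced sketch below shows the effect. `ZeroedFunctor` and `sum_values` are hypothetical names, and the view member is omitted so that the example does not depend on Kokkos.

```cpp
// Hypothetical reduced version of the benchmark functor: the default member
// initializer zero-initializes every element of `values`, even though the
// constructor's initializer list does not mention it.
template <int V>
struct ZeroedFunctor {
  double values[V] = {};
  int K;
  explicit ZeroedFunctor(int K_) : K(K_) {}
};

// Sums the array; with the initializer above the result is always 0.0
// for a freshly constructed functor.
template <int V>
double sum_values(const ZeroedFunctor<V>& f) {
  double sum = 0.0;
  for (int i = 0; i < V; ++i) sum += f.values[i];
  return sum;
}
```

Without the `= {}` initializer, reading `values` from a freshly constructed object would be undefined behaviour.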
@ -247,12 +247,15 @@ int main(int argc, char* argv[]) {
|
||||
// anything that doesn't start with --
|
||||
if (arg.size() < 2 ||
|
||||
(arg.size() >= 2 && arg[0] != '-' && arg[1] != '-')) {
|
||||
// signing off that arg.data() is null terminated
|
||||
// NOLINTBEGIN(bugprone-suspicious-stringview-data-usage)
|
||||
if (i == 1)
|
||||
N = atoi(arg.data());
|
||||
else if (i == 2)
|
||||
M = atoi(arg.data());
|
||||
else if (i == 3)
|
||||
K = atoi(arg.data());
|
||||
// NOLINTEND(bugprone-suspicious-stringview-data-usage)
|
||||
else {
|
||||
Kokkos::abort("unexpected argument!");
|
||||
}
|
||||
|
||||
@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
|
||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||
KOKKOS_ARCH = "SNB,Volta70"
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -120,11 +120,12 @@ int main(int argc, char* argv[]) {
|
||||
// view appropriately for test and should obey first-touch etc Second call to
|
||||
// test is the one we actually care about and time
|
||||
view_type_1d v_1(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_1"),
|
||||
team_range * team_size);
|
||||
static_cast<size_t>(team_range) * team_size);
|
||||
view_type_2d v_2(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_2"),
|
||||
team_range * team_size, thread_range);
|
||||
static_cast<size_t>(team_range) * team_size, thread_range);
|
||||
view_type_3d v_3(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_3"),
|
||||
team_range * team_size, thread_range, vector_range);
|
||||
static_cast<size_t>(team_range) * team_size, thread_range,
|
||||
vector_range);
|
||||
|
||||
double result_computed = 0.0;
|
||||
double result_expect = 0.0;
|
||||
|
||||
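Note on the `static_cast<size_t>(team_range) * team_size` changes above: multiplying two `int` values is done in `int`, and only the already computed result is widened when it is passed as a view extent. Casting one operand first makes the whole product use `size_t` arithmetic, which is what the clang-tidy check `bugprone-implicit-widening-of-multiplication-result` (referenced elsewhere in this commit) asks for. The sketch below assumes non-negative inputs. `extent_product` is a hypothetical helper name.

```cpp
#include <cstddef>

// Widen before multiplying: `team_size` is converted to std::size_t by the
// usual arithmetic conversions, so the product cannot overflow `int`.
// Assumes both arguments are non-negative.
inline std::size_t extent_product(int team_range, int team_size) {
  return static_cast<std::size_t>(team_range) * team_size;
}
```

On a platform with 64-bit `size_t`, `extent_product(100000, 100000)` is well defined, whereas the plain `int` product `100000 * 100000` would overflow.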
@ -367,7 +367,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
|
||||
// parallel_for RangePolicy: range = team_size*team_range
|
||||
if (test_type == 300) {
|
||||
Kokkos::parallel_for(
|
||||
"300 outer for", team_size * team_range,
|
||||
"300 outer for", static_cast<size_t>(team_size) * team_range,
|
||||
KOKKOS_LAMBDA(const int idx) {
|
||||
v1(idx) = idx;
|
||||
// prevent compiler from optimizing away the loop
|
||||
@ -376,14 +376,15 @@ void test_policy(int team_range, int thread_range, int vector_range,
|
||||
// parallel_reduce RangePolicy: range = team_size*team_range
|
||||
if (test_type == 400) {
|
||||
Kokkos::parallel_reduce(
|
||||
"400 outer reduce", team_size * team_range,
|
||||
"400 outer reduce", static_cast<size_t>(team_size) * team_range,
|
||||
KOKKOS_LAMBDA(const int idx, double& val) { val += idx; }, result);
|
||||
result_expect =
|
||||
0.5 * (team_size * team_range) * (team_size * team_range - 1);
|
||||
}
|
||||
// parallel_scan RangePolicy: range = team_size*team_range
|
||||
if (test_type == 500) {
|
||||
Kokkos::parallel_scan("500 outer scan", team_size * team_range,
|
||||
Kokkos::parallel_scan("500 outer scan",
|
||||
static_cast<size_t>(team_size) * team_range,
|
||||
ParallelScanFunctor<ViewType1>(v1)
|
||||
#if 0
|
||||
// This does not compile with pre Cuda 8.0 - see Github Issue #913 for explanation
|
||||
|
||||
@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
|
||||
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||
KOKKOS_ARCH = "SNB,Volta70"
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -1,6 +1,7 @@
|
||||
KOKKOS_DEVICES=Serial
|
||||
KOKKOS_ARCH = ""
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
|
||||
|
||||
|
||||
@ -317,7 +317,7 @@ do
|
||||
# End of Werror handling
|
||||
#Handle unsupported standard flags
|
||||
--std=c++1y|-std=c++1y|--std=gnu++1y|-std=gnu++1y|--std=c++1z|-std=c++1z|--std=gnu++1z|-std=gnu++1z|--std=c++2a|-std=c++2a)
|
||||
fallback_std_flag="-std=c++14"
|
||||
fallback_std_flag="-std=c++17"
|
||||
# this is hopefully just occurring in a downstream project during CMake feature tests
|
||||
# we really have no choice here but to accept the flag and change to an accepted C++ standard
|
||||
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
|
||||
@ -346,35 +346,17 @@ do
|
||||
# NVCC only has C++20 from version 12 on
|
||||
cuda_main_version=$([[ $(${nvcc_compiler} --version) =~ V([0-9]+) ]] && echo ${BASH_REMATCH[1]})
|
||||
if [ ${cuda_main_version} -lt 12 ]; then
|
||||
fallback_std_flag="-std=c++14"
|
||||
fallback_std_flag="-std=c++17"
|
||||
# this is hopefully just occurring in a downstream project during CMake feature tests
|
||||
# we really have no choice here but to accept the flag and change to an accepted C++ standard
|
||||
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++14 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
|
||||
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
|
||||
std_flag=$fallback_std_flag
|
||||
else
|
||||
std_flag=$1
|
||||
fi
|
||||
shared_args="$shared_args $std_flag"
|
||||
;;
|
||||
--std=c++17|-std=c++17)
|
||||
if [ -n "$std_flag" ]; then
|
||||
warn_std_flag
|
||||
shared_args=${shared_args/ $std_flag/}
|
||||
fi
|
||||
# NVCC only has C++17 from version 11 on
|
||||
cuda_main_version=$([[ $(${nvcc_compiler} --version) =~ V([0-9]+) ]] && echo ${BASH_REMATCH[1]})
|
||||
if [ ${cuda_main_version} -lt 11 ]; then
|
||||
fallback_std_flag="-std=c++14"
|
||||
# this is hopefully just occurring in a downstream project during CMake feature tests
|
||||
# we really have no choice here but to accept the flag and change to an accepted C++ standard
|
||||
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++14 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
|
||||
std_flag=$fallback_std_flag
|
||||
else
|
||||
std_flag=$1
|
||||
fi
|
||||
shared_args="$shared_args $std_flag"
|
||||
;;
|
||||
--std=c++11|-std=c++11|--std=c++14|-std=c++14)
|
||||
--std=c++11|-std=c++11|--std=c++14|-std=c++14|--std=c++17|-std=c++17)
|
||||
if [ -n "$std_flag" ]; then
|
||||
warn_std_flag
|
||||
shared_args=${shared_args/ $std_flag/}
|
||||
@ -500,6 +482,20 @@ do
|
||||
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
|
||||
host_linker_args="$host_linker_args ${1:4:${#1}}"
|
||||
;;
|
||||
#Handle host assembler options
|
||||
-Wa,*)
|
||||
#To pass the -Wa options to the host compiler via -Xcompiler it is necessary
|
||||
#to use '\\,' for each comma in the options. As users might already add escapes
|
||||
#to the comma by themselves, the escapes are first removed and then only the
|
||||
#required number of \ are added back.
|
||||
xcompiler_args_wa=$(echo -e "$1" | sed -E 's/\\\+,/,/g' | sed -E 's/,/\\\\\\\,/g')
|
||||
if [ $first_xcompiler_arg -eq 1 ]; then
|
||||
xcompiler_args="$xcompiler_args_wa"
|
||||
first_xcompiler_arg=0
|
||||
else
|
||||
xcompiler_args="$xcompiler_args,$xcompiler_args_wa"
|
||||
fi
|
||||
;;
|
||||
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
|
||||
*.a|*.so|*.o|*.obj)
|
||||
object_files="$object_files $1"
|
||||
|
||||
@ -2,65 +2,71 @@
|
||||
# loaded by include() and find_package() commands except when invoked with
|
||||
# the NO_POLICY_SCOPE option
|
||||
# CMP0057 + NEW -> IN_LIST operator in IF(...)
|
||||
CMAKE_POLICY(SET CMP0057 NEW)
|
||||
cmake_policy(SET CMP0057 NEW)
|
||||
|
||||
# Compute paths
|
||||
@PACKAGE_INIT@
|
||||
|
||||
#Find dependencies
|
||||
INCLUDE(CMakeFindDependencyMacro)
|
||||
include(CMakeFindDependencyMacro)
|
||||
|
||||
#This needs to go above the KokkosTargets in case
|
||||
#the Kokkos targets depend in some way on the TPL imports
|
||||
@KOKKOS_TPL_EXPORTS@
|
||||
|
||||
GET_FILENAME_COMPONENT(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
|
||||
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
|
||||
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
|
||||
UNSET(Kokkos_CMAKE_DIR)
|
||||
get_filename_component(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
|
||||
include("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
|
||||
include("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
|
||||
unset(Kokkos_CMAKE_DIR)
|
||||
|
||||
# check for conflicts
|
||||
IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS AND
|
||||
"separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
|
||||
MESSAGE(STATUS "'launch_compiler' implies global redirection of targets depending on Kokkos to appropriate compiler.")
|
||||
MESSAGE(STATUS "'separable_compilation' implies explicitly defining where redirection occurs via 'kokkos_compilation(PROJECT|TARGET|SOURCE|DIRECTORY ...)'")
|
||||
MESSAGE(FATAL_ERROR "Conflicting COMPONENTS: 'launch_compiler' and 'separable_compilation'")
|
||||
ENDIF()
|
||||
if("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS AND "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
|
||||
message(STATUS "'launch_compiler' implies global redirection of targets depending on Kokkos to appropriate compiler.")
|
||||
message(
|
||||
STATUS
|
||||
"'separable_compilation' implies explicitly defining where redirection occurs via 'kokkos_compilation(PROJECT|TARGET|SOURCE|DIRECTORY ...)'"
|
||||
)
|
||||
message(FATAL_ERROR "Conflicting COMPONENTS: 'launch_compiler' and 'separable_compilation'")
|
||||
endif()
|
||||
|
||||
IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if find_package(Kokkos COMPONENTS launch_compiler) then rely on the
# RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK to always redirect to the
# appropriate compiler for Kokkos
#
if("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if find_package(Kokkos COMPONENTS launch_compiler) then rely on the
# RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK to always redirect to the
# appropriate compiler for Kokkos
#

MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(
GLOBAL
CHECK_CUDA_COMPILES)
message(
STATUS
"kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos"
)
kokkos_compilation(GLOBAL CHECK_CUDA_COMPILES)

ELSEIF(@Kokkos_ENABLE_CUDA@
AND NOT @KOKKOS_COMPILE_LANGUAGE@ STREQUAL CUDA
AND NOT "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if CUDA was enabled, the compilation language was not set to CUDA, and separable compilation was not
# specified, then set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK globally and
# kokkos_launch_compiler will re-direct to the compiler used to compile CUDA code during installation.
# kokkos_launch_compiler will re-direct if ${CMAKE_CXX_COMPILER} and -DKOKKOS_DEPENDENCE is present,
# otherwise, the original command will be executed
#
elseif(@Kokkos_ENABLE_CUDA@ AND NOT @KOKKOS_COMPILE_LANGUAGE@ STREQUAL CUDA AND NOT "separable_compilation" IN_LIST
Kokkos_FIND_COMPONENTS
)
#
# if CUDA was enabled, the compilation language was not set to CUDA, and separable compilation was not
# specified, then set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK globally and
# kokkos_launch_compiler will re-direct to the compiler used to compile CUDA code during installation.
# kokkos_launch_compiler will re-direct if ${CMAKE_CXX_COMPILER} and -DKOKKOS_DEPENDENCE is present,
# otherwise, the original command will be executed
#

# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})
# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})

# if not nvcc_wrapper and Kokkos_LAUNCH_COMPILER was not set to OFF
IF(NOT IS_NVCC AND (NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(GLOBAL)
ENDIF()
# if not nvcc_wrapper and Kokkos_LAUNCH_COMPILER was not set to OFF
if(NOT IS_NVCC AND (NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
message(
STATUS
"kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos"
)
kokkos_compilation(GLOBAL)
endif()

# be mindful of the environment, pollution is bad
UNSET(IS_NVCC)
ENDIF()
# be mindful of the environment, pollution is bad
unset(IS_NVCC)
endif()

set(Kokkos_COMPILE_LANGUAGE @KOKKOS_COMPILE_LANGUAGE@)

@ -1,67 +1,67 @@
SET(Kokkos_DEVICES @KOKKOS_ENABLED_DEVICES@)
SET(Kokkos_OPTIONS @KOKKOS_ENABLED_OPTIONS@)
SET(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@)
SET(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@)
SET(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@")
SET(Kokkos_CXX_COMPILER_ID "@KOKKOS_CXX_COMPILER_ID@")
SET(Kokkos_CXX_COMPILER_VERSION "@KOKKOS_CXX_COMPILER_VERSION@")
SET(Kokkos_CXX_STANDARD @KOKKOS_CXX_STANDARD@)
set(Kokkos_DEVICES @KOKKOS_ENABLED_DEVICES@)
set(Kokkos_OPTIONS @KOKKOS_ENABLED_OPTIONS@)
set(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@)
set(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@)
set(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@")
set(Kokkos_CXX_COMPILER_ID "@KOKKOS_CXX_COMPILER_ID@")
set(Kokkos_CXX_COMPILER_VERSION "@KOKKOS_CXX_COMPILER_VERSION@")
set(Kokkos_CXX_STANDARD @KOKKOS_CXX_STANDARD@)

# Required to be a TriBITS-compliant external package
IF(NOT TARGET Kokkos::all_libs)
if(NOT TARGET Kokkos::all_libs)
# CMake Error at <prefix>/lib/cmake/Kokkos/KokkosConfigCommon.cmake:10 (ADD_LIBRARY):
# ADD_LIBRARY cannot create ALIAS target "Kokkos::all_libs" because target
# "Kokkos::kokkos" is imported but not globally visible.
IF(CMAKE_VERSION VERSION_LESS "3.18")
SET_TARGET_PROPERTIES(Kokkos::kokkos PROPERTIES IMPORTED_GLOBAL ON)
ENDIF()
ADD_LIBRARY(Kokkos::all_libs ALIAS Kokkos::kokkos)
ENDIF()
if(CMAKE_VERSION VERSION_LESS "3.18")
set_target_properties(Kokkos::kokkos PROPERTIES IMPORTED_GLOBAL ON)
endif()
add_library(Kokkos::all_libs ALIAS Kokkos::kokkos)
endif()

# Export Kokkos_ENABLE_<BACKEND> for each backend that was enabled.
# NOTE: "Devices" is a little bit of a misnomer here. These are really
# backends, e.g. Kokkos_ENABLE_OPENMP, Kokkos_ENABLE_CUDA, Kokkos_ENABLE_HIP,
# or Kokkos_ENABLE_SYCL.
FOREACH(DEV ${Kokkos_DEVICES})
SET(Kokkos_ENABLE_${DEV} ON)
ENDFOREACH()
foreach(DEV ${Kokkos_DEVICES})
set(Kokkos_ENABLE_${DEV} ON)
endforeach()
# Export relevant Kokkos_ENABLE_<OPTION> variables, e.g.
# Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE, Kokkos_ENABLE_DEBUG, etc.
FOREACH(OPT ${Kokkos_OPTIONS})
SET(Kokkos_ENABLE_${OPT} ON)
ENDFOREACH()
foreach(OPT ${Kokkos_OPTIONS})
set(Kokkos_ENABLE_${OPT} ON)
endforeach()

IF(Kokkos_ENABLE_CUDA)
SET(Kokkos_CUDA_ARCHITECTURES @KOKKOS_CUDA_ARCHITECTURES@)
ENDIF()
if(Kokkos_ENABLE_CUDA)
set(Kokkos_CUDA_ARCHITECTURES @KOKKOS_CUDA_ARCHITECTURES@)
endif()

IF(Kokkos_ENABLE_HIP)
SET(Kokkos_HIP_ARCHITECTURES @KOKKOS_HIP_ARCHITECTURES@)
ENDIF()
if(Kokkos_ENABLE_HIP)
set(Kokkos_HIP_ARCHITECTURES @KOKKOS_HIP_ARCHITECTURES@)
endif()

IF(NOT Kokkos_FIND_QUIETLY)
MESSAGE(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}")
ENDIF()
if(NOT Kokkos_FIND_QUIETLY)
message(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}")
endif()

IF (Kokkos_ENABLE_CUDA)
if(Kokkos_ENABLE_CUDA)
# If we are building CUDA, we have tricked CMake because we declare a CXX project
# If the default C++ standard for a given compiler matches the requested
# standard, then CMake just omits the -std flag in later versions of CMake
# This breaks CUDA compilation (CUDA compiler can have a different default
# -std than the underlying host compiler by itself). Setting this variable
# forces CMake to always add the -std flag even if it thinks it doesn't need it
SET(CMAKE_CXX_STANDARD_DEFAULT 98 CACHE INTERNAL "" FORCE)
ENDIF()
set(CMAKE_CXX_STANDARD_DEFAULT 98 CACHE INTERNAL "" FORCE)
endif()

SET(KOKKOS_USE_CXX_EXTENSIONS @KOKKOS_USE_CXX_EXTENSIONS@)
IF (NOT DEFINED CMAKE_CXX_EXTENSIONS OR CMAKE_CXX_EXTENSIONS)
IF (NOT KOKKOS_USE_CXX_EXTENSIONS)
MESSAGE(WARNING "The installed Kokkos configuration does not support CXX extensions. Forcing -DCMAKE_CXX_EXTENSIONS=Off")
SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "" FORCE)
ENDIF()
ENDIF()

include(FindPackageHandleStandardArgs)
set(KOKKOS_USE_CXX_EXTENSIONS @KOKKOS_USE_CXX_EXTENSIONS@)
if(NOT DEFINED CMAKE_CXX_EXTENSIONS OR CMAKE_CXX_EXTENSIONS)
if(NOT KOKKOS_USE_CXX_EXTENSIONS)
message(
WARNING "The installed Kokkos configuration does not support CXX extensions. Forcing -DCMAKE_CXX_EXTENSIONS=Off"
)
set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "" FORCE)
endif()
endif()

# This function makes sure that Kokkos was built with the requested backends
# and target architectures and generates a fatal error if it was not.
@ -89,28 +89,23 @@ function(kokkos_check)
endforeach()
set(KOKKOS_CHECK_SUCCESS TRUE)
foreach(arg ${REQUESTED_ARGS})
# Define variables named after the required arguments that are provided by
# the Kokkos install.
set(MISSING_OPTIONS "")
foreach(requested ${KOKKOS_CHECK_${arg}})
set(FOUND_MATCHING_OPTION FALSE)
foreach(provided ${Kokkos_${arg}})
STRING(TOUPPER ${requested} REQUESTED_UC)
STRING(TOUPPER ${provided} PROVIDED_UC)
string(TOUPPER ${requested} REQUESTED_UC)
string(TOUPPER ${provided} PROVIDED_UC)
if(PROVIDED_UC STREQUAL REQUESTED_UC)
string(REPLACE ";" " " ${requested} "${KOKKOS_CHECK_${arg}}")
set(FOUND_MATCHING_OPTION TRUE)
endif()
endforeach()
if(NOT FOUND_MATCHING_OPTION)
list(APPEND MISSING_OPTIONS ${requested})
set(KOKKOS_CHECK_SUCCESS FALSE)
endif()
endforeach()
# Somewhat divert the CMake function below from its original purpose and
# use it to check that there are variables defined for all required
# arguments. Success or failure messages will be displayed but we are
# responsible for signaling failure and skip the build system generation.
if (KOKKOS_CHECK_RETURN_VALUE)
set(Kokkos_${arg}_FIND_QUIETLY ON)
endif()
find_package_handle_standard_args("Kokkos_${arg}" DEFAULT_MSG
${KOKKOS_CHECK_${arg}})
if(NOT Kokkos_${arg}_FOUND)
set(KOKKOS_CHECK_SUCCESS FALSE)
if(NOT KOKKOS_CHECK_SUCCESS AND NOT KOKKOS_CHECK_RETURN_VALUE)
message(STATUS "Could NOT find Kokkos_${arg} (missing: ${MISSING_OPTIONS})")
endif()
endforeach()
if(NOT KOKKOS_CHECK_SUCCESS AND NOT KOKKOS_CHECK_RETURN_VALUE)
@ -122,32 +117,35 @@ endfunction()

# A test to check whether a downstream project set the C++ compiler to NVCC or not
# this is called only when Kokkos was installed with Kokkos_ENABLE_CUDA=ON
FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER)
# Check if the compiler is nvcc (which really means nvcc_wrapper).
EXECUTE_PROCESS(COMMAND ${COMPILER} ${ARGN} --version
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RET)
# something went wrong
IF(RET GREATER 0)
SET(${VAR} false PARENT_SCOPE)
ELSE()
STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
STRING(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
IF(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
SET(${VAR} true PARENT_SCOPE)
ELSE()
SET(${VAR} false PARENT_SCOPE)
ENDIF()
ENDIF()
ENDFUNCTION()
function(kokkos_compiler_is_nvcc VAR COMPILER)
# Check if the compiler is nvcc (which really means nvcc_wrapper).
execute_process(
COMMAND ${COMPILER} ${ARGN} --version
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RET
)
# something went wrong
if(RET GREATER 0)
set(${VAR} false PARENT_SCOPE)
else()
string(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION})
string(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
string(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
if(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
set(${VAR} true PARENT_SCOPE)
else()
set(${VAR} false PARENT_SCOPE)
endif()
endif()
endfunction()

# this function checks whether the current CXX compiler supports building CUDA
FUNCTION(kokkos_cxx_compiler_cuda_test _VAR _COMPILER)
function(kokkos_cxx_compiler_cuda_test _VAR _COMPILER)

FILE(WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
"
file(
WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
"
#include <cuda.h>
#include <cstdlib>

@ -171,34 +169,39 @@ int main()
cudaDeviceSynchronize();
return EXIT_SUCCESS;
}
")
"
)

# save the command for debugging
set(_COMMANDS "${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")

# use execute_process instead of try compile because we want to set custom compiler
execute_process(
COMMAND ${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET ERROR_QUIET
)

if(NOT _RET EQUAL 0)
# save the command for debugging
SET(_COMMANDS "${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
set(_COMMANDS
"${_COMMANDS}\n${_COMPILER} --cuda-gpu-arch=sm_35 ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu"
)
# try the compile test again with clang arguments
execute_process(
COMMAND ${_COMPILER} --cuda-gpu-arch=sm_35 -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET ERROR_QUIET
)
endif()

# use execute_process instead of try compile because we want to set custom compiler
EXECUTE_PROCESS(COMMAND ${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)

IF(NOT _RET EQUAL 0)
# save the command for debugging
SET(_COMMANDS "${_COMMANDS}\n${_COMPILER} --cuda-gpu-arch=sm_35 ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
# try the compile test again with clang arguments
EXECUTE_PROCESS(COMMAND ${_COMPILER} --cuda-gpu-arch=sm_35 -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)
ENDIF()

SET(${_VAR}_COMMANDS "${_COMMANDS}" PARENT_SCOPE)
SET(${_VAR} ${_RET} PARENT_SCOPE)
ENDFUNCTION()
set(${_VAR}_COMMANDS "${_COMMANDS}" PARENT_SCOPE)
set(${_VAR} ${_RET} PARENT_SCOPE)
endfunction()

# this function is provided to easily select which files use the same compiler as Kokkos
# when it was installed (or nvcc_wrapper):
@ -215,94 +218,107 @@ ENDFUNCTION()
#
# Use CHECK_CUDA_COMPILES to run a check when CUDA is enabled
#
FUNCTION(kokkos_compilation)
CMAKE_PARSE_ARGUMENTS(COMP
"GLOBAL;PROJECT;CHECK_CUDA_COMPILES"
"COMPILER"
"DIRECTORY;TARGET;SOURCE;COMMAND_PREFIX"
${ARGN})
function(kokkos_compilation)
cmake_parse_arguments(
COMP "GLOBAL;PROJECT;CHECK_CUDA_COMPILES" "COMPILER" "DIRECTORY;TARGET;SOURCE;COMMAND_PREFIX" ${ARGN}
)

# if built w/o CUDA support, we want to basically make this a no-op
SET(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)
# if built w/o CUDA support, we want to basically make this a no-op
set(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)

if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.17)
set(MAYBE_CURRENT_INSTALLATION_ROOT "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/../../..")
endif()

IF(CMAKE_VERSION VERSION_GREATER_EQUAL 3.17)
SET(MAYBE_CURRENT_INSTALLATION_ROOT "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/../../..")
ENDIF()
# search relative first and then absolute
set(_HINTS "${MAYBE_CURRENT_INSTALLATION_ROOT}" "@CMAKE_INSTALL_PREFIX@")

# search relative first and then absolute
SET(_HINTS "${MAYBE_CURRENT_INSTALLATION_ROOT}" "@CMAKE_INSTALL_PREFIX@")
# find kokkos_launch_compiler
find_program(
Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin
)

# find kokkos_launch_compiler
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
if(NOT Kokkos_COMPILE_LAUNCHER)
message(
FATAL_ERROR
"Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'"
)
endif()

IF(NOT Kokkos_COMPILE_LAUNCHER)
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
ENDIF()
# if COMPILER was not specified, assume Kokkos_CXX_COMPILER
if(NOT COMP_COMPILER)
set(COMP_COMPILER ${Kokkos_CXX_COMPILER})
if(_Kokkos_ENABLE_CUDA AND Kokkos_CXX_COMPILER_ID STREQUAL NVIDIA)
# find nvcc_wrapper
find_program(
Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin
)
# fatal if we can't find nvcc_wrapper
if(NOT Kokkos_NVCC_WRAPPER)
message(
FATAL_ERROR "Kokkos could not find nvcc_wrapper. Please set '-DKokkos_NVCC_WRAPPER=/path/to/nvcc_wrapper'"
)
endif()
set(COMP_COMPILER ${Kokkos_NVCC_WRAPPER})
endif()
endif()

# if COMPILER was not specified, assume Kokkos_CXX_COMPILER
IF(NOT COMP_COMPILER)
SET(COMP_COMPILER ${Kokkos_CXX_COMPILER})
IF(_Kokkos_ENABLE_CUDA AND Kokkos_CXX_COMPILER_ID STREQUAL NVIDIA)
# find nvcc_wrapper
FIND_PROGRAM(Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
# fatal if we can't find nvcc_wrapper
IF(NOT Kokkos_NVCC_WRAPPER)
MESSAGE(FATAL_ERROR "Kokkos could not find nvcc_wrapper. Please set '-DKokkos_NVCC_WRAPPER=/path/to/nvcc_wrapper'")
ENDIF()
SET(COMP_COMPILER ${Kokkos_NVCC_WRAPPER})
ENDIF()
ENDIF()
# check that the original compiler still exists!
if(NOT EXISTS ${COMP_COMPILER})
message(FATAL_ERROR "Kokkos could not find original compiler: '${COMP_COMPILER}'")
endif()

# check that the original compiler still exists!
IF(NOT EXISTS ${COMP_COMPILER})
MESSAGE(FATAL_ERROR "Kokkos could not find original compiler: '${COMP_COMPILER}'")
ENDIF()
# try to ensure that compiling cuda code works!
if(_Kokkos_ENABLE_CUDA AND COMP_CHECK_CUDA_COMPILES)

# try to ensure that compiling cuda code works!
IF(_Kokkos_ENABLE_CUDA AND COMP_CHECK_CUDA_COMPILES)
# this may fail if kokkos_compiler launcher was used during install
kokkos_cxx_compiler_cuda_test(_COMPILES_CUDA ${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER})

# this may fail if kokkos_compiler launcher was used during install
kokkos_cxx_compiler_cuda_test(_COMPILES_CUDA
${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER})
# if above failed, throw an error
if(NOT _COMPILES_CUDA)
message(FATAL_ERROR "kokkos_cxx_compiler_cuda_test failed! Test commands:\n${_COMPILES_CUDA_COMMANDS}")
endif()
endif()

# if above failed, throw an error
IF(NOT _COMPILES_CUDA)
MESSAGE(FATAL_ERROR "kokkos_cxx_compiler_cuda_test failed! Test commands:\n${_COMPILES_CUDA_COMMANDS}")
ENDIF()
ENDIF()
if(COMP_COMMAND_PREFIX)
set(_PREFIX "${COMP_COMMAND_PREFIX}")
string(REPLACE ";" " " _PREFIX "${COMP_COMMAND_PREFIX}")
set(Kokkos_COMPILER_LAUNCHER "${_PREFIX} ${Kokkos_COMPILE_LAUNCHER}")
endif()

IF(COMP_COMMAND_PREFIX)
SET(_PREFIX "${COMP_COMMAND_PREFIX}")
STRING(REPLACE ";" " " _PREFIX "${COMP_COMMAND_PREFIX}")
SET(Kokkos_COMPILER_LAUNCHER "${_PREFIX} ${Kokkos_COMPILE_LAUNCHER}")
ENDIF()

IF(COMP_GLOBAL)
# if global, don't bother setting others
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ELSE()
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. kokkos_compilation(PROJECT) after project(...)
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
UNSET(COMP_${_TYPE})
ENDIF()
# set the properties if defined
IF(COMP_${_TYPE})
# MESSAGE(STATUS "Using ${COMP_COMPILER} :: ${_TYPE} :: ${COMP_${_TYPE}}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ENDIF()
ENDFOREACH()
ENDIF()
ENDFUNCTION()
if(COMP_GLOBAL)
# if global, don't bother setting others
set_property(
GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
else()
foreach(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. kokkos_compilation(PROJECT) after project(...)
if("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
list(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
unset(COMP_${_TYPE})
endif()
# set the properties if defined
if(COMP_${_TYPE})
# MESSAGE(STATUS "Using ${COMP_COMPILER} :: ${_TYPE} :: ${COMP_${_TYPE}}")
set_property(
${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE
"${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
set_property(
${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK
"${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
endif()
endforeach()
endif()
endfunction()

@ -9,7 +9,9 @@
// KOKKOS_VERSION % 100 is the patch level
// KOKKOS_VERSION / 100 % 100 is the minor version
// KOKKOS_VERSION / 10000 is the major version
#define KOKKOS_VERSION @KOKKOS_VERSION@
#cmakedefine KOKKOS_VERSION @KOKKOS_VERSION@
// Not using #cmakedefine below because a "0" FOO version number
// yields /* undef KOKKOS_VERSION_FOO */
#define KOKKOS_VERSION_MAJOR @KOKKOS_VERSION_MAJOR@
#define KOKKOS_VERSION_MINOR @KOKKOS_VERSION_MINOR@
#define KOKKOS_VERSION_PATCH @KOKKOS_VERSION_PATCH@
@ -116,6 +118,7 @@
#cmakedefine KOKKOS_ARCH_AMD_ZEN
#cmakedefine KOKKOS_ARCH_AMD_ZEN2
#cmakedefine KOKKOS_ARCH_AMD_ZEN3
#cmakedefine KOKKOS_ARCH_AMD_ZEN4
#cmakedefine KOKKOS_ARCH_AMD_GFX906
#cmakedefine KOKKOS_ARCH_AMD_GFX908
#cmakedefine KOKKOS_ARCH_AMD_GFX90A

@ -11,9 +11,16 @@ if(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC AND CMAKE_VERSION VERSION_LESS "3.
message(FATAL_ERROR "Using NVHPC as host compiler requires at least CMake 3.20.1")
endif()

set(TPL_CUDA_LIBRARIES "")
if(KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE)
set(TPL_CUDA_LIBRARIES CUDA::cuda_driver)
else()
set(TPL_CUDA_LIBRARIES CUDA::cuda_driver CUDA::cudart)
endif()

if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.17.0")
find_package(CUDAToolkit REQUIRED)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES ${TPL_CUDA_LIBRARIES})
kokkos_export_cmake_tpl(CUDAToolkit REQUIRED)
else()
include(${CMAKE_CURRENT_LIST_DIR}/CudaToolkit.cmake)
@ -33,8 +40,8 @@ else()
endif()

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(TPLCUDA ${DEFAULT_MSG} FOUND_CUDART FOUND_CUDA_DRIVER)
find_package_handle_standard_args(TPLCUDA ${DEFAULT_MSG} FOUND_CUDA_DRIVER FOUND_CUDART)
if(FOUND_CUDA_DRIVER AND FOUND_CUDART)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES ${TPL_CUDA_LIBRARIES})
endif()
endif()

@ -1,15 +0,0 @@
function(kokkos_set_intel_flags full_standard int_standard)
string(TOLOWER ${full_standard} FULL_LC_STANDARD)
string(TOLOWER ${int_standard} INT_LC_STANDARD)
# The following three blocks of code were copied from
# /Modules/Compiler/Intel-CXX.cmake from CMake 3.18.1 and then modified.
if(CMAKE_CXX_SIMULATE_ID STREQUAL MSVC)
set(_std -Qstd)
set(_ext c++)
else()
set(_std -std)
set(_ext gnu++)
endif()
set(KOKKOS_CXX_STANDARD_FLAG "${_std}=c++${FULL_LC_STANDARD}" PARENT_SCOPE)
set(KOKKOS_CXX_INTERMDIATE_STANDARD_FLAG "${_std}=${_ext}${INT_LC_STANDARD}" PARENT_SCOPE)
endfunction()
@ -67,6 +67,7 @@ declare_and_check_host_arch(POWER9 "IBM POWER9 CPUs")
declare_and_check_host_arch(ZEN "AMD Zen architecture")
declare_and_check_host_arch(ZEN2 "AMD Zen2 architecture")
declare_and_check_host_arch(ZEN3 "AMD Zen3 architecture")
declare_and_check_host_arch(ZEN4 "AMD Zen4 architecture")
declare_and_check_host_arch(RISCV_SG2042 "SG2042 (RISC-V) CPUs")
declare_and_check_host_arch(RISCV_RVA22V "RVA22V (RISC-V) CPUs")

@ -163,16 +164,11 @@ if(KOKKOS_ENABLE_COMPILER_WARNINGS)
endif()
endif()

# ICPC doesn't support -Wsuggest-override
if(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
list(REMOVE_ITEM COMMON_WARNINGS "-Wsuggest-override")
endif()

if(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
list(APPEND COMMON_WARNINGS "-Wimplicit-fallthrough")
endif()

set(GNU_WARNINGS "-Wempty-body" "-Wclobbered" "-Wignored-qualifiers" ${COMMON_WARNINGS})
set(GNU_WARNINGS "-Wempty-body" "-Wignored-qualifiers" ${COMMON_WARNINGS})
if(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
list(APPEND GNU_WARNINGS "-Wimplicit-fallthrough")
endif()
@ -349,12 +345,27 @@ endif()

if(KOKKOS_ARCH_ARMV9_GRACE)
set(KOKKOS_ARCH_ARM_NEON ON)
check_cxx_compiler_flag("-mcpu=neoverse-n2" COMPILER_SUPPORTS_NEOVERSE_N2)
check_cxx_compiler_flag("-msve-vector-bits=128" COMPILER_SUPPORTS_SVE_VECTOR_BITS)
if(COMPILER_SUPPORTS_NEOVERSE_N2 AND COMPILER_SUPPORTS_SVE_VECTOR_BITS)
compiler_specific_flags(COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID DEFAULT -mcpu=neoverse-n2 -msve-vector-bits=128)
if(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC)
check_cxx_compiler_flag("-tp=grace" COMPILER_SUPPORTS_GRACE_AS_TARGET_PROCESSOR)
else()
message(WARNING "Compiler does not support ARMv9 Grace architecture")
check_cxx_compiler_flag("-mcpu=neoverse-n2" COMPILER_SUPPORTS_NEOVERSE_N2)
check_cxx_compiler_flag("-msve-vector-bits=128" COMPILER_SUPPORTS_SVE_VECTOR_BITS)
endif()
if(COMPILER_SUPPORTS_NEOVERSE_N2 AND COMPILER_SUPPORTS_SVE_VECTOR_BITS OR COMPILER_SUPPORTS_GRACE_AS_TARGET_PROCESSOR)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
NVHPC
-tp=grace
DEFAULT
-mcpu=neoverse-n2
-msve-vector-bits=128
)
else()
message(SEND_ERROR "Your compiler does not appear to support the ARMv9 Grace architecture.
Please ensure you are using a compatible compiler and toolchain.
Alternatively, try configuring with -DKokkos_ARCH_NATIVE=ON to use the native architecture of your system."
)
endif()
endif()

@ -362,8 +373,6 @@ if(KOKKOS_ARCH_ZEN)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
@ -380,8 +389,6 @@ if(KOKKOS_ARCH_ZEN2)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
@ -398,12 +405,10 @@ if(KOKKOS_ARCH_ZEN3)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
-tp=zen2
-tp=zen3
DEFAULT
-march=znver3
-mtune=znver3
@ -412,6 +417,22 @@ if(KOKKOS_ARCH_ZEN3)
set(KOKKOS_ARCH_AVX2 ON)
endif()

if(KOKKOS_ARCH_ZEN4)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
MSVC
/arch:AVX512
NVHPC
-tp=zen4
DEFAULT
-march=znver4
-mtune=znver4
)
set(KOKKOS_ARCH_AMD_ZEN4 ON)
set(KOKKOS_ARCH_AVX512XEON ON)
endif()

if(KOKKOS_ARCH_SNB OR KOKKOS_ARCH_AMDAVX)
set(KOKKOS_ARCH_AVX ON)
compiler_specific_flags(
@ -419,8 +440,6 @@ if(KOKKOS_ARCH_SNB OR KOKKOS_ARCH_AMDAVX)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-mavx
MSVC
/arch:AVX
NVHPC
@ -437,8 +456,6 @@ if(KOKKOS_ARCH_HSW)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX2
MSVC
/arch:AVX2
NVHPC
@ -477,8 +494,6 @@ if(KOKKOS_ARCH_BDW)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX2
MSVC
/arch:AVX2
NVHPC
@ -498,8 +513,6 @@ if(KOKKOS_ARCH_KNL)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xMIC-AVX512
MSVC
/arch:AVX512
NVHPC
@ -520,8 +533,6 @@ if(KOKKOS_ARCH_SKL)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xSKYLAKE
MSVC
/arch:AVX2
NVHPC
@ -539,8 +550,6 @@ if(KOKKOS_ARCH_SKX)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX512
MSVC
/arch:AVX512
NVHPC
@ -1193,9 +1202,8 @@ if(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED AND NOT KOKKOS_IMPL_A
)
else()
execute_process(COMMAND ${ROCM_ENUMERATOR} OUTPUT_VARIABLE GPU_ARCHS)
string(LENGTH "${GPU_ARCHS}" len_str)
# enumerator always output gfx000 as the first line
if(${len_str} LESS 8)
# Exits early if no GPU was detected
if("${GPU_ARCHS}" STREQUAL "")
message(SEND_ERROR "HIP enabled but no AMD GPU architecture could be automatically detected. "
"Please manually specify one AMD GPU architecture via -DKokkos_ARCH_{..}=ON."
)

@ -163,7 +163,6 @@ if(CMAKE_CXX_STANDARD EQUAL 17)
set(KOKKOS_CLANG_CUDA_MINIMUM 10.0.0)
set(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
set(KOKKOS_GCC_MINIMUM 8.2.0)
set(KOKKOS_INTEL_MINIMUM 19.0.5)
set(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2021.1.1)
set(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
set(KOKKOS_NVCC_MINIMUM 11.0.0)
@ -175,7 +174,6 @@ else()
set(KOKKOS_CLANG_CUDA_MINIMUM 14.0.0)
set(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
set(KOKKOS_GCC_MINIMUM 10.1.0)
set(KOKKOS_INTEL_MINIMUM "not supported")
set(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2022.0.0)
set(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
set(KOKKOS_NVCC_MINIMUM 12.0.0)
@ -191,7 +189,7 @@ set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CPU) ${KOKKO
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CUDA) ${KOKKOS_CLANG_CUDA_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(OpenMPTarget) ${KOKKOS_CLANG_OPENMPTARGET_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC ${KOKKOS_GCC_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel ${KOKKOS_INTEL_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel not supported")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(CPU) ${KOKKOS_INTEL_LLVM_CPU_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(SYCL) ${KOKKOS_INTEL_LLVM_SYCL_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC ${KOKKOS_NVCC_MINIMUM}")
@ -214,9 +212,7 @@ elseif(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
if((NOT CMAKE_CXX_STANDARD EQUAL 17) OR (KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_MINIMUM}))
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND NOT Kokkos_ENABLE_SYCL)
if(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_LLVM_CPU_MINIMUM})
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")

@ -76,7 +76,7 @@ kokkos_enable_option(
HIP_MULTIPLE_KERNEL_INSTANTIATIONS OFF
"Whether multiple kernels are instantiated at compile time - improve performance but increase compile time"
|
||||
)
|
||||
kokkos_enable_option(IMPL_HIP_MALLOC_ASYNC OFF "Whether to enable hipMallocAsync")
|
||||
kokkos_enable_option(IMPL_HIP_MALLOC_ASYNC ${KOKKOS_ENABLE_HIP} "Whether to enable hipMallocAsync")
|
||||
kokkos_enable_option(OPENACC_FORCE_HOST_AS_DEVICE OFF "Whether to force to use host as a target device for OpenACC")
|
||||
|
||||
# This option will go away eventually, but allows fallback to old implementation when needed.
|
||||
|
||||
@ -799,7 +799,6 @@ function(COMPILER_SPECIFIC_OPTIONS_HELPER)
|
||||
NVHPC
|
||||
DEFAULT
|
||||
Cray
|
||||
Intel
|
||||
Clang
|
||||
AppleClang
|
||||
IntelLLVM
|
||||
|
||||
@ -155,9 +155,6 @@ if(NOT KOKKOS_CXX_STANDARD_FEATURE)
|
||||
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
|
||||
include(${KOKKOS_SRC_PATH}/cmake/pgi.cmake)
|
||||
kokkos_set_pgi_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})
|
||||
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
|
||||
include(${KOKKOS_SRC_PATH}/cmake/intel.cmake)
|
||||
kokkos_set_intel_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})
|
||||
elseif((KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC") OR ((KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA") AND WIN32))
|
||||
include(${KOKKOS_SRC_PATH}/cmake/msvc.cmake)
|
||||
kokkos_set_msvc_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})
|
||||
|
||||
@ -106,7 +106,6 @@ function(KOKKOS_ADD_EXECUTABLE_AND_TEST ROOT_NAME)
|
||||
OR Kokkos_ENABLE_SYCL
|
||||
OR Kokkos_ENABLE_HPX
|
||||
OR Kokkos_ENABLE_IMPL_SKIP_NO_RTTI_FLAG
|
||||
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "Intel" AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 2021.2.0)
|
||||
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA" AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 11.3.0)
|
||||
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA" AND KOKKOS_CXX_HOST_COMPILER_ID STREQUAL "MSVC"))
|
||||
)
|
||||
|
||||
@ -18,6 +18,8 @@ LINK ?= $(CXX)
|
||||
LDFLAGS ?=
|
||||
override LDFLAGS += -lpthread
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||
|
||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
|
||||
|
||||
@ -22,6 +22,7 @@
|
||||
#endif
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <Kokkos_BitManipulation.hpp>
|
||||
#include <Kokkos_Functional.hpp>
|
||||
|
||||
#include <impl/Kokkos_Bitset_impl.hpp>
|
||||
@ -62,13 +63,11 @@ class Bitset {
|
||||
BIT_SCAN_REVERSE | MOVE_HINT_BACKWARD;
|
||||
|
||||
private:
|
||||
enum : unsigned {
|
||||
block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT)
|
||||
};
|
||||
enum : unsigned { block_mask = block_size - 1u };
|
||||
enum : unsigned {
|
||||
block_shift = Kokkos::Impl::integral_power_of_two(block_size)
|
||||
};
|
||||
static constexpr unsigned block_size = sizeof(unsigned) * CHAR_BIT;
|
||||
static constexpr unsigned block_mask = block_size - 1u;
|
||||
static constexpr unsigned block_shift =
|
||||
Kokkos::has_single_bit(block_size) ? Kokkos::bit_width(block_size) - 1
|
||||
: ~0u;
|
||||
|
||||
//! Type of @ref m_blocks.
|
||||
using block_view_type = View<unsigned*, Device, MemoryTraits<RandomAccess>>;
|
||||
@ -135,9 +134,9 @@ class Bitset {
|
||||
|
||||
if (m_last_block_mask) {
|
||||
// clear the unused bits in the last block
|
||||
Kokkos::Impl::DeepCopy<typename Device::memory_space, Kokkos::HostSpace>(
|
||||
m_blocks.data() + (m_blocks.extent(0) - 1u), &m_last_block_mask,
|
||||
sizeof(unsigned));
|
||||
auto last_block = Kokkos::subview(m_blocks, m_blocks.extent(0) - 1u);
|
||||
Kokkos::deep_copy(typename Device::execution_space{}, last_block,
|
||||
m_last_block_mask);
|
||||
Kokkos::fence(
|
||||
"Bitset::set: fence after clearing unused bits copying from "
|
||||
"HostSpace");
|
||||
@ -324,9 +323,11 @@ class ConstBitset {
|
||||
using block_view_type = typename Bitset<Device>::block_view_type::const_type;
|
||||
|
||||
private:
|
||||
enum { block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT) };
|
||||
enum { block_mask = block_size - 1u };
|
||||
enum { block_shift = Kokkos::Impl::integral_power_of_two(block_size) };
|
||||
static constexpr unsigned block_size = sizeof(unsigned) * CHAR_BIT;
|
||||
static constexpr unsigned block_mask = block_size - 1u;
|
||||
static constexpr unsigned block_shift =
|
||||
Kokkos::has_single_bit(block_size) ? Kokkos::bit_width(block_size) - 1
|
||||
: ~0u;
|
||||
|
||||
public:
|
||||
KOKKOS_FUNCTION
|
||||
@ -400,13 +401,7 @@ void deep_copy(Bitset<DstDevice>& dst, Bitset<SrcDevice> const& src) {
|
||||
Kokkos::Impl::throw_runtime_exception(
|
||||
"Error: Cannot deep_copy bitsets of different sizes!");
|
||||
}
|
||||
|
||||
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
|
||||
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
|
||||
typename SrcDevice::memory_space>(
|
||||
dst.m_blocks.data(), src.m_blocks.data(),
|
||||
sizeof(unsigned) * src.m_blocks.extent(0));
|
||||
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
|
||||
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
|
||||
}
|
||||
|
||||
template <typename DstDevice, typename SrcDevice>
|
||||
@ -415,13 +410,7 @@ void deep_copy(Bitset<DstDevice>& dst, ConstBitset<SrcDevice> const& src) {
|
||||
Kokkos::Impl::throw_runtime_exception(
|
||||
"Error: Cannot deep_copy bitsets of different sizes!");
|
||||
}
|
||||
|
||||
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
|
||||
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
|
||||
typename SrcDevice::memory_space>(
|
||||
dst.m_blocks.data(), src.m_blocks.data(),
|
||||
sizeof(unsigned) * src.m_blocks.extent(0));
|
||||
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
|
||||
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
|
||||
}
|
||||
|
||||
template <typename DstDevice, typename SrcDevice>
|
||||
@ -430,13 +419,7 @@ void deep_copy(ConstBitset<DstDevice>& dst, ConstBitset<SrcDevice> const& src) {
|
||||
Kokkos::Impl::throw_runtime_exception(
|
||||
"Error: Cannot deep_copy bitsets of different sizes!");
|
||||
}
|
||||
|
||||
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
|
||||
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
|
||||
typename SrcDevice::memory_space>(
|
||||
dst.m_blocks.data(), src.m_blocks.data(),
|
||||
sizeof(unsigned) * src.m_blocks.extent(0));
|
||||
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
|
||||
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
|
||||
}
|
||||
|
||||
} // namespace Kokkos
|
||||
|
||||
@ -211,6 +211,12 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
public:
|
||||
//@}
|
||||
|
||||
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
|
||||
public:
|
||||
#else
|
||||
private:
|
||||
#endif
|
||||
|
||||
// Moved this specifically after modified_flags to resolve an alignment issue
|
||||
// on MSVC/NVCC
|
||||
//! \name The two View instances.
|
||||
@ -219,6 +225,7 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
t_host h_view;
|
||||
//@}
|
||||
|
||||
public:
|
||||
//! \name Constructors
|
||||
//@{
|
||||
|
||||
@ -456,16 +463,21 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
}
|
||||
}
|
||||
}
|
||||
#ifdef KOKKOS_COMPILER_INTEL
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
t_host view_host() const { return h_view; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
t_dev view_device() const { return d_view; }
|
||||
#else
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
const t_host& view_host() const { return h_view; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
const t_dev& view_device() const { return d_view; }
|
||||
#endif
|
||||
|
||||
KOKKOS_INLINE_FUNCTION constexpr bool is_allocated() const {
|
||||
return (d_view.is_allocated() && h_view.is_allocated());
|
||||
@ -615,8 +627,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
impl_report_host_sync();
|
||||
}
|
||||
}
|
||||
if constexpr (std::is_same<typename t_host::memory_space,
|
||||
typename t_dev::memory_space>::value) {
|
||||
if constexpr (std::is_same_v<typename t_host::memory_space,
|
||||
typename t_dev::memory_space>) {
|
||||
typename t_dev::execution_space().fence(
|
||||
"Kokkos::DualView<>::sync: fence after syncing DualView");
|
||||
typename t_host::execution_space().fence(
|
||||
@ -687,8 +699,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
// deliberately passing args by cref as they're used multiple times
|
||||
template <typename... Args>
|
||||
void sync_host_impl(Args const&... args) {
|
||||
if (!std::is_same<typename traits::data_type,
|
||||
typename traits::non_const_data_type>::value)
|
||||
if (!std::is_same_v<typename traits::data_type,
|
||||
typename traits::non_const_data_type>)
|
||||
Impl::throw_runtime_exception(
|
||||
"Calling sync_host on a DualView with a const datatype.");
|
||||
if (modified_flags.data() == nullptr) return;
|
||||
@ -718,8 +730,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
|
||||
// deliberately passing args by cref as they're used multiple times
|
||||
template <typename... Args>
|
||||
void sync_device_impl(Args const&... args) {
|
||||
if (!std::is_same<typename traits::data_type,
|
||||
typename traits::non_const_data_type>::value)
|
||||
if (!std::is_same_v<typename traits::data_type,
|
||||
typename traits::non_const_data_type>)
|
||||
Impl::throw_runtime_exception(
|
||||
"Calling sync_device on a DualView with a const datatype.");
|
||||
if (modified_flags.data() == nullptr) return;
|
||||
@ -1264,10 +1276,10 @@ namespace Kokkos {
|
||||
template <class DT, class... DP, class ST, class... SP>
|
||||
void deep_copy(DualView<DT, DP...>& dst, const DualView<ST, SP...>& src) {
|
||||
if (src.need_sync_device()) {
|
||||
deep_copy(dst.h_view, src.h_view);
|
||||
deep_copy(dst.view_host(), src.view_host());
|
||||
dst.modify_host();
|
||||
} else {
|
||||
deep_copy(dst.d_view, src.d_view);
|
||||
deep_copy(dst.view_device(), src.view_device());
|
||||
dst.modify_device();
|
||||
}
|
||||
}
|
||||
@ -1276,10 +1288,10 @@ template <class ExecutionSpace, class DT, class... DP, class ST, class... SP>
|
||||
void deep_copy(const ExecutionSpace& exec, DualView<DT, DP...>& dst,
|
||||
const DualView<ST, SP...>& src) {
|
||||
if (src.need_sync_device()) {
|
||||
deep_copy(exec, dst.h_view, src.h_view);
|
||||
deep_copy(exec, dst.view_host(), src.view_host());
|
||||
dst.modify_host();
|
||||
} else {
|
||||
deep_copy(exec, dst.d_view, src.d_view);
|
||||
deep_copy(exec, dst.view_device(), src.view_device());
|
||||
dst.modify_device();
|
||||
}
|
||||
}
|
||||
|
||||
@ -626,9 +626,8 @@ class DynRankView : private View<DataType*******, Properties...> {
|
||||
} else
|
||||
#endif
|
||||
return view_type::operator()(i0, 0, 0, 0, 0, 0, 0);
|
||||
#if defined KOKKOS_COMPILER_INTEL || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -656,9 +655,8 @@ class DynRankView : private View<DataType*******, Properties...> {
|
||||
} else
|
||||
#endif
|
||||
return view_type::operator()(i0, i1, 0, 0, 0, 0, 0);
|
||||
#if defined KOKKOS_COMPILER_INTEL || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -690,9 +688,8 @@ class DynRankView : private View<DataType*******, Properties...> {
|
||||
} else
|
||||
#endif
|
||||
return view_type::operator()(i0, i1, i2, 0, 0, 0, 0);
|
||||
#if defined KOKKOS_COMPILER_INTEL || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -1124,57 +1121,6 @@ KOKKOS_INLINE_FUNCTION bool operator!=(const DynRankView<LT, LP...>& lhs,
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
|
||||
template <class OutputView, class Enable = void>
|
||||
struct DynRankViewFill {
|
||||
using const_value_type = typename OutputView::traits::const_value_type;
|
||||
|
||||
const OutputView output;
|
||||
const_value_type input;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator()(const size_t i0) const {
|
||||
const size_t n1 = output.extent(1);
|
||||
const size_t n2 = output.extent(2);
|
||||
const size_t n3 = output.extent(3);
|
||||
const size_t n4 = output.extent(4);
|
||||
const size_t n5 = output.extent(5);
|
||||
const size_t n6 = output.extent(6);
|
||||
|
||||
for (size_t i1 = 0; i1 < n1; ++i1) {
|
||||
for (size_t i2 = 0; i2 < n2; ++i2) {
|
||||
for (size_t i3 = 0; i3 < n3; ++i3) {
|
||||
for (size_t i4 = 0; i4 < n4; ++i4) {
|
||||
for (size_t i5 = 0; i5 < n5; ++i5) {
|
||||
for (size_t i6 = 0; i6 < n6; ++i6) {
|
||||
output.access(i0, i1, i2, i3, i4, i5, i6) = input;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
DynRankViewFill(const OutputView& arg_out, const_value_type& arg_in)
|
||||
: output(arg_out), input(arg_in) {
|
||||
using execution_space = typename OutputView::execution_space;
|
||||
using Policy = Kokkos::RangePolicy<execution_space>;
|
||||
|
||||
Kokkos::parallel_for("Kokkos::DynRankViewFill", Policy(0, output.extent(0)),
|
||||
*this);
|
||||
}
|
||||
};
|
||||
|
||||
template <class OutputView>
|
||||
struct DynRankViewFill<OutputView, std::enable_if_t<OutputView::rank == 0>> {
|
||||
DynRankViewFill(const OutputView& dst,
|
||||
const typename OutputView::const_value_type& src) {
|
||||
Kokkos::Impl::DeepCopy<typename OutputView::memory_space,
|
||||
Kokkos::HostSpace>(
|
||||
dst.data(), &src, sizeof(typename OutputView::const_value_type));
|
||||
}
|
||||
};
|
||||
|
||||
template <class OutputView, class InputView,
|
||||
class ExecSpace = typename OutputView::execution_space>
|
||||
struct DynRankViewRemap {
|
||||
@ -1521,9 +1467,8 @@ inline auto create_mirror(const DynRankView<T, P...>& src,
|
||||
return dst_type(prop_copy,
|
||||
Impl::reconstructLayout(src.layout(), src.rank()));
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -1611,9 +1556,8 @@ inline auto create_mirror_view(
|
||||
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
|
||||
}
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -1754,6 +1698,7 @@ inline void impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
|
||||
Kokkos::Impl::DynRankViewRemap<drview_type, drview_type>(
|
||||
Impl::get_property<Impl::ExecutionSpaceTag>(prop_copy), v_resized, v);
|
||||
else {
|
||||
// NOLINTNEXTLINE(bugprone-unused-raii)
|
||||
Kokkos::Impl::DynRankViewRemap<drview_type, drview_type>(v_resized, v);
|
||||
Kokkos::fence("Kokkos::resize(DynRankView)");
|
||||
}
|
||||
|
||||
@ -155,6 +155,7 @@ struct ChunkedArrayManager {
|
||||
}
|
||||
// Destroy the linked allocation if we have one.
|
||||
if (m_linked != nullptr) {
|
||||
// NOLINTNEXTLINE(bugprone-multi-level-implicit-pointer-conversion)
|
||||
Space().deallocate(m_label.c_str(), m_linked,
|
||||
(sizeof(value_type*) * (m_chunk_max + 2)));
|
||||
}
|
||||
@ -195,11 +196,13 @@ struct ChunkedArrayManager {
|
||||
void deep_copy_to(
|
||||
const ExecutionSpace& exec_space,
|
||||
ChunkedArrayManager<OtherMemorySpace, ValueType> const& other) const {
|
||||
if (other.m_chunks != m_chunks) {
|
||||
Kokkos::Impl::DeepCopy<OtherMemorySpace, MemorySpace, ExecutionSpace>(
|
||||
exec_space, other.m_chunks, m_chunks,
|
||||
sizeof(pointer_type) * (m_chunk_max + 2));
|
||||
}
|
||||
// use of ad-hoc unmanaged views
|
||||
Kokkos::deep_copy(
|
||||
exec_space,
|
||||
Kokkos::View<uintptr_t*, OtherMemorySpace>(
|
||||
reinterpret_cast<uintptr_t*>(other.m_chunks), m_chunk_max + 2),
|
||||
Kokkos::View<uintptr_t*, MemorySpace>(
|
||||
reinterpret_cast<uintptr_t*>(m_chunks), m_chunk_max + 2));
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
@ -621,9 +624,8 @@ inline auto create_mirror(const Kokkos::Experimental::DynamicView<T, P...>& src,
|
||||
|
||||
return ret;
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -718,9 +720,8 @@ inline auto create_mirror_view(
|
||||
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
|
||||
}
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -789,9 +790,9 @@ inline void deep_copy(const Kokkos::Experimental::DynamicView<T, DP...>& dst,
|
||||
dst_memory_space>::accessible;
|
||||
|
||||
if (DstExecCanAccessSrc)
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type, dst_execution_space>(dst, src);
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
|
||||
else if (SrcExecCanAccessDst)
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type, src_execution_space>(dst, src);
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
|
||||
else
|
||||
src.impl_get_chunks().deep_copy_to(dst_execution_space{},
|
||||
dst.impl_get_chunks());
|
||||
@ -819,9 +820,9 @@ inline void deep_copy(const ExecutionSpace& exec,
|
||||
|
||||
// FIXME use execution space
|
||||
if (DstExecCanAccessSrc)
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type, dst_execution_space>(dst, src);
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
|
||||
else if (SrcExecCanAccessDst)
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type, src_execution_space>(dst, src);
|
||||
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
|
||||
else
|
||||
src.impl_get_chunks().deep_copy_to(exec, dst.impl_get_chunks());
|
||||
}
|
||||
@ -873,7 +874,7 @@ inline void deep_copy(const Kokkos::Experimental::DynamicView<T, DP...>& dst,
|
||||
namespace Impl {
|
||||
template <class Arg0, class... DP, class... SP>
|
||||
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>,
|
||||
Kokkos::Experimental::DynamicView<SP...>, 1, Arg0> {
|
||||
Kokkos::Experimental::DynamicView<SP...>, Arg0> {
|
||||
using DstType = Kokkos::Experimental::DynamicView<DP...>;
|
||||
using SrcType = Kokkos::Experimental::DynamicView<SP...>;
|
||||
using dst_subview_type = DstType;
|
||||
@ -885,8 +886,7 @@ struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>,
|
||||
};
|
||||
|
||||
template <class... DP, class SrcType, class Arg0>
|
||||
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, 1,
|
||||
Arg0> {
|
||||
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, Arg0> {
|
||||
using DstType = Kokkos::Experimental::DynamicView<DP...>;
|
||||
using dst_subview_type = DstType;
|
||||
using src_subview_type = typename Kokkos::Subview<SrcType, Arg0>;
|
||||
@ -897,8 +897,7 @@ struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, 1,
|
||||
};
|
||||
|
||||
template <class DstType, class... SP, class Arg0>
|
||||
struct CommonSubview<DstType, Kokkos::Experimental::DynamicView<SP...>, 1,
|
||||
Arg0> {
|
||||
struct CommonSubview<DstType, Kokkos::Experimental::DynamicView<SP...>, Arg0> {
|
||||
using SrcType = Kokkos::Experimental::DynamicView<SP...>;
|
||||
using dst_subview_type = typename Kokkos::Subview<DstType, Arg0>;
|
||||
using src_subview_type = SrcType;
|
||||
|
||||
@ -43,7 +43,7 @@ class ErrorReporter {
|
||||
clear();
|
||||
}
|
||||
|
||||
int getCapacity() const { return m_reports.h_view.extent(0); }
|
||||
int getCapacity() const { return m_reports.view_host().extent(0); }
|
||||
|
||||
int getNumReports();
|
||||
|
||||
@ -69,9 +69,10 @@ class ErrorReporter {
|
||||
bool add_report(int reporter_id, report_type report) const {
|
||||
int idx = Kokkos::atomic_fetch_add(&m_numReportsAttempted(), 1);
|
||||
|
||||
if (idx >= 0 && (idx < static_cast<int>(m_reports.d_view.extent(0)))) {
|
||||
m_reporters.d_view(idx) = reporter_id;
|
||||
m_reports.d_view(idx) = report;
|
||||
if (idx >= 0 &&
|
||||
(idx < static_cast<int>(m_reports.view_device().extent(0)))) {
|
||||
m_reporters.view_device()(idx) = reporter_id;
|
||||
m_reports.view_device()(idx) = report;
|
||||
return true;
|
||||
} else {
|
||||
return false;
|
||||
@ -92,8 +93,8 @@ template <typename ReportType, typename DeviceType>
|
||||
inline int ErrorReporter<ReportType, DeviceType>::getNumReports() {
|
||||
int num_reports = 0;
|
||||
Kokkos::deep_copy(num_reports, m_numReportsAttempted);
|
||||
if (num_reports > static_cast<int>(m_reports.h_view.extent(0))) {
|
||||
num_reports = m_reports.h_view.extent(0);
|
||||
if (num_reports > static_cast<int>(m_reports.view_host().extent(0))) {
|
||||
num_reports = m_reports.view_host().extent(0);
|
||||
}
|
||||
return num_reports;
|
||||
}
|
||||
@ -119,8 +120,8 @@ void ErrorReporter<ReportType, DeviceType>::getReports(
|
||||
m_reporters.template sync<host_mirror_space>();
|
||||
|
||||
for (int i = 0; i < num_reports; ++i) {
|
||||
reporters_out.push_back(m_reporters.h_view(i));
|
||||
reports_out.push_back(m_reports.h_view(i));
|
||||
reporters_out.push_back(m_reporters.view_host()(i));
|
||||
reports_out.push_back(m_reports.view_host()(i));
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -143,8 +144,8 @@ void ErrorReporter<ReportType, DeviceType>::getReports(
|
||||
m_reporters.template sync<host_mirror_space>();
|
||||
|
||||
for (int i = 0; i < num_reports; ++i) {
|
||||
reporters_out(i) = m_reporters.h_view(i);
|
||||
reports_out(i) = m_reports.h_view(i);
|
||||
reporters_out(i) = m_reporters.view_host()(i);
|
||||
reports_out(i) = m_reports.view_host()(i);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -651,8 +651,8 @@ class OffsetView : public View<DataType, Properties...> {
|
||||
m_begins[i] = minIndices.begin()[i];
|
||||
}
|
||||
static_assert(
|
||||
std::is_same<pointer_type, typename Kokkos::Impl::ViewCtorProp<
|
||||
P...>::pointer_type>::value,
|
||||
std::is_same_v<pointer_type,
|
||||
typename Kokkos::Impl::ViewCtorProp<P...>::pointer_type>,
|
||||
"When constructing OffsetView to wrap user memory, you must supply "
|
||||
"matching pointer type");
|
||||
}
|
||||
@ -1312,9 +1312,8 @@ inline auto create_mirror(const Kokkos::Experimental::OffsetView<T, P...>& src,
|
||||
return typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror(
|
||||
Kokkos::create_mirror(arg_prop, src.view()), src.begins());
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
@ -1408,9 +1407,8 @@ inline auto create_mirror_view(
|
||||
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
|
||||
}
|
||||
}
|
||||
#if defined(KOKKOS_COMPILER_INTEL) || \
|
||||
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC))
|
||||
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
|
||||
!defined(KOKKOS_COMPILER_MSVC)
|
||||
__builtin_unreachable();
|
||||
#endif
|
||||
}
|
||||
|
||||
@ -788,7 +788,7 @@ class ScatterView<DataType, Layout, DeviceType, Op, ScatterNonDuplicated,
|
||||
void contribute_into(execution_space const& exec_space,
|
||||
View<DT, RP...> const& dest) const {
|
||||
using dest_type = View<DT, RP...>;
|
||||
static_assert(std::is_same<typename dest_type::array_layout, Layout>::value,
|
||||
static_assert(std::is_same_v<typename dest_type::array_layout, Layout>,
|
||||
"ScatterView contribute destination has different layout");
|
||||
static_assert(
|
||||
Kokkos::SpaceAccessibility<
|
||||
@ -1071,9 +1071,9 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
|
||||
void contribute_into(execution_space const& exec_space,
|
||||
View<DT, RP...> const& dest) const {
|
||||
using dest_type = View<DT, RP...>;
|
||||
static_assert(std::is_same<typename dest_type::array_layout,
|
||||
Kokkos::LayoutRight>::value,
|
||||
"ScatterView deep_copy destination has different layout");
|
||||
static_assert(
|
||||
std::is_same_v<typename dest_type::array_layout, Kokkos::LayoutRight>,
|
||||
"ScatterView deep_copy destination has different layout");
|
||||
static_assert(
|
||||
Kokkos::SpaceAccessibility<
|
||||
execution_space, typename dest_type::memory_space>::accessible,
|
||||
@ -1351,12 +1351,12 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
|
||||
View<RP...> const& dest) const {
|
||||
using dest_type = View<RP...>;
|
||||
static_assert(
|
||||
std::is_same<typename dest_type::value_type,
|
||||
typename original_view_type::non_const_value_type>::value,
|
||||
std::is_same_v<typename dest_type::value_type,
|
||||
typename original_view_type::non_const_value_type>,
|
||||
"ScatterView deep_copy destination has wrong value_type");
|
||||
static_assert(std::is_same<typename dest_type::array_layout,
|
||||
Kokkos::LayoutLeft>::value,
|
||||
"ScatterView deep_copy destination has different layout");
|
||||
static_assert(
|
||||
std::is_same_v<typename dest_type::array_layout, Kokkos::LayoutLeft>,
|
||||
"ScatterView deep_copy destination has different layout");
|
||||
static_assert(
|
||||
Kokkos::SpaceAccessibility<
|
||||
execution_space, typename dest_type::memory_space>::accessible,
|
||||
|
||||
@ -21,6 +21,23 @@
|
||||
#define KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_STATICCRSGRAPH
|
||||
#endif
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
#if defined(KOKKOS_ENABLE_DEPRECATED_CODE_4)
|
||||
#if defined(KOKKOS_ENABLE_DEPRECATION_WARNINGS) && \
|
||||
!defined(KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH)
|
||||
namespace {
|
||||
[[deprecated("Deprecated <Kokkos_StaticCrsGraph.hpp> header is included")]] int
|
||||
emit_warning_kokkos_static_crs_graph_deprecated() {
|
||||
return 0;
|
||||
}
|
||||
static auto do_not_include = emit_warning_kokkos_static_crs_graph_deprecated();
|
||||
} // namespace
|
||||
#endif
|
||||
#else
|
||||
#error "Deprecated <Kokkos_StaticCrsGraph.hpp> header is included"
|
||||
#endif
|
||||
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
|
||||
@ -874,22 +874,16 @@ class UnorderedMap {
|
||||
if (m_hash_lists.data() != src.m_hash_lists.data()) {
|
||||
Kokkos::deep_copy(m_available_indexes, src.m_available_indexes);
|
||||
|
||||
using raw_deep_copy =
|
||||
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
|
||||
typename SDevice::memory_space>;
|
||||
// do the other deep copies asynchronously if possible
|
||||
typename device_type::execution_space exec_space{};
|
||||
|
||||
raw_deep_copy(m_hash_lists.data(), src.m_hash_lists.data(),
|
||||
sizeof(size_type) * src.m_hash_lists.extent(0));
|
||||
raw_deep_copy(m_next_index.data(), src.m_next_index.data(),
|
||||
sizeof(size_type) * src.m_next_index.extent(0));
|
||||
raw_deep_copy(m_keys.data(), src.m_keys.data(),
|
||||
sizeof(key_type) * src.m_keys.extent(0));
|
||||
Kokkos::deep_copy(exec_space, m_hash_lists, src.m_hash_lists);
|
||||
Kokkos::deep_copy(exec_space, m_next_index, src.m_next_index);
|
||||
Kokkos::deep_copy(exec_space, m_keys, src.m_keys);
|
||||
if (!is_set) {
|
||||
raw_deep_copy(m_values.data(), src.m_values.data(),
|
||||
sizeof(impl_value_type) * src.m_values.extent(0));
|
||||
Kokkos::deep_copy(exec_space, m_values, src.m_values);
|
||||
}
|
||||
raw_deep_copy(m_scalars.data(), src.m_scalars.data(),
|
||||
sizeof(int) * num_scalars);
|
||||
Kokkos::deep_copy(exec_space, m_scalars, src.m_scalars);
|
||||
|
||||
Kokkos::fence(
|
||||
"Kokkos::UnorderedMap::deep_copy_view: fence after copy to dst.");
|
||||
@ -901,33 +895,27 @@ class UnorderedMap {
bool modified() const { return get_flag(modified_idx); }

void set_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
Kokkos::HostSpace>;
const int true_ = true;
raw_deep_copy(m_scalars.data() + flag, &true_, sizeof(int));
auto scalar = Kokkos::subview(m_scalars, flag);
Kokkos::deep_copy(typename device_type::execution_space{}, scalar,
static_cast<int>(true));
Kokkos::fence(
"Kokkos::UnorderedMap::set_flag: fence after copying flag from "
"HostSpace");
}

void reset_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
Kokkos::HostSpace>;
const int false_ = false;
raw_deep_copy(m_scalars.data() + flag, &false_, sizeof(int));
auto scalar = Kokkos::subview(m_scalars, flag);
Kokkos::deep_copy(typename device_type::execution_space{}, scalar,
static_cast<int>(false));
Kokkos::fence(
"Kokkos::UnorderedMap::reset_flag: fence after copying flag from "
"HostSpace");
}

bool get_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<Kokkos::HostSpace,
typename device_type::memory_space>;
int result = false;
raw_deep_copy(&result, m_scalars.data() + flag, sizeof(int));
const auto scalar = Kokkos::subview(m_scalars, flag);
int result;
Kokkos::deep_copy(typename device_type::execution_space{}, result, scalar);
Kokkos::fence(
"Kokkos::UnorderedMap::get_flag: fence after copy to return value in "
"HostSpace");
@ -69,14 +69,14 @@ class KOKKOS_DEPRECATED vector
public:
#ifdef KOKKOS_ENABLE_CUDA_UVM
KOKKOS_INLINE_FUNCTION reference operator()(int i) const {
return DV::h_view(i);
return DV::view_host()(i);
};
KOKKOS_INLINE_FUNCTION reference operator[](int i) const {
return DV::h_view(i);
return DV::view_host()(i);
};
#else
inline reference operator()(int i) const { return DV::h_view(i); };
inline reference operator[](int i) const { return DV::h_view(i); };
inline reference operator()(int i) const { return DV::view_host()(i); };
inline reference operator[](int i) const { return DV::view_host()(i); };
#endif

/* Member functions which behave like std::vector functions */
@ -111,13 +111,13 @@ class KOKKOS_DEPRECATED vector
/* Assign value either on host or on device */

if (DV::template need_sync<typename DV::t_dev::device_type>()) {
set_functor_host f(DV::h_view, val);
set_functor_host f(DV::view_host(), val);
parallel_for("Kokkos::vector::assign", n, f);
typename DV::t_host::execution_space().fence(
"Kokkos::vector::assign: fence after assigning values");
DV::template modify<typename DV::t_host::device_type>();
} else {
set_functor f(DV::d_view, val);
set_functor f(DV::view_device(), val);
parallel_for("Kokkos::vector::assign", n, f);
typename DV::t_dev::execution_space().fence(
"Kokkos::vector::assign: fence after assigning values");
@ -136,7 +136,7 @@ class KOKKOS_DEPRECATED vector
DV::resize(new_size);
}

DV::h_view(_size) = val;
DV::view_host()(_size) = val;
_size++;
}
@ -209,27 +209,27 @@ class KOKKOS_DEPRECATED vector
size_type span() const { return DV::span(); }
bool empty() const { return _size == 0; }

pointer data() const { return DV::h_view.data(); }
pointer data() const { return DV::view_host().data(); }

iterator begin() const { return DV::h_view.data(); }
iterator begin() const { return DV::view_host().data(); }

const_iterator cbegin() const { return DV::h_view.data(); }
const_iterator cbegin() const { return DV::view_host().data(); }

iterator end() const {
return _size > 0 ? DV::h_view.data() + _size : DV::h_view.data();
return _size > 0 ? DV::view_host().data() + _size : DV::view_host().data();
}

const_iterator cend() const {
return _size > 0 ? DV::h_view.data() + _size : DV::h_view.data();
return _size > 0 ? DV::view_host().data() + _size : DV::view_host().data();
}

reference front() { return DV::h_view(0); }
reference front() { return DV::view_host()(0); }

reference back() { return DV::h_view(_size - 1); }
reference back() { return DV::view_host()(_size - 1); }

const_reference front() const { return DV::h_view(0); }
const_reference front() const { return DV::view_host()(0); }

const_reference back() const { return DV::h_view(_size - 1); }
const_reference back() const { return DV::view_host()(_size - 1); }

/* std::algorithms which work originally with iterators, here they are
* implemented as member functions */
@ -245,10 +245,10 @@ class KOKKOS_DEPRECATED vector
return theEnd;
}

Scalar lower_val = DV::h_view(lower);
Scalar upper_val = DV::h_view(upper);
Scalar lower_val = DV::view_host()(lower);
Scalar upper_val = DV::view_host()(upper);
size_t idx = (upper + lower) / 2;
Scalar val = DV::h_view(idx);
Scalar val = DV::view_host()(idx);
if (val > upper_val) return upper;
if (val < lower_val) return start;
@ -259,14 +259,14 @@ class KOKKOS_DEPRECATED vector
upper = idx;
}
idx = (upper + lower) / 2;
val = DV::h_view(idx);
val = DV::view_host()(idx);
}
return idx;
}

bool is_sorted() {
for (int i = 0; i < _size - 1; i++) {
if (DV::h_view(i) > DV::h_view(i + 1)) return false;
if (DV::view_host()(i) > DV::view_host()(i + 1)) return false;
}
return true;
}
@ -279,26 +279,27 @@ class KOKKOS_DEPRECATED vector
upper = _size - 1;
lower = 0;

if ((val < DV::h_view(0)) || (val > DV::h_view(_size - 1))) return end();
if ((val < DV::view_host()(0)) || (val > DV::view_host()(_size - 1)))
return end();

while (upper > lower) {
if (val > DV::h_view(current))
if (val > DV::view_host()(current))
lower = current + 1;
else
upper = current;
current = (upper + lower) / 2;
}

if (val == DV::h_view(current))
return &DV::h_view(current);
if (val == DV::view_host()(current))
return &DV::view_host()(current);
else
return end();
}

/* Additional functions for data management */

void device_to_host() { deep_copy(DV::h_view, DV::d_view); }
void host_to_device() const { deep_copy(DV::d_view, DV::h_view); }
void device_to_host() { deep_copy(DV::view_host(), DV::view_device()); }
void host_to_device() const { deep_copy(DV::view_device(), DV::view_host()); }

void on_host() { DV::template modify<typename DV::t_host::device_type>(); }
void on_device() { DV::template modify<typename DV::t_dev::device_type>(); }
@ -75,6 +75,7 @@ uint32_t MurmurHash3_x86_32(const void* key, int len, uint32_t seed) {
//----------
// tail

// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
const uint8_t* tail = (const uint8_t*)(data + nblocks * 4);

uint32_t k1 = 0;
@ -88,6 +89,8 @@ uint32_t MurmurHash3_x86_32(const void* key, int len, uint32_t seed) {
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
break;
default: break;
};

//----------
@ -29,8 +29,8 @@ foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL)
Vector
ViewCtorPropEmbeddedDim
)
if(NOT Kokkos_ENABLE_DEPRECATED_CODE_4 AND Name STREQUAL "Vector")
continue() # skip Kokkos::vector test if deprecated code 4 is not enabled
if(NOT Kokkos_ENABLE_DEPRECATED_CODE_4 AND NOT Name IN_LIST "Vector,StaticCrsGraph")
continue() # skip Kokkos::{vector,StaticCrsGraph} tests if deprecated code 4 is not enabled
endif()
# Write to a temporary intermediate file and call configure_file to avoid
# updating timestamps triggering unnecessary rebuilds on subsequent cmake runs.
@ -24,6 +24,8 @@ LINK ?= $(CXX)
LDFLAGS ?=
override LDFLAGS += -lpthread

KOKKOS_USE_DEPRECATED_MAKEFILES=1

include $(KOKKOS_PATH)/Makefile.kokkos

KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files
@ -31,7 +33,7 @@ KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests -I${KO
TEST_TARGETS =
TARGETS =

TESTS = Bitset DualView DynamicView DynViewAPI_generic DynViewAPI_rank12345 DynViewAPI_rank67 ErrorReporter OffsetView ScatterView StaticCrsGraph UnorderedMap ViewCtorPropEmbeddedDim
TESTS = Bitset DualView DynamicView DynViewAPI_generic DynViewAPI_rank12345 DynViewAPI_rank67 ErrorReporter OffsetView ScatterView UnorderedMap ViewCtorPropEmbeddedDim
tmp := $(foreach device, $(KOKKOS_DEVICELIST), \
tmp2 := $(foreach test, $(TESTS), \
$(if $(filter Test$(device)_$(test).cpp, $(shell ls Test$(device)_$(test).cpp 2>/dev/null)),,\
@ -52,7 +54,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA += TestCuda_ErrorReporter.o
OBJ_CUDA += TestCuda_OffsetView.o
OBJ_CUDA += TestCuda_ScatterView.o
OBJ_CUDA += TestCuda_StaticCrsGraph.o
OBJ_CUDA += TestCuda_UnorderedMap.o
OBJ_CUDA += TestCuda_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Cuda
@ -70,7 +71,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_THREADS), 1)
OBJ_THREADS += TestThreads_ErrorReporter.o
OBJ_THREADS += TestThreads_OffsetView.o
OBJ_THREADS += TestThreads_ScatterView.o
OBJ_THREADS += TestThreads_StaticCrsGraph.o
OBJ_THREADS += TestThreads_UnorderedMap.o
OBJ_THREADS += TestThreads_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Threads
@ -88,7 +88,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP += TestOpenMP_ErrorReporter.o
OBJ_OPENMP += TestOpenMP_OffsetView.o
OBJ_OPENMP += TestOpenMP_ScatterView.o
OBJ_OPENMP += TestOpenMP_StaticCrsGraph.o
OBJ_OPENMP += TestOpenMP_UnorderedMap.o
OBJ_OPENMP += TestOpenMP_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_OpenMP
@ -106,7 +105,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_HPX), 1)
OBJ_HPX += TestHPX_ErrorReporter.o
OBJ_HPX += TestHPX_OffsetView.o
OBJ_HPX += TestHPX_ScatterView.o
OBJ_HPX += TestHPX_StaticCrsGraph.o
OBJ_HPX += TestHPX_UnorderedMap.o
OBJ_HPX += TestHPX_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_HPX
@ -124,7 +122,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
OBJ_SERIAL += TestSerial_ErrorReporter.o
OBJ_SERIAL += TestSerial_OffsetView.o
OBJ_SERIAL += TestSerial_ScatterView.o
OBJ_SERIAL += TestSerial_StaticCrsGraph.o
OBJ_SERIAL += TestSerial_UnorderedMap.o
OBJ_SERIAL += TestSerial_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Serial
@ -87,17 +87,6 @@ struct test_dualview_copy_construction_and_assignment {

ASSERT_EQ(a.view_host(), c.view_host());
ASSERT_EQ(a.view_device(), c.view_device());

// We can't test shallow equality of modified_flags because it's protected.
// So we test it indirectly through sync state behavior.
if (!std::decay_t<SrcViewType>::impl_dualview_is_single_device::value) {
a.clear_sync_state();
a.modify_host();
ASSERT_TRUE(a.need_sync_device());
ASSERT_TRUE(b.need_sync_device());
ASSERT_TRUE(c.need_sync_device());
a.clear_sync_state();
}
}
};
@ -123,16 +112,16 @@ struct test_dualview_combinations {
} else {
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
}
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);

a.template modify<typename ViewType::execution_space>();
a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});

a.h_view(5, 1) = 3;
a.h_view(6, 1) = 4;
a.h_view(7, 2) = 5;
a.view_host()(5, 1) = 3;
a.view_host()(6, 1) = 4;
a.view_host()(7, 2) = 5;
a.template modify<typename ViewType::host_mirror_space>();
ViewType b = Kokkos::subview(a, std::pair<unsigned int, unsigned int>(6, 9),
std::pair<unsigned int, unsigned int>(0, 1));
@ -141,16 +130,17 @@ struct test_dualview_combinations {
Kokkos::DefaultExecutionSpace{});
b.template modify<typename ViewType::execution_space>();

Kokkos::deep_copy(b.d_view, 2);
Kokkos::deep_copy(b.view_device(), 2);

a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});
Scalar count = 0;
for (unsigned int i = 0; i < a.d_view.extent(0); i++)
for (unsigned int j = 0; j < a.d_view.extent(1); j++)
count += a.h_view(i, j);
return count - a.d_view.extent(0) * a.d_view.extent(1) - 2 - 4 - 3 * 2;
for (unsigned int i = 0; i < a.view_device().extent(0); i++)
for (unsigned int j = 0; j < a.view_device().extent(1); j++)
count += a.view_host()(i, j);
return count - a.view_device().extent(0) * a.view_device().extent(1) - 2 -
4 - 3 * 2;
}

test_dualview_combinations(unsigned int size, bool with_init) {
@ -191,7 +181,7 @@ struct test_dual_view_deep_copy {
}
const scalar_type sum_total = scalar_type(n * m);

Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);

if (use_templ_sync) {
a.template modify<typename ViewType::execution_space>();
@ -209,15 +199,16 @@ struct test_dual_view_deep_copy {
typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_reduce(
Kokkos::RangePolicy<t_dev_exec_space>(0, n),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(a.d_view),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(
a.view_device()),
a_d_sum);
ASSERT_EQ(a_d_sum, sum_total);

// Check host view is synced as expected
scalar_type a_h_sum = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
a_h_sum += a.h_view(i, j);
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
a_h_sum += a.view_host()(i, j);
}

ASSERT_EQ(a_h_sum, sum_total);
@ -237,15 +228,16 @@ struct test_dual_view_deep_copy {
// Execute on the execution_space associated with t_dev's memory space
Kokkos::parallel_reduce(
Kokkos::RangePolicy<t_dev_exec_space>(0, n),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(b.d_view),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(
b.view_device()),
b_d_sum);
ASSERT_EQ(b_d_sum, sum_total);

// Check host view is synced as expected
scalar_type b_h_sum = 0;
for (size_t i = 0; i < b.h_view.extent(0); ++i)
for (size_t j = 0; j < b.h_view.extent(1); ++j) {
b_h_sum += b.h_view(i, j);
for (size_t i = 0; i < b.view_host().extent(0); ++i)
for (size_t j = 0; j < b.view_host().extent(1); ++j) {
b_h_sum += b.view_host()(i, j);
}

ASSERT_EQ(b_h_sum, sum_total);
@ -256,8 +248,8 @@ struct test_dual_view_deep_copy {
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5, true);
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5,
false);
// Test zero length but allocated (a.d_view.data!=nullptr but
// a.d_view.span()==0)
// Test zero length but allocated (a.view_device().data() != nullptr but
// a.view_device().span() == 0)
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, true);
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, false);
@ -285,7 +277,7 @@ struct test_dualview_resize {
else
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);

Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);

/* Covers case "Resize on Device" */
a.modify_device();
@ -296,7 +288,7 @@ struct test_dualview_resize {
ASSERT_EQ(a.extent(0), n * factor);
ASSERT_EQ(a.extent(1), m * factor);

Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
a.sync_host();

// Check device view is initialized as expected
@ -307,18 +299,18 @@ struct test_dualview_resize {
"errors");
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);

// Check host view is synced as expected
int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}

// Check
@ -345,17 +337,17 @@ struct test_dualview_resize {
typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
Kokkos::deep_copy(errors_d_scalar, errors_d);

// Check host view is synced as expected
errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}

// Check
@ -390,7 +382,7 @@ struct test_dualview_realloc {
ASSERT_EQ(a.extent(0), n);
ASSERT_EQ(a.extent(1), m);

Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);

a.modify_device();
a.sync_host();
@ -403,18 +395,18 @@ struct test_dualview_realloc {
"errors");
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);

// Check host view is synced as expected
int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}

// Check
@ -484,6 +476,64 @@ TEST(TEST_CATEGORY, dualview_deep_copy) {
test_dualview_deep_copy<double, TEST_EXECSPACE>();
}

template <typename ExecutionSpace>
void test_dualview_sync_should_fence() {
using DualViewType = Kokkos::DualView<int, ExecutionSpace>;
{
DualViewType dv("test_dual_view");
dv.modify_device();
Kokkos::parallel_for(
Kokkos::RangePolicy<ExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_device().data(), 1); });
dv.sync_host();
ASSERT_EQ(dv.view_host()(), 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_device();
Kokkos::parallel_for(
Kokkos::RangePolicy<ExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_device().data(), 1); });
dv.template sync<typename DualViewType::t_host::device_type>();
ASSERT_EQ(dv.view_host()(), 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_host();
Kokkos::parallel_for(
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_host().data(), 1); });
dv.sync_device();
int result;
auto device_exec =
Kokkos::Experimental::partition_space(ExecutionSpace{}, 1);
Kokkos::deep_copy(device_exec[0], result, dv.view_device());
device_exec[0].fence();
ASSERT_EQ(result, 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_host();
Kokkos::parallel_for(
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_host().data(), 1); });
dv.template sync<typename DualViewType::t_dev::device_type>();
int result;
auto device_exec =
Kokkos::Experimental::partition_space(ExecutionSpace{}, 1);
Kokkos::deep_copy(device_exec[0], result, dv.view_device());
device_exec[0].fence();
ASSERT_EQ(result, 10000);
}
}

TEST(TEST_CATEGORY, dualview_sync_should_fence) {
#ifdef KOKKOS_ENABLE_HPX // FIXME
GTEST_SKIP() << "Known to fail with HPX";
#endif
test_dualview_sync_should_fence<TEST_EXECSPACE>();
}

struct NoDefaultConstructor {
NoDefaultConstructor(int i_) : i(i_) {}
KOKKOS_FUNCTION operator int() const { return i; }
@ -640,8 +690,8 @@ auto initialize_view_of_views() {

V v("v", 2);
V w("w", 2);
dv_v.h_view(0) = v;
dv_v.h_view(1) = w;
dv_v.view_host()(0) = v;
dv_v.view_host()(1) = w;

dv_v.modify_host();
dv_v.sync_device();
@ -652,19 +702,19 @@ auto initialize_view_of_views() {
TEST(TEST_CATEGORY, dualview_sequential_host_init) {
auto dv_v = initialize_view_of_views<Kokkos::View<double*, TEST_EXECSPACE>>();
dv_v.resize(Kokkos::view_alloc(Kokkos::SequentialHostInit), 2u);
ASSERT_EQ(dv_v.d_view.size(), 2u);
ASSERT_EQ(dv_v.h_view.size(), 2u);
ASSERT_EQ(dv_v.view_device().size(), 2u);
ASSERT_EQ(dv_v.view_host().size(), 2u);

initialize_view_of_views<S<Kokkos::View<double*, TEST_EXECSPACE>>>();

Kokkos::DualView<double*> dv(
Kokkos::view_alloc("myView", Kokkos::SequentialHostInit), 1u);
dv.resize(Kokkos::view_alloc(Kokkos::SequentialHostInit), 2u);
ASSERT_EQ(dv.d_view.size(), 2u);
ASSERT_EQ(dv.h_view.size(), 2u);
ASSERT_EQ(dv.view_device().size(), 2u);
ASSERT_EQ(dv.view_host().size(), 2u);
dv.realloc(Kokkos::view_alloc(Kokkos::SequentialHostInit), 3u);
ASSERT_EQ(dv.d_view.size(), 3u);
ASSERT_EQ(dv.h_view.size(), 3u);
ASSERT_EQ(dv.view_device().size(), 3u);
ASSERT_EQ(dv.view_host().size(), 3u);
}
} // anonymous namespace
} // namespace Test
@ -27,7 +27,7 @@ void test_dyn_rank_view_team_scratch() {
using policy_type = Kokkos::TeamPolicy<execution_space>;
using team_type = policy_type::member_type;

int N0 = 10, N1 = 4, N2 = 3;
size_t N0 = 10, N1 = 4, N2 = 3;
size_t shmem_size = drv_type::shmem_size(N0, N1, N2);
ASSERT_GE(shmem_size, N0 * N1 * N2 * sizeof(int));
@ -40,9 +40,9 @@ void test_dyn_rank_view_team_scratch() {
drv_type scr(team.team_scratch(0), N0, N1, N2);
// Control that the code ran at all
if (scr.rank() != 3) errors() |= 1u;
if (scr.extent_int(0) != N0) errors() |= 2u;
if (scr.extent_int(1) != N1) errors() |= 4u;
if (scr.extent_int(2) != N2) errors() |= 8u;
if (scr.extent(0) != N0) errors() |= 2u;
if (scr.extent(1) != N1) errors() |= 4u;
if (scr.extent(2) != N2) errors() |= 8u;
Kokkos::parallel_for(
Kokkos::TeamThreadMDRange(team, N0, N1, N2),
[=](int i, int j, int k) { scr(i, j, k) = i * 100 + j * 10 + k; });
@ -130,7 +130,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 7> {
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3, i4, i5, i6) -
&left(0, 0, 0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -146,7 +146,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 7> {
for (unsigned i6 = 0; i6 < unsigned(right.extent(6)); ++i6) {
const long j = &right(i0, i1, i2, i3, i4, i5, i6) -
&right(0, 0, 0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -212,7 +212,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 6> {
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j =
&left(i0, i1, i2, i3, i4, i5) - &left(0, 0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -227,7 +227,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 6> {
for (unsigned i5 = 0; i5 < unsigned(right.extent(5)); ++i5) {
const long j =
&right(i0, i1, i2, i3, i4, i5) - &right(0, 0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -298,7 +298,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 5> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3, i4) - &left(0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -316,7 +316,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 5> {
for (unsigned i3 = 0; i3 < unsigned(right.extent(3)); ++i3)
for (unsigned i4 = 0; i4 < unsigned(right.extent(4)); ++i4) {
const long j = &right(i0, i1, i2, i3, i4) - &right(0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -383,7 +383,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 4> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3) - &left(0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -395,7 +395,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 4> {
for (unsigned i2 = 0; i2 < unsigned(right.extent(2)); ++i2)
for (unsigned i3 = 0; i3 < unsigned(right.extent(3)); ++i3) {
const long j = &right(i0, i1, i2, i3) - &right(0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -462,7 +462,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 3> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2) - &left(0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -477,7 +477,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 3> {
for (unsigned i1 = 0; i1 < unsigned(right.extent(1)); ++i1)
for (unsigned i2 = 0; i2 < unsigned(right.extent(2)); ++i2) {
const long j = &right(i0, i1, i2) - &right(0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -551,7 +551,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 2> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1) - &left(0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -561,7 +561,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 2> {
for (unsigned i0 = 0; i0 < unsigned(right.extent(0)); ++i0)
for (unsigned i1 = 0; i1 < unsigned(right.extent(1)); ++i1) {
const long j = &right(i0, i1) - &right(0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -1563,7 +1563,7 @@ class TestDynViewAPI {
// an lvalue reference due to retrieving through texture cache
// therefore not allowed to query the underlying pointer.
#if defined(KOKKOS_ENABLE_CUDA)
if (!std::is_same<typename device::execution_space, Kokkos::Cuda>::value)
if (!std::is_same_v<typename device::execution_space, Kokkos::Cuda>)
#endif
{
ASSERT_EQ(x.data(), xr.data());
@ -270,18 +270,19 @@ void test_offsetview_construction() {

template <typename Scalar, typename Device>
void test_offsetview_unmanaged_construction() {
// Preallocated memory (Only need a valid address for this test)
Scalar s;
// Preallocated memory
Kokkos::View<Scalar, Device> s("s");
Scalar* ptr = s.data(); // obtain a pointer into the right address space

{
// Constructing an OffsetView directly around our preallocated memory
Kokkos::Array<int64_t, 1> begins1{{2}};
Kokkos::Array<int64_t, 1> ends1{{3}};
Kokkos::Experimental::OffsetView<Scalar*, Device> ov1(&s, begins1, ends1);
Kokkos::Experimental::OffsetView<Scalar*, Device> ov1(ptr, begins1, ends1);

// Constructing an OffsetView around an unmanaged View of our preallocated
// memory
Kokkos::View<Scalar*, Device> v1(&s, ends1[0] - begins1[0]);
Kokkos::View<Scalar*, Device> v1(ptr, ends1[0] - begins1[0]);
Kokkos::Experimental::OffsetView<Scalar*, Device> ovv1(v1, begins1);

// They should match
@ -292,9 +293,9 @@ void test_offsetview_unmanaged_construction() {
{
Kokkos::Array<int64_t, 2> begins2{{-2, -7}};
Kokkos::Array<int64_t, 2> ends2{{5, -3}};
Kokkos::Experimental::OffsetView<Scalar**, Device> ov2(&s, begins2, ends2);
Kokkos::Experimental::OffsetView<Scalar**, Device> ov2(ptr, begins2, ends2);

Kokkos::View<Scalar**, Device> v2(&s, ends2[0] - begins2[0],
Kokkos::View<Scalar**, Device> v2(ptr, ends2[0] - begins2[0],
ends2[1] - begins2[1]);
Kokkos::Experimental::OffsetView<Scalar**, Device> ovv2(v2, begins2);
@ -305,10 +306,10 @@ void test_offsetview_unmanaged_construction() {
|
||||
{
|
||||
Kokkos::Array<int64_t, 3> begins3{{2, 3, 5}};
|
||||
Kokkos::Array<int64_t, 3> ends3{{7, 11, 13}};
|
||||
Kokkos::Experimental::OffsetView<Scalar***, Device> ovv3(&s, begins3,
|
||||
Kokkos::Experimental::OffsetView<Scalar***, Device> ovv3(ptr, begins3,
|
||||
ends3);
|
||||
|
||||
Kokkos::View<Scalar***, Device> v3(&s, ends3[0] - begins3[0],
|
||||
Kokkos::View<Scalar***, Device> v3(ptr, ends3[0] - begins3[0],
|
||||
ends3[1] - begins3[1],
|
||||
ends3[2] - begins3[2]);
|
||||
Kokkos::Experimental::OffsetView<Scalar***, Device> ov3(v3, begins3);
|
||||
@ -323,10 +324,10 @@ void test_offsetview_unmanaged_construction() {
|
||||
Kokkos::Array<int64_t, 1> begins{{-3}};
|
||||
Kokkos::Array<int64_t, 1> ends{{2}};
|
||||
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> bb(&s, begins, ends);
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> bi(&s, begins, {2});
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> ib(&s, {-3}, ends);
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> ii(&s, {-3}, {2});
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> bb(ptr, begins, ends);
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> bi(ptr, begins, {2});
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> ib(ptr, {-3}, ends);
|
||||
Kokkos::Experimental::OffsetView<Scalar*, Device> ii(ptr, {-3}, {2});
|
||||
|
||||
ASSERT_EQ(bb, bi);
|
||||
ASSERT_EQ(bb, ib);
|
||||
@ -336,8 +337,9 @@ void test_offsetview_unmanaged_construction() {
|
||||
|
||||
template <typename Scalar, typename Device>
|
||||
void test_offsetview_unmanaged_construction_death() {
|
||||
// Preallocated memory (Only need a valid address for this test)
|
||||
Scalar s;
|
||||
// Preallocated memory
|
||||
Kokkos::View<Scalar, Device> s("s");
|
||||
Scalar* ptr = s.data(); // obtain a pointer into the right address space
|
||||
|
||||
// Regular expression syntax on Windows is a pain. `.` does not match `\n`.
|
||||
// Feel free to make it work if you have time to spare.
|
||||
@ -351,10 +353,10 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
using offset_view_type = Kokkos::Experimental::OffsetView<Scalar*, Device>;
|
||||
|
||||
// Range calculations must be positive
|
||||
(void)offset_view_type(&s, {0}, {1});
|
||||
(void)offset_view_type(&s, {0}, {0});
|
||||
(void)offset_view_type(ptr, {0}, {1});
|
||||
(void)offset_view_type(ptr, {0}, {0});
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0}, {-1}),
|
||||
offset_view_type(ptr, {0}, {-1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -366,9 +368,9 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
using offset_view_type = Kokkos::Experimental::OffsetView<Scalar*, Device>;
|
||||
|
||||
// Range calculations must not overflow
|
||||
(void)offset_view_type(&s, {0}, {0x7fffffffffffffffl});
|
||||
(void)offset_view_type(ptr, {0}, {0x7fffffffffffffffl});
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {-1}, {0x7fffffffffffffffl}),
|
||||
offset_view_type(ptr, {-1}, {0x7fffffffffffffffl}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -376,7 +378,8 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
"\\(-1\\)\\) "
|
||||
"overflows"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {-0x7fffffffffffffffl - 1}, {0x7fffffffffffffffl}),
|
||||
offset_view_type(ptr, {-0x7fffffffffffffffl - 1},
|
||||
{0x7fffffffffffffffl}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -384,7 +387,7 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
"\\(-9223372036854775808\\)\\) "
|
||||
"overflows"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {-0x7fffffffffffffffl - 1}, {0}),
|
||||
offset_view_type(ptr, {-0x7fffffffffffffffl - 1}, {0}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -399,7 +402,7 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
// Should throw when the rank of begins and/or ends doesn't match that
|
||||
// of OffsetView
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0}, {1}),
|
||||
offset_view_type(ptr, {0}, {1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -407,13 +410,13 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
".*"
|
||||
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0}, {1, 1}),
|
||||
offset_view_type(ptr, {0}, {1, 1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
"begins\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0}, {1, 1, 1}),
|
||||
offset_view_type(ptr, {0}, {1, 1, 1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -421,20 +424,20 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
".*"
|
||||
"ends\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0, 0}, {1}),
|
||||
offset_view_type(ptr, {0, 0}, {1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
|
||||
(void)offset_view_type(&s, {0, 0}, {1, 1});
|
||||
(void)offset_view_type(ptr, {0, 0}, {1, 1});
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0, 0}, {1, 1, 1}),
|
||||
offset_view_type(ptr, {0, 0}, {1, 1, 1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
"ends\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0, 0, 0}, {1}),
|
||||
offset_view_type(ptr, {0, 0, 0}, {1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
@ -442,13 +445,13 @@ void test_offsetview_unmanaged_construction_death() {
|
||||
".*"
|
||||
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0, 0, 0}, {1, 1}),
|
||||
offset_view_type(ptr, {0, 0, 0}, {1, 1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
"begins\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
|
||||
ASSERT_DEATH(
|
||||
offset_view_type(&s, {0, 0, 0}, {1, 1, 1}),
|
||||
offset_view_type(ptr, {0, 0, 0}, {1, 1, 1}),
|
||||
SKIP_REGEX_ON_WINDOWS(
|
||||
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
|
||||
".*"
|
||||
|
||||
@ -772,12 +772,12 @@ TEST(TEST_CATEGORY, scatterview) {
|
||||
|
||||
#if defined(KOKKOS_ENABLE_SERIAL) || defined(KOKKOS_ENABLE_OPENMP)
|
||||
#if defined(KOKKOS_ENABLE_SERIAL)
|
||||
bool is_serial = std::is_same<TEST_EXECSPACE, Kokkos::Serial>::value;
|
||||
bool is_serial = std::is_same_v<TEST_EXECSPACE, Kokkos::Serial>;
|
||||
#else
|
||||
bool is_serial = false;
|
||||
#endif
|
||||
#if defined(KOKKOS_ENABLE_OPENMP)
|
||||
bool is_openmp = std::is_same<TEST_EXECSPACE, Kokkos::OpenMP>::value;
|
||||
bool is_openmp = std::is_same_v<TEST_EXECSPACE, Kokkos::OpenMP>;
|
||||
#else
|
||||
bool is_openmp = false;
|
||||
#endif
|
||||
@ -817,7 +817,7 @@ TEST(TEST_CATEGORY, scatterview_devicetype) {
|
||||
using device_memory_space = Kokkos::HIPSpace;
|
||||
using host_accessible_space = Kokkos::HIPManagedSpace;
|
||||
#endif
|
||||
if (std::is_same<TEST_EXECSPACE, device_execution_space>::value) {
|
||||
if (std::is_same_v<TEST_EXECSPACE, device_execution_space>) {
|
||||
using device_device_type =
|
||||
Kokkos::Device<device_execution_space, device_memory_space>;
|
||||
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterSum,
|
||||
|
||||
@ -18,7 +18,9 @@
|
||||
|
||||
#include <vector>
|
||||
|
||||
#define KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH
|
||||
#include <Kokkos_StaticCrsGraph.hpp>
|
||||
#undef KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
/*--------------------------------------------------------------------------*/
|
||||
|
||||
@ -24,6 +24,7 @@
|
||||
int main(int argc, char** argv) {
|
||||
Kokkos::initialize(argc, argv);
|
||||
benchmark::Initialize(&argc, argv);
|
||||
// FIXME: seconds as default time unit leads to precision loss
|
||||
benchmark::SetDefaultTimeUnit(benchmark::kSecond);
|
||||
KokkosBenchmark::add_benchmark_context(true);
|
||||
|
||||
|
||||
@ -133,3 +133,4 @@ if(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
|
||||
endif()
|
||||
|
||||
kokkos_add_benchmark(PerformanceTest_Atomic SOURCES test_atomic.cpp)
|
||||
kokkos_add_benchmark(PerformanceTest_Reduction SOURCES test_reduction.cpp)
|
||||
|
||||
@ -20,6 +20,8 @@ LINK ?= $(CXX)
|
||||
LDFLAGS ?=
|
||||
override LDFLAGS += -lpthread
|
||||
|
||||
KOKKOS_USE_DEPRECATED_MAKEFILES=1
|
||||
|
||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||
|
||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
|
||||
|
||||
@ -107,8 +107,8 @@ int get_R(benchmark::State& state) {
|
||||
|
||||
template <class Scalar>
|
||||
static void CustomReduction(benchmark::State& state) {
|
||||
int N = get_N(state);
|
||||
int R = get_R(state);
|
||||
size_t N = get_N(state);
|
||||
size_t R = get_R(state);
|
||||
|
||||
for (auto _ : state) {
|
||||
auto results = custom_reduction_test<double>(N, R);
|
||||
|
||||
@ -38,12 +38,15 @@ void deepcopy_view(ViewTypeA& a, ViewTypeB& b, benchmark::State& state) {
|
||||
}
|
||||
}
|
||||
|
||||
template <class LayoutA, class LayoutB>
|
||||
template <
|
||||
class LayoutA, class LayoutB,
|
||||
class MemorySpaceA = typename Kokkos::DefaultExecutionSpace::memory_space,
|
||||
class MemorySpaceB = typename Kokkos::DefaultExecutionSpace::memory_space>
|
||||
static void ViewDeepCopy_Rank1(benchmark::State& state) {
|
||||
const int N8 = std::pow(state.range(0), 8);
|
||||
|
||||
Kokkos::View<double*, LayoutA> a("A1", N8);
|
||||
Kokkos::View<double*, LayoutB> b("B1", N8);
|
||||
Kokkos::View<double*, LayoutA, MemorySpaceA> a("A1", N8);
|
||||
Kokkos::View<double*, LayoutB, MemorySpaceB> b("B1", N8);
|
||||
|
||||
deepcopy_view(a, b, state);
|
||||
}
|
||||
@ -145,6 +148,29 @@ static void ViewDeepCopy_Raw(benchmark::State& state) {
|
||||
}
|
||||
}
|
||||
|
||||
template <typename DstMemorySpace, typename SrcMemorySpace>
|
||||
static void ViewDeepCopy_Rank1Strided(benchmark::State& state) {
|
||||
const size_t N8 = std::pow(state.range(0), 8);
|
||||
|
||||
// This benchmark allocates more data in order to measure a deep_copy
|
||||
// of the same size as the contiguous benchmarks, so in cases where they
|
||||
// can be run, this one may fail to allocate data (e.g., on a small CI runner)
|
||||
try {
|
||||
// allocate 2x the size since layout only has 1/2 the elements
|
||||
Kokkos::View<double*, DstMemorySpace> a("A1", N8 * 2);
|
||||
Kokkos::View<double*, SrcMemorySpace> b("B1", N8 * 2);
|
||||
|
||||
Kokkos::LayoutStride layout(N8 / 2, 2);
|
||||
Kokkos::View<double*, Kokkos::LayoutStride, DstMemorySpace> a_stride(
|
||||
a.data(), layout);
|
||||
Kokkos::View<double*, Kokkos::LayoutStride, SrcMemorySpace> b_stride(
|
||||
b.data(), layout);
|
||||
deepcopy_view(a_stride, b_stride, state);
|
||||
} catch (const std::runtime_error& e) {
|
||||
state.SkipWithError(e.what());
|
||||
}
|
||||
}
|
||||
|
||||
} // namespace Test
|
||||
|
||||
#endif
|
||||
|
||||
@ -18,6 +18,23 @@
|
||||
|
||||
namespace Test {
|
||||
|
||||
// host -> default
|
||||
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft,
|
||||
Kokkos::DefaultExecutionSpace::memory_space,
|
||||
Kokkos::DefaultHostExecutionSpace::memory_space>)
|
||||
->ArgName("N")
|
||||
->Arg(10)
|
||||
->UseManualTime();
|
||||
|
||||
// default -> host
|
||||
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft,
|
||||
Kokkos::DefaultHostExecutionSpace::memory_space,
|
||||
Kokkos::DefaultExecutionSpace::memory_space>)
|
||||
->ArgName("N")
|
||||
->Arg(10)
|
||||
->UseManualTime();
|
||||
|
||||
// default -> default
|
||||
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft>)
|
||||
->ArgName("N")
|
||||
->Arg(10)
|
||||
@ -33,4 +50,18 @@ BENCHMARK(ViewDeepCopy_Rank3<Kokkos::LayoutLeft, Kokkos::LayoutLeft>)
|
||||
->Arg(10)
|
||||
->UseManualTime();
|
||||
|
||||
BENCHMARK(
|
||||
ViewDeepCopy_Rank1Strided<Kokkos::DefaultExecutionSpace::memory_space,
|
||||
Kokkos::DefaultExecutionSpace::memory_space>)
|
||||
->ArgName("N")
|
||||
->Arg(10)
|
||||
->UseManualTime();
|
||||
|
||||
BENCHMARK(
|
||||
ViewDeepCopy_Rank1Strided<Kokkos::DefaultHostExecutionSpace::memory_space,
|
||||
Kokkos::DefaultHostExecutionSpace::memory_space>)
|
||||
->ArgName("N")
|
||||
->Arg(10)
|
||||
->UseManualTime();
|
||||
|
||||
} // namespace Test
|
||||
|
||||
Some files were not shown because too many files have changed in this diff