Update Kokkos library in LAMMPS to v4.6.0

Stan Moore
2025-03-28 15:29:14 -06:00
parent 48893236ec
commit b7b9a4a599
384 changed files with 13243 additions and 9477 deletions

@ -1,5 +1,72 @@
# CHANGELOG
## 4.6.00
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.5.01...4.6.00)
### Features:
* Kokkos::Graph: Allow adding tasks to the graph via a `then`-node [\#7629](https://github.com/kokkos/kokkos/pull/7629)
* Kokkos::Graph: Allow construction from CUDA/HIP graph [\#7664](https://github.com/kokkos/kokkos/pull/7664)
* HIP: Add experimental support for using multiple GPUs from one process [\#7130](https://github.com/kokkos/kokkos/pull/7130)
### Backend and Architecture Enhancements:
#### CUDA:
* Improved reduction performance, in particular on H100 and newer [\#7823](https://github.com/kokkos/kokkos/pull/7823)
#### HIP:
* Change block size deduction to prefer smaller blocks/teams [\#7509](https://github.com/kokkos/kokkos/pull/7509)
* Allocate memory with stream ordered semantics (i.e. use `hipMallocAsync`) [\#7659](https://github.com/kokkos/kokkos/pull/7659)
* Fix a segfault when a virtual function called inside a kernel requires too many registers [\#7660](https://github.com/kokkos/kokkos/pull/7660)
#### SYCL:
* Improve sorting performance for non-contiguous views [\#7502](https://github.com/kokkos/kokkos/pull/7502)
#### Serial:
* Reduce fences overhead when using `Kokkos_ENABLE_ATOMICS_BYPASS` [\#7821](https://github.com/kokkos/kokkos/pull/7821)
### General Enhancements
* Allow use of `kokkos_check` in `<PackageName>Config.cmake` without warnings [\#7669](https://github.com/kokkos/kokkos/pull/7669)
* Add simd compound assignments and update simd reductions [\#7486](https://github.com/kokkos/kokkos/pull/7486)
* Improve performance of the `inclusive_scan` algorithm with Cuda and HIP [\#7542](https://github.com/kokkos/kokkos/pull/7542)
* Reduce tooling interface overhead (don't pay for what you don't use) [\#7817](https://github.com/kokkos/kokkos/pull/7817)
* Avoid storing the view in `RandomAccessIterator` to increase performance [\#7304](https://github.com/kokkos/kokkos/pull/7304)
* Make `RandomAccessIterator` fulfill the `std::random_access_iterator` concept [\#7451](https://github.com/kokkos/kokkos/pull/7451)
* Include information about support for system allocated memory in `print_configuration` (Cuda and HIP) [\#7673](https://github.com/kokkos/kokkos/pull/7673)
### Build System Changes
* Add support for Zen 4 AMD microarchitecture [\#7550](https://github.com/kokkos/kokkos/pull/7550)
* Enable NVIDIA Grace architecture with NVHPC [\#7858](https://github.com/kokkos/kokkos/pull/7858)
* Support static library builds when using CUDA as CMake language [\#7830](https://github.com/kokkos/kokkos/pull/7830)
### Incompatibilities (i.e. breaking changes)
* Change SIMD comparison operator to return `simd_mask` instead of `bool` [\#7781](https://github.com/kokkos/kokkos/pull/7781)
* Remove classic Intel compiler (icpc) support [\#7737](https://github.com/kokkos/kokkos/pull/7737)
* Remove `operator[]` overloads of Kokkos `basic_simd` and `basic_simd_mask` that return a reference [\#7630](https://github.com/kokkos/kokkos/pull/7630)
### Deprecations
* Deprecate `StaticCrsGraph` and move it to Kokkos Kernels into `KokkosSparse::` [\#7516](https://github.com/kokkos/kokkos/pull/7516)
* Deprecate `native_simd` and hide `simd_abi` [\#7472](https://github.com/kokkos/kokkos/pull/7472)
* Deprecate Makefile support [\#7613](https://github.com/kokkos/kokkos/pull/7613)
* DualView: Deprecate direct access to d_view and h_view [\#7716](https://github.com/kokkos/kokkos/pull/7716)
### Bug Fixes
* Fix performance bug affecting `atomic_fetch_{add,sub,min,max,and,or,xor}` on integral types `long` and `unsigned long` with HIP [\#7816](https://github.com/kokkos/kokkos/pull/7816)
* Fix execution of ranges with more than 2B elements [\#7797](https://github.com/kokkos/kokkos/pull/7797)
* Fix clean target when embedding Kokkos in another project [\#7557](https://github.com/kokkos/kokkos/pull/7557)
* Fix Zen3 flag for NVHPC [\#7558](https://github.com/kokkos/kokkos/pull/7558)
* graph: nodes must be stored by the graph [\#7619](https://github.com/kokkos/kokkos/pull/7619)
* Make sure lock arrays are on device before launching a graph [\#7685](https://github.com/kokkos/kokkos/pull/7685)
* Performance bug in `RangePolicy`: construct error message if and only if the precondition is violated [\#7809](https://github.com/kokkos/kokkos/pull/7809)
* simd: fix a bug in scalar min/max [\#7813](https://github.com/kokkos/kokkos/pull/7813)
* simd: fix a bug in non-masked reductions [\#7845](https://github.com/kokkos/kokkos/pull/7845)
* Cuda: fix incorrect iteration in `MDRangePolicy` of rank > 4 for high iteration counts [\#7724](https://github.com/kokkos/kokkos/pull/7724)
* Cuda: ignore gcc assembler options in `nvcc-wrapper` [\#7492](https://github.com/kokkos/kokkos/pull/7492)
* Build system: hint to `ARCH_NATIVE` if ARMv9 Grace arch is not explicitly supported by the compiler [\#7862](https://github.com/kokkos/kokkos/pull/7862)
* Use right arch for MI300A in makefiles [\#7786](https://github.com/kokkos/kokkos/pull/7786)
* Fix compiling BasicView on MSVC [\#7751](https://github.com/kokkos/kokkos/pull/7751)
## 4.5.01
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.5.00...4.5.01)

@ -148,8 +148,8 @@ elseif(NOT CMAKE_SIZEOF_VOID_P EQUAL 8)
endif()
set(Kokkos_VERSION_MAJOR 4)
set(Kokkos_VERSION_MINOR 5)
set(Kokkos_VERSION_PATCH 1)
set(Kokkos_VERSION_MINOR 6)
set(Kokkos_VERSION_PATCH 0)
set(Kokkos_VERSION "${Kokkos_VERSION_MAJOR}.${Kokkos_VERSION_MINOR}.${Kokkos_VERSION_PATCH}")
message(STATUS "Kokkos version: ${Kokkos_VERSION}")
math(EXPR KOKKOS_VERSION "${Kokkos_VERSION_MAJOR} * 10000 + ${Kokkos_VERSION_MINOR} * 100 + ${Kokkos_VERSION_PATCH}")
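
The `math(EXPR ...)` line packs the three version components into a single integer, so 4.6.0 becomes 40600 and the previous 4.5.1 was 40501. A minimal C++ sketch of the same arithmetic is given here. `encode_version` and `kHasKokkos46` are hypothetical names used only for illustration, and the preprocessor guard assumes that downstream code sees the encoded value as a `KOKKOS_VERSION` macro after including a Kokkos header.

```cpp
// Sketch of the version encoding: major * 10000 + minor * 100 + patch.
// encode_version and kHasKokkos46 are illustrative names, not Kokkos API.
constexpr int encode_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Assumption: KOKKOS_VERSION carries the encoded value once a Kokkos header
// has been included. Without such a header the macro is undefined and the
// guard evaluates to false.
#if defined(KOKKOS_VERSION) && KOKKOS_VERSION >= 40600
inline constexpr bool kHasKokkos46 = true;
#else
inline constexpr bool kHasKokkos46 = false;
#endif
```

Because each component gets two decimal digits, the encoding stays ordered as long as minor and patch numbers are below 100.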

@ -0,0 +1,4 @@
set(CTEST_PROJECT_NAME Kokkos)
set(CTEST_NIGHTLY_START_TIME 01:00:00 UTC)
set(CTEST_SUBMIT_URL https://my.cdash.org/submit.php?project=Kokkos)
set(CTEST_DROP_SITE_CDASH TRUE)

@ -1,18 +1,26 @@
# Default settings common options.
#SPARTA specific settings:
#LAMMPS specific settings:
KOKKOS_USE_DEPRECATED_MAKEFILES=1
ifndef KOKKOS_PATH
KOKKOS_PATH=../../lib/kokkos
endif
CXXFLAGS=$(CCFLAGS)
ifeq ($(mode),shared)
CXXFLAGS += $(SHFLAGS)
CXXFLAGS += $(SHFLAGS)
endif
ifneq ($(KOKKOS_USE_DEPRECATED_MAKEFILES), 1)
$(error Makefile support is deprecated. Only CMake builds will be supported from Kokkos 5 on. Set KOKKOS_USE_DEPRECATED_MAKEFILES=1 to silence this error.)
endif
KOKKOS_VERSION_MAJOR = 4
KOKKOS_VERSION_MINOR = 5
KOKKOS_VERSION_PATCH = 1
KOKKOS_VERSION_MINOR = 6
KOKKOS_VERSION_PATCH = 0
KOKKOS_VERSION = $(shell echo $(KOKKOS_VERSION_MAJOR)*10000+$(KOKKOS_VERSION_MINOR)*100+$(KOKKOS_VERSION_PATCH) | bc)
# Options: Cuda,HIP,SYCL,OpenMPTarget,OpenMP,Threads,Serial
@ -24,7 +32,7 @@ KOKKOS_DEVICES ?= "OpenMP"
# ARM: ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX,ARMv9-Grace
# IBM: Power8,Power9
# AMD-GPUS: AMD_GFX906,AMD_GFX908,AMD_GFX90A,AMD_GFX940,AMD_GFX942,AMD_GFX942_APU,AMD_GFX1030,AMD_GFX1100,AMD_GFX1103
# AMD-CPUS: AMDAVX,Zen,Zen2,Zen3
# AMD-CPUS: AMDAVX,Zen,Zen2,Zen3,Zen4
# Intel-GPUs: Intel_Gen,Intel_Gen9,Intel_Gen11,Intel_Gen12LP,Intel_DG1,Intel_XeHP,Intel_PVC
KOKKOS_ARCH ?= ""
# Options: yes,no
@ -442,11 +450,14 @@ KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_
# AMD based.
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(call kokkos_has_string,$(KOKKOS_ARCH),AMDAVX)
KOKKOS_INTERNAL_USE_ARCH_ZEN4 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen4)
KOKKOS_INTERNAL_USE_ARCH_ZEN3 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen3)
KOKKOS_INTERNAL_USE_ARCH_ZEN2 := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen2)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 0)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN2), 0)
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN4), 0)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 0)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN2), 0)
KOKKOS_INTERNAL_USE_ARCH_ZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Zen)
endif
endif
endif
@ -463,8 +474,10 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX90A), 0)
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX90A := $(call kokkos_has_string,$(KOKKOS_ARCH),VEGA90A)
endif
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX940 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX940)
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942)
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942_APU := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942_APU)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942_APU), 0)
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX942 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX942)
endif
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030 := $(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX1030)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030), 0)
KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030 := $(call kokkos_has_string,$(KOKKOS_ARCH),NAVI1030)
@ -857,6 +870,19 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN3), 1)
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ZEN4), 1)
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_ZEN4")
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AVX512XEON")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX512
KOKKOS_LDFLAGS += -xCORE-AVX512
else
KOKKOS_CXXFLAGS += -march=znver4 -mtune=znver4
KOKKOS_LDFLAGS += -march=znver4 -mtune=znver4
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARMV80")
tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_ARMV8_THUNDERX")
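
The nested `ifeq` checks for Zen4, Zen3 and Zen2 in this Makefile only fall back to plain `Zen` when none of the more specific names matched. A rough C++ sketch of why the order matters follows, assuming that `kokkos_has_string` behaves like a substring search (its definition is not part of this diff). `ZenArch`, `has_string` and `detect_zen` are hypothetical names. The Makefile keeps one flag per name, whereas the sketch collapses them into a single result.

```cpp
#include <string_view>

// Illustrative names only; none of these exist in Kokkos.
enum class ZenArch { None, Zen, Zen2, Zen3, Zen4 };

// Assumed behaviour of kokkos_has_string: plain substring search.
constexpr bool has_string(std::string_view haystack, std::string_view needle) {
  return haystack.find(needle) != std::string_view::npos;
}

// Test the most specific names first so that "Zen4" is not reported as "Zen".
constexpr ZenArch detect_zen(std::string_view kokkos_arch) {
  if (has_string(kokkos_arch, "Zen4")) return ZenArch::Zen4;
  if (has_string(kokkos_arch, "Zen3")) return ZenArch::Zen3;
  if (has_string(kokkos_arch, "Zen2")) return ZenArch::Zen2;
  if (has_string(kokkos_arch, "Zen")) return ZenArch::Zen;
  return ZenArch::None;
}
```

Under the substring assumption, `has_string("Zen4", "Zen")` is true, which is why the generic name has to be checked last.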

@ -18,24 +18,24 @@ Kokkos is a [Linux Foundation](https://linuxfoundation.org) project.
To start learning about Kokkos:
- [Kokkos Lectures](https://kokkos.org/kokkos-core-wiki/videolectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
- [Kokkos Lectures](https://kokkos.org/kokkos-core-wiki/tutorials-and-examples/video-lectures.html): they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
- [Programming guide](https://kokkos.org/kokkos-core-wiki/programmingguide.html): contains in "narrative" form a technical description of the programming model, machine model, and the main building blocks like the Views and parallel dispatch.
- [API reference](https://kokkos.org/kokkos-core-wiki/): organized by category, i.e., [core](https://kokkos.org/kokkos-core-wiki/API/core-index.html), [algorithms](https://kokkos.org/kokkos-core-wiki/API/algorithms-index.html) and [containers](https://kokkos.org/kokkos-core-wiki/API/containers-index.html) or, if you prefer, in [alphabetical order](https://kokkos.org/kokkos-core-wiki/API/alphabetical.html).
- [Use cases and Examples](https://kokkos.org/kokkos-core-wiki/usecases.html): a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
- [Use cases and Examples](https://kokkos.org/kokkos-core-wiki/tutorials-and-examples/use-cases-and-examples.html): a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.
## Obtaining Kokkos
The latest release of Kokkos can be obtained from the [GitHub releases page](https://github.com/kokkos/kokkos/releases/latest).
The current release is [4.5.01](https://github.com/kokkos/kokkos/releases/tag/4.5.01).
The current release is [4.6.00](https://github.com/kokkos/kokkos/releases/tag/4.6.00).
```bash
curl -OJ -L https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz
curl -OJ -L https://github.com/kokkos/kokkos/releases/download/4.6.00/kokkos-4.6.00.tar.gz
# Or with wget
wget https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz
wget https://github.com/kokkos/kokkos/releases/download/4.6.00/kokkos-4.6.00.tar.gz
```
To clone the latest development version of Kokkos from GitHub:
@ -47,7 +47,7 @@ git clone -b develop https://github.com/kokkos/kokkos.git
### Building Kokkos
To build Kokkos, you will need to have a C++ compiler that supports C++17 or later.
All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.org/kokkos-core-wiki/requirements.html).
All requirements including minimum and primary tested compiler versions can be found [here](https://kokkos.org/kokkos-core-wiki/get-started/requirements.html).
Building and installation instructions are described [here](https://kokkos.org/kokkos-core-wiki/building.html).

@ -5,3 +5,7 @@ endif()
if(NOT ((KOKKOS_ENABLE_OPENMPTARGET AND KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC) OR KOKKOS_ENABLE_OPENACC))
kokkos_add_test_directories(unit_tests)
endif()
if(Kokkos_ENABLE_BENCHMARKS)
add_subdirectory(perf_test)
endif()

@ -0,0 +1,63 @@
# FIXME: The following logic should be moved from here and also from `core/perf_test/CMakeLists.txt` to
# the root `CMakeLists.txt` in the form of a macro
# Find or download google/benchmark library
find_package(benchmark QUIET 1.5.6)
if(benchmark_FOUND)
message(STATUS "Using google benchmark found in ${benchmark_DIR}")
else()
message(STATUS "No installed google benchmark found, fetching from GitHub")
include(FetchContent)
set(BENCHMARK_ENABLE_TESTING OFF)
list(APPEND CMAKE_MESSAGE_INDENT "[benchmark] ")
FetchContent_Declare(
googlebenchmark
DOWNLOAD_EXTRACT_TIMESTAMP FALSE
URL https://github.com/google/benchmark/archive/refs/tags/v1.7.1.tar.gz
URL_HASH MD5=0459a6c530df9851bee6504c3e37c2e7
)
FetchContent_MakeAvailable(googlebenchmark)
list(POP_BACK CMAKE_MESSAGE_INDENT)
# Suppress clang-tidy diagnostics on code that we do not have control over
if(CMAKE_CXX_CLANG_TIDY)
set_target_properties(benchmark PROPERTIES CXX_CLANG_TIDY "")
endif()
# FIXME: Check whether the following target_compile_options are needed.
# If so, clarify why.
target_compile_options(benchmark PRIVATE -w)
target_compile_options(benchmark_main PRIVATE -w)
endif()
# FIXME: This function should be moved from here and also from `core/perf_test/CMakeLists.txt` to
# the root `CMakeLists.txt`
# FIXME: Could NAME be a one_value_keyword specified in cmake_parse_arguments?
function(KOKKOS_ADD_BENCHMARK NAME)
cmake_parse_arguments(BENCHMARK "" "" "SOURCES" ${ARGN})
if(DEFINED BENCHMARK_UNPARSED_ARGUMENTS)
message(WARNING "Unexpected arguments when adding a benchmark: " ${BENCHMARK_UNPARSED_ARGUMENTS})
endif()
set(BENCHMARK_NAME Kokkos_${NAME})
# FIXME: BenchmarkMain.cpp and Benchmark_Context.cpp should be moved to a common location from which
# they can be used by all performance tests.
list(APPEND BENCHMARK_SOURCES ../../core/perf_test/BenchmarkMain.cpp ../../core/perf_test/Benchmark_Context.cpp)
add_executable(${BENCHMARK_NAME} ${BENCHMARK_SOURCES})
target_link_libraries(${BENCHMARK_NAME} PRIVATE benchmark::benchmark Kokkos::kokkos impl_git_version)
target_include_directories(${BENCHMARK_NAME} SYSTEM PRIVATE ${benchmark_SOURCE_DIR}/include)
# FIXME: This alone will not work. It might need an architecture and standard which need to be defined on target level.
# It will potentially go away with #7582.
foreach(SOURCE_FILE ${BENCHMARK_SOURCES})
set_source_files_properties(${SOURCE_FILE} PROPERTIES LANGUAGE ${KOKKOS_COMPILE_LANGUAGE})
endforeach()
string(TIMESTAMP BENCHMARK_TIME "%Y-%m-%d_T%H-%M-%S" UTC)
set(BENCHMARK_ARGS --benchmark_counters_tabular=true --benchmark_out=${BENCHMARK_NAME}_${BENCHMARK_TIME}.json)
add_test(NAME ${BENCHMARK_NAME} COMMAND ${BENCHMARK_NAME} ${BENCHMARK_ARGS})
endfunction()
kokkos_add_benchmark(PerformanceTest_InclusiveScan SOURCES test_inclusive_scan.cpp)

@ -0,0 +1,191 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 4.0
// Copyright (2022) National Technology & Engineering
// Solutions of Sandia, LLC (NTESS).
//
// Under the terms of Contract DE-NA0003525 with NTESS,
// the U.S. Government retains certain rights in this software.
//
// Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions.
// See https://kokkos.org/LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//@HEADER
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <benchmark/benchmark.h>
#include <Kokkos_Core.hpp>
#include <Kokkos_Timer.hpp>
#include <Kokkos_StdAlgorithms.hpp>
// FIXME: Benchmark_Context.hpp should be moved to a common location
#include "../../core/perf_test/Benchmark_Context.hpp"
namespace {
namespace KE = Kokkos::Experimental;
using ExecSpace = Kokkos::DefaultExecutionSpace;
using HostExecSpace = Kokkos::DefaultHostExecutionSpace;
// A tag struct to identify when inclusive scan with the implicit sum
// based binary operation needs to be called.
template <class ValueType>
struct ImpSumBinOp;
template <class ValueType>
struct SumFunctor {
KOKKOS_FUNCTION
ValueType operator()(const ValueType& a, const ValueType& b) const {
return (a + b);
}
};
template <class ValueType>
struct MaxFunctor {
KOKKOS_FUNCTION
ValueType operator()(const ValueType& a, const ValueType& b) const {
if (a > b)
return a;
else
return b;
}
};
// Helper to obtain last element of a view
template <class T>
T obtain_last_elem(const Kokkos::View<T*, ExecSpace>& v) {
T last_element;
Kokkos::deep_copy(last_element, Kokkos::subview(v, v.extent(0) - 1));
return last_element;
}
// Helper to allocate input and output views
template <class T>
auto prepare_views(const std::size_t kProbSize) {
Kokkos::View<T*, ExecSpace> in{"input", kProbSize};
Kokkos::View<T*, ExecSpace> out{"output", kProbSize};
auto h_in = Kokkos::create_mirror_view(in);
for (std::size_t i = 0; i < kProbSize; ++i) {
h_in(i) = i;
}
Kokkos::deep_copy(in, h_in);
return std::make_tuple(in, out, h_in);
}
// Perform scan with a reference implementation
template <class T, class ViewType, class ScanFunctor = SumFunctor<T>>
T ref_scan(const ViewType& h_in, ScanFunctor scan_functor = ScanFunctor()) {
std::size_t view_size = h_in.extent(0);
Kokkos::View<T*, HostExecSpace> h_out("output", view_size);
// FIXME: Our ORNL Jenkins CI still runs a GCC 8.4.0 based check.
// std::inclusive_scan is available only from GCC 9.3. The overload that
// takes an execution policy is available since GCC 9.1, but the
// <execution> header produces errors before GCC 10.1.
h_out(0) = h_in(0);
for (std::size_t i = 1; i < view_size; ++i) {
h_out(i) = scan_functor(h_in(i), h_out(i - 1));
}
return h_out(view_size - 1);
}
// Inclusive Scan with default binary operation (sum) or user provided functor
// Note: The nature of the functor must be compatible with the
// elements in the input and output views
template <class T, template <class> class ScanFunctor = ImpSumBinOp>
auto inclusive_scan(const Kokkos::View<T*, ExecSpace>& in,
const Kokkos::View<T*, ExecSpace>& out, T res_check) {
ExecSpace().fence();
Kokkos::Timer timer;
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
KE::inclusive_scan("Default scan", ExecSpace(), KE::cbegin(in),
KE::cend(in), KE::begin(out));
} else {
KE::inclusive_scan("Scan using a functor", ExecSpace(), KE::cbegin(in),
KE::cend(in), KE::begin(out), ScanFunctor<T>());
}
ExecSpace().fence();
double time_scan = timer.seconds();
T res_scan = obtain_last_elem(out);
bool passed = (res_check == res_scan);
return std::make_tuple(time_scan, passed);
}
// Benchmark: Inclusive Scan with default binary operation (sum)
// or user provided functor
template <class T, template <class> class ScanFunctor = ImpSumBinOp>
void BM_inclusive_scan(benchmark::State& state) {
const std::size_t kProbSize = state.range(0);
auto [in, out, h_in] = prepare_views<T>(kProbSize);
T res_check;
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
res_check = ref_scan<T>(h_in);
} else {
res_check = ref_scan<T>(h_in, ScanFunctor<T>());
}
double time_scan = 0.;
bool passed = false;
for (auto _ : state) {
if constexpr (std::is_same_v<ScanFunctor<T>, ImpSumBinOp<T>>) {
std::tie(time_scan, passed) = inclusive_scan<T>(in, out, res_check);
} else {
std::tie(time_scan, passed) =
inclusive_scan<T, ScanFunctor>(in, out, res_check);
}
KokkosBenchmark::report_results(state, in, 2, time_scan);
state.counters["Passed"] = passed;
}
}
constexpr std::size_t PROB_SIZE = 100'000'000;
} // anonymous namespace
// FIXME: Add logic to pass min. warm-up time. Also, the value should be set
// by the user. Say, via the environment variable BENCHMARK_MIN_WARMUP_TIME.
BENCHMARK(BM_inclusive_scan<std::uint64_t>)->Arg(PROB_SIZE)->UseManualTime();
BENCHMARK(BM_inclusive_scan<std::int64_t>)->Arg(PROB_SIZE)->UseManualTime();
BENCHMARK(BM_inclusive_scan<double>)->Arg(PROB_SIZE)->UseManualTime();
BENCHMARK(BM_inclusive_scan<std::uint64_t, SumFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
BENCHMARK(BM_inclusive_scan<std::int64_t, SumFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
BENCHMARK(BM_inclusive_scan<double, SumFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
BENCHMARK(BM_inclusive_scan<std::uint64_t, MaxFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
BENCHMARK(BM_inclusive_scan<std::int64_t, MaxFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
BENCHMARK(BM_inclusive_scan<double, MaxFunctor>)
->Arg(PROB_SIZE)
->UseManualTime();
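
The benchmark in this file checks each device scan against a serial reference (`ref_scan`) and compares only the last element. The sketch below reproduces that reference loop on a `std::array` so it can be evaluated at compile time. `SumOp`, `MaxOp` and `serial_inclusive_scan` are hypothetical names chosen for illustration; they are not part of the benchmark or of Kokkos.

```cpp
#include <array>
#include <cstddef>

// Illustrative binary operations mirroring the sum and max functors.
struct SumOp {
  template <class T>
  constexpr T operator()(const T& a, const T& b) const { return a + b; }
};

struct MaxOp {
  template <class T>
  constexpr T operator()(const T& a, const T& b) const { return a > b ? a : b; }
};

// Serial inclusive scan: out[i] combines in[i] with the running result out[i - 1].
template <class T, std::size_t N, class BinaryOp = SumOp>
constexpr std::array<T, N> serial_inclusive_scan(const std::array<T, N>& in,
                                                 BinaryOp op = BinaryOp{}) {
  static_assert(N > 0, "the sketch expects at least one element");
  std::array<T, N> out{};
  out[0] = in[0];
  for (std::size_t i = 1; i < N; ++i) {
    out[i] = op(in[i], out[i - 1]);
  }
  return out;
}
```

With the input `0, 1, 2, 3, 4` (the same `h_in(i) = i` fill pattern the benchmark uses), the sum scan yields `0, 1, 3, 6, 10`, so the value compared against the device result would be 10.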

@ -587,11 +587,13 @@ struct Random_XorShift1024_State<false> {
int state_idx)
: state_(&v(state_idx, 0)), stride_(v.stride_1()) {}
// NOLINTBEGIN(bugprone-implicit-widening-of-multiplication-result)
KOKKOS_FUNCTION
uint64_t operator[](const int i) const { return state_[i * stride_]; }
KOKKOS_FUNCTION
uint64_t& operator[](const int i) { return state_[i * stride_]; }
// NOLINTEND(bugprone-implicit-widening-of-multiplication-result)
};
template <class ExecutionSpace>
@ -670,7 +672,12 @@ struct Random_UniqueIndex<Kokkos::Device<Kokkos::SYCL, MemorySpace>> {
View<int**, Kokkos::Device<Kokkos::SYCL, MemorySpace>>;
KOKKOS_FUNCTION
static int get_state_idx(const locks_view_type& locks_) {
#if defined(KOKKOS_COMPILER_INTEL_LLVM) && \
KOKKOS_COMPILER_INTEL_LLVM >= 20250000
auto item = sycl::ext::oneapi::this_work_item::get_nd_item<3>();
#else
auto item = sycl::ext::oneapi::experimental::this_nd_item<3>();
#endif
std::size_t threadIdx[3] = {item.get_local_id(2), item.get_local_id(1),
item.get_local_id(0)};
std::size_t blockIdx[3] = {item.get_group(2), item.get_group(1),
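
The `NOLINTBEGIN`/`NOLINTEND` markers in the first hunk silence clang-tidy's `bugprone-implicit-widening-of-multiplication-result` check for `state_[i * stride_]`; the diff does not change the arithmetic itself. For readers unfamiliar with the check, the sketch below shows the pattern it asks for, assuming both operands are `int` (the declaration of `stride_` is not shown in this hunk). `widened_offset` is a hypothetical helper, not Kokkos code.

```cpp
#include <cstdint>

// Illustrative helper, not part of Kokkos: compute an element offset in 64 bits.
// Writing `std::int64_t off = i * stride;` would multiply in `int` first and
// only widen the (possibly overflowed) result afterwards.
constexpr std::int64_t widened_offset(int i, int stride) {
  // Casting one operand first makes the multiplication itself 64-bit.
  return static_cast<std::int64_t>(i) * stride;
}
```

For small indices and strides both forms give the same result, which is presumably why a suppression is acceptable in the code above; the difference only shows up once the product exceeds the range of `int`.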

@ -45,7 +45,7 @@ struct BinOp1D {
// For integral types the number of bins may be larger than the range
// in which case we can exactly have one unique value per bin
// and then don't need to sort bins.
if (std::is_integral<typename KeyViewType::const_value_type>::value &&
if (std::is_integral_v<typename KeyViewType::const_value_type> &&
(static_cast<double>(max) - static_cast<double>(min)) <=
static_cast<double>(max_bins)) {
mul_ = 1.;

@ -53,13 +53,9 @@ void sort(const ExecutionSpace& exec,
if constexpr (Impl::better_off_calling_std_sort_v<ExecutionSpace>) {
exec.fence("Kokkos::sort without comparator use std::sort");
if (view.span_is_contiguous()) {
std::sort(view.data(), view.data() + view.size());
} else {
auto first = ::Kokkos::Experimental::begin(view);
auto last = ::Kokkos::Experimental::end(view);
std::sort(first, last);
}
auto first = ::Kokkos::Experimental::begin(view);
auto last = ::Kokkos::Experimental::end(view);
std::sort(first, last);
} else {
Impl::sort_device_view_without_comparator(exec, view);
}
@ -111,13 +107,9 @@ void sort(const ExecutionSpace& exec,
if constexpr (Impl::better_off_calling_std_sort_v<ExecutionSpace>) {
exec.fence("Kokkos::sort with comparator use std::sort");
if (view.span_is_contiguous()) {
std::sort(view.data(), view.data() + view.size(), comparator);
} else {
auto first = ::Kokkos::Experimental::begin(view);
auto last = ::Kokkos::Experimental::end(view);
std::sort(first, last, comparator);
}
auto first = ::Kokkos::Experimental::begin(view);
auto last = ::Kokkos::Experimental::end(view);
std::sort(first, last, comparator);
} else {
Impl::sort_device_view_with_comparator(exec, view, comparator);
}
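
Both hunks in this file drop the `span_is_contiguous()` special case, so the host path always calls `std::sort` on the iterators returned by `Kokkos::Experimental::begin` and `end`. The sketch below shows, in plain C++, that `std::sort` only needs a random access iterator and works on strided data as well. `StridedIterator` and `sort_strided` are hypothetical stand-ins written for this illustration; they are not Kokkos's `RandomAccessIterator`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an iterator over a strided rank-1 view.
template <class T>
class StridedIterator {
 public:
  using iterator_category = std::random_access_iterator_tag;
  using value_type        = T;
  using difference_type   = std::ptrdiff_t;
  using pointer           = T*;
  using reference         = T&;

  StridedIterator() = default;
  StridedIterator(T* ptr, difference_type stride) : ptr_(ptr), stride_(stride) {}

  reference operator*() const { return *ptr_; }
  reference operator[](difference_type n) const { return ptr_[n * stride_]; }

  StridedIterator& operator++() { ptr_ += stride_; return *this; }
  StridedIterator operator++(int) { StridedIterator t = *this; ++*this; return t; }
  StridedIterator& operator--() { ptr_ -= stride_; return *this; }
  StridedIterator operator--(int) { StridedIterator t = *this; --*this; return t; }
  StridedIterator& operator+=(difference_type n) { ptr_ += n * stride_; return *this; }
  StridedIterator& operator-=(difference_type n) { ptr_ -= n * stride_; return *this; }

  friend StridedIterator operator+(StridedIterator it, difference_type n) { return it += n; }
  friend StridedIterator operator+(difference_type n, StridedIterator it) { return it += n; }
  friend StridedIterator operator-(StridedIterator it, difference_type n) { return it -= n; }
  // Distance is measured in logical elements, not in raw pointer steps.
  friend difference_type operator-(const StridedIterator& a, const StridedIterator& b) {
    return (a.ptr_ - b.ptr_) / a.stride_;
  }

  friend bool operator==(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ == b.ptr_; }
  friend bool operator!=(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ != b.ptr_; }
  friend bool operator<(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ < b.ptr_; }
  friend bool operator>(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ > b.ptr_; }
  friend bool operator<=(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ <= b.ptr_; }
  friend bool operator>=(const StridedIterator& a, const StridedIterator& b) { return a.ptr_ >= b.ptr_; }

 private:
  T* ptr_                 = nullptr;
  difference_type stride_ = 1;
};

// Sort every stride-th element of v (starting at index 0) and return the result.
// The elements in between are left untouched.
inline std::vector<int> sort_strided(std::vector<int> v, std::ptrdiff_t stride) {
  assert(stride > 0 && v.size() % static_cast<std::size_t>(stride) == 0);
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size()) / stride;
  StridedIterator<int> first(v.data(), stride);
  std::sort(first, first + n);
  return v;
}
```

Calling `sort_strided({9, 0, 7, 0, 5, 0, 3, 0}, 2)` sorts the values at the even positions and leaves the zeros in place. With a stride of 1 the iterator degenerates to a plain pointer walk, which is the contiguous case the removed branch used to handle separately.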

@ -47,6 +47,7 @@
#ifdef _CubLog
#undef _CubLog
#endif
// NOLINTNEXTLINE(bugprone-reserved-identifier)
#define _CubLog
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
@ -65,12 +66,24 @@
#include <thrust/sort.h>
#endif
#if defined(KOKKOS_ENABLE_ONEDPL) && \
(ONEDPL_VERSION_MAJOR > 2022 || \
(ONEDPL_VERSION_MAJOR == 2022 && ONEDPL_VERSION_MINOR >= 2))
#define KOKKOS_ONEDPL_HAS_SORT_BY_KEY
#ifdef KOKKOS_ENABLE_ONEDPL
#define KOKKOS_IMPL_ONEDPL_VERSION \
ONEDPL_VERSION_MAJOR * 10000 + ONEDPL_VERSION_MINOR * 100 + \
ONEDPL_VERSION_PATCH
#define KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(MAJOR, MINOR, PATCH) \
(KOKKOS_IMPL_ONEDPL_VERSION >= ((MAJOR)*10000 + (MINOR)*100 + (PATCH)))
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 2, 0)
#define KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"
#pragma GCC diagnostic ignored "-Wunused-local-typedef"
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#pragma GCC diagnostic pop
#endif
#endif
namespace Kokkos::Impl {
@ -141,12 +154,18 @@ void sort_by_key_rocthrust(
#endif
#if defined(KOKKOS_ENABLE_ONEDPL)
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
template <class Layout>
inline constexpr bool sort_on_device_v<Kokkos::SYCL, Layout> = true;
#else
template <class Layout>
inline constexpr bool sort_on_device_v<Kokkos::SYCL, Layout> =
std::is_same_v<Layout, Kokkos::LayoutLeft> ||
std::is_same_v<Layout, Kokkos::LayoutRight>;
#endif
#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
template <class KeysDataType, class... KeysProperties, class ValuesDataType,
class... ValuesProperties, class... MaybeComparator>
void sort_by_key_onedpl(
@ -154,6 +173,14 @@ void sort_by_key_onedpl(
const Kokkos::View<KeysDataType, KeysProperties...>& keys,
const Kokkos::View<ValuesDataType, ValuesProperties...>& values,
MaybeComparator&&... maybeComparator) {
auto queue = exec.sycl_queue();
auto policy = oneapi::dpl::execution::make_device_policy(queue);
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
oneapi::dpl::sort_by_key(policy, ::Kokkos::Experimental::begin(keys),
::Kokkos::Experimental::end(keys),
::Kokkos::Experimental::begin(values),
std::forward<MaybeComparator>(maybeComparator)...);
#else
if (keys.stride(0) != 1 && values.stride(0) != 1) {
Kokkos::abort(
"SYCL sort_by_key only supports rank-1 Views with stride(0) = 1.");
@ -161,11 +188,10 @@ void sort_by_key_onedpl(
// Can't use Experimental::begin/end here since the oneDPL then assumes that
// the data is on the host.
auto queue = exec.sycl_queue();
auto policy = oneapi::dpl::execution::make_device_policy(queue);
const int n = keys.extent(0);
oneapi::dpl::sort_by_key(policy, keys.data(), keys.data() + n, values.data(),
std::forward<MaybeComparator>(maybeComparator)...);
#endif
}
#endif
#endif
@ -336,12 +362,18 @@ void sort_by_key_device_view_without_comparator(
const Kokkos::SYCL& exec,
const Kokkos::View<KeysDataType, KeysProperties...>& keys,
const Kokkos::View<ValuesDataType, ValuesProperties...>& values) {
#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
sort_by_key_onedpl(exec, keys, values);
#else
if (keys.stride(0) == 1 && values.stride(0) == 1)
sort_by_key_onedpl(exec, keys, values);
else
#endif
sort_by_key_via_sort(exec, keys, values);
#endif
#else
sort_by_key_via_sort(exec, keys, values);
#endif
}
#endif
@ -394,12 +426,18 @@ void sort_by_key_device_view_with_comparator(
const Kokkos::View<KeysDataType, KeysProperties...>& keys,
const Kokkos::View<ValuesDataType, ValuesProperties...>& values,
const ComparatorType& comparator) {
#ifdef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
#ifdef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
sort_by_key_onedpl(exec, keys, values, comparator);
#else
if (keys.stride(0) == 1 && values.stride(0) == 1)
sort_by_key_onedpl(exec, keys, values, comparator);
else
#endif
sort_by_key_via_sort(exec, keys, values, comparator);
#endif
#else
sort_by_key_via_sort(exec, keys, values, comparator);
#endif
}
#endif
@ -416,7 +454,9 @@ sort_by_key_device_view_with_comparator(
sort_by_key_via_sort(exec, keys, values, comparator);
}
#undef KOKKOS_ONEDPL_HAS_SORT_BY_KEY
#undef KOKKOS_IMPL_ONEDPL_HAS_SORT_BY_KEY
} // namespace Kokkos::Impl
#undef KOKKOS_IMPL_ONEDPL_VERSION
#undef KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL
#endif

@ -51,6 +51,7 @@
#ifdef _CubLog
#undef _CubLog
#endif
// NOLINTNEXTLINE(bugprone-reserved-identifier)
#define _CubLog
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
@ -70,8 +71,20 @@
#endif
#if defined(KOKKOS_ENABLE_ONEDPL)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"
#pragma GCC diagnostic ignored "-Wunused-local-typedef"
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#pragma GCC diagnostic pop
#define KOKKOS_IMPL_ONEDPL_VERSION \
ONEDPL_VERSION_MAJOR * 10000 + ONEDPL_VERSION_MINOR * 100 + \
ONEDPL_VERSION_PATCH
#define KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(MAJOR, MINOR, PATCH) \
(KOKKOS_IMPL_ONEDPL_VERSION >= ((MAJOR)*10000 + (MINOR)*100 + (PATCH)))
#endif
namespace Kokkos {
@ -221,6 +234,10 @@ void sort_onedpl(const Kokkos::SYCL& space,
"SYCL execution space is not able to access the memory space "
"of the View argument!");
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
static_assert(ViewType::rank == 1,
"Kokkos::sort currently only supports rank-1 Views.");
#else
static_assert(
(ViewType::rank == 1) &&
(std::is_same_v<typename ViewType::array_layout, LayoutRight> ||
@ -234,18 +251,26 @@ void sort_onedpl(const Kokkos::SYCL& space,
if (view.stride(0) != 1) {
Kokkos::abort("SYCL sort only supports rank-1 Views with stride(0) = 1.");
}
#endif
if (view.extent(0) <= 1) {
return;
}
// Can't use Experimental::begin/end here since the oneDPL then assumes that
// the data is on the host.
auto queue = space.sycl_queue();
auto policy = oneapi::dpl::execution::make_device_policy(queue);
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
oneapi::dpl::sort(policy, ::Kokkos::Experimental::begin(view),
::Kokkos::Experimental::end(view),
std::forward<MaybeComparator>(maybeComparator)...);
#else
// Can't use Experimental::begin/end here since the oneDPL then assumes that
// the data is on the host.
const int n = view.extent(0);
oneapi::dpl::sort(policy, view.data(), view.data() + n,
std::forward<MaybeComparator>(maybeComparator)...);
#endif
}
#endif
@ -269,29 +294,19 @@ void copy_to_host_run_stdsort_copy_back(
KE::copy(exec, view, view_dc);
// run sort on the mirror of view_dc
auto mv_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view_dc);
if (view.span_is_contiguous()) {
std::sort(mv_h.data(), mv_h.data() + mv_h.size(),
std::forward<MaybeComparator>(maybeComparator)...);
} else {
auto first = KE::begin(mv_h);
auto last = KE::end(mv_h);
std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
}
auto mv_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view_dc);
auto first = KE::begin(mv_h);
auto last = KE::end(mv_h);
std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
Kokkos::deep_copy(exec, view_dc, mv_h);
// copy back to argument view
KE::copy(exec, KE::cbegin(view_dc), KE::cend(view_dc), KE::begin(view));
} else {
auto view_h = create_mirror_view_and_copy(Kokkos::HostSpace(), view);
if (view.span_is_contiguous()) {
std::sort(view_h.data(), view_h.data() + view_h.size(),
std::forward<MaybeComparator>(maybeComparator)...);
} else {
auto first = KE::begin(view_h);
auto last = KE::end(view_h);
std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
}
auto first = KE::begin(view_h);
auto last = KE::end(view_h);
std::sort(first, last, std::forward<MaybeComparator>(maybeComparator)...);
Kokkos::deep_copy(exec, view, view_h);
}
}
@ -332,11 +347,15 @@ void sort_device_view_without_comparator(
"sort_device_view_without_comparator: supports rank-1 Views "
"with LayoutLeft, LayoutRight or LayoutStride");
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
sort_onedpl(exec, view);
#else
if (view.stride(0) == 1) {
sort_onedpl(exec, view);
} else {
copy_to_host_run_stdsort_copy_back(exec, view);
}
#endif
}
#endif
@ -387,11 +406,15 @@ void sort_device_view_with_comparator(
"sort_device_view_with_comparator: supports rank-1 Views "
"with LayoutLeft, LayoutRight or LayoutStride");
#if KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL(2022, 7, 1)
sort_onedpl(exec, view, comparator);
#else
if (view.stride(0) == 1) {
sort_onedpl(exec, view, comparator);
} else {
copy_to_host_run_stdsort_copy_back(exec, view, comparator);
}
#endif
}
#endif
@ -423,4 +446,7 @@ sort_device_view_with_comparator(
} // namespace Impl
} // namespace Kokkos
#undef KOKKOS_IMPL_ONEDPL_VERSION
#undef KOKKOS_IMPL_ONEDPL_VERSION_GREATER_EQUAL
#endif
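
For older oneDPL versions, the hunks above keep a fallback for views whose stride is not 1: copy the data into a contiguous host mirror, run `std::sort`, and copy the result back. The following is a minimal sketch of that gather/sort/scatter idea in plain C++, without Kokkos or oneDPL. The function name `sort_strided_via_copy` and the use of `std::vector<int>` are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Kokkos): sorts the `count` logical elements
// located at buffer[0], buffer[stride], buffer[2 * stride], ... and leaves all
// other entries of `buffer` untouched.
inline void sort_strided_via_copy(std::vector<int>& buffer, std::size_t count,
                                  std::size_t stride) {
  assert(stride > 0);
  assert(count == 0 || (count - 1) * stride < buffer.size());

  // Gather the strided elements into a contiguous temporary
  // (this plays the role of the host mirror).
  std::vector<int> packed(count);
  for (std::size_t i = 0; i < count; ++i) packed[i] = buffer[i * stride];

  // Sort the contiguous copy.
  std::sort(packed.begin(), packed.end());

  // Scatter the sorted values back to their strided positions.
  for (std::size_t i = 0; i < count; ++i) buffer[i * stride] = packed[i];
}
```

The extra copies are the price of this path, which may be why the diff calls `sort_onedpl` directly when the oneDPL version check passes.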

View File

@ -238,12 +238,9 @@ KOKKOS_INLINE_FUNCTION void expect_no_overlap(
[[maybe_unused]] IteratorType2 s_first) {
if constexpr (is_kokkos_iterator_v<IteratorType1> &&
is_kokkos_iterator_v<IteratorType2>) {
auto const view1 = first.view();
auto const view2 = s_first.view();
std::size_t stride1 = view1.stride(0);
std::size_t stride2 = view2.stride(0);
ptrdiff_t first_diff = view1.data() - view2.data();
std::size_t stride1 = first.stride();
std::size_t stride2 = s_first.stride();
ptrdiff_t first_diff = first.data() - s_first.data();
// FIXME If strides are not identical, checks may not be made
// with the cost of O(1)
@ -251,8 +248,8 @@ KOKKOS_INLINE_FUNCTION void expect_no_overlap(
// If first_diff == 0, there is already an overlap
if (stride1 == stride2 || first_diff == 0) {
[[maybe_unused]] bool is_no_overlap = (first_diff % stride1);
auto* first_pointer1 = view1.data();
auto* first_pointer2 = view2.data();
auto* first_pointer1 = first.data();
auto* first_pointer2 = s_first.data();
[[maybe_unused]] auto* last_pointer1 = first_pointer1 + (last - first);
[[maybe_unused]] auto* last_pointer2 = first_pointer2 + (last - first);
KOKKOS_EXPECTS(first_pointer1 >= last_pointer2 ||

View File

@ -150,9 +150,8 @@ KOKKOS_FUNCTION OutputIterator copy_if_team_impl(
return d_first + count;
}
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}

View File

@ -103,7 +103,7 @@ OutputIteratorType exclusive_scan_custom_op_exespace_impl(
// aliases
using index_type = typename InputIteratorType::difference_type;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using func_type = TransformExclusiveScanFunctorWithValueWrapper<
ExecutionSpace, index_type, ValueType, InputIteratorType,
OutputIteratorType, BinaryOpType, unary_op_type>;
@ -177,7 +177,7 @@ KOKKOS_FUNCTION OutputIteratorType exclusive_scan_custom_op_team_impl(
// aliases
using exe_space = typename TeamHandleType::execution_space;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using index_type = typename InputIteratorType::difference_type;
using func_type = TransformExclusiveScanFunctorWithoutValueWrapper<
exe_space, index_type, ValueType, InputIteratorType, OutputIteratorType,

View File

@ -23,10 +23,11 @@ namespace Kokkos {
namespace Experimental {
namespace Impl {
template <class ValueType>
struct StdNumericScanIdentityReferenceUnaryFunctor {
KOKKOS_FUNCTION
constexpr const ValueType& operator()(const ValueType& a) const { return a; }
template <class T>
KOKKOS_FUNCTION constexpr T&& operator()(T&& t) const {
return static_cast<T&&>(t);
}
};
} // namespace Impl
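
The hunk above replaces a functor templated on `ValueType` with a non-templated one whose call operator is itself a template and forwards its argument. The sketch below reproduces that shape without Kokkos to show what the return type looks like for lvalue and rvalue arguments. The name `IdentityForwardingFunctor` and the helper `identity_returns_same_object` are hypothetical, and `KOKKOS_FUNCTION` is omitted so the code compiles on its own.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the functor in the diff (no KOKKOS_FUNCTION).
struct IdentityForwardingFunctor {
  template <class T>
  constexpr T&& operator()(T&& t) const {
    return static_cast<T&&>(t);
  }
};

using Identity = IdentityForwardingFunctor;

// An lvalue argument comes back as an lvalue reference.
static_assert(std::is_same_v<
              decltype(std::declval<Identity>()(std::declval<int&>())), int&>);
// Constness of the argument is preserved.
static_assert(
    std::is_same_v<decltype(std::declval<Identity>()(
                       std::declval<const double&>())),
                   const double&>);
// An rvalue argument comes back as an rvalue reference.
static_assert(std::is_same_v<
              decltype(std::declval<Identity>()(std::declval<int>())), int&&>);

// Runtime check: the functor returns a reference to the very same object.
inline bool identity_returns_same_object() {
  int x = 42;
  Identity f;
  int& ref = f(x);
  assert(ref == 42);
  return &ref == &x;
}
```

Because the value type is deduced per call, the alias `unary_op_type` no longer needs a template argument, which matches the call-site changes elsewhere in this diff.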

View File

@ -18,12 +18,60 @@
#define KOKKOS_STD_ALGORITHMS_INCLUSIVE_SCAN_IMPL_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Profiling_ScopedRegion.hpp>
#include "Kokkos_Constraints.hpp"
#include "Kokkos_HelperPredicates.hpp"
#include <std_algorithms/Kokkos_TransformInclusiveScan.hpp>
#include <std_algorithms/Kokkos_Distance.hpp>
#include <string>
#if defined(KOKKOS_ENABLE_CUDA)
// Workaround for `Instruction 'shfl' without '.sync' is not supported on
// .target sm_70 and higher from PTX ISA version 6.4`.
// Also see https://github.com/NVIDIA/cub/pull/170.
#if !defined(CUB_USE_COOPERATIVE_GROUPS)
#define CUB_USE_COOPERATIVE_GROUPS
#endif
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"
#pragma GCC diagnostic ignored "-Wsuggest-override"
#if defined(KOKKOS_COMPILER_CLANG)
// Some versions of Clang fail to compile Thrust, failing with errors like
// this:
// <snip>/thrust/system/cuda/detail/core/agent_launcher.h:557:11:
// error: use of undeclared identifier 'va_printf'
// The exact combination of Clang and Thrust (or CUDA) versions that triggers
// this failure was not investigated; however, even a very recent combination
// (Clang 10.0.0 and CUDA 10.0) demonstrated the failure.
//
// Defining _CubLog locally avoids that code path, at the cost of disabling
// some debugging diagnostics.
#pragma push_macro("_CubLog")
#ifdef _CubLog
#undef _CubLog
#endif
// NOLINTNEXTLINE(bugprone-reserved-identifier)
#define _CubLog
#include <thrust/distance.h>
#include <thrust/scan.h>
#pragma pop_macro("_CubLog")
#else
#include <thrust/distance.h>
#include <thrust/scan.h>
#endif
#pragma GCC diagnostic pop
#endif
#if defined(KOKKOS_ENABLE_ROCTHRUST)
#include <thrust/distance.h>
#include <thrust/scan.h>
#endif
namespace Kokkos {
namespace Experimental {
namespace Impl {
@ -101,9 +149,48 @@ struct InclusiveScanDefaultFunctor {
}
};
//
// exespace impl
//
// -------------------------------------------------------------
// inclusive_scan_default_op_exespace_impl
// -------------------------------------------------------------
#if defined(KOKKOS_ENABLE_CUDA)
template <class InputIteratorType, class OutputIteratorType>
OutputIteratorType inclusive_scan_default_op_exespace_impl(
const std::string& label, const Cuda& ex, InputIteratorType first_from,
InputIteratorType last_from, OutputIteratorType first_dest) {
const auto thrust_ex = thrust::cuda::par.on(ex.cuda_stream());
Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest);
Kokkos::Profiling::popRegion();
const auto num_elements = thrust::distance(first_from, last_from);
return first_dest + num_elements;
}
#endif
#if defined(KOKKOS_ENABLE_ROCTHRUST)
template <class InputIteratorType, class OutputIteratorType>
OutputIteratorType inclusive_scan_default_op_exespace_impl(
const std::string& label, const HIP& ex, InputIteratorType first_from,
InputIteratorType last_from, OutputIteratorType first_dest) {
const auto thrust_ex = thrust::hip::par.on(ex.hip_stream());
Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest);
Kokkos::Profiling::popRegion();
const auto num_elements = thrust::distance(first_from, last_from);
return first_dest + num_elements;
}
#endif
template <class ExecutionSpace, class InputIteratorType,
class OutputIteratorType>
OutputIteratorType inclusive_scan_default_op_exespace_impl(
@ -132,11 +219,16 @@ OutputIteratorType inclusive_scan_default_op_exespace_impl(
// run
const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from);
Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
::Kokkos::parallel_scan(label,
RangePolicy<ExecutionSpace>(ex, 0, num_elements),
func_type(first_from, first_dest));
ex.fence("Kokkos::inclusive_scan_default_op: fence after operation");
Kokkos::Profiling::popRegion();
// return
return first_dest + num_elements;
}
@ -144,6 +236,49 @@ OutputIteratorType inclusive_scan_default_op_exespace_impl(
// -------------------------------------------------------------
// inclusive_scan_custom_binary_op_impl
// -------------------------------------------------------------
#if defined(KOKKOS_ENABLE_CUDA)
template <class InputIteratorType, class OutputIteratorType, class BinaryOpType>
OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
const std::string& label, const Cuda& ex, InputIteratorType first_from,
InputIteratorType last_from, OutputIteratorType first_dest,
BinaryOpType binary_op) {
const auto thrust_ex = thrust::cuda::par.on(ex.cuda_stream());
Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest,
binary_op);
Kokkos::Profiling::popRegion();
const auto num_elements = thrust::distance(first_from, last_from);
return first_dest + num_elements;
}
#endif
#if defined(KOKKOS_ENABLE_ROCTHRUST)
template <class InputIteratorType, class OutputIteratorType, class BinaryOpType>
OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
const std::string& label, const HIP& ex, InputIteratorType first_from,
InputIteratorType last_from, OutputIteratorType first_dest,
BinaryOpType binary_op) {
const auto thrust_ex = thrust::hip::par.on(ex.hip_stream());
Kokkos::Profiling::pushRegion(label + " via thrust::inclusive_scan");
thrust::inclusive_scan(thrust_ex, first_from, last_from, first_dest,
binary_op);
Kokkos::Profiling::popRegion();
const auto num_elements = thrust::distance(first_from, last_from);
return first_dest + num_elements;
}
#endif
template <class ExecutionSpace, class InputIteratorType,
class OutputIteratorType, class BinaryOpType>
OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
@ -160,7 +295,7 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
using index_type = typename InputIteratorType::difference_type;
using value_type =
std::remove_const_t<typename InputIteratorType::value_type>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<value_type>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using func_type = ExeSpaceTransformInclusiveScanNoInitValueFunctor<
ExecutionSpace, index_type, value_type, InputIteratorType,
OutputIteratorType, BinaryOpType, unary_op_type>;
@ -168,11 +303,16 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
// run
const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from);
Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
::Kokkos::parallel_scan(
label, RangePolicy<ExecutionSpace>(ex, 0, num_elements),
func_type(first_from, first_dest, binary_op, unary_op_type()));
ex.fence("Kokkos::inclusive_scan_custom_binary_op: fence after operation");
Kokkos::Profiling::popRegion();
// return
return first_dest + num_elements;
}
@ -195,7 +335,7 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
// aliases
using index_type = typename InputIteratorType::difference_type;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using func_type = ExeSpaceTransformInclusiveScanWithInitValueFunctor<
ExecutionSpace, index_type, ValueType, InputIteratorType,
OutputIteratorType, BinaryOpType, unary_op_type>;
@ -203,12 +343,17 @@ OutputIteratorType inclusive_scan_custom_binary_op_exespace_impl(
// run
const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from);
Kokkos::Profiling::pushRegion(label + " via Kokkos::parallel_scan");
::Kokkos::parallel_scan(label,
RangePolicy<ExecutionSpace>(ex, 0, num_elements),
func_type(first_from, first_dest, binary_op,
unary_op_type(), std::move(init_value)));
ex.fence("Kokkos::inclusive_scan_custom_binary_op: fence after operation");
Kokkos::Profiling::popRegion();
// return
return first_dest + num_elements;
}
@ -283,7 +428,7 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(
// aliases
using exe_space = typename TeamHandleType::execution_space;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<value_type>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using func_type = TeamTransformInclusiveScanNoInitValueFunctor<
exe_space, value_type, InputIteratorType, OutputIteratorType,
BinaryOpType, unary_op_type>;
@ -291,7 +436,6 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(
// run
const auto num_elements =
Kokkos::Experimental::distance(first_from, last_from);
::Kokkos::parallel_scan(
TeamThreadRange(teamHandle, 0, num_elements),
func_type(first_from, first_dest, binary_op, unary_op_type()));
@ -325,7 +469,7 @@ KOKKOS_FUNCTION OutputIteratorType inclusive_scan_custom_binary_op_team_impl(
// aliases
using exe_space = typename TeamHandleType::execution_space;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor<ValueType>;
using unary_op_type = StdNumericScanIdentityReferenceUnaryFunctor;
using func_type = TeamTransformInclusiveScanWithInitValueFunctor<
exe_space, ValueType, InputIteratorType, OutputIteratorType, BinaryOpType,
unary_op_type>;
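
The hunks above add overloads that take a concrete execution space (`const Cuda&` or `const HIP&`) and call `thrust::inclusive_scan`, next to the generic overload templated on `ExecutionSpace` that uses `Kokkos::parallel_scan`. The sketch below shows the same overload pattern in plain C++: when both overloads are viable, partial ordering of function templates selects the one with the concrete backend parameter. The tag types `FastBackend` and `GenericBackend`, the `path` out-parameter, and the helper `scan_path` are all hypothetical and exist only to make the chosen overload observable.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical tags standing in for execution spaces.
struct FastBackend {};
struct GenericBackend {};

// Backend-specific overload, analogous to the Cuda/HIP overloads in the diff.
// std::partial_sum plays the role of the vendor library call.
template <class InIt, class OutIt>
OutIt inclusive_scan_impl(std::string& path, const FastBackend&, InIt first,
                          InIt last, OutIt dest) {
  path = "vendor";
  return std::partial_sum(first, last, dest);
}

// Generic overload, analogous to the parallel_scan-based implementation.
template <class Exec, class InIt, class OutIt>
OutIt inclusive_scan_impl(std::string& path, const Exec&, InIt first,
                          InIt last, OutIt dest) {
  path = "generic";
  if (first == last) return dest;
  auto running = *first;
  *dest = running;
  ++first;
  ++dest;
  for (; first != last; ++first, ++dest) {
    running = running + *first;
    *dest = running;
  }
  return dest;
}

// Runs the scan and reports which overload was selected.
template <class Exec>
std::string scan_path(const Exec& exec, const std::vector<int>& in,
                      std::vector<int>& out) {
  assert(out.size() >= in.size());
  std::string path;
  inclusive_scan_impl(path, exec, in.begin(), in.end(), out.begin());
  return path;
}
```

Both paths must produce the same result; only the implementation behind them differs.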

View File

@ -18,6 +18,7 @@
#define KOKKOS_RANDOM_ACCESS_ITERATOR_IMPL_HPP
#include <iterator>
#include <utility> // declval
#include <Kokkos_Macros.hpp>
#include <Kokkos_View.hpp>
#include "Kokkos_Constraints.hpp"
@ -29,8 +30,29 @@ namespace Impl {
template <class T>
class RandomAccessIterator;
namespace {
template <typename ViewType>
struct is_always_strided {
static_assert(is_view_v<ViewType>);
constexpr static bool value =
#ifdef KOKKOS_ENABLE_IMPL_MDSPAN
decltype(std::declval<ViewType>().to_mdspan())::is_always_strided();
#else
(std::is_same_v<typename ViewType::traits::array_layout,
Kokkos::LayoutLeft> ||
std::is_same_v<typename ViewType::traits::array_layout,
Kokkos::LayoutRight> ||
std::is_same_v<typename ViewType::traits::array_layout,
Kokkos::LayoutStride>);
#endif
};
} // namespace
template <class DataType, class... Args>
class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
class RandomAccessIterator<::Kokkos::View<DataType, Args...>> {
public:
using view_type = ::Kokkos::View<DataType, Args...>;
using iterator_type = RandomAccessIterator<view_type>;
@ -41,30 +63,31 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
using pointer = typename view_type::pointer_type;
using reference = typename view_type::reference_type;
// oneDPL needs this alias in order not to assume the data is on the host but on
// the device, see
// https://github.com/uxlfoundation/oneDPL/blob/a045eac689f9107f50ba7b42235e9e927118e483/include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h#L210-L214
#ifdef KOKKOS_ENABLE_ONEDPL
using is_passed_directly = std::true_type;
#endif
static_assert(view_type::rank == 1 &&
(std::is_same_v<typename view_type::traits::array_layout,
Kokkos::LayoutLeft> ||
std::is_same_v<typename view_type::traits::array_layout,
Kokkos::LayoutRight> ||
std::is_same_v<typename view_type::traits::array_layout,
Kokkos::LayoutStride>),
"RandomAccessIterator only supports 1D Views with LayoutLeft, "
"LayoutRight, LayoutStride.");
is_always_strided<::Kokkos::View<DataType, Args...>>::value);
KOKKOS_DEFAULTED_FUNCTION RandomAccessIterator() = default;
explicit KOKKOS_FUNCTION RandomAccessIterator(const view_type view)
: m_view(view) {}
: m_data(view.data()), m_stride(view.stride_0()) {}
explicit KOKKOS_FUNCTION RandomAccessIterator(const view_type view,
ptrdiff_t current_index)
: m_view(view), m_current_index(current_index) {}
: m_data(view.data() + current_index * view.stride_0()),
m_stride(view.stride_0()) {}
#ifndef KOKKOS_ENABLE_CXX17 // C++20 and beyond
template <class OtherViewType>
requires(std::is_constructible_v<view_type, OtherViewType>)
KOKKOS_FUNCTION explicit(!std::is_convertible_v<OtherViewType, view_type>)
RandomAccessIterator(const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
: m_data(other.m_data), m_stride(other.m_stride) {}
#else
template <
class OtherViewType,
@ -73,19 +96,22 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
int> = 0>
KOKKOS_FUNCTION explicit RandomAccessIterator(
const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
: m_data(other.m_data), m_stride(other.m_stride) {}
template <class OtherViewType,
std::enable_if_t<std::is_convertible_v<OtherViewType, view_type>,
int> = 0>
KOKKOS_FUNCTION RandomAccessIterator(
const RandomAccessIterator<OtherViewType>& other)
: m_view(other.m_view), m_current_index(other.m_current_index) {}
: m_data(other.m_data), m_stride(other.m_stride) {}
#endif
KOKKOS_FUNCTION
iterator_type& operator++() {
++m_current_index;
if constexpr (is_always_contiguous)
m_data++;
else
m_data += m_stride;
return *this;
}
@ -98,7 +124,10 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
KOKKOS_FUNCTION
iterator_type& operator--() {
--m_current_index;
if constexpr (is_always_contiguous)
m_data--;
else
m_data -= m_stride;
return *this;
}
@ -111,77 +140,95 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
KOKKOS_FUNCTION
reference operator[](difference_type n) const {
return m_view(m_current_index + n);
if constexpr (is_always_contiguous)
return *(m_data + n);
else
return *(m_data + n * m_stride);
}
KOKKOS_FUNCTION
iterator_type& operator+=(difference_type n) {
m_current_index += n;
if constexpr (is_always_contiguous)
m_data += n;
else
m_data += n * m_stride;
return *this;
}
KOKKOS_FUNCTION
iterator_type& operator-=(difference_type n) {
m_current_index -= n;
if constexpr (is_always_contiguous)
m_data -= n;
else
m_data -= n * m_stride;
return *this;
}
KOKKOS_FUNCTION
iterator_type operator+(difference_type n) const {
return iterator_type(m_view, m_current_index + n);
auto it = *this;
it += n;
return it;
}
friend iterator_type operator+(difference_type n, iterator_type other) {
return other + n;
}
KOKKOS_FUNCTION
iterator_type operator-(difference_type n) const {
return iterator_type(m_view, m_current_index - n);
auto it = *this;
it -= n;
return it;
}
KOKKOS_FUNCTION
difference_type operator-(iterator_type it) const {
return m_current_index - it.m_current_index;
if constexpr (is_always_contiguous)
return m_data - it.m_data;
else
return (m_data - it.m_data) / m_stride;
}
KOKKOS_FUNCTION
bool operator==(iterator_type other) const {
return m_current_index == other.m_current_index &&
m_view.data() == other.m_view.data();
return m_data == other.m_data && m_stride == other.m_stride;
}
KOKKOS_FUNCTION
bool operator!=(iterator_type other) const {
return m_current_index != other.m_current_index ||
m_view.data() != other.m_view.data();
return m_data != other.m_data || m_stride != other.m_stride;
}
KOKKOS_FUNCTION
bool operator<(iterator_type other) const {
return m_current_index < other.m_current_index;
}
bool operator<(iterator_type other) const { return m_data < other.m_data; }
KOKKOS_FUNCTION
bool operator<=(iterator_type other) const {
return m_current_index <= other.m_current_index;
}
bool operator<=(iterator_type other) const { return m_data <= other.m_data; }
KOKKOS_FUNCTION
bool operator>(iterator_type other) const {
return m_current_index > other.m_current_index;
}
bool operator>(iterator_type other) const { return m_data > other.m_data; }
KOKKOS_FUNCTION
bool operator>=(iterator_type other) const {
return m_current_index >= other.m_current_index;
}
bool operator>=(iterator_type other) const { return m_data >= other.m_data; }
KOKKOS_FUNCTION
reference operator*() const { return m_view(m_current_index); }
reference operator*() const { return *m_data; }
KOKKOS_FUNCTION
view_type view() const { return m_view; }
pointer data() const { return m_data; }
KOKKOS_FUNCTION
int stride() const { return m_stride; }
private:
view_type m_view;
ptrdiff_t m_current_index = 0;
pointer m_data;
int m_stride;
static constexpr bool is_always_contiguous =
(std::is_same_v<typename view_type::traits::array_layout,
Kokkos::LayoutLeft> ||
std::is_same_v<typename view_type::traits::array_layout,
Kokkos::LayoutRight>);
// Needed for the converting constructor accepting another iterator
template <class>
@ -192,4 +239,10 @@ class RandomAccessIterator< ::Kokkos::View<DataType, Args...> > {
} // namespace Experimental
} // namespace Kokkos
#ifdef KOKKOS_ENABLE_SYCL
template <class T>
struct sycl::is_device_copyable<
Kokkos::Experimental::Impl::RandomAccessIterator<T>> : std::true_type {};
#endif
#endif
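
The iterator in the hunks above no longer stores a view and an index; it stores a raw pointer plus a stride and does all arithmetic on the pointer. The sketch below is a simplified, Kokkos-free illustration of that representation. The class name `StridedPtrIterator` and the helper `sum_strided` are hypothetical, and the sketch always multiplies by the stride instead of using the layout-based `if constexpr` shortcut shown in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified pointer + stride iterator (not the Kokkos class).
template <class T>
class StridedPtrIterator {
 public:
  StridedPtrIterator(T* data, std::ptrdiff_t stride)
      : m_data(data), m_stride(stride) {
    assert(stride > 0);
  }

  T& operator*() const { return *m_data; }
  T& operator[](std::ptrdiff_t n) const { return *(m_data + n * m_stride); }

  StridedPtrIterator& operator++() {
    m_data += m_stride;
    return *this;
  }
  StridedPtrIterator& operator+=(std::ptrdiff_t n) {
    m_data += n * m_stride;
    return *this;
  }
  StridedPtrIterator operator+(std::ptrdiff_t n) const {
    auto it = *this;
    it += n;
    return it;
  }

  // Distance in logical elements, not in raw pointer steps.
  std::ptrdiff_t operator-(const StridedPtrIterator& other) const {
    return (m_data - other.m_data) / m_stride;
  }

  bool operator==(const StridedPtrIterator& other) const {
    return m_data == other.m_data && m_stride == other.m_stride;
  }
  bool operator!=(const StridedPtrIterator& other) const {
    return !(*this == other);
  }

 private:
  T* m_data;
  std::ptrdiff_t m_stride;
};

// Sums `count` logical elements; `buffer` must span count * stride entries so
// that the end iterator stays within one past the end of the array.
inline int sum_strided(int* buffer, std::ptrdiff_t count,
                       std::ptrdiff_t stride) {
  StridedPtrIterator<int> first(buffer, stride);
  const StridedPtrIterator<int> last = first + count;
  int sum = 0;
  for (auto it = first; it != last; ++it) sum += *it;
  return sum;
}
```

One consequence visible in the diff is that the comparison operators now compare pointers rather than indices, so comparing iterators into unrelated allocations is not meaningful.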

View File

@ -52,13 +52,10 @@ struct StdUniqueFunctor {
auto& val_i = m_first_from[i];
const auto& val_ip1 = m_first_from[i + 1];
if (final_pass) {
if (!m_pred(val_i, val_ip1)) {
if (!m_pred(val_i, val_ip1)) {
if (final_pass) {
m_first_dest[update] = std::move(val_i);
}
}
if (!m_pred(val_i, val_ip1)) {
update += 1;
}
}
@ -188,6 +185,7 @@ KOKKOS_FUNCTION IteratorType unique_team_impl(const TeamHandleType& teamHandle,
IteratorType result = first;
IteratorType lfirst = first;
while (++lfirst != last) {
// NOLINTNEXTLINE(bugprone-inc-dec-in-conditions)
if (!pred(*result, *lfirst) && ++result != lfirst) {
*result = std::move(*lfirst);
}
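
The `StdUniqueFunctor` hunk above merges two identical predicate checks into one: when adjacent elements are not equivalent, the element is written (on the final pass only) and the running count is incremented. The sketch below emulates that body sequentially with a counting pass followed by a writing pass. It is an illustration under stated assumptions, not the Kokkos implementation: the names `unique_scan_pass` and `unique_via_two_passes` are hypothetical, and appending the last element explicitly is an assumption of this sketch, since the diff does not show how the surrounding code handles it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pass over adjacent pairs, mirroring the restructured functor body.
// Returns how many elements were kept; writes them only if final_pass is true.
template <class T, class Pred>
std::size_t unique_scan_pass(const std::vector<T>& from, std::vector<T>& dest,
                             Pred pred, bool final_pass) {
  std::size_t update = 0;
  for (std::size_t i = 0; i + 1 < from.size(); ++i) {
    if (!pred(from[i], from[i + 1])) {
      if (final_pass) {
        assert(update < dest.size());
        dest[update] = from[i];
      }
      update += 1;
    }
  }
  return update;
}

// Hypothetical driver: count first, then write, then append the last element
// (assumption of this sketch: the last element has no successor to compare to).
template <class T, class Pred>
std::vector<T> unique_via_two_passes(const std::vector<T>& from, Pred pred) {
  if (from.empty()) return {};
  std::vector<T> dest;
  const std::size_t kept = unique_scan_pass(from, dest, pred, false);
  dest.resize(kept + 1);
  unique_scan_pass(from, dest, pred, true);
  dest[kept] = from.back();
  return dest;
}
```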

View File

@ -175,9 +175,8 @@ KOKKOS_FUNCTION OutputIterator unique_copy_team_impl(
d_first + count);
}
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}

View File

@ -18,6 +18,8 @@ LINK ?= $(CXX)
LDFLAGS ?=
override LDFLAGS += -lpthread
KOKKOS_USE_DEPRECATED_MAKEFILES=1
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files

View File

@ -281,7 +281,7 @@ struct test_random_scalar {
double covariance_eps =
result.covariance / num_draws / 2 / variance_expect;
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
#endif
EXPECT_LT(std::abs(mean_eps), tolerance);
EXPECT_LT(std::abs(variance_eps), 1.5 * tolerance);
@ -312,7 +312,7 @@ struct test_random_scalar {
(result.covariance / HIST_DIM1D - covariance_expect) / mean_expect;
#if defined(KOKKOS_HALF_T_IS_FLOAT) && !KOKKOS_HALF_T_IS_FLOAT
if (std::is_same<Scalar, Kokkos::Experimental::half_t>::value) {
if (std::is_same_v<Scalar, Kokkos::Experimental::half_t>) {
mean_eps_expect = 0.0003;
variance_eps_expect = 1.0;
covariance_eps_expect = 5.0e4;
@ -320,7 +320,7 @@ struct test_random_scalar {
#endif
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
#endif
EXPECT_LT(std::abs(mean_eps), mean_eps_expect);
EXPECT_LT(std::abs(variance_eps), variance_eps_expect);
@ -358,13 +358,13 @@ struct test_random_scalar {
(result.covariance / HIST_DIM1D - covariance_expect) / mean_expect;
#if defined(KOKKOS_HALF_T_IS_FLOAT) && !KOKKOS_HALF_T_IS_FLOAT
if (std::is_same<Scalar, Kokkos::Experimental::half_t>::value) {
if (std::is_same_v<Scalar, Kokkos::Experimental::half_t>) {
variance_factor = 7;
}
#endif
#if defined(KOKKOS_BHALF_T_IS_FLOAT) && !KOKKOS_BHALF_T_IS_FLOAT
if (!std::is_same<Scalar, Kokkos::Experimental::bhalf_t>::value) {
if (!std::is_same_v<Scalar, Kokkos::Experimental::bhalf_t>) {
#endif
EXPECT_LT(std::abs(mean_eps), tolerance);
EXPECT_LT(std::abs(variance_eps), variance_factor);

View File

@ -37,12 +37,18 @@ struct random_access_iterator_test : std_algorithms_test {
TEST_F(random_access_iterator_test, constructor) {
// just tests that constructor works
auto it1 = KE::Impl::RandomAccessIterator<static_view_t>(m_static_view);
auto it2 = KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view);
auto it3 = KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view);
auto it4 = KE::Impl::RandomAccessIterator<static_view_t>(m_static_view, 3);
auto it5 = KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view, 3);
auto it6 = KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view, 3);
[[maybe_unused]] auto it1 =
KE::Impl::RandomAccessIterator<static_view_t>(m_static_view);
[[maybe_unused]] auto it2 =
KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view);
[[maybe_unused]] auto it3 =
KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view);
[[maybe_unused]] auto it4 =
KE::Impl::RandomAccessIterator<static_view_t>(m_static_view, 3);
[[maybe_unused]] auto it5 =
KE::Impl::RandomAccessIterator<dyn_view_t>(m_dynamic_view, 3);
[[maybe_unused]] auto it6 =
KE::Impl::RandomAccessIterator<strided_view_t>(m_strided_view, 3);
EXPECT_TRUE(true);
}

View File

@ -99,6 +99,7 @@ void test_dynamic_view_sort_impl(unsigned int n) {
Kokkos::Experimental::DynamicView<KeyType*, ExecutionSpace>;
using KeyViewType = Kokkos::View<KeyType*, ExecutionSpace>;
// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
const size_t upper_bound = 2 * n;
const size_t min_chunk_size = 1024;

View File

@ -198,9 +198,8 @@ auto create_deep_copyable_compatible_view_with_same_extent(ViewType view) {
// this is needed for intel to avoid
// error #1011: missing return statement at end of non-void function
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}

View File

@ -507,6 +507,20 @@ struct TestStruct {
}
};
#ifndef KOKKOS_ENABLE_CXX17
template <typename ViewType>
constexpr bool
test_kokkos_iterator_satify_std_random_access_iterator_concept() {
return std::random_access_iterator<
Kokkos::Experimental::Impl::RandomAccessIterator<ViewType>>;
}
static_assert(test_kokkos_iterator_satify_std_random_access_iterator_concept<
Kokkos::View<int *>>());
static_assert(test_kokkos_iterator_satify_std_random_access_iterator_concept<
Kokkos::View<const int *>>());
#endif
} // namespace compileonly
} // namespace stdalgos
} // namespace Test

View File

@ -173,6 +173,7 @@ TEST(std_algorithms_DeathTest, expect_no_overlap) {
KE::Impl::expect_no_overlap(sub_first_d0, sub_last_d0, sub_first_d1);
// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
Kokkos::LayoutStride layout2d{2, 3, extent0, 2 * 3};
Kokkos::View<value_type**, Kokkos::LayoutStride> strided_view_2d{
"std-algo-test-2d-contiguous-view-strided", layout2d};

View File

@ -171,7 +171,7 @@ struct VerifyData {
create_mirror_view_and_copy(Kokkos::HostSpace(), test_view_dc);
if (test_view_h.extent(0) > 0) {
for (std::size_t i = 0; i < test_view_h.extent(0); ++i) {
if (std::is_same<gold_view_value_type, int>::value) {
if (std::is_same_v<gold_view_value_type, int>) {
ASSERT_EQ(gold_h(i), test_view_h(i));
} else {
const auto error =

View File

@ -184,7 +184,7 @@ struct VerifyData {
const auto ext = test_view_h.extent(0);
if (ext > 0) {
for (std::size_t i = 0; i < ext; ++i) {
if (std::is_same<gold_view_value_type, int>::value) {
if (std::is_same_v<gold_view_value_type, int>) {
ASSERT_EQ(gold_h(i), test_view_h(i));
} else {
const auto error =

View File

@ -153,12 +153,13 @@ void run_single_scenario(const InfoType& scenario_info) {
#if !defined KOKKOS_ENABLE_OPENMPTARGET
CustomLessThanComparator<ValueType, ValueType> comp;
auto r5 =
[[maybe_unused]] auto r5 =
KE::is_sorted_until(exespace(), KE::cbegin(view), KE::cend(view), comp);
auto r6 = KE::is_sorted_until("label", exespace(), KE::cbegin(view),
KE::cend(view), comp);
auto r7 = KE::is_sorted_until(exespace(), view, comp);
auto r8 = KE::is_sorted_until("label", exespace(), view, comp);
[[maybe_unused]] auto r6 = KE::is_sorted_until(
"label", exespace(), KE::cbegin(view), KE::cend(view), comp);
[[maybe_unused]] auto r7 = KE::is_sorted_until(exespace(), view, comp);
[[maybe_unused]] auto r8 =
KE::is_sorted_until("label", exespace(), view, comp);
#endif
ASSERT_EQ(r1, gold) << name << ", " << view_tag_to_string(Tag{});

View File

@ -53,13 +53,13 @@ TEST(std_algorithms_mod_ops_test, move) {
// move constr
MyMovableType b(std::move(a));
ASSERT_EQ(b.m_value, 11);
ASSERT_EQ(a.m_value, -2);
ASSERT_EQ(a.m_value, -2); // NOLINT(bugprone-use-after-move)
// move assign
MyMovableType c;
c = std::move(b);
ASSERT_EQ(c.m_value, 11);
ASSERT_EQ(b.m_value, -4);
ASSERT_EQ(b.m_value, -4); // NOLINT(bugprone-use-after-move)
}
template <class ViewType>
@ -70,7 +70,7 @@ struct StdAlgoModSeqOpsTestMove {
void operator()(const int index) const {
typename ViewType::value_type a{11};
using move_t = decltype(std::move(a));
static_assert(std::is_rvalue_reference<move_t>::value);
static_assert(std::is_rvalue_reference_v<move_t>);
m_view(index) = std::move(a);
}

View File

@ -243,16 +243,15 @@ void run_and_check_transform_reduce_overloadA(ViewType1 first_view,
ViewType2 second_view,
ValueType init_value,
ValueType result_value,
Args&&... args) {
Args const&... args) {
// trivial cases
const auto r1 = KE::transform_reduce(
ExecutionSpace(), KE::cbegin(first_view), KE::cbegin(first_view),
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
KE::cbegin(second_view), init_value, args...);
const auto r2 =
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(first_view),
KE::cbegin(first_view), KE::cbegin(second_view),
init_value, std::forward<Args>(args)...);
const auto r2 = KE::transform_reduce(
"MYLABEL", ExecutionSpace(), KE::cbegin(first_view),
KE::cbegin(first_view), KE::cbegin(second_view), init_value, args...);
ASSERT_EQ(r1, init_value);
ASSERT_EQ(r2, init_value);
@ -260,18 +259,16 @@ void run_and_check_transform_reduce_overloadA(ViewType1 first_view,
// non trivial cases
const auto r3 = KE::transform_reduce(
ExecutionSpace(), KE::cbegin(first_view), KE::cend(first_view),
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
KE::cbegin(second_view), init_value, args...);
const auto r4 = KE::transform_reduce(
"MYLABEL", ExecutionSpace(), KE::cbegin(first_view), KE::cend(first_view),
KE::cbegin(second_view), init_value, std::forward<Args>(args)...);
KE::cbegin(second_view), init_value, args...);
const auto r5 =
KE::transform_reduce(ExecutionSpace(), first_view, second_view,
init_value, std::forward<Args>(args)...);
const auto r6 =
KE::transform_reduce("MYLABEL", ExecutionSpace(), first_view, second_view,
init_value, std::forward<Args>(args)...);
const auto r5 = KE::transform_reduce(ExecutionSpace(), first_view,
second_view, init_value, args...);
const auto r6 = KE::transform_reduce("MYLABEL", ExecutionSpace(), first_view,
second_view, init_value, args...);
ASSERT_EQ(r3, result_value);
ASSERT_EQ(r4, result_value);
@ -363,32 +360,30 @@ template <class ExecutionSpace, class ViewType, class ValueType, class... Args>
void run_and_check_transform_reduce_overloadB(ViewType view,
ValueType init_value,
ValueType result_value,
Args&&... args) {
Args const&... args) {
// trivial
const auto r1 =
KE::transform_reduce(ExecutionSpace(), KE::cbegin(view), KE::cbegin(view),
init_value, std::forward<Args>(args)...);
const auto r1 = KE::transform_reduce(ExecutionSpace(), KE::cbegin(view),
KE::cbegin(view), init_value, args...);
const auto r2 = KE::transform_reduce("MYLABEL", ExecutionSpace(),
KE::cbegin(view), KE::cbegin(view),
init_value, std::forward<Args>(args)...);
const auto r2 =
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(view),
KE::cbegin(view), init_value, args...);
ASSERT_EQ(r1, init_value);
ASSERT_EQ(r2, init_value);
// non trivial
const auto r3 =
KE::transform_reduce(ExecutionSpace(), KE::cbegin(view), KE::cend(view),
init_value, std::forward<Args>(args)...);
const auto r3 = KE::transform_reduce(ExecutionSpace(), KE::cbegin(view),
KE::cend(view), init_value, args...);
const auto r4 = KE::transform_reduce("MYLABEL", ExecutionSpace(),
KE::cbegin(view), KE::cend(view),
init_value, std::forward<Args>(args)...);
const auto r5 = KE::transform_reduce(ExecutionSpace(), view, init_value,
std::forward<Args>(args)...);
const auto r4 =
KE::transform_reduce("MYLABEL", ExecutionSpace(), KE::cbegin(view),
KE::cend(view), init_value, args...);
const auto r5 =
KE::transform_reduce(ExecutionSpace(), view, init_value, args...);
const auto r6 = KE::transform_reduce("MYLABEL", ExecutionSpace(), view,
init_value, std::forward<Args>(args)...);
init_value, args...);
ASSERT_EQ(r3, result_value);
ASSERT_EQ(r4, result_value);
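
The hunks above change the helper signatures from `Args&&... args` to `Args const&... args` and drop `std::forward`, because each helper passes the same pack to several `transform_reduce` calls. A minimal standard-library sketch of why forwarding the same argument more than once is risky; `consume`, `call_twice_forwarding` and `call_twice_by_const_ref` are hypothetical names for illustration and are not part of Kokkos:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sink: takes its argument by value, so an rvalue is moved in.
inline std::size_t consume(std::string s) { return s.size(); }

// Forwards the same argument to two calls. If the caller passed an rvalue,
// the first call may move from it, and the second call then receives a
// moved-from object whose contents are unspecified.
template <class Arg>
std::pair<std::size_t, std::size_t> call_twice_forwarding(Arg&& arg) {
  const std::size_t first  = consume(std::forward<Arg>(arg));
  const std::size_t second = consume(std::forward<Arg>(arg));  // possible use after move
  return {first, second};
}

// Takes the argument by const reference: every call receives a copy of the
// same, unmodified object, no matter how the caller passed it.
template <class Arg>
std::pair<std::size_t, std::size_t> call_twice_by_const_ref(Arg const& arg) {
  const std::size_t first  = consume(arg);
  const std::size_t second = consume(arg);
  return {first, second};
}
```

With an lvalue both variants behave the same; the difference only shows up when a temporary is passed, which is why the const-reference form is the safer choice for a helper that reuses its arguments.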

View File

@ -196,7 +196,7 @@ void run_single_scenario(const InfoType& scenario_info,
// create host copy BEFORE rotate or view will be modified
auto view_h = create_host_space_copy(view);
auto rit = KE::rotate(exespace(), view, rotation_point);
// verify_data(rit, view, view_h, rotation_point);
verify_data(rit, view, view_h, rotation_point);
}
{

View File

@ -191,6 +191,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
ASSERT_EQ(stdDistance, distancesView_h(i));
break;
}
default: Kokkos::abort("unreachable");
}
}
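
This and the following hunks add a `default:` label that aborts to each `switch` over the API id. A small sketch of the same pattern, assuming `std::abort` as a stand-in for `Kokkos::abort`; the function name, the ids and the returned values are made up for illustration:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical dispatcher: maps an API id to the number of arguments the
// selected overload takes.
inline int overload_arity(int apiId) {
  switch (apiId) {
    case 0: return 2;
    case 1: return 3;
    default:
      // No valid id reaches this point; fail loudly instead of silently
      // skipping the check.
      std::fputs("unreachable\n", stderr);
      std::abort();
  }
}
```

An explicit `default` also tends to quiet compiler and static-analysis warnings about unhandled values.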

View File

@ -217,6 +217,7 @@ void test_A(const bool ensureAdjacentFindCanFind, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -244,6 +244,7 @@ void test_A(const bool viewsAreEqual, std::size_t numTeams, std::size_t numCols,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -224,6 +224,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break;
}
#endif
default: Kokkos::abort("unreachable");
}
#undef exclusive_scan

View File

@ -227,6 +227,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
if (sequencesExist) {

View File

@ -244,6 +244,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -57,14 +57,7 @@ struct TestFunctorA {
const auto myRowIndex = member.league_rank();
auto myRowViewFrom = Kokkos::subview(m_dataView, myRowIndex, Kokkos::ALL());
const auto val = m_greaterThanValuesView(myRowIndex);
// FIXME_INTEL
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
GreaterEqualFunctor<
typename GreaterThanValuesViewType::non_const_value_type>
unaryPred{val};
#else
GreaterEqualFunctor unaryPred{val};
#endif
ptrdiff_t resultDist = 0;
switch (m_apiPick) {
@ -185,12 +178,7 @@ void test_A(const bool predicatesReturnTrue, std::size_t numTeams,
const auto rowFromBegin = KE::cbegin(rowFrom);
const auto rowFromEnd = KE::cend(rowFrom);
const auto val = greaterEqualValuesView_h(i);
// FIXME_INTEL
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
const GreaterEqualFunctor<ValueType> unaryPred{val};
#else
const GreaterEqualFunctor unaryPred{val};
#endif
auto it = std::find_if(rowFromBegin, rowFromEnd, unaryPred);
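
The two hunks above remove an Intel-specific workaround and rely on class template argument deduction (C++17) for the predicate. A minimal sketch of that deduction with standard-library containers; `GreaterEqual` and `first_greater_equal` are hypothetical stand-ins, not the test suite's own `GreaterEqualFunctor`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical unary predicate, templated on the value type.
template <class ValueType>
struct GreaterEqual {
  ValueType m_val;
  explicit GreaterEqual(ValueType val) : m_val(val) {}
  bool operator()(ValueType x) const { return x >= m_val; }
};

// Returns the distance from the start of the row to the first element
// that is >= val, or row.size() if there is none.
inline std::ptrdiff_t first_greater_equal(const std::vector<int>& row, int val) {
  // C++17 deduces GreaterEqual<int> from the constructor argument, so the
  // template argument does not have to be spelled out.
  const GreaterEqual pred{val};
  return std::distance(row.begin(),
                       std::find_if(row.begin(), row.end(), pred));
}
```

The sketch needs a C++17 compiler; before C++17 the type would have to be written as `GreaterEqual<int>`, which is what the removed branch did.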

View File

@ -57,14 +57,7 @@ struct TestFunctorA {
const auto myRowIndex = member.league_rank();
auto myRowViewFrom = Kokkos::subview(m_dataView, myRowIndex, Kokkos::ALL());
const auto val = m_greaterThanValuesView(myRowIndex);
// FIXME_INTEL
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
GreaterEqualFunctor<
typename GreaterThanValuesViewType::non_const_value_type>
unaryPred{val};
#else
GreaterEqualFunctor unaryPred{val};
#endif
ptrdiff_t resultDist = 0;
switch (m_apiPick) {
@ -180,12 +173,7 @@ void test_A(const bool predicatesReturnTrue, std::size_t numTeams,
const auto rowFromBegin = KE::cbegin(rowFrom);
const auto rowFromEnd = KE::cend(rowFrom);
const auto val = greaterEqualValuesView_h(i);
// FIXME_INTEL
#if defined(KOKKOS_COMPILER_INTEL) && (1900 == KOKKOS_COMPILER_INTEL)
const GreaterEqualFunctor<ValueType> unaryPred{val};
#else
const GreaterEqualFunctor unaryPred{val};
#endif
auto it = std::find_if_not(rowFromBegin, rowFromEnd, unaryPred);

View File

@ -253,6 +253,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break;
}
default: Kokkos::abort("unreachable");
}
#undef inclusive_scan

View File

@ -245,6 +245,7 @@ void test_A(const TestCaseType testCase, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -249,6 +249,7 @@ void test_A(const bool viewsAreEqual, std::size_t numTeams, std::size_t numCols,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -242,6 +242,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break;
}
default: Kokkos::abort("unreachable");
}
#undef reduce

View File

@ -243,6 +243,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -258,6 +258,7 @@ void test_A(const bool sequencesExist, std::size_t numTeams,
break;
}
default: Kokkos::abort("unreachable");
}
}
}

View File

@ -203,6 +203,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
ASSERT_EQ(stdDistance, distancesView_h(i));
break;
}
default: Kokkos::abort("unreachable");
}
#undef transform_exclusive_scan

View File

@ -240,6 +240,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break;
}
default: Kokkos::abort("unreachable");
}
}
#undef transform_inclusive_scan

View File

@ -293,6 +293,7 @@ void test_A(std::size_t numTeams, std::size_t numCols, int apiId) {
break;
}
default: Kokkos::abort("unreachable");
}
#undef transform_reduce

View File

@ -344,8 +344,7 @@ TEST(std_algorithms_numeric_ops_test, transform_exclusive_scan_functor) {
using view_type = Kokkos::View<int*, exespace>;
view_type dummy_view("dummy_view", 0);
using unary_op_type =
Kokkos::Experimental::Impl::StdNumericScanIdentityReferenceUnaryFunctor<
int>;
Kokkos::Experimental::Impl::StdNumericScanIdentityReferenceUnaryFunctor;
using functor_type =
Kokkos::Experimental::Impl::TransformExclusiveScanFunctorWithValueWrapper<
exespace, int, int, view_type, view_type, MultiplyFunctor<int>,

View File

@ -390,8 +390,7 @@ TEST(std_algorithms_numeric_ops_test, transform_inclusive_scan_functor) {
int dummy = 0;
using view_type = Kokkos::View<int*, exespace>;
view_type dummy_view("dummy_view", 0);
using unary_op_type =
KE::Impl::StdNumericScanIdentityReferenceUnaryFunctor<int>;
using unary_op_type = KE::Impl::StdNumericScanIdentityReferenceUnaryFunctor;
{
using functor_type =
KE::Impl::ExeSpaceTransformInclusiveScanNoInitValueFunctor<

View File

@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Volta70"
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Volta70"
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Volta70"
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -37,7 +37,7 @@
template <int V>
struct TestFunctor {
double values[V];
double values[V] = {};
Kokkos::View<double*> a;
int K;
TestFunctor(Kokkos::View<double*> a_, int K_) : a(a_), K(K_) {}
@ -50,7 +50,7 @@ struct TestFunctor {
template <int V>
struct TestRFunctor {
double values[V];
double values[V] = {};
Kokkos::View<double*> a;
int K;
TestRFunctor(Kokkos::View<double*> a_, int K_) : a(a_), K(K_) {}
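
Both functors gain a `= {}` initializer on their array member. Without it, a constructor that does not mention `values` leaves the elements with indeterminate values. A standalone sketch of the effect; `Accumulator` and `sum_values` are hypothetical names and the Kokkos view member is left out:

```cpp
#include <cassert>

// Hypothetical reduced version of the functor: only the array and K remain.
template <int V>
struct Accumulator {
  double values[V] = {};  // default member initializer: every element is 0.0
  int K;
  explicit Accumulator(int K_) : K(K_) {}
};

// Sums the array; well defined only because the elements were initialized.
template <int V>
double sum_values(const Accumulator<V>& acc) {
  double total = 0.0;
  for (int i = 0; i < V; ++i) total += acc.values[i];
  return total;
}
```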
@ -247,12 +247,15 @@ int main(int argc, char* argv[]) {
// anything that doesn't start with --
if (arg.size() < 2 ||
(arg.size() >= 2 && arg[0] != '-' && arg[1] != '-')) {
// signing off that arg.data() is null terminated
// NOLINTBEGIN(bugprone-suspicious-stringview-data-usage)
if (i == 1)
N = atoi(arg.data());
else if (i == 2)
M = atoi(arg.data());
else if (i == 3)
K = atoi(arg.data());
// NOLINTEND(bugprone-suspicious-stringview-data-usage)
else {
Kokkos::abort("unexpected argument!");
}
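
The `NOLINT` markers above record that `arg.data()` is known to be null-terminated, which `atoi` requires. For comparison, a sketch of an alternative that does not depend on null termination, using `std::from_chars` from C++17. This is not what the commit does, and `parse_int` is a hypothetical helper:

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole view as a base-10 int. Returns std::nullopt if the view
// is empty, contains trailing characters, or the value does not fit.
inline std::optional<int> parse_int(std::string_view arg) {
  int value = 0;
  const char* first = arg.data();
  const char* last  = arg.data() + arg.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last) return std::nullopt;
  return value;
}
```

Unlike `atoi`, this version reads only the characters inside the view, does not skip leading whitespace or accept a leading `+`, and reports failure instead of returning 0.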

View File

@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Volta70"
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -120,11 +120,12 @@ int main(int argc, char* argv[]) {
// view appropriately for test and should obey first-touch etc Second call to
// test is the one we actually care about and time
view_type_1d v_1(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_1"),
team_range * team_size);
static_cast<size_t>(team_range) * team_size);
view_type_2d v_2(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_2"),
team_range * team_size, thread_range);
static_cast<size_t>(team_range) * team_size, thread_range);
view_type_3d v_3(Kokkos::view_alloc(Kokkos::WithoutInitializing, "v_3"),
team_range * team_size, thread_range, vector_range);
static_cast<size_t>(team_range) * team_size, thread_range,
vector_range);
double result_computed = 0.0;
double result_expect = 0.0;
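
The casts above convert one operand to `size_t` before multiplying, so the product is computed in the wider unsigned type rather than in `int`. A minimal sketch, assuming a 64-bit `size_t` and non-negative inputs; `flat_extent` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Computes team_range * team_size without going through int arithmetic.
// Casting one operand first makes the other operand convert to size_t too,
// so the multiplication itself happens in size_t.
inline std::size_t flat_extent(int team_range, int team_size) {
  return static_cast<std::size_t>(team_range) * team_size;
}
```

Writing `static_cast<std::size_t>(team_range * team_size)` instead would not help, because the overflow would already have happened in the `int` multiplication before the cast.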

View File

@ -367,7 +367,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
// parallel_for RangePolicy: range = team_size*team_range
if (test_type == 300) {
Kokkos::parallel_for(
"300 outer for", team_size * team_range,
"300 outer for", static_cast<size_t>(team_size) * team_range,
KOKKOS_LAMBDA(const int idx) {
v1(idx) = idx;
// prevent compiler from optimizing away the loop
@ -376,14 +376,15 @@ void test_policy(int team_range, int thread_range, int vector_range,
// parallel_reduce RangePolicy: range = team_size*team_range
if (test_type == 400) {
Kokkos::parallel_reduce(
"400 outer reduce", team_size * team_range,
"400 outer reduce", static_cast<size_t>(team_size) * team_range,
KOKKOS_LAMBDA(const int idx, double& val) { val += idx; }, result);
result_expect =
0.5 * (team_size * team_range) * (team_size * team_range - 1);
}
// parallel_scan RangePolicy: range = team_size*team_range
if (test_type == 500) {
Kokkos::parallel_scan("500 outer scan", team_size * team_range,
Kokkos::parallel_scan("500 outer scan",
static_cast<size_t>(team_size) * team_range,
ParallelScanFunctor<ViewType1>(v1)
#if 0
// This does not compile with pre Cuda 8.0 - see Github Issue #913 for explanation

View File

@ -2,6 +2,7 @@ KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Volta70"
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -1,6 +1,7 @@
KOKKOS_DEVICES=Serial
KOKKOS_ARCH = ""
KOKKOS_USE_DEPRECATED_MAKEFILES=1
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

View File

@ -317,7 +317,7 @@ do
# End of Werror handling
#Handle unsupported standard flags
--std=c++1y|-std=c++1y|--std=gnu++1y|-std=gnu++1y|--std=c++1z|-std=c++1z|--std=gnu++1z|-std=gnu++1z|--std=c++2a|-std=c++2a)
fallback_std_flag="-std=c++14"
fallback_std_flag="-std=c++17"
# this is hopefully just occurring in a downstream project during CMake feature tests
# we really have no choice here but to accept the flag and change to an accepted C++ standard
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
@ -346,35 +346,17 @@ do
# NVCC only has C++20 from version 12 on
cuda_main_version=$([[ $(${nvcc_compiler} --version) =~ V([0-9]+) ]] && echo ${BASH_REMATCH[1]})
if [ ${cuda_main_version} -lt 12 ]; then
fallback_std_flag="-std=c++14"
fallback_std_flag="-std=c++17"
# this is hopefully just occurring in a downstream project during CMake feature tests
# we really have no choice here but to accept the flag and change to an accepted C++ standard
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++14 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
std_flag=$fallback_std_flag
else
std_flag=$1
fi
shared_args="$shared_args $std_flag"
;;
--std=c++17|-std=c++17)
if [ -n "$std_flag" ]; then
warn_std_flag
shared_args=${shared_args/ $std_flag/}
fi
# NVCC only has C++17 from version 11 on
cuda_main_version=$([[ $(${nvcc_compiler} --version) =~ V([0-9]+) ]] && echo ${BASH_REMATCH[1]})
if [ ${cuda_main_version} -lt 11 ]; then
fallback_std_flag="-std=c++14"
# this is hopefully just occurring in a downstream project during CMake feature tests
# we really have no choice here but to accept the flag and change to an accepted C++ standard
echo "nvcc_wrapper does not accept standard flags $1 since partial standard flags and standards after C++14 are not supported. nvcc_wrapper will use $fallback_std_flag instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration."
std_flag=$fallback_std_flag
else
std_flag=$1
fi
shared_args="$shared_args $std_flag"
;;
--std=c++11|-std=c++11|--std=c++14|-std=c++14)
--std=c++11|-std=c++11|--std=c++14|-std=c++14|--std=c++17|-std=c++17)
if [ -n "$std_flag" ]; then
warn_std_flag
shared_args=${shared_args/ $std_flag/}
@ -500,6 +482,20 @@ do
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
host_linker_args="$host_linker_args ${1:4:${#1}}"
;;
#Handle host assembler options
-Wa,*)
#To pass the -Wa options to the host compiler via -Xcompiler it is necessary
#to use '\\,' for each comma in the options. As users might already add escapes
#to the comma by themselves, the escapes are first removed and then only the
#required number of \ are added back.
xcompiler_args_wa=$(echo -e "$1" | sed -E 's/\\\+,/,/g' | sed -E 's/,/\\\\\\\,/g')
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="$xcompiler_args_wa"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$xcompiler_args_wa"
fi
;;
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
*.a|*.so|*.o|*.obj)
object_files="$object_files $1"

View File

@ -2,65 +2,71 @@
# loaded by include() and find_package() commands except when invoked with
# the NO_POLICY_SCOPE option
# CMP0057 + NEW -> IN_LIST operator in IF(...)
CMAKE_POLICY(SET CMP0057 NEW)
cmake_policy(SET CMP0057 NEW)
# Compute paths
@PACKAGE_INIT@
#Find dependencies
INCLUDE(CMakeFindDependencyMacro)
include(CMakeFindDependencyMacro)
#This needs to go above the KokkosTargets in case
#the Kokkos targets depend in some way on the TPL imports
@KOKKOS_TPL_EXPORTS@
GET_FILENAME_COMPONENT(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
UNSET(Kokkos_CMAKE_DIR)
get_filename_component(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
include("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
include("${Kokkos_CMAKE_DIR}/KokkosConfigCommon.cmake")
unset(Kokkos_CMAKE_DIR)
# check for conflicts
IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS AND
"separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
MESSAGE(STATUS "'launch_compiler' implies global redirection of targets depending on Kokkos to appropriate compiler.")
MESSAGE(STATUS "'separable_compilation' implies explicitly defining where redirection occurs via 'kokkos_compilation(PROJECT|TARGET|SOURCE|DIRECTORY ...)'")
MESSAGE(FATAL_ERROR "Conflicting COMPONENTS: 'launch_compiler' and 'separable_compilation'")
ENDIF()
if("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS AND "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
message(STATUS "'launch_compiler' implies global redirection of targets depending on Kokkos to appropriate compiler.")
message(
STATUS
"'separable_compilation' implies explicitly defining where redirection occurs via 'kokkos_compilation(PROJECT|TARGET|SOURCE|DIRECTORY ...)'"
)
message(FATAL_ERROR "Conflicting COMPONENTS: 'launch_compiler' and 'separable_compilation'")
endif()
IF("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if find_package(Kokkos COMPONENTS launch_compiler) then rely on the
# RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK to always redirect to the
# appropriate compiler for Kokkos
#
if("launch_compiler" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if find_package(Kokkos COMPONENTS launch_compiler) then rely on the
# RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK to always redirect to the
# appropriate compiler for Kokkos
#
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(
GLOBAL
CHECK_CUDA_COMPILES)
message(
STATUS
"kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos"
)
kokkos_compilation(GLOBAL CHECK_CUDA_COMPILES)
ELSEIF(@Kokkos_ENABLE_CUDA@
AND NOT @KOKKOS_COMPILE_LANGUAGE@ STREQUAL CUDA
AND NOT "separable_compilation" IN_LIST Kokkos_FIND_COMPONENTS)
#
# if CUDA was enabled, the compilation language was not set to CUDA, and separable compilation was not
# specified, then set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK globally and
# kokkos_launch_compiler will re-direct to the compiler used to compile CUDA code during installation.
# kokkos_launch_compiler will re-direct if ${CMAKE_CXX_COMPILER} and -DKOKKOS_DEPENDENCE is present,
# otherwise, the original command will be executed
#
elseif(@Kokkos_ENABLE_CUDA@ AND NOT @KOKKOS_COMPILE_LANGUAGE@ STREQUAL CUDA AND NOT "separable_compilation" IN_LIST
Kokkos_FIND_COMPONENTS
)
#
# if CUDA was enabled, the compilation language was not set to CUDA, and separable compilation was not
# specified, then set the RULE_LAUNCH_COMPILE and RULE_LAUNCH_LINK globally and
# kokkos_launch_compiler will re-direct to the compiler used to compile CUDA code during installation.
# kokkos_launch_compiler will re-direct if ${CMAKE_CXX_COMPILER} and -DKOKKOS_DEPENDENCE is present,
# otherwise, the original command will be executed
#
# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})
# run test to see if CMAKE_CXX_COMPILER=nvcc_wrapper
kokkos_compiler_is_nvcc(IS_NVCC ${CMAKE_CXX_COMPILER})
# if not nvcc_wrapper and Kokkos_LAUNCH_COMPILER was not set to OFF
IF(NOT IS_NVCC AND (NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
MESSAGE(STATUS "kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos")
kokkos_compilation(GLOBAL)
ENDIF()
# if not nvcc_wrapper and Kokkos_LAUNCH_COMPILER was not set to OFF
if(NOT IS_NVCC AND (NOT DEFINED Kokkos_LAUNCH_COMPILER OR Kokkos_LAUNCH_COMPILER))
message(
STATUS
"kokkos_launch_compiler is enabled globally. C++ compiler commands with -DKOKKOS_DEPENDENCE will be redirected to the appropriate compiler for Kokkos"
)
kokkos_compilation(GLOBAL)
endif()
# be mindful of the environment, pollution is bad
UNSET(IS_NVCC)
ENDIF()
# be mindful of the environment, pollution is bad
unset(IS_NVCC)
endif()
set(Kokkos_COMPILE_LANGUAGE @KOKKOS_COMPILE_LANGUAGE@)

View File

@ -1,67 +1,67 @@
SET(Kokkos_DEVICES @KOKKOS_ENABLED_DEVICES@)
SET(Kokkos_OPTIONS @KOKKOS_ENABLED_OPTIONS@)
SET(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@)
SET(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@)
SET(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@")
SET(Kokkos_CXX_COMPILER_ID "@KOKKOS_CXX_COMPILER_ID@")
SET(Kokkos_CXX_COMPILER_VERSION "@KOKKOS_CXX_COMPILER_VERSION@")
SET(Kokkos_CXX_STANDARD @KOKKOS_CXX_STANDARD@)
set(Kokkos_DEVICES @KOKKOS_ENABLED_DEVICES@)
set(Kokkos_OPTIONS @KOKKOS_ENABLED_OPTIONS@)
set(Kokkos_TPLS @KOKKOS_ENABLED_TPLS@)
set(Kokkos_ARCH @KOKKOS_ENABLED_ARCH_LIST@)
set(Kokkos_CXX_COMPILER "@CMAKE_CXX_COMPILER@")
set(Kokkos_CXX_COMPILER_ID "@KOKKOS_CXX_COMPILER_ID@")
set(Kokkos_CXX_COMPILER_VERSION "@KOKKOS_CXX_COMPILER_VERSION@")
set(Kokkos_CXX_STANDARD @KOKKOS_CXX_STANDARD@)
# Required to be a TriBITS-compliant external package
IF(NOT TARGET Kokkos::all_libs)
if(NOT TARGET Kokkos::all_libs)
# CMake Error at <prefix>/lib/cmake/Kokkos/KokkosConfigCommon.cmake:10 (ADD_LIBRARY):
# ADD_LIBRARY cannot create ALIAS target "Kokkos::all_libs" because target
# "Kokkos::kokkos" is imported but not globally visible.
IF(CMAKE_VERSION VERSION_LESS "3.18")
SET_TARGET_PROPERTIES(Kokkos::kokkos PROPERTIES IMPORTED_GLOBAL ON)
ENDIF()
ADD_LIBRARY(Kokkos::all_libs ALIAS Kokkos::kokkos)
ENDIF()
if(CMAKE_VERSION VERSION_LESS "3.18")
set_target_properties(Kokkos::kokkos PROPERTIES IMPORTED_GLOBAL ON)
endif()
add_library(Kokkos::all_libs ALIAS Kokkos::kokkos)
endif()
# Export Kokkos_ENABLE_<BACKEND> for each backend that was enabled.
# NOTE: "Devices" is a little bit of a misnomer here. These are really
# backends, e.g. Kokkos_ENABLE_OPENMP, Kokkos_ENABLE_CUDA, Kokkos_ENABLE_HIP,
# or Kokkos_ENABLE_SYCL.
FOREACH(DEV ${Kokkos_DEVICES})
SET(Kokkos_ENABLE_${DEV} ON)
ENDFOREACH()
foreach(DEV ${Kokkos_DEVICES})
set(Kokkos_ENABLE_${DEV} ON)
endforeach()
# Export relevant Kokkos_ENABLE_<OPTION> variables, e.g.
# Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE, Kokkos_ENABLE_DEBUG, etc.
FOREACH(OPT ${Kokkos_OPTIONS})
SET(Kokkos_ENABLE_${OPT} ON)
ENDFOREACH()
foreach(OPT ${Kokkos_OPTIONS})
set(Kokkos_ENABLE_${OPT} ON)
endforeach()
IF(Kokkos_ENABLE_CUDA)
SET(Kokkos_CUDA_ARCHITECTURES @KOKKOS_CUDA_ARCHITECTURES@)
ENDIF()
if(Kokkos_ENABLE_CUDA)
set(Kokkos_CUDA_ARCHITECTURES @KOKKOS_CUDA_ARCHITECTURES@)
endif()
IF(Kokkos_ENABLE_HIP)
SET(Kokkos_HIP_ARCHITECTURES @KOKKOS_HIP_ARCHITECTURES@)
ENDIF()
if(Kokkos_ENABLE_HIP)
set(Kokkos_HIP_ARCHITECTURES @KOKKOS_HIP_ARCHITECTURES@)
endif()
IF(NOT Kokkos_FIND_QUIETLY)
MESSAGE(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}")
ENDIF()
if(NOT Kokkos_FIND_QUIETLY)
message(STATUS "Enabled Kokkos devices: ${Kokkos_DEVICES}")
endif()
IF (Kokkos_ENABLE_CUDA)
if(Kokkos_ENABLE_CUDA)
# If we are building CUDA, we have tricked CMake because we declare a CXX project
# If the default C++ standard for a given compiler matches the requested
# standard, then CMake just omits the -std flag in later versions of CMake
# This breaks CUDA compilation (CUDA compiler can have a different default
# -std than the underlying host compiler by itself). Setting this variable
# forces CMake to always add the -std flag even if it thinks it doesn't need it
SET(CMAKE_CXX_STANDARD_DEFAULT 98 CACHE INTERNAL "" FORCE)
ENDIF()
set(CMAKE_CXX_STANDARD_DEFAULT 98 CACHE INTERNAL "" FORCE)
endif()
SET(KOKKOS_USE_CXX_EXTENSIONS @KOKKOS_USE_CXX_EXTENSIONS@)
IF (NOT DEFINED CMAKE_CXX_EXTENSIONS OR CMAKE_CXX_EXTENSIONS)
IF (NOT KOKKOS_USE_CXX_EXTENSIONS)
MESSAGE(WARNING "The installed Kokkos configuration does not support CXX extensions. Forcing -DCMAKE_CXX_EXTENSIONS=Off")
SET(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "" FORCE)
ENDIF()
ENDIF()
include(FindPackageHandleStandardArgs)
set(KOKKOS_USE_CXX_EXTENSIONS @KOKKOS_USE_CXX_EXTENSIONS@)
if(NOT DEFINED CMAKE_CXX_EXTENSIONS OR CMAKE_CXX_EXTENSIONS)
if(NOT KOKKOS_USE_CXX_EXTENSIONS)
message(
WARNING "The installed Kokkos configuration does not support CXX extensions. Forcing -DCMAKE_CXX_EXTENSIONS=Off"
)
set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "" FORCE)
endif()
endif()
# This function makes sure that Kokkos was built with the requested backends
# and target architectures and generates a fatal error if it was not.
@ -89,28 +89,23 @@ function(kokkos_check)
endforeach()
set(KOKKOS_CHECK_SUCCESS TRUE)
foreach(arg ${REQUESTED_ARGS})
# Define variables named after the required arguments that are provided by
# the Kokkos install.
set(MISSING_OPTIONS "")
foreach(requested ${KOKKOS_CHECK_${arg}})
set(FOUND_MATCHING_OPTION FALSE)
foreach(provided ${Kokkos_${arg}})
STRING(TOUPPER ${requested} REQUESTED_UC)
STRING(TOUPPER ${provided} PROVIDED_UC)
string(TOUPPER ${requested} REQUESTED_UC)
string(TOUPPER ${provided} PROVIDED_UC)
if(PROVIDED_UC STREQUAL REQUESTED_UC)
string(REPLACE ";" " " ${requested} "${KOKKOS_CHECK_${arg}}")
set(FOUND_MATCHING_OPTION TRUE)
endif()
endforeach()
if(NOT FOUND_MATCHING_OPTION)
list(APPEND MISSING_OPTIONS ${requested})
set(KOKKOS_CHECK_SUCCESS FALSE)
endif()
endforeach()
# Somewhat divert the CMake function below from its original purpose and
# use it to check that there are variables defined for all required
# arguments. Success or failure messages will be displayed but we are
# responsible for signaling failure and skip the build system generation.
if (KOKKOS_CHECK_RETURN_VALUE)
set(Kokkos_${arg}_FIND_QUIETLY ON)
endif()
find_package_handle_standard_args("Kokkos_${arg}" DEFAULT_MSG
${KOKKOS_CHECK_${arg}})
if(NOT Kokkos_${arg}_FOUND)
set(KOKKOS_CHECK_SUCCESS FALSE)
if(NOT KOKKOS_CHECK_SUCCESS AND NOT KOKKOS_CHECK_RETURN_VALUE)
message(STATUS "Could NOT find Kokkos_${arg} (missing: ${MISSING_OPTIONS})")
endif()
endforeach()
if(NOT KOKKOS_CHECK_SUCCESS AND NOT KOKKOS_CHECK_RETURN_VALUE)
@ -122,32 +117,35 @@ endfunction()
# A test to check whether a downstream project set the C++ compiler to NVCC or not
# this is called only when Kokkos was installed with Kokkos_ENABLE_CUDA=ON
FUNCTION(kokkos_compiler_is_nvcc VAR COMPILER)
# Check if the compiler is nvcc (which really means nvcc_wrapper).
EXECUTE_PROCESS(COMMAND ${COMPILER} ${ARGN} --version
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RET)
# something went wrong
IF(RET GREATER 0)
SET(${VAR} false PARENT_SCOPE)
ELSE()
STRING(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION} )
STRING(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
STRING(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
IF(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
SET(${VAR} true PARENT_SCOPE)
ELSE()
SET(${VAR} false PARENT_SCOPE)
ENDIF()
ENDIF()
ENDFUNCTION()
function(kokkos_compiler_is_nvcc VAR COMPILER)
# Check if the compiler is nvcc (which really means nvcc_wrapper).
execute_process(
COMMAND ${COMPILER} ${ARGN} --version
OUTPUT_VARIABLE INTERNAL_COMPILER_VERSION
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RET
)
# something went wrong
if(RET GREATER 0)
set(${VAR} false PARENT_SCOPE)
else()
string(REPLACE "\n" " - " INTERNAL_COMPILER_VERSION_ONE_LINE ${INTERNAL_COMPILER_VERSION})
string(FIND ${INTERNAL_COMPILER_VERSION_ONE_LINE} "nvcc" INTERNAL_COMPILER_VERSION_CONTAINS_NVCC)
string(REGEX REPLACE "^ +" "" INTERNAL_HAVE_COMPILER_NVCC "${INTERNAL_HAVE_COMPILER_NVCC}")
if(${INTERNAL_COMPILER_VERSION_CONTAINS_NVCC} GREATER -1)
set(${VAR} true PARENT_SCOPE)
else()
set(${VAR} false PARENT_SCOPE)
endif()
endif()
endfunction()
# this function checks whether the current CXX compiler supports building CUDA
FUNCTION(kokkos_cxx_compiler_cuda_test _VAR _COMPILER)
function(kokkos_cxx_compiler_cuda_test _VAR _COMPILER)
FILE(WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
"
file(
WRITE ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
"
#include <cuda.h>
#include <cstdlib>
@ -171,34 +169,39 @@ int main()
cudaDeviceSynchronize();
return EXIT_SUCCESS;
}
")
"
)
# save the command for debugging
set(_COMMANDS "${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
# use execute_process instead of try compile because we want to set custom compiler
execute_process(
COMMAND ${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET ERROR_QUIET
)
if(NOT _RET EQUAL 0)
# save the command for debugging
SET(_COMMANDS "${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
set(_COMMANDS
"${_COMMAND}\n${_COMPILER} --cuda-gpu-arch=sm_35 ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu"
)
# try the compile test again with clang arguments
execute_process(
COMMAND ${_COMPILER} --cuda-gpu-arch=sm_35 -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET ERROR_QUIET
)
endif()
# use execute_process instead of try compile because we want to set custom compiler
EXECUTE_PROCESS(COMMAND ${_COMPILER} ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)
IF(NOT _RET EQUAL 0)
# save the command for debugging
SET(_COMMANDS "${_COMMAND}\n${_COMPILER} --cuda-gpu-arch=sm_35 ${ARGN} -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu")
# try the compile test again with clang arguments
EXECUTE_PROCESS(COMMAND ${_COMPILER} --cuda-gpu-arch=sm_35 -c ${PROJECT_BINARY_DIR}/compile_tests/compiles_cuda.cu
RESULT_VARIABLE _RET
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/compile_tests
TIMEOUT 15
OUTPUT_QUIET
ERROR_QUIET)
ENDIF()
SET(${_VAR}_COMMANDS "${_COMMANDS}" PARENT_SCOPE)
SET(${_VAR} ${_RET} PARENT_SCOPE)
ENDFUNCTION()
set(${_VAR}_COMMANDS "${_COMMANDS}" PARENT_SCOPE)
set(${_VAR} ${_RET} PARENT_SCOPE)
endfunction()
# this function is provided to easily select which files use the same compiler as Kokkos
# when it was installed (or nvcc_wrapper):
@ -215,94 +218,107 @@ ENDFUNCTION()
#
# Use CHECK_CUDA_COMPILES to run a check when CUDA is enabled
#
FUNCTION(kokkos_compilation)
CMAKE_PARSE_ARGUMENTS(COMP
"GLOBAL;PROJECT;CHECK_CUDA_COMPILES"
"COMPILER"
"DIRECTORY;TARGET;SOURCE;COMMAND_PREFIX"
${ARGN})
function(kokkos_compilation)
cmake_parse_arguments(
COMP "GLOBAL;PROJECT;CHECK_CUDA_COMPILES" "COMPILER" "DIRECTORY;TARGET;SOURCE;COMMAND_PREFIX" ${ARGN}
)
# if built w/o CUDA support, we want to basically make this a no-op
SET(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)
# if built w/o CUDA support, we want to basically make this a no-op
set(_Kokkos_ENABLE_CUDA @Kokkos_ENABLE_CUDA@)
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.17)
set(MAYBE_CURRENT_INSTALLATION_ROOT "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/../../..")
endif()
IF(CMAKE_VERSION VERSION_GREATER_EQUAL 3.17)
SET(MAYBE_CURRENT_INSTALLATION_ROOT "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/../../..")
ENDIF()
# search relative first and then absolute
set(_HINTS "${MAYBE_CURRENT_INSTALLATION_ROOT}" "@CMAKE_INSTALL_PREFIX@")
# search relative first and then absolute
SET(_HINTS "${MAYBE_CURRENT_INSTALLATION_ROOT}" "@CMAKE_INSTALL_PREFIX@")
# find kokkos_launch_compiler
find_program(
Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin
)
# find kokkos_launch_compiler
FIND_PROGRAM(Kokkos_COMPILE_LAUNCHER
NAMES kokkos_launch_compiler
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
if(NOT Kokkos_COMPILE_LAUNCHER)
message(
FATAL_ERROR
"Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'"
)
endif()
IF(NOT Kokkos_COMPILE_LAUNCHER)
MESSAGE(FATAL_ERROR "Kokkos could not find 'kokkos_launch_compiler'. Please set '-DKokkos_COMPILE_LAUNCHER=/path/to/launcher'")
ENDIF()
# if COMPILER was not specified, assume Kokkos_CXX_COMPILER
if(NOT COMP_COMPILER)
set(COMP_COMPILER ${Kokkos_CXX_COMPILER})
if(_Kokkos_ENABLE_CUDA AND Kokkos_CXX_COMPILER_ID STREQUAL NVIDIA)
# find nvcc_wrapper
find_program(
Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin
)
# fatal if we can't nvcc_wrapper
if(NOT Kokkos_NVCC_WRAPPER)
message(
FATAL_ERROR "Kokkos could not find nvcc_wrapper. Please set '-DKokkos_NVCC_WRAPPER=/path/to/nvcc_wrapper'"
)
endif()
set(COMP_COMPILER ${Kokkos_NVCC_WRAPPER})
endif()
endif()
# if COMPILER was not specified, assume Kokkos_CXX_COMPILER
IF(NOT COMP_COMPILER)
SET(COMP_COMPILER ${Kokkos_CXX_COMPILER})
IF(_Kokkos_ENABLE_CUDA AND Kokkos_CXX_COMPILER_ID STREQUAL NVIDIA)
# find nvcc_wrapper
FIND_PROGRAM(Kokkos_NVCC_WRAPPER
NAMES nvcc_wrapper
HINTS ${_HINTS}
PATHS ${_HINTS}
PATH_SUFFIXES bin)
# fatal if we can't find nvcc_wrapper
IF(NOT Kokkos_NVCC_WRAPPER)
MESSAGE(FATAL_ERROR "Kokkos could not find nvcc_wrapper. Please set '-DKokkos_NVCC_WRAPPER=/path/to/nvcc_wrapper'")
ENDIF()
SET(COMP_COMPILER ${Kokkos_NVCC_WRAPPER})
ENDIF()
ENDIF()
# check that the original compiler still exists!
if(NOT EXISTS ${COMP_COMPILER})
message(FATAL_ERROR "Kokkos could not find original compiler: '${COMP_COMPILER}'")
endif()
# check that the original compiler still exists!
IF(NOT EXISTS ${COMP_COMPILER})
MESSAGE(FATAL_ERROR "Kokkos could not find original compiler: '${COMP_COMPILER}'")
ENDIF()
# try to ensure that compiling cuda code works!
if(_Kokkos_ENABLE_CUDA AND COMP_CHECK_CUDA_COMPILES)
# try to ensure that compiling cuda code works!
IF(_Kokkos_ENABLE_CUDA AND COMP_CHECK_CUDA_COMPILES)
# this may fail if kokkos_compiler launcher was used during install
kokkos_cxx_compiler_cuda_test(_COMPILES_CUDA ${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER})
# this may fail if kokkos_compiler launcher was used during install
kokkos_cxx_compiler_cuda_test(_COMPILES_CUDA
${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER})
# if above failed, throw an error
if(NOT _COMPILES_CUDA)
message(FATAL_ERROR "kokkos_cxx_compiler_cuda_test failed! Test commands:\n${_COMPILES_CUDA_COMMANDS}")
endif()
endif()
# if above failed, throw an error
IF(NOT _COMPILES_CUDA)
MESSAGE(FATAL_ERROR "kokkos_cxx_compiler_cuda_test failed! Test commands:\n${_COMPILES_CUDA_COMMANDS}")
ENDIF()
ENDIF()
if(COMP_COMMAND_PREFIX)
set(_PREFIX "${COMP_COMMAND_PREFIX}")
string(REPLACE ";" " " _PREFIX "${COMP_COMMAND_PREFIX}")
set(Kokkos_COMPILER_LAUNCHER "${_PREFIX} ${Kokkos_COMPILE_LAUNCHER}")
endif()
IF(COMP_COMMAND_PREFIX)
SET(_PREFIX "${COMP_COMMAND_PREFIX}")
STRING(REPLACE ";" " " _PREFIX "${COMP_COMMAND_PREFIX}")
SET(Kokkos_COMPILER_LAUNCHER "${_PREFIX} ${Kokkos_COMPILE_LAUNCHER}")
ENDIF()
IF(COMP_GLOBAL)
# if global, don't bother setting others
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ELSE()
FOREACH(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
IF("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
LIST(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
UNSET(COMP_${_TYPE})
ENDIF()
# set the properties if defined
IF(COMP_${_TYPE})
# MESSAGE(STATUS "Using ${COMP_COMPILER} :: ${_TYPE} :: ${COMP_${_TYPE}}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
SET_PROPERTY(${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
ENDIF()
ENDFOREACH()
ENDIF()
ENDFUNCTION()
if(COMP_GLOBAL)
# if global, don't bother setting others
set_property(
GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}")
else()
foreach(_TYPE PROJECT DIRECTORY TARGET SOURCE)
# make project/subproject scoping easy, e.g. KokkosCompilation(PROJECT) after project(...)
if("${_TYPE}" STREQUAL "PROJECT" AND COMP_${_TYPE})
list(APPEND COMP_DIRECTORY ${PROJECT_SOURCE_DIR})
unset(COMP_${_TYPE})
endif()
# set the properties if defined
if(COMP_${_TYPE})
# MESSAGE(STATUS "Using ${COMP_COMPILER} :: ${_TYPE} :: ${COMP_${_TYPE}}")
set_property(
${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_COMPILE
"${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
set_property(
${_TYPE} ${COMP_${_TYPE}} PROPERTY RULE_LAUNCH_LINK
"${Kokkos_COMPILE_LAUNCHER} ${COMP_COMPILER} ${CMAKE_CXX_COMPILER}"
)
endif()
endforeach()
endif()
endfunction()

View File

@ -9,7 +9,9 @@
// KOKKOS_VERSION % 100 is the patch level
// KOKKOS_VERSION / 100 % 100 is the minor version
// KOKKOS_VERSION / 10000 is the major version
#define KOKKOS_VERSION @KOKKOS_VERSION@
#cmakedefine KOKKOS_VERSION @KOKKOS_VERSION@
// Not using #cmakedefine below because a "0" FOO version number
// yields /* undef KOKKOS_VERSION_FOO */
#define KOKKOS_VERSION_MAJOR @KOKKOS_VERSION_MAJOR@
#define KOKKOS_VERSION_MINOR @KOKKOS_VERSION_MINOR@
#define KOKKOS_VERSION_PATCH @KOKKOS_VERSION_PATCH@
@ -116,6 +118,7 @@
#cmakedefine KOKKOS_ARCH_AMD_ZEN
#cmakedefine KOKKOS_ARCH_AMD_ZEN2
#cmakedefine KOKKOS_ARCH_AMD_ZEN3
#cmakedefine KOKKOS_ARCH_AMD_ZEN4
#cmakedefine KOKKOS_ARCH_AMD_GFX906
#cmakedefine KOKKOS_ARCH_AMD_GFX908
#cmakedefine KOKKOS_ARCH_AMD_GFX90A
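
The header comments in the hunk above describe how KOKKOS_VERSION packs major, minor and patch into one integer. A minimal sketch of that arithmetic follows, assuming release 4.6.00 is packed as 40600 (the value the stated scheme implies). The helper names are hypothetical and are not part of Kokkos.

```cpp
// Hypothetical helpers (not Kokkos API) that decode a packed version number
// using the scheme from the header comments:
//   KOKKOS_VERSION % 100        is the patch level
//   KOKKOS_VERSION / 100 % 100  is the minor version
//   KOKKOS_VERSION / 10000      is the major version
constexpr int version_major(int v) { return v / 10000; }
constexpr int version_minor(int v) { return v / 100 % 100; }
constexpr int version_patch(int v) { return v % 100; }

// Assumption: 4.6.00 is packed as 4 * 10000 + 6 * 100 + 0 = 40600.
static_assert(version_major(40600) == 4, "major");
static_assert(version_minor(40600) == 6, "minor");
static_assert(version_patch(40600) == 0, "patch");
```

The patch component here is 0, which is the case the new comment warns about: a `#cmakedefine` on a "0" value would emit an `undef` comment rather than a definition, so the per-component macros use a plain `#define`.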

View File

@ -11,9 +11,16 @@ if(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC AND CMAKE_VERSION VERSION_LESS "3.
message(FATAL_ERROR "Using NVHPC as host compiler requires at least CMake 3.20.1")
endif()
set(TPL_CUDA_LIBRARIES "")
if(KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE)
set(TPL_CUDA_LIBRARIES CUDA::cuda_driver)
else()
set(TPL_CUDA_LIBRARIES CUDA::cuda_driver CUDA::cudart)
endif()
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.17.0")
find_package(CUDAToolkit REQUIRED)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES ${TPL_CUDA_LIBRARIES})
kokkos_export_cmake_tpl(CUDAToolkit REQUIRED)
else()
include(${CMAKE_CURRENT_LIST_DIR}/CudaToolkit.cmake)
@ -33,8 +40,8 @@ else()
endif()
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(TPLCUDA ${DEFAULT_MSG} FOUND_CUDART FOUND_CUDA_DRIVER)
find_package_handle_standard_args(TPLCUDA ${DEFAULT_MSG} FOUND_CUDA_DRIVER FOUND_CUDART)
if(FOUND_CUDA_DRIVER AND FOUND_CUDART)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES CUDA::cuda_driver CUDA::cudart)
kokkos_create_imported_tpl(CUDA INTERFACE LINK_LIBRARIES ${TPL_CUDA_LIBRARIES})
endif()
endif()

View File

@ -1,15 +0,0 @@
function(kokkos_set_intel_flags full_standard int_standard)
string(TOLOWER ${full_standard} FULL_LC_STANDARD)
string(TOLOWER ${int_standard} INT_LC_STANDARD)
# The following three blocks of code were copied from
# /Modules/Compiler/Intel-CXX.cmake from CMake 3.18.1 and then modified.
if(CMAKE_CXX_SIMULATE_ID STREQUAL MSVC)
set(_std -Qstd)
set(_ext c++)
else()
set(_std -std)
set(_ext gnu++)
endif()
set(KOKKOS_CXX_STANDARD_FLAG "${_std}=c++${FULL_LC_STANDARD}" PARENT_SCOPE)
set(KOKKOS_CXX_INTERMDIATE_STANDARD_FLAG "${_std}=${_ext}${INT_LC_STANDARD}" PARENT_SCOPE)
endfunction()

View File

@ -67,6 +67,7 @@ declare_and_check_host_arch(POWER9 "IBM POWER9 CPUs")
declare_and_check_host_arch(ZEN "AMD Zen architecture")
declare_and_check_host_arch(ZEN2 "AMD Zen2 architecture")
declare_and_check_host_arch(ZEN3 "AMD Zen3 architecture")
declare_and_check_host_arch(ZEN4 "AMD Zen4 architecture")
declare_and_check_host_arch(RISCV_SG2042 "SG2042 (RISC-V) CPUs")
declare_and_check_host_arch(RISCV_RVA22V "RVA22V (RISC-V) CPUs")
@ -163,16 +164,11 @@ if(KOKKOS_ENABLE_COMPILER_WARNINGS)
endif()
endif()
# ICPC doesn't support -Wsuggest-override
if(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
list(REMOVE_ITEM COMMON_WARNINGS "-Wsuggest-override")
endif()
if(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
list(APPEND COMMON_WARNINGS "-Wimplicit-fallthrough")
endif()
set(GNU_WARNINGS "-Wempty-body" "-Wclobbered" "-Wignored-qualifiers" ${COMMON_WARNINGS})
set(GNU_WARNINGS "-Wempty-body" "-Wignored-qualifiers" ${COMMON_WARNINGS})
if(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
list(APPEND GNU_WARNINGS "-Wimplicit-fallthrough")
endif()
@ -349,12 +345,27 @@ endif()
if(KOKKOS_ARCH_ARMV9_GRACE)
set(KOKKOS_ARCH_ARM_NEON ON)
check_cxx_compiler_flag("-mcpu=neoverse-n2" COMPILER_SUPPORTS_NEOVERSE_N2)
check_cxx_compiler_flag("-msve-vector-bits=128" COMPILER_SUPPORTS_SVE_VECTOR_BITS)
if(COMPILER_SUPPORTS_NEOVERSE_N2 AND COMPILER_SUPPORTS_SVE_VECTOR_BITS)
compiler_specific_flags(COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID DEFAULT -mcpu=neoverse-n2 -msve-vector-bits=128)
if(KOKKOS_CXX_HOST_COMPILER_ID STREQUAL NVHPC)
check_cxx_compiler_flag("-tp=grace" COMPILER_SUPPORTS_GRACE_AS_TARGET_PROCESSOR)
else()
message(WARNING "Compiler does not support ARMv9 Grace architecture")
check_cxx_compiler_flag("-mcpu=neoverse-n2" COMPILER_SUPPORTS_NEOVERSE_N2)
check_cxx_compiler_flag("-msve-vector-bits=128" COMPILER_SUPPORTS_SVE_VECTOR_BITS)
endif()
if(COMPILER_SUPPORTS_NEOVERSE_N2 AND COMPILER_SUPPORTS_SVE_VECTOR_BITS OR COMPILER_SUPPORTS_GRACE_AS_TARGET_PROCESSOR)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
NVHPC
-tp=grace
DEFAULT
-mcpu=neoverse-n2
-msve-vector-bits=128
)
else()
message(SEND_ERROR "Your compiler does not appear to support the ARMv9 Grace architecture.
Please ensure you are using a compatible compiler and toolchain.
Alternatively, try configuring with -DKokkos_ARCH_NATIVE=ON to use the native architecture of your system."
)
endif()
endif()
@ -362,8 +373,6 @@ if(KOKKOS_ARCH_ZEN)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
@ -380,8 +389,6 @@ if(KOKKOS_ARCH_ZEN2)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
@ -398,12 +405,10 @@ if(KOKKOS_ARCH_ZEN3)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
Intel
-mavx2
MSVC
/arch:AVX2
NVHPC
-tp=zen2
-tp=zen3
DEFAULT
-march=znver3
-mtune=znver3
@ -412,6 +417,22 @@ if(KOKKOS_ARCH_ZEN3)
set(KOKKOS_ARCH_AVX2 ON)
endif()
if(KOKKOS_ARCH_ZEN4)
compiler_specific_flags(
COMPILER_ID
KOKKOS_CXX_HOST_COMPILER_ID
MSVC
/arch:AVX512
NVHPC
-tp=zen4
DEFAULT
-march=znver4
-mtune=znver4
)
set(KOKKOS_ARCH_AMD_ZEN4 ON)
set(KOKKOS_ARCH_AVX512XEON ON)
endif()
if(KOKKOS_ARCH_SNB OR KOKKOS_ARCH_AMDAVX)
set(KOKKOS_ARCH_AVX ON)
compiler_specific_flags(
@ -419,8 +440,6 @@ if(KOKKOS_ARCH_SNB OR KOKKOS_ARCH_AMDAVX)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-mavx
MSVC
/arch:AVX
NVHPC
@ -437,8 +456,6 @@ if(KOKKOS_ARCH_HSW)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX2
MSVC
/arch:AVX2
NVHPC
@ -477,8 +494,6 @@ if(KOKKOS_ARCH_BDW)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX2
MSVC
/arch:AVX2
NVHPC
@ -498,8 +513,6 @@ if(KOKKOS_ARCH_KNL)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xMIC-AVX512
MSVC
/arch:AVX512
NVHPC
@ -520,8 +533,6 @@ if(KOKKOS_ARCH_SKL)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xSKYLAKE
MSVC
/arch:AVX2
NVHPC
@ -539,8 +550,6 @@ if(KOKKOS_ARCH_SKX)
KOKKOS_CXX_HOST_COMPILER_ID
Cray
NO-VALUE-SPECIFIED
Intel
-xCORE-AVX512
MSVC
/arch:AVX512
NVHPC
@ -1193,9 +1202,8 @@ if(KOKKOS_ENABLE_HIP AND NOT AMDGPU_ARCH_ALREADY_SPECIFIED AND NOT KOKKOS_IMPL_A
)
else()
execute_process(COMMAND ${ROCM_ENUMERATOR} OUTPUT_VARIABLE GPU_ARCHS)
string(LENGTH "${GPU_ARCHS}" len_str)
# enumerator always output gfx000 as the first line
if(${len_str} LESS 8)
# Exits early if no GPU was detected
if("${GPU_ARCHS}" STREQUAL "")
message(SEND_ERROR "HIP enabled but no AMD GPU architecture could be automatically detected. "
"Please manually specify one AMD GPU architecture via -DKokkos_ARCH_{..}=ON'."
)

View File

@ -163,7 +163,6 @@ if(CMAKE_CXX_STANDARD EQUAL 17)
set(KOKKOS_CLANG_CUDA_MINIMUM 10.0.0)
set(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
set(KOKKOS_GCC_MINIMUM 8.2.0)
set(KOKKOS_INTEL_MINIMUM 19.0.5)
set(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2021.1.1)
set(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
set(KOKKOS_NVCC_MINIMUM 11.0.0)
@ -175,7 +174,6 @@ else()
set(KOKKOS_CLANG_CUDA_MINIMUM 14.0.0)
set(KOKKOS_CLANG_OPENMPTARGET_MINIMUM 15.0.0)
set(KOKKOS_GCC_MINIMUM 10.1.0)
set(KOKKOS_INTEL_MINIMUM "not supported")
set(KOKKOS_INTEL_LLVM_CPU_MINIMUM 2022.0.0)
set(KOKKOS_INTEL_LLVM_SYCL_MINIMUM 2023.0.0)
set(KOKKOS_NVCC_MINIMUM 12.0.0)
@ -191,7 +189,7 @@ set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CPU) ${KOKKO
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(CUDA) ${KOKKOS_CLANG_CUDA_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang(OpenMPTarget) ${KOKKOS_CLANG_OPENMPTARGET_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC ${KOKKOS_GCC_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel ${KOKKOS_INTEL_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel not supported")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(CPU) ${KOKKOS_INTEL_LLVM_CPU_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n IntelLLVM(SYCL) ${KOKKOS_INTEL_LLVM_SYCL_MINIMUM}")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC ${KOKKOS_NVCC_MINIMUM}")
@ -214,9 +212,7 @@ elseif(KOKKOS_CXX_COMPILER_ID STREQUAL GNU)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
if((NOT CMAKE_CXX_STANDARD EQUAL 17) OR (KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_MINIMUM}))
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL IntelLLVM AND NOT Kokkos_ENABLE_SYCL)
if(KOKKOS_CXX_COMPILER_VERSION VERSION_LESS ${KOKKOS_INTEL_LLVM_CPU_MINIMUM})
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")

View File

@ -76,7 +76,7 @@ kokkos_enable_option(
HIP_MULTIPLE_KERNEL_INSTANTIATIONS OFF
"Whether multiple kernels are instantiated at compile time - improve performance but increase compile time"
)
kokkos_enable_option(IMPL_HIP_MALLOC_ASYNC OFF "Whether to enable hipMallocAsync")
kokkos_enable_option(IMPL_HIP_MALLOC_ASYNC ${KOKKOS_ENABLE_HIP} "Whether to enable hipMallocAsync")
kokkos_enable_option(OPENACC_FORCE_HOST_AS_DEVICE OFF "Whether to force to use host as a target device for OpenACC")
# This option will go away eventually, but allows fallback to old implementation when needed.

View File

@ -799,7 +799,6 @@ function(COMPILER_SPECIFIC_OPTIONS_HELPER)
NVHPC
DEFAULT
Cray
Intel
Clang
AppleClang
IntelLLVM

View File

@ -155,9 +155,6 @@ if(NOT KOKKOS_CXX_STANDARD_FEATURE)
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
include(${KOKKOS_SRC_PATH}/cmake/pgi.cmake)
kokkos_set_pgi_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})
elseif(KOKKOS_CXX_COMPILER_ID STREQUAL Intel)
include(${KOKKOS_SRC_PATH}/cmake/intel.cmake)
kokkos_set_intel_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})
elseif((KOKKOS_CXX_COMPILER_ID STREQUAL "MSVC") OR ((KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA") AND WIN32))
include(${KOKKOS_SRC_PATH}/cmake/msvc.cmake)
kokkos_set_msvc_flags(${KOKKOS_CXX_STANDARD} ${KOKKOS_CXX_INTERMEDIATE_STANDARD})

View File

@ -106,7 +106,6 @@ function(KOKKOS_ADD_EXECUTABLE_AND_TEST ROOT_NAME)
OR Kokkos_ENABLE_SYCL
OR Kokkos_ENABLE_HPX
OR Kokkos_ENABLE_IMPL_SKIP_NO_RTTI_FLAG
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "Intel" AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 2021.2.0)
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA" AND KOKKOS_CXX_COMPILER_VERSION VERSION_LESS 11.3.0)
OR (KOKKOS_CXX_COMPILER_ID STREQUAL "NVIDIA" AND KOKKOS_CXX_HOST_COMPILER_ID STREQUAL "MSVC"))
)

View File

@ -18,6 +18,8 @@ LINK ?= $(CXX)
LDFLAGS ?=
override LDFLAGS += -lpthread
KOKKOS_USE_DEPRECATED_MAKEFILES=1
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests

View File

@ -22,6 +22,7 @@
#endif
#include <Kokkos_Core.hpp>
#include <Kokkos_BitManipulation.hpp>
#include <Kokkos_Functional.hpp>
#include <impl/Kokkos_Bitset_impl.hpp>
@ -62,13 +63,11 @@ class Bitset {
BIT_SCAN_REVERSE | MOVE_HINT_BACKWARD;
private:
enum : unsigned {
block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT)
};
enum : unsigned { block_mask = block_size - 1u };
enum : unsigned {
block_shift = Kokkos::Impl::integral_power_of_two(block_size)
};
static constexpr unsigned block_size = sizeof(unsigned) * CHAR_BIT;
static constexpr unsigned block_mask = block_size - 1u;
static constexpr unsigned block_shift =
Kokkos::has_single_bit(block_size) ? Kokkos::bit_width(block_size) - 1
: ~0u;
//! Type of @ref m_blocks.
using block_view_type = View<unsigned*, Device, MemoryTraits<RandomAccess>>;
@ -135,9 +134,9 @@ class Bitset {
if (m_last_block_mask) {
// clear the unused bits in the last block
Kokkos::Impl::DeepCopy<typename Device::memory_space, Kokkos::HostSpace>(
m_blocks.data() + (m_blocks.extent(0) - 1u), &m_last_block_mask,
sizeof(unsigned));
auto last_block = Kokkos::subview(m_blocks, m_blocks.extent(0) - 1u);
Kokkos::deep_copy(typename Device::execution_space{}, last_block,
m_last_block_mask);
Kokkos::fence(
"Bitset::set: fence after clearing unused bits copying from "
"HostSpace");
@ -324,9 +323,11 @@ class ConstBitset {
using block_view_type = typename Bitset<Device>::block_view_type::const_type;
private:
enum { block_size = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT) };
enum { block_mask = block_size - 1u };
enum { block_shift = Kokkos::Impl::integral_power_of_two(block_size) };
static constexpr unsigned block_size = sizeof(unsigned) * CHAR_BIT;
static constexpr unsigned block_mask = block_size - 1u;
static constexpr unsigned block_shift =
Kokkos::has_single_bit(block_size) ? Kokkos::bit_width(block_size) - 1
: ~0u;
public:
KOKKOS_FUNCTION
@ -400,13 +401,7 @@ void deep_copy(Bitset<DstDevice>& dst, Bitset<SrcDevice> const& src) {
Kokkos::Impl::throw_runtime_exception(
"Error: Cannot deep_copy bitsets of different sizes!");
}
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
typename SrcDevice::memory_space>(
dst.m_blocks.data(), src.m_blocks.data(),
sizeof(unsigned) * src.m_blocks.extent(0));
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
}
template <typename DstDevice, typename SrcDevice>
@ -415,13 +410,7 @@ void deep_copy(Bitset<DstDevice>& dst, ConstBitset<SrcDevice> const& src) {
Kokkos::Impl::throw_runtime_exception(
"Error: Cannot deep_copy bitsets of different sizes!");
}
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
typename SrcDevice::memory_space>(
dst.m_blocks.data(), src.m_blocks.data(),
sizeof(unsigned) * src.m_blocks.extent(0));
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
}
template <typename DstDevice, typename SrcDevice>
@ -430,13 +419,7 @@ void deep_copy(ConstBitset<DstDevice>& dst, ConstBitset<SrcDevice> const& src) {
Kokkos::Impl::throw_runtime_exception(
"Error: Cannot deep_copy bitsets of different sizes!");
}
Kokkos::fence("Bitset::deep_copy: fence before copy operation");
Kokkos::Impl::DeepCopy<typename DstDevice::memory_space,
typename SrcDevice::memory_space>(
dst.m_blocks.data(), src.m_blocks.data(),
sizeof(unsigned) * src.m_blocks.extent(0));
Kokkos::fence("Bitset::deep_copy: fence after copy operation");
Kokkos::deep_copy(dst.m_blocks, src.m_blocks);
}
} // namespace Kokkos
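
The Bitset hunks above replace the enum constants with `static constexpr` members and derive `block_shift` from `Kokkos::has_single_bit` and `Kokkos::bit_width`. The sketch below reproduces that computation without Kokkos. The two helper functions are hand-written stand-ins for the Kokkos functions, and `block_index` / `bit_in_block` are hypothetical names that illustrate how a shift and mask of this kind are typically used; they are not taken from the source.

```cpp
#include <climits>

// Stand-in for Kokkos::has_single_bit: true when exactly one bit is set.
constexpr bool has_single_bit(unsigned x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Stand-in for Kokkos::bit_width: number of bits needed to represent x.
constexpr unsigned bit_width(unsigned x) {
  return x == 0 ? 0u : 1u + bit_width(x >> 1);
}

// Same expressions as in the diff.
constexpr unsigned block_size = sizeof(unsigned) * CHAR_BIT;
constexpr unsigned block_mask = block_size - 1u;
constexpr unsigned block_shift =
    has_single_bit(block_size) ? bit_width(block_size) - 1 : ~0u;

// Hypothetical helpers: split a bit position into (block, offset in block).
constexpr unsigned block_index(unsigned i) { return i >> block_shift; }
constexpr unsigned bit_in_block(unsigned i) { return i & block_mask; }

// Assumption: unsigned has a power-of-two width on the target platform.
static_assert(has_single_bit(block_size), "block_size is a power of two");
static_assert((1u << block_shift) == block_size, "shift is log2(block_size)");
static_assert(block_index(70) * block_size + bit_in_block(70) == 70,
              "index and offset recombine to the original position");
```

When `block_size` is a power of two, `bit_width(block_size) - 1` is its base-2 logarithm, so shifting by it divides by the block size. The `~0u` branch only marks the non-power-of-two case as invalid.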

View File

@ -211,6 +211,12 @@ class DualView : public ViewTraits<DataType, Properties...> {
public:
//@}
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
public:
#else
private:
#endif
// Moved this specifically after modified_flags to resolve an alignment issue
// on MSVC/NVCC
//! \name The two View instances.
@ -219,6 +225,7 @@ class DualView : public ViewTraits<DataType, Properties...> {
t_host h_view;
//@}
public:
//! \name Constructors
//@{
@ -456,16 +463,21 @@ class DualView : public ViewTraits<DataType, Properties...> {
}
}
}
#ifdef KOKKOS_COMPILER_INTEL
__builtin_unreachable();
#endif
}
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
KOKKOS_INLINE_FUNCTION
t_host view_host() const { return h_view; }
KOKKOS_INLINE_FUNCTION
t_dev view_device() const { return d_view; }
#else
KOKKOS_INLINE_FUNCTION
const t_host& view_host() const { return h_view; }
KOKKOS_INLINE_FUNCTION
const t_dev& view_device() const { return d_view; }
#endif
KOKKOS_INLINE_FUNCTION constexpr bool is_allocated() const {
return (d_view.is_allocated() && h_view.is_allocated());
@ -615,8 +627,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
impl_report_host_sync();
}
}
if constexpr (std::is_same<typename t_host::memory_space,
typename t_dev::memory_space>::value) {
if constexpr (std::is_same_v<typename t_host::memory_space,
typename t_dev::memory_space>) {
typename t_dev::execution_space().fence(
"Kokkos::DualView<>::sync: fence after syncing DualView");
typename t_host::execution_space().fence(
@ -687,8 +699,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
// deliberately passing args by cref as they're used multiple times
template <typename... Args>
void sync_host_impl(Args const&... args) {
if (!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value)
if (!std::is_same_v<typename traits::data_type,
typename traits::non_const_data_type>)
Impl::throw_runtime_exception(
"Calling sync_host on a DualView with a const datatype.");
if (modified_flags.data() == nullptr) return;
@ -718,8 +730,8 @@ class DualView : public ViewTraits<DataType, Properties...> {
// deliberately passing args by cref as they're used multiple times
template <typename... Args>
void sync_device_impl(Args const&... args) {
if (!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value)
if (!std::is_same_v<typename traits::data_type,
typename traits::non_const_data_type>)
Impl::throw_runtime_exception(
"Calling sync_device on a DualView with a const datatype.");
if (modified_flags.data() == nullptr) return;
@ -1264,10 +1276,10 @@ namespace Kokkos {
template <class DT, class... DP, class ST, class... SP>
void deep_copy(DualView<DT, DP...>& dst, const DualView<ST, SP...>& src) {
if (src.need_sync_device()) {
deep_copy(dst.h_view, src.h_view);
deep_copy(dst.view_host(), src.view_host());
dst.modify_host();
} else {
deep_copy(dst.d_view, src.d_view);
deep_copy(dst.view_device(), src.view_device());
dst.modify_device();
}
}
@ -1276,10 +1288,10 @@ template <class ExecutionSpace, class DT, class... DP, class ST, class... SP>
void deep_copy(const ExecutionSpace& exec, DualView<DT, DP...>& dst,
const DualView<ST, SP...>& src) {
if (src.need_sync_device()) {
deep_copy(exec, dst.h_view, src.h_view);
deep_copy(exec, dst.view_host(), src.view_host());
dst.modify_host();
} else {
deep_copy(exec, dst.d_view, src.d_view);
deep_copy(exec, dst.view_device(), src.view_device());
dst.modify_device();
}
}
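
Several DualView hunks above swap `std::is_same<...>::value` for the C++17 variable template `std::is_same_v<...>`. The sketch below shows that the two spellings are equivalent, using a simplified const-datatype check in the spirit of the `sync_host_impl` guard. `is_const_data` is a hypothetical helper and `std::remove_const_t` is only a rough substitute for the `traits::non_const_data_type` used in the source.

```cpp
#include <type_traits>

// Old spelling: nested ::value member.
template <class T>
constexpr bool is_const_data_old() {
  return !std::is_same<T, std::remove_const_t<T>>::value;
}

// New spelling: the _v variable template, same result.
template <class T>
constexpr bool is_const_data() {
  return !std::is_same_v<T, std::remove_const_t<T>>;
}

static_assert(!is_const_data<double>(), "double is not const");
static_assert(is_const_data<const double>(), "const double is const");
static_assert(is_const_data_old<const int>() == is_const_data<const int>(),
              "both spellings agree");
```

This is only a readability change: `std::is_same_v<A, B>` is defined as `std::is_same<A, B>::value`, so the runtime checks in the diff behave as before.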

View File

@ -626,9 +626,8 @@ class DynRankView : private View<DataType*******, Properties...> {
} else
#endif
return view_type::operator()(i0, 0, 0, 0, 0, 0, 0);
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -656,9 +655,8 @@ class DynRankView : private View<DataType*******, Properties...> {
} else
#endif
return view_type::operator()(i0, i1, 0, 0, 0, 0, 0);
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -690,9 +688,8 @@ class DynRankView : private View<DataType*******, Properties...> {
} else
#endif
return view_type::operator()(i0, i1, i2, 0, 0, 0, 0);
#if defined KOKKOS_COMPILER_INTEL || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -1124,57 +1121,6 @@ KOKKOS_INLINE_FUNCTION bool operator!=(const DynRankView<LT, LP...>& lhs,
namespace Kokkos {
namespace Impl {
template <class OutputView, class Enable = void>
struct DynRankViewFill {
using const_value_type = typename OutputView::traits::const_value_type;
const OutputView output;
const_value_type input;
KOKKOS_INLINE_FUNCTION
void operator()(const size_t i0) const {
const size_t n1 = output.extent(1);
const size_t n2 = output.extent(2);
const size_t n3 = output.extent(3);
const size_t n4 = output.extent(4);
const size_t n5 = output.extent(5);
const size_t n6 = output.extent(6);
for (size_t i1 = 0; i1 < n1; ++i1) {
for (size_t i2 = 0; i2 < n2; ++i2) {
for (size_t i3 = 0; i3 < n3; ++i3) {
for (size_t i4 = 0; i4 < n4; ++i4) {
for (size_t i5 = 0; i5 < n5; ++i5) {
for (size_t i6 = 0; i6 < n6; ++i6) {
output.access(i0, i1, i2, i3, i4, i5, i6) = input;
}
}
}
}
}
}
}
DynRankViewFill(const OutputView& arg_out, const_value_type& arg_in)
: output(arg_out), input(arg_in) {
using execution_space = typename OutputView::execution_space;
using Policy = Kokkos::RangePolicy<execution_space>;
Kokkos::parallel_for("Kokkos::DynRankViewFill", Policy(0, output.extent(0)),
*this);
}
};
template <class OutputView>
struct DynRankViewFill<OutputView, std::enable_if_t<OutputView::rank == 0>> {
DynRankViewFill(const OutputView& dst,
const typename OutputView::const_value_type& src) {
Kokkos::Impl::DeepCopy<typename OutputView::memory_space,
Kokkos::HostSpace>(
dst.data(), &src, sizeof(typename OutputView::const_value_type));
}
};
template <class OutputView, class InputView,
class ExecSpace = typename OutputView::execution_space>
struct DynRankViewRemap {
@ -1521,9 +1467,8 @@ inline auto create_mirror(const DynRankView<T, P...>& src,
return dst_type(prop_copy,
Impl::reconstructLayout(src.layout(), src.rank()));
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -1611,9 +1556,8 @@ inline auto create_mirror_view(
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -1754,6 +1698,7 @@ inline void impl_resize(const Impl::ViewCtorProp<ViewCtorArgs...>& arg_prop,
Kokkos::Impl::DynRankViewRemap<drview_type, drview_type>(
Impl::get_property<Impl::ExecutionSpaceTag>(prop_copy), v_resized, v);
else {
// NOLINTNEXTLINE(bugprone-unused-raii)
Kokkos::Impl::DynRankViewRemap<drview_type, drview_type>(v_resized, v);
Kokkos::fence("Kokkos::resize(DynRankView)");
}

View File

@ -155,6 +155,7 @@ struct ChunkedArrayManager {
}
// Destroy the linked allocation if we have one.
if (m_linked != nullptr) {
// NOLINTNEXTLINE(bugprone-multi-level-implicit-pointer-conversion)
Space().deallocate(m_label.c_str(), m_linked,
(sizeof(value_type*) * (m_chunk_max + 2)));
}
@ -195,11 +196,13 @@ struct ChunkedArrayManager {
void deep_copy_to(
const ExecutionSpace& exec_space,
ChunkedArrayManager<OtherMemorySpace, ValueType> const& other) const {
if (other.m_chunks != m_chunks) {
Kokkos::Impl::DeepCopy<OtherMemorySpace, MemorySpace, ExecutionSpace>(
exec_space, other.m_chunks, m_chunks,
sizeof(pointer_type) * (m_chunk_max + 2));
}
// use of ad-hoc unmanaged views
Kokkos::deep_copy(
exec_space,
Kokkos::View<uintptr_t*, OtherMemorySpace>(
reinterpret_cast<uintptr_t*>(other.m_chunks), m_chunk_max + 2),
Kokkos::View<uintptr_t*, MemorySpace>(
reinterpret_cast<uintptr_t*>(m_chunks), m_chunk_max + 2));
}
KOKKOS_INLINE_FUNCTION
@ -621,9 +624,8 @@ inline auto create_mirror(const Kokkos::Experimental::DynamicView<T, P...>& src,
return ret;
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -718,9 +720,8 @@ inline auto create_mirror_view(
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -789,9 +790,9 @@ inline void deep_copy(const Kokkos::Experimental::DynamicView<T, DP...>& dst,
dst_memory_space>::accessible;
if (DstExecCanAccessSrc)
Kokkos::Impl::ViewRemap<dst_type, src_type, dst_execution_space>(dst, src);
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
else if (SrcExecCanAccessDst)
Kokkos::Impl::ViewRemap<dst_type, src_type, src_execution_space>(dst, src);
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
else
src.impl_get_chunks().deep_copy_to(dst_execution_space{},
dst.impl_get_chunks());
@ -819,9 +820,9 @@ inline void deep_copy(const ExecutionSpace& exec,
// FIXME use execution space
if (DstExecCanAccessSrc)
Kokkos::Impl::ViewRemap<dst_type, src_type, dst_execution_space>(dst, src);
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
else if (SrcExecCanAccessDst)
Kokkos::Impl::ViewRemap<dst_type, src_type, src_execution_space>(dst, src);
Kokkos::Impl::ViewRemap<dst_type, src_type>(dst, src);
else
src.impl_get_chunks().deep_copy_to(exec, dst.impl_get_chunks());
}
@ -873,7 +874,7 @@ inline void deep_copy(const Kokkos::Experimental::DynamicView<T, DP...>& dst,
namespace Impl {
template <class Arg0, class... DP, class... SP>
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>,
Kokkos::Experimental::DynamicView<SP...>, 1, Arg0> {
Kokkos::Experimental::DynamicView<SP...>, Arg0> {
using DstType = Kokkos::Experimental::DynamicView<DP...>;
using SrcType = Kokkos::Experimental::DynamicView<SP...>;
using dst_subview_type = DstType;
@ -885,8 +886,7 @@ struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>,
};
template <class... DP, class SrcType, class Arg0>
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, 1,
Arg0> {
struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, Arg0> {
using DstType = Kokkos::Experimental::DynamicView<DP...>;
using dst_subview_type = DstType;
using src_subview_type = typename Kokkos::Subview<SrcType, Arg0>;
@ -897,8 +897,7 @@ struct CommonSubview<Kokkos::Experimental::DynamicView<DP...>, SrcType, 1,
};
template <class DstType, class... SP, class Arg0>
struct CommonSubview<DstType, Kokkos::Experimental::DynamicView<SP...>, 1,
Arg0> {
struct CommonSubview<DstType, Kokkos::Experimental::DynamicView<SP...>, Arg0> {
using SrcType = Kokkos::Experimental::DynamicView<SP...>;
using dst_subview_type = typename Kokkos::Subview<DstType, Arg0>;
using src_subview_type = SrcType;

View File

@ -43,7 +43,7 @@ class ErrorReporter {
clear();
}
int getCapacity() const { return m_reports.h_view.extent(0); }
int getCapacity() const { return m_reports.view_host().extent(0); }
int getNumReports();
@ -69,9 +69,10 @@ class ErrorReporter {
bool add_report(int reporter_id, report_type report) const {
int idx = Kokkos::atomic_fetch_add(&m_numReportsAttempted(), 1);
if (idx >= 0 && (idx < static_cast<int>(m_reports.d_view.extent(0)))) {
m_reporters.d_view(idx) = reporter_id;
m_reports.d_view(idx) = report;
if (idx >= 0 &&
(idx < static_cast<int>(m_reports.view_device().extent(0)))) {
m_reporters.view_device()(idx) = reporter_id;
m_reports.view_device()(idx) = report;
return true;
} else {
return false;
@ -92,8 +93,8 @@ template <typename ReportType, typename DeviceType>
inline int ErrorReporter<ReportType, DeviceType>::getNumReports() {
int num_reports = 0;
Kokkos::deep_copy(num_reports, m_numReportsAttempted);
if (num_reports > static_cast<int>(m_reports.h_view.extent(0))) {
num_reports = m_reports.h_view.extent(0);
if (num_reports > static_cast<int>(m_reports.view_host().extent(0))) {
num_reports = m_reports.view_host().extent(0);
}
return num_reports;
}
@ -119,8 +120,8 @@ void ErrorReporter<ReportType, DeviceType>::getReports(
m_reporters.template sync<host_mirror_space>();
for (int i = 0; i < num_reports; ++i) {
reporters_out.push_back(m_reporters.h_view(i));
reports_out.push_back(m_reports.h_view(i));
reporters_out.push_back(m_reporters.view_host()(i));
reports_out.push_back(m_reports.view_host()(i));
}
}
}
@ -143,8 +144,8 @@ void ErrorReporter<ReportType, DeviceType>::getReports(
m_reporters.template sync<host_mirror_space>();
for (int i = 0; i < num_reports; ++i) {
reporters_out(i) = m_reporters.h_view(i);
reports_out(i) = m_reports.h_view(i);
reporters_out(i) = m_reporters.view_host()(i);
reports_out(i) = m_reports.view_host()(i);
}
}
}
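
Review note: the `add_report` and `getNumReports` hunks above reserve a slot with an atomic fetch-add, store only when the reserved index is inside the capacity, and clamp the attempted count when reporting. The sketch below shows that pattern on the host only. It assumes `std::atomic` as a stand-in for `Kokkos::atomic_fetch_add`, and `BoundedReportLog` is a hypothetical class for illustration, not part of Kokkos.

```cpp
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-only analogue of the slot-reservation logic; not Kokkos API.
template <typename Report>
class BoundedReportLog {
 public:
  explicit BoundedReportLog(int capacity)
      : reporters_(capacity), reports_(capacity), attempted_(0) {
    assert(capacity >= 0);
  }

  int capacity() const { return static_cast<int>(reports_.size()); }

  // Reserve a slot atomically; store only if the slot is within capacity.
  bool add_report(int reporter_id, Report report) {
    const int idx = attempted_.fetch_add(1);
    if (idx >= 0 && idx < capacity()) {
      reporters_[idx] = reporter_id;
      reports_[idx]   = std::move(report);
      return true;
    }
    return false;
  }

  // Attempts may exceed capacity, so clamp before reporting.
  int num_reports() const {
    const int n = attempted_.load();
    return n > capacity() ? capacity() : n;
  }

  // Raw number of attempts, including the rejected ones.
  int num_attempts() const { return attempted_.load(); }

 private:
  std::vector<int> reporters_;
  std::vector<Report> reports_;
  std::atomic<int> attempted_;
};
```

With a capacity of 2, a third `add_report` call returns `false`, `num_attempts()` reports 3, and `num_reports()` reports 2. That is the same clamping `getNumReports` performs against `extent(0)`.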

@ -651,8 +651,8 @@ class OffsetView : public View<DataType, Properties...> {
m_begins[i] = minIndices.begin()[i];
}
static_assert(
std::is_same<pointer_type, typename Kokkos::Impl::ViewCtorProp<
P...>::pointer_type>::value,
std::is_same_v<pointer_type,
typename Kokkos::Impl::ViewCtorProp<P...>::pointer_type>,
"When constructing OffsetView to wrap user memory, you must supply "
"matching pointer type");
}
@ -1312,9 +1312,8 @@ inline auto create_mirror(const Kokkos::Experimental::OffsetView<T, P...>& src,
return typename Kokkos::Experimental::OffsetView<T, P...>::HostMirror(
Kokkos::create_mirror(arg_prop, src.view()), src.begins());
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}
@ -1408,9 +1407,8 @@ inline auto create_mirror_view(
return Kokkos::Impl::choose_create_mirror(src, arg_prop);
}
}
#if defined(KOKKOS_COMPILER_INTEL) || \
(defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC))
#if defined(KOKKOS_COMPILER_NVCC) && KOKKOS_COMPILER_NVCC >= 1130 && \
!defined(KOKKOS_COMPILER_MSVC)
__builtin_unreachable();
#endif
}

@ -788,7 +788,7 @@ class ScatterView<DataType, Layout, DeviceType, Op, ScatterNonDuplicated,
void contribute_into(execution_space const& exec_space,
View<DT, RP...> const& dest) const {
using dest_type = View<DT, RP...>;
static_assert(std::is_same<typename dest_type::array_layout, Layout>::value,
static_assert(std::is_same_v<typename dest_type::array_layout, Layout>,
"ScatterView contribute destination has different layout");
static_assert(
Kokkos::SpaceAccessibility<
@ -1071,9 +1071,9 @@ class ScatterView<DataType, Kokkos::LayoutRight, DeviceType, Op,
void contribute_into(execution_space const& exec_space,
View<DT, RP...> const& dest) const {
using dest_type = View<DT, RP...>;
static_assert(std::is_same<typename dest_type::array_layout,
Kokkos::LayoutRight>::value,
"ScatterView deep_copy destination has different layout");
static_assert(
std::is_same_v<typename dest_type::array_layout, Kokkos::LayoutRight>,
"ScatterView deep_copy destination has different layout");
static_assert(
Kokkos::SpaceAccessibility<
execution_space, typename dest_type::memory_space>::accessible,
@ -1351,12 +1351,12 @@ class ScatterView<DataType, Kokkos::LayoutLeft, DeviceType, Op,
View<RP...> const& dest) const {
using dest_type = View<RP...>;
static_assert(
std::is_same<typename dest_type::value_type,
typename original_view_type::non_const_value_type>::value,
std::is_same_v<typename dest_type::value_type,
typename original_view_type::non_const_value_type>,
"ScatterView deep_copy destination has wrong value_type");
static_assert(std::is_same<typename dest_type::array_layout,
Kokkos::LayoutLeft>::value,
"ScatterView deep_copy destination has different layout");
static_assert(
std::is_same_v<typename dest_type::array_layout, Kokkos::LayoutLeft>,
"ScatterView deep_copy destination has different layout");
static_assert(
Kokkos::SpaceAccessibility<
execution_space, typename dest_type::memory_space>::accessible,
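
Review note: several hunks in this commit replace `std::is_same<A, B>::value` with the C++17 variable template `std::is_same_v<A, B>`. The two spellings are equivalent, so the `static_assert` conditions keep their meaning. The sketch below illustrates this. `MiniView` and the two layout tags are hypothetical stand-ins for the Kokkos types, introduced only for this example.

```cpp
#include <type_traits>

// Hypothetical layout tags and view type, standing in for the Kokkos types.
struct LayoutLeft {};
struct LayoutRight {};

template <typename Layout>
struct MiniView {
  using array_layout = Layout;
};

// Compile-time layout check; the static_assert confirms both spellings agree.
template <typename Dest, typename Layout>
constexpr bool layouts_match() {
  static_assert(std::is_same<typename Dest::array_layout, Layout>::value ==
                    std::is_same_v<typename Dest::array_layout, Layout>,
                "both spellings must agree");
  return std::is_same_v<typename Dest::array_layout, Layout>;
}
```

The `_v` form drops the trailing `::value`, which is why the reformatted assertions in the diff fit on fewer lines.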

@ -21,6 +21,23 @@
#define KOKKOS_IMPL_PUBLIC_INCLUDE_NOTDEFINED_STATICCRSGRAPH
#endif
#include <Kokkos_Macros.hpp>
#if defined(KOKKOS_ENABLE_DEPRECATED_CODE_4)
#if defined(KOKKOS_ENABLE_DEPRECATION_WARNINGS) && \
!defined(KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH)
namespace {
[[deprecated("Deprecated <Kokkos_StaticCrsGraph.hpp> header is included")]] int
emit_warning_kokkos_static_crs_graph_deprecated() {
return 0;
}
static auto do_not_include = emit_warning_kokkos_static_crs_graph_deprecated();
} // namespace
#endif
#else
#error "Deprecated <Kokkos_StaticCrsGraph.hpp> header is included"
#endif
#include <string>
#include <vector>

@ -874,22 +874,16 @@ class UnorderedMap {
if (m_hash_lists.data() != src.m_hash_lists.data()) {
Kokkos::deep_copy(m_available_indexes, src.m_available_indexes);
using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
typename SDevice::memory_space>;
// do the other deep copies asynchronously if possible
typename device_type::execution_space exec_space{};
raw_deep_copy(m_hash_lists.data(), src.m_hash_lists.data(),
sizeof(size_type) * src.m_hash_lists.extent(0));
raw_deep_copy(m_next_index.data(), src.m_next_index.data(),
sizeof(size_type) * src.m_next_index.extent(0));
raw_deep_copy(m_keys.data(), src.m_keys.data(),
sizeof(key_type) * src.m_keys.extent(0));
Kokkos::deep_copy(exec_space, m_hash_lists, src.m_hash_lists);
Kokkos::deep_copy(exec_space, m_next_index, src.m_next_index);
Kokkos::deep_copy(exec_space, m_keys, src.m_keys);
if (!is_set) {
raw_deep_copy(m_values.data(), src.m_values.data(),
sizeof(impl_value_type) * src.m_values.extent(0));
Kokkos::deep_copy(exec_space, m_values, src.m_values);
}
raw_deep_copy(m_scalars.data(), src.m_scalars.data(),
sizeof(int) * num_scalars);
Kokkos::deep_copy(exec_space, m_scalars, src.m_scalars);
Kokkos::fence(
"Kokkos::UnorderedMap::deep_copy_view: fence after copy to dst.");
@ -901,33 +895,27 @@ class UnorderedMap {
bool modified() const { return get_flag(modified_idx); }
void set_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
Kokkos::HostSpace>;
const int true_ = true;
raw_deep_copy(m_scalars.data() + flag, &true_, sizeof(int));
auto scalar = Kokkos::subview(m_scalars, flag);
Kokkos::deep_copy(typename device_type::execution_space{}, scalar,
static_cast<int>(true));
Kokkos::fence(
"Kokkos::UnorderedMap::set_flag: fence after copying flag from "
"HostSpace");
}
void reset_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<typename device_type::memory_space,
Kokkos::HostSpace>;
const int false_ = false;
raw_deep_copy(m_scalars.data() + flag, &false_, sizeof(int));
auto scalar = Kokkos::subview(m_scalars, flag);
Kokkos::deep_copy(typename device_type::execution_space{}, scalar,
static_cast<int>(false));
Kokkos::fence(
"Kokkos::UnorderedMap::reset_flag: fence after copying flag from "
"HostSpace");
}
bool get_flag(int flag) const {
using raw_deep_copy =
Kokkos::Impl::DeepCopy<Kokkos::HostSpace,
typename device_type::memory_space>;
int result = false;
raw_deep_copy(&result, m_scalars.data() + flag, sizeof(int));
const auto scalar = Kokkos::subview(m_scalars, flag);
int result;
Kokkos::deep_copy(typename device_type::execution_space{}, result, scalar);
Kokkos::fence(
"Kokkos::UnorderedMap::get_flag: fence after copy to return value in "
"HostSpace");

@ -69,14 +69,14 @@ class KOKKOS_DEPRECATED vector
public:
#ifdef KOKKOS_ENABLE_CUDA_UVM
KOKKOS_INLINE_FUNCTION reference operator()(int i) const {
return DV::h_view(i);
return DV::view_host()(i);
};
KOKKOS_INLINE_FUNCTION reference operator[](int i) const {
return DV::h_view(i);
return DV::view_host()(i);
};
#else
inline reference operator()(int i) const { return DV::h_view(i); };
inline reference operator[](int i) const { return DV::h_view(i); };
inline reference operator()(int i) const { return DV::view_host()(i); };
inline reference operator[](int i) const { return DV::view_host()(i); };
#endif
/* Member functions which behave like std::vector functions */
@ -111,13 +111,13 @@ class KOKKOS_DEPRECATED vector
/* Assign value either on host or on device */
if (DV::template need_sync<typename DV::t_dev::device_type>()) {
set_functor_host f(DV::h_view, val);
set_functor_host f(DV::view_host(), val);
parallel_for("Kokkos::vector::assign", n, f);
typename DV::t_host::execution_space().fence(
"Kokkos::vector::assign: fence after assigning values");
DV::template modify<typename DV::t_host::device_type>();
} else {
set_functor f(DV::d_view, val);
set_functor f(DV::view_device(), val);
parallel_for("Kokkos::vector::assign", n, f);
typename DV::t_dev::execution_space().fence(
"Kokkos::vector::assign: fence after assigning values");
@ -136,7 +136,7 @@ class KOKKOS_DEPRECATED vector
DV::resize(new_size);
}
DV::h_view(_size) = val;
DV::view_host()(_size) = val;
_size++;
}
@ -209,27 +209,27 @@ class KOKKOS_DEPRECATED vector
size_type span() const { return DV::span(); }
bool empty() const { return _size == 0; }
pointer data() const { return DV::h_view.data(); }
pointer data() const { return DV::view_host().data(); }
iterator begin() const { return DV::h_view.data(); }
iterator begin() const { return DV::view_host().data(); }
const_iterator cbegin() const { return DV::h_view.data(); }
const_iterator cbegin() const { return DV::view_host().data(); }
iterator end() const {
return _size > 0 ? DV::h_view.data() + _size : DV::h_view.data();
return _size > 0 ? DV::view_host().data() + _size : DV::view_host().data();
}
const_iterator cend() const {
return _size > 0 ? DV::h_view.data() + _size : DV::h_view.data();
return _size > 0 ? DV::view_host().data() + _size : DV::view_host().data();
}
reference front() { return DV::h_view(0); }
reference front() { return DV::view_host()(0); }
reference back() { return DV::h_view(_size - 1); }
reference back() { return DV::view_host()(_size - 1); }
const_reference front() const { return DV::h_view(0); }
const_reference front() const { return DV::view_host()(0); }
const_reference back() const { return DV::h_view(_size - 1); }
const_reference back() const { return DV::view_host()(_size - 1); }
/* std::algorithms which work originally with iterators, here they are
* implemented as member functions */
@ -245,10 +245,10 @@ class KOKKOS_DEPRECATED vector
return theEnd;
}
Scalar lower_val = DV::h_view(lower);
Scalar upper_val = DV::h_view(upper);
Scalar lower_val = DV::view_host()(lower);
Scalar upper_val = DV::view_host()(upper);
size_t idx = (upper + lower) / 2;
Scalar val = DV::h_view(idx);
Scalar val = DV::view_host()(idx);
if (val > upper_val) return upper;
if (val < lower_val) return start;
@ -259,14 +259,14 @@ class KOKKOS_DEPRECATED vector
upper = idx;
}
idx = (upper + lower) / 2;
val = DV::h_view(idx);
val = DV::view_host()(idx);
}
return idx;
}
bool is_sorted() {
for (int i = 0; i < _size - 1; i++) {
if (DV::h_view(i) > DV::h_view(i + 1)) return false;
if (DV::view_host()(i) > DV::view_host()(i + 1)) return false;
}
return true;
}
@ -279,26 +279,27 @@ class KOKKOS_DEPRECATED vector
upper = _size - 1;
lower = 0;
if ((val < DV::h_view(0)) || (val > DV::h_view(_size - 1))) return end();
if ((val < DV::view_host()(0)) || (val > DV::view_host()(_size - 1)))
return end();
while (upper > lower) {
if (val > DV::h_view(current))
if (val > DV::view_host()(current))
lower = current + 1;
else
upper = current;
current = (upper + lower) / 2;
}
if (val == DV::h_view(current))
return &DV::h_view(current);
if (val == DV::view_host()(current))
return &DV::view_host()(current);
else
return end();
}
/* Additional functions for data management */
void device_to_host() { deep_copy(DV::h_view, DV::d_view); }
void host_to_device() const { deep_copy(DV::d_view, DV::h_view); }
void device_to_host() { deep_copy(DV::view_host(), DV::view_device()); }
void host_to_device() const { deep_copy(DV::view_device(), DV::view_host()); }
void on_host() { DV::template modify<typename DV::t_host::device_type>(); }
void on_device() { DV::template modify<typename DV::t_dev::device_type>(); }

@ -75,6 +75,7 @@ uint32_t MurmurHash3_x86_32(const void* key, int len, uint32_t seed) {
//----------
// tail
// NOLINTNEXTLINE(bugprone-implicit-widening-of-multiplication-result)
const uint8_t* tail = (const uint8_t*)(data + nblocks * 4);
uint32_t k1 = 0;
@ -88,6 +89,8 @@ uint32_t MurmurHash3_x86_32(const void* key, int len, uint32_t seed) {
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
break;
default: break;
};
//----------
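
Review note: the two MurmurHash3 hunks above silence a widening warning on `nblocks * 4` and end the tail `switch` with an explicit `break` plus a `default` label. The sketch below shows one way the tail step can look with those points made explicit. `mix_tail` is a hypothetical helper written for this note, the explicit `static_cast` and the `[[fallthrough]]` markers are choices of this example rather than what the Kokkos source does, and the function returns the mixed `k1` instead of folding it into `h1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

inline std::uint32_t rotl32(std::uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

// Hypothetical helper: mixes the trailing (len & 3) bytes of `data`
// following the MurmurHash3_x86_32 tail step and returns k1.
inline std::uint32_t mix_tail(const std::uint8_t* data, int len) {
  assert(len >= 0);
  const std::uint32_t c1 = 0xcc9e2d51u;
  const std::uint32_t c2 = 0x1b873593u;
  const int nblocks      = len / 4;

  // Widen before multiplying so the pointer offset is computed in ptrdiff_t.
  const std::uint8_t* tail = data + static_cast<std::ptrdiff_t>(nblocks) * 4;

  std::uint32_t k1 = 0;
  switch (len & 3) {
    case 3: k1 ^= static_cast<std::uint32_t>(tail[2]) << 16; [[fallthrough]];
    case 2: k1 ^= static_cast<std::uint32_t>(tail[1]) << 8; [[fallthrough]];
    case 1:
      k1 ^= tail[0];
      k1 *= c1;
      k1 = rotl32(k1, 15);
      k1 *= c2;
      break;
    default: break;  // len & 3 == 0: there are no tail bytes to mix
  }
  return k1;
}
```

When `len` is a multiple of four the `default` branch is taken and the helper returns 0, so nothing would be folded into the hash state. The explicit `break` after `case 1` keeps control from falling into `default`.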

@ -29,8 +29,8 @@ foreach(Tag Threads;Serial;OpenMP;HPX;Cuda;HIP;SYCL)
Vector
ViewCtorPropEmbeddedDim
)
if(NOT Kokkos_ENABLE_DEPRECATED_CODE_4 AND Name STREQUAL "Vector")
continue() # skip Kokkos::vector test if deprecated code 4 is not enabled
if(NOT Kokkos_ENABLE_DEPRECATED_CODE_4 AND NOT Name IN_LIST "Vector,StaticCrsGraph")
continue() # skip Kokkos::{vector,StaticCrsGraph} tests if deprecated code 4 is not enabled
endif()
# Write to a temporary intermediate file and call configure_file to avoid
# updating timestamps triggering unnecessary rebuilds on subsequent cmake runs.

@ -24,6 +24,8 @@ LINK ?= $(CXX)
LDFLAGS ?=
override LDFLAGS += -lpthread
KOKKOS_USE_DEPRECATED_MAKEFILES=1
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests -I${KOKKOS_PATH}/core/unit_test/category_files
@ -31,7 +33,7 @@ KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests -I${KO
TEST_TARGETS =
TARGETS =
TESTS = Bitset DualView DynamicView DynViewAPI_generic DynViewAPI_rank12345 DynViewAPI_rank67 ErrorReporter OffsetView ScatterView StaticCrsGraph UnorderedMap ViewCtorPropEmbeddedDim
TESTS = Bitset DualView DynamicView DynViewAPI_generic DynViewAPI_rank12345 DynViewAPI_rank67 ErrorReporter OffsetView ScatterView UnorderedMap ViewCtorPropEmbeddedDim
tmp := $(foreach device, $(KOKKOS_DEVICELIST), \
tmp2 := $(foreach test, $(TESTS), \
$(if $(filter Test$(device)_$(test).cpp, $(shell ls Test$(device)_$(test).cpp 2>/dev/null)),,\
@ -52,7 +54,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA += TestCuda_ErrorReporter.o
OBJ_CUDA += TestCuda_OffsetView.o
OBJ_CUDA += TestCuda_ScatterView.o
OBJ_CUDA += TestCuda_StaticCrsGraph.o
OBJ_CUDA += TestCuda_UnorderedMap.o
OBJ_CUDA += TestCuda_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Cuda
@ -70,7 +71,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_THREADS), 1)
OBJ_THREADS += TestThreads_ErrorReporter.o
OBJ_THREADS += TestThreads_OffsetView.o
OBJ_THREADS += TestThreads_ScatterView.o
OBJ_THREADS += TestThreads_StaticCrsGraph.o
OBJ_THREADS += TestThreads_UnorderedMap.o
OBJ_THREADS += TestThreads_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Threads
@ -88,7 +88,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP += TestOpenMP_ErrorReporter.o
OBJ_OPENMP += TestOpenMP_OffsetView.o
OBJ_OPENMP += TestOpenMP_ScatterView.o
OBJ_OPENMP += TestOpenMP_StaticCrsGraph.o
OBJ_OPENMP += TestOpenMP_UnorderedMap.o
OBJ_OPENMP += TestOpenMP_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_OpenMP
@ -106,7 +105,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_HPX), 1)
OBJ_HPX += TestHPX_ErrorReporter.o
OBJ_HPX += TestHPX_OffsetView.o
OBJ_HPX += TestHPX_ScatterView.o
OBJ_HPX += TestHPX_StaticCrsGraph.o
OBJ_HPX += TestHPX_UnorderedMap.o
OBJ_HPX += TestHPX_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_HPX
@ -124,7 +122,6 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
OBJ_SERIAL += TestSerial_ErrorReporter.o
OBJ_SERIAL += TestSerial_OffsetView.o
OBJ_SERIAL += TestSerial_ScatterView.o
OBJ_SERIAL += TestSerial_StaticCrsGraph.o
OBJ_SERIAL += TestSerial_UnorderedMap.o
OBJ_SERIAL += TestSerial_ViewCtorPropEmbeddedDim.o
TARGETS += KokkosContainers_UnitTest_Serial

@ -87,17 +87,6 @@ struct test_dualview_copy_construction_and_assignment {
ASSERT_EQ(a.view_host(), c.view_host());
ASSERT_EQ(a.view_device(), c.view_device());
// We can't test shallow equality of modified_flags because it's protected.
// So we test it indirectly through sync state behavior.
if (!std::decay_t<SrcViewType>::impl_dualview_is_single_device::value) {
a.clear_sync_state();
a.modify_host();
ASSERT_TRUE(a.need_sync_device());
ASSERT_TRUE(b.need_sync_device());
ASSERT_TRUE(c.need_sync_device());
a.clear_sync_state();
}
}
};
@ -123,16 +112,16 @@ struct test_dualview_combinations {
} else {
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
}
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
a.template modify<typename ViewType::execution_space>();
a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});
a.h_view(5, 1) = 3;
a.h_view(6, 1) = 4;
a.h_view(7, 2) = 5;
a.view_host()(5, 1) = 3;
a.view_host()(6, 1) = 4;
a.view_host()(7, 2) = 5;
a.template modify<typename ViewType::host_mirror_space>();
ViewType b = Kokkos::subview(a, std::pair<unsigned int, unsigned int>(6, 9),
std::pair<unsigned int, unsigned int>(0, 1));
@ -141,16 +130,17 @@ struct test_dualview_combinations {
Kokkos::DefaultExecutionSpace{});
b.template modify<typename ViewType::execution_space>();
Kokkos::deep_copy(b.d_view, 2);
Kokkos::deep_copy(b.view_device(), 2);
a.template sync<typename ViewType::host_mirror_space>();
a.template sync<typename ViewType::host_mirror_space>(
Kokkos::DefaultExecutionSpace{});
Scalar count = 0;
for (unsigned int i = 0; i < a.d_view.extent(0); i++)
for (unsigned int j = 0; j < a.d_view.extent(1); j++)
count += a.h_view(i, j);
return count - a.d_view.extent(0) * a.d_view.extent(1) - 2 - 4 - 3 * 2;
for (unsigned int i = 0; i < a.view_device().extent(0); i++)
for (unsigned int j = 0; j < a.view_device().extent(1); j++)
count += a.view_host()(i, j);
return count - a.view_device().extent(0) * a.view_device().extent(1) - 2 -
4 - 3 * 2;
}
test_dualview_combinations(unsigned int size, bool with_init) {
@ -191,7 +181,7 @@ struct test_dual_view_deep_copy {
}
const scalar_type sum_total = scalar_type(n * m);
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
if (use_templ_sync) {
a.template modify<typename ViewType::execution_space>();
@ -209,15 +199,16 @@ struct test_dual_view_deep_copy {
typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_reduce(
Kokkos::RangePolicy<t_dev_exec_space>(0, n),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(a.d_view),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(
a.view_device()),
a_d_sum);
ASSERT_EQ(a_d_sum, sum_total);
// Check host view is synced as expected
scalar_type a_h_sum = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
a_h_sum += a.h_view(i, j);
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
a_h_sum += a.view_host()(i, j);
}
ASSERT_EQ(a_h_sum, sum_total);
@ -237,15 +228,16 @@ struct test_dual_view_deep_copy {
// Execute on the execution_space associated with t_dev's memory space
Kokkos::parallel_reduce(
Kokkos::RangePolicy<t_dev_exec_space>(0, n),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(b.d_view),
SumViewEntriesFunctor<scalar_type, typename ViewType::t_dev>(
b.view_device()),
b_d_sum);
ASSERT_EQ(b_d_sum, sum_total);
// Check host view is synced as expected
scalar_type b_h_sum = 0;
for (size_t i = 0; i < b.h_view.extent(0); ++i)
for (size_t j = 0; j < b.h_view.extent(1); ++j) {
b_h_sum += b.h_view(i, j);
for (size_t i = 0; i < b.view_host().extent(0); ++i)
for (size_t j = 0; j < b.view_host().extent(1); ++j) {
b_h_sum += b.view_host()(i, j);
}
ASSERT_EQ(b_h_sum, sum_total);
@ -256,8 +248,8 @@ struct test_dual_view_deep_copy {
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5, true);
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(10, 5,
false);
// Test zero length but allocated (a.d_view.data!=nullptr but
// a.d_view.span()==0)
// Test zero length but allocated (a.view_device().data() != nullptr but
// a.view_device().span() == 0)
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, true);
run_me<Kokkos::DualView<Scalar**, Kokkos::LayoutLeft, Device>>(0, 5, false);
@ -285,7 +277,7 @@ struct test_dualview_resize {
else
a = ViewType(Kokkos::view_alloc(Kokkos::WithoutInitializing, "A"), n, m);
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
/* Covers case "Resize on Device" */
a.modify_device();
@ -296,7 +288,7 @@ struct test_dualview_resize {
ASSERT_EQ(a.extent(0), n * factor);
ASSERT_EQ(a.extent(1), m * factor);
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
a.sync_host();
// Check device view is initialized as expected
@ -307,18 +299,18 @@ struct test_dualview_resize {
"errors");
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected
int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}
// Check
@ -345,17 +337,17 @@ struct test_dualview_resize {
typename ViewType::t_dev::memory_space::execution_space;
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected
errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}
// Check
@ -390,7 +382,7 @@ struct test_dualview_realloc {
ASSERT_EQ(a.extent(0), n);
ASSERT_EQ(a.extent(1), m);
Kokkos::deep_copy(a.d_view, 1);
Kokkos::deep_copy(a.view_device(), 1);
a.modify_device();
a.sync_host();
@ -403,18 +395,18 @@ struct test_dualview_realloc {
"errors");
Kokkos::parallel_for(
Kokkos::MDRangePolicy<t_dev_exec_space, Kokkos::Rank<2>>(
{0, 0}, {a.d_view.extent(0), a.d_view.extent(1)}),
{0, 0}, {a.view_device().extent(0), a.view_device().extent(1)}),
KOKKOS_LAMBDA(int i, int j) {
if (a.d_view(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
if (a.view_device()(i, j) != 1) Kokkos::atomic_inc(errors_d.data());
});
int errors_d_scalar;
Kokkos::deep_copy(errors_d_scalar, errors_d);
// Check host view is synced as expected
int errors_h_scalar = 0;
for (size_t i = 0; i < a.h_view.extent(0); ++i)
for (size_t j = 0; j < a.h_view.extent(1); ++j) {
if (a.h_view(i, j) != 1) ++errors_h_scalar;
for (size_t i = 0; i < a.view_host().extent(0); ++i)
for (size_t j = 0; j < a.view_host().extent(1); ++j) {
if (a.view_host()(i, j) != 1) ++errors_h_scalar;
}
// Check
@ -484,6 +476,64 @@ TEST(TEST_CATEGORY, dualview_deep_copy) {
test_dualview_deep_copy<double, TEST_EXECSPACE>();
}
template <typename ExecutionSpace>
void test_dualview_sync_should_fence() {
using DualViewType = Kokkos::DualView<int, ExecutionSpace>;
{
DualViewType dv("test_dual_view");
dv.modify_device();
Kokkos::parallel_for(
Kokkos::RangePolicy<ExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_device().data(), 1); });
dv.sync_host();
ASSERT_EQ(dv.view_host()(), 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_device();
Kokkos::parallel_for(
Kokkos::RangePolicy<ExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_device().data(), 1); });
dv.template sync<typename DualViewType::t_host::device_type>();
ASSERT_EQ(dv.view_host()(), 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_host();
Kokkos::parallel_for(
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_host().data(), 1); });
dv.sync_device();
int result;
auto device_exec =
Kokkos::Experimental::partition_space(ExecutionSpace{}, 1);
Kokkos::deep_copy(device_exec[0], result, dv.view_device());
device_exec[0].fence();
ASSERT_EQ(result, 10000);
}
{
DualViewType dv("test_dual_view");
dv.modify_host();
Kokkos::parallel_for(
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 10000),
KOKKOS_LAMBDA(int) { Kokkos::atomic_add(dv.view_host().data(), 1); });
dv.template sync<typename DualViewType::t_dev::device_type>();
int result;
auto device_exec =
Kokkos::Experimental::partition_space(ExecutionSpace{}, 1);
Kokkos::deep_copy(device_exec[0], result, dv.view_device());
device_exec[0].fence();
ASSERT_EQ(result, 10000);
}
}
TEST(TEST_CATEGORY, dualview_sync_should_fence) {
#ifdef KOKKOS_ENABLE_HPX // FIXME
GTEST_SKIP() << "Known to fail with HPX";
#endif
test_dualview_sync_should_fence<TEST_EXECSPACE>();
}
struct NoDefaultConstructor {
NoDefaultConstructor(int i_) : i(i_) {}
KOKKOS_FUNCTION operator int() const { return i; }
@ -640,8 +690,8 @@ auto initialize_view_of_views() {
V v("v", 2);
V w("w", 2);
dv_v.h_view(0) = v;
dv_v.h_view(1) = w;
dv_v.view_host()(0) = v;
dv_v.view_host()(1) = w;
dv_v.modify_host();
dv_v.sync_device();
@ -652,19 +702,19 @@ auto initialize_view_of_views() {
TEST(TEST_CATEGORY, dualview_sequential_host_init) {
auto dv_v = initialize_view_of_views<Kokkos::View<double*, TEST_EXECSPACE>>();
dv_v.resize(Kokkos::view_alloc(Kokkos::SequentialHostInit), 2u);
ASSERT_EQ(dv_v.d_view.size(), 2u);
ASSERT_EQ(dv_v.h_view.size(), 2u);
ASSERT_EQ(dv_v.view_device().size(), 2u);
ASSERT_EQ(dv_v.view_host().size(), 2u);
initialize_view_of_views<S<Kokkos::View<double*, TEST_EXECSPACE>>>();
Kokkos::DualView<double*> dv(
Kokkos::view_alloc("myView", Kokkos::SequentialHostInit), 1u);
dv.resize(Kokkos::view_alloc(Kokkos::SequentialHostInit), 2u);
ASSERT_EQ(dv.d_view.size(), 2u);
ASSERT_EQ(dv.h_view.size(), 2u);
ASSERT_EQ(dv.view_device().size(), 2u);
ASSERT_EQ(dv.view_host().size(), 2u);
dv.realloc(Kokkos::view_alloc(Kokkos::SequentialHostInit), 3u);
ASSERT_EQ(dv.d_view.size(), 3u);
ASSERT_EQ(dv.h_view.size(), 3u);
ASSERT_EQ(dv.view_device().size(), 3u);
ASSERT_EQ(dv.view_host().size(), 3u);
}
} // anonymous namespace
} // namespace Test

@ -27,7 +27,7 @@ void test_dyn_rank_view_team_scratch() {
using policy_type = Kokkos::TeamPolicy<execution_space>;
using team_type = policy_type::member_type;
int N0 = 10, N1 = 4, N2 = 3;
size_t N0 = 10, N1 = 4, N2 = 3;
size_t shmem_size = drv_type::shmem_size(N0, N1, N2);
ASSERT_GE(shmem_size, N0 * N1 * N2 * sizeof(int));
@ -40,9 +40,9 @@ void test_dyn_rank_view_team_scratch() {
drv_type scr(team.team_scratch(0), N0, N1, N2);
// Control that the code ran at all
if (scr.rank() != 3) errors() |= 1u;
if (scr.extent_int(0) != N0) errors() |= 2u;
if (scr.extent_int(1) != N1) errors() |= 4u;
if (scr.extent_int(2) != N2) errors() |= 8u;
if (scr.extent(0) != N0) errors() |= 2u;
if (scr.extent(1) != N1) errors() |= 4u;
if (scr.extent(2) != N2) errors() |= 8u;
Kokkos::parallel_for(
Kokkos::TeamThreadMDRange(team, N0, N1, N2),
[=](int i, int j, int k) { scr(i, j, k) = i * 100 + j * 10 + k; });

@ -130,7 +130,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 7> {
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3, i4, i5, i6) -
&left(0, 0, 0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -146,7 +146,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 7> {
for (unsigned i6 = 0; i6 < unsigned(right.extent(6)); ++i6) {
const long j = &right(i0, i1, i2, i3, i4, i5, i6) -
&right(0, 0, 0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -212,7 +212,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 6> {
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j =
&left(i0, i1, i2, i3, i4, i5) - &left(0, 0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -227,7 +227,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 6> {
for (unsigned i5 = 0; i5 < unsigned(right.extent(5)); ++i5) {
const long j =
&right(i0, i1, i2, i3, i4, i5) - &right(0, 0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -298,7 +298,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 5> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3, i4) - &left(0, 0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -316,7 +316,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 5> {
for (unsigned i3 = 0; i3 < unsigned(right.extent(3)); ++i3)
for (unsigned i4 = 0; i4 < unsigned(right.extent(4)); ++i4) {
const long j = &right(i0, i1, i2, i3, i4) - &right(0, 0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -383,7 +383,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 4> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2, i3) - &left(0, 0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -395,7 +395,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 4> {
for (unsigned i2 = 0; i2 < unsigned(right.extent(2)); ++i2)
for (unsigned i3 = 0; i3 < unsigned(right.extent(3)); ++i3) {
const long j = &right(i0, i1, i2, i3) - &right(0, 0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -462,7 +462,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 3> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1, i2) - &left(0, 0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -477,7 +477,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 3> {
for (unsigned i1 = 0; i1 < unsigned(right.extent(1)); ++i1)
for (unsigned i2 = 0; i2 < unsigned(right.extent(2)); ++i2) {
const long j = &right(i0, i1, i2) - &right(0, 0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -551,7 +551,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 2> {
for (unsigned i1 = 0; i1 < unsigned(left.extent(1)); ++i1)
for (unsigned i0 = 0; i0 < unsigned(left.extent(0)); ++i0) {
const long j = &left(i0, i1) - &left(0, 0);
if (j <= offset || left_alloc <= j) {
if (j < offset || left_alloc <= j) {
update |= 1;
}
offset = j;
@ -561,7 +561,7 @@ struct TestViewOperator_LeftAndRight<DataType, DeviceType, 2> {
for (unsigned i0 = 0; i0 < unsigned(right.extent(0)); ++i0)
for (unsigned i1 = 0; i1 < unsigned(right.extent(1)); ++i1) {
const long j = &right(i0, i1) - &right(0, 0);
if (j <= offset || right_alloc <= j) {
if (j < offset || right_alloc <= j) {
update |= 2;
}
offset = j;
@ -1563,7 +1563,7 @@ class TestDynViewAPI {
// an lvalue reference due to retrieving through texture cache
// therefore not allowed to query the underlying pointer.
#if defined(KOKKOS_ENABLE_CUDA)
if (!std::is_same<typename device::execution_space, Kokkos::Cuda>::value)
if (!std::is_same_v<typename device::execution_space, Kokkos::Cuda>)
#endif
{
ASSERT_EQ(x.data(), xr.data());
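Note on the `std::is_same_v` change above: `std::is_same_v<A, B>` is the C++17 variable-template shorthand for `std::is_same<A, B>::value`, so the two spellings should be interchangeable. A minimal sketch, using hypothetical tag types in place of the Kokkos execution spaces:

```cpp
#include <type_traits>

// Hypothetical stand-ins for execution spaces such as Kokkos::Serial and
// Kokkos::Cuda; they exist only for this illustration.
struct SerialTag {};
struct CudaTag {};

// Pre-C++17 spelling: read the nested ::value member of the trait.
template <class ExecSpace>
constexpr bool is_cuda_old() {
  return std::is_same<ExecSpace, CudaTag>::value;
}

// C++17 spelling: the _v variable template yields the same boolean.
template <class ExecSpace>
constexpr bool is_cuda_new() {
  return std::is_same_v<ExecSpace, CudaTag>;
}
```

Both helpers return the same result for any type argument; the `_v` form only drops the `::value` suffix.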

@ -270,18 +270,19 @@ void test_offsetview_construction() {
template <typename Scalar, typename Device>
void test_offsetview_unmanaged_construction() {
// Preallocated memory (Only need a valid address for this test)
Scalar s;
// Preallocated memory
Kokkos::View<Scalar, Device> s("s");
Scalar* ptr = s.data(); // obtain a pointer into the right address space
{
// Constructing an OffsetView directly around our preallocated memory
Kokkos::Array<int64_t, 1> begins1{{2}};
Kokkos::Array<int64_t, 1> ends1{{3}};
Kokkos::Experimental::OffsetView<Scalar*, Device> ov1(&s, begins1, ends1);
Kokkos::Experimental::OffsetView<Scalar*, Device> ov1(ptr, begins1, ends1);
// Constructing an OffsetView around an unmanaged View of our preallocated
// memory
Kokkos::View<Scalar*, Device> v1(&s, ends1[0] - begins1[0]);
Kokkos::View<Scalar*, Device> v1(ptr, ends1[0] - begins1[0]);
Kokkos::Experimental::OffsetView<Scalar*, Device> ovv1(v1, begins1);
// They should match
@ -292,9 +293,9 @@ void test_offsetview_unmanaged_construction() {
{
Kokkos::Array<int64_t, 2> begins2{{-2, -7}};
Kokkos::Array<int64_t, 2> ends2{{5, -3}};
Kokkos::Experimental::OffsetView<Scalar**, Device> ov2(&s, begins2, ends2);
Kokkos::Experimental::OffsetView<Scalar**, Device> ov2(ptr, begins2, ends2);
Kokkos::View<Scalar**, Device> v2(&s, ends2[0] - begins2[0],
Kokkos::View<Scalar**, Device> v2(ptr, ends2[0] - begins2[0],
ends2[1] - begins2[1]);
Kokkos::Experimental::OffsetView<Scalar**, Device> ovv2(v2, begins2);
@ -305,10 +306,10 @@ void test_offsetview_unmanaged_construction() {
{
Kokkos::Array<int64_t, 3> begins3{{2, 3, 5}};
Kokkos::Array<int64_t, 3> ends3{{7, 11, 13}};
Kokkos::Experimental::OffsetView<Scalar***, Device> ovv3(&s, begins3,
Kokkos::Experimental::OffsetView<Scalar***, Device> ovv3(ptr, begins3,
ends3);
Kokkos::View<Scalar***, Device> v3(&s, ends3[0] - begins3[0],
Kokkos::View<Scalar***, Device> v3(ptr, ends3[0] - begins3[0],
ends3[1] - begins3[1],
ends3[2] - begins3[2]);
Kokkos::Experimental::OffsetView<Scalar***, Device> ov3(v3, begins3);
@ -323,10 +324,10 @@ void test_offsetview_unmanaged_construction() {
Kokkos::Array<int64_t, 1> begins{{-3}};
Kokkos::Array<int64_t, 1> ends{{2}};
Kokkos::Experimental::OffsetView<Scalar*, Device> bb(&s, begins, ends);
Kokkos::Experimental::OffsetView<Scalar*, Device> bi(&s, begins, {2});
Kokkos::Experimental::OffsetView<Scalar*, Device> ib(&s, {-3}, ends);
Kokkos::Experimental::OffsetView<Scalar*, Device> ii(&s, {-3}, {2});
Kokkos::Experimental::OffsetView<Scalar*, Device> bb(ptr, begins, ends);
Kokkos::Experimental::OffsetView<Scalar*, Device> bi(ptr, begins, {2});
Kokkos::Experimental::OffsetView<Scalar*, Device> ib(ptr, {-3}, ends);
Kokkos::Experimental::OffsetView<Scalar*, Device> ii(ptr, {-3}, {2});
ASSERT_EQ(bb, bi);
ASSERT_EQ(bb, ib);
@ -336,8 +337,9 @@ void test_offsetview_unmanaged_construction() {
template <typename Scalar, typename Device>
void test_offsetview_unmanaged_construction_death() {
// Preallocated memory (Only need a valid address for this test)
Scalar s;
// Preallocated memory
Kokkos::View<Scalar, Device> s("s");
Scalar* ptr = s.data(); // obtain a pointer into the right address space
// Regular expression syntax on Windows is a pain. `.` does not match `\n`.
// Feel free to make it work if you have time to spare.
@ -351,10 +353,10 @@ void test_offsetview_unmanaged_construction_death() {
using offset_view_type = Kokkos::Experimental::OffsetView<Scalar*, Device>;
// Range calculations must be positive
(void)offset_view_type(&s, {0}, {1});
(void)offset_view_type(&s, {0}, {0});
(void)offset_view_type(ptr, {0}, {1});
(void)offset_view_type(ptr, {0}, {0});
ASSERT_DEATH(
offset_view_type(&s, {0}, {-1}),
offset_view_type(ptr, {0}, {-1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -366,9 +368,9 @@ void test_offsetview_unmanaged_construction_death() {
using offset_view_type = Kokkos::Experimental::OffsetView<Scalar*, Device>;
// Range calculations must not overflow
(void)offset_view_type(&s, {0}, {0x7fffffffffffffffl});
(void)offset_view_type(ptr, {0}, {0x7fffffffffffffffl});
ASSERT_DEATH(
offset_view_type(&s, {-1}, {0x7fffffffffffffffl}),
offset_view_type(ptr, {-1}, {0x7fffffffffffffffl}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -376,7 +378,8 @@ void test_offsetview_unmanaged_construction_death() {
"\\(-1\\)\\) "
"overflows"));
ASSERT_DEATH(
offset_view_type(&s, {-0x7fffffffffffffffl - 1}, {0x7fffffffffffffffl}),
offset_view_type(ptr, {-0x7fffffffffffffffl - 1},
{0x7fffffffffffffffl}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -384,7 +387,7 @@ void test_offsetview_unmanaged_construction_death() {
"\\(-9223372036854775808\\)\\) "
"overflows"));
ASSERT_DEATH(
offset_view_type(&s, {-0x7fffffffffffffffl - 1}, {0}),
offset_view_type(ptr, {-0x7fffffffffffffffl - 1}, {0}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -399,7 +402,7 @@ void test_offsetview_unmanaged_construction_death() {
// Should throw when the rank of begins and/or ends doesn't match that
// of OffsetView
ASSERT_DEATH(
offset_view_type(&s, {0}, {1}),
offset_view_type(ptr, {0}, {1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -407,13 +410,13 @@ void test_offsetview_unmanaged_construction_death() {
".*"
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0}, {1, 1}),
offset_view_type(ptr, {0}, {1, 1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
"begins\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0}, {1, 1, 1}),
offset_view_type(ptr, {0}, {1, 1, 1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -421,20 +424,20 @@ void test_offsetview_unmanaged_construction_death() {
".*"
"ends\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0, 0}, {1}),
offset_view_type(ptr, {0, 0}, {1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
(void)offset_view_type(&s, {0, 0}, {1, 1});
(void)offset_view_type(ptr, {0, 0}, {1, 1});
ASSERT_DEATH(
offset_view_type(&s, {0, 0}, {1, 1, 1}),
offset_view_type(ptr, {0, 0}, {1, 1, 1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
"ends\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0, 0, 0}, {1}),
offset_view_type(ptr, {0, 0, 0}, {1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
@ -442,13 +445,13 @@ void test_offsetview_unmanaged_construction_death() {
".*"
"ends\\.size\\(\\) \\(1\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0, 0, 0}, {1, 1}),
offset_view_type(ptr, {0, 0, 0}, {1, 1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
"begins\\.size\\(\\) \\(3\\) != Rank \\(2\\)"));
ASSERT_DEATH(
offset_view_type(&s, {0, 0, 0}, {1, 1, 1}),
offset_view_type(ptr, {0, 0, 0}, {1, 1, 1}),
SKIP_REGEX_ON_WINDOWS(
"Kokkos::Experimental::OffsetView ERROR: for unmanaged OffsetView"
".*"
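Note on the death tests above: they expect construction to abort when `ends - begins` is negative or does not fit in a signed 64-bit integer, while `{0}, {0}` and `{0}, {0x7fffffffffffffffl}` are accepted. The sketch below shows one way such a range check can be written without triggering signed overflow itself. `checked_extent` is a hypothetical helper, not the Kokkos implementation, and it returns an empty `std::optional` where the real code aborts.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: returns end - begin if that difference is
// representable as int64_t and non-negative, otherwise an empty optional.
constexpr std::optional<std::int64_t> checked_extent(std::int64_t begin,
                                                     std::int64_t end) {
  constexpr auto max = std::numeric_limits<std::int64_t>::max();
  constexpr auto min = std::numeric_limits<std::int64_t>::min();
  // end - begin > max  <=>  end > max + begin (only possible when begin < 0)
  if (begin < 0 && end > max + begin) return std::nullopt;
  // end - begin < min  <=>  end < min + begin (only possible when begin > 0)
  if (begin > 0 && end < min + begin) return std::nullopt;
  const std::int64_t extent = end - begin;  // safe after the checks above
  if (extent < 0) return std::nullopt;      // negative ranges are rejected
  return extent;
}
```

The comparisons are rearranged so that every intermediate value stays in range; computing `end - begin` first and inspecting the result afterwards would already be undefined behavior in the overflowing cases.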

@ -772,12 +772,12 @@ TEST(TEST_CATEGORY, scatterview) {
#if defined(KOKKOS_ENABLE_SERIAL) || defined(KOKKOS_ENABLE_OPENMP)
#if defined(KOKKOS_ENABLE_SERIAL)
bool is_serial = std::is_same<TEST_EXECSPACE, Kokkos::Serial>::value;
bool is_serial = std::is_same_v<TEST_EXECSPACE, Kokkos::Serial>;
#else
bool is_serial = false;
#endif
#if defined(KOKKOS_ENABLE_OPENMP)
bool is_openmp = std::is_same<TEST_EXECSPACE, Kokkos::OpenMP>::value;
bool is_openmp = std::is_same_v<TEST_EXECSPACE, Kokkos::OpenMP>;
#else
bool is_openmp = false;
#endif
@ -817,7 +817,7 @@ TEST(TEST_CATEGORY, scatterview_devicetype) {
using device_memory_space = Kokkos::HIPSpace;
using host_accessible_space = Kokkos::HIPManagedSpace;
#endif
if (std::is_same<TEST_EXECSPACE, device_execution_space>::value) {
if (std::is_same_v<TEST_EXECSPACE, device_execution_space>) {
using device_device_type =
Kokkos::Device<device_execution_space, device_memory_space>;
test_scatter_view<device_device_type, Kokkos::Experimental::ScatterSum,

@ -18,7 +18,9 @@
#include <vector>
#define KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH
#include <Kokkos_StaticCrsGraph.hpp>
#undef KOKKOS_IMPL_DO_NOT_WARN_INCLUDE_STATIC_CRS_GRAPH
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/

@ -24,6 +24,7 @@
int main(int argc, char** argv) {
Kokkos::initialize(argc, argv);
benchmark::Initialize(&argc, argv);
// FIXME: seconds as default time unit leads to precision loss
benchmark::SetDefaultTimeUnit(benchmark::kSecond);
KokkosBenchmark::add_benchmark_context(true);

@ -133,3 +133,4 @@ if(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVHPC)
endif()
kokkos_add_benchmark(PerformanceTest_Atomic SOURCES test_atomic.cpp)
kokkos_add_benchmark(PerformanceTest_Reduction SOURCES test_reduction.cpp)

@ -20,6 +20,8 @@ LINK ?= $(CXX)
LDFLAGS ?=
override LDFLAGS += -lpthread
KOKKOS_USE_DEPRECATED_MAKEFILES=1
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test

@ -107,8 +107,8 @@ int get_R(benchmark::State& state) {
template <class Scalar>
static void CustomReduction(benchmark::State& state) {
int N = get_N(state);
int R = get_R(state);
size_t N = get_N(state);
size_t R = get_R(state);
for (auto _ : state) {
auto results = custom_reduction_test<double>(N, R);

@ -38,12 +38,15 @@ void deepcopy_view(ViewTypeA& a, ViewTypeB& b, benchmark::State& state) {
}
}
template <class LayoutA, class LayoutB>
template <
class LayoutA, class LayoutB,
class MemorySpaceA = typename Kokkos::DefaultExecutionSpace::memory_space,
class MemorySpaceB = typename Kokkos::DefaultExecutionSpace::memory_space>
static void ViewDeepCopy_Rank1(benchmark::State& state) {
const int N8 = std::pow(state.range(0), 8);
Kokkos::View<double*, LayoutA> a("A1", N8);
Kokkos::View<double*, LayoutB> b("B1", N8);
Kokkos::View<double*, LayoutA, MemorySpaceA> a("A1", N8);
Kokkos::View<double*, LayoutB, MemorySpaceB> b("B1", N8);
deepcopy_view(a, b, state);
}
@ -145,6 +148,29 @@ static void ViewDeepCopy_Raw(benchmark::State& state) {
}
}
template <typename DstMemorySpace, typename SrcMemorySpace>
static void ViewDeepCopy_Rank1Strided(benchmark::State& state) {
const size_t N8 = std::pow(state.range(0), 8);
// This benchmark allocates more data in order to measure a deep_copy
// of the same size as the contiguous benchmarks, so in cases where they
// can be run, this one may fail to allocate data (e.g., on a small CI runner)
try {
// allocate 2x the size since layout only has 1/2 the elements
Kokkos::View<double*, DstMemorySpace> a("A1", N8 * 2);
Kokkos::View<double*, SrcMemorySpace> b("B1", N8 * 2);
Kokkos::LayoutStride layout(N8 / 2, 2);
Kokkos::View<double*, Kokkos::LayoutStride, DstMemorySpace> a_stride(
a.data(), layout);
Kokkos::View<double*, Kokkos::LayoutStride, SrcMemorySpace> b_stride(
b.data(), layout);
deepcopy_view(a_stride, b_stride, state);
} catch (const std::runtime_error& e) {
state.SkipWithError(e.what());
}
}
} // namespace Test
#endif
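Note on `ViewDeepCopy_Rank1Strided` above: the benchmark wraps contiguous allocations in `Kokkos::LayoutStride` views with stride 2, so a copy touches every second element. The sketch below shows the index arithmetic for a rank-1 strided layout in plain C++. All helper names are hypothetical and none of this is Kokkos code; it is only meant to illustrate how many underlying elements a strided view of a given extent spans.

```cpp
#include <cstddef>

// Hypothetical helper: position of logical element i in the underlying
// buffer for a rank-1 layout with the given stride.
constexpr std::size_t strided_offset(std::size_t i, std::size_t stride) {
  return i * stride;
}

// Hypothetical helper: smallest buffer length (in elements) that can back a
// strided rank-1 view with the given extent and stride.
constexpr std::size_t required_span(std::size_t extent, std::size_t stride) {
  return extent == 0 ? 0 : (extent - 1) * stride + 1;
}

// Element-wise copy between two buffers that share the same strided layout.
// Only every stride-th element of each buffer is read or written.
inline void strided_copy(double* dst, const double* src, std::size_t extent,
                         std::size_t stride) {
  for (std::size_t i = 0; i < extent; ++i) {
    dst[strided_offset(i, stride)] = src[strided_offset(i, stride)];
  }
}
```

With the benchmark's argument `N = 10`, `N8` is 10^8, so a layout of extent `N8 / 2` and stride 2 spans 99,999,999 elements, which fits within the `N8 * 2` elements the benchmark allocates.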

@ -18,6 +18,23 @@
namespace Test {
// host -> default
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft,
Kokkos::DefaultExecutionSpace::memory_space,
Kokkos::DefaultHostExecutionSpace::memory_space>)
->ArgName("N")
->Arg(10)
->UseManualTime();
// default -> host
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft,
Kokkos::DefaultHostExecutionSpace::memory_space,
Kokkos::DefaultExecutionSpace::memory_space>)
->ArgName("N")
->Arg(10)
->UseManualTime();
// default -> default
BENCHMARK(ViewDeepCopy_Rank1<Kokkos::LayoutLeft, Kokkos::LayoutLeft>)
->ArgName("N")
->Arg(10)
@ -33,4 +50,18 @@ BENCHMARK(ViewDeepCopy_Rank3<Kokkos::LayoutLeft, Kokkos::LayoutLeft>)
->Arg(10)
->UseManualTime();
BENCHMARK(
ViewDeepCopy_Rank1Strided<Kokkos::DefaultExecutionSpace::memory_space,
Kokkos::DefaultExecutionSpace::memory_space>)
->ArgName("N")
->Arg(10)
->UseManualTime();
BENCHMARK(
ViewDeepCopy_Rank1Strided<Kokkos::DefaultHostExecutionSpace::memory_space,
Kokkos::DefaultHostExecutionSpace::memory_space>)
->ArgName("N")
->Arg(10)
->UseManualTime();
} // namespace Test

Some files were not shown because too many files have changed in this diff.