Updating Kokkos lib
This commit is contained in:
8
lib/kokkos/.gitignore
vendored
8
lib/kokkos/.gitignore
vendored
@ -1,8 +0,0 @@
|
|||||||
# Standard ignores
|
|
||||||
*~
|
|
||||||
*.pyc
|
|
||||||
\#*#
|
|
||||||
.#*
|
|
||||||
.*.swp
|
|
||||||
.cproject
|
|
||||||
.project
|
|
||||||
284
lib/kokkos/CHANGELOG.md
Normal file
284
lib/kokkos/CHANGELOG.md
Normal file
@ -0,0 +1,284 @@
|
|||||||
|
# Change Log
|
||||||
|
|
||||||
|
## [2.02.07](https://github.com/kokkos/kokkos/tree/2.02.07) (2016-12-16)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.01...2.02.07)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Add CMake option to enable Cuda Lambda support [\#589](https://github.com/kokkos/kokkos/issues/589)
|
||||||
|
- Add CMake option to enable Cuda RDC support [\#588](https://github.com/kokkos/kokkos/issues/588)
|
||||||
|
- Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System [\#584](https://github.com/kokkos/kokkos/issues/584)
|
||||||
|
- Building Tutorial Examples [\#582](https://github.com/kokkos/kokkos/issues/582)
|
||||||
|
- Internal way for using ThreadVectorRange without TeamHandle [\#574](https://github.com/kokkos/kokkos/issues/574)
|
||||||
|
- Testing: Add testing for uvm and rdc [\#571](https://github.com/kokkos/kokkos/issues/571)
|
||||||
|
- Profiling: Add Memory Tracing and Region Markers [\#557](https://github.com/kokkos/kokkos/issues/557)
|
||||||
|
- nvcc\_wrapper not installed with Kokkos built with CUDA through CMake [\#543](https://github.com/kokkos/kokkos/issues/543)
|
||||||
|
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
|
||||||
|
- Benchmarks: Add Gather benchmark [\#536](https://github.com/kokkos/kokkos/issues/536)
|
||||||
|
- Testing: add spot\_check option to test\_all\_sandia [\#535](https://github.com/kokkos/kokkos/issues/535)
|
||||||
|
- Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace [\#527](https://github.com/kokkos/kokkos/issues/527)
|
||||||
|
- Add AtomicAdd support for 64bit float for Pascal [\#522](https://github.com/kokkos/kokkos/issues/522)
|
||||||
|
- Add Restrict and Aligned memory trait [\#517](https://github.com/kokkos/kokkos/issues/517)
|
||||||
|
- Kokkos Tests are Not Run using Compiler Optimization [\#501](https://github.com/kokkos/kokkos/issues/501)
|
||||||
|
- Add support for clang 3.7 w/ openmp backend [\#393](https://github.com/kokkos/kokkos/issues/393)
|
||||||
|
- Provide an error throw class [\#79](https://github.com/kokkos/kokkos/issues/79)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- Cuda UVM Allocation test broken with UVM as default space [\#586](https://github.com/kokkos/kokkos/issues/586)
|
||||||
|
- Bug \(develop branch only\): multiple tests are now failing when forcing uvm usage. [\#570](https://github.com/kokkos/kokkos/issues/570)
|
||||||
|
- Error in generate\_makefile.sh for Kokkos when Compiler is Empty String/Fails [\#568](https://github.com/kokkos/kokkos/issues/568)
|
||||||
|
- XL 13.1.4 incorrect C++11 flag [\#553](https://github.com/kokkos/kokkos/issues/553)
|
||||||
|
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
|
||||||
|
- Installing Library on MAC broken due to cp -u [\#539](https://github.com/kokkos/kokkos/issues/539)
|
||||||
|
- Intel Nightly Testing with Debug enabled fails [\#534](https://github.com/kokkos/kokkos/issues/534)
|
||||||
|
|
||||||
|
## [2.02.01](https://github.com/kokkos/kokkos/tree/2.02.01) (2016-11-01)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.00...2.02.01)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Add Changelog generation to our process. [\#506](https://github.com/kokkos/kokkos/issues/506)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- Test scratch\_request fails in Serial with Debug enabled [\#520](https://github.com/kokkos/kokkos/issues/520)
|
||||||
|
- Bug In BoundsCheck for DynRankView [\#516](https://github.com/kokkos/kokkos/issues/516)
|
||||||
|
|
||||||
|
## [2.02.00](https://github.com/kokkos/kokkos/tree/2.02.00) (2016-10-30)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.10...2.02.00)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Add PowerPC assembly for grabbing clock register in memory pool [\#511](https://github.com/kokkos/kokkos/issues/511)
|
||||||
|
- Add GCC 6.x support [\#508](https://github.com/kokkos/kokkos/issues/508)
|
||||||
|
- Test install and build against installed library [\#498](https://github.com/kokkos/kokkos/issues/498)
|
||||||
|
- Makefile.kokkos adds expt-extended-lambda to cuda build with clang [\#490](https://github.com/kokkos/kokkos/issues/490)
|
||||||
|
- Add top-level makefile option to just test kokkos-core unit-test [\#485](https://github.com/kokkos/kokkos/issues/485)
|
||||||
|
- Split and harmonize Object Files of Core UnitTests to increase build parallelism [\#484](https://github.com/kokkos/kokkos/issues/484)
|
||||||
|
- LayoutLeft to LayoutLeft subview for 3D and 4D views [\#473](https://github.com/kokkos/kokkos/issues/473)
|
||||||
|
- Add official Cuda 8.0 support [\#468](https://github.com/kokkos/kokkos/issues/468)
|
||||||
|
- Allow C++1Z Flag for Class Lambda capture [\#465](https://github.com/kokkos/kokkos/issues/465)
|
||||||
|
- Add Clang 4.0+ compilation of Cuda code [\#455](https://github.com/kokkos/kokkos/issues/455)
|
||||||
|
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
|
||||||
|
- Add name of view to "View bounds error" [\#432](https://github.com/kokkos/kokkos/issues/432)
|
||||||
|
- Move Sort Binning Operators into Kokkos namespace [\#421](https://github.com/kokkos/kokkos/issues/421)
|
||||||
|
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
|
||||||
|
- Import WithoutInitializing and AllowPadding into Kokkos namespace [\#325](https://github.com/kokkos/kokkos/issues/325)
|
||||||
|
- TeamThreadRange requires begin, end to be the same type [\#305](https://github.com/kokkos/kokkos/issues/305)
|
||||||
|
- CudaUVMSpace should track \# allocations, due to CUDA limit on \# UVM allocations [\#300](https://github.com/kokkos/kokkos/issues/300)
|
||||||
|
- Remove old View and its infrastructure [\#259](https://github.com/kokkos/kokkos/issues/259)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- Bug in TestCuda\_Other.cpp: most likely assembly inserted into Device code [\#515](https://github.com/kokkos/kokkos/issues/515)
|
||||||
|
- Cuda Compute Capability check of GPU is outdated [\#509](https://github.com/kokkos/kokkos/issues/509)
|
||||||
|
- multi\_scratch test with hwloc and pthreads seg-faults. [\#504](https://github.com/kokkos/kokkos/issues/504)
|
||||||
|
- generate\_makefile.bash: "make install" is broken [\#503](https://github.com/kokkos/kokkos/issues/503)
|
||||||
|
- make clean in Out of Source Build/Tests Does Not Work Correctly [\#502](https://github.com/kokkos/kokkos/issues/502)
|
||||||
|
- Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified [\#497](https://github.com/kokkos/kokkos/issues/497)
|
||||||
|
- Dispatch lambda test directly inside GTEST macro doesn't work with nvcc [\#491](https://github.com/kokkos/kokkos/issues/491)
|
||||||
|
- UnitTests with HWLOC enabled fail if run with mpirun bound to a single core [\#489](https://github.com/kokkos/kokkos/issues/489)
|
||||||
|
- Failing Reducer Test on Mac with Pthreads [\#479](https://github.com/kokkos/kokkos/issues/479)
|
||||||
|
- make test Dumps Error with Clang Not Found [\#471](https://github.com/kokkos/kokkos/issues/471)
|
||||||
|
- OpenMP TeamPolicy member broadcast not using correct volatile shared variable [\#424](https://github.com/kokkos/kokkos/issues/424)
|
||||||
|
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
|
||||||
|
- New task policy implementation is pulling in old experimental code. [\#372](https://github.com/kokkos/kokkos/issues/372)
|
||||||
|
- MemoryPool unit test hangs on Power8 with GCC 6.1.0 [\#298](https://github.com/kokkos/kokkos/issues/298)
|
||||||
|
|
||||||
|
## [2.01.10](https://github.com/kokkos/kokkos/tree/2.01.10) (2016-09-27)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.06...2.01.10)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Enable Profiling by default in Tribits build [\#438](https://github.com/kokkos/kokkos/issues/438)
|
||||||
|
- parallel\_reduce\(0\), parallel\_scan\(0\) unit tests [\#436](https://github.com/kokkos/kokkos/issues/436)
|
||||||
|
- data\(\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
|
||||||
|
- Fix tutorials to track new Kokkos::View [\#323](https://github.com/kokkos/kokkos/issues/323)
|
||||||
|
- Rename team policy set\_scratch\_size. [\#195](https://github.com/kokkos/kokkos/issues/195)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
|
||||||
|
- Makefile spits syntax error [\#435](https://github.com/kokkos/kokkos/issues/435)
|
||||||
|
- Kokkos::sort fails for view with all the same values [\#422](https://github.com/kokkos/kokkos/issues/422)
|
||||||
|
- Generic Reducers: can't accept inline constructed reducer [\#404](https://github.com/kokkos/kokkos/issues/404)
|
||||||
|
- data\\(\\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
|
||||||
|
- const subview of const view with compile time dimensions on Cuda backend [\#310](https://github.com/kokkos/kokkos/issues/310)
|
||||||
|
- Kokkos \(in Trilinos\) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 [\#307](https://github.com/kokkos/kokkos/issues/307)
|
||||||
|
- Core Oversubscription Detection Broken? [\#159](https://github.com/kokkos/kokkos/issues/159)
|
||||||
|
|
||||||
|
|
||||||
|
## [2.01.06](https://github.com/kokkos/kokkos/tree/2.01.06) (2016-09-02)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.00...2.01.06)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Add "standard" reducers for lambda-supportable customized reduce [\#411](https://github.com/kokkos/kokkos/issues/411)
|
||||||
|
- TaskPolicy - single thread back-end execution [\#390](https://github.com/kokkos/kokkos/issues/390)
|
||||||
|
- Kokkos master clone tag [\#387](https://github.com/kokkos/kokkos/issues/387)
|
||||||
|
- Query memory requirements from task policy [\#378](https://github.com/kokkos/kokkos/issues/378)
|
||||||
|
- Output order of test\_atomic.cpp is confusing [\#373](https://github.com/kokkos/kokkos/issues/373)
|
||||||
|
- Missing testing for atomics [\#341](https://github.com/kokkos/kokkos/issues/341)
|
||||||
|
- Feature request for Kokkos to provide Kokkos::atomic\_fetch\_max and atomic\_fetch\_min [\#336](https://github.com/kokkos/kokkos/issues/336)
|
||||||
|
- TaskPolicy\<Cuda\> performance requires teams mapped to warps [\#218](https://github.com/kokkos/kokkos/issues/218)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- Reduce with Teams broken for custom initialize [\#407](https://github.com/kokkos/kokkos/issues/407)
|
||||||
|
- Failing Kokkos build on Debian [\#402](https://github.com/kokkos/kokkos/issues/402)
|
||||||
|
- Failing Tests on NVIDIA Pascal GPUs [\#398](https://github.com/kokkos/kokkos/issues/398)
|
||||||
|
- Algorithms: fill\_random assumes dimensions fit in unsigned int [\#389](https://github.com/kokkos/kokkos/issues/389)
|
||||||
|
- Kokkos::subview with RandomAccess Memory Trait [\#385](https://github.com/kokkos/kokkos/issues/385)
|
||||||
|
- Build warning \(signed / unsigned comparison\) in Cuda implementation [\#365](https://github.com/kokkos/kokkos/issues/365)
|
||||||
|
- wrong results for a parallel\_reduce with CUDA8 / Maxwell50 [\#352](https://github.com/kokkos/kokkos/issues/352)
|
||||||
|
- Hierarchical parallelism - 3 level unit test [\#344](https://github.com/kokkos/kokkos/issues/344)
|
||||||
|
- Can I allocate a View w/ both WithoutInitializing & AllowPadding? [\#324](https://github.com/kokkos/kokkos/issues/324)
|
||||||
|
- subview View layout determination [\#309](https://github.com/kokkos/kokkos/issues/309)
|
||||||
|
- Unit tests with Cuda - Maxwell [\#196](https://github.com/kokkos/kokkos/issues/196)
|
||||||
|
|
||||||
|
## [2.01.00](https://github.com/kokkos/kokkos/tree/2.01.00) (2016-07-21)
|
||||||
|
[Full Changelog](https://github.com/kokkos/kokkos/compare/End_C++98...2.01.00)
|
||||||
|
|
||||||
|
**Implemented enhancements:**
|
||||||
|
|
||||||
|
- Edit ViewMapping so assigning Views with the same custom layout compiles when const casting [\#327](https://github.com/kokkos/kokkos/issues/327)
|
||||||
|
- DynRankView: Performance improvement for operator\(\) [\#321](https://github.com/kokkos/kokkos/issues/321)
|
||||||
|
- Interoperability between static and dynamic rank views [\#295](https://github.com/kokkos/kokkos/issues/295)
|
||||||
|
- subview member function ? [\#280](https://github.com/kokkos/kokkos/issues/280)
|
||||||
|
- Inter-operatibility between View and DynRankView. [\#245](https://github.com/kokkos/kokkos/issues/245)
|
||||||
|
- \(Trilinos\) build warning in atomic\_assign, with Kokkos::complex [\#177](https://github.com/kokkos/kokkos/issues/177)
|
||||||
|
- View\<\>::shmem\_size should runtime check for number of arguments equal to rank [\#176](https://github.com/kokkos/kokkos/issues/176)
|
||||||
|
- Custom reduction join via lambda argument [\#99](https://github.com/kokkos/kokkos/issues/99)
|
||||||
|
- DynRankView with 0 dimensions passed in at construction [\#293](https://github.com/kokkos/kokkos/issues/293)
|
||||||
|
- Inject view\_alloc and friends into Kokkos namespace [\#292](https://github.com/kokkos/kokkos/issues/292)
|
||||||
|
- Less restrictive TeamPolicy reduction on Cuda [\#286](https://github.com/kokkos/kokkos/issues/286)
|
||||||
|
- deep\_copy using remap with source execution space [\#267](https://github.com/kokkos/kokkos/issues/267)
|
||||||
|
- Suggestion: Enable opt-in L1 caching via nvcc-wrapper [\#261](https://github.com/kokkos/kokkos/issues/261)
|
||||||
|
- More flexible create\_mirror functions [\#260](https://github.com/kokkos/kokkos/issues/260)
|
||||||
|
- Rename View::memory\_span to View::required\_allocation\_size [\#256](https://github.com/kokkos/kokkos/issues/256)
|
||||||
|
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
|
||||||
|
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
|
||||||
|
- Kokkos::Timer [\#234](https://github.com/kokkos/kokkos/issues/234)
|
||||||
|
- Fence CudaUVMSpace allocations [\#230](https://github.com/kokkos/kokkos/issues/230)
|
||||||
|
- View::operator\(\) accept std::is\_integral and std::is\_enum [\#227](https://github.com/kokkos/kokkos/issues/227)
|
||||||
|
- Allocating zero size View [\#216](https://github.com/kokkos/kokkos/issues/216)
|
||||||
|
- Thread scalable memory pool [\#212](https://github.com/kokkos/kokkos/issues/212)
|
||||||
|
- Add a way to disable memory leak output [\#194](https://github.com/kokkos/kokkos/issues/194)
|
||||||
|
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
|
||||||
|
- Runtime rank wrapper for View [\#189](https://github.com/kokkos/kokkos/issues/189)
|
||||||
|
- Profiling Interface [\#158](https://github.com/kokkos/kokkos/issues/158)
|
||||||
|
- Fix View assignment \(of managed to unmanaged\) [\#153](https://github.com/kokkos/kokkos/issues/153)
|
||||||
|
- Add unit test for assignment of managed View to unmanaged View [\#152](https://github.com/kokkos/kokkos/issues/152)
|
||||||
|
- Check for oversubscription of threads with MPI in Kokkos::initialize [\#149](https://github.com/kokkos/kokkos/issues/149)
|
||||||
|
- Dynamic resizeable 1dimensional view [\#143](https://github.com/kokkos/kokkos/issues/143)
|
||||||
|
- Develop TaskPolicy for CUDA [\#142](https://github.com/kokkos/kokkos/issues/142)
|
||||||
|
- New View : Test Compilation Downstream [\#138](https://github.com/kokkos/kokkos/issues/138)
|
||||||
|
- New View Implementation [\#135](https://github.com/kokkos/kokkos/issues/135)
|
||||||
|
- Add variant of subview that lets users add traits [\#134](https://github.com/kokkos/kokkos/issues/134)
|
||||||
|
- NVCC-WRAPPER: Add --host-only flag [\#121](https://github.com/kokkos/kokkos/issues/121)
|
||||||
|
- Address gtest issue with TriBITS Kokkos build outside of Trilinos [\#117](https://github.com/kokkos/kokkos/issues/117)
|
||||||
|
- Make tests pass with -expt-extended-lambda on CUDA [\#108](https://github.com/kokkos/kokkos/issues/108)
|
||||||
|
- Dynamic scheduling for parallel\_for and parallel\_reduce [\#106](https://github.com/kokkos/kokkos/issues/106)
|
||||||
|
- Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments [\#105](https://github.com/kokkos/kokkos/issues/105)
|
||||||
|
- Error out when the number of threads is modified after kokkos is initialized [\#104](https://github.com/kokkos/kokkos/issues/104)
|
||||||
|
- Porting to POWER and remove assumption of X86 default [\#103](https://github.com/kokkos/kokkos/issues/103)
|
||||||
|
- Dynamic scheduling option for RangePolicy [\#100](https://github.com/kokkos/kokkos/issues/100)
|
||||||
|
- SharedMemory Support for Lambdas [\#81](https://github.com/kokkos/kokkos/issues/81)
|
||||||
|
- Recommended TeamSize for Lambdas [\#80](https://github.com/kokkos/kokkos/issues/80)
|
||||||
|
- Add Aggressive Vectorization Compilation mode [\#72](https://github.com/kokkos/kokkos/issues/72)
|
||||||
|
- Dynamic scheduling team execution policy [\#53](https://github.com/kokkos/kokkos/issues/53)
|
||||||
|
- UVM allocations in multi-GPU systems [\#50](https://github.com/kokkos/kokkos/issues/50)
|
||||||
|
- Synchronic in Kokkos::Impl [\#44](https://github.com/kokkos/kokkos/issues/44)
|
||||||
|
- index and dimension types in for loops [\#28](https://github.com/kokkos/kokkos/issues/28)
|
||||||
|
- Subview assign of 1D Strided with stride 1 to LayoutLeft/Right [\#1](https://github.com/kokkos/kokkos/issues/1)
|
||||||
|
|
||||||
|
**Fixed bugs:**
|
||||||
|
|
||||||
|
- misspelled variable name in Kokkos\_Atomic\_Fetch + missing unit tests [\#340](https://github.com/kokkos/kokkos/issues/340)
|
||||||
|
- seg fault Kokkos::Impl::CudaInternal::print\_configuration [\#338](https://github.com/kokkos/kokkos/issues/338)
|
||||||
|
- Clang compiler error with named parallel\_reduce, tags, and TeamPolicy. [\#335](https://github.com/kokkos/kokkos/issues/335)
|
||||||
|
- Shared Memory Allocation Error at parallel\_reduce [\#311](https://github.com/kokkos/kokkos/issues/311)
|
||||||
|
- DynRankView: Fix resize and realloc [\#303](https://github.com/kokkos/kokkos/issues/303)
|
||||||
|
- Scratch memory and dynamic scheduling [\#279](https://github.com/kokkos/kokkos/issues/279)
|
||||||
|
- MemoryPool infinite loop when out of memory [\#312](https://github.com/kokkos/kokkos/issues/312)
|
||||||
|
- Kokkos DynRankView changes break Sacado and Panzer [\#299](https://github.com/kokkos/kokkos/issues/299)
|
||||||
|
- MemoryPool fails to compile on non-cuda non-x86 [\#297](https://github.com/kokkos/kokkos/issues/297)
|
||||||
|
- Random Number Generator Fix [\#296](https://github.com/kokkos/kokkos/issues/296)
|
||||||
|
- View template parameter ordering Bug [\#282](https://github.com/kokkos/kokkos/issues/282)
|
||||||
|
- Serial task policy broken. [\#281](https://github.com/kokkos/kokkos/issues/281)
|
||||||
|
- deep\_copy with LayoutStride should not memcpy [\#262](https://github.com/kokkos/kokkos/issues/262)
|
||||||
|
- DualView::need\_sync should be a const method [\#248](https://github.com/kokkos/kokkos/issues/248)
|
||||||
|
- Arbitrary-sized atomics on GPUs broken; loop forever [\#238](https://github.com/kokkos/kokkos/issues/238)
|
||||||
|
- boolean reduction value\_type changes answer [\#225](https://github.com/kokkos/kokkos/issues/225)
|
||||||
|
- Custom init\(\) function for parallel\_reduce with array value\_type [\#210](https://github.com/kokkos/kokkos/issues/210)
|
||||||
|
- unit\_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. [\#202](https://github.com/kokkos/kokkos/issues/202)
|
||||||
|
- nvcc\_wrapper Does Not Support -Xcompiler \<compiler option\> [\#198](https://github.com/kokkos/kokkos/issues/198)
|
||||||
|
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
|
||||||
|
- Kokkos Threads Backend impl\_shared\_alloc Broken on Intel 16.1 \(Shepard Haswell\) [\#186](https://github.com/kokkos/kokkos/issues/186)
|
||||||
|
- pthread back end hangs if used uninitialized [\#182](https://github.com/kokkos/kokkos/issues/182)
|
||||||
|
- parallel\_reduce of size 0, not calling init/join [\#175](https://github.com/kokkos/kokkos/issues/175)
|
||||||
|
- Bug in Threads with OpenMP enabled [\#173](https://github.com/kokkos/kokkos/issues/173)
|
||||||
|
- KokkosExp\_SharedAlloc, m\_team\_work\_index inaccessible [\#166](https://github.com/kokkos/kokkos/issues/166)
|
||||||
|
- 128-bit CAS without Assembly Broken? [\#161](https://github.com/kokkos/kokkos/issues/161)
|
||||||
|
- fatal error: Cuda/Kokkos\_Cuda\_abort.hpp: No such file or directory [\#157](https://github.com/kokkos/kokkos/issues/157)
|
||||||
|
- Power8: Fix OpenMP backend [\#139](https://github.com/kokkos/kokkos/issues/139)
|
||||||
|
- Data race in Kokkos OpenMP initialization [\#131](https://github.com/kokkos/kokkos/issues/131)
|
||||||
|
- parallel\_launch\_local\_memory and cuda 7.5 [\#125](https://github.com/kokkos/kokkos/issues/125)
|
||||||
|
- Resize can fail with Cuda due to asynchronous dispatch [\#119](https://github.com/kokkos/kokkos/issues/119)
|
||||||
|
- Qthread taskpolicy initialization bug. [\#92](https://github.com/kokkos/kokkos/issues/92)
|
||||||
|
- Windows: sys/mman.h [\#89](https://github.com/kokkos/kokkos/issues/89)
|
||||||
|
- Windows: atomic\_fetch\_sub\(\) [\#88](https://github.com/kokkos/kokkos/issues/88)
|
||||||
|
- Windows: snprintf [\#87](https://github.com/kokkos/kokkos/issues/87)
|
||||||
|
- Parallel\_Reduce with TeamPolicy and league size of 0 returns garbage [\#85](https://github.com/kokkos/kokkos/issues/85)
|
||||||
|
- Throw with Cuda when using \(2D\) team\_policy parallel\_reduce with less than a warp size [\#76](https://github.com/kokkos/kokkos/issues/76)
|
||||||
|
- Scalar views don't work with Kokkos::Atomic memory trait [\#69](https://github.com/kokkos/kokkos/issues/69)
|
||||||
|
- Reduce the number of threads per team for Cuda [\#63](https://github.com/kokkos/kokkos/issues/63)
|
||||||
|
- Named Kernels fail for reductions with CUDA [\#60](https://github.com/kokkos/kokkos/issues/60)
|
||||||
|
- Kokkos View dimension\_\(\) for long returning unsigned int [\#20](https://github.com/kokkos/kokkos/issues/20)
|
||||||
|
- atomic test hangs with LLVM [\#6](https://github.com/kokkos/kokkos/issues/6)
|
||||||
|
- OpenMP Test should set omp\_set\_num\_threads to 1 [\#4](https://github.com/kokkos/kokkos/issues/4)
|
||||||
|
|
||||||
|
**Closed issues:**
|
||||||
|
|
||||||
|
- develop branch broken with CUDA 8 and --expt-extended-lambda [\#354](https://github.com/kokkos/kokkos/issues/354)
|
||||||
|
- --arch=KNL with Intel 2016 build failure [\#349](https://github.com/kokkos/kokkos/issues/349)
|
||||||
|
- Error building with Cuda when passing -DKOKKOS\_CUDA\_USE\_LAMBDA to generate\_makefile.bash [\#343](https://github.com/kokkos/kokkos/issues/343)
|
||||||
|
- Can I safely use int indices in a 2-D View with capacity \> 2B? [\#318](https://github.com/kokkos/kokkos/issues/318)
|
||||||
|
- Kokkos::ViewAllocateWithoutInitializing is not working [\#317](https://github.com/kokkos/kokkos/issues/317)
|
||||||
|
- Intel build on Mac OS X [\#277](https://github.com/kokkos/kokkos/issues/277)
|
||||||
|
- deleted [\#271](https://github.com/kokkos/kokkos/issues/271)
|
||||||
|
- Broken Mira build [\#268](https://github.com/kokkos/kokkos/issues/268)
|
||||||
|
- 32-bit build [\#246](https://github.com/kokkos/kokkos/issues/246)
|
||||||
|
- parallel\_reduce with RDC crashes linker [\#232](https://github.com/kokkos/kokkos/issues/232)
|
||||||
|
- build of Kokkos\_Sparse\_MV\_impl\_spmv\_Serial.cpp.o fails if you use nvcc and have cuda disabled [\#209](https://github.com/kokkos/kokkos/issues/209)
|
||||||
|
- Kokkos Serial execution space is not tested with TeamPolicy. [\#207](https://github.com/kokkos/kokkos/issues/207)
|
||||||
|
- Unit test failure on Hansen KokkosCore\_UnitTest\_Cuda\_MPI\_1 [\#200](https://github.com/kokkos/kokkos/issues/200)
|
||||||
|
- nvcc compiler warning: calling a \_\_host\_\_ function from a \_\_host\_\_ \_\_device\_\_ function is not allowed [\#180](https://github.com/kokkos/kokkos/issues/180)
|
||||||
|
- Intel 15 build error with defaulted "move" operators [\#171](https://github.com/kokkos/kokkos/issues/171)
|
||||||
|
- missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos\*.a libs are there [\#165](https://github.com/kokkos/kokkos/issues/165)
|
||||||
|
- Tie atomic updates to execution space or even to thread team? \(speculation\) [\#144](https://github.com/kokkos/kokkos/issues/144)
|
||||||
|
- New View: Compiletime/size Test [\#137](https://github.com/kokkos/kokkos/issues/137)
|
||||||
|
- New View : Performance Test [\#136](https://github.com/kokkos/kokkos/issues/136)
|
||||||
|
- Signed/unsigned comparison warning in CUDA parallel [\#130](https://github.com/kokkos/kokkos/issues/130)
|
||||||
|
- Kokkos::complex: Need op\* w/ std::complex & real [\#126](https://github.com/kokkos/kokkos/issues/126)
|
||||||
|
- Use uintptr\_t for casting pointers [\#110](https://github.com/kokkos/kokkos/issues/110)
|
||||||
|
- Default thread mapping behavior between P and Q threads. [\#91](https://github.com/kokkos/kokkos/issues/91)
|
||||||
|
- Windows: Atomic\_Fetch\_Exchange\(\) return type [\#90](https://github.com/kokkos/kokkos/issues/90)
|
||||||
|
- Synchronic unit test is way too long [\#84](https://github.com/kokkos/kokkos/issues/84)
|
||||||
|
- nvcc\_wrapper -\> $\(NVCC\_WRAPPER\) [\#42](https://github.com/kokkos/kokkos/issues/42)
|
||||||
|
- Check compiler version and print helpful message [\#39](https://github.com/kokkos/kokkos/issues/39)
|
||||||
|
- Kokkos shared memory on Cuda uses a lot of registers [\#31](https://github.com/kokkos/kokkos/issues/31)
|
||||||
|
- Can not pass unit test `cuda.space` without a GT 720 [\#25](https://github.com/kokkos/kokkos/issues/25)
|
||||||
|
- Makefile.kokkos lacks bounds checking option that CMake has [\#24](https://github.com/kokkos/kokkos/issues/24)
|
||||||
|
- Kokkos can not complete unit tests with CUDA UVM enabled [\#23](https://github.com/kokkos/kokkos/issues/23)
|
||||||
|
- Simplify teams + shared memory histogram example to remove vectorization [\#21](https://github.com/kokkos/kokkos/issues/21)
|
||||||
|
- Kokkos needs to rever to ${PROJECT\_NAME}\_ENABLE\_CXX11 not Trilinos\_ENABLE\_CXX11 [\#17](https://github.com/kokkos/kokkos/issues/17)
|
||||||
|
- Kokkos Base Makefile adds AVX to KNC Build [\#16](https://github.com/kokkos/kokkos/issues/16)
|
||||||
|
- MS Visual Studio 2013 Build Errors [\#9](https://github.com/kokkos/kokkos/issues/9)
|
||||||
|
- subview\(X, ALL\(\), j\) for 2-D LayoutRight View X: should it view a column? [\#5](https://github.com/kokkos/kokkos/issues/5)
|
||||||
|
|
||||||
|
## [End_C++98](https://github.com/kokkos/kokkos/tree/End_C++98) (2015-04-15)
|
||||||
|
|
||||||
|
|
||||||
|
\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
|
||||||
@ -34,8 +34,8 @@ TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
|
|||||||
# for compatibility with Kokkos' Makefile build system.
|
# for compatibility with Kokkos' Makefile build system.
|
||||||
|
|
||||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||||
${PACKAGE_NAME}_ENABLE_DEBUG
|
Kokkos_ENABLE_DEBUG
|
||||||
${PACKAGE_NAME_UC}_HAVE_DEBUG
|
KOKKOS_HAVE_DEBUG
|
||||||
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
|
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
|
||||||
${${PROJECT_NAME}_ENABLE_DEBUG}
|
${${PROJECT_NAME}_ENABLE_DEBUG}
|
||||||
)
|
)
|
||||||
@ -57,7 +57,21 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
|
|||||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||||
Kokkos_ENABLE_Cuda_UVM
|
Kokkos_ENABLE_Cuda_UVM
|
||||||
KOKKOS_USE_CUDA_UVM
|
KOKKOS_USE_CUDA_UVM
|
||||||
"Enable CUDA Unified Virtual Memory support in Kokkos."
|
"Enable CUDA Unified Virtual Memory as the default in Kokkos."
|
||||||
|
OFF
|
||||||
|
)
|
||||||
|
|
||||||
|
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||||
|
Kokkos_ENABLE_Cuda_RDC
|
||||||
|
KOKKOS_HAVE_CUDA_RDC
|
||||||
|
"Enable CUDA Relocatable Device Code support in Kokkos."
|
||||||
|
OFF
|
||||||
|
)
|
||||||
|
|
||||||
|
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||||
|
Kokkos_ENABLE_Cuda_Lambda
|
||||||
|
KOKKOS_HAVE_CUDA_LAMBDA
|
||||||
|
"Enable CUDA LAMBDA support in Kokkos."
|
||||||
OFF
|
OFF
|
||||||
)
|
)
|
||||||
|
|
||||||
@ -72,6 +86,9 @@ ASSERT_DEFINED(TPL_ENABLE_Pthread)
|
|||||||
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
|
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
|
||||||
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
|
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
|
||||||
ENDIF ()
|
ENDIF ()
|
||||||
|
IF (NOT TPL_ENABLE_Pthread)
|
||||||
|
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||||
Kokkos_ENABLE_OpenMP
|
Kokkos_ENABLE_OpenMP
|
||||||
@ -162,13 +179,28 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
|
|||||||
|
|
||||||
#------------------------------------------------------------------------------
|
#------------------------------------------------------------------------------
|
||||||
#
|
#
|
||||||
# C) Process the subpackages for Kokkos
|
# C) Install Kokkos' executable scripts
|
||||||
|
#
|
||||||
|
|
||||||
|
|
||||||
|
# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
|
||||||
|
# Kokkos needs nvcc_wrapper in order to build. Other libraries and
|
||||||
|
# executables also need nvcc_wrapper. Thus, we need to install it.
|
||||||
|
# If the argument of DESTINATION is a relative path, CMake computes it
|
||||||
|
# as relative to ${CMAKE_INSTALL_PATH}.
|
||||||
|
|
||||||
|
INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
|
||||||
|
|
||||||
|
|
||||||
|
#------------------------------------------------------------------------------
|
||||||
|
#
|
||||||
|
# D) Process the subpackages for Kokkos
|
||||||
#
|
#
|
||||||
|
|
||||||
TRIBITS_PROCESS_SUBPACKAGES()
|
TRIBITS_PROCESS_SUBPACKAGES()
|
||||||
|
|
||||||
#
|
#
|
||||||
# D) If Kokkos itself is enabled, process the Kokkos package
|
# E) If Kokkos itself is enabled, process the Kokkos package
|
||||||
#
|
#
|
||||||
|
|
||||||
TRIBITS_PACKAGE_DEF()
|
TRIBITS_PACKAGE_DEF()
|
||||||
|
|||||||
@ -7,13 +7,13 @@ CXXFLAGS=$(CCFLAGS)
|
|||||||
#Options: OpenMP,Serial,Pthreads,Cuda
|
#Options: OpenMP,Serial,Pthreads,Cuda
|
||||||
KOKKOS_DEVICES ?= "OpenMP"
|
KOKKOS_DEVICES ?= "OpenMP"
|
||||||
#KOKKOS_DEVICES ?= "Pthreads"
|
#KOKKOS_DEVICES ?= "Pthreads"
|
||||||
#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv8,BGQ,Power7,Power8,KNL,BDW
|
#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,KNL,BDW,SKX
|
||||||
KOKKOS_ARCH ?= ""
|
KOKKOS_ARCH ?= ""
|
||||||
#Options: yes,no
|
#Options: yes,no
|
||||||
KOKKOS_DEBUG ?= "no"
|
KOKKOS_DEBUG ?= "no"
|
||||||
#Options: hwloc,librt,experimental_memkind
|
#Options: hwloc,librt,experimental_memkind
|
||||||
KOKKOS_USE_TPLS ?= ""
|
KOKKOS_USE_TPLS ?= ""
|
||||||
#Options: c++11
|
#Options: c++11,c++1z
|
||||||
KOKKOS_CXX_STANDARD ?= "c++11"
|
KOKKOS_CXX_STANDARD ?= "c++11"
|
||||||
#Options: aggressive_vectorization,disable_profiling
|
#Options: aggressive_vectorization,disable_profiling
|
||||||
KOKKOS_OPTIONS ?= ""
|
KOKKOS_OPTIONS ?= ""
|
||||||
@ -26,6 +26,7 @@ KOKKOS_CUDA_OPTIONS ?= ""
|
|||||||
|
|
||||||
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
|
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
|
||||||
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
|
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
|
||||||
|
KOKKOS_INTERNAL_ENABLE_CXX1Z := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++1z" | wc -l))
|
||||||
|
|
||||||
# Check for external libraries
|
# Check for external libraries
|
||||||
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
|
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
|
||||||
@ -53,23 +54,65 @@ ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
|
|||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
# Check for other Execution Spaces
|
||||||
|
|
||||||
|
KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
|
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
|
||||||
|
CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_NVCC_VERSION := $(shell nvcc --version 2>&1 | grep release | cut -d' ' -f5 | cut -d',' -f1 | tr -d .)
|
||||||
|
endif
|
||||||
|
|
||||||
|
# Check OS
|
||||||
|
|
||||||
|
KOKKOS_OS := $(shell uname -s)
|
||||||
|
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
|
||||||
|
KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
|
||||||
|
KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
|
||||||
|
|
||||||
|
# Check compiler
|
||||||
|
|
||||||
KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
|
KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
|
||||||
KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
|
KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
|
||||||
KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
|
KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
|
||||||
KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
|
KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
|
||||||
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname | grep CYGWIN | wc -l)
|
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_CLANG = 1
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_XL = 1
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
|
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
|
||||||
|
$(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
|
||||||
|
endif
|
||||||
|
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
KOKKOS_INTERNAL_OPENMP_FLAG := -mp
|
KOKKOS_INTERNAL_OPENMP_FLAG := -mp
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||||
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
|
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
||||||
# OpenMP is turned on by default in Cray compiler environment
|
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
|
||||||
KOKKOS_INTERNAL_OPENMP_FLAG :=
|
|
||||||
else
|
else
|
||||||
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
|
# OpenMP is turned on by default in Cray compiler environment
|
||||||
|
KOKKOS_INTERNAL_OPENMP_FLAG :=
|
||||||
|
else
|
||||||
|
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
|
||||||
|
endif
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
@ -84,13 +127,11 @@ else
|
|||||||
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
|
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
|
||||||
else
|
else
|
||||||
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
|
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
|
||||||
|
KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
# Check for other Execution Spaces
|
|
||||||
KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
|
|
||||||
|
|
||||||
# Check for Kokkos Architecture settings
|
# Check for Kokkos Architecture settings
|
||||||
|
|
||||||
#Intel based
|
#Intel based
|
||||||
@ -98,6 +139,7 @@ KOKKOS_INTERNAL_USE_ARCH_KNC := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNC |
|
|||||||
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
|
||||||
|
|
||||||
#NVIDIA based
|
#NVIDIA based
|
||||||
@ -110,11 +152,13 @@ KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep
|
|||||||
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal60 | wc -l))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
|
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
|
||||||
|
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
|
||||||
@ -127,13 +171,16 @@ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_AR
|
|||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
|
||||||
|
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
|
||||||
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
|
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
|
||||||
endif
|
endif
|
||||||
|
|
||||||
#ARM based
|
#ARM based
|
||||||
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8 | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
|
||||||
|
|
||||||
#IBM based
|
#IBM based
|
||||||
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
|
||||||
@ -145,17 +192,18 @@ KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_
|
|||||||
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
|
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
|
||||||
|
|
||||||
#Any AVX?
|
#Any AVX?
|
||||||
KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
|
KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
|
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
|
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
|
||||||
|
KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
|
||||||
|
|
||||||
# Decide what ISA level we are able to support
|
# Decide what ISA level we are able to support
|
||||||
KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
|
KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
|
||||||
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
|
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
|
||||||
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8) | bc ))
|
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8) | bc ))
|
||||||
|
|
||||||
#Incompatible flags?
|
#Incompatible flags?
|
||||||
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)>1" | bc ))
|
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
|
||||||
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
|
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
|
||||||
@ -207,15 +255,21 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
|
||||||
|
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
|
||||||
|
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
|
||||||
|
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
|
||||||
@ -230,9 +284,15 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
|
|||||||
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
|
||||||
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
|
||||||
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
|
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||||
KOKKOS_CXXFLAGS += -G
|
KOKKOS_CXXFLAGS += -lineinfo
|
||||||
endif
|
endif
|
||||||
KOKKOS_CXXFLAGS += -g
|
KOKKOS_CXXFLAGS += -g
|
||||||
KOKKOS_LDFLAGS += -g -ldl
|
KOKKOS_LDFLAGS += -g -ldl
|
||||||
@ -273,13 +333,14 @@ endif
|
|||||||
|
|
||||||
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
|
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
|
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
|
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
|
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
|
||||||
@ -289,27 +350,101 @@ ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
|
|||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
|
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||||
KOKKOS_CXXFLAGS += -expt-extended-lambda
|
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
||||||
|
KOKKOS_CXXFLAGS += -expt-extended-lambda
|
||||||
|
else
|
||||||
|
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
||||||
|
endif
|
||||||
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
#Add Architecture flags
|
#Add Architecture flags
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
KOKKOS_CXXFLAGS +=
|
KOKKOS_CXXFLAGS +=
|
||||||
KOKKOS_LDFLAGS +=
|
KOKKOS_LDFLAGS +=
|
||||||
else
|
else
|
||||||
KOKKOS_CXXFLAGS += -mavx
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
KOKKOS_LDFLAGS += -mavx
|
KOKKOS_CXXFLAGS +=
|
||||||
|
KOKKOS_LDFLAGS +=
|
||||||
|
else
|
||||||
|
KOKKOS_CXXFLAGS += -march=armv8-a
|
||||||
|
KOKKOS_LDFLAGS += -march=armv8-a
|
||||||
|
endif
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
|
KOKKOS_CXXFLAGS +=
|
||||||
|
KOKKOS_LDFLAGS +=
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
|
KOKKOS_CXXFLAGS +=
|
||||||
|
KOKKOS_LDFLAGS +=
|
||||||
|
else
|
||||||
|
KOKKOS_CXXFLAGS += -march=armv8.1-a
|
||||||
|
KOKKOS_LDFLAGS += -march=armv8.1-a
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
|
KOKKOS_CXXFLAGS +=
|
||||||
|
KOKKOS_LDFLAGS +=
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
|
KOKKOS_CXXFLAGS +=
|
||||||
|
KOKKOS_LDFLAGS +=
|
||||||
|
else
|
||||||
|
KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
|
||||||
|
KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
|
||||||
|
KOKKOS_CXXFLAGS += -mavx
|
||||||
|
KOKKOS_LDFLAGS += -mavx
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
|
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
|
KOKKOS_CXXFLAGS += -tp=sandybridge
|
||||||
|
KOKKOS_LDFLAGS += -tp=sandybridge
|
||||||
|
else
|
||||||
|
# Assume that this is a really a GNU compiler
|
||||||
|
KOKKOS_CXXFLAGS += -mavx
|
||||||
|
KOKKOS_LDFLAGS += -mavx
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
|
|
||||||
|
else
|
||||||
|
# Assume that this is a really a GNU compiler or it could be XL on P8
|
||||||
|
KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
|
||||||
|
KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
|
||||||
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
|
||||||
@ -322,7 +457,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
|
|||||||
|
|
||||||
else
|
else
|
||||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
|
KOKKOS_CXXFLAGS += -tp=haswell
|
||||||
|
KOKKOS_LDFLAGS += -tp=haswell
|
||||||
else
|
else
|
||||||
# Assume that this is a really a GNU compiler
|
# Assume that this is a really a GNU compiler
|
||||||
KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
|
KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
|
||||||
@ -352,52 +488,85 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
|
|||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
|
||||||
|
KOKKOS_CXXFLAGS += -xCORE-AVX512
|
||||||
|
KOKKOS_LDFLAGS += -xCORE-AVX512
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||||
|
|
||||||
|
else
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||||
|
|
||||||
|
else
|
||||||
|
# Nothing here yet
|
||||||
|
KOKKOS_CXXFLAGS += -march=skylake-avx512
|
||||||
|
KOKKOS_LDFLAGS += -march=skylake-avx512
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -mmic
|
KOKKOS_CXXFLAGS += -mmic
|
||||||
KOKKOS_LDFLAGS += -mmic
|
KOKKOS_LDFLAGS += -mmic
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
#Figure out the architecture flag for Cuda
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||||
|
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-x cuda --cuda-gpu-arch
|
||||||
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_30
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_32
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_35
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_37
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_50
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_52
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_53
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
|
||||||
endif
|
endif
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
|
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
|
||||||
KOKKOS_CXXFLAGS += -arch=sm_61
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||||
|
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
|
||||||
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
|
||||||
endif
|
endif
|
||||||
endif
|
endif
|
||||||
|
|
||||||
@ -424,6 +593,7 @@ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
|
|||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||||
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
|
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
|
||||||
|
KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
|
||||||
KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
|
KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
|
||||||
KOKKOS_LIBS += -lcudart -lcuda
|
KOKKOS_LIBS += -lcudart -lcuda
|
||||||
endif
|
endif
|
||||||
@ -443,7 +613,7 @@ endif
|
|||||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
||||||
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
|
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
|
||||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
|
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||||
KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
||||||
else
|
else
|
||||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
||||||
@ -451,6 +621,14 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
|||||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
#Explicitly set the GCC Toolchain for Clang
|
||||||
|
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||||
|
KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
|
||||||
|
KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
|
||||||
|
KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
|
||||||
|
KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
|
||||||
|
endif
|
||||||
|
|
||||||
#With Cygwin functions such as fdopen and fileno are not defined
|
#With Cygwin functions such as fdopen and fileno are not defined
|
||||||
#when strict ansi is enabled. strict ansi gets enabled with --std=c++11
|
#when strict ansi is enabled. strict ansi gets enabled with --std=c++11
|
||||||
#though. So we hard undefine it here. Not sure if that has any bad side effects
|
#though. So we hard undefine it here. Not sure if that has any bad side effects
|
||||||
@ -471,7 +649,7 @@ KOKKOS_OBJ_LINK = $(notdir $(KOKKOS_OBJ))
|
|||||||
include $(KOKKOS_PATH)/Makefile.targets
|
include $(KOKKOS_PATH)/Makefile.targets
|
||||||
|
|
||||||
kokkos-clean:
|
kokkos-clean:
|
||||||
-rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
|
rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
|
||||||
|
|
||||||
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
|
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
|
||||||
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
|
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
|
||||||
|
|||||||
@ -14,20 +14,16 @@ Kokkos_hwloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.
|
|||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
|
||||||
Kokkos_Serial.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
|
Kokkos_Serial.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
|
||||||
Kokkos_Serial_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_TaskPolicy.cpp
|
|
||||||
Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
|
|
||||||
Kokkos_Serial_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
|
Kokkos_Serial_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
|
||||||
Kokkos_Shape.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Shape.cpp
|
Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Shape.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
|
||||||
Kokkos_spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
|
Kokkos_spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
|
||||||
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
|
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
|
||||||
KokkosExp_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/KokkosExp_SharedAlloc.cpp
|
Kokkos_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/KokkosExp_SharedAlloc.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
|
||||||
Kokkos_MemoryPool.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
|
Kokkos_MemoryPool.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
|
||||||
|
|
||||||
@ -38,8 +34,6 @@ Kokkos_CudaSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cu
|
|||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
|
||||||
Kokkos_Cuda_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
|
Kokkos_Cuda_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
|
||||||
Kokkos_Cuda_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_TaskPolicy.cpp
|
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
|
||||||
@ -47,8 +41,6 @@ Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads
|
|||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
|
||||||
Kokkos_ThreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
|
Kokkos_ThreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
|
||||||
Kokkos_Threads_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_Threads_TaskPolicy.cpp
|
|
||||||
endif
|
endif
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
|
ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
|
||||||
@ -67,6 +59,4 @@ endif
|
|||||||
|
|
||||||
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
||||||
Kokkos_HBWAllocators.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWAllocators.cpp
|
|
||||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWAllocators.cpp
|
|
||||||
|
|
||||||
|
|||||||
@ -45,31 +45,32 @@ Primary tested compilers on X86 are:
|
|||||||
Intel 14.0.4
|
Intel 14.0.4
|
||||||
Intel 15.0.2
|
Intel 15.0.2
|
||||||
Intel 16.0.1
|
Intel 16.0.1
|
||||||
|
Intel 17.0.098
|
||||||
Clang 3.5.2
|
Clang 3.5.2
|
||||||
Clang 3.6.1
|
Clang 3.6.1
|
||||||
|
Clang 3.9.0
|
||||||
|
|
||||||
Primary tested compilers on Power 8 are:
|
Primary tested compilers on Power 8 are:
|
||||||
IBM XL 13.1.3 (OpenMP,Serial)
|
GCC 5.4.0 (OpenMP,Serial)
|
||||||
GCC 4.9.2 (OpenMP,Serial)
|
IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
|
||||||
GCC 5.3.0 (OpenMP,Serial)
|
|
||||||
|
Primary tested compilers on Intel KNL are:
|
||||||
|
Intel 16.2.181 (with gcc 4.7.2)
|
||||||
|
Intel 17.0.098 (with gcc 4.7.2)
|
||||||
|
|
||||||
Secondary tested compilers are:
|
Secondary tested compilers are:
|
||||||
CUDA 6.5 (with gcc 4.7.2)
|
|
||||||
CUDA 7.0 (with gcc 4.7.2)
|
CUDA 7.0 (with gcc 4.7.2)
|
||||||
CUDA 7.5 (with gcc 4.8.4)
|
CUDA 7.5 (with gcc 4.7.2)
|
||||||
|
CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
|
||||||
|
CUDA/Clang 8.0 using Clang/Trunk compiler
|
||||||
|
|
||||||
Other compilers working:
|
Other compilers working:
|
||||||
X86:
|
X86:
|
||||||
Intel 17.0.042 (the FENL example causes internal compiler error)
|
|
||||||
PGI 15.4
|
PGI 15.4
|
||||||
Cygwin 2.1.0 64bit with gcc 4.9.3
|
Cygwin 2.1.0 64bit with gcc 4.9.3
|
||||||
KNL:
|
|
||||||
Intel 16.2.181 (the FENL example causes internal compiler error)
|
|
||||||
Intel 17.0.042 (the FENL example causes internal compiler error)
|
|
||||||
|
|
||||||
Known non-working combinations:
|
Known non-working combinations:
|
||||||
Power8:
|
Power8:
|
||||||
GCC 6.1.0
|
|
||||||
Pthreads backend
|
Pthreads backend
|
||||||
|
|
||||||
|
|
||||||
@ -92,9 +93,10 @@ master branch, without -Werror and only for a select set of backends.
|
|||||||
|
|
||||||
In the 'example/tutorial' directory you will find step by step tutorial
|
In the 'example/tutorial' directory you will find step by step tutorial
|
||||||
examples which explain many of the features of Kokkos. They work with
|
examples which explain many of the features of Kokkos. They work with
|
||||||
simple Makefiles. To build with g++ and OpenMP simply type 'make openmp'
|
simple Makefiles. To build with g++ and OpenMP simply type 'make'
|
||||||
in the 'example/tutorial' directory. This will build all examples in the
|
in the 'example/tutorial' directory. This will build all examples in the
|
||||||
subfolders.
|
subfolders. To change the build options refer to the Programming Guide
|
||||||
|
in the compilation section.
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
====Running Unit Tests======================================================
|
====Running Unit Tests======================================================
|
||||||
|
|||||||
@ -476,54 +476,54 @@ namespace Kokkos {
|
|||||||
};
|
};
|
||||||
|
|
||||||
template<class Generator>
|
template<class Generator>
|
||||||
struct rand<Generator, ::Kokkos::complex<float> > {
|
struct rand<Generator, Kokkos::complex<float> > {
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<float> max () {
|
static Kokkos::complex<float> max () {
|
||||||
return ::Kokkos::complex<float> (1.0, 1.0);
|
return Kokkos::complex<float> (1.0, 1.0);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<float> draw (Generator& gen) {
|
static Kokkos::complex<float> draw (Generator& gen) {
|
||||||
const float re = gen.frand ();
|
const float re = gen.frand ();
|
||||||
const float im = gen.frand ();
|
const float im = gen.frand ();
|
||||||
return ::Kokkos::complex<float> (re, im);
|
return Kokkos::complex<float> (re, im);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<float> draw (Generator& gen, const ::Kokkos::complex<float>& range) {
|
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& range) {
|
||||||
const float re = gen.frand (real (range));
|
const float re = gen.frand (real (range));
|
||||||
const float im = gen.frand (imag (range));
|
const float im = gen.frand (imag (range));
|
||||||
return ::Kokkos::complex<float> (re, im);
|
return Kokkos::complex<float> (re, im);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<float> draw (Generator& gen, const ::Kokkos::complex<float>& start, const ::Kokkos::complex<float>& end) {
|
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& start, const Kokkos::complex<float>& end) {
|
||||||
const float re = gen.frand (real (start), real (end));
|
const float re = gen.frand (real (start), real (end));
|
||||||
const float im = gen.frand (imag (start), imag (end));
|
const float im = gen.frand (imag (start), imag (end));
|
||||||
return ::Kokkos::complex<float> (re, im);
|
return Kokkos::complex<float> (re, im);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
template<class Generator>
|
template<class Generator>
|
||||||
struct rand<Generator, ::Kokkos::complex<double> > {
|
struct rand<Generator, Kokkos::complex<double> > {
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<double> max () {
|
static Kokkos::complex<double> max () {
|
||||||
return ::Kokkos::complex<double> (1.0, 1.0);
|
return Kokkos::complex<double> (1.0, 1.0);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<double> draw (Generator& gen) {
|
static Kokkos::complex<double> draw (Generator& gen) {
|
||||||
const double re = gen.drand ();
|
const double re = gen.drand ();
|
||||||
const double im = gen.drand ();
|
const double im = gen.drand ();
|
||||||
return ::Kokkos::complex<double> (re, im);
|
return Kokkos::complex<double> (re, im);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<double> draw (Generator& gen, const ::Kokkos::complex<double>& range) {
|
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& range) {
|
||||||
const double re = gen.drand (real (range));
|
const double re = gen.drand (real (range));
|
||||||
const double im = gen.drand (imag (range));
|
const double im = gen.drand (imag (range));
|
||||||
return ::Kokkos::complex<double> (re, im);
|
return Kokkos::complex<double> (re, im);
|
||||||
}
|
}
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
static ::Kokkos::complex<double> draw (Generator& gen, const ::Kokkos::complex<double>& start, const ::Kokkos::complex<double>& end) {
|
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& start, const Kokkos::complex<double>& end) {
|
||||||
const double re = gen.drand (real (start), real (end));
|
const double re = gen.drand (real (start), real (end));
|
||||||
const double im = gen.drand (imag (start), imag (end));
|
const double im = gen.drand (imag (start), imag (end));
|
||||||
return ::Kokkos::complex<double> (re, im);
|
return Kokkos::complex<double> (re, im);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -670,8 +670,8 @@ namespace Kokkos {
|
|||||||
double S = 2.0;
|
double S = 2.0;
|
||||||
double U;
|
double U;
|
||||||
while(S>=1.0) {
|
while(S>=1.0) {
|
||||||
U = drand();
|
U = 2.0*drand() - 1.0;
|
||||||
const double V = drand();
|
const double V = 2.0*drand() - 1.0;
|
||||||
S = U*U+V*V;
|
S = U*U+V*V;
|
||||||
}
|
}
|
||||||
return U*sqrt(-2.0*log(S)/S);
|
return U*sqrt(-2.0*log(S)/S);
|
||||||
@ -910,8 +910,8 @@ namespace Kokkos {
|
|||||||
double S = 2.0;
|
double S = 2.0;
|
||||||
double U;
|
double U;
|
||||||
while(S>=1.0) {
|
while(S>=1.0) {
|
||||||
U = drand();
|
U = 2.0*drand() - 1.0;
|
||||||
const double V = drand();
|
const double V = 2.0*drand() - 1.0;
|
||||||
S = U*U+V*V;
|
S = U*U+V*V;
|
||||||
}
|
}
|
||||||
return U*sqrt(-2.0*log(S)/S);
|
return U*sqrt(-2.0*log(S)/S);
|
||||||
@ -1163,8 +1163,8 @@ namespace Kokkos {
|
|||||||
double S = 2.0;
|
double S = 2.0;
|
||||||
double U;
|
double U;
|
||||||
while(S>=1.0) {
|
while(S>=1.0) {
|
||||||
U = drand();
|
U = 2.0*drand() - 1.0;
|
||||||
const double V = drand();
|
const double V = 2.0*drand() - 1.0;
|
||||||
S = U*U+V*V;
|
S = U*U+V*V;
|
||||||
}
|
}
|
||||||
return U*sqrt(-2.0*log(S)/S);
|
return U*sqrt(-2.0*log(S)/S);
|
||||||
|
|||||||
@ -51,7 +51,7 @@
|
|||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|
||||||
namespace SortImpl {
|
namespace Impl {
|
||||||
|
|
||||||
template<class ValuesViewType, int Rank=ValuesViewType::Rank>
|
template<class ValuesViewType, int Rank=ValuesViewType::Rank>
|
||||||
struct CopyOp;
|
struct CopyOp;
|
||||||
@ -199,7 +199,7 @@ public:
|
|||||||
|
|
||||||
parallel_for(values.dimension_0(),
|
parallel_for(values.dimension_0(),
|
||||||
bin_sort_sort_functor<ValuesViewType, offset_type,
|
bin_sort_sort_functor<ValuesViewType, offset_type,
|
||||||
SortImpl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
|
Impl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
|
||||||
|
|
||||||
deep_copy(values,sorted_values);
|
deep_copy(values,sorted_values);
|
||||||
}
|
}
|
||||||
@ -262,17 +262,15 @@ public:
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
namespace SortImpl {
|
|
||||||
|
|
||||||
template<class KeyViewType>
|
template<class KeyViewType>
|
||||||
struct DefaultBinOp1D {
|
struct BinOp1D {
|
||||||
const int max_bins_;
|
const int max_bins_;
|
||||||
const double mul_;
|
const double mul_;
|
||||||
typename KeyViewType::const_value_type range_;
|
typename KeyViewType::const_value_type range_;
|
||||||
typename KeyViewType::const_value_type min_;
|
typename KeyViewType::const_value_type min_;
|
||||||
|
|
||||||
//Construct BinOp with number of bins, minimum value and maxuimum value
|
//Construct BinOp with number of bins, minimum value and maxuimum value
|
||||||
DefaultBinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
|
BinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
|
||||||
typename KeyViewType::const_value_type max )
|
typename KeyViewType::const_value_type max )
|
||||||
:max_bins_(max_bins__+1),mul_(1.0*max_bins__/(max-min)),range_(max-min),min_(min) {}
|
:max_bins_(max_bins__+1),mul_(1.0*max_bins__/(max-min)),range_(max-min),min_(min) {}
|
||||||
|
|
||||||
@ -298,13 +296,13 @@ struct DefaultBinOp1D {
|
|||||||
};
|
};
|
||||||
|
|
||||||
template<class KeyViewType>
|
template<class KeyViewType>
|
||||||
struct DefaultBinOp3D {
|
struct BinOp3D {
|
||||||
int max_bins_[3];
|
int max_bins_[3];
|
||||||
double mul_[3];
|
double mul_[3];
|
||||||
typename KeyViewType::non_const_value_type range_[3];
|
typename KeyViewType::non_const_value_type range_[3];
|
||||||
typename KeyViewType::non_const_value_type min_[3];
|
typename KeyViewType::non_const_value_type min_[3];
|
||||||
|
|
||||||
DefaultBinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
|
BinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
|
||||||
typename KeyViewType::const_value_type max[] )
|
typename KeyViewType::const_value_type max[] )
|
||||||
{
|
{
|
||||||
max_bins_[0] = max_bins__[0]+1;
|
max_bins_[0] = max_bins__[0]+1;
|
||||||
@ -348,109 +346,11 @@ struct DefaultBinOp3D {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
template<typename Scalar>
|
namespace Impl {
|
||||||
struct min_max {
|
|
||||||
Scalar min;
|
|
||||||
Scalar max;
|
|
||||||
bool init;
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
min_max() {
|
|
||||||
min = 0;
|
|
||||||
max = 0;
|
|
||||||
init = 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
min_max (const min_max& val) {
|
|
||||||
min = val.min;
|
|
||||||
max = val.max;
|
|
||||||
init = val.init;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
min_max operator = (const min_max& val) {
|
|
||||||
min = val.min;
|
|
||||||
max = val.max;
|
|
||||||
init = val.init;
|
|
||||||
return *this;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator+= (const Scalar& val) {
|
|
||||||
if(init) {
|
|
||||||
min = min<val?min:val;
|
|
||||||
max = max>val?max:val;
|
|
||||||
} else {
|
|
||||||
min = val;
|
|
||||||
max = val;
|
|
||||||
init = 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator+= (const min_max& val) {
|
|
||||||
if(init && val.init) {
|
|
||||||
min = min<val.min?min:val.min;
|
|
||||||
max = max>val.max?max:val.max;
|
|
||||||
} else {
|
|
||||||
if(val.init) {
|
|
||||||
min = val.min;
|
|
||||||
max = val.max;
|
|
||||||
init = 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator+= (volatile const Scalar& val) volatile {
|
|
||||||
if(init) {
|
|
||||||
min = min<val?min:val;
|
|
||||||
max = max>val?max:val;
|
|
||||||
} else {
|
|
||||||
min = val;
|
|
||||||
max = val;
|
|
||||||
init = 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator+= (volatile const min_max& val) volatile {
|
|
||||||
if(init && val.init) {
|
|
||||||
min = min<val.min?min:val.min;
|
|
||||||
max = max>val.max?max:val.max;
|
|
||||||
} else {
|
|
||||||
if(val.init) {
|
|
||||||
min = val.min;
|
|
||||||
max = val.max;
|
|
||||||
init = 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
|
|
||||||
template<class ViewType>
|
|
||||||
struct min_max_functor {
|
|
||||||
typedef typename ViewType::execution_space execution_space;
|
|
||||||
ViewType view;
|
|
||||||
typedef min_max<typename ViewType::non_const_value_type> value_type;
|
|
||||||
min_max_functor (const ViewType view_):view(view_) {
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator()(const size_t& i, value_type& val) const {
|
|
||||||
val += view(i);
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType>
|
template<class ViewType>
|
||||||
bool try_std_sort(ViewType view) {
|
bool try_std_sort(ViewType view) {
|
||||||
bool possible = true;
|
bool possible = true;
|
||||||
#if ! KOKKOS_USING_EXP_VIEW
|
|
||||||
size_t stride[8];
|
|
||||||
view.stride(stride);
|
|
||||||
#else
|
|
||||||
size_t stride[8] = { view.stride_0()
|
size_t stride[8] = { view.stride_0()
|
||||||
, view.stride_1()
|
, view.stride_1()
|
||||||
, view.stride_2()
|
, view.stride_2()
|
||||||
@ -460,8 +360,7 @@ bool try_std_sort(ViewType view) {
|
|||||||
, view.stride_6()
|
, view.stride_6()
|
||||||
, view.stride_7()
|
, view.stride_7()
|
||||||
};
|
};
|
||||||
#endif
|
possible = possible && std::is_same<typename ViewType::memory_space, HostSpace>::value;
|
||||||
possible = possible && Impl::is_same<typename ViewType::memory_space, HostSpace>::value;
|
|
||||||
possible = possible && (ViewType::Rank == 1);
|
possible = possible && (ViewType::Rank == 1);
|
||||||
possible = possible && (stride[0] == 1);
|
possible = possible && (stride[0] == 1);
|
||||||
if(possible) {
|
if(possible) {
|
||||||
@ -470,27 +369,39 @@ bool try_std_sort(ViewType view) {
|
|||||||
return possible;
|
return possible;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template<class ViewType>
|
||||||
|
struct min_max_functor {
|
||||||
|
typedef Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> minmax_scalar;
|
||||||
|
|
||||||
|
ViewType view;
|
||||||
|
min_max_functor(const ViewType& view_):view(view_) {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator() (const size_t& i, minmax_scalar& minmax) const {
|
||||||
|
if(view(i) < minmax.min_val) minmax.min_val = view(i);
|
||||||
|
if(view(i) > minmax.max_val) minmax.max_val = view(i);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
template<class ViewType>
|
template<class ViewType>
|
||||||
void sort(ViewType view, bool always_use_kokkos_sort = false) {
|
void sort(ViewType view, bool always_use_kokkos_sort = false) {
|
||||||
if(!always_use_kokkos_sort) {
|
if(!always_use_kokkos_sort) {
|
||||||
if(SortImpl::try_std_sort(view)) return;
|
if(Impl::try_std_sort(view)) return;
|
||||||
}
|
}
|
||||||
|
typedef BinOp1D<ViewType> CompType;
|
||||||
|
|
||||||
typedef SortImpl::DefaultBinOp1D<ViewType> CompType;
|
Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
|
||||||
SortImpl::min_max<typename ViewType::non_const_value_type> val;
|
Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
|
||||||
parallel_reduce(view.dimension_0(),SortImpl::min_max_functor<ViewType>(view),val);
|
parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.dimension_0()),
|
||||||
BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,val.min,val.max),true);
|
Impl::min_max_functor<ViewType>(view),reducer);
|
||||||
|
if(result.min_val == result.max_val) return;
|
||||||
|
BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,result.min_val,result.max_val),true);
|
||||||
bin_sort.create_permute_vector();
|
bin_sort.create_permute_vector();
|
||||||
bin_sort.sort(view);
|
bin_sort.sort(view);
|
||||||
}
|
}
|
||||||
|
|
||||||
/*template<class ViewType, class Comparator>
|
|
||||||
void sort(ViewType view, Comparator comp, bool always_use_kokkos_sort = false) {
|
|
||||||
|
|
||||||
}*/
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
|
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
|
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
||||||
|
|
||||||
SET(SOURCES
|
SET(SOURCES
|
||||||
|
|||||||
@ -7,21 +7,18 @@ vpath %.cpp ${KOKKOS_PATH}/algorithms/unit_tests
|
|||||||
default: build_all
|
default: build_all
|
||||||
echo "End Build"
|
echo "End Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3
|
||||||
|
LINK ?= $(CXX)
|
||||||
|
LDFLAGS ?= -lpthread
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|
||||||
CXX = $(NVCC_WRAPPER)
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK = $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
else
|
|
||||||
CXX ?= g++
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK ?= $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
endif
|
|
||||||
|
|
||||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
|
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
|
||||||
|
|
||||||
TEST_TARGETS =
|
TEST_TARGETS =
|
||||||
|
|||||||
@ -131,6 +131,10 @@ void test_1D_sort(unsigned int n,bool force_kokkos) {
|
|||||||
typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
|
typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
|
||||||
KeyViewType keys("Keys",n);
|
KeyViewType keys("Keys",n);
|
||||||
|
|
||||||
|
// Test sorting array with all numbers equal
|
||||||
|
Kokkos::deep_copy(keys,KeyType(1));
|
||||||
|
Kokkos::sort(keys,force_kokkos);
|
||||||
|
|
||||||
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
|
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
|
||||||
Kokkos::fill_random(keys,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
|
Kokkos::fill_random(keys,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
|
||||||
|
|
||||||
@ -174,7 +178,7 @@ void test_3D_sort(unsigned int n) {
|
|||||||
typename KeyViewType::value_type min[3] = {0,0,0};
|
typename KeyViewType::value_type min[3] = {0,0,0};
|
||||||
typename KeyViewType::value_type max[3] = {100,100,100};
|
typename KeyViewType::value_type max[3] = {100,100,100};
|
||||||
|
|
||||||
typedef Kokkos::SortImpl::DefaultBinOp3D< KeyViewType > BinOp;
|
typedef Kokkos::BinOp3D< KeyViewType > BinOp;
|
||||||
BinOp bin_op(bin_max,min,max);
|
BinOp bin_op(bin_max,min,max);
|
||||||
Kokkos::BinSort< KeyViewType , BinOp >
|
Kokkos::BinSort< KeyViewType , BinOp >
|
||||||
Sorter(keys,bin_op,false);
|
Sorter(keys,bin_op,false);
|
||||||
|
|||||||
43
lib/kokkos/benchmarks/bytes_and_flops/Makefile
Normal file
43
lib/kokkos/benchmarks/bytes_and_flops/Makefile
Normal file
@ -0,0 +1,43 @@
|
|||||||
|
KOKKOS_PATH = ${HOME}/kokkos
|
||||||
|
SRC = $(wildcard *.cpp)
|
||||||
|
KOKKOS_DEVICES=Cuda
|
||||||
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
|
|
||||||
|
default: build
|
||||||
|
echo "Start Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
|
||||||
|
EXE = bytes_and_flops.cuda
|
||||||
|
KOKKOS_DEVICES = "Cuda,OpenMP"
|
||||||
|
KOKKOS_ARCH = "SNB,Kepler35"
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
EXE = bytes_and_flops.host
|
||||||
|
KOKKOS_DEVICES = "OpenMP"
|
||||||
|
KOKKOS_ARCH = "SNB"
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3 -g
|
||||||
|
|
||||||
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
|
OBJ = $(SRC:.cpp=.o)
|
||||||
|
LIB =
|
||||||
|
|
||||||
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
|
build: $(EXE)
|
||||||
|
|
||||||
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
|
clean: kokkos-clean
|
||||||
|
rm -f *.o *.cuda *.host
|
||||||
|
|
||||||
|
# Compilation rules
|
||||||
|
|
||||||
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) bench.hpp bench_unroll_stride.hpp bench_stride.hpp
|
||||||
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
||||||
99
lib/kokkos/benchmarks/bytes_and_flops/bench.hpp
Normal file
99
lib/kokkos/benchmarks/bytes_and_flops/bench.hpp
Normal file
@ -0,0 +1,99 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include<Kokkos_Core.hpp>
|
||||||
|
#include<impl/Kokkos_Timer.hpp>
|
||||||
|
|
||||||
|
template<class Scalar, int Unroll,int Stride>
|
||||||
|
struct Run {
|
||||||
|
static void run(int N, int K, int R, int F, int T, int S);
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class Scalar, int Stride>
|
||||||
|
struct RunStride {
|
||||||
|
static void run_1(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_2(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_3(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_4(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_5(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_6(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_7(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run_8(int N, int K, int R, int F, int T, int S);
|
||||||
|
static void run(int N, int K, int R, int U, int F, int T, int S);
|
||||||
|
};
|
||||||
|
|
||||||
|
#define STRIDE 1
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
#define STRIDE 2
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
#define STRIDE 4
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
#define STRIDE 8
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
#define STRIDE 16
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
#define STRIDE 32
|
||||||
|
#include<bench_stride.hpp>
|
||||||
|
#undef STRIDE
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
void run_stride_unroll(int N, int K, int R, int D, int U, int F, int T, int S) {
|
||||||
|
if(D == 1)
|
||||||
|
RunStride<Scalar,1>::run(N,K,R,U,F,T,S);
|
||||||
|
if(D == 2)
|
||||||
|
RunStride<Scalar,2>::run(N,K,R,U,F,T,S);
|
||||||
|
if(D == 4)
|
||||||
|
RunStride<Scalar,4>::run(N,K,R,U,F,T,S);
|
||||||
|
if(D == 8)
|
||||||
|
RunStride<Scalar,8>::run(N,K,R,U,F,T,S);
|
||||||
|
if(D == 16)
|
||||||
|
RunStride<Scalar,16>::run(N,K,R,U,F,T,S);
|
||||||
|
if(D == 32)
|
||||||
|
RunStride<Scalar,32>::run(N,K,R,U,F,T,S);
|
||||||
|
}
|
||||||
|
|
||||||
124
lib/kokkos/benchmarks/bytes_and_flops/bench_stride.hpp
Normal file
124
lib/kokkos/benchmarks/bytes_and_flops/bench_stride.hpp
Normal file
@ -0,0 +1,124 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#define UNROLL 1
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 2
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 3
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 4
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 5
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 6
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 7
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 8
|
||||||
|
#include<bench_unroll_stride.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
struct RunStride<Scalar,STRIDE> {
|
||||||
|
static void run_1(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,1,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_2(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,2,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_3(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,3,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_4(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,4,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_5(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,5,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_6(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,6,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_7(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,7,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
static void run_8(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Run<Scalar,8,STRIDE>::run(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void run(int N, int K, int R, int U, int F, int T, int S) {
|
||||||
|
if(U==1) {
|
||||||
|
run_1(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==2) {
|
||||||
|
run_2(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==3) {
|
||||||
|
run_3(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==4) {
|
||||||
|
run_4(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==5) {
|
||||||
|
run_5(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==6) {
|
||||||
|
run_6(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==7) {
|
||||||
|
run_7(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
if(U==8) {
|
||||||
|
run_8(N,K,R,F,T,S);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
148
lib/kokkos/benchmarks/bytes_and_flops/bench_unroll_stride.hpp
Normal file
148
lib/kokkos/benchmarks/bytes_and_flops/bench_unroll_stride.hpp
Normal file
@ -0,0 +1,148 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
struct Run<Scalar,UNROLL,STRIDE> {
|
||||||
|
static void run(int N, int K, int R, int F, int T, int S) {
|
||||||
|
Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> A("A",N,K);
|
||||||
|
Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> B("B",N,K);
|
||||||
|
Kokkos::View<Scalar**[STRIDE],Kokkos::LayoutRight> C("C",N,K);
|
||||||
|
|
||||||
|
Kokkos::deep_copy(A,Scalar(1.5));
|
||||||
|
Kokkos::deep_copy(B,Scalar(2.5));
|
||||||
|
Kokkos::deep_copy(C,Scalar(3.5));
|
||||||
|
|
||||||
|
Kokkos::Timer timer;
|
||||||
|
Kokkos::parallel_for("BenchmarkKernel",Kokkos::TeamPolicy<>(N,T).set_scratch_size(0,Kokkos::PerTeam(S)),
|
||||||
|
KOKKOS_LAMBDA ( const Kokkos::TeamPolicy<>::member_type& team) {
|
||||||
|
const int n = team.league_rank();
|
||||||
|
for(int r=0; r<R; r++) {
|
||||||
|
Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,K), [&] (const int& i) {
|
||||||
|
Scalar a1 = A(n,i,0);
|
||||||
|
const Scalar b = B(n,i,0);
|
||||||
|
#if(UNROLL>1)
|
||||||
|
Scalar a2 = a1*1.3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>2)
|
||||||
|
Scalar a3 = a2*1.1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>3)
|
||||||
|
Scalar a4 = a3*1.1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>4)
|
||||||
|
Scalar a5 = a4*1.3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>5)
|
||||||
|
Scalar a6 = a5*1.1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>6)
|
||||||
|
Scalar a7 = a6*1.1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>7)
|
||||||
|
Scalar a8 = a7*1.1;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
for(int f = 0; f<F; f++) {
|
||||||
|
a1 += b*a1;
|
||||||
|
#if(UNROLL>1)
|
||||||
|
a2 += b*a2;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>2)
|
||||||
|
a3 += b*a3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>3)
|
||||||
|
a4 += b*a4;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>4)
|
||||||
|
a5 += b*a5;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>5)
|
||||||
|
a6 += b*a6;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>6)
|
||||||
|
a7 += b*a7;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>7)
|
||||||
|
a8 += b*a8;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
#if(UNROLL==1)
|
||||||
|
C(n,i,0) = a1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==2)
|
||||||
|
C(n,i,0) = a1+a2;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==3)
|
||||||
|
C(n,i,0) = a1+a2+a3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==4)
|
||||||
|
C(n,i,0) = a1+a2+a3+a4;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==5)
|
||||||
|
C(n,i,0) = a1+a2+a3+a4+a5;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==6)
|
||||||
|
C(n,i,0) = a1+a2+a3+a4+a5+a6;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==7)
|
||||||
|
C(n,i,0) = a1+a2+a3+a4+a5+a6+a7;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==8)
|
||||||
|
C(n,i,0) = a1+a2+a3+a4+a5+a6+a7+a8;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
});
|
||||||
|
}
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
double seconds = timer.seconds();
|
||||||
|
|
||||||
|
double bytes = 1.0*N*K*R*3*sizeof(Scalar);
|
||||||
|
double flops = 1.0*N*K*R*(F*2*UNROLL + 2*(UNROLL-1));
|
||||||
|
printf("NKRUFTS: %i %i %i %i %i %i %i Time: %lfs Bandwidth: %lfGiB/s GFlop/s: %lf\n",N,K,R,UNROLL,F,T,S,seconds,1.0*bytes/seconds/1024/1024/1024,1.e-9*flops/seconds);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
96
lib/kokkos/benchmarks/bytes_and_flops/main.cpp
Normal file
96
lib/kokkos/benchmarks/bytes_and_flops/main.cpp
Normal file
@ -0,0 +1,96 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include<Kokkos_Core.hpp>
|
||||||
|
#include<impl/Kokkos_Timer.hpp>
|
||||||
|
#include<bench.hpp>
|
||||||
|
|
||||||
|
int main(int argc, char* argv[]) {
|
||||||
|
Kokkos::initialize();
|
||||||
|
|
||||||
|
|
||||||
|
if(argc<10) {
|
||||||
|
printf("Arguments: N K R D U F T S\n");
|
||||||
|
printf(" P: Precision (1==float, 2==double)\n");
|
||||||
|
printf(" N,K: dimensions of the 2D array to allocate\n");
|
||||||
|
printf(" R: how often to loop through the K dimension with each team\n");
|
||||||
|
printf(" D: distance between loaded elements (stride)\n");
|
||||||
|
printf(" U: how many independent flops to do per load\n");
|
||||||
|
printf(" F: how many times to repeat the U unrolled operations before reading next element\n");
|
||||||
|
printf(" T: team size\n");
|
||||||
|
printf(" S: shared memory per team (used to control occupancy on GPUs)\n");
|
||||||
|
printf("Example Input GPU:\n");
|
||||||
|
printf(" Bandwidth Bound : 2 100000 1024 1 1 1 1 256 6000\n");
|
||||||
|
printf(" Cache Bound : 2 100000 1024 64 1 1 1 512 20000\n");
|
||||||
|
printf(" Compute Bound : 2 100000 1024 1 1 8 64 256 6000\n");
|
||||||
|
printf(" Load Slots Used : 2 20000 256 32 16 1 1 256 6000\n");
|
||||||
|
printf(" Inefficient Load: 2 20000 256 32 2 1 1 256 20000\n");
|
||||||
|
Kokkos::finalize();
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int P = atoi(argv[1]);
|
||||||
|
int N = atoi(argv[2]);
|
||||||
|
int K = atoi(argv[3]);
|
||||||
|
int R = atoi(argv[4]);
|
||||||
|
int D = atoi(argv[5]);
|
||||||
|
int U = atoi(argv[6]);
|
||||||
|
int F = atoi(argv[7]);
|
||||||
|
int T = atoi(argv[8]);
|
||||||
|
int S = atoi(argv[9]);
|
||||||
|
|
||||||
|
if(U>8) {printf("U must be 1-8\n"); return 0;}
|
||||||
|
if( (D!=1) && (D!=2) && (D!=4) && (D!=8) && (D!=16) && (D!=32)) {printf("D must be one of 1,2,4,8,16,32\n"); return 0;}
|
||||||
|
if( (P!=1) && (P!=2) ) {printf("P must be one of 1,2\n"); return 0;}
|
||||||
|
|
||||||
|
if(P==1) {
|
||||||
|
run_stride_unroll<float>(N,K,R,D,U,F,T,S);
|
||||||
|
}
|
||||||
|
if(P==2) {
|
||||||
|
run_stride_unroll<double>(N,K,R,D,U,F,T,S);
|
||||||
|
}
|
||||||
|
|
||||||
|
Kokkos::finalize();
|
||||||
|
}
|
||||||
|
|
||||||
44
lib/kokkos/benchmarks/gather/Makefile
Normal file
44
lib/kokkos/benchmarks/gather/Makefile
Normal file
@ -0,0 +1,44 @@
|
|||||||
|
KOKKOS_PATH = ${HOME}/kokkos
|
||||||
|
SRC = $(wildcard *.cpp)
|
||||||
|
KOKKOS_DEVICES=Cuda
|
||||||
|
KOKKOS_CUDA_OPTIONS=enable_lambda
|
||||||
|
|
||||||
|
default: build
|
||||||
|
echo "Start Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
|
||||||
|
EXE = gather.cuda
|
||||||
|
KOKKOS_DEVICES = "Cuda,OpenMP"
|
||||||
|
KOKKOS_ARCH = "SNB,Kepler35"
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
EXE = gather.host
|
||||||
|
KOKKOS_DEVICES = "OpenMP"
|
||||||
|
KOKKOS_ARCH = "SNB"
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3 -g
|
||||||
|
|
||||||
|
DEPFLAGS = -M
|
||||||
|
LINK = ${CXX}
|
||||||
|
LINKFLAGS =
|
||||||
|
|
||||||
|
OBJ = $(SRC:.cpp=.o)
|
||||||
|
LIB =
|
||||||
|
|
||||||
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
|
$(warning ${KOKKOS_CPPFLAGS})
|
||||||
|
build: $(EXE)
|
||||||
|
|
||||||
|
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||||
|
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||||
|
|
||||||
|
clean: kokkos-clean
|
||||||
|
rm -f *.o *.cuda *.host
|
||||||
|
|
||||||
|
# Compilation rules
|
||||||
|
|
||||||
|
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
|
||||||
|
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
||||||
92
lib/kokkos/benchmarks/gather/gather.hpp
Normal file
92
lib/kokkos/benchmarks/gather/gather.hpp
Normal file
@ -0,0 +1,92 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
template<class Scalar, int UNROLL>
|
||||||
|
struct RunGather {
|
||||||
|
static void run(int N, int K, int D, int R, int F);
|
||||||
|
};
|
||||||
|
|
||||||
|
#define UNROLL 1
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 2
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 3
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 4
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 5
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 6
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 7
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
#define UNROLL 8
|
||||||
|
#include<gather_unroll.hpp>
|
||||||
|
#undef UNROLL
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
void run_gather_test(int N, int K, int D, int R, int U, int F) {
|
||||||
|
if(U == 1)
|
||||||
|
RunGather<Scalar,1>::run(N,K,D,R,F);
|
||||||
|
if(U == 2)
|
||||||
|
RunGather<Scalar,2>::run(N,K,D,R,F);
|
||||||
|
if(U == 3)
|
||||||
|
RunGather<Scalar,3>::run(N,K,D,R,F);
|
||||||
|
if(U == 4)
|
||||||
|
RunGather<Scalar,4>::run(N,K,D,R,F);
|
||||||
|
if(U == 5)
|
||||||
|
RunGather<Scalar,5>::run(N,K,D,R,F);
|
||||||
|
if(U == 6)
|
||||||
|
RunGather<Scalar,6>::run(N,K,D,R,F);
|
||||||
|
if(U == 7)
|
||||||
|
RunGather<Scalar,7>::run(N,K,D,R,F);
|
||||||
|
if(U == 8)
|
||||||
|
RunGather<Scalar,8>::run(N,K,D,R,F);
|
||||||
|
}
|
||||||
169
lib/kokkos/benchmarks/gather/gather_unroll.hpp
Normal file
169
lib/kokkos/benchmarks/gather/gather_unroll.hpp
Normal file
@ -0,0 +1,169 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include<Kokkos_Core.hpp>
|
||||||
|
#include<Kokkos_Random.hpp>
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
struct RunGather<Scalar,UNROLL> {
|
||||||
|
static void run(int N, int K, int D, int R, int F) {
|
||||||
|
Kokkos::View<int**> connectivity("Connectivity",N,K);
|
||||||
|
Kokkos::View<Scalar*> A_in("Input",N);
|
||||||
|
Kokkos::View<Scalar*> B_in("Input",N);
|
||||||
|
Kokkos::View<Scalar*> C("Output",N);
|
||||||
|
|
||||||
|
Kokkos::Random_XorShift64_Pool<> rand_pool(12313);
|
||||||
|
|
||||||
|
Kokkos::deep_copy(A_in,1.5);
|
||||||
|
Kokkos::deep_copy(B_in,2.0);
|
||||||
|
|
||||||
|
Kokkos::View<const Scalar*, Kokkos::MemoryTraits<Kokkos::RandomAccess> > A(A_in);
|
||||||
|
Kokkos::View<const Scalar*, Kokkos::MemoryTraits<Kokkos::RandomAccess> > B(B_in);
|
||||||
|
|
||||||
|
Kokkos::parallel_for("InitKernel",N,
|
||||||
|
KOKKOS_LAMBDA (const int& i) {
|
||||||
|
auto rand_gen = rand_pool.get_state();
|
||||||
|
for( int jj=0; jj<K; jj++) {
|
||||||
|
connectivity(i,jj) = (rand_gen.rand(D) + i - D/2 + N)%N;
|
||||||
|
}
|
||||||
|
rand_pool.free_state(rand_gen);
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
|
||||||
|
|
||||||
|
Kokkos::Timer timer;
|
||||||
|
for(int r = 0; r<R; r++) {
|
||||||
|
Kokkos::parallel_for("BenchmarkKernel",N,
|
||||||
|
KOKKOS_LAMBDA (const int& i) {
|
||||||
|
Scalar c = Scalar(0.0);
|
||||||
|
for( int jj=0; jj<K; jj++) {
|
||||||
|
const int j = connectivity(i,jj);
|
||||||
|
Scalar a1 = A(j);
|
||||||
|
const Scalar b = B(j);
|
||||||
|
#if(UNROLL>1)
|
||||||
|
Scalar a2 = a1*Scalar(1.3);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>2)
|
||||||
|
Scalar a3 = a2*Scalar(1.1);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>3)
|
||||||
|
Scalar a4 = a3*Scalar(1.1);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>4)
|
||||||
|
Scalar a5 = a4*Scalar(1.3);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>5)
|
||||||
|
Scalar a6 = a5*Scalar(1.1);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>6)
|
||||||
|
Scalar a7 = a6*Scalar(1.1);
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>7)
|
||||||
|
Scalar a8 = a7*Scalar(1.1);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
for(int f = 0; f<F; f++) {
|
||||||
|
a1 += b*a1;
|
||||||
|
#if(UNROLL>1)
|
||||||
|
a2 += b*a2;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>2)
|
||||||
|
a3 += b*a3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>3)
|
||||||
|
a4 += b*a4;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>4)
|
||||||
|
a5 += b*a5;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>5)
|
||||||
|
a6 += b*a6;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>6)
|
||||||
|
a7 += b*a7;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL>7)
|
||||||
|
a8 += b*a8;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
#if(UNROLL==1)
|
||||||
|
c += a1;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==2)
|
||||||
|
c += a1+a2;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==3)
|
||||||
|
c += a1+a2+a3;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==4)
|
||||||
|
c += a1+a2+a3+a4;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==5)
|
||||||
|
c += a1+a2+a3+a4+a5;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==6)
|
||||||
|
c += a1+a2+a3+a4+a5+a6;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==7)
|
||||||
|
c += a1+a2+a3+a4+a5+a6+a7;
|
||||||
|
#endif
|
||||||
|
#if(UNROLL==8)
|
||||||
|
c += a1+a2+a3+a4+a5+a6+a7+a8;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
}
|
||||||
|
C(i) = c ;
|
||||||
|
});
|
||||||
|
Kokkos::fence();
|
||||||
|
}
|
||||||
|
double seconds = timer.seconds();
|
||||||
|
|
||||||
|
double bytes = 1.0*N*K*R*(2*sizeof(Scalar)+sizeof(int)) + 1.0*N*R*sizeof(Scalar);
|
||||||
|
double flops = 1.0*N*K*R*(F*2*UNROLL + 2*(UNROLL-1));
|
||||||
|
double gather_ops = 1.0*N*K*R*2;
|
||||||
|
printf("SNKDRUF: %i %i %i %i %i %i %i Time: %lfs Bandwidth: %lfGiB/s GFlop/s: %lf GGather/s: %lf\n",sizeof(Scalar)/4,N,K,D,R,UNROLL,F,seconds,1.0*bytes/seconds/1024/1024/1024,1.e-9*flops/seconds,1.e-9*gather_ops/seconds);
|
||||||
|
}
|
||||||
|
};
|
||||||
@ -41,68 +41,53 @@
|
|||||||
//@HEADER
|
//@HEADER
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#include <Kokkos_HostSpace.hpp>
|
#include<Kokkos_Core.hpp>
|
||||||
|
#include<impl/Kokkos_Timer.hpp>
|
||||||
|
#include<gather.hpp>
|
||||||
|
|
||||||
#include <impl/Kokkos_HBWAllocators.hpp>
|
int main(int argc, char* argv[]) {
|
||||||
#include <impl/Kokkos_Error.hpp>
|
Kokkos::initialize(argc,argv);
|
||||||
|
|
||||||
|
|
||||||
#include <stdint.h> // uintptr_t
|
if(argc<8) {
|
||||||
#include <cstdlib> // for malloc, realloc, and free
|
printf("Arguments: S N K D\n");
|
||||||
#include <cstring> // for memcpy
|
printf(" S: Scalar Type Size (1==float, 2==double, 4=complex<double>)\n");
|
||||||
|
printf(" N: Number of entities\n");
|
||||||
#if defined(KOKKOS_POSIX_MEMALIGN_AVAILABLE)
|
printf(" K: Number of things to gather per entity\n");
|
||||||
#include <sys/mman.h> // for mmap, munmap, MAP_ANON, etc
|
printf(" D: Max distance of gathered things of an entity\n");
|
||||||
#include <unistd.h> // for sysconf, _SC_PAGE_SIZE, _SC_PHYS_PAGES
|
printf(" R: how often to loop through the K dimension with each team\n");
|
||||||
#endif
|
printf(" U: how many independent flops to do per load\n");
|
||||||
|
printf(" F: how many times to repeat the U unrolled operations before reading next element\n");
|
||||||
#include <sstream>
|
printf("Example Input GPU:\n");
|
||||||
#include <iostream>
|
printf(" Bandwidth Bound : 2 10000000 1 1 10 1 1\n");
|
||||||
|
printf(" Cache Bound : 2 10000000 64 1 10 1 1\n");
|
||||||
#ifdef KOKKOS_HAVE_HBWSPACE
|
printf(" Cache Gather : 2 10000000 64 256 10 1 1\n");
|
||||||
#include <memkind.h>
|
printf(" Global Gather : 2 100000000 16 100000000 1 1 1\n");
|
||||||
|
printf(" Typical MD : 2 100000 32 512 1000 8 2\n");
|
||||||
namespace Kokkos {
|
Kokkos::finalize();
|
||||||
namespace Experimental {
|
return 0;
|
||||||
namespace Impl {
|
|
||||||
#define MEMKIND_TYPE MEMKIND_HBW //hbw_get_kind(HBW_PAGESIZE_4KB)
|
|
||||||
/*--------------------------------------------------------------------------*/
|
|
||||||
|
|
||||||
void* HBWMallocAllocator::allocate( size_t size )
|
|
||||||
{
|
|
||||||
std::cout<< "Allocate HBW: " << 1.0e-6*size << "MB" << std::endl;
|
|
||||||
void * ptr = NULL;
|
|
||||||
if (size) {
|
|
||||||
ptr = memkind_malloc(MEMKIND_TYPE,size);
|
|
||||||
|
|
||||||
if (!ptr)
|
|
||||||
{
|
|
||||||
std::ostringstream msg ;
|
|
||||||
msg << name() << ": allocate(" << size << ") FAILED";
|
|
||||||
Kokkos::Impl::throw_runtime_exception( msg.str() );
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
return ptr;
|
|
||||||
|
|
||||||
|
int S = atoi(argv[1]);
|
||||||
|
int N = atoi(argv[2]);
|
||||||
|
int K = atoi(argv[3]);
|
||||||
|
int D = atoi(argv[4]);
|
||||||
|
int R = atoi(argv[5]);
|
||||||
|
int U = atoi(argv[6]);
|
||||||
|
int F = atoi(argv[7]);
|
||||||
|
|
||||||
|
if( (S!=1) && (S!=2) && (S!=4)) {printf("S must be one of 1,2,4\n"); return 0;}
|
||||||
|
if( N<D ) {printf("N must be larger or equal to D\n"); return 0; }
|
||||||
|
if(S==1) {
|
||||||
|
run_gather_test<float>(N,K,D,R,U,F);
|
||||||
|
}
|
||||||
|
if(S==2) {
|
||||||
|
run_gather_test<double>(N,K,D,R,U,F);
|
||||||
|
}
|
||||||
|
if(S==4) {
|
||||||
|
run_gather_test<Kokkos::complex<double> >(N,K,D,R,U,F);
|
||||||
|
}
|
||||||
|
Kokkos::finalize();
|
||||||
}
|
}
|
||||||
|
|
||||||
void HBWMallocAllocator::deallocate( void * ptr, size_t /*size*/ )
|
|
||||||
{
|
|
||||||
if (ptr) {
|
|
||||||
memkind_free(MEMKIND_TYPE,ptr);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
void * HBWMallocAllocator::reallocate(void * old_ptr, size_t /*old_size*/, size_t new_size)
|
|
||||||
{
|
|
||||||
void * ptr = memkind_realloc(MEMKIND_TYPE, old_ptr, new_size);
|
|
||||||
|
|
||||||
if (new_size > 0u && ptr == NULL) {
|
|
||||||
Kokkos::Impl::throw_runtime_exception("Error: Malloc Allocator could not reallocate memory");
|
|
||||||
}
|
|
||||||
return ptr;
|
|
||||||
}
|
|
||||||
|
|
||||||
} // namespace Impl
|
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
|
||||||
#endif
|
|
||||||
284
lib/kokkos/bin/nvcc_wrapper
Executable file
284
lib/kokkos/bin/nvcc_wrapper
Executable file
@ -0,0 +1,284 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
#
|
||||||
|
# This shell script (nvcc_wrapper) wraps both the host compiler and
|
||||||
|
# NVCC, if you are building legacy C or C++ code with CUDA enabled.
|
||||||
|
# The script remedies some differences between the interface of NVCC
|
||||||
|
# and that of the host compiler, in particular for linking.
|
||||||
|
# It also means that a legacy code doesn't need separate .cu files;
|
||||||
|
# it can just use .cpp files.
|
||||||
|
#
|
||||||
|
# Default settings: change those according to your machine. For
|
||||||
|
# example, you may have have two different wrappers with either icpc
|
||||||
|
# or g++ as their back-end compiler. The defaults can be overwritten
|
||||||
|
# by using the usual arguments (e.g., -arch=sm_30 -ccbin icpc).
|
||||||
|
|
||||||
|
default_arch="sm_35"
|
||||||
|
#default_arch="sm_50"
|
||||||
|
|
||||||
|
#
|
||||||
|
# The default C++ compiler.
|
||||||
|
#
|
||||||
|
host_compiler=${NVCC_WRAPPER_DEFAULT_COMPILER:-"g++"}
|
||||||
|
#host_compiler="icpc"
|
||||||
|
#host_compiler="/usr/local/gcc/4.8.3/bin/g++"
|
||||||
|
#host_compiler="/usr/local/gcc/4.9.1/bin/g++"
|
||||||
|
|
||||||
|
#
|
||||||
|
# Internal variables
|
||||||
|
#
|
||||||
|
|
||||||
|
# C++ files
|
||||||
|
cpp_files=""
|
||||||
|
|
||||||
|
# Host compiler arguments
|
||||||
|
xcompiler_args=""
|
||||||
|
|
||||||
|
# Cuda (NVCC) only arguments
|
||||||
|
cuda_args=""
|
||||||
|
|
||||||
|
# Arguments for both NVCC and Host compiler
|
||||||
|
shared_args=""
|
||||||
|
|
||||||
|
# Linker arguments
|
||||||
|
xlinker_args=""
|
||||||
|
|
||||||
|
# Object files passable to NVCC
|
||||||
|
object_files=""
|
||||||
|
|
||||||
|
# Link objects for the host linker only
|
||||||
|
object_files_xlinker=""
|
||||||
|
|
||||||
|
# Shared libraries with version numbers are not handled correctly by NVCC
|
||||||
|
shared_versioned_libraries_host=""
|
||||||
|
shared_versioned_libraries=""
|
||||||
|
|
||||||
|
# Does the User set the architecture
|
||||||
|
arch_set=0
|
||||||
|
|
||||||
|
# Does the user overwrite the host compiler
|
||||||
|
ccbin_set=0
|
||||||
|
|
||||||
|
#Error code of compilation
|
||||||
|
error_code=0
|
||||||
|
|
||||||
|
# Do a dry run without actually compiling
|
||||||
|
dry_run=0
|
||||||
|
|
||||||
|
# Skip NVCC compilation and use host compiler directly
|
||||||
|
host_only=0
|
||||||
|
|
||||||
|
# Enable workaround for CUDA 6.5 for pragma ident
|
||||||
|
replace_pragma_ident=0
|
||||||
|
|
||||||
|
# Mark first host compiler argument
|
||||||
|
first_xcompiler_arg=1
|
||||||
|
|
||||||
|
temp_dir=${TMPDIR:-/tmp}
|
||||||
|
|
||||||
|
# Check if we have an optimization argument already
|
||||||
|
optimization_applied=0
|
||||||
|
|
||||||
|
#echo "Arguments: $# $@"
|
||||||
|
|
||||||
|
while [ $# -gt 0 ]
|
||||||
|
do
|
||||||
|
case $1 in
|
||||||
|
#show the executed command
|
||||||
|
--show|--nvcc-wrapper-show)
|
||||||
|
dry_run=1
|
||||||
|
;;
|
||||||
|
#run host compilation only
|
||||||
|
--host-only)
|
||||||
|
host_only=1
|
||||||
|
;;
|
||||||
|
#replace '#pragma ident' with '#ident' this is needed to compile OpenMPI due to a configure script bug and a non standardized behaviour of pragma with macros
|
||||||
|
--replace-pragma-ident)
|
||||||
|
replace_pragma_ident=1
|
||||||
|
;;
|
||||||
|
#handle source files to be compiled as cuda files
|
||||||
|
*.cpp|*.cxx|*.cc|*.C|*.c++|*.cu)
|
||||||
|
cpp_files="$cpp_files $1"
|
||||||
|
;;
|
||||||
|
# Ensure we only have one optimization flag because NVCC doesn't allow muliple
|
||||||
|
-O*)
|
||||||
|
if [ $optimization_applied -eq 1 ]; then
|
||||||
|
echo "nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the first is used because nvcc can only accept a single optimization setting."
|
||||||
|
else
|
||||||
|
shared_args="$shared_args $1"
|
||||||
|
optimization_applied=1
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
#Handle shared args (valid for both nvcc and the host compiler)
|
||||||
|
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
|
||||||
|
shared_args="$shared_args $1"
|
||||||
|
;;
|
||||||
|
#Handle shared args that have an argument
|
||||||
|
-o|-MT)
|
||||||
|
shared_args="$shared_args $1 $2"
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#Handle known nvcc args
|
||||||
|
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
|
||||||
|
cuda_args="$cuda_args $1"
|
||||||
|
;;
|
||||||
|
#Handle more known nvcc args
|
||||||
|
--expt-extended-lambda|--expt-relaxed-constexpr)
|
||||||
|
cuda_args="$cuda_args $1"
|
||||||
|
;;
|
||||||
|
#Handle known nvcc args that have an argument
|
||||||
|
-rdc|-maxrregcount|--default-stream)
|
||||||
|
cuda_args="$cuda_args $1 $2"
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#Handle c++11 setting
|
||||||
|
--std=c++11|-std=c++11)
|
||||||
|
shared_args="$shared_args $1"
|
||||||
|
;;
|
||||||
|
#strip of -std=c++98 due to nvcc warnings and Tribits will place both -std=c++11 and -std=c++98
|
||||||
|
-std=c++98|--std=c++98)
|
||||||
|
;;
|
||||||
|
#strip of pedantic because it produces endless warnings about #LINE added by the preprocessor
|
||||||
|
-pedantic|-Wpedantic|-ansi)
|
||||||
|
;;
|
||||||
|
#strip -Xcompiler because we add it
|
||||||
|
-Xcompiler)
|
||||||
|
if [ $first_xcompiler_arg -eq 1 ]; then
|
||||||
|
xcompiler_args="$2"
|
||||||
|
first_xcompiler_arg=0
|
||||||
|
else
|
||||||
|
xcompiler_args="$xcompiler_args,$2"
|
||||||
|
fi
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#strip of "-x cu" because we add that
|
||||||
|
-x)
|
||||||
|
if [[ $2 != "cu" ]]; then
|
||||||
|
if [ $first_xcompiler_arg -eq 1 ]; then
|
||||||
|
xcompiler_args="-x,$2"
|
||||||
|
first_xcompiler_arg=0
|
||||||
|
else
|
||||||
|
xcompiler_args="$xcompiler_args,-x,$2"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#Handle -ccbin (if its not set we can set it to a default value)
|
||||||
|
-ccbin)
|
||||||
|
cuda_args="$cuda_args $1 $2"
|
||||||
|
ccbin_set=1
|
||||||
|
host_compiler=$2
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#Handle -arch argument (if its not set use a default
|
||||||
|
-arch*)
|
||||||
|
cuda_args="$cuda_args $1"
|
||||||
|
arch_set=1
|
||||||
|
;;
|
||||||
|
#Handle -Xcudafe argument
|
||||||
|
-Xcudafe)
|
||||||
|
cuda_args="$cuda_args -Xcudafe $2"
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
#Handle args that should be sent to the linker
|
||||||
|
-Wl*)
|
||||||
|
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
|
||||||
|
host_linker_args="$host_linker_args ${1:4:${#1}}"
|
||||||
|
;;
|
||||||
|
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
|
||||||
|
*.a|*.so|*.o|*.obj)
|
||||||
|
object_files="$object_files $1"
|
||||||
|
object_files_xlinker="$object_files_xlinker -Xlinker $1"
|
||||||
|
;;
|
||||||
|
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
|
||||||
|
*.dylib)
|
||||||
|
object_files="$object_files -Xlinker $1"
|
||||||
|
object_files_xlinker="$object_files_xlinker -Xlinker $1"
|
||||||
|
;;
|
||||||
|
#Handle shared libraries with *.so.* names which nvcc can't do.
|
||||||
|
*.so.*)
|
||||||
|
shared_versioned_libraries_host="$shared_versioned_libraries_host $1"
|
||||||
|
shared_versioned_libraries="$shared_versioned_libraries -Xlinker $1"
|
||||||
|
;;
|
||||||
|
#All other args are sent to the host compiler
|
||||||
|
*)
|
||||||
|
if [ $first_xcompiler_arg -eq 1 ]; then
|
||||||
|
xcompiler_args=$1
|
||||||
|
first_xcompiler_arg=0
|
||||||
|
else
|
||||||
|
xcompiler_args="$xcompiler_args,$1"
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
#Add default host compiler if necessary
|
||||||
|
if [ $ccbin_set -ne 1 ]; then
|
||||||
|
cuda_args="$cuda_args -ccbin $host_compiler"
|
||||||
|
fi
|
||||||
|
|
||||||
|
#Add architecture command
|
||||||
|
if [ $arch_set -ne 1 ]; then
|
||||||
|
cuda_args="$cuda_args -arch=$default_arch"
|
||||||
|
fi
|
||||||
|
|
||||||
|
#Compose compilation command
|
||||||
|
nvcc_command="nvcc $cuda_args $shared_args $xlinker_args $shared_versioned_libraries"
|
||||||
|
if [ $first_xcompiler_arg -eq 0 ]; then
|
||||||
|
nvcc_command="$nvcc_command -Xcompiler $xcompiler_args"
|
||||||
|
fi
|
||||||
|
|
||||||
|
#Compose host only command
|
||||||
|
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
|
||||||
|
|
||||||
|
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
|
||||||
|
if [ $replace_pragma_ident -eq 1 ]; then
|
||||||
|
cpp_files2=""
|
||||||
|
for file in $cpp_files
|
||||||
|
do
|
||||||
|
var=`grep pragma ${file} | grep ident | grep "#"`
|
||||||
|
if [ "${#var}" -gt 0 ]
|
||||||
|
then
|
||||||
|
sed 's/#[\ \t]*pragma[\ \t]*ident/#ident/g' $file > $temp_dir/nvcc_wrapper_tmp_$file
|
||||||
|
cpp_files2="$cpp_files2 $temp_dir/nvcc_wrapper_tmp_$file"
|
||||||
|
else
|
||||||
|
cpp_files2="$cpp_files2 $file"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
cpp_files=$cpp_files2
|
||||||
|
#echo $cpp_files
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$cpp_files" ]; then
|
||||||
|
nvcc_command="$nvcc_command $object_files_xlinker -x cu $cpp_files"
|
||||||
|
else
|
||||||
|
nvcc_command="$nvcc_command $object_files"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$cpp_files" ]; then
|
||||||
|
host_command="$host_command $object_files $cpp_files"
|
||||||
|
else
|
||||||
|
host_command="$host_command $object_files"
|
||||||
|
fi
|
||||||
|
|
||||||
|
#Print command for dryrun
|
||||||
|
if [ $dry_run -eq 1 ]; then
|
||||||
|
if [ $host_only -eq 1 ]; then
|
||||||
|
echo $host_command
|
||||||
|
else
|
||||||
|
echo $nvcc_command
|
||||||
|
fi
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
#Run compilation command
|
||||||
|
if [ $host_only -eq 1 ]; then
|
||||||
|
$host_command
|
||||||
|
else
|
||||||
|
$nvcc_command
|
||||||
|
fi
|
||||||
|
error_code=$?
|
||||||
|
|
||||||
|
#Report error code
|
||||||
|
exit $error_code
|
||||||
@ -53,12 +53,12 @@
|
|||||||
# ************************************************************************
|
# ************************************************************************
|
||||||
# @HEADER
|
# @HEADER
|
||||||
|
|
||||||
include(${TRIBITS_DEPS_DIR}/CUDA.cmake)
|
#include(${TRIBITS_DEPS_DIR}/CUDA.cmake)
|
||||||
|
|
||||||
IF (TPL_ENABLE_CUDA)
|
#IF (TPL_ENABLE_CUDA)
|
||||||
GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
|
# GLOBAL_SET(TPL_CUSPARSE_LIBRARY_DIRS)
|
||||||
GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
|
# GLOBAL_SET(TPL_CUSPARSE_INCLUDE_DIRS ${TPL_CUDA_INCLUDE_DIRS})
|
||||||
GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
|
# GLOBAL_SET(TPL_CUSPARSE_LIBRARIES ${CUDA_cusparse_LIBRARY})
|
||||||
TIBITS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
|
# TIBITS_CREATE_IMPORTED_TPL_LIBRARY(CUSPARSE)
|
||||||
ENDIF()
|
#ENDIF()
|
||||||
|
|
||||||
|
|||||||
@ -1,6 +1,16 @@
|
|||||||
INCLUDE(CMakeParseArguments)
|
INCLUDE(CMakeParseArguments)
|
||||||
INCLUDE(CTest)
|
INCLUDE(CTest)
|
||||||
|
|
||||||
|
cmake_policy(SET CMP0054 NEW)
|
||||||
|
|
||||||
|
IF(NOT DEFINED ${PROJECT_NAME})
|
||||||
|
project(Kokkos)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
|
IF(NOT DEFINED ${${PROJECT_NAME}_ENABLE_DEBUG}})
|
||||||
|
SET(${PROJECT_NAME}_ENABLE_DEBUG OFF)
|
||||||
|
ENDIF()
|
||||||
|
|
||||||
FUNCTION(ASSERT_DEFINED VARS)
|
FUNCTION(ASSERT_DEFINED VARS)
|
||||||
FOREACH(VAR ${VARS})
|
FOREACH(VAR ${VARS})
|
||||||
IF(NOT DEFINED ${VAR})
|
IF(NOT DEFINED ${VAR})
|
||||||
@ -75,6 +85,13 @@ MACRO(TRIBITS_ADD_EXAMPLE_DIRECTORIES)
|
|||||||
|
|
||||||
ENDMACRO()
|
ENDMACRO()
|
||||||
|
|
||||||
|
|
||||||
|
function(INCLUDE_DIRECTORIES)
|
||||||
|
cmake_parse_arguments(INCLUDE_DIRECTORIES "REQUIRED_DURING_INSTALLATION_TESTING" "" "" ${ARGN})
|
||||||
|
_INCLUDE_DIRECTORIES(${INCLUDE_DIRECTORIES_UNPARSED_ARGUMENTS})
|
||||||
|
endfunction()
|
||||||
|
|
||||||
|
|
||||||
MACRO(TARGET_TRANSFER_PROPERTY TARGET_NAME PROP_IN PROP_OUT)
|
MACRO(TARGET_TRANSFER_PROPERTY TARGET_NAME PROP_IN PROP_OUT)
|
||||||
SET(PROP_VALUES)
|
SET(PROP_VALUES)
|
||||||
FOREACH(TARGET_X ${ARGN})
|
FOREACH(TARGET_X ${ARGN})
|
||||||
@ -271,6 +288,11 @@ ENDFUNCTION()
|
|||||||
|
|
||||||
ADD_CUSTOM_TARGET(check COMMAND ${CMAKE_CTEST_COMMAND} -VV -C ${CMAKE_CFG_INTDIR})
|
ADD_CUSTOM_TARGET(check COMMAND ${CMAKE_CTEST_COMMAND} -VV -C ${CMAKE_CFG_INTDIR})
|
||||||
|
|
||||||
|
FUNCTION(TRIBITS_ADD_TEST)
|
||||||
|
ENDFUNCTION()
|
||||||
|
FUNCTION(TRIBITS_TPL_TENTATIVELY_ENABLE)
|
||||||
|
ENDFUNCTION()
|
||||||
|
|
||||||
FUNCTION(TRIBITS_ADD_EXECUTABLE_AND_TEST EXE_NAME)
|
FUNCTION(TRIBITS_ADD_EXECUTABLE_AND_TEST EXE_NAME)
|
||||||
|
|
||||||
SET(options STANDARD_PASS_OUTPUT WILL_FAIL)
|
SET(options STANDARD_PASS_OUTPUT WILL_FAIL)
|
||||||
|
|||||||
0
lib/kokkos/config/configure_compton_cpu.sh
Executable file → Normal file
0
lib/kokkos/config/configure_compton_cpu.sh
Executable file → Normal file
0
lib/kokkos/config/configure_compton_mic.sh
Executable file → Normal file
0
lib/kokkos/config/configure_compton_mic.sh
Executable file → Normal file
0
lib/kokkos/config/configure_kokkos.sh
Executable file → Normal file
0
lib/kokkos/config/configure_kokkos.sh
Executable file → Normal file
0
lib/kokkos/config/configure_kokkos_nvidia.sh
Executable file → Normal file
0
lib/kokkos/config/configure_kokkos_nvidia.sh
Executable file → Normal file
0
lib/kokkos/config/configure_shannon.sh
Executable file → Normal file
0
lib/kokkos/config/configure_shannon.sh
Executable file → Normal file
@ -91,8 +91,19 @@ Step 3:
|
|||||||
|
|
||||||
// -------------------------------------------------------------------------------- //
|
// -------------------------------------------------------------------------------- //
|
||||||
|
|
||||||
Step 4:
|
Step 4: Once all Trilinos tests pass promote Kokkos develop branch to master on Github
|
||||||
4.1. Once all Trilinos tests pass promote Kokkos develop branch to master on Github
|
4.1. Generate Changelog (You need a github API token)
|
||||||
|
|
||||||
|
Close all Open issues with "InDevelop" tag on github
|
||||||
|
|
||||||
|
(Not from kokkos directory)
|
||||||
|
gitthub_changelog_generator kokkos/kokkos --token TOKEN --no-pull-requests --include-labels 'InDevelop' --enhancement-labels 'enhancement,Feature Request' --future-release 'NEWTAG' --between-tags 'NEWTAG,OLDTAG'
|
||||||
|
|
||||||
|
(Copy the new section from the generated CHANGELOG.md to the kokkos/CHANGELOG.md)
|
||||||
|
(Make desired changes to CHANGELOG.md to enhance clarity)
|
||||||
|
(Commit and push the CHANGELOG to develop)
|
||||||
|
|
||||||
|
4.2 Merge develop into Master
|
||||||
|
|
||||||
- DO NOT fast-forward the merge!!!!
|
- DO NOT fast-forward the merge!!!!
|
||||||
|
|
||||||
@ -103,7 +114,7 @@ Step 4:
|
|||||||
git reset --hard origin/master
|
git reset --hard origin/master
|
||||||
git merge --no-ff origin/develop
|
git merge --no-ff origin/develop
|
||||||
|
|
||||||
4.2. Update the tag in kokkos/config/master_history.txt
|
4.3. Update the tag in kokkos/config/master_history.txt
|
||||||
Tag description: MajorNumber.MinorNumber.WeeksSinceMinorNumberUpdate
|
Tag description: MajorNumber.MinorNumber.WeeksSinceMinorNumberUpdate
|
||||||
Tag format: #.#.##
|
Tag format: #.#.##
|
||||||
|
|
||||||
|
|||||||
@ -1,3 +1,6 @@
|
|||||||
tag: 2.01.00 date: 07:21:2016 master: xxxxxxxx develop: fa6dfcc4
|
tag: 2.01.00 date: 07:21:2016 master: xxxxxxxx develop: fa6dfcc4
|
||||||
tag: 2.01.06 date: 09:02:2016 master: 9afaa87f develop: 555f1a3a
|
tag: 2.01.06 date: 09:02:2016 master: 9afaa87f develop: 555f1a3a
|
||||||
|
tag: 2.01.10 date: 09:27:2016 master: e4119325 develop: e6cda11e
|
||||||
|
tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
|
||||||
|
tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
|
||||||
|
tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
|
||||||
|
|||||||
@ -121,6 +121,10 @@ do
|
|||||||
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
|
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
|
||||||
cuda_args="$cuda_args $1"
|
cuda_args="$cuda_args $1"
|
||||||
;;
|
;;
|
||||||
|
#Handle more known nvcc args
|
||||||
|
--expt-extended-lambda|--expt-relaxed-constexpr)
|
||||||
|
cuda_args="$cuda_args $1"
|
||||||
|
;;
|
||||||
#Handle known nvcc args that have an argument
|
#Handle known nvcc args that have an argument
|
||||||
-rdc|-maxrregcount|--default-stream)
|
-rdc|-maxrregcount|--default-stream)
|
||||||
cuda_args="$cuda_args $1 $2"
|
cuda_args="$cuda_args $1 $2"
|
||||||
|
|||||||
@ -16,6 +16,8 @@ elif [[ "$HOSTNAME" =~ .*bowman.* ]]; then
|
|||||||
MACHINE=bowman
|
MACHINE=bowman
|
||||||
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
|
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
|
||||||
MACHINE=shepard
|
MACHINE=shepard
|
||||||
|
elif [[ "$HOSTNAME" =~ apollo ]]; then
|
||||||
|
MACHINE=apollo
|
||||||
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
|
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
|
||||||
MACHINE=sems
|
MACHINE=sems
|
||||||
else
|
else
|
||||||
@ -28,6 +30,7 @@ IBM_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
|
|||||||
INTEL_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
|
INTEL_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
|
||||||
CLANG_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
CLANG_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
||||||
CUDA_BUILD_LIST="Cuda_OpenMP,Cuda_Pthread,Cuda_Serial"
|
CUDA_BUILD_LIST="Cuda_OpenMP,Cuda_Pthread,Cuda_Serial"
|
||||||
|
CUDA_IBM_BUILD_LIST="Cuda_OpenMP,Cuda_Serial"
|
||||||
|
|
||||||
GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wignored-qualifiers,-Wempty-body,-Wclobbered,-Wuninitialized"
|
GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wignored-qualifiers,-Wempty-body,-Wclobbered,-Wuninitialized"
|
||||||
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
|
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
|
||||||
@ -44,102 +47,12 @@ BUILD_ONLY=False
|
|||||||
declare -i NUM_JOBS_TO_RUN_IN_PARALLEL=3
|
declare -i NUM_JOBS_TO_RUN_IN_PARALLEL=3
|
||||||
TEST_SCRIPT=False
|
TEST_SCRIPT=False
|
||||||
SKIP_HWLOC=False
|
SKIP_HWLOC=False
|
||||||
|
SPOT_CHECK=False
|
||||||
|
|
||||||
ARCH_FLAG=""
|
PRINT_HELP=False
|
||||||
|
OPT_FLAG=""
|
||||||
|
KOKKOS_OPTIONS=""
|
||||||
|
|
||||||
#
|
|
||||||
# Machine specific config
|
|
||||||
#
|
|
||||||
|
|
||||||
if [ "$MACHINE" = "sems" ]; then
|
|
||||||
source /projects/modulefiles/utils/sems-modules-init.sh
|
|
||||||
source /projects/modulefiles/utils/kokkos-modules-init.sh
|
|
||||||
|
|
||||||
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>/base,hwloc/1.10.1/<COMPILER_NAME>/<COMPILER_VERSION>/base"
|
|
||||||
CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/4.7.2/base"
|
|
||||||
|
|
||||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
|
||||||
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"gcc/5.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
|
||||||
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
|
||||||
"cuda/6.5.14 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
|
||||||
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
|
||||||
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
|
||||||
)
|
|
||||||
|
|
||||||
elif [ "$MACHINE" = "white" ]; then
|
|
||||||
source /etc/profile.d/modules.sh
|
|
||||||
SKIP_HWLOC=True
|
|
||||||
export SLURM_TASKS_PER_NODE=32
|
|
||||||
|
|
||||||
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
|
|
||||||
IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
|
|
||||||
CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/4.9.2"
|
|
||||||
|
|
||||||
# Don't do pthread on white
|
|
||||||
GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
|
|
||||||
|
|
||||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
|
||||||
COMPILERS=("gcc/4.9.2 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"gcc/5.3.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
|
||||||
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
|
|
||||||
)
|
|
||||||
|
|
||||||
ARCH_FLAG="--arch=Power8"
|
|
||||||
NUM_JOBS_TO_RUN_IN_PARALLEL=8
|
|
||||||
|
|
||||||
elif [ "$MACHINE" = "bowman" ]; then
|
|
||||||
source /etc/profile.d/modules.sh
|
|
||||||
SKIP_HWLOC=True
|
|
||||||
export SLURM_TASKS_PER_NODE=32
|
|
||||||
|
|
||||||
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
|
|
||||||
|
|
||||||
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
|
||||||
|
|
||||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
|
||||||
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
"intel/17.0.064 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
)
|
|
||||||
|
|
||||||
ARCH_FLAG="--arch=KNL"
|
|
||||||
NUM_JOBS_TO_RUN_IN_PARALLEL=8
|
|
||||||
|
|
||||||
elif [ "$MACHINE" = "shepard" ]; then
|
|
||||||
source /etc/profile.d/modules.sh
|
|
||||||
SKIP_HWLOC=True
|
|
||||||
export SLURM_TASKS_PER_NODE=32
|
|
||||||
|
|
||||||
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
|
|
||||||
|
|
||||||
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
|
||||||
|
|
||||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
|
||||||
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
"intel/17.0.064 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
|
||||||
)
|
|
||||||
|
|
||||||
ARCH_FLAG="--arch=HSW"
|
|
||||||
NUM_JOBS_TO_RUN_IN_PARALLEL=8
|
|
||||||
|
|
||||||
else
|
|
||||||
echo "Unhandled machine $MACHINE" >&2
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
export OMP_NUM_THREADS=4
|
|
||||||
|
|
||||||
declare -i NUM_RESULTS_TO_KEEP=7
|
|
||||||
|
|
||||||
RESULT_ROOT_PREFIX=TestAll
|
|
||||||
|
|
||||||
SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
|
|
||||||
|
|
||||||
#
|
#
|
||||||
# Handle arguments
|
# Handle arguments
|
||||||
@ -173,7 +86,211 @@ NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
|
|||||||
--dry-run*)
|
--dry-run*)
|
||||||
DRYRUN=True
|
DRYRUN=True
|
||||||
;;
|
;;
|
||||||
--help)
|
--spot-check*)
|
||||||
|
SPOT_CHECK=True
|
||||||
|
;;
|
||||||
|
--arch*)
|
||||||
|
ARCH_FLAG="--arch=${key#*=}"
|
||||||
|
;;
|
||||||
|
--opt-flag*)
|
||||||
|
OPT_FLAG="${key#*=}"
|
||||||
|
;;
|
||||||
|
--with-cuda-options*)
|
||||||
|
KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
|
||||||
|
;;
|
||||||
|
--help*)
|
||||||
|
PRINT_HELP=True
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
# args, just append
|
||||||
|
ARGS="$ARGS $1"
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
|
||||||
|
|
||||||
|
# set kokkos path
|
||||||
|
if [ -z "$KOKKOS_PATH" ]; then
|
||||||
|
KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
|
||||||
|
else
|
||||||
|
# Ensure KOKKOS_PATH is abs path
|
||||||
|
KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
|
||||||
|
fi
|
||||||
|
|
||||||
|
#
|
||||||
|
# Machine specific config
|
||||||
|
#
|
||||||
|
|
||||||
|
if [ "$MACHINE" = "sems" ]; then
|
||||||
|
source /projects/sems/modulefiles/utils/sems-modules-init.sh
|
||||||
|
|
||||||
|
BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
|
||||||
|
CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
|
||||||
|
CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
|
||||||
|
|
||||||
|
if [ -z "$ARCH_FLAG" ]; then
|
||||||
|
ARCH_FLAG=""
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$SPOT_CHECK" = "True" ]; then
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
|
||||||
|
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
else
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/5.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
"cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
fi
|
||||||
|
|
||||||
|
elif [ "$MACHINE" = "white" ]; then
|
||||||
|
source /etc/profile.d/modules.sh
|
||||||
|
SKIP_HWLOC=True
|
||||||
|
export SLURM_TASKS_PER_NODE=32
|
||||||
|
|
||||||
|
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
|
||||||
|
IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
|
||||||
|
CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
|
||||||
|
|
||||||
|
# Don't do pthread on white
|
||||||
|
GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
|
||||||
|
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
|
||||||
|
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
if [ -z "$ARCH_FLAG" ]; then
|
||||||
|
ARCH_FLAG="--arch=Power8,Kepler37"
|
||||||
|
fi
|
||||||
|
NUM_JOBS_TO_RUN_IN_PARALLEL=2
|
||||||
|
|
||||||
|
elif [ "$MACHINE" = "bowman" ]; then
|
||||||
|
source /etc/profile.d/modules.sh
|
||||||
|
SKIP_HWLOC=True
|
||||||
|
export SLURM_TASKS_PER_NODE=32
|
||||||
|
|
||||||
|
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
|
||||||
|
|
||||||
|
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
||||||
|
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
|
||||||
|
if [ -z "$ARCH_FLAG" ]; then
|
||||||
|
ARCH_FLAG="--arch=KNL"
|
||||||
|
fi
|
||||||
|
NUM_JOBS_TO_RUN_IN_PARALLEL=2
|
||||||
|
|
||||||
|
elif [ "$MACHINE" = "shepard" ]; then
|
||||||
|
source /etc/profile.d/modules.sh
|
||||||
|
SKIP_HWLOC=True
|
||||||
|
export SLURM_TASKS_PER_NODE=32
|
||||||
|
|
||||||
|
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
|
||||||
|
|
||||||
|
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
|
||||||
|
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
|
||||||
|
if [ -z "$ARCH_FLAG" ]; then
|
||||||
|
ARCH_FLAG="--arch=HSW"
|
||||||
|
fi
|
||||||
|
NUM_JOBS_TO_RUN_IN_PARALLEL=2
|
||||||
|
|
||||||
|
elif [ "$MACHINE" = "apollo" ]; then
|
||||||
|
source /projects/sems/modulefiles/utils/sems-modules-init.sh
|
||||||
|
module use /home/projects/modulefiles/local/x86-64
|
||||||
|
module load kokkos-env
|
||||||
|
|
||||||
|
module load sems-git
|
||||||
|
module load sems-tex
|
||||||
|
module load sems-cmake/3.5.2
|
||||||
|
module load sems-gdb
|
||||||
|
|
||||||
|
SKIP_HWLOC=True
|
||||||
|
|
||||||
|
BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
|
||||||
|
CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
|
||||||
|
CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
|
||||||
|
|
||||||
|
CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
|
||||||
|
NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
|
||||||
|
|
||||||
|
BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
|
||||||
|
BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
|
||||||
|
BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
|
||||||
|
|
||||||
|
if [ "$SPOT_CHECK" = "True" ]; then
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
|
||||||
|
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
|
||||||
|
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
else
|
||||||
|
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||||
|
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
"clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
|
||||||
|
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||||
|
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||||
|
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||||
|
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||||
|
)
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "$ARCH_FLAG" ]; then
|
||||||
|
ARCH_FLAG="--arch=SNB,Kepler35"
|
||||||
|
fi
|
||||||
|
NUM_JOBS_TO_RUN_IN_PARALLEL=2
|
||||||
|
else
|
||||||
|
echo "Unhandled machine $MACHINE" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
export OMP_NUM_THREADS=4
|
||||||
|
|
||||||
|
declare -i NUM_RESULTS_TO_KEEP=7
|
||||||
|
|
||||||
|
RESULT_ROOT_PREFIX=TestAll
|
||||||
|
|
||||||
|
if [ "$PRINT_HELP" = "True" ]; then
|
||||||
echo "test_all_sandia <ARGS> <OPTIONS>:"
|
echo "test_all_sandia <ARGS> <OPTIONS>:"
|
||||||
echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
|
echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
|
||||||
echo " Defaults to root repo containing this script"
|
echo " Defaults to root repo containing this script"
|
||||||
@ -183,6 +300,9 @@ echo "--skip-hwloc: Do not do hwloc tests"
|
|||||||
echo "--num=N: Number of jobs to run in parallel "
|
echo "--num=N: Number of jobs to run in parallel "
|
||||||
echo "--dry-run: Just print what would be executed"
|
echo "--dry-run: Just print what would be executed"
|
||||||
echo "--build-only: Just do builds, don't run anything"
|
echo "--build-only: Just do builds, don't run anything"
|
||||||
|
echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
|
||||||
|
echo "--arch=ARCHITECTURE: overwrite architecture flags"
|
||||||
|
echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
|
||||||
echo "--build-list=BUILD,BUILD,BUILD..."
|
echo "--build-list=BUILD,BUILD,BUILD..."
|
||||||
echo " Provide a comma-separated list of builds instead of running all builds"
|
echo " Provide a comma-separated list of builds instead of running all builds"
|
||||||
echo " Valid items:"
|
echo " Valid items:"
|
||||||
@ -220,21 +340,6 @@ echo " hit ctrl-z"
|
|||||||
echo " % kill -9 %1"
|
echo " % kill -9 %1"
|
||||||
echo
|
echo
|
||||||
exit 0
|
exit 0
|
||||||
;;
|
|
||||||
*)
|
|
||||||
# args, just append
|
|
||||||
ARGS="$ARGS $1"
|
|
||||||
;;
|
|
||||||
esac
|
|
||||||
shift
|
|
||||||
done
|
|
||||||
|
|
||||||
# set kokkos path
|
|
||||||
if [ -z "$KOKKOS_PATH" ]; then
|
|
||||||
KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
|
|
||||||
else
|
|
||||||
# Ensure KOKKOS_PATH is abs path
|
|
||||||
KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
|
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# set build type
|
# set build type
|
||||||
@ -381,11 +486,15 @@ single_build_and_test() {
|
|||||||
local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
|
local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
if [[ "$OPT_FLAG" = "" ]]; then
|
||||||
|
OPT_FLAG="-O3"
|
||||||
|
fi
|
||||||
|
|
||||||
if [[ "$build_type" = *debug* ]]; then
|
if [[ "$build_type" = *debug* ]]; then
|
||||||
local extra_args="$extra_args --debug"
|
local extra_args="$extra_args --debug"
|
||||||
local cxxflags="-g $compiler_warning_flags"
|
local cxxflags="-g $compiler_warning_flags"
|
||||||
else
|
else
|
||||||
local cxxflags="-O3 $compiler_warning_flags"
|
local cxxflags="$OPT_FLAG $compiler_warning_flags"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [[ "$compiler" == cuda* ]]; then
|
if [[ "$compiler" == cuda* ]]; then
|
||||||
@ -393,7 +502,9 @@ single_build_and_test() {
|
|||||||
export TMPDIR=$(pwd)
|
export TMPDIR=$(pwd)
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# cxxflags="-DKOKKOS_USING_EXP_VIEW=1 $cxxflags"
|
if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
|
||||||
|
local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
|
||||||
|
fi
|
||||||
|
|
||||||
echo " Starting job $desc"
|
echo " Starting job $desc"
|
||||||
|
|
||||||
@ -440,13 +551,14 @@ run_in_background() {
|
|||||||
local compiler=$1
|
local compiler=$1
|
||||||
|
|
||||||
local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
|
local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
|
||||||
if [[ "$BUILD_ONLY" == True ]]; then
|
# don't override command line input
|
||||||
num_jobs=8
|
# if [[ "$BUILD_ONLY" == True ]]; then
|
||||||
else
|
# num_jobs=8
|
||||||
|
# else
|
||||||
if [[ "$compiler" == cuda* ]]; then
|
if [[ "$compiler" == cuda* ]]; then
|
||||||
num_jobs=1
|
num_jobs=1
|
||||||
fi
|
fi
|
||||||
fi
|
# fi
|
||||||
wait_for_jobs $num_jobs
|
wait_for_jobs $num_jobs
|
||||||
|
|
||||||
single_build_and_test $* &
|
single_build_and_test $* &
|
||||||
|
|||||||
50
lib/kokkos/config/trilinos-integration/prepare_trilinos_repos.sh
Executable file
50
lib/kokkos/config/trilinos-integration/prepare_trilinos_repos.sh
Executable file
@ -0,0 +1,50 @@
|
|||||||
|
#!/bin/bash -le
|
||||||
|
|
||||||
|
export TRILINOS_UPDATED_PATH=${PWD}/trilinos-update
|
||||||
|
export TRILINOS_PRISTINE_PATH=${PWD}/trilinos-pristine
|
||||||
|
|
||||||
|
#rm -rf ${KOKKOS_PATH}
|
||||||
|
#rm -rf ${TRILINOS_UPDATED_PATH}
|
||||||
|
#rm -rf ${TRILINOS_PRISTINE_PATH}
|
||||||
|
|
||||||
|
#Already done:
|
||||||
|
if [ ! -d "${TRILINOS_UPDATED_PATH}" ]; then
|
||||||
|
git clone https://github.com/trilinos/trilinos ${TRILINOS_UPDATED_PATH}
|
||||||
|
fi
|
||||||
|
if [ ! -d "${TRILINOS_PRISTINE_PATH}" ]; then
|
||||||
|
git clone https://github.com/trilinos/trilinos ${TRILINOS_PRISTINE_PATH}
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd ${TRILINOS_UPDATED_PATH}
|
||||||
|
git checkout develop
|
||||||
|
git reset --hard origin/develop
|
||||||
|
git pull
|
||||||
|
cd ..
|
||||||
|
|
||||||
|
python kokkos/config/snapshot.py ${KOKKOS_PATH} ${TRILINOS_UPDATED_PATH}/packages
|
||||||
|
|
||||||
|
cd ${TRILINOS_UPDATED_PATH}
|
||||||
|
echo ""
|
||||||
|
echo ""
|
||||||
|
echo "Trilinos State:"
|
||||||
|
git log --pretty=oneline --since=2.days
|
||||||
|
SHA=`git log --pretty=oneline --since=2.days | head -n 2 | tail -n 1 | awk '{print $1}'`
|
||||||
|
cd ..
|
||||||
|
|
||||||
|
cd ${TRILINOS_PRISTINE_PATH}
|
||||||
|
git status
|
||||||
|
git log --pretty=oneline --since=2.days
|
||||||
|
echo "Checkout develop"
|
||||||
|
git checkout develop
|
||||||
|
echo "Pull"
|
||||||
|
git pull
|
||||||
|
echo "Checkout SHA"
|
||||||
|
git checkout ${SHA}
|
||||||
|
cd ..
|
||||||
|
|
||||||
|
cd ${TRILINOS_PRISTINE_PATH}
|
||||||
|
echo ""
|
||||||
|
echo ""
|
||||||
|
echo "Trilinos Pristine State:"
|
||||||
|
git log --pretty=oneline --since=2.days
|
||||||
|
cd ..
|
||||||
@ -1,6 +1,6 @@
|
|||||||
|
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
|
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
||||||
|
|
||||||
SET(SOURCES
|
SET(SOURCES
|
||||||
|
|||||||
@ -7,21 +7,18 @@ vpath %.cpp ${KOKKOS_PATH}/containers/performance_tests
|
|||||||
default: build_all
|
default: build_all
|
||||||
echo "End Build"
|
echo "End Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3
|
||||||
|
LINK ?= $(CXX)
|
||||||
|
LDFLAGS ?= -lpthread
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|
||||||
CXX = $(NVCC_WRAPPER)
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK = $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
else
|
|
||||||
CXX ?= g++
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK ?= $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
endif
|
|
||||||
|
|
||||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
|
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
|
||||||
|
|
||||||
TEST_TARGETS =
|
TEST_TARGETS =
|
||||||
|
|||||||
@ -83,7 +83,7 @@ TEST_F( cuda, dynrankview_perf )
|
|||||||
{
|
{
|
||||||
std::cout << "Cuda" << std::endl;
|
std::cout << "Cuda" << std::endl;
|
||||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||||
test_dynrankview_op_perf<Kokkos::Cuda>( 4096 );
|
test_dynrankview_op_perf<Kokkos::Cuda>( 40960 );
|
||||||
}
|
}
|
||||||
|
|
||||||
TEST_F( cuda, global_2_local)
|
TEST_F( cuda, global_2_local)
|
||||||
|
|||||||
@ -180,8 +180,8 @@ void test_dynrankview_op_perf( const int par_size )
|
|||||||
|
|
||||||
typedef DeviceType execution_space;
|
typedef DeviceType execution_space;
|
||||||
typedef typename execution_space::size_type size_type;
|
typedef typename execution_space::size_type size_type;
|
||||||
const size_type dim2 = 900;
|
const size_type dim2 = 90;
|
||||||
const size_type dim3 = 300;
|
const size_type dim3 = 30;
|
||||||
|
|
||||||
double elapsed_time_view = 0;
|
double elapsed_time_view = 0;
|
||||||
double elapsed_time_compview = 0;
|
double elapsed_time_compview = 0;
|
||||||
|
|||||||
@ -261,9 +261,6 @@ public:
|
|||||||
modified_device (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_device")),
|
modified_device (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_device")),
|
||||||
modified_host (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_host"))
|
modified_host (View<unsigned int,LayoutLeft,typename t_host::execution_space> ("DualView::modified_host"))
|
||||||
{
|
{
|
||||||
#if ! KOKKOS_USING_EXP_VIEW
|
|
||||||
Impl::assert_shapes_are_equal (d_view.shape (), h_view.shape ());
|
|
||||||
#else
|
|
||||||
if ( int(d_view.rank) != int(h_view.rank) ||
|
if ( int(d_view.rank) != int(h_view.rank) ||
|
||||||
d_view.dimension_0() != h_view.dimension_0() ||
|
d_view.dimension_0() != h_view.dimension_0() ||
|
||||||
d_view.dimension_1() != h_view.dimension_1() ||
|
d_view.dimension_1() != h_view.dimension_1() ||
|
||||||
@ -284,7 +281,6 @@ public:
|
|||||||
d_view.span() != h_view.span() ) {
|
d_view.span() != h_view.span() ) {
|
||||||
Kokkos::Impl::throw_runtime_exception("DualView constructed with incompatible views");
|
Kokkos::Impl::throw_runtime_exception("DualView constructed with incompatible views");
|
||||||
}
|
}
|
||||||
#endif
|
|
||||||
}
|
}
|
||||||
|
|
||||||
//@}
|
//@}
|
||||||
@ -315,13 +311,13 @@ public:
|
|||||||
template< class Device >
|
template< class Device >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
const typename Impl::if_c<
|
const typename Impl::if_c<
|
||||||
Impl::is_same<typename t_dev::memory_space,
|
std::is_same<typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value,
|
typename Device::memory_space>::value,
|
||||||
t_dev,
|
t_dev,
|
||||||
t_host>::type& view () const
|
t_host>::type& view () const
|
||||||
{
|
{
|
||||||
return Impl::if_c<
|
return Impl::if_c<
|
||||||
Impl::is_same<
|
std::is_same<
|
||||||
typename t_dev::memory_space,
|
typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value,
|
typename Device::memory_space>::value,
|
||||||
t_dev,
|
t_dev,
|
||||||
@ -347,13 +343,13 @@ public:
|
|||||||
/// appropriate template parameter.
|
/// appropriate template parameter.
|
||||||
template<class Device>
|
template<class Device>
|
||||||
void sync( const typename Impl::enable_if<
|
void sync( const typename Impl::enable_if<
|
||||||
( Impl::is_same< typename traits::data_type , typename traits::non_const_data_type>::value) ||
|
( std::is_same< typename traits::data_type , typename traits::non_const_data_type>::value) ||
|
||||||
( Impl::is_same< Device , int>::value)
|
( std::is_same< Device , int>::value)
|
||||||
, int >::type& = 0)
|
, int >::type& = 0)
|
||||||
{
|
{
|
||||||
const unsigned int dev =
|
const unsigned int dev =
|
||||||
Impl::if_c<
|
Impl::if_c<
|
||||||
Impl::is_same<
|
std::is_same<
|
||||||
typename t_dev::memory_space,
|
typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value ,
|
typename Device::memory_space>::value ,
|
||||||
unsigned int,
|
unsigned int,
|
||||||
@ -370,7 +366,7 @@ public:
|
|||||||
modified_host() = modified_device() = 0;
|
modified_host() = modified_device() = 0;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if(Impl::is_same<typename t_host::memory_space,typename t_dev::memory_space>::value) {
|
if(std::is_same<typename t_host::memory_space,typename t_dev::memory_space>::value) {
|
||||||
t_dev::execution_space::fence();
|
t_dev::execution_space::fence();
|
||||||
t_host::execution_space::fence();
|
t_host::execution_space::fence();
|
||||||
}
|
}
|
||||||
@ -378,13 +374,13 @@ public:
|
|||||||
|
|
||||||
template<class Device>
|
template<class Device>
|
||||||
void sync ( const typename Impl::enable_if<
|
void sync ( const typename Impl::enable_if<
|
||||||
( ! Impl::is_same< typename traits::data_type , typename traits::non_const_data_type>::value ) ||
|
( ! std::is_same< typename traits::data_type , typename traits::non_const_data_type>::value ) ||
|
||||||
( Impl::is_same< Device , int>::value)
|
( std::is_same< Device , int>::value)
|
||||||
, int >::type& = 0 )
|
, int >::type& = 0 )
|
||||||
{
|
{
|
||||||
const unsigned int dev =
|
const unsigned int dev =
|
||||||
Impl::if_c<
|
Impl::if_c<
|
||||||
Impl::is_same<
|
std::is_same<
|
||||||
typename t_dev::memory_space,
|
typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value,
|
typename Device::memory_space>::value,
|
||||||
unsigned int,
|
unsigned int,
|
||||||
@ -405,7 +401,7 @@ public:
|
|||||||
{
|
{
|
||||||
const unsigned int dev =
|
const unsigned int dev =
|
||||||
Impl::if_c<
|
Impl::if_c<
|
||||||
Impl::is_same<
|
std::is_same<
|
||||||
typename t_dev::memory_space,
|
typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value ,
|
typename Device::memory_space>::value ,
|
||||||
unsigned int,
|
unsigned int,
|
||||||
@ -431,7 +427,7 @@ public:
|
|||||||
void modify () {
|
void modify () {
|
||||||
const unsigned int dev =
|
const unsigned int dev =
|
||||||
Impl::if_c<
|
Impl::if_c<
|
||||||
Impl::is_same<
|
std::is_same<
|
||||||
typename t_dev::memory_space,
|
typename t_dev::memory_space,
|
||||||
typename Device::memory_space>::value,
|
typename Device::memory_space>::value,
|
||||||
unsigned int,
|
unsigned int,
|
||||||
@ -514,11 +510,7 @@ public:
|
|||||||
|
|
||||||
//! The allocation size (same as Kokkos::View::capacity).
|
//! The allocation size (same as Kokkos::View::capacity).
|
||||||
size_t capacity() const {
|
size_t capacity() const {
|
||||||
#if KOKKOS_USING_EXP_VIEW
|
|
||||||
return d_view.span();
|
return d_view.span();
|
||||||
#else
|
|
||||||
return d_view.capacity();
|
|
||||||
#endif
|
|
||||||
}
|
}
|
||||||
|
|
||||||
//! Get stride(s) for each dimension.
|
//! Get stride(s) for each dimension.
|
||||||
@ -555,8 +547,6 @@ public:
|
|||||||
// Partial specializations of Kokkos::subview() for DualView objects.
|
// Partial specializations of Kokkos::subview() for DualView objects.
|
||||||
//
|
//
|
||||||
|
|
||||||
#if KOKKOS_USING_EXP_VIEW
|
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
@ -590,352 +580,6 @@ subview( const DualView<D,A1,A2,A3> & src , Args ... args )
|
|||||||
|
|
||||||
} /* namespace Kokkos */
|
} /* namespace Kokkos */
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//
|
|
||||||
// Partial specializations of Kokkos::subview() for DualView objects.
|
|
||||||
//
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
template< class SrcDataType , class SrcArg1Type , class SrcArg2Type , class SrcArg3Type
|
|
||||||
, class SubArg0_type , class SubArg1_type , class SubArg2_type , class SubArg3_type
|
|
||||||
, class SubArg4_type , class SubArg5_type , class SubArg6_type , class SubArg7_type
|
|
||||||
>
|
|
||||||
struct ViewSubview< DualView< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type >
|
|
||||||
, SubArg0_type , SubArg1_type , SubArg2_type , SubArg3_type
|
|
||||||
, SubArg4_type , SubArg5_type , SubArg6_type , SubArg7_type >
|
|
||||||
{
|
|
||||||
private:
|
|
||||||
|
|
||||||
typedef DualView< SrcDataType , SrcArg1Type , SrcArg2Type , SrcArg3Type > SrcViewType ;
|
|
||||||
|
|
||||||
enum { V0 = Impl::is_same< SubArg0_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V1 = Impl::is_same< SubArg1_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V2 = Impl::is_same< SubArg2_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V3 = Impl::is_same< SubArg3_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V4 = Impl::is_same< SubArg4_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V5 = Impl::is_same< SubArg5_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V6 = Impl::is_same< SubArg6_type , void >::value ? 1 : 0 };
|
|
||||||
enum { V7 = Impl::is_same< SubArg7_type , void >::value ? 1 : 0 };
|
|
||||||
|
|
||||||
// The source view rank must be equal to the input argument rank
|
|
||||||
// Once a void argument is encountered all subsequent arguments must be void.
|
|
||||||
enum { InputRank =
|
|
||||||
Impl::StaticAssert<( SrcViewType::rank ==
|
|
||||||
( V0 ? 0 : (
|
|
||||||
V1 ? 1 : (
|
|
||||||
V2 ? 2 : (
|
|
||||||
V3 ? 3 : (
|
|
||||||
V4 ? 4 : (
|
|
||||||
V5 ? 5 : (
|
|
||||||
V6 ? 6 : (
|
|
||||||
V7 ? 7 : 8 ))))))) ))
|
|
||||||
&&
|
|
||||||
( SrcViewType::rank ==
|
|
||||||
( 8 - ( V0 + V1 + V2 + V3 + V4 + V5 + V6 + V7 ) ) )
|
|
||||||
>::value ? SrcViewType::rank : 0 };
|
|
||||||
|
|
||||||
enum { R0 = Impl::ViewOffsetRange< SubArg0_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R1 = Impl::ViewOffsetRange< SubArg1_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R2 = Impl::ViewOffsetRange< SubArg2_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R3 = Impl::ViewOffsetRange< SubArg3_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R4 = Impl::ViewOffsetRange< SubArg4_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R5 = Impl::ViewOffsetRange< SubArg5_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R6 = Impl::ViewOffsetRange< SubArg6_type >::is_range ? 1 : 0 };
|
|
||||||
enum { R7 = Impl::ViewOffsetRange< SubArg7_type >::is_range ? 1 : 0 };
|
|
||||||
|
|
||||||
enum { OutputRank = unsigned(R0) + unsigned(R1) + unsigned(R2) + unsigned(R3)
|
|
||||||
+ unsigned(R4) + unsigned(R5) + unsigned(R6) + unsigned(R7) };
|
|
||||||
|
|
||||||
// Reverse
|
|
||||||
enum { R0_rev = 0 == InputRank ? 0u : (
|
|
||||||
1 == InputRank ? unsigned(R0) : (
|
|
||||||
2 == InputRank ? unsigned(R1) : (
|
|
||||||
3 == InputRank ? unsigned(R2) : (
|
|
||||||
4 == InputRank ? unsigned(R3) : (
|
|
||||||
5 == InputRank ? unsigned(R4) : (
|
|
||||||
6 == InputRank ? unsigned(R5) : (
|
|
||||||
7 == InputRank ? unsigned(R6) : unsigned(R7) ))))))) };
|
|
||||||
|
|
||||||
typedef typename SrcViewType::array_layout SrcViewLayout ;
|
|
||||||
|
|
||||||
// Choose array layout, attempting to preserve original layout if at all possible.
|
|
||||||
typedef typename Impl::if_c<
|
|
||||||
( // Same Layout IF
|
|
||||||
// OutputRank 0
|
|
||||||
( OutputRank == 0 )
|
|
||||||
||
|
|
||||||
// OutputRank 1 or 2, InputLayout Left, Interval 0
|
|
||||||
// because single stride one or second index has a stride.
|
|
||||||
( OutputRank <= 2 && R0 && Impl::is_same<SrcViewLayout,LayoutLeft>::value )
|
|
||||||
||
|
|
||||||
// OutputRank 1 or 2, InputLayout Right, Interval [InputRank-1]
|
|
||||||
// because single stride one or second index has a stride.
|
|
||||||
( OutputRank <= 2 && R0_rev && Impl::is_same<SrcViewLayout,LayoutRight>::value )
|
|
||||||
), SrcViewLayout , Kokkos::LayoutStride >::type OutputViewLayout ;
|
|
||||||
|
|
||||||
// Choose data type as a purely dynamic rank array to accomodate a runtime range.
|
|
||||||
typedef typename Impl::if_c< OutputRank == 0 , typename SrcViewType::value_type ,
|
|
||||||
typename Impl::if_c< OutputRank == 1 , typename SrcViewType::value_type *,
|
|
||||||
typename Impl::if_c< OutputRank == 2 , typename SrcViewType::value_type **,
|
|
||||||
typename Impl::if_c< OutputRank == 3 , typename SrcViewType::value_type ***,
|
|
||||||
typename Impl::if_c< OutputRank == 4 , typename SrcViewType::value_type ****,
|
|
||||||
typename Impl::if_c< OutputRank == 5 , typename SrcViewType::value_type *****,
|
|
||||||
typename Impl::if_c< OutputRank == 6 , typename SrcViewType::value_type ******,
|
|
||||||
typename Impl::if_c< OutputRank == 7 , typename SrcViewType::value_type *******,
|
|
||||||
typename SrcViewType::value_type ********
|
|
||||||
>::type >::type >::type >::type >::type >::type >::type >::type OutputData ;
|
|
||||||
|
|
||||||
// Choose space.
|
|
||||||
// If the source view's template arg1 or arg2 is a space then use it,
|
|
||||||
// otherwise use the source view's execution space.
|
|
||||||
|
|
||||||
typedef typename Impl::if_c< Impl::is_space< SrcArg1Type >::value , SrcArg1Type ,
|
|
||||||
typename Impl::if_c< Impl::is_space< SrcArg2Type >::value , SrcArg2Type , typename SrcViewType::execution_space
|
|
||||||
>::type >::type OutputSpace ;
|
|
||||||
|
|
||||||
public:
|
|
||||||
|
|
||||||
// If keeping the layout then match non-data type arguments
|
|
||||||
// else keep execution space and memory traits.
|
|
||||||
typedef typename
|
|
||||||
Impl::if_c< Impl::is_same< SrcViewLayout , OutputViewLayout >::value
|
|
||||||
, Kokkos::DualView< OutputData , SrcArg1Type , SrcArg2Type , SrcArg3Type >
|
|
||||||
, Kokkos::DualView< OutputData , OutputViewLayout , OutputSpace
|
|
||||||
, typename SrcViewType::memory_traits >
|
|
||||||
>::type type ;
|
|
||||||
};
|
|
||||||
|
|
||||||
} /* namespace Impl */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , void , void , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , void , void , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , void , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , void , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , void
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 ,
|
|
||||||
const ArgType3 & arg3 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, void , void , void , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
|
|
||||||
class ArgType4 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , void , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 ,
|
|
||||||
const ArgType3 & arg3 ,
|
|
||||||
const ArgType4 & arg4 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , void , void ,void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
|
|
||||||
class ArgType4 , class ArgType5 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , void , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 ,
|
|
||||||
const ArgType3 & arg3 ,
|
|
||||||
const ArgType4 & arg4 ,
|
|
||||||
const ArgType5 & arg5 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , void , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
|
|
||||||
class ArgType4 , class ArgType5 , class ArgType6 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , ArgType6 , void
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 ,
|
|
||||||
const ArgType3 & arg3 ,
|
|
||||||
const ArgType4 & arg4 ,
|
|
||||||
const ArgType5 & arg5 ,
|
|
||||||
const ArgType6 & arg6 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , ArgType6 , void
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class D , class A1 , class A2 , class A3 ,
|
|
||||||
class ArgType0 , class ArgType1 , class ArgType2 , class ArgType3 ,
|
|
||||||
class ArgType4 , class ArgType5 , class ArgType6 , class ArgType7 >
|
|
||||||
typename Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , ArgType6 , ArgType7
|
|
||||||
>::type
|
|
||||||
subview( const DualView<D,A1,A2,A3> & src ,
|
|
||||||
const ArgType0 & arg0 ,
|
|
||||||
const ArgType1 & arg1 ,
|
|
||||||
const ArgType2 & arg2 ,
|
|
||||||
const ArgType3 & arg3 ,
|
|
||||||
const ArgType4 & arg4 ,
|
|
||||||
const ArgType5 & arg5 ,
|
|
||||||
const ArgType6 & arg6 ,
|
|
||||||
const ArgType7 & arg7 )
|
|
||||||
{
|
|
||||||
typedef typename
|
|
||||||
Impl::ViewSubview< DualView<D,A1,A2,A3>
|
|
||||||
, ArgType0 , ArgType1 , ArgType2 , ArgType3
|
|
||||||
, ArgType4 , ArgType5 , ArgType6 , ArgType7
|
|
||||||
>::type
|
|
||||||
DstViewType ;
|
|
||||||
DstViewType sub_view;
|
|
||||||
sub_view.d_view = subview(src.d_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6,arg7);
|
|
||||||
sub_view.h_view = subview(src.h_view,arg0,arg1,arg2,arg3,arg4,arg5,arg6,arg7);
|
|
||||||
sub_view.modified_device = src.modified_device;
|
|
||||||
sub_view.modified_host = src.modified_host;
|
|
||||||
return sub_view;
|
|
||||||
}
|
|
||||||
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
#endif /* KOKKOS_USING_EXP_VIEW */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
@ -223,14 +223,85 @@ struct DynRankDimTraits {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
template < typename DynRankViewType , typename iType >
|
|
||||||
void verify_dynrankview_rank ( iType N , const DynRankViewType &drv )
|
/** \brief Debug bounds-checking routines */
|
||||||
{
|
// Enhanced debug checking - most infrastructure matches that of functions in
|
||||||
if ( static_cast<iType>(drv.rank()) > N )
|
// Kokkos_ViewMapping; additional checks for extra arguments beyond rank are 0
|
||||||
{
|
template< unsigned , typename iType0 , class MapType >
|
||||||
Kokkos::abort( "Need at least rank arguments to the operator()" );
|
KOKKOS_INLINE_FUNCTION
|
||||||
}
|
bool dyn_rank_view_verify_operator_bounds( const iType0 & , const MapType & )
|
||||||
|
{ return true ; }
|
||||||
|
|
||||||
|
template< unsigned R , typename iType0 , class MapType , typename iType1 , class ... Args >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
bool dyn_rank_view_verify_operator_bounds
|
||||||
|
( const iType0 & rank
|
||||||
|
, const MapType & map
|
||||||
|
, const iType1 & i
|
||||||
|
, Args ... args
|
||||||
|
)
|
||||||
|
{
|
||||||
|
if ( static_cast<iType0>(R) < rank ) {
|
||||||
|
return ( size_t(i) < map.extent(R) )
|
||||||
|
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||||
}
|
}
|
||||||
|
else if ( i != 0 ) {
|
||||||
|
printf("DynRankView Debug Bounds Checking Error: at rank %u\n Extra arguments beyond the rank must be zero \n",R);
|
||||||
|
return ( false )
|
||||||
|
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
return ( true )
|
||||||
|
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template< unsigned , class MapType >
|
||||||
|
inline
|
||||||
|
void dyn_rank_view_error_operator_bounds( char * , int , const MapType & )
|
||||||
|
{}
|
||||||
|
|
||||||
|
template< unsigned R , class MapType , class iType , class ... Args >
|
||||||
|
inline
|
||||||
|
void dyn_rank_view_error_operator_bounds
|
||||||
|
( char * buf
|
||||||
|
, int len
|
||||||
|
, const MapType & map
|
||||||
|
, const iType & i
|
||||||
|
, Args ... args
|
||||||
|
)
|
||||||
|
{
|
||||||
|
const int n =
|
||||||
|
snprintf(buf,len," %ld < %ld %c"
|
||||||
|
, static_cast<unsigned long>(i)
|
||||||
|
, static_cast<unsigned long>( map.extent(R) )
|
||||||
|
, ( sizeof...(Args) ? ',' : ')' )
|
||||||
|
);
|
||||||
|
dyn_rank_view_error_operator_bounds<R+1>(buf+n,len-n,map,args...);
|
||||||
|
}
|
||||||
|
|
||||||
|
// op_rank = rank of the operator version that was called
|
||||||
|
template< typename iType0 , typename iType1 , class MapType , class ... Args >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void dyn_rank_view_verify_operator_bounds
|
||||||
|
( const iType0 & op_rank , const iType1 & rank , const char* label , const MapType & map , Args ... args )
|
||||||
|
{
|
||||||
|
if ( static_cast<iType0>(rank) > op_rank ) {
|
||||||
|
Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
|
||||||
|
}
|
||||||
|
|
||||||
|
if ( ! dyn_rank_view_verify_operator_bounds<0>( rank , map , args ... ) ) {
|
||||||
|
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
|
||||||
|
enum { LEN = 1024 };
|
||||||
|
char buffer[ LEN ];
|
||||||
|
int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label);
|
||||||
|
dyn_rank_view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
|
||||||
|
Kokkos::Impl::throw_runtime_exception(std::string(buffer));
|
||||||
|
#else
|
||||||
|
Kokkos::abort("DynRankView bounds error");
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
/** \brief Assign compatible default mappings */
|
/** \brief Assign compatible default mappings */
|
||||||
@ -341,7 +412,6 @@ class DynRankView : public ViewTraits< DataType , Properties ... >
|
|||||||
|
|
||||||
private:
|
private:
|
||||||
template < class , class ... > friend class DynRankView ;
|
template < class , class ... > friend class DynRankView ;
|
||||||
// template < class , class ... > friend class Kokkos::Experimental::View ; //unnecessary now...
|
|
||||||
template < class , class ... > friend class Impl::ViewMapping ;
|
template < class , class ... > friend class Impl::ViewMapping ;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
@ -504,20 +574,26 @@ private:
|
|||||||
( is_layout_left || is_layout_right || is_layout_stride )
|
( is_layout_left || is_layout_right || is_layout_stride )
|
||||||
};
|
};
|
||||||
|
|
||||||
|
template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
|
||||||
|
{ KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
|
||||||
|
|
||||||
|
template< class Space > struct verify_space<Space,false>
|
||||||
|
{ KOKKOS_FORCEINLINE_FUNCTION static void check()
|
||||||
|
{ Kokkos::abort("Kokkos::DynRankView ERROR: attempt to access inaccessible memory space"); };
|
||||||
|
};
|
||||||
|
|
||||||
// Bounds checking macros
|
// Bounds checking macros
|
||||||
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
|
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
|
||||||
|
|
||||||
#define KOKKOS_VIEW_OPERATOR_VERIFY( N , ARG ) \
|
// rank of the calling operator - included as first argument in ARG
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
|
#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify(); \
|
DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check(); \
|
||||||
Kokkos::Experimental::Impl::verify_dynrankview_rank ( N , *this ) ; \
|
Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds ARG ;
|
||||||
Kokkos::Experimental::Impl::view_verify_operator_bounds ARG ;
|
|
||||||
|
|
||||||
#else
|
#else
|
||||||
|
|
||||||
#define KOKKOS_VIEW_OPERATOR_VERIFY( N , ARG ) \
|
#define KOKKOS_VIEW_OPERATOR_VERIFY( ARG ) \
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace \
|
DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace , typename traits::memory_space >::verify();
|
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
@ -532,7 +608,11 @@ public:
|
|||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
reference_type operator()() const
|
reference_type operator()() const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 0 , ( implementation_map() ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (0 , this->rank() , NULL , m_map) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (0 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map) )
|
||||||
|
#endif
|
||||||
return implementation_map().reference();
|
return implementation_map().reference();
|
||||||
//return m_map.reference(0,0,0,0,0,0,0);
|
//return m_map.reference(0,0,0,0,0,0,0);
|
||||||
}
|
}
|
||||||
@ -563,12 +643,17 @@ public:
|
|||||||
return rankone_view(i0);
|
return rankone_view(i0);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Rank 1 parenthesis
|
||||||
template< typename iType >
|
template< typename iType >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
||||||
operator()(const iType & i0 ) const
|
operator()(const iType & i0 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 1 , ( m_map , i0 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0);
|
return m_map.reference(i0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -577,6 +662,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
||||||
operator()(const iType & i0 ) const
|
operator()(const iType & i0 ) const
|
||||||
{
|
{
|
||||||
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,0,0,0,0,0,0);
|
return m_map.reference(i0,0,0,0,0,0,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -586,7 +676,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 2 , ( m_map , i0 , i1 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1);
|
return m_map.reference(i0,i1);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -595,7 +689,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 2 , ( m_map , i0 , i1 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,0,0,0,0,0);
|
return m_map.reference(i0,i1,0,0,0,0,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -605,7 +703,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 3 , ( m_map , i0 , i1 , i2 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2);
|
return m_map.reference(i0,i1,i2);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -614,7 +716,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 3 , ( m_map , i0 , i1 , i2 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,0,0,0,0);
|
return m_map.reference(i0,i1,i2,0,0,0,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -624,7 +730,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 4 , ( m_map , i0 , i1 , i2 , i3 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3);
|
return m_map.reference(i0,i1,i2,i3);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -633,7 +743,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 4 , ( m_map , i0 , i1 , i2 , i3 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,0,0,0);
|
return m_map.reference(i0,i1,i2,i3,0,0,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -643,7 +757,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 5 , ( m_map , i0 , i1 , i2 , i3 , i4 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,i4);
|
return m_map.reference(i0,i1,i2,i3,i4);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -652,7 +770,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 5 , ( m_map , i0 , i1 , i2 , i3 , i4 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,i4,0,0);
|
return m_map.reference(i0,i1,i2,i3,i4,0,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -662,7 +784,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value), reference_type>::type
|
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 6 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,i4,i5);
|
return m_map.reference(i0,i1,i2,i3,i4,i5);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -671,7 +797,11 @@ public:
|
|||||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 6 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
|
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -681,7 +811,11 @@ public:
|
|||||||
typename std::enable_if< (std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value && std::is_integral<iType6>::value), reference_type>::type
|
typename std::enable_if< (std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value && std::is_integral<iType6>::value), reference_type>::type
|
||||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
|
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
|
||||||
{
|
{
|
||||||
KOKKOS_VIEW_OPERATOR_VERIFY( 7 , ( m_map , i0 , i1 , i2 , i3 , i4 , i5 , i6 ) )
|
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (7 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5 , i6) )
|
||||||
|
#else
|
||||||
|
KOKKOS_VIEW_OPERATOR_VERIFY( (7 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6) )
|
||||||
|
#endif
|
||||||
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
|
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -1136,13 +1270,13 @@ private:
|
|||||||
|
|
||||||
public:
|
public:
|
||||||
|
|
||||||
typedef Kokkos::Experimental::ViewTraits
|
typedef Kokkos::ViewTraits
|
||||||
< data_type
|
< data_type
|
||||||
, array_layout
|
, array_layout
|
||||||
, typename SrcTraits::device_type
|
, typename SrcTraits::device_type
|
||||||
, typename SrcTraits::memory_traits > traits_type ;
|
, typename SrcTraits::memory_traits > traits_type ;
|
||||||
|
|
||||||
typedef Kokkos::Experimental::View
|
typedef Kokkos::View
|
||||||
< data_type
|
< data_type
|
||||||
, array_layout
|
, array_layout
|
||||||
, typename SrcTraits::device_type
|
, typename SrcTraits::device_type
|
||||||
@ -1154,13 +1288,13 @@ public:
|
|||||||
|
|
||||||
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
|
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
|
||||||
|
|
||||||
typedef Kokkos::Experimental::ViewTraits
|
typedef Kokkos::ViewTraits
|
||||||
< data_type
|
< data_type
|
||||||
, array_layout
|
, array_layout
|
||||||
, typename SrcTraits::device_type
|
, typename SrcTraits::device_type
|
||||||
, MemoryTraits > traits_type ;
|
, MemoryTraits > traits_type ;
|
||||||
|
|
||||||
typedef Kokkos::Experimental::View
|
typedef Kokkos::View
|
||||||
< data_type
|
< data_type
|
||||||
, array_layout
|
, array_layout
|
||||||
, typename SrcTraits::device_type
|
, typename SrcTraits::device_type
|
||||||
@ -1264,7 +1398,7 @@ subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args.
|
|||||||
if ( src.rank() > sizeof...(Args) ) //allow sizeof...(Args) >= src.rank(), ignore the remaining args
|
if ( src.rank() > sizeof...(Args) ) //allow sizeof...(Args) >= src.rank(), ignore the remaining args
|
||||||
{ Kokkos::abort("subdynrankview: num of args must be >= rank of the source DynRankView"); }
|
{ Kokkos::abort("subdynrankview: num of args must be >= rank of the source DynRankView"); }
|
||||||
|
|
||||||
typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::Experimental::ViewTraits< D*******, P... > , Args... > metafcn ;
|
typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::ViewTraits< D*******, P... > , Args... > metafcn ;
|
||||||
|
|
||||||
return metafcn::subview( src.rank() , src , args... );
|
return metafcn::subview( src.rank() , src , args... );
|
||||||
}
|
}
|
||||||
@ -1502,10 +1636,10 @@ void deep_copy
|
|||||||
typedef typename src_type::memory_space src_memory_space ;
|
typedef typename src_type::memory_space src_memory_space ;
|
||||||
|
|
||||||
enum { DstExecCanAccessSrc =
|
enum { DstExecCanAccessSrc =
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
|
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
|
||||||
|
|
||||||
enum { SrcExecCanAccessDst =
|
enum { SrcExecCanAccessDst =
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename src_execution_space::memory_space , dst_memory_space >::value };
|
Kokkos::Impl::SpaceAccessibility< src_execution_space , dst_memory_space >::accessible };
|
||||||
|
|
||||||
if ( (void *) dst.data() != (void*) src.data() ) {
|
if ( (void *) dst.data() != (void*) src.data() ) {
|
||||||
|
|
||||||
@ -1666,7 +1800,7 @@ inline
|
|||||||
typename DynRankView<T,P...>::HostMirror
|
typename DynRankView<T,P...>::HostMirror
|
||||||
create_mirror( const DynRankView<T,P...> & src
|
create_mirror( const DynRankView<T,P...> & src
|
||||||
, typename std::enable_if<
|
, typename std::enable_if<
|
||||||
! std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
|
! std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
|
||||||
, Kokkos::LayoutStride >::value
|
, Kokkos::LayoutStride >::value
|
||||||
>::type * = 0
|
>::type * = 0
|
||||||
)
|
)
|
||||||
@ -1684,7 +1818,7 @@ inline
|
|||||||
typename DynRankView<T,P...>::HostMirror
|
typename DynRankView<T,P...>::HostMirror
|
||||||
create_mirror( const DynRankView<T,P...> & src
|
create_mirror( const DynRankView<T,P...> & src
|
||||||
, typename std::enable_if<
|
, typename std::enable_if<
|
||||||
std::is_same< typename Kokkos::Experimental::ViewTraits<T,P...>::array_layout
|
std::is_same< typename Kokkos::ViewTraits<T,P...>::array_layout
|
||||||
, Kokkos::LayoutStride >::value
|
, Kokkos::LayoutStride >::value
|
||||||
>::type * = 0
|
>::type * = 0
|
||||||
)
|
)
|
||||||
@ -1779,7 +1913,7 @@ void resize( DynRankView<T,P...> & v ,
|
|||||||
{
|
{
|
||||||
typedef DynRankView<T,P...> drview_type ;
|
typedef DynRankView<T,P...> drview_type ;
|
||||||
|
|
||||||
static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
|
static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only resize managed views" );
|
||||||
|
|
||||||
drview_type v_resized( v.label(), n0, n1, n2, n3, n4, n5, n6 );
|
drview_type v_resized( v.label(), n0, n1, n2, n3, n4, n5, n6 );
|
||||||
|
|
||||||
@ -1803,7 +1937,7 @@ void realloc( DynRankView<T,P...> & v ,
|
|||||||
{
|
{
|
||||||
typedef DynRankView<T,P...> drview_type ;
|
typedef DynRankView<T,P...> drview_type ;
|
||||||
|
|
||||||
static_assert( Kokkos::Experimental::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
|
static_assert( Kokkos::ViewTraits<T,P...>::is_managed , "Can only realloc managed views" );
|
||||||
|
|
||||||
const std::string label = v.label();
|
const std::string label = v.label();
|
||||||
|
|
||||||
|
|||||||
@ -56,7 +56,7 @@ namespace Experimental {
|
|||||||
* Subviews are not allowed.
|
* Subviews are not allowed.
|
||||||
*/
|
*/
|
||||||
template< typename DataType , typename ... P >
|
template< typename DataType , typename ... P >
|
||||||
class DynamicView : public Kokkos::Experimental::ViewTraits< DataType , P ... >
|
class DynamicView : public Kokkos::ViewTraits< DataType , P ... >
|
||||||
{
|
{
|
||||||
public:
|
public:
|
||||||
|
|
||||||
@ -75,6 +75,15 @@ private:
|
|||||||
std::is_same< typename traits::specialize , void >::value
|
std::is_same< typename traits::specialize , void >::value
|
||||||
, "DynamicView must have trivial data type" );
|
, "DynamicView must have trivial data type" );
|
||||||
|
|
||||||
|
|
||||||
|
template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
|
||||||
|
{ KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
|
||||||
|
|
||||||
|
template< class Space > struct verify_space<Space,false>
|
||||||
|
{ KOKKOS_FORCEINLINE_FUNCTION static void check()
|
||||||
|
{ Kokkos::abort("Kokkos::DynamicView ERROR: attempt to access inaccessible memory space"); };
|
||||||
|
};
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
|
||||||
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
|
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
|
||||||
@ -117,10 +126,10 @@ public:
|
|||||||
KOKKOS_INLINE_FUNCTION constexpr size_t size() const
|
KOKKOS_INLINE_FUNCTION constexpr size_t size() const
|
||||||
{
|
{
|
||||||
return
|
return
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
|
Kokkos::Impl::MemorySpaceAccess
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace
|
< Kokkos::Impl::ActiveExecutionMemorySpace
|
||||||
, typename traits::memory_space
|
, typename traits::memory_space
|
||||||
>::value
|
>::accessible
|
||||||
? // Runtime size is at the end of the chunk pointer array
|
? // Runtime size is at the end of the chunk pointer array
|
||||||
(*reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max ))
|
(*reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max ))
|
||||||
<< m_chunk_shift
|
<< m_chunk_shift
|
||||||
@ -179,10 +188,7 @@ public:
|
|||||||
static_assert( Kokkos::Impl::are_integral<I0,Args...>::value
|
static_assert( Kokkos::Impl::are_integral<I0,Args...>::value
|
||||||
, "Indices must be integral type" );
|
, "Indices must be integral type" );
|
||||||
|
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
|
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace
|
|
||||||
, typename traits::memory_space
|
|
||||||
>::verify();
|
|
||||||
|
|
||||||
// Which chunk is being indexed.
|
// Which chunk is being indexed.
|
||||||
const uintptr_t ic = uintptr_t( i0 >> m_chunk_shift );
|
const uintptr_t ic = uintptr_t( i0 >> m_chunk_shift );
|
||||||
@ -223,15 +229,13 @@ public:
|
|||||||
{
|
{
|
||||||
typedef typename traits::value_type value_type ;
|
typedef typename traits::value_type value_type ;
|
||||||
|
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
|
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace
|
|
||||||
, typename traits::memory_space >::verify();
|
|
||||||
|
|
||||||
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
|
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
|
||||||
|
|
||||||
if ( m_chunk_max < NC ) {
|
if ( m_chunk_max < NC ) {
|
||||||
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
|
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
|
||||||
printf("DynamicView::resize_parallel(%lu) m_chunk_max(%lu) NC(%lu)\n"
|
printf("DynamicView::resize_parallel(%lu) m_chunk_max(%u) NC(%lu)\n"
|
||||||
, n , m_chunk_max , NC );
|
, n , m_chunk_max , NC );
|
||||||
#endif
|
#endif
|
||||||
Kokkos::abort("DynamicView::resize_parallel exceeded maximum size");
|
Kokkos::abort("DynamicView::resize_parallel exceeded maximum size");
|
||||||
@ -269,9 +273,7 @@ public:
|
|||||||
inline
|
inline
|
||||||
void resize_serial( size_t n )
|
void resize_serial( size_t n )
|
||||||
{
|
{
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
|
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace
|
|
||||||
, typename traits::memory_space >::verify();
|
|
||||||
|
|
||||||
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
|
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
|
||||||
|
|
||||||
@ -398,9 +400,7 @@ public:
|
|||||||
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
|
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
|
||||||
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
|
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
|
||||||
{
|
{
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace
|
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
|
||||||
< Kokkos::Impl::ActiveExecutionMemorySpace
|
|
||||||
, typename traits::memory_space >::verify();
|
|
||||||
|
|
||||||
// A functor to deallocate all of the chunks upon final destruction
|
// A functor to deallocate all of the chunks upon final destruction
|
||||||
|
|
||||||
@ -452,7 +452,7 @@ void deep_copy( const View<T,DP...> & dst
|
|||||||
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
|
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
|
||||||
|
|
||||||
enum { DstExecCanAccessSrc =
|
enum { DstExecCanAccessSrc =
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
|
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
|
||||||
|
|
||||||
if ( DstExecCanAccessSrc ) {
|
if ( DstExecCanAccessSrc ) {
|
||||||
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
|
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
|
||||||
@ -476,7 +476,7 @@ void deep_copy( const DynamicView<T,DP...> & dst
|
|||||||
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
|
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
|
||||||
|
|
||||||
enum { DstExecCanAccessSrc =
|
enum { DstExecCanAccessSrc =
|
||||||
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< typename dst_execution_space::memory_space , src_memory_space >::value };
|
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
|
||||||
|
|
||||||
if ( DstExecCanAccessSrc ) {
|
if ( DstExecCanAccessSrc ) {
|
||||||
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
|
// Copying data between views in accessible memory spaces and either non-contiguous or incompatible shape.
|
||||||
|
|||||||
196
lib/kokkos/containers/src/Kokkos_ErrorReporter.hpp
Normal file
196
lib/kokkos/containers/src/Kokkos_ErrorReporter.hpp
Normal file
@ -0,0 +1,196 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_EXPERIMENTAL_ERROR_REPORTER_HPP
|
||||||
|
#define KOKKOS_EXPERIMENTAL_ERROR_REPORTER_HPP
|
||||||
|
|
||||||
|
#include <vector>
|
||||||
|
#include <Kokkos_Core.hpp>
|
||||||
|
#include <Kokkos_View.hpp>
|
||||||
|
#include <Kokkos_DualView.hpp>
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Experimental {
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
class ErrorReporter
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
|
||||||
|
typedef ReportType report_type;
|
||||||
|
typedef DeviceType device_type;
|
||||||
|
typedef typename device_type::execution_space execution_space;
|
||||||
|
|
||||||
|
ErrorReporter(int max_results)
|
||||||
|
: m_numReportsAttempted(""),
|
||||||
|
m_reports("", max_results),
|
||||||
|
m_reporters("", max_results)
|
||||||
|
{
|
||||||
|
clear();
|
||||||
|
}
|
||||||
|
|
||||||
|
int getCapacity() const { return m_reports.h_view.dimension_0(); }
|
||||||
|
|
||||||
|
int getNumReports();
|
||||||
|
|
||||||
|
int getNumReportAttempts();
|
||||||
|
|
||||||
|
void getReports(std::vector<int> &reporters_out, std::vector<report_type> &reports_out);
|
||||||
|
void getReports( typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror &reporters_out,
|
||||||
|
typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror &reports_out);
|
||||||
|
|
||||||
|
void clear();
|
||||||
|
|
||||||
|
void resize(const size_t new_size);
|
||||||
|
|
||||||
|
bool full() {return (getNumReportAttempts() >= getCapacity()); }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
bool add_report(int reporter_id, report_type report) const
|
||||||
|
{
|
||||||
|
int idx = Kokkos::atomic_fetch_add(&m_numReportsAttempted(), 1);
|
||||||
|
|
||||||
|
if (idx >= 0 && (idx < static_cast<int>(m_reports.d_view.dimension_0()))) {
|
||||||
|
m_reporters.d_view(idx) = reporter_id;
|
||||||
|
m_reports.d_view(idx) = report;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
|
||||||
|
typedef Kokkos::View<report_type *, execution_space> reports_view_t;
|
||||||
|
typedef Kokkos::DualView<report_type *, execution_space> reports_dualview_t;
|
||||||
|
|
||||||
|
typedef typename reports_dualview_t::host_mirror_space host_mirror_space;
|
||||||
|
Kokkos::View<int, execution_space> m_numReportsAttempted;
|
||||||
|
reports_dualview_t m_reports;
|
||||||
|
Kokkos::DualView<int *, execution_space> m_reporters;
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
|
||||||
|
{
|
||||||
|
int num_reports = 0;
|
||||||
|
Kokkos::deep_copy(num_reports,m_numReportsAttempted);
|
||||||
|
if (num_reports > static_cast<int>(m_reports.h_view.dimension_0())) {
|
||||||
|
num_reports = m_reports.h_view.dimension_0();
|
||||||
|
}
|
||||||
|
return num_reports;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
inline int ErrorReporter<ReportType, DeviceType>::getNumReportAttempts()
|
||||||
|
{
|
||||||
|
int num_reports = 0;
|
||||||
|
Kokkos::deep_copy(num_reports,m_numReportsAttempted);
|
||||||
|
return num_reports;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
void ErrorReporter<ReportType, DeviceType>::getReports(std::vector<int> &reporters_out, std::vector<report_type> &reports_out)
|
||||||
|
{
|
||||||
|
int num_reports = getNumReports();
|
||||||
|
reporters_out.clear();
|
||||||
|
reporters_out.reserve(num_reports);
|
||||||
|
reports_out.clear();
|
||||||
|
reports_out.reserve(num_reports);
|
||||||
|
|
||||||
|
if (num_reports > 0) {
|
||||||
|
m_reports.template sync<host_mirror_space>();
|
||||||
|
m_reporters.template sync<host_mirror_space>();
|
||||||
|
|
||||||
|
for (int i = 0; i < num_reports; ++i) {
|
||||||
|
reporters_out.push_back(m_reporters.h_view(i));
|
||||||
|
reports_out.push_back(m_reports.h_view(i));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
void ErrorReporter<ReportType, DeviceType>::getReports(
|
||||||
|
typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror &reporters_out,
|
||||||
|
typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror &reports_out)
|
||||||
|
{
|
||||||
|
int num_reports = getNumReports();
|
||||||
|
reporters_out = typename Kokkos::View<int*, typename DeviceType::execution_space >::HostMirror("ErrorReport::reporters_out",num_reports);
|
||||||
|
reports_out = typename Kokkos::View<report_type*, typename DeviceType::execution_space >::HostMirror("ErrorReport::reports_out",num_reports);
|
||||||
|
|
||||||
|
if (num_reports > 0) {
|
||||||
|
m_reports.template sync<host_mirror_space>();
|
||||||
|
m_reporters.template sync<host_mirror_space>();
|
||||||
|
|
||||||
|
for (int i = 0; i < num_reports; ++i) {
|
||||||
|
reporters_out(i) = m_reporters.h_view(i);
|
||||||
|
reports_out(i) = m_reports.h_view(i);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
void ErrorReporter<ReportType, DeviceType>::clear()
|
||||||
|
{
|
||||||
|
int num_reports=0;
|
||||||
|
Kokkos::deep_copy(m_numReportsAttempted, num_reports);
|
||||||
|
m_reports.template modify<execution_space>();
|
||||||
|
m_reporters.template modify<execution_space>();
|
||||||
|
}
|
||||||
|
|
||||||
|
template <typename ReportType, typename DeviceType>
|
||||||
|
void ErrorReporter<ReportType, DeviceType>::resize(const size_t new_size)
|
||||||
|
{
|
||||||
|
m_reports.resize(new_size);
|
||||||
|
m_reporters.resize(new_size);
|
||||||
|
Kokkos::fence();
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
} // namespace Experimental
|
||||||
|
} // namespace kokkos
|
||||||
|
|
||||||
|
#endif
|
||||||
@ -1,531 +0,0 @@
|
|||||||
/*
|
|
||||||
//@HEADER
|
|
||||||
// ************************************************************************
|
|
||||||
//
|
|
||||||
// Kokkos v. 2.0
|
|
||||||
// Copyright (2014) Sandia Corporation
|
|
||||||
//
|
|
||||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
|
||||||
// the U.S. Government retains certain rights in this software.
|
|
||||||
//
|
|
||||||
// Redistribution and use in source and binary forms, with or without
|
|
||||||
// modification, are permitted provided that the following conditions are
|
|
||||||
// met:
|
|
||||||
//
|
|
||||||
// 1. Redistributions of source code must retain the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer.
|
|
||||||
//
|
|
||||||
// 2. Redistributions in binary form must reproduce the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer in the
|
|
||||||
// documentation and/or other materials provided with the distribution.
|
|
||||||
//
|
|
||||||
// 3. Neither the name of the Corporation nor the names of the
|
|
||||||
// contributors may be used to endorse or promote products derived from
|
|
||||||
// this software without specific prior written permission.
|
|
||||||
//
|
|
||||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
|
||||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
||||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
||||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
|
||||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
||||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
||||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
||||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
||||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
||||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
||||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
//
|
|
||||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
|
||||||
//
|
|
||||||
// ************************************************************************
|
|
||||||
//@HEADER
|
|
||||||
*/
|
|
||||||
|
|
||||||
#ifndef KOKKOS_SEGMENTED_VIEW_HPP_
|
|
||||||
#define KOKKOS_SEGMENTED_VIEW_HPP_
|
|
||||||
|
|
||||||
#include <Kokkos_Core.hpp>
|
|
||||||
#include <impl/Kokkos_Error.hpp>
|
|
||||||
#include <cstdio>
|
|
||||||
|
|
||||||
#if ! KOKKOS_USING_EXP_VIEW
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
template<class DataType, class Arg1Type, class Arg2Type, class Arg3Type>
|
|
||||||
struct delete_segmented_view;
|
|
||||||
|
|
||||||
template<class MemorySpace>
|
|
||||||
inline
|
|
||||||
void DeviceSetAllocatableMemorySize(size_t) {}
|
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_CUDA )
|
|
||||||
|
|
||||||
template<>
|
|
||||||
inline
|
|
||||||
void DeviceSetAllocatableMemorySize<Kokkos::CudaSpace>(size_t size) {
|
|
||||||
#ifdef __CUDACC__
|
|
||||||
size_t size_limit;
|
|
||||||
cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
|
|
||||||
if(size_limit<size)
|
|
||||||
cudaDeviceSetLimit(cudaLimitMallocHeapSize,2*size);
|
|
||||||
cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
|
|
||||||
template<>
|
|
||||||
inline
|
|
||||||
void DeviceSetAllocatableMemorySize<Kokkos::CudaUVMSpace>(size_t size) {
|
|
||||||
#ifdef __CUDACC__
|
|
||||||
size_t size_limit;
|
|
||||||
cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
|
|
||||||
if(size_limit<size)
|
|
||||||
cudaDeviceSetLimit(cudaLimitMallocHeapSize,2*size);
|
|
||||||
cudaDeviceGetLimit(&size_limit,cudaLimitMallocHeapSize);
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DataType ,
|
|
||||||
class Arg1Type = void ,
|
|
||||||
class Arg2Type = void ,
|
|
||||||
class Arg3Type = void>
|
|
||||||
class SegmentedView : public Kokkos::ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type >
|
|
||||||
{
|
|
||||||
public:
|
|
||||||
//! \name Typedefs for device types and various Kokkos::View specializations.
|
|
||||||
//@{
|
|
||||||
typedef Kokkos::ViewTraits< DataType , Arg1Type , Arg2Type, Arg3Type > traits ;
|
|
||||||
|
|
||||||
//! The type of a Kokkos::View on the device.
|
|
||||||
typedef Kokkos::View< typename traits::data_type ,
|
|
||||||
typename traits::array_layout ,
|
|
||||||
typename traits::memory_space ,
|
|
||||||
Kokkos::MemoryUnmanaged > t_dev ;
|
|
||||||
|
|
||||||
|
|
||||||
private:
|
|
||||||
Kokkos::View<t_dev*,typename traits::memory_space> segments_;
|
|
||||||
|
|
||||||
Kokkos::View<int,typename traits::memory_space> realloc_lock;
|
|
||||||
Kokkos::View<int,typename traits::memory_space> nsegments_;
|
|
||||||
|
|
||||||
size_t segment_length_;
|
|
||||||
size_t segment_length_m1_;
|
|
||||||
int max_segments_;
|
|
||||||
|
|
||||||
int segment_length_log2;
|
|
||||||
|
|
||||||
// Dimensions, cardinality, capacity, and offset computation for
|
|
||||||
// multidimensional array view of contiguous memory.
|
|
||||||
// Inherits from Impl::Shape
|
|
||||||
typedef Kokkos::Impl::ViewOffset< typename traits::shape_type
|
|
||||||
, typename traits::array_layout
|
|
||||||
> offset_map_type ;
|
|
||||||
|
|
||||||
offset_map_type m_offset_map ;
|
|
||||||
|
|
||||||
typedef Kokkos::View< typename traits::array_intrinsic_type ,
|
|
||||||
typename traits::array_layout ,
|
|
||||||
typename traits::memory_space ,
|
|
||||||
typename traits::memory_traits > array_type ;
|
|
||||||
|
|
||||||
typedef Kokkos::View< typename traits::const_data_type ,
|
|
||||||
typename traits::array_layout ,
|
|
||||||
typename traits::memory_space ,
|
|
||||||
typename traits::memory_traits > const_type ;
|
|
||||||
|
|
||||||
typedef Kokkos::View< typename traits::non_const_data_type ,
|
|
||||||
typename traits::array_layout ,
|
|
||||||
typename traits::memory_space ,
|
|
||||||
typename traits::memory_traits > non_const_type ;
|
|
||||||
|
|
||||||
typedef Kokkos::View< typename traits::non_const_data_type ,
|
|
||||||
typename traits::array_layout ,
|
|
||||||
HostSpace ,
|
|
||||||
void > HostMirror ;
|
|
||||||
|
|
||||||
template< bool Accessible >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename Kokkos::Impl::enable_if< Accessible , typename traits::size_type >::type
|
|
||||||
dimension_0_intern() const { return nsegments_() * segment_length_ ; }
|
|
||||||
|
|
||||||
template< bool Accessible >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename Kokkos::Impl::enable_if< ! Accessible , typename traits::size_type >::type
|
|
||||||
dimension_0_intern() const
|
|
||||||
{
|
|
||||||
// In Host space
|
|
||||||
int n = 0 ;
|
|
||||||
#if ! defined( __CUDA_ARCH__ )
|
|
||||||
Kokkos::Impl::DeepCopy< HostSpace , typename traits::memory_space >( & n , nsegments_.ptr_on_device() , sizeof(int) );
|
|
||||||
#endif
|
|
||||||
|
|
||||||
return n * segment_length_ ;
|
|
||||||
}
|
|
||||||
|
|
||||||
public:
|
|
||||||
|
|
||||||
enum { Rank = traits::rank };
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION offset_map_type shape() const { return m_offset_map ; }
|
|
||||||
|
|
||||||
/* \brief return (current) size of dimension 0 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_0() const {
|
|
||||||
enum { Accessible = Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
|
|
||||||
Kokkos::Impl::ActiveExecutionMemorySpace, typename traits::memory_space >::value };
|
|
||||||
int n = SegmentedView::dimension_0_intern< Accessible >();
|
|
||||||
return n ;
|
|
||||||
}
|
|
||||||
|
|
||||||
/* \brief return size of dimension 1 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_1() const { return m_offset_map.N1 ; }
|
|
||||||
/* \brief return size of dimension 2 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_2() const { return m_offset_map.N2 ; }
|
|
||||||
/* \brief return size of dimension 3 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_3() const { return m_offset_map.N3 ; }
|
|
||||||
/* \brief return size of dimension 4 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_4() const { return m_offset_map.N4 ; }
|
|
||||||
/* \brief return size of dimension 5 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_5() const { return m_offset_map.N5 ; }
|
|
||||||
/* \brief return size of dimension 6 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_6() const { return m_offset_map.N6 ; }
|
|
||||||
/* \brief return size of dimension 7 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type dimension_7() const { return m_offset_map.N7 ; }
|
|
||||||
|
|
||||||
/* \brief return size of dimension 2 */
|
|
||||||
KOKKOS_INLINE_FUNCTION typename traits::size_type size() const {
|
|
||||||
return dimension_0() *
|
|
||||||
m_offset_map.N1 * m_offset_map.N2 * m_offset_map.N3 * m_offset_map.N4 *
|
|
||||||
m_offset_map.N5 * m_offset_map.N6 * m_offset_map.N7 ;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename traits::size_type dimension( const iType & i ) const {
|
|
||||||
if(i==0)
|
|
||||||
return dimension_0();
|
|
||||||
else
|
|
||||||
return Kokkos::Impl::dimension( m_offset_map , i );
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename traits::size_type capacity() {
|
|
||||||
return segments_.dimension_0() *
|
|
||||||
m_offset_map.N1 * m_offset_map.N2 * m_offset_map.N3 * m_offset_map.N4 *
|
|
||||||
m_offset_map.N5 * m_offset_map.N6 * m_offset_map.N7;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename traits::size_type get_num_segments() {
|
|
||||||
enum { Accessible = Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
|
|
||||||
Kokkos::Impl::ActiveExecutionMemorySpace, typename traits::memory_space >::value };
|
|
||||||
int n = SegmentedView::dimension_0_intern< Accessible >();
|
|
||||||
return n/segment_length_ ;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
typename traits::size_type get_max_segments() {
|
|
||||||
return max_segments_;
|
|
||||||
}
|
|
||||||
|
|
||||||
/// \brief Constructor that allocates View objects with an initial length of 0.
|
|
||||||
///
|
|
||||||
/// This constructor works mostly like the analogous constructor of View.
|
|
||||||
/// The first argument is a string label, which is entirely for your
|
|
||||||
/// benefit. (Different SegmentedView objects may have the same label if
|
|
||||||
/// you like.) The second argument 'view_length' is the size of the segments.
|
|
||||||
/// This number must be a power of two. The third argument n0 is the maximum
|
|
||||||
/// value for the first dimension of the segmented view. The maximal allocatable
|
|
||||||
/// number of Segments is thus: (n0+view_length-1)/view_length.
|
|
||||||
/// The arguments that follow are the other dimensions of the (1-7) of the
|
|
||||||
/// View objects. For example, for a View with 3 runtime dimensions,
|
|
||||||
/// the first 4 integer arguments will be nonzero:
|
|
||||||
/// SegmentedView("Name",32768,10000000,8,4). This allocates a SegmentedView
|
|
||||||
/// with a maximum of 306 segments of dimension (32768,8,4). The logical size of
|
|
||||||
/// the segmented view is (n,8,4) with n between 0 and 10000000.
|
|
||||||
/// You may omit the integer arguments that follow.
|
|
||||||
template< class LabelType >
|
|
||||||
SegmentedView(const LabelType & label ,
|
|
||||||
const size_t view_length ,
|
|
||||||
const size_t n0 ,
|
|
||||||
const size_t n1 = 0 ,
|
|
||||||
const size_t n2 = 0 ,
|
|
||||||
const size_t n3 = 0 ,
|
|
||||||
const size_t n4 = 0 ,
|
|
||||||
const size_t n5 = 0 ,
|
|
||||||
const size_t n6 = 0 ,
|
|
||||||
const size_t n7 = 0
|
|
||||||
): segment_length_(view_length),segment_length_m1_(view_length-1)
|
|
||||||
{
|
|
||||||
segment_length_log2 = -1;
|
|
||||||
size_t l = segment_length_;
|
|
||||||
while(l>0) {
|
|
||||||
l>>=1;
|
|
||||||
segment_length_log2++;
|
|
||||||
}
|
|
||||||
l = 1<<segment_length_log2;
|
|
||||||
if(l!=segment_length_)
|
|
||||||
Kokkos::Impl::throw_runtime_exception("Kokkos::SegmentedView requires a 'power of 2' segment length");
|
|
||||||
|
|
||||||
max_segments_ = (n0+segment_length_m1_)/segment_length_;
|
|
||||||
|
|
||||||
Impl::DeviceSetAllocatableMemorySize<typename traits::memory_space>(segment_length_*max_segments_*sizeof(typename traits::value_type));
|
|
||||||
|
|
||||||
segments_ = Kokkos::View<t_dev*,typename traits::execution_space>(label , max_segments_);
|
|
||||||
realloc_lock = Kokkos::View<int,typename traits::execution_space>("Lock");
|
|
||||||
nsegments_ = Kokkos::View<int,typename traits::execution_space>("nviews");
|
|
||||||
m_offset_map.assign( n0, n1, n2, n3, n4, n5, n6, n7, n0*n1*n2*n3*n4*n5*n6*n7 );
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
SegmentedView(const SegmentedView& src):
|
|
||||||
segments_(src.segments_),
|
|
||||||
realloc_lock (src.realloc_lock),
|
|
||||||
nsegments_ (src.nsegments_),
|
|
||||||
segment_length_(src.segment_length_),
|
|
||||||
segment_length_m1_(src.segment_length_m1_),
|
|
||||||
max_segments_ (src.max_segments_),
|
|
||||||
segment_length_log2(src.segment_length_log2),
|
|
||||||
m_offset_map (src.m_offset_map)
|
|
||||||
{}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
SegmentedView& operator= (const SegmentedView& src) {
|
|
||||||
segments_ = src.segments_;
|
|
||||||
realloc_lock = src.realloc_lock;
|
|
||||||
nsegments_ = src.nsegments_;
|
|
||||||
segment_length_= src.segment_length_;
|
|
||||||
segment_length_m1_= src.segment_length_m1_;
|
|
||||||
max_segments_ = src.max_segments_;
|
|
||||||
segment_length_log2= src.segment_length_log2;
|
|
||||||
m_offset_map = src.m_offset_map;
|
|
||||||
return *this;
|
|
||||||
}
|
|
||||||
|
|
||||||
~SegmentedView() {
|
|
||||||
if ( !segments_.tracker().ref_counting()) { return; }
|
|
||||||
size_t ref_count = segments_.tracker().ref_count();
|
|
||||||
if(ref_count == 1u) {
|
|
||||||
Kokkos::fence();
|
|
||||||
typename Kokkos::View<int,typename traits::execution_space>::HostMirror h_nviews("h_nviews");
|
|
||||||
Kokkos::deep_copy(h_nviews,nsegments_);
|
|
||||||
Kokkos::parallel_for(h_nviews(),Impl::delete_segmented_view<DataType , Arg1Type , Arg2Type, Arg3Type>(*this));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
t_dev get_segment(const int& i) const {
|
|
||||||
return segments_[i];
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class MemberType>
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void grow (MemberType& team_member, const size_t& growSize) const {
|
|
||||||
if (growSize>max_segments_*segment_length_) {
|
|
||||||
printf ("Exceeding maxSize: %lu %lu\n", growSize, max_segments_*segment_length_);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
if(team_member.team_rank()==0) {
|
|
||||||
bool too_small = growSize > segment_length_ * nsegments_();
|
|
||||||
if (too_small) {
|
|
||||||
while(Kokkos::atomic_compare_exchange(&realloc_lock(),0,1) )
|
|
||||||
; // get the lock
|
|
||||||
too_small = growSize > segment_length_ * nsegments_(); // Recheck once we have the lock
|
|
||||||
if(too_small) {
|
|
||||||
while(too_small) {
|
|
||||||
const size_t alloc_size = segment_length_*m_offset_map.N1*m_offset_map.N2*m_offset_map.N3*
|
|
||||||
m_offset_map.N4*m_offset_map.N5*m_offset_map.N6*m_offset_map.N7;
|
|
||||||
typename traits::non_const_value_type* const ptr = new typename traits::non_const_value_type[alloc_size];
|
|
||||||
|
|
||||||
segments_(nsegments_()) =
|
|
||||||
t_dev(ptr,segment_length_,m_offset_map.N1,m_offset_map.N2,m_offset_map.N3,m_offset_map.N4,m_offset_map.N5,m_offset_map.N6,m_offset_map.N7);
|
|
||||||
nsegments_()++;
|
|
||||||
too_small = growSize > segment_length_ * nsegments_();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
realloc_lock() = 0; //release the lock
|
|
||||||
}
|
|
||||||
}
|
|
||||||
team_member.team_barrier();
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void grow_non_thread_safe (const size_t& growSize) const {
|
|
||||||
if (growSize>max_segments_*segment_length_) {
|
|
||||||
printf ("Exceeding maxSize: %lu %lu\n", growSize, max_segments_*segment_length_);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
bool too_small = growSize > segment_length_ * nsegments_();
|
|
||||||
if(too_small) {
|
|
||||||
while(too_small) {
|
|
||||||
const size_t alloc_size = segment_length_*m_offset_map.N1*m_offset_map.N2*m_offset_map.N3*
|
|
||||||
m_offset_map.N4*m_offset_map.N5*m_offset_map.N6*m_offset_map.N7;
|
|
||||||
typename traits::non_const_value_type* const ptr =
|
|
||||||
new typename traits::non_const_value_type[alloc_size];
|
|
||||||
|
|
||||||
segments_(nsegments_()) =
|
|
||||||
t_dev (ptr, segment_length_, m_offset_map.N1, m_offset_map.N2,
|
|
||||||
m_offset_map.N3, m_offset_map.N4, m_offset_map.N5,
|
|
||||||
m_offset_map.N6, m_offset_map.N7);
|
|
||||||
nsegments_()++;
|
|
||||||
too_small = growSize > segment_length_ * nsegments_();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value && traits::rank == 1 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_));
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
traits::rank == 2 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
traits::rank == 3 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
std::is_integral<iType3>::value &&
|
|
||||||
traits::rank == 4 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
|
|
||||||
typename iType4 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
std::is_integral<iType3>::value &&
|
|
||||||
std::is_integral<iType4>::value &&
|
|
||||||
traits::rank == 5 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
|
|
||||||
const iType4 & i4 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
|
|
||||||
typename iType4 , typename iType5 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
std::is_integral<iType3>::value &&
|
|
||||||
std::is_integral<iType4>::value &&
|
|
||||||
std::is_integral<iType5>::value &&
|
|
||||||
traits::rank == 6 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
|
|
||||||
const iType4 & i4 , const iType5 & i5 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
|
|
||||||
typename iType4 , typename iType5 , typename iType6 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
std::is_integral<iType3>::value &&
|
|
||||||
std::is_integral<iType4>::value &&
|
|
||||||
std::is_integral<iType5>::value &&
|
|
||||||
std::is_integral<iType6>::value &&
|
|
||||||
traits::rank == 7 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
|
|
||||||
const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5,i6);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 ,
|
|
||||||
typename iType4 , typename iType5 , typename iType6 , typename iType7 >
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
|
||||||
typename std::enable_if<( std::is_integral<iType0>::value &&
|
|
||||||
std::is_integral<iType1>::value &&
|
|
||||||
std::is_integral<iType2>::value &&
|
|
||||||
std::is_integral<iType3>::value &&
|
|
||||||
std::is_integral<iType4>::value &&
|
|
||||||
std::is_integral<iType5>::value &&
|
|
||||||
std::is_integral<iType6>::value &&
|
|
||||||
std::is_integral<iType7>::value &&
|
|
||||||
traits::rank == 8 )
|
|
||||||
, typename traits::value_type &
|
|
||||||
>::type
|
|
||||||
operator() ( const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ,
|
|
||||||
const iType4 & i4 , const iType5 & i5 , const iType6 & i6 , const iType7 & i7 ) const
|
|
||||||
{
|
|
||||||
return segments_[i0>>segment_length_log2](i0&(segment_length_m1_),i1,i2,i3,i4,i5,i6,i7);
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
namespace Impl {
|
|
||||||
template<class DataType, class Arg1Type, class Arg2Type, class Arg3Type>
|
|
||||||
struct delete_segmented_view {
|
|
||||||
typedef SegmentedView<DataType , Arg1Type , Arg2Type, Arg3Type> view_type;
|
|
||||||
typedef typename view_type::execution_space execution_space;
|
|
||||||
|
|
||||||
view_type view_;
|
|
||||||
delete_segmented_view(view_type view):view_(view) {
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (int i) const {
|
|
||||||
delete [] view_.get_segment(i).ptr_on_device();
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#endif
|
|
||||||
|
|
||||||
#endif
|
|
||||||
@ -241,9 +241,9 @@ public:
|
|||||||
typedef UnorderedMap<const_key_type,value_type,execution_space,hasher_type,equal_to_type> modifiable_map_type;
|
typedef UnorderedMap<const_key_type,value_type,execution_space,hasher_type,equal_to_type> modifiable_map_type;
|
||||||
typedef UnorderedMap<const_key_type,const_value_type,execution_space,hasher_type,equal_to_type> const_map_type;
|
typedef UnorderedMap<const_key_type,const_value_type,execution_space,hasher_type,equal_to_type> const_map_type;
|
||||||
|
|
||||||
static const bool is_set = Impl::is_same<void,value_type>::value;
|
static const bool is_set = std::is_same<void,value_type>::value;
|
||||||
static const bool has_const_key = Impl::is_same<const_key_type,declared_key_type>::value;
|
static const bool has_const_key = std::is_same<const_key_type,declared_key_type>::value;
|
||||||
static const bool has_const_value = is_set || Impl::is_same<const_value_type,declared_value_type>::value;
|
static const bool has_const_value = is_set || std::is_same<const_value_type,declared_value_type>::value;
|
||||||
|
|
||||||
static const bool is_insertable_map = !has_const_key && (is_set || !has_const_value);
|
static const bool is_insertable_map = !has_const_key && (is_set || !has_const_value);
|
||||||
static const bool is_modifiable_map = has_const_key && !has_const_value;
|
static const bool is_modifiable_map = has_const_key && !has_const_value;
|
||||||
@ -735,8 +735,8 @@ public:
|
|||||||
}
|
}
|
||||||
|
|
||||||
template <typename SKey, typename SValue, typename SDevice>
|
template <typename SKey, typename SValue, typename SDevice>
|
||||||
typename Impl::enable_if< Impl::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
|
typename Impl::enable_if< std::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
|
||||||
Impl::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
|
std::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
|
||||||
>::type
|
>::type
|
||||||
create_copy_view( UnorderedMap<SKey, SValue, SDevice, Hasher,EqualTo> const& src)
|
create_copy_view( UnorderedMap<SKey, SValue, SDevice, Hasher,EqualTo> const& src)
|
||||||
{
|
{
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
|
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
|
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
|
||||||
|
|
||||||
SET(SOURCES
|
SET(SOURCES
|
||||||
|
|||||||
@ -7,21 +7,18 @@ vpath %.cpp ${KOKKOS_PATH}/containers/unit_tests
|
|||||||
default: build_all
|
default: build_all
|
||||||
echo "End Build"
|
echo "End Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3
|
||||||
|
LINK ?= $(CXX)
|
||||||
|
LDFLAGS ?= -lpthread
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|
||||||
CXX = $(NVCC_WRAPPER)
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK = $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
else
|
|
||||||
CXX ?= g++
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK ?= $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
endif
|
|
||||||
|
|
||||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests
|
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests
|
||||||
|
|
||||||
TEST_TARGETS =
|
TEST_TARGETS =
|
||||||
|
|||||||
@ -59,11 +59,13 @@
|
|||||||
#include <TestVector.hpp>
|
#include <TestVector.hpp>
|
||||||
#include <TestDualView.hpp>
|
#include <TestDualView.hpp>
|
||||||
#include <TestDynamicView.hpp>
|
#include <TestDynamicView.hpp>
|
||||||
#include <TestSegmentedView.hpp>
|
|
||||||
|
|
||||||
#include <Kokkos_DynRankView.hpp>
|
#include <Kokkos_DynRankView.hpp>
|
||||||
#include <TestDynViewAPI.hpp>
|
#include <TestDynViewAPI.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_ErrorReporter.hpp>
|
||||||
|
#include <TestErrorReporter.hpp>
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
@ -133,11 +135,6 @@ void cuda_test_dualview_combinations(unsigned int size)
|
|||||||
test_dualview_combinations<int,Kokkos::Cuda>(size);
|
test_dualview_combinations<int,Kokkos::Cuda>(size);
|
||||||
}
|
}
|
||||||
|
|
||||||
void cuda_test_segmented_view(unsigned int size)
|
|
||||||
{
|
|
||||||
test_segmented_view<double,Kokkos::Cuda>(size);
|
|
||||||
}
|
|
||||||
|
|
||||||
void cuda_test_bitset()
|
void cuda_test_bitset()
|
||||||
{
|
{
|
||||||
test_bitset<Kokkos::Cuda>();
|
test_bitset<Kokkos::Cuda>();
|
||||||
@ -184,11 +181,6 @@ void cuda_test_bitset()
|
|||||||
cuda_test_dualview_combinations(size); \
|
cuda_test_dualview_combinations(size); \
|
||||||
}
|
}
|
||||||
|
|
||||||
#define CUDA_SEGMENTEDVIEW_TEST( size ) \
|
|
||||||
TEST_F( cuda, segmentedview_##size##x) { \
|
|
||||||
cuda_test_segmented_view(size); \
|
|
||||||
}
|
|
||||||
|
|
||||||
CUDA_DUALVIEW_COMBINE_TEST( 10 )
|
CUDA_DUALVIEW_COMBINE_TEST( 10 )
|
||||||
CUDA_VECTOR_COMBINE_TEST( 10 )
|
CUDA_VECTOR_COMBINE_TEST( 10 )
|
||||||
CUDA_VECTOR_COMBINE_TEST( 3057 )
|
CUDA_VECTOR_COMBINE_TEST( 3057 )
|
||||||
@ -198,7 +190,6 @@ CUDA_INSERT_TEST(close, 100000, 90000, 100, 500)
|
|||||||
CUDA_INSERT_TEST(far, 100000, 90000, 100, 500)
|
CUDA_INSERT_TEST(far, 100000, 90000, 100, 500)
|
||||||
CUDA_DEEP_COPY( 10000, 1 )
|
CUDA_DEEP_COPY( 10000, 1 )
|
||||||
CUDA_FAILED_INSERT_TEST( 10000, 1000 )
|
CUDA_FAILED_INSERT_TEST( 10000, 1000 )
|
||||||
CUDA_SEGMENTEDVIEW_TEST( 200 )
|
|
||||||
|
|
||||||
|
|
||||||
#undef CUDA_INSERT_TEST
|
#undef CUDA_INSERT_TEST
|
||||||
@ -207,7 +198,6 @@ CUDA_SEGMENTEDVIEW_TEST( 200 )
|
|||||||
#undef CUDA_DEEP_COPY
|
#undef CUDA_DEEP_COPY
|
||||||
#undef CUDA_VECTOR_COMBINE_TEST
|
#undef CUDA_VECTOR_COMBINE_TEST
|
||||||
#undef CUDA_DUALVIEW_COMBINE_TEST
|
#undef CUDA_DUALVIEW_COMBINE_TEST
|
||||||
#undef CUDA_SEGMENTEDVIEW_TEST
|
|
||||||
|
|
||||||
|
|
||||||
TEST_F( cuda , dynamic_view )
|
TEST_F( cuda , dynamic_view )
|
||||||
@ -221,6 +211,18 @@ TEST_F( cuda , dynamic_view )
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#if defined(KOKKOS_CLASS_LAMBDA)
|
||||||
|
TEST_F(cuda, ErrorReporterViaLambda)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Cuda>>();
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
TEST_F(cuda, ErrorReporter)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriver<Kokkos::Cuda>>();
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif /* #ifdef KOKKOS_HAVE_CUDA */
|
#endif /* #ifdef KOKKOS_HAVE_CUDA */
|
||||||
|
|||||||
@ -715,9 +715,9 @@ public:
|
|||||||
typedef Kokkos::Experimental::DynRankView< T, device, Kokkos::MemoryUnmanaged > dView0_unmanaged ;
|
typedef Kokkos::Experimental::DynRankView< T, device, Kokkos::MemoryUnmanaged > dView0_unmanaged ;
|
||||||
typedef typename dView0::host_mirror_space host_drv_space ;
|
typedef typename dView0::host_mirror_space host_drv_space ;
|
||||||
|
|
||||||
typedef Kokkos::Experimental::View< T , device > View0 ;
|
typedef Kokkos::View< T , device > View0 ;
|
||||||
typedef Kokkos::Experimental::View< T* , device > View1 ;
|
typedef Kokkos::View< T* , device > View1 ;
|
||||||
typedef Kokkos::Experimental::View< T******* , device > View7 ;
|
typedef Kokkos::View< T******* , device > View7 ;
|
||||||
|
|
||||||
typedef typename View0::host_mirror_space host_view_space ;
|
typedef typename View0::host_mirror_space host_view_space ;
|
||||||
|
|
||||||
@ -1127,8 +1127,7 @@ public:
|
|||||||
// T v2 = hx(0,0) ; // Generates compile error as intended
|
// T v2 = hx(0,0) ; // Generates compile error as intended
|
||||||
// hx(0,0) = v2 ; // Generates compile error as intended
|
// hx(0,0) = v2 ; // Generates compile error as intended
|
||||||
|
|
||||||
/*
|
#if 0 /* Asynchronous deep copies not implemented for dynamic rank view */
|
||||||
#if ! KOKKOS_USING_EXP_VIEW
|
|
||||||
// Testing with asynchronous deep copy with respect to device
|
// Testing with asynchronous deep copy with respect to device
|
||||||
{
|
{
|
||||||
size_t count = 0 ;
|
size_t count = 0 ;
|
||||||
@ -1193,7 +1192,7 @@ public:
|
|||||||
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
|
{ ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
|
||||||
}}}}
|
}}}}
|
||||||
}
|
}
|
||||||
#endif */ // #if ! KOKKOS_USING_EXP_VIEW
|
#endif
|
||||||
|
|
||||||
// Testing with synchronous deep copy
|
// Testing with synchronous deep copy
|
||||||
{
|
{
|
||||||
|
|||||||
227
lib/kokkos/containers/unit_tests/TestErrorReporter.hpp
Normal file
227
lib/kokkos/containers/unit_tests/TestErrorReporter.hpp
Normal file
@ -0,0 +1,227 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_TEST_EXPERIMENTAL_ERROR_REPORTER_HPP
|
||||||
|
#define KOKKOS_TEST_EXPERIMENTAL_ERROR_REPORTER_HPP
|
||||||
|
|
||||||
|
#include <gtest/gtest.h>
|
||||||
|
#include <iostream>
|
||||||
|
#include <Kokkos_Core.hpp>
|
||||||
|
|
||||||
|
namespace Test {
|
||||||
|
|
||||||
|
// Just save the data in the report. Informative text goies in the operator<<(..).
|
||||||
|
template <typename DataType1, typename DataType2, typename DataType3>
|
||||||
|
struct ThreeValReport
|
||||||
|
{
|
||||||
|
DataType1 m_data1;
|
||||||
|
DataType2 m_data2;
|
||||||
|
DataType3 m_data3;
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
template <typename DataType1, typename DataType2, typename DataType3>
|
||||||
|
std::ostream &operator<<(std::ostream & os, const ThreeValReport<DataType1, DataType2, DataType3> &val)
|
||||||
|
{
|
||||||
|
return os << "{" << val.m_data1 << " " << val.m_data2 << " " << val.m_data3 << "}";
|
||||||
|
}
|
||||||
|
|
||||||
|
template<typename ReportType>
|
||||||
|
void checkReportersAndReportsAgree(const std::vector<int> &reporters,
|
||||||
|
const std::vector<ReportType> &reports)
|
||||||
|
{
|
||||||
|
for (size_t i = 0; i < reports.size(); ++i) {
|
||||||
|
EXPECT_EQ(1, reporters[i] % 2);
|
||||||
|
EXPECT_EQ(reporters[i], reports[i].m_data1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
template <typename DeviceType>
|
||||||
|
struct ErrorReporterDriverBase {
|
||||||
|
|
||||||
|
typedef ThreeValReport<int, int, double> report_type;
|
||||||
|
typedef Kokkos::Experimental::ErrorReporter<report_type, DeviceType> error_reporter_type;
|
||||||
|
error_reporter_type m_errorReporter;
|
||||||
|
|
||||||
|
ErrorReporterDriverBase(int reporter_capacity, int test_size)
|
||||||
|
: m_errorReporter(reporter_capacity) { }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION bool error_condition(const int work_idx) const { return (work_idx % 2 != 0); }
|
||||||
|
|
||||||
|
void check_expectations(int reporter_capacity, int test_size)
|
||||||
|
{
|
||||||
|
int num_reported = m_errorReporter.getNumReports();
|
||||||
|
int num_attempts = m_errorReporter.getNumReportAttempts();
|
||||||
|
|
||||||
|
int expected_num_reports = std::min(reporter_capacity, test_size / 2);
|
||||||
|
EXPECT_EQ(expected_num_reports, num_reported);
|
||||||
|
EXPECT_EQ(test_size / 2, num_attempts);
|
||||||
|
|
||||||
|
bool expect_full = (reporter_capacity <= (test_size / 2));
|
||||||
|
bool reported_full = m_errorReporter.full();
|
||||||
|
EXPECT_EQ(expect_full, reported_full);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template <typename ErrorReporterDriverType>
|
||||||
|
void TestErrorReporter()
|
||||||
|
{
|
||||||
|
typedef ErrorReporterDriverType tester_type;
|
||||||
|
std::vector<int> reporters;
|
||||||
|
std::vector<typename tester_type::report_type> reports;
|
||||||
|
|
||||||
|
tester_type test1(100, 10);
|
||||||
|
test1.m_errorReporter.getReports(reporters, reports);
|
||||||
|
checkReportersAndReportsAgree(reporters, reports);
|
||||||
|
|
||||||
|
tester_type test2(10, 100);
|
||||||
|
test2.m_errorReporter.getReports(reporters, reports);
|
||||||
|
checkReportersAndReportsAgree(reporters, reports);
|
||||||
|
|
||||||
|
typename Kokkos::View<int*, typename ErrorReporterDriverType::execution_space >::HostMirror view_reporters;
|
||||||
|
typename Kokkos::View<typename tester_type::report_type*, typename ErrorReporterDriverType::execution_space >::HostMirror
|
||||||
|
view_reports;
|
||||||
|
test2.m_errorReporter.getReports(view_reporters, view_reports);
|
||||||
|
|
||||||
|
int num_reports = view_reporters.extent(0);
|
||||||
|
reporters.clear();
|
||||||
|
reports.clear();
|
||||||
|
reporters.reserve(num_reports);
|
||||||
|
reports.reserve(num_reports);
|
||||||
|
|
||||||
|
for (int i = 0; i < num_reports; ++i) {
|
||||||
|
reporters.push_back(view_reporters(i));
|
||||||
|
reports.push_back(view_reports(i));
|
||||||
|
}
|
||||||
|
checkReportersAndReportsAgree(reporters, reports);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
template <typename DeviceType>
|
||||||
|
struct ErrorReporterDriver : public ErrorReporterDriverBase<DeviceType>
|
||||||
|
{
|
||||||
|
typedef ErrorReporterDriverBase<DeviceType> driver_base;
|
||||||
|
typedef typename driver_base::error_reporter_type::execution_space execution_space;
|
||||||
|
|
||||||
|
ErrorReporterDriver(int reporter_capacity, int test_size)
|
||||||
|
: driver_base(reporter_capacity, test_size)
|
||||||
|
{
|
||||||
|
execute(reporter_capacity, test_size);
|
||||||
|
|
||||||
|
// Test that clear() and resize() work across memory spaces.
|
||||||
|
if (reporter_capacity < test_size) {
|
||||||
|
driver_base::m_errorReporter.clear();
|
||||||
|
driver_base::m_errorReporter.resize(test_size);
|
||||||
|
execute(test_size, test_size);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void execute(int reporter_capacity, int test_size)
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0,test_size), *this);
|
||||||
|
driver_base::check_expectations(reporter_capacity, test_size);
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator()(const int work_idx) const
|
||||||
|
{
|
||||||
|
if (driver_base::error_condition(work_idx)) {
|
||||||
|
double val = M_PI * static_cast<double>(work_idx);
|
||||||
|
typename driver_base::report_type report = {work_idx, -2*work_idx, val};
|
||||||
|
driver_base::m_errorReporter.add_report(work_idx, report);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
#if defined(KOKKOS_CLASS_LAMBDA)
|
||||||
|
template <typename DeviceType>
|
||||||
|
struct ErrorReporterDriverUseLambda : public ErrorReporterDriverBase<DeviceType>
|
||||||
|
{
|
||||||
|
|
||||||
|
typedef ErrorReporterDriverBase<DeviceType> driver_base;
|
||||||
|
typedef typename driver_base::error_reporter_type::execution_space execution_space;
|
||||||
|
|
||||||
|
ErrorReporterDriverUseLambda(int reporter_capacity, int test_size)
|
||||||
|
: driver_base(reporter_capacity, test_size)
|
||||||
|
{
|
||||||
|
Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0,test_size), KOKKOS_CLASS_LAMBDA (const int work_idx) {
|
||||||
|
if (driver_base::error_condition(work_idx)) {
|
||||||
|
double val = M_PI * static_cast<double>(work_idx);
|
||||||
|
typename driver_base::report_type report = {work_idx, -2*work_idx, val};
|
||||||
|
driver_base::m_errorReporter.add_report(work_idx, report);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
driver_base::check_expectations(reporter_capacity, test_size);
|
||||||
|
}
|
||||||
|
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef KOKKOS_HAVE_OPENMP
|
||||||
|
struct ErrorReporterDriverNativeOpenMP : public ErrorReporterDriverBase<Kokkos::OpenMP>
|
||||||
|
{
|
||||||
|
typedef ErrorReporterDriverBase<Kokkos::OpenMP> driver_base;
|
||||||
|
typedef typename driver_base::error_reporter_type::execution_space execution_space;
|
||||||
|
|
||||||
|
ErrorReporterDriverNativeOpenMP(int reporter_capacity, int test_size)
|
||||||
|
: driver_base(reporter_capacity, test_size)
|
||||||
|
{
|
||||||
|
#pragma omp parallel for
|
||||||
|
for(int work_idx = 0; work_idx < test_size; ++work_idx)
|
||||||
|
{
|
||||||
|
if (driver_base::error_condition(work_idx)) {
|
||||||
|
double val = M_PI * static_cast<double>(work_idx);
|
||||||
|
typename driver_base::report_type report = {work_idx, -2*work_idx, val};
|
||||||
|
driver_base::m_errorReporter.add_report(work_idx, report);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
driver_base::check_expectations(reporter_capacity, test_size);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} // namespace Test
|
||||||
|
#endif // #ifndef KOKKOS_TEST_ERROR_REPORTING_HPP
|
||||||
@ -56,12 +56,14 @@
|
|||||||
#include <TestVector.hpp>
|
#include <TestVector.hpp>
|
||||||
#include <TestDualView.hpp>
|
#include <TestDualView.hpp>
|
||||||
#include <TestDynamicView.hpp>
|
#include <TestDynamicView.hpp>
|
||||||
#include <TestSegmentedView.hpp>
|
|
||||||
#include <TestComplex.hpp>
|
#include <TestComplex.hpp>
|
||||||
|
|
||||||
#include <Kokkos_DynRankView.hpp>
|
#include <Kokkos_DynRankView.hpp>
|
||||||
#include <TestDynViewAPI.hpp>
|
#include <TestDynViewAPI.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_ErrorReporter.hpp>
|
||||||
|
#include <TestErrorReporter.hpp>
|
||||||
|
|
||||||
#include <iomanip>
|
#include <iomanip>
|
||||||
|
|
||||||
namespace Test {
|
namespace Test {
|
||||||
@ -143,11 +145,6 @@ TEST_F( openmp , staticcrsgraph )
|
|||||||
test_dualview_combinations<int,Kokkos::OpenMP>(size); \
|
test_dualview_combinations<int,Kokkos::OpenMP>(size); \
|
||||||
}
|
}
|
||||||
|
|
||||||
#define OPENMP_SEGMENTEDVIEW_TEST( size ) \
|
|
||||||
TEST_F( openmp, segmentedview_##size##x) { \
|
|
||||||
test_segmented_view<double,Kokkos::OpenMP>(size); \
|
|
||||||
}
|
|
||||||
|
|
||||||
OPENMP_INSERT_TEST(close, 100000, 90000, 100, 500, true)
|
OPENMP_INSERT_TEST(close, 100000, 90000, 100, 500, true)
|
||||||
OPENMP_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
OPENMP_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
||||||
OPENMP_FAILED_INSERT_TEST( 10000, 1000 )
|
OPENMP_FAILED_INSERT_TEST( 10000, 1000 )
|
||||||
@ -156,7 +153,6 @@ OPENMP_DEEP_COPY( 10000, 1 )
|
|||||||
OPENMP_VECTOR_COMBINE_TEST( 10 )
|
OPENMP_VECTOR_COMBINE_TEST( 10 )
|
||||||
OPENMP_VECTOR_COMBINE_TEST( 3057 )
|
OPENMP_VECTOR_COMBINE_TEST( 3057 )
|
||||||
OPENMP_DUALVIEW_COMBINE_TEST( 10 )
|
OPENMP_DUALVIEW_COMBINE_TEST( 10 )
|
||||||
OPENMP_SEGMENTEDVIEW_TEST( 10000 )
|
|
||||||
|
|
||||||
#undef OPENMP_INSERT_TEST
|
#undef OPENMP_INSERT_TEST
|
||||||
#undef OPENMP_FAILED_INSERT_TEST
|
#undef OPENMP_FAILED_INSERT_TEST
|
||||||
@ -164,7 +160,6 @@ OPENMP_SEGMENTEDVIEW_TEST( 10000 )
|
|||||||
#undef OPENMP_DEEP_COPY
|
#undef OPENMP_DEEP_COPY
|
||||||
#undef OPENMP_VECTOR_COMBINE_TEST
|
#undef OPENMP_VECTOR_COMBINE_TEST
|
||||||
#undef OPENMP_DUALVIEW_COMBINE_TEST
|
#undef OPENMP_DUALVIEW_COMBINE_TEST
|
||||||
#undef OPENMP_SEGMENTEDVIEW_TEST
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
|
||||||
@ -178,5 +173,22 @@ TEST_F( openmp , dynamic_view )
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#if defined(KOKKOS_CLASS_LAMBDA)
|
||||||
|
TEST_F(openmp, ErrorReporterViaLambda)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::OpenMP>>();
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
TEST_F(openmp, ErrorReporter)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriver<Kokkos::OpenMP>>();
|
||||||
|
}
|
||||||
|
|
||||||
|
TEST_F(openmp, ErrorReporterNativeOpenMP)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriverNativeOpenMP>();
|
||||||
|
}
|
||||||
|
|
||||||
} // namespace test
|
} // namespace test
|
||||||
|
|
||||||
|
|||||||
@ -1,708 +0,0 @@
|
|||||||
/*
|
|
||||||
//@HEADER
|
|
||||||
// ************************************************************************
|
|
||||||
//
|
|
||||||
// Kokkos v. 2.0
|
|
||||||
// Copyright (2014) Sandia Corporation
|
|
||||||
//
|
|
||||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
|
||||||
// the U.S. Government retains certain rights in this software.
|
|
||||||
//
|
|
||||||
// Redistribution and use in source and binary forms, with or without
|
|
||||||
// modification, are permitted provided that the following conditions are
|
|
||||||
// met:
|
|
||||||
//
|
|
||||||
// 1. Redistributions of source code must retain the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer.
|
|
||||||
//
|
|
||||||
// 2. Redistributions in binary form must reproduce the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer in the
|
|
||||||
// documentation and/or other materials provided with the distribution.
|
|
||||||
//
|
|
||||||
// 3. Neither the name of the Corporation nor the names of the
|
|
||||||
// contributors may be used to endorse or promote products derived from
|
|
||||||
// this software without specific prior written permission.
|
|
||||||
//
|
|
||||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
|
||||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
||||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
||||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
|
||||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
||||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
||||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
||||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
||||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
||||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
||||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
//
|
|
||||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
|
||||||
//
|
|
||||||
// ************************************************************************
|
|
||||||
//@HEADER
|
|
||||||
*/
|
|
||||||
|
|
||||||
#ifndef KOKKOS_TEST_SEGMENTEDVIEW_HPP
|
|
||||||
#define KOKKOS_TEST_SEGMENTEDVIEW_HPP
|
|
||||||
|
|
||||||
#include <gtest/gtest.h>
|
|
||||||
#include <iostream>
|
|
||||||
#include <cstdlib>
|
|
||||||
#include <cstdio>
|
|
||||||
#include <Kokkos_Core.hpp>
|
|
||||||
|
|
||||||
#if ! KOKKOS_USING_EXP_VIEW
|
|
||||||
|
|
||||||
#include <Kokkos_SegmentedView.hpp>
|
|
||||||
#include <impl/Kokkos_Timer.hpp>
|
|
||||||
|
|
||||||
namespace Test {
|
|
||||||
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace, int Rank = ViewType::Rank>
|
|
||||||
struct GrowTest;
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 1> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+team_member.team_size());
|
|
||||||
value += team_idx + team_member.team_rank();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+team_member.team_rank()))
|
|
||||||
a(team_idx+team_member.team_rank()) = team_idx+team_member.team_rank();
|
|
||||||
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 2> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
value += team_idx + team_member.team_rank() + 13*k;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++) {
|
|
||||||
a(team_idx+ team_member.team_rank(),k) =
|
|
||||||
team_idx+ team_member.team_rank() + 13*k;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 3> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
value += team_idx + team_member.team_rank() + 13*k + 3*l;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l) =
|
|
||||||
team_idx+ team_member.team_rank() + 13*k + 3*l;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 4> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<2;m++)
|
|
||||||
value += team_idx + team_member.team_rank() + 13*k + 3*l + 7*m;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l,m) =
|
|
||||||
team_idx+ team_member.team_rank() + 13*k + 3*l + 7*m;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 5> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<2;m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<3;n++)
|
|
||||||
value +=
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l,m,n) =
|
|
||||||
team_idx+ team_member.team_rank() + 13*k + 3*l + 7*m + 5*n;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 6> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<2;m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<3;n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<2;o++)
|
|
||||||
value +=
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o ;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l,m,n,o) =
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o ;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 7> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
a.grow(team_member , team_idx+ team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<2;m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<3;n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<2;o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<4;p++)
|
|
||||||
value +=
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p ;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l,m,n,o,p) =
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p ;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct GrowTest<ViewType , ExecutionSpace , 8> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
GrowTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
a.grow(team_member , team_idx + team_member.team_size());
|
|
||||||
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<7;k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<3;l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<2;m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<3;n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<2;o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<4;p++)
|
|
||||||
for( typename ExecutionSpace::size_type q=0;q<3;q++)
|
|
||||||
value +=
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p + 17*q;
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
|
|
||||||
for( typename ExecutionSpace::size_type q=0;q<a.dimension_7();q++)
|
|
||||||
a(team_idx+ team_member.team_rank(),k,l,m,n,o,p,q) =
|
|
||||||
team_idx + team_member.team_rank() + 13*k + 3*l + 7*m + 5*n + 2*o + 15*p + 17*q;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace, int Rank = ViewType::Rank>
|
|
||||||
struct VerifyTest;
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 1> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
value += a(team_idx+ team_member.team_rank());
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 2> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 3> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 4> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l,m);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 5> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l,m,n);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 6> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l,m,n,o);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 7> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l,m,n,o,p);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template<class ViewType , class ExecutionSpace>
|
|
||||||
struct VerifyTest<ViewType , ExecutionSpace , 8> {
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
typedef typename Policy::member_type team_type;
|
|
||||||
typedef double value_type;
|
|
||||||
|
|
||||||
ViewType a;
|
|
||||||
|
|
||||||
VerifyTest(ViewType in):a(in) {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void operator() (team_type team_member, double& value) const {
|
|
||||||
unsigned int team_idx = team_member.league_rank() * team_member.team_size();
|
|
||||||
|
|
||||||
if((a.dimension_0()>team_idx+ team_member.team_rank()) &&
|
|
||||||
(a.dimension(0)>team_idx+ team_member.team_rank())) {
|
|
||||||
for( typename ExecutionSpace::size_type k=0;k<a.dimension_1();k++)
|
|
||||||
for( typename ExecutionSpace::size_type l=0;l<a.dimension_2();l++)
|
|
||||||
for( typename ExecutionSpace::size_type m=0;m<a.dimension_3();m++)
|
|
||||||
for( typename ExecutionSpace::size_type n=0;n<a.dimension_4();n++)
|
|
||||||
for( typename ExecutionSpace::size_type o=0;o<a.dimension_5();o++)
|
|
||||||
for( typename ExecutionSpace::size_type p=0;p<a.dimension_6();p++)
|
|
||||||
for( typename ExecutionSpace::size_type q=0;q<a.dimension_7();q++)
|
|
||||||
value += a(team_idx+ team_member.team_rank(),k,l,m,n,o,p,q);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
template <typename Scalar, class ExecutionSpace>
|
|
||||||
struct test_segmented_view
|
|
||||||
{
|
|
||||||
typedef test_segmented_view<Scalar,ExecutionSpace> self_type;
|
|
||||||
|
|
||||||
typedef Scalar scalar_type;
|
|
||||||
typedef ExecutionSpace execution_space;
|
|
||||||
typedef Kokkos::TeamPolicy<execution_space> Policy;
|
|
||||||
|
|
||||||
double result;
|
|
||||||
double reference;
|
|
||||||
|
|
||||||
template <class ViewType>
|
|
||||||
void run_me(ViewType a, int max_length){
|
|
||||||
const int team_size = Policy::team_size_max( GrowTest<ViewType,execution_space>(a) );
|
|
||||||
const int nteams = max_length/team_size;
|
|
||||||
|
|
||||||
reference = 0;
|
|
||||||
result = 0;
|
|
||||||
|
|
||||||
Kokkos::parallel_reduce(Policy(nteams,team_size),GrowTest<ViewType,execution_space>(a),reference);
|
|
||||||
Kokkos::fence();
|
|
||||||
Kokkos::parallel_reduce(Policy(nteams,team_size),VerifyTest<ViewType,execution_space>(a),result);
|
|
||||||
Kokkos::fence();
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
test_segmented_view(unsigned int size,int rank)
|
|
||||||
{
|
|
||||||
reference = 0;
|
|
||||||
result = 0;
|
|
||||||
|
|
||||||
const int dim_1 = 7;
|
|
||||||
const int dim_2 = 3;
|
|
||||||
const int dim_3 = 2;
|
|
||||||
const int dim_4 = 3;
|
|
||||||
const int dim_5 = 2;
|
|
||||||
const int dim_6 = 4;
|
|
||||||
//const int dim_7 = 3;
|
|
||||||
|
|
||||||
if(rank==1) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*,Kokkos::LayoutLeft,ExecutionSpace> rank1_view;
|
|
||||||
run_me< rank1_view >(rank1_view("Rank1",128,size), size);
|
|
||||||
}
|
|
||||||
if(rank==2) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar**,Kokkos::LayoutLeft,ExecutionSpace> rank2_view;
|
|
||||||
run_me< rank2_view >(rank2_view("Rank2",128,size,dim_1), size);
|
|
||||||
}
|
|
||||||
if(rank==3) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*[7][3][2],Kokkos::LayoutRight,ExecutionSpace> rank3_view;
|
|
||||||
run_me< rank3_view >(rank3_view("Rank3",128,size), size);
|
|
||||||
}
|
|
||||||
if(rank==4) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar****,Kokkos::LayoutRight,ExecutionSpace> rank4_view;
|
|
||||||
run_me< rank4_view >(rank4_view("Rank4",128,size,dim_1,dim_2,dim_3), size);
|
|
||||||
}
|
|
||||||
if(rank==5) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*[7][3][2][3],Kokkos::LayoutLeft,ExecutionSpace> rank5_view;
|
|
||||||
run_me< rank5_view >(rank5_view("Rank5",128,size), size);
|
|
||||||
}
|
|
||||||
if(rank==6) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*****[2],Kokkos::LayoutRight,ExecutionSpace> rank6_view;
|
|
||||||
run_me< rank6_view >(rank6_view("Rank6",128,size,dim_1,dim_2,dim_3,dim_4), size);
|
|
||||||
}
|
|
||||||
if(rank==7) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*******,Kokkos::LayoutLeft,ExecutionSpace> rank7_view;
|
|
||||||
run_me< rank7_view >(rank7_view("Rank7",128,size,dim_1,dim_2,dim_3,dim_4,dim_5,dim_6), size);
|
|
||||||
}
|
|
||||||
if(rank==8) {
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*****[2][4][3],Kokkos::LayoutLeft,ExecutionSpace> rank8_view;
|
|
||||||
run_me< rank8_view >(rank8_view("Rank8",128,size,dim_1,dim_2,dim_3,dim_4), size);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
};
|
|
||||||
|
|
||||||
} // namespace Impl
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
template <typename Scalar, class ExecutionSpace>
|
|
||||||
void test_segmented_view(unsigned int size)
|
|
||||||
{
|
|
||||||
{
|
|
||||||
typedef Kokkos::Experimental::SegmentedView<Scalar*****[2][4][3],Kokkos::LayoutLeft,ExecutionSpace> view_type;
|
|
||||||
view_type a("A",128,size,7,3,2,3);
|
|
||||||
double reference;
|
|
||||||
|
|
||||||
Impl::GrowTest<view_type,ExecutionSpace> f(a);
|
|
||||||
|
|
||||||
const int team_size = Kokkos::TeamPolicy<ExecutionSpace>::team_size_max( f );
|
|
||||||
const int nteams = (size+team_size-1)/team_size;
|
|
||||||
|
|
||||||
Kokkos::parallel_reduce(Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),f,reference);
|
|
||||||
|
|
||||||
size_t real_size = ((size+127)/128)*128;
|
|
||||||
|
|
||||||
ASSERT_EQ(real_size,a.dimension_0());
|
|
||||||
ASSERT_EQ(7,a.dimension_1());
|
|
||||||
ASSERT_EQ(3,a.dimension_2());
|
|
||||||
ASSERT_EQ(2,a.dimension_3());
|
|
||||||
ASSERT_EQ(3,a.dimension_4());
|
|
||||||
ASSERT_EQ(2,a.dimension_5());
|
|
||||||
ASSERT_EQ(4,a.dimension_6());
|
|
||||||
ASSERT_EQ(3,a.dimension_7());
|
|
||||||
ASSERT_EQ(real_size,a.dimension(0));
|
|
||||||
ASSERT_EQ(7,a.dimension(1));
|
|
||||||
ASSERT_EQ(3,a.dimension(2));
|
|
||||||
ASSERT_EQ(2,a.dimension(3));
|
|
||||||
ASSERT_EQ(3,a.dimension(4));
|
|
||||||
ASSERT_EQ(2,a.dimension(5));
|
|
||||||
ASSERT_EQ(4,a.dimension(6));
|
|
||||||
ASSERT_EQ(3,a.dimension(7));
|
|
||||||
ASSERT_EQ(8,a.Rank);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,1);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,2);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,3);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,4);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,5);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,6);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,7);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
{
|
|
||||||
Impl::test_segmented_view<Scalar,ExecutionSpace> test(size,8);
|
|
||||||
ASSERT_EQ(test.reference,test.result);
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
} // namespace Test
|
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
template <typename Scalar, class ExecutionSpace>
|
|
||||||
void test_segmented_view(unsigned int ) {}
|
|
||||||
|
|
||||||
#endif
|
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_TEST_SEGMENTEDVIEW_HPP */
|
|
||||||
|
|
||||||
@ -58,7 +58,6 @@
|
|||||||
#include <TestStaticCrsGraph.hpp>
|
#include <TestStaticCrsGraph.hpp>
|
||||||
#include <TestVector.hpp>
|
#include <TestVector.hpp>
|
||||||
#include <TestDualView.hpp>
|
#include <TestDualView.hpp>
|
||||||
#include <TestSegmentedView.hpp>
|
|
||||||
#include <TestDynamicView.hpp>
|
#include <TestDynamicView.hpp>
|
||||||
#include <TestComplex.hpp>
|
#include <TestComplex.hpp>
|
||||||
|
|
||||||
@ -67,6 +66,9 @@
|
|||||||
#include <Kokkos_DynRankView.hpp>
|
#include <Kokkos_DynRankView.hpp>
|
||||||
#include <TestDynViewAPI.hpp>
|
#include <TestDynViewAPI.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_ErrorReporter.hpp>
|
||||||
|
#include <TestErrorReporter.hpp>
|
||||||
|
|
||||||
namespace Test {
|
namespace Test {
|
||||||
|
|
||||||
class serial : public ::testing::Test {
|
class serial : public ::testing::Test {
|
||||||
@ -135,11 +137,6 @@ TEST_F( serial, bitset )
|
|||||||
test_dualview_combinations<int,Kokkos::Serial>(size); \
|
test_dualview_combinations<int,Kokkos::Serial>(size); \
|
||||||
}
|
}
|
||||||
|
|
||||||
#define SERIAL_SEGMENTEDVIEW_TEST( size ) \
|
|
||||||
TEST_F( serial, segmentedview_##size##x) { \
|
|
||||||
test_segmented_view<double,Kokkos::Serial>(size); \
|
|
||||||
}
|
|
||||||
|
|
||||||
SERIAL_INSERT_TEST(close, 100000, 90000, 100, 500, true)
|
SERIAL_INSERT_TEST(close, 100000, 90000, 100, 500, true)
|
||||||
SERIAL_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
SERIAL_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
||||||
SERIAL_FAILED_INSERT_TEST( 10000, 1000 )
|
SERIAL_FAILED_INSERT_TEST( 10000, 1000 )
|
||||||
@ -148,7 +145,6 @@ SERIAL_DEEP_COPY( 10000, 1 )
|
|||||||
SERIAL_VECTOR_COMBINE_TEST( 10 )
|
SERIAL_VECTOR_COMBINE_TEST( 10 )
|
||||||
SERIAL_VECTOR_COMBINE_TEST( 3057 )
|
SERIAL_VECTOR_COMBINE_TEST( 3057 )
|
||||||
SERIAL_DUALVIEW_COMBINE_TEST( 10 )
|
SERIAL_DUALVIEW_COMBINE_TEST( 10 )
|
||||||
SERIAL_SEGMENTEDVIEW_TEST( 10000 )
|
|
||||||
|
|
||||||
#undef SERIAL_INSERT_TEST
|
#undef SERIAL_INSERT_TEST
|
||||||
#undef SERIAL_FAILED_INSERT_TEST
|
#undef SERIAL_FAILED_INSERT_TEST
|
||||||
@ -156,7 +152,6 @@ SERIAL_SEGMENTEDVIEW_TEST( 10000 )
|
|||||||
#undef SERIAL_DEEP_COPY
|
#undef SERIAL_DEEP_COPY
|
||||||
#undef SERIAL_VECTOR_COMBINE_TEST
|
#undef SERIAL_VECTOR_COMBINE_TEST
|
||||||
#undef SERIAL_DUALVIEW_COMBINE_TEST
|
#undef SERIAL_DUALVIEW_COMBINE_TEST
|
||||||
#undef SERIAL_SEGMENTEDVIEW_TEST
|
|
||||||
|
|
||||||
TEST_F( serial , dynamic_view )
|
TEST_F( serial , dynamic_view )
|
||||||
{
|
{
|
||||||
@ -168,6 +163,19 @@ TEST_F( serial , dynamic_view )
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#if defined(KOKKOS_CLASS_LAMBDA)
|
||||||
|
TEST_F(serial, ErrorReporterViaLambda)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Serial>>();
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
TEST_F(serial, ErrorReporter)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriver<Kokkos::Serial>>();
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
} // namespace Test
|
} // namespace Test
|
||||||
|
|
||||||
#endif // KOKKOS_HAVE_SERIAL
|
#endif // KOKKOS_HAVE_SERIAL
|
||||||
|
|||||||
@ -62,11 +62,13 @@
|
|||||||
#include <TestVector.hpp>
|
#include <TestVector.hpp>
|
||||||
#include <TestDualView.hpp>
|
#include <TestDualView.hpp>
|
||||||
#include <TestDynamicView.hpp>
|
#include <TestDynamicView.hpp>
|
||||||
#include <TestSegmentedView.hpp>
|
|
||||||
|
|
||||||
#include <Kokkos_DynRankView.hpp>
|
#include <Kokkos_DynRankView.hpp>
|
||||||
#include <TestDynViewAPI.hpp>
|
#include <TestDynViewAPI.hpp>
|
||||||
|
|
||||||
|
#include <Kokkos_ErrorReporter.hpp>
|
||||||
|
#include <TestErrorReporter.hpp>
|
||||||
|
|
||||||
namespace Test {
|
namespace Test {
|
||||||
|
|
||||||
class threads : public ::testing::Test {
|
class threads : public ::testing::Test {
|
||||||
@ -145,12 +147,6 @@ TEST_F( threads , staticcrsgraph )
|
|||||||
test_dualview_combinations<int,Kokkos::Threads>(size); \
|
test_dualview_combinations<int,Kokkos::Threads>(size); \
|
||||||
}
|
}
|
||||||
|
|
||||||
#define THREADS_SEGMENTEDVIEW_TEST( size ) \
|
|
||||||
TEST_F( threads, segmentedview_##size##x) { \
|
|
||||||
test_segmented_view<double,Kokkos::Threads>(size); \
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
THREADS_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
THREADS_INSERT_TEST(far, 100000, 90000, 100, 500, false)
|
||||||
THREADS_FAILED_INSERT_TEST( 10000, 1000 )
|
THREADS_FAILED_INSERT_TEST( 10000, 1000 )
|
||||||
THREADS_DEEP_COPY( 10000, 1 )
|
THREADS_DEEP_COPY( 10000, 1 )
|
||||||
@ -158,7 +154,6 @@ THREADS_DEEP_COPY( 10000, 1 )
|
|||||||
THREADS_VECTOR_COMBINE_TEST( 10 )
|
THREADS_VECTOR_COMBINE_TEST( 10 )
|
||||||
THREADS_VECTOR_COMBINE_TEST( 3057 )
|
THREADS_VECTOR_COMBINE_TEST( 3057 )
|
||||||
THREADS_DUALVIEW_COMBINE_TEST( 10 )
|
THREADS_DUALVIEW_COMBINE_TEST( 10 )
|
||||||
THREADS_SEGMENTEDVIEW_TEST( 10000 )
|
|
||||||
|
|
||||||
|
|
||||||
#undef THREADS_INSERT_TEST
|
#undef THREADS_INSERT_TEST
|
||||||
@ -167,8 +162,6 @@ THREADS_SEGMENTEDVIEW_TEST( 10000 )
|
|||||||
#undef THREADS_DEEP_COPY
|
#undef THREADS_DEEP_COPY
|
||||||
#undef THREADS_VECTOR_COMBINE_TEST
|
#undef THREADS_VECTOR_COMBINE_TEST
|
||||||
#undef THREADS_DUALVIEW_COMBINE_TEST
|
#undef THREADS_DUALVIEW_COMBINE_TEST
|
||||||
#undef THREADS_SEGMENTEDVIEW_TEST
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
TEST_F( threads , dynamic_view )
|
TEST_F( threads , dynamic_view )
|
||||||
@ -181,6 +174,19 @@ TEST_F( threads , dynamic_view )
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#if defined(KOKKOS_CLASS_LAMBDA)
|
||||||
|
TEST_F(threads, ErrorReporterViaLambda)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Threads>>();
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
TEST_F(threads, ErrorReporter)
|
||||||
|
{
|
||||||
|
TestErrorReporter<ErrorReporterDriver<Kokkos::Threads>>();
|
||||||
|
}
|
||||||
|
|
||||||
} // namespace Test
|
} // namespace Test
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -2,3 +2,5 @@ TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
|
|||||||
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREAD DLlib
|
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREAD DLlib
|
||||||
TEST_OPTIONAL_TPLS CUSPARSE
|
TEST_OPTIONAL_TPLS CUSPARSE
|
||||||
)
|
)
|
||||||
|
|
||||||
|
TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
|
||||||
@ -45,6 +45,16 @@
|
|||||||
#define KOKKOS_ENABLE_PROFILING 0
|
#define KOKKOS_ENABLE_PROFILING 0
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
#cmakedefine KOKKOS_HAVE_CUDA_RDC
|
||||||
|
#ifdef KOKKOS_HAVE_CUDA_RDC
|
||||||
|
#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#cmakedefine KOKKOS_HAVE_CUDA_LAMBDA
|
||||||
|
#ifdef KOKKOS_HAVE_CUDA_LAMBDA
|
||||||
|
#define KOKKOS_CUDA_USE_LAMBDA 1
|
||||||
|
#endif
|
||||||
|
|
||||||
// Don't forbid users from defining this macro on the command line,
|
// Don't forbid users from defining this macro on the command line,
|
||||||
// but still make sure that CMake logic can control its definition.
|
// but still make sure that CMake logic can control its definition.
|
||||||
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
|
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
|
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINRARY_DIR})
|
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINRARY_DIR})
|
||||||
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
|
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
|
||||||
|
|
||||||
SET(SOURCES
|
SET(SOURCES
|
||||||
PerfTestMain.cpp
|
PerfTestMain.cpp
|
||||||
@ -19,7 +19,7 @@ TRIBITS_ADD_EXECUTABLE(
|
|||||||
TESTONLYLIBS kokkos_gtest
|
TESTONLYLIBS kokkos_gtest
|
||||||
)
|
)
|
||||||
|
|
||||||
TRIBITS_ADD_EXECUTABLE_AND_TEST(
|
TRIBITS_ADD_TEST(
|
||||||
PerfTest
|
PerfTest
|
||||||
NAME PerfTestExec
|
NAME PerfTestExec
|
||||||
COMM serial mpi
|
COMM serial mpi
|
||||||
|
|||||||
@ -7,21 +7,18 @@ vpath %.cpp ${KOKKOS_PATH}/core/perf_test
|
|||||||
default: build_all
|
default: build_all
|
||||||
echo "End Build"
|
echo "End Build"
|
||||||
|
|
||||||
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||||
|
else
|
||||||
|
CXX = g++
|
||||||
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3
|
||||||
|
LINK ?= $(CXX)
|
||||||
|
LDFLAGS ?= -lpthread
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|
||||||
CXX = $(NVCC_WRAPPER)
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK = $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
else
|
|
||||||
CXX ?= g++
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK ?= $(CXX)
|
|
||||||
LDFLAGS ?= -lpthread
|
|
||||||
endif
|
|
||||||
|
|
||||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
|
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
|
||||||
|
|
||||||
TEST_TARGETS =
|
TEST_TARGETS =
|
||||||
|
|||||||
@ -79,10 +79,21 @@ class host : public ::testing::Test {
|
|||||||
protected:
|
protected:
|
||||||
static void SetUpTestCase()
|
static void SetUpTestCase()
|
||||||
{
|
{
|
||||||
const unsigned team_count = Kokkos::hwloc::get_available_numa_count();
|
if(Kokkos::hwloc::available()) {
|
||||||
const unsigned threads_per_team = 4 ;
|
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
|
||||||
|
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
|
||||||
|
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
|
||||||
|
|
||||||
TestHostDevice::initialize( team_count * threads_per_team );
|
unsigned threads_count = 0 ;
|
||||||
|
|
||||||
|
threads_count = std::max( 1u , numa_count )
|
||||||
|
* std::max( 2u , cores_per_numa * threads_per_core );
|
||||||
|
|
||||||
|
TestHostDevice::initialize( threads_count );
|
||||||
|
} else {
|
||||||
|
const unsigned thread_count = 4 ;
|
||||||
|
TestHostDevice::initialize( thread_count );
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static void TearDownTestCase()
|
static void TearDownTestCase()
|
||||||
|
|||||||
@ -1,334 +0,0 @@
|
|||||||
/*
|
|
||||||
//@HEADER
|
|
||||||
// ************************************************************************
|
|
||||||
//
|
|
||||||
// Kokkos v. 2.0
|
|
||||||
// Copyright (2014) Sandia Corporation
|
|
||||||
//
|
|
||||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
|
||||||
// the U.S. Government retains certain rights in this software.
|
|
||||||
//
|
|
||||||
// Redistribution and use in source and binary forms, with or without
|
|
||||||
// modification, are permitted provided that the following conditions are
|
|
||||||
// met:
|
|
||||||
//
|
|
||||||
// 1. Redistributions of source code must retain the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer.
|
|
||||||
//
|
|
||||||
// 2. Redistributions in binary form must reproduce the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer in the
|
|
||||||
// documentation and/or other materials provided with the distribution.
|
|
||||||
//
|
|
||||||
// 3. Neither the name of the Corporation nor the names of the
|
|
||||||
// contributors may be used to endorse or promote products derived from
|
|
||||||
// this software without specific prior written permission.
|
|
||||||
//
|
|
||||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
|
||||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
||||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
||||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
|
||||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
||||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
||||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
||||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
||||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
||||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
||||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
//
|
|
||||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
|
||||||
//
|
|
||||||
// ************************************************************************
|
|
||||||
//@HEADER
|
|
||||||
*/
|
|
||||||
|
|
||||||
#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
|
||||||
#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
|
||||||
|
|
||||||
/* only compile this file if CUDA is enabled for Kokkos */
|
|
||||||
#if defined( KOKKOS_HAVE_CUDA )
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
template<>
|
|
||||||
struct ViewOperatorBoundsErrorAbort< Kokkos::CudaSpace > {
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
static void apply( const size_t rank
|
|
||||||
, const size_t n0 , const size_t n1
|
|
||||||
, const size_t n2 , const size_t n3
|
|
||||||
, const size_t n4 , const size_t n5
|
|
||||||
, const size_t n6 , const size_t n7
|
|
||||||
, const size_t i0 , const size_t i1
|
|
||||||
, const size_t i2 , const size_t i3
|
|
||||||
, const size_t i4 , const size_t i5
|
|
||||||
, const size_t i6 , const size_t i7 )
|
|
||||||
{
|
|
||||||
const int r =
|
|
||||||
( n0 <= i0 ? 0 :
|
|
||||||
( n1 <= i1 ? 1 :
|
|
||||||
( n2 <= i2 ? 2 :
|
|
||||||
( n3 <= i3 ? 3 :
|
|
||||||
( n4 <= i4 ? 4 :
|
|
||||||
( n5 <= i5 ? 5 :
|
|
||||||
( n6 <= i6 ? 6 : 7 )))))));
|
|
||||||
const size_t n =
|
|
||||||
( n0 <= i0 ? n0 :
|
|
||||||
( n1 <= i1 ? n1 :
|
|
||||||
( n2 <= i2 ? n2 :
|
|
||||||
( n3 <= i3 ? n3 :
|
|
||||||
( n4 <= i4 ? n4 :
|
|
||||||
( n5 <= i5 ? n5 :
|
|
||||||
( n6 <= i6 ? n6 : n7 )))))));
|
|
||||||
const size_t i =
|
|
||||||
( n0 <= i0 ? i0 :
|
|
||||||
( n1 <= i1 ? i1 :
|
|
||||||
( n2 <= i2 ? i2 :
|
|
||||||
( n3 <= i3 ? i3 :
|
|
||||||
( n4 <= i4 ? i4 :
|
|
||||||
( n5 <= i5 ? i5 :
|
|
||||||
( n6 <= i6 ? i6 : i7 )))))));
|
|
||||||
printf("Cuda view array bounds error index %d : FAILED %lu < %lu\n" , r , i , n );
|
|
||||||
Kokkos::Impl::cuda_abort("Cuda view array bounds error");
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
} // namespace Impl
|
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
// Cuda Texture fetches can be performed for 4, 8 and 16 byte objects (int,int2,int4)
|
|
||||||
// Via reinterpret_case this can be used to support all scalar types of those sizes.
|
|
||||||
// Any other scalar type falls back to either normal reads out of global memory,
|
|
||||||
// or using the __ldg intrinsic on Kepler GPUs or newer (Compute Capability >= 3.0)
|
|
||||||
|
|
||||||
template< typename ValueType , typename AliasType >
|
|
||||||
struct CudaTextureFetch {
|
|
||||||
|
|
||||||
::cudaTextureObject_t m_obj ;
|
|
||||||
const ValueType * m_ptr ;
|
|
||||||
int m_offset ;
|
|
||||||
|
|
||||||
// Deference operator pulls through texture object and returns by value
|
|
||||||
template< typename iType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
ValueType operator[]( const iType & i ) const
|
|
||||||
{
|
|
||||||
#if defined( __CUDA_ARCH__ ) && ( 300 <= __CUDA_ARCH__ )
|
|
||||||
AliasType v = tex1Dfetch<AliasType>( m_obj , i + m_offset );
|
|
||||||
return *(reinterpret_cast<ValueType*> (&v));
|
|
||||||
#else
|
|
||||||
return m_ptr[ i ];
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
|
|
||||||
// Pointer to referenced memory
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
operator const ValueType * () const { return m_ptr ; }
|
|
||||||
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaTextureFetch() : m_obj() , m_ptr() , m_offset() {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
~CudaTextureFetch() {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaTextureFetch( const CudaTextureFetch & rhs )
|
|
||||||
: m_obj( rhs.m_obj )
|
|
||||||
, m_ptr( rhs.m_ptr )
|
|
||||||
, m_offset( rhs.m_offset )
|
|
||||||
{}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaTextureFetch( CudaTextureFetch && rhs )
|
|
||||||
: m_obj( rhs.m_obj )
|
|
||||||
, m_ptr( rhs.m_ptr )
|
|
||||||
, m_offset( rhs.m_offset )
|
|
||||||
{}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaTextureFetch & operator = ( const CudaTextureFetch & rhs )
|
|
||||||
{
|
|
||||||
m_obj = rhs.m_obj ;
|
|
||||||
m_ptr = rhs.m_ptr ;
|
|
||||||
m_offset = rhs.m_offset ;
|
|
||||||
return *this ;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaTextureFetch & operator = ( CudaTextureFetch && rhs )
|
|
||||||
{
|
|
||||||
m_obj = rhs.m_obj ;
|
|
||||||
m_ptr = rhs.m_ptr ;
|
|
||||||
m_offset = rhs.m_offset ;
|
|
||||||
return *this ;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Texture object spans the entire allocation.
|
|
||||||
// This handle may view a subset of the allocation, so an offset is required.
|
|
||||||
template< class CudaMemorySpace >
|
|
||||||
inline explicit
|
|
||||||
CudaTextureFetch( const ValueType * const arg_ptr
|
|
||||||
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > & record
|
|
||||||
)
|
|
||||||
: m_obj( record.template attach_texture_object< AliasType >() )
|
|
||||||
, m_ptr( arg_ptr )
|
|
||||||
, m_offset( record.attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
|
|
||||||
{}
|
|
||||||
};
|
|
||||||
|
|
||||||
#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
|
|
||||||
|
|
||||||
template< typename ValueType , typename AliasType >
|
|
||||||
struct CudaLDGFetch {
|
|
||||||
|
|
||||||
const ValueType * m_ptr ;
|
|
||||||
|
|
||||||
template< typename iType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
ValueType operator[]( const iType & i ) const
|
|
||||||
{
|
|
||||||
AliasType v = __ldg(reinterpret_cast<AliasType*>(&m_ptr[i]));
|
|
||||||
return *(reinterpret_cast<ValueType*> (&v));
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
operator const ValueType * () const { return m_ptr ; }
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaLDGFetch() : m_ptr() {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
~CudaLDGFetch() {}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaLDGFetch( const CudaLDGFetch & rhs )
|
|
||||||
: m_ptr( rhs.m_ptr )
|
|
||||||
{}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaLDGFetch( CudaLDGFetch && rhs )
|
|
||||||
: m_ptr( rhs.m_ptr )
|
|
||||||
{}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaLDGFetch & operator = ( const CudaLDGFetch & rhs )
|
|
||||||
{
|
|
||||||
m_ptr = rhs.m_ptr ;
|
|
||||||
return *this ;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
CudaLDGFetch & operator = ( CudaLDGFetch && rhs )
|
|
||||||
{
|
|
||||||
m_ptr = rhs.m_ptr ;
|
|
||||||
return *this ;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class CudaMemorySpace >
|
|
||||||
inline explicit
|
|
||||||
CudaTextureFetch( const ValueType * const arg_ptr
|
|
||||||
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > const &
|
|
||||||
)
|
|
||||||
: m_ptr( arg_data_ptr )
|
|
||||||
{}
|
|
||||||
};
|
|
||||||
|
|
||||||
#endif
|
|
||||||
|
|
||||||
} // namespace Impl
|
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
/** \brief Replace Default ViewDataHandle with Cuda texture fetch specialization
|
|
||||||
* if 'const' value type, CudaSpace and random access.
|
|
||||||
*/
|
|
||||||
template< class Traits >
|
|
||||||
class ViewDataHandle< Traits ,
|
|
||||||
typename std::enable_if<(
|
|
||||||
// Is Cuda memory space
|
|
||||||
( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
|
|
||||||
std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value )
|
|
||||||
&&
|
|
||||||
// Is a trivial const value of 4, 8, or 16 bytes
|
|
||||||
std::is_trivial<typename Traits::const_value_type>::value
|
|
||||||
&&
|
|
||||||
std::is_same<typename Traits::const_value_type,typename Traits::value_type>::value
|
|
||||||
&&
|
|
||||||
( sizeof(typename Traits::const_value_type) == 4 ||
|
|
||||||
sizeof(typename Traits::const_value_type) == 8 ||
|
|
||||||
sizeof(typename Traits::const_value_type) == 16 )
|
|
||||||
&&
|
|
||||||
// Random access trait
|
|
||||||
( Traits::memory_traits::RandomAccess != 0 )
|
|
||||||
)>::type >
|
|
||||||
{
|
|
||||||
public:
|
|
||||||
|
|
||||||
using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
|
|
||||||
|
|
||||||
using value_type = typename Traits::const_value_type ;
|
|
||||||
using return_type = typename Traits::const_value_type ; // NOT a reference
|
|
||||||
|
|
||||||
using alias_type = typename std::conditional< ( sizeof(value_type) == 4 ) , int ,
|
|
||||||
typename std::conditional< ( sizeof(value_type) == 8 ) , ::int2 ,
|
|
||||||
typename std::conditional< ( sizeof(value_type) == 16 ) , ::int4 , void
|
|
||||||
>::type
|
|
||||||
>::type
|
|
||||||
>::type ;
|
|
||||||
|
|
||||||
#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
|
|
||||||
using handle_type = Kokkos::Experimental::Impl::CudaLDGFetch< value_type , alias_type > ;
|
|
||||||
#else
|
|
||||||
using handle_type = Kokkos::Experimental::Impl::CudaTextureFetch< value_type , alias_type > ;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
static handle_type const & assign( handle_type const & arg_handle , track_type const & /* arg_tracker */ )
|
|
||||||
{
|
|
||||||
return arg_handle ;
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
static handle_type assign( value_type * arg_data_ptr, track_type const & arg_tracker )
|
|
||||||
{
|
|
||||||
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
|
|
||||||
// Assignment of texture = non-texture requires creation of a texture object
|
|
||||||
// which can only occur on the host. In addition, 'get_record' is only valid
|
|
||||||
// if called in a host execution space
|
|
||||||
return handle_type( arg_data_ptr , arg_tracker.template get_record< typename Traits::memory_space >() );
|
|
||||||
#else
|
|
||||||
Kokkos::Impl::cuda_abort("Cannot create Cuda texture object from within a Cuda kernel");
|
|
||||||
return handle_type();
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_VIEW_HPP */
|
|
||||||
|
|
||||||
@ -46,6 +46,7 @@
|
|||||||
#include <sstream>
|
#include <sstream>
|
||||||
#include <stdexcept>
|
#include <stdexcept>
|
||||||
#include <algorithm>
|
#include <algorithm>
|
||||||
|
#include <atomic>
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
|
|
||||||
/* only compile this file if CUDA is enabled for Kokkos */
|
/* only compile this file if CUDA is enabled for Kokkos */
|
||||||
@ -58,6 +59,11 @@
|
|||||||
#include <Cuda/Kokkos_Cuda_Internal.hpp>
|
#include <Cuda/Kokkos_Cuda_Internal.hpp>
|
||||||
#include <impl/Kokkos_Error.hpp>
|
#include <impl/Kokkos_Error.hpp>
|
||||||
|
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
#include <impl/Kokkos_Profiling_Interface.hpp>
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|
||||||
@ -65,6 +71,9 @@ namespace Kokkos {
|
|||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
namespace {
|
namespace {
|
||||||
|
|
||||||
|
static std::atomic<int> num_uvm_allocations(0) ;
|
||||||
|
|
||||||
cudaStream_t get_deep_copy_stream() {
|
cudaStream_t get_deep_copy_stream() {
|
||||||
static cudaStream_t s = 0;
|
static cudaStream_t s = 0;
|
||||||
if( s == 0) {
|
if( s == 0) {
|
||||||
@ -119,6 +128,7 @@ void CudaSpace::access_error( const void * const )
|
|||||||
Kokkos::Impl::throw_runtime_exception( msg );
|
Kokkos::Impl::throw_runtime_exception( msg );
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|
||||||
bool CudaUVMSpace::available()
|
bool CudaUVMSpace::available()
|
||||||
@ -133,6 +143,11 @@ bool CudaUVMSpace::available()
|
|||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|
||||||
|
int CudaUVMSpace::number_of_allocations()
|
||||||
|
{
|
||||||
|
return Kokkos::Impl::num_uvm_allocations.load();
|
||||||
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
@ -167,7 +182,18 @@ void * CudaUVMSpace::allocate( const size_t arg_alloc_size ) const
|
|||||||
{
|
{
|
||||||
void * ptr = NULL;
|
void * ptr = NULL;
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
|
enum { max_uvm_allocations = 65536 };
|
||||||
|
|
||||||
|
if ( arg_alloc_size > 0 )
|
||||||
|
{
|
||||||
|
Kokkos::Impl::num_uvm_allocations++;
|
||||||
|
|
||||||
|
if ( Kokkos::Impl::num_uvm_allocations.load() > max_uvm_allocations ) {
|
||||||
|
Kokkos::Impl::throw_runtime_exception( "CudaUVM error: The maximum limit of UVM allocations exceeded (currently 65536)." ) ;
|
||||||
|
}
|
||||||
|
|
||||||
|
CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
|
||||||
|
}
|
||||||
|
|
||||||
return ptr ;
|
return ptr ;
|
||||||
}
|
}
|
||||||
@ -191,7 +217,10 @@ void CudaSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_all
|
|||||||
void CudaUVMSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
|
void CudaUVMSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
|
||||||
{
|
{
|
||||||
try {
|
try {
|
||||||
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
|
if ( arg_alloc_ptr != nullptr ) {
|
||||||
|
Kokkos::Impl::num_uvm_allocations--;
|
||||||
|
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
|
||||||
|
}
|
||||||
} catch(...) {}
|
} catch(...) {}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -202,13 +231,24 @@ void CudaHostPinnedSpace::deallocate( void * const arg_alloc_ptr , const size_t
|
|||||||
} catch(...) {}
|
} catch(...) {}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
constexpr const char* CudaSpace::name() {
|
||||||
|
return m_name;
|
||||||
|
}
|
||||||
|
|
||||||
|
constexpr const char* CudaUVMSpace::name() {
|
||||||
|
return m_name;
|
||||||
|
}
|
||||||
|
|
||||||
|
constexpr const char* CudaHostPinnedSpace::name() {
|
||||||
|
return m_name;
|
||||||
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
SharedAllocationRecord< void , void >
|
SharedAllocationRecord< void , void >
|
||||||
@ -335,6 +375,18 @@ deallocate( SharedAllocationRecord< void , void > * arg_rec )
|
|||||||
SharedAllocationRecord< Kokkos::CudaSpace , void >::
|
SharedAllocationRecord< Kokkos::CudaSpace , void >::
|
||||||
~SharedAllocationRecord()
|
~SharedAllocationRecord()
|
||||||
{
|
{
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
|
||||||
|
SharedAllocationHeader header ;
|
||||||
|
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
|
||||||
|
|
||||||
|
Kokkos::Profiling::deallocateData(
|
||||||
|
Kokkos::Profiling::SpaceHandle(Kokkos::CudaSpace::name()),header.m_label,
|
||||||
|
data(),size());
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
||||||
, SharedAllocationRecord< void , void >::m_alloc_size
|
, SharedAllocationRecord< void , void >::m_alloc_size
|
||||||
);
|
);
|
||||||
@ -343,6 +395,15 @@ SharedAllocationRecord< Kokkos::CudaSpace , void >::
|
|||||||
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
|
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
|
||||||
~SharedAllocationRecord()
|
~SharedAllocationRecord()
|
||||||
{
|
{
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
Kokkos::fence(); //Make sure I can access the label ...
|
||||||
|
Kokkos::Profiling::deallocateData(
|
||||||
|
Kokkos::Profiling::SpaceHandle(Kokkos::CudaUVMSpace::name()),RecordBase::m_alloc_ptr->m_label,
|
||||||
|
data(),size());
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
||||||
, SharedAllocationRecord< void , void >::m_alloc_size
|
, SharedAllocationRecord< void , void >::m_alloc_size
|
||||||
);
|
);
|
||||||
@ -351,6 +412,14 @@ SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
|
|||||||
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
|
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
|
||||||
~SharedAllocationRecord()
|
~SharedAllocationRecord()
|
||||||
{
|
{
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
Kokkos::Profiling::deallocateData(
|
||||||
|
Kokkos::Profiling::SpaceHandle(Kokkos::CudaHostPinnedSpace::name()),RecordBase::m_alloc_ptr->m_label,
|
||||||
|
data(),size());
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
|
||||||
, SharedAllocationRecord< void , void >::m_alloc_size
|
, SharedAllocationRecord< void , void >::m_alloc_size
|
||||||
);
|
);
|
||||||
@ -373,6 +442,12 @@ SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
|
|||||||
, m_tex_obj( 0 )
|
, m_tex_obj( 0 )
|
||||||
, m_space( arg_space )
|
, m_space( arg_space )
|
||||||
{
|
{
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
SharedAllocationHeader header ;
|
SharedAllocationHeader header ;
|
||||||
|
|
||||||
// Fill in the Header information
|
// Fill in the Header information
|
||||||
@ -404,7 +479,12 @@ SharedAllocationRecord( const Kokkos::CudaUVMSpace & arg_space
|
|||||||
, m_tex_obj( 0 )
|
, m_tex_obj( 0 )
|
||||||
, m_space( arg_space )
|
, m_space( arg_space )
|
||||||
{
|
{
|
||||||
// Fill in the Header information, directly accessible via UVM
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
// Fill in the Header information, directly accessible via UVM
|
||||||
|
|
||||||
RecordBase::m_alloc_ptr->m_record = this ;
|
RecordBase::m_alloc_ptr->m_record = this ;
|
||||||
|
|
||||||
@ -430,6 +510,11 @@ SharedAllocationRecord( const Kokkos::CudaHostPinnedSpace & arg_space
|
|||||||
)
|
)
|
||||||
, m_space( arg_space )
|
, m_space( arg_space )
|
||||||
{
|
{
|
||||||
|
#if (KOKKOS_ENABLE_PROFILING)
|
||||||
|
if(Kokkos::Profiling::profileLibraryLoaded()) {
|
||||||
|
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
// Fill in the Header information, directly accessible via UVM
|
// Fill in the Header information, directly accessible via UVM
|
||||||
|
|
||||||
RecordBase::m_alloc_ptr->m_record = this ;
|
RecordBase::m_alloc_ptr->m_record = this ;
|
||||||
@ -502,6 +587,7 @@ void SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
|
|||||||
deallocate_tracked( void * const arg_alloc_ptr )
|
deallocate_tracked( void * const arg_alloc_ptr )
|
||||||
{
|
{
|
||||||
if ( arg_alloc_ptr != 0 ) {
|
if ( arg_alloc_ptr != 0 ) {
|
||||||
|
|
||||||
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
|
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
|
||||||
|
|
||||||
RecordBase::decrement( r );
|
RecordBase::decrement( r );
|
||||||
@ -587,7 +673,7 @@ SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr
|
|||||||
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
|
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
|
||||||
|
|
||||||
if ( ! alloc_ptr || record->m_alloc_ptr != head_cuda ) {
|
if ( ! alloc_ptr || record->m_alloc_ptr != head_cuda ) {
|
||||||
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
|
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
|
||||||
}
|
}
|
||||||
|
|
||||||
#else
|
#else
|
||||||
@ -598,7 +684,7 @@ SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr
|
|||||||
RecordCuda * const record = static_cast< RecordCuda * >( RecordBase::find( & s_root_record , alloc_ptr ) );
|
RecordCuda * const record = static_cast< RecordCuda * >( RecordBase::find( & s_root_record , alloc_ptr ) );
|
||||||
|
|
||||||
if ( record == 0 ) {
|
if ( record == 0 ) {
|
||||||
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
|
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
@ -615,7 +701,7 @@ SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record( void * alloc_
|
|||||||
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
|
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
|
||||||
|
|
||||||
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
|
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
|
||||||
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
|
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
|
||||||
}
|
}
|
||||||
|
|
||||||
return static_cast< RecordCuda * >( h->m_record );
|
return static_cast< RecordCuda * >( h->m_record );
|
||||||
@ -630,7 +716,7 @@ SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record( void *
|
|||||||
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
|
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
|
||||||
|
|
||||||
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
|
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
|
||||||
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
|
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
|
||||||
}
|
}
|
||||||
|
|
||||||
return static_cast< RecordCuda * >( h->m_record );
|
return static_cast< RecordCuda * >( h->m_record );
|
||||||
@ -728,7 +814,6 @@ print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & space , bo
|
|||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|||||||
@ -384,10 +384,10 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
|
|||||||
const bool ok_id = 0 <= cuda_device_id &&
|
const bool ok_id = 0 <= cuda_device_id &&
|
||||||
cuda_device_id < dev_info.m_cudaDevCount ;
|
cuda_device_id < dev_info.m_cudaDevCount ;
|
||||||
|
|
||||||
// Need device capability 2.0 or better
|
// Need device capability 3.0 or better
|
||||||
|
|
||||||
const bool ok_dev = ok_id &&
|
const bool ok_dev = ok_id &&
|
||||||
( 2 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
|
( 3 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
|
||||||
0 <= dev_info.m_cudaProp[ cuda_device_id ].minor );
|
0 <= dev_info.m_cudaProp[ cuda_device_id ].minor );
|
||||||
|
|
||||||
if ( ok_init && ok_dev ) {
|
if ( ok_init && ok_dev ) {
|
||||||
@ -444,7 +444,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
|
|||||||
//----------------------------------
|
//----------------------------------
|
||||||
// Maximum number of blocks:
|
// Maximum number of blocks:
|
||||||
|
|
||||||
m_maxBlock = m_cudaArch < 300 ? 65535 : cudaProp.maxGridSize[0] ;
|
m_maxBlock = cudaProp.maxGridSize[0] ;
|
||||||
|
|
||||||
//----------------------------------
|
//----------------------------------
|
||||||
|
|
||||||
@ -495,7 +495,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
|
|||||||
msg << dev_info.m_cudaProp[ cuda_device_id ].major ;
|
msg << dev_info.m_cudaProp[ cuda_device_id ].major ;
|
||||||
msg << "." ;
|
msg << "." ;
|
||||||
msg << dev_info.m_cudaProp[ cuda_device_id ].minor ;
|
msg << dev_info.m_cudaProp[ cuda_device_id ].minor ;
|
||||||
msg << " has insufficient capability, required 2.0 or better" ;
|
msg << " has insufficient capability, required 3.0 or better" ;
|
||||||
}
|
}
|
||||||
Kokkos::Impl::throw_runtime_exception( msg.str() );
|
Kokkos::Impl::throw_runtime_exception( msg.str() );
|
||||||
}
|
}
|
||||||
|
|||||||
@ -95,27 +95,42 @@ private:
|
|||||||
|
|
||||||
public:
|
public:
|
||||||
|
|
||||||
#if defined( __CUDA_ARCH__ )
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
|
||||||
__device__ inline
|
|
||||||
const execution_space::scratch_memory_space & team_shmem() const
|
const execution_space::scratch_memory_space & team_shmem() const
|
||||||
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
|
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
|
||||||
__device__ inline
|
KOKKOS_INLINE_FUNCTION
|
||||||
const execution_space::scratch_memory_space & team_scratch(const int& level) const
|
const execution_space::scratch_memory_space & team_scratch(const int& level) const
|
||||||
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
|
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
|
||||||
__device__ inline
|
KOKKOS_INLINE_FUNCTION
|
||||||
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
|
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
|
||||||
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
|
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
|
||||||
|
|
||||||
__device__ inline int league_rank() const { return m_league_rank ; }
|
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
|
||||||
__device__ inline int league_size() const { return m_league_size ; }
|
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
|
||||||
__device__ inline int team_rank() const { return threadIdx.y ; }
|
KOKKOS_INLINE_FUNCTION int team_rank() const {
|
||||||
__device__ inline int team_size() const { return blockDim.y ; }
|
#ifdef __CUDA_ARCH__
|
||||||
|
return threadIdx.y ;
|
||||||
|
#else
|
||||||
|
return 1;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
KOKKOS_INLINE_FUNCTION int team_size() const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
return blockDim.y ;
|
||||||
|
#else
|
||||||
|
return 1;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
__device__ inline void team_barrier() const { __syncthreads(); }
|
KOKKOS_INLINE_FUNCTION void team_barrier() const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
__syncthreads();
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
template<class ValueType>
|
template<class ValueType>
|
||||||
__device__ inline void team_broadcast(ValueType& value, const int& thread_id) const {
|
KOKKOS_INLINE_FUNCTION void team_broadcast(ValueType& value, const int& thread_id) const {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
__shared__ ValueType sh_val;
|
__shared__ ValueType sh_val;
|
||||||
if(threadIdx.x == 0 && threadIdx.y == thread_id) {
|
if(threadIdx.x == 0 && threadIdx.y == thread_id) {
|
||||||
sh_val = value;
|
sh_val = value;
|
||||||
@ -123,26 +138,17 @@ public:
|
|||||||
team_barrier();
|
team_barrier();
|
||||||
value = sh_val;
|
value = sh_val;
|
||||||
team_barrier();
|
team_barrier();
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
#ifdef KOKKOS_HAVE_CXX11
|
|
||||||
template< class ValueType, class JoinOp >
|
template< class ValueType, class JoinOp >
|
||||||
__device__ inline
|
KOKKOS_INLINE_FUNCTION
|
||||||
typename JoinOp::value_type team_reduce( const ValueType & value
|
typename JoinOp::value_type team_reduce( const ValueType & value
|
||||||
, const JoinOp & op_in ) const
|
, const JoinOp & op_in ) const {
|
||||||
{
|
#ifdef __CUDA_ARCH__
|
||||||
typedef JoinLambdaAdapter<ValueType,JoinOp> JoinOpFunctor ;
|
typedef JoinLambdaAdapter<ValueType,JoinOp> JoinOpFunctor ;
|
||||||
const JoinOpFunctor op(op_in);
|
const JoinOpFunctor op(op_in);
|
||||||
ValueType * const base_data = (ValueType *) m_team_reduce ;
|
ValueType * const base_data = (ValueType *) m_team_reduce ;
|
||||||
#else
|
|
||||||
template< class JoinOp >
|
|
||||||
__device__ inline
|
|
||||||
typename JoinOp::value_type team_reduce( const typename JoinOp::value_type & value
|
|
||||||
, const JoinOp & op ) const
|
|
||||||
{
|
|
||||||
typedef JoinOp JoinOpFunctor ;
|
|
||||||
typename JoinOp::value_type * const base_data = (typename JoinOp::value_type *) m_team_reduce ;
|
|
||||||
#endif
|
|
||||||
|
|
||||||
__syncthreads(); // Don't write in to shared data until all threads have entered this function
|
__syncthreads(); // Don't write in to shared data until all threads have entered this function
|
||||||
|
|
||||||
@ -153,6 +159,9 @@ public:
|
|||||||
Impl::cuda_intra_block_reduce_scan<false,JoinOpFunctor,void>( op , base_data );
|
Impl::cuda_intra_block_reduce_scan<false,JoinOpFunctor,void>( op , base_data );
|
||||||
|
|
||||||
return base_data[ blockDim.y - 1 ];
|
return base_data[ blockDim.y - 1 ];
|
||||||
|
#else
|
||||||
|
return typename JoinOp::value_type();
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
|
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
|
||||||
@ -165,8 +174,8 @@ public:
|
|||||||
* non-deterministic.
|
* non-deterministic.
|
||||||
*/
|
*/
|
||||||
template< typename Type >
|
template< typename Type >
|
||||||
__device__ inline Type team_scan( const Type & value , Type * const global_accum ) const
|
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const {
|
||||||
{
|
#ifdef __CUDA_ARCH__
|
||||||
Type * const base_data = (Type *) m_team_reduce ;
|
Type * const base_data = (Type *) m_team_reduce ;
|
||||||
|
|
||||||
__syncthreads(); // Don't write in to shared data until all threads have entered this function
|
__syncthreads(); // Don't write in to shared data until all threads have entered this function
|
||||||
@ -186,6 +195,9 @@ public:
|
|||||||
}
|
}
|
||||||
|
|
||||||
return base_data[ threadIdx.y ];
|
return base_data[ threadIdx.y ];
|
||||||
|
#else
|
||||||
|
return Type();
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
|
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
|
||||||
@ -194,13 +206,14 @@ public:
|
|||||||
* reduction_total = dev.team_scan( value ) + value ;
|
* reduction_total = dev.team_scan( value ) + value ;
|
||||||
*/
|
*/
|
||||||
template< typename Type >
|
template< typename Type >
|
||||||
__device__ inline Type team_scan( const Type & value ) const
|
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
|
||||||
{ return this->template team_scan<Type>( value , 0 ); }
|
return this->template team_scan<Type>( value , 0 );
|
||||||
|
}
|
||||||
|
|
||||||
//----------------------------------------
|
//----------------------------------------
|
||||||
// Private for the driver
|
// Private for the driver
|
||||||
|
|
||||||
__device__ inline
|
KOKKOS_INLINE_FUNCTION
|
||||||
CudaTeamMember( void * shared
|
CudaTeamMember( void * shared
|
||||||
, const int shared_begin
|
, const int shared_begin
|
||||||
, const int shared_size
|
, const int shared_size
|
||||||
@ -214,47 +227,6 @@ public:
|
|||||||
, m_league_size( arg_league_size )
|
, m_league_size( arg_league_size )
|
||||||
{}
|
{}
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
const execution_space::scratch_memory_space & team_shmem() const
|
|
||||||
{ return m_team_shared.set_team_thread_mode(0, 1,0) ; }
|
|
||||||
const execution_space::scratch_memory_space & team_scratch(const int& level) const
|
|
||||||
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
|
|
||||||
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
|
|
||||||
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
|
|
||||||
|
|
||||||
int league_rank() const {return 0;}
|
|
||||||
int league_size() const {return 1;}
|
|
||||||
int team_rank() const {return 0;}
|
|
||||||
int team_size() const {return 1;}
|
|
||||||
|
|
||||||
void team_barrier() const {}
|
|
||||||
template<class ValueType>
|
|
||||||
void team_broadcast(ValueType& value, const int& thread_id) const {}
|
|
||||||
|
|
||||||
template< class JoinOp >
|
|
||||||
typename JoinOp::value_type team_reduce( const typename JoinOp::value_type & value
|
|
||||||
, const JoinOp & op ) const {return typename JoinOp::value_type();}
|
|
||||||
|
|
||||||
template< typename Type >
|
|
||||||
Type team_scan( const Type & value , Type * const global_accum ) const {return Type();}
|
|
||||||
|
|
||||||
template< typename Type >
|
|
||||||
Type team_scan( const Type & value ) const {return Type();}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// Private for the driver
|
|
||||||
|
|
||||||
CudaTeamMember( void * shared
|
|
||||||
, const int shared_begin
|
|
||||||
, const int shared_end
|
|
||||||
, void* scratch_level_1_ptr
|
|
||||||
, const int scratch_level_1_size
|
|
||||||
, const int arg_league_rank
|
|
||||||
, const int arg_league_size );
|
|
||||||
|
|
||||||
#endif /* #if ! defined( __CUDA_ARCH__ ) */
|
|
||||||
|
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
@ -847,7 +819,6 @@ public:
|
|||||||
|
|
||||||
const int shmem = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( m_functor , block.y );
|
const int shmem = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( m_functor , block.y );
|
||||||
|
|
||||||
|
|
||||||
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem ); // copy to device and execute
|
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem ); // copy to device and execute
|
||||||
|
|
||||||
Cuda::fence();
|
Cuda::fence();
|
||||||
@ -925,7 +896,6 @@ private:
|
|||||||
typedef typename ValueTraits::reference_type reference_type ;
|
typedef typename ValueTraits::reference_type reference_type ;
|
||||||
typedef typename ValueTraits::value_type value_type ;
|
typedef typename ValueTraits::value_type value_type ;
|
||||||
|
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
|
||||||
typedef FunctorType functor_type ;
|
typedef FunctorType functor_type ;
|
||||||
@ -937,7 +907,6 @@ private:
|
|||||||
typedef double DummyShflReductionType;
|
typedef double DummyShflReductionType;
|
||||||
typedef int DummySHMEMReductionType;
|
typedef int DummySHMEMReductionType;
|
||||||
|
|
||||||
|
|
||||||
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.y == blockDim.z == 1
|
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.y == blockDim.z == 1
|
||||||
// shared memory utilization:
|
// shared memory utilization:
|
||||||
//
|
//
|
||||||
@ -1058,29 +1027,37 @@ public:
|
|||||||
inline
|
inline
|
||||||
void execute()
|
void execute()
|
||||||
{
|
{
|
||||||
const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
|
const int nwork = m_league_size * m_team_size ;
|
||||||
:std::min( m_league_size , m_team_size );
|
if ( nwork ) {
|
||||||
|
const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
|
||||||
|
:std::min( m_league_size , m_team_size );
|
||||||
|
|
||||||
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
|
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
|
||||||
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
|
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
|
||||||
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
|
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
|
||||||
|
|
||||||
const dim3 block( m_vector_size , m_team_size , 1 );
|
const dim3 block( m_vector_size , m_team_size , 1 );
|
||||||
const dim3 grid( block_count , 1 , 1 );
|
const dim3 grid( block_count , 1 , 1 );
|
||||||
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
|
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
|
||||||
|
|
||||||
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
|
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
|
||||||
|
|
||||||
Cuda::fence();
|
Cuda::fence();
|
||||||
|
|
||||||
if ( m_result_ptr ) {
|
if ( m_result_ptr ) {
|
||||||
if ( m_unified_space ) {
|
if ( m_unified_space ) {
|
||||||
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
|
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
|
||||||
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
|
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
|
||||||
|
DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
|
||||||
|
}
|
||||||
}
|
}
|
||||||
else {
|
}
|
||||||
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
|
else {
|
||||||
DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
|
if (m_result_ptr) {
|
||||||
|
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -1106,9 +1083,18 @@ public:
|
|||||||
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
|
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
|
||||||
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
|
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
|
||||||
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
|
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
|
||||||
arg_policy.vector_length() )
|
arg_policy.vector_length() )
|
||||||
, m_vector_size( arg_policy.vector_length() )
|
, m_vector_size( arg_policy.vector_length() )
|
||||||
, m_scratch_size{arg_policy.scratch_size(0,m_team_size),arg_policy.scratch_size(1,m_team_size)}
|
, m_scratch_size{
|
||||||
|
arg_policy.scratch_size(0,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
|
||||||
|
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
|
||||||
|
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
|
||||||
|
arg_policy.vector_length() )
|
||||||
|
), arg_policy.scratch_size(1,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
|
||||||
|
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
|
||||||
|
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
|
||||||
|
arg_policy.vector_length() )
|
||||||
|
)}
|
||||||
{
|
{
|
||||||
// Return Init value if the number of worksets is zero
|
// Return Init value if the number of worksets is zero
|
||||||
if( arg_policy.league_size() == 0) {
|
if( arg_policy.league_size() == 0) {
|
||||||
@ -1342,7 +1328,7 @@ private:
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Scan block values into locations shared_data[1..blockDim.y]
|
// Scan block values into locations shared_data[1..blockDim.y]
|
||||||
cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , ValueTraits::pointer_type(shared_data+word_count.value) );
|
cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , typename ValueTraits::pointer_type(shared_data+word_count.value) );
|
||||||
|
|
||||||
{
|
{
|
||||||
size_type * const block_total = shared_data + word_count.value * blockDim.y ;
|
size_type * const block_total = shared_data + word_count.value * blockDim.y ;
|
||||||
@ -1490,18 +1476,30 @@ namespace Impl {
|
|||||||
|
|
||||||
#ifdef __CUDA_ARCH__
|
#ifdef __CUDA_ARCH__
|
||||||
__device__ inline
|
__device__ inline
|
||||||
ThreadVectorRangeBoundariesStruct (const CudaTeamMember& thread, const iType& count):
|
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
|
||||||
start( threadIdx.x ),
|
start( threadIdx.x ),
|
||||||
end( count ),
|
end( count ),
|
||||||
increment( blockDim.x )
|
increment( blockDim.x )
|
||||||
{}
|
{}
|
||||||
|
__device__ inline
|
||||||
|
ThreadVectorRangeBoundariesStruct (const iType& count):
|
||||||
|
start( threadIdx.x ),
|
||||||
|
end( count ),
|
||||||
|
increment( blockDim.x )
|
||||||
|
{}
|
||||||
#else
|
#else
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
ThreadVectorRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
|
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
|
||||||
start( 0 ),
|
start( 0 ),
|
||||||
end( count ),
|
end( count ),
|
||||||
increment( 1 )
|
increment( 1 )
|
||||||
{}
|
{}
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
ThreadVectorRangeBoundariesStruct (const iType& count):
|
||||||
|
start( 0 ),
|
||||||
|
end( count ),
|
||||||
|
increment( 1 )
|
||||||
|
{}
|
||||||
#endif
|
#endif
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -1509,22 +1507,24 @@ namespace Impl {
|
|||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
|
||||||
TeamThreadRange(const Impl::CudaTeamMember& thread, const iType& count) {
|
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType1, typename iType2 >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
TeamThreadRange(const Impl::CudaTeamMember& thread, const iType& begin, const iType& end) {
|
Impl::CudaTeamMember >
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>(thread,begin,end);
|
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
|
||||||
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
|
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
|
||||||
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
|
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
|
||||||
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
|
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -1571,9 +1571,10 @@ void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::Cud
|
|||||||
lambda(i,result);
|
lambda(i,result);
|
||||||
}
|
}
|
||||||
|
|
||||||
Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src) { dst+=src; });
|
Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
|
||||||
Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src) { dst+=src; });
|
{ dst+=src; });
|
||||||
|
Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
|
||||||
|
{ dst+=src; });
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -1923,4 +1924,3 @@ namespace Impl {
|
|||||||
#endif /* defined( __CUDACC__ ) */
|
#endif /* defined( __CUDACC__ ) */
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -139,6 +139,7 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
|
|||||||
typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type const result,
|
typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type const result,
|
||||||
Cuda::size_type * const m_scratch_flags,
|
Cuda::size_type * const m_scratch_flags,
|
||||||
const int max_active_thread = blockDim.y) {
|
const int max_active_thread = blockDim.y) {
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
typedef typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type pointer_type;
|
typedef typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type pointer_type;
|
||||||
typedef typename FunctorValueTraits< FunctorType , ArgTag >::value_type value_type;
|
typedef typename FunctorValueTraits< FunctorType , ArgTag >::value_type value_type;
|
||||||
|
|
||||||
@ -213,6 +214,9 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
|
|||||||
|
|
||||||
//The last block has in its thread=0 the global reduction value through "value"
|
//The last block has in its thread=0 the global reduction value through "value"
|
||||||
return last_block;
|
return last_block;
|
||||||
|
#else
|
||||||
|
return true;
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -290,10 +294,10 @@ void cuda_intra_block_reduce_scan( const FunctorType & functor ,
|
|||||||
|
|
||||||
if ( ! ( rtid_inter + n < blockDim.y ) ) n = 0 ;
|
if ( ! ( rtid_inter + n < blockDim.y ) ) n = 0 ;
|
||||||
|
|
||||||
BLOCK_SCAN_STEP(tdata_inter,n,8)
|
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,8)
|
||||||
BLOCK_SCAN_STEP(tdata_inter,n,7)
|
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,7)
|
||||||
BLOCK_SCAN_STEP(tdata_inter,n,6)
|
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,6)
|
||||||
BLOCK_SCAN_STEP(tdata_inter,n,5)
|
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,5)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -308,12 +312,19 @@ void cuda_intra_block_reduce_scan( const FunctorType & functor ,
|
|||||||
( rtid_intra & 16 ) ? 16 : 0 ))));
|
( rtid_intra & 16 ) ? 16 : 0 ))));
|
||||||
|
|
||||||
if ( ! ( rtid_intra + n < blockDim.y ) ) n = 0 ;
|
if ( ! ( rtid_intra + n < blockDim.y ) ) n = 0 ;
|
||||||
|
#ifdef KOKKOS_CUDA_CLANG_WORKAROUND
|
||||||
|
BLOCK_SCAN_STEP(tdata_intra,n,4) __syncthreads();//__threadfence_block();
|
||||||
|
BLOCK_SCAN_STEP(tdata_intra,n,3) __syncthreads();//__threadfence_block();
|
||||||
|
BLOCK_SCAN_STEP(tdata_intra,n,2) __syncthreads();//__threadfence_block();
|
||||||
|
BLOCK_SCAN_STEP(tdata_intra,n,1) __syncthreads();//__threadfence_block();
|
||||||
|
BLOCK_SCAN_STEP(tdata_intra,n,0) __syncthreads();
|
||||||
|
#else
|
||||||
BLOCK_SCAN_STEP(tdata_intra,n,4) __threadfence_block();
|
BLOCK_SCAN_STEP(tdata_intra,n,4) __threadfence_block();
|
||||||
BLOCK_SCAN_STEP(tdata_intra,n,3) __threadfence_block();
|
BLOCK_SCAN_STEP(tdata_intra,n,3) __threadfence_block();
|
||||||
BLOCK_SCAN_STEP(tdata_intra,n,2) __threadfence_block();
|
BLOCK_SCAN_STEP(tdata_intra,n,2) __threadfence_block();
|
||||||
BLOCK_SCAN_STEP(tdata_intra,n,1) __threadfence_block();
|
BLOCK_SCAN_STEP(tdata_intra,n,1) __threadfence_block();
|
||||||
BLOCK_SCAN_STEP(tdata_intra,n,0)
|
BLOCK_SCAN_STEP(tdata_intra,n,0) __threadfence_block();
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
#undef BLOCK_SCAN_STEP
|
#undef BLOCK_SCAN_STEP
|
||||||
|
|||||||
@ -43,7 +43,7 @@
|
|||||||
|
|
||||||
#include <Kokkos_Core.hpp>
|
#include <Kokkos_Core.hpp>
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
#include <impl/Kokkos_TaskQueue_impl.hpp>
|
#include <impl/Kokkos_TaskQueue_impl.hpp>
|
||||||
|
|
||||||
@ -174,6 +174,6 @@ printf("cuda_task_queue_execute after\n");
|
|||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -44,7 +44,7 @@
|
|||||||
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
|
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
|
||||||
#define KOKKOS_IMPL_CUDA_TASK_HPP
|
#define KOKKOS_IMPL_CUDA_TASK_HPP
|
||||||
|
|
||||||
#if defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -99,7 +99,7 @@ public:
|
|||||||
extern template class TaskQueue< Kokkos::Cuda > ;
|
extern template class TaskQueue< Kokkos::Cuda > ;
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
/**\brief Impl::TaskExec<Cuda> is the TaskPolicy<Cuda>::member_type
|
/**\brief Impl::TaskExec<Cuda> is the TaskScheduler<Cuda>::member_type
|
||||||
* passed to tasks running in a Cuda space.
|
* passed to tasks running in a Cuda space.
|
||||||
*
|
*
|
||||||
* Cuda thread blocks for tasking are dimensioned:
|
* Cuda thread blocks for tasking are dimensioned:
|
||||||
@ -234,19 +234,23 @@ namespace Kokkos {
|
|||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
|
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >
|
||||||
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
|
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & count )
|
||||||
, const iType & count )
|
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType1, typename iType2>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
|
Impl::TeamThreadRangeBoundariesStruct
|
||||||
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & start , const iType & end )
|
< typename std::common_type<iType1,iType2>::type
|
||||||
|
, Impl::TaskExec< Kokkos::Cuda > >
|
||||||
|
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
|
||||||
|
, const iType1 & begin, const iType2 & end )
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Cuda > >(thread,start,end);
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >(
|
||||||
|
thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
@ -514,6 +518,6 @@ void parallel_scan
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
#endif /* #ifndef KOKKOS_IMPL_CUDA_TASK_HPP */
|
#endif /* #ifndef KOKKOS_IMPL_CUDA_TASK_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -1,932 +0,0 @@
|
|||||||
/*
|
|
||||||
//@HEADER
|
|
||||||
// ************************************************************************
|
|
||||||
//
|
|
||||||
// Kokkos v. 2.0
|
|
||||||
// Copyright (2014) Sandia Corporation
|
|
||||||
//
|
|
||||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
|
||||||
// the U.S. Government retains certain rights in this software.
|
|
||||||
//
|
|
||||||
// Redistribution and use in source and binary forms, with or without
|
|
||||||
// modification, are permitted provided that the following conditions are
|
|
||||||
// met:
|
|
||||||
//
|
|
||||||
// 1. Redistributions of source code must retain the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer.
|
|
||||||
//
|
|
||||||
// 2. Redistributions in binary form must reproduce the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer in the
|
|
||||||
// documentation and/or other materials provided with the distribution.
|
|
||||||
//
|
|
||||||
// 3. Neither the name of the Corporation nor the names of the
|
|
||||||
// contributors may be used to endorse or promote products derived from
|
|
||||||
// this software without specific prior written permission.
|
|
||||||
//
|
|
||||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
|
||||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
||||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
||||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
|
||||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
||||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
||||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
||||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
||||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
||||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
||||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
//
|
|
||||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
|
||||||
//
|
|
||||||
// ************************************************************************
|
|
||||||
//@HEADER
|
|
||||||
*/
|
|
||||||
|
|
||||||
// Experimental unified task-data parallel manycore LDRD
|
|
||||||
|
|
||||||
#include <stdio.h>
|
|
||||||
#include <iostream>
|
|
||||||
#include <sstream>
|
|
||||||
#include <Kokkos_Core.hpp>
|
|
||||||
#include <Cuda/Kokkos_Cuda_TaskPolicy.hpp>
|
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
|
|
||||||
|
|
||||||
// #define DETAILED_PRINT
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
#define QLOCK reinterpret_cast<void*>( ~((uintptr_t)0) )
|
|
||||||
#define QDENIED reinterpret_cast<void*>( ~((uintptr_t)0) - 1 )
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
void CudaTaskPolicyQueue::Destroy::destroy_shared_allocation()
|
|
||||||
{
|
|
||||||
// Verify the queue is empty
|
|
||||||
|
|
||||||
if ( m_policy->m_count_ready ||
|
|
||||||
m_policy->m_team[0] ||
|
|
||||||
m_policy->m_team[1] ||
|
|
||||||
m_policy->m_team[2] ||
|
|
||||||
m_policy->m_serial[0] ||
|
|
||||||
m_policy->m_serial[1] ||
|
|
||||||
m_policy->m_serial[2] ) {
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue ERROR : Attempt to destroy non-empty queue" );
|
|
||||||
}
|
|
||||||
|
|
||||||
m_policy->~CudaTaskPolicyQueue();
|
|
||||||
|
|
||||||
Kokkos::Cuda::fence();
|
|
||||||
}
|
|
||||||
|
|
||||||
CudaTaskPolicyQueue::
|
|
||||||
~CudaTaskPolicyQueue()
|
|
||||||
{
|
|
||||||
}
|
|
||||||
|
|
||||||
CudaTaskPolicyQueue::
|
|
||||||
CudaTaskPolicyQueue
|
|
||||||
( const unsigned arg_task_max_count
|
|
||||||
, const unsigned arg_task_max_size
|
|
||||||
, const unsigned arg_task_default_dependence_capacity
|
|
||||||
, const unsigned arg_team_size
|
|
||||||
)
|
|
||||||
: m_space( Kokkos::CudaUVMSpace()
|
|
||||||
, arg_task_max_size * arg_task_max_count * 1.2
|
|
||||||
, 16 /* log2(superblock size) */
|
|
||||||
)
|
|
||||||
, m_team { 0 , 0 , 0 }
|
|
||||||
, m_serial { 0 , 0 , 0 }
|
|
||||||
, m_team_size( 32 /* 1 warps */ )
|
|
||||||
, m_default_dependence_capacity( arg_task_default_dependence_capacity )
|
|
||||||
, m_count_ready(0)
|
|
||||||
{
|
|
||||||
constexpr int max_team_size = 32 * 16 /* 16 warps */ ;
|
|
||||||
|
|
||||||
const int target_team_size =
|
|
||||||
std::min( int(arg_team_size) , max_team_size );
|
|
||||||
|
|
||||||
while ( m_team_size < target_team_size ) { m_team_size *= 2 ; }
|
|
||||||
}
|
|
||||||
|
|
||||||
//-----------------------------------------------------------------------
|
|
||||||
// Called by each block & thread
|
|
||||||
|
|
||||||
__device__
|
|
||||||
void Kokkos::Experimental::Impl::CudaTaskPolicyQueue::driver()
|
|
||||||
{
|
|
||||||
task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
|
|
||||||
|
|
||||||
#define IS_TEAM_LEAD ( threadIdx.x == 0 && threadIdx.y == 0 )
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
if ( IS_TEAM_LEAD ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::driver() begin on %d with count %d\n"
|
|
||||||
, blockIdx.x , m_count_ready );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
// Each thread block must iterate this loop synchronously
|
|
||||||
// to insure team-execution of team-task
|
|
||||||
|
|
||||||
__shared__ task_root_type * team_task ;
|
|
||||||
|
|
||||||
__syncthreads();
|
|
||||||
|
|
||||||
do {
|
|
||||||
|
|
||||||
if ( IS_TEAM_LEAD ) {
|
|
||||||
if ( 0 == m_count_ready ) {
|
|
||||||
team_task = q_denied ; // All queues are empty and no running tasks
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
team_task = 0 ;
|
|
||||||
for ( int i = 0 ; i < int(NPRIORITY) && 0 == team_task ; ++i ) {
|
|
||||||
if ( ( i < 2 /* regular queue */ )
|
|
||||||
|| ( ! m_space.is_empty() /* waiting for memory */ ) ) {
|
|
||||||
team_task = pop_ready_task( & m_team[i] );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
__syncthreads();
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
if ( IS_TEAM_LEAD && 0 != team_task ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::driver() (%d) team_task(0x%lx)\n"
|
|
||||||
, blockIdx.x
|
|
||||||
, (unsigned long) team_task );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
// team_task == q_denied if all queues are empty
|
|
||||||
// team_task == 0 if no team tasks available
|
|
||||||
|
|
||||||
if ( q_denied != team_task ) {
|
|
||||||
if ( 0 != team_task ) {
|
|
||||||
|
|
||||||
Kokkos::Impl::CudaTeamMember
|
|
||||||
member( kokkos_impl_cuda_shared_memory<void>()
|
|
||||||
, 16 /* shared_begin */
|
|
||||||
, team_task->m_shmem_size /* shared size */
|
|
||||||
, 0 /* scratch level 1 pointer */
|
|
||||||
, 0 /* scratch level 1 size */
|
|
||||||
, 0 /* league rank */
|
|
||||||
, 1 /* league size */
|
|
||||||
);
|
|
||||||
|
|
||||||
(*team_task->m_team)( team_task , member );
|
|
||||||
|
|
||||||
// A __synthreads was called and if completed the
|
|
||||||
// functor was destroyed.
|
|
||||||
|
|
||||||
if ( IS_TEAM_LEAD ) {
|
|
||||||
complete_executed_task( team_task );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
// One thread of one warp performs this serial task
|
|
||||||
if ( threadIdx.x == 0 &&
|
|
||||||
0 == ( threadIdx.y % 32 ) ) {
|
|
||||||
task_root_type * task = 0 ;
|
|
||||||
for ( int i = 0 ; i < int(NPRIORITY) && 0 == task ; ++i ) {
|
|
||||||
if ( ( i < 2 /* regular queue */ )
|
|
||||||
|| ( ! m_space.is_empty() /* waiting for memory */ ) ) {
|
|
||||||
task = pop_ready_task( & m_serial[i] );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
if ( 0 != task ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::driver() (%2d)(%d) single task(0x%lx)\n"
|
|
||||||
, blockIdx.x
|
|
||||||
, threadIdx.y
|
|
||||||
, (unsigned long) task );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
if ( task ) {
|
|
||||||
(*task->m_serial)( task );
|
|
||||||
complete_executed_task( task );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
__syncthreads();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} while ( q_denied != team_task );
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
if ( IS_TEAM_LEAD ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::driver() end on %d with count %d\n"
|
|
||||||
, blockIdx.x , m_count_ready );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
#undef IS_TEAM_LEAD
|
|
||||||
}
|
|
||||||
|
|
||||||
//-----------------------------------------------------------------------
|
|
||||||
|
|
||||||
__device__
|
|
||||||
CudaTaskPolicyQueue::task_root_type *
|
|
||||||
CudaTaskPolicyQueue::pop_ready_task(
|
|
||||||
CudaTaskPolicyQueue::task_root_type * volatile * const queue )
|
|
||||||
{
|
|
||||||
task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
|
|
||||||
task_root_type * task = 0 ;
|
|
||||||
task_root_type * const task_claim = *queue ;
|
|
||||||
|
|
||||||
if ( ( q_lock != task_claim ) && ( 0 != task_claim ) ) {
|
|
||||||
|
|
||||||
// Queue is not locked and not null, try to claim head of queue.
|
|
||||||
// Is a race among threads to claim the queue.
|
|
||||||
|
|
||||||
if ( task_claim == atomic_compare_exchange(queue,task_claim,q_lock) ) {
|
|
||||||
|
|
||||||
// Aquired the task which must be in the waiting state.
|
|
||||||
|
|
||||||
const int claim_state =
|
|
||||||
atomic_compare_exchange( & task_claim->m_state
|
|
||||||
, int(TASK_STATE_WAITING)
|
|
||||||
, int(TASK_STATE_EXECUTING) );
|
|
||||||
|
|
||||||
task_root_type * lock_verify = 0 ;
|
|
||||||
|
|
||||||
if ( claim_state == int(TASK_STATE_WAITING) ) {
|
|
||||||
|
|
||||||
// Transitioned this task from waiting to executing
|
|
||||||
// Update the queue to the next entry and release the lock
|
|
||||||
|
|
||||||
task_root_type * const next =
|
|
||||||
*((task_root_type * volatile *) & task_claim->m_next );
|
|
||||||
|
|
||||||
*((task_root_type * volatile *) & task_claim->m_next ) = 0 ;
|
|
||||||
|
|
||||||
lock_verify = atomic_compare_exchange( queue , q_lock , next );
|
|
||||||
}
|
|
||||||
|
|
||||||
if ( ( claim_state != int(TASK_STATE_WAITING) ) |
|
|
||||||
( q_lock != lock_verify ) ) {
|
|
||||||
|
|
||||||
printf( "CudaTaskPolicyQueue::pop_ready_task(0x%lx) task(0x%lx) state(%d) ERROR %s\n"
|
|
||||||
, (unsigned long) queue
|
|
||||||
, (unsigned long) task
|
|
||||||
, claim_state
|
|
||||||
, ( claim_state != int(TASK_STATE_WAITING)
|
|
||||||
? "NOT WAITING"
|
|
||||||
: "UNLOCK" ) );
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue::pop_ready_task");
|
|
||||||
}
|
|
||||||
|
|
||||||
task = task_claim ;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return task ;
|
|
||||||
}
|
|
||||||
|
|
||||||
//-----------------------------------------------------------------------
|
|
||||||
|
|
||||||
__device__
|
|
||||||
void CudaTaskPolicyQueue::complete_executed_task(
|
|
||||||
CudaTaskPolicyQueue::task_root_type * task )
|
|
||||||
{
|
|
||||||
task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
|
|
||||||
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
printf( "CudaTaskPolicyQueue::complete_executed_task(0x%lx) state(%d) (%d)(%d,%d)\n"
|
|
||||||
, (unsigned long) task
|
|
||||||
, task->m_state
|
|
||||||
, blockIdx.x
|
|
||||||
, threadIdx.x
|
|
||||||
, threadIdx.y
|
|
||||||
);
|
|
||||||
#endif
|
|
||||||
|
|
||||||
// State is either executing or if respawned then waiting,
|
|
||||||
// try to transition from executing to complete.
|
|
||||||
// Reads the current value.
|
|
||||||
|
|
||||||
const int state_old =
|
|
||||||
atomic_compare_exchange( & task->m_state
|
|
||||||
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
|
|
||||||
, int(Kokkos::Experimental::TASK_STATE_COMPLETE) );
|
|
||||||
|
|
||||||
if ( int(Kokkos::Experimental::TASK_STATE_WAITING) == state_old ) {
|
|
||||||
/* Task requested a respawn so reschedule it */
|
|
||||||
schedule_task( task , false /* not initial spawn */ );
|
|
||||||
}
|
|
||||||
else if ( int(Kokkos::Experimental::TASK_STATE_EXECUTING) == state_old ) {
|
|
||||||
/* Task is complete */
|
|
||||||
|
|
||||||
// Clear dependences of this task before locking wait queue
|
|
||||||
|
|
||||||
task->clear_dependence();
|
|
||||||
|
|
||||||
// Stop other tasks from adding themselves to this task's wait queue.
|
|
||||||
// The wait queue is updated concurrently so guard with an atomic.
|
|
||||||
|
|
||||||
task_root_type * wait_queue = *((task_root_type * volatile *) & task->m_wait );
|
|
||||||
task_root_type * wait_queue_old = 0 ;
|
|
||||||
|
|
||||||
do {
|
|
||||||
wait_queue_old = wait_queue ;
|
|
||||||
wait_queue = atomic_compare_exchange( & task->m_wait , wait_queue_old , q_denied );
|
|
||||||
} while ( wait_queue_old != wait_queue );
|
|
||||||
|
|
||||||
// The task has been removed from ready queue and
|
|
||||||
// execution is complete so decrement the reference count.
|
|
||||||
// The reference count was incremented by the initial spawning.
|
|
||||||
// The task may be deleted if this was the last reference.
|
|
||||||
|
|
||||||
task_root_type::assign( & task , 0 );
|
|
||||||
|
|
||||||
// Pop waiting tasks and schedule them
|
|
||||||
while ( wait_queue ) {
|
|
||||||
task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
|
|
||||||
schedule_task( x , false /* not initial spawn */ );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
printf( "CudaTaskPolicyQueue::complete_executed_task(0x%lx) ERROR state_old(%d) dep_size(%d)\n"
|
|
||||||
, (unsigned long)( task )
|
|
||||||
, int(state_old)
|
|
||||||
, task->m_dep_size
|
|
||||||
);
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue::complete_executed_task" );
|
|
||||||
}
|
|
||||||
|
|
||||||
// If the task was respawned it may have already been
|
|
||||||
// put in a ready queue and the count incremented.
|
|
||||||
// By decrementing the count last it will never go to zero
|
|
||||||
// with a ready or executing task.
|
|
||||||
|
|
||||||
atomic_fetch_add( & m_count_ready , -1 );
|
|
||||||
}
|
|
||||||
|
|
||||||
__device__
|
|
||||||
void TaskMember< Kokkos::Cuda , void , void >::latch_add( const int k )
|
|
||||||
{
|
|
||||||
typedef TaskMember< Kokkos::Cuda , void , void > task_root_type ;
|
|
||||||
|
|
||||||
task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
|
|
||||||
|
|
||||||
const bool ok_input = 0 < k ;
|
|
||||||
|
|
||||||
const int count = ok_input ? atomic_fetch_add( & m_dep_size , -k ) - k
|
|
||||||
: k ;
|
|
||||||
|
|
||||||
const bool ok_count = 0 <= count ;
|
|
||||||
|
|
||||||
const int state = 0 != count ? TASK_STATE_WAITING :
|
|
||||||
atomic_compare_exchange( & m_state
|
|
||||||
, TASK_STATE_WAITING
|
|
||||||
, TASK_STATE_COMPLETE );
|
|
||||||
|
|
||||||
const bool ok_state = state == TASK_STATE_WAITING ;
|
|
||||||
|
|
||||||
if ( ! ok_count || ! ok_state ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::latch_add[0x%lx](%d) ERROR %s %d\n"
|
|
||||||
, (unsigned long) this
|
|
||||||
, k
|
|
||||||
, ( ! ok_input ? "Non-positive input" :
|
|
||||||
( ! ok_count ? "Negative count" : "Bad State" ) )
|
|
||||||
, ( ! ok_input ? k :
|
|
||||||
( ! ok_count ? count : state ) )
|
|
||||||
);
|
|
||||||
Kokkos::abort( "CudaTaskPolicyQueue::latch_add ERROR" );
|
|
||||||
}
|
|
||||||
else if ( 0 == count ) {
|
|
||||||
// Stop other tasks from adding themselves to this latch's wait queue.
|
|
||||||
// The wait queue is updated concurrently so guard with an atomic.
|
|
||||||
|
|
||||||
CudaTaskPolicyQueue & policy = *m_policy ;
|
|
||||||
task_root_type * wait_queue = *((task_root_type * volatile *) &m_wait);
|
|
||||||
task_root_type * wait_queue_old = 0 ;
|
|
||||||
|
|
||||||
do {
|
|
||||||
wait_queue_old = wait_queue ;
|
|
||||||
wait_queue = atomic_compare_exchange( & m_wait , wait_queue_old , q_denied );
|
|
||||||
} while ( wait_queue_old != wait_queue );
|
|
||||||
|
|
||||||
// Pop waiting tasks and schedule them
|
|
||||||
while ( wait_queue ) {
|
|
||||||
task_root_type * const x = wait_queue ; wait_queue = x->m_next ; x->m_next = 0 ;
|
|
||||||
policy.schedule_task( x , false /* not initial spawn */ );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
void CudaTaskPolicyQueue::reschedule_task(
|
|
||||||
CudaTaskPolicyQueue::task_root_type * const task )
|
|
||||||
{
|
|
||||||
// Reschedule transitions from executing back to waiting.
|
|
||||||
const int old_state =
|
|
||||||
atomic_compare_exchange( & task->m_state
|
|
||||||
, int(TASK_STATE_EXECUTING)
|
|
||||||
, int(TASK_STATE_WAITING) );
|
|
||||||
|
|
||||||
if ( old_state != int(TASK_STATE_EXECUTING) ) {
|
|
||||||
|
|
||||||
printf( "CudaTaskPolicyQueue::reschedule_task(0x%lx) ERROR state(%d)\n"
|
|
||||||
, (unsigned long) task
|
|
||||||
, old_state
|
|
||||||
);
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue::reschedule" );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
void CudaTaskPolicyQueue::schedule_task(
|
|
||||||
CudaTaskPolicyQueue::task_root_type * const task ,
|
|
||||||
const bool initial_spawn )
|
|
||||||
{
|
|
||||||
task_root_type * const q_lock = reinterpret_cast<task_root_type*>(QLOCK);
|
|
||||||
task_root_type * const q_denied = reinterpret_cast<task_root_type*>(QDENIED);
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// State is either constructing or already waiting.
|
|
||||||
// If constructing then transition to waiting.
|
|
||||||
|
|
||||||
{
|
|
||||||
const int old_state = atomic_compare_exchange( & task->m_state
|
|
||||||
, int(TASK_STATE_CONSTRUCTING)
|
|
||||||
, int(TASK_STATE_WAITING) );
|
|
||||||
|
|
||||||
// Head of linked list of tasks waiting on this task
|
|
||||||
task_root_type * const waitTask =
|
|
||||||
*((task_root_type * volatile const *) & task->m_wait );
|
|
||||||
|
|
||||||
// Member of linked list of tasks waiting on some other task
|
|
||||||
task_root_type * const next =
|
|
||||||
*((task_root_type * volatile const *) & task->m_next );
|
|
||||||
|
|
||||||
// An incomplete and non-executing task has:
|
|
||||||
// task->m_state == TASK_STATE_CONSTRUCTING or TASK_STATE_WAITING
|
|
||||||
// task->m_wait != q_denied
|
|
||||||
// task->m_next == 0
|
|
||||||
//
|
|
||||||
if ( ( q_denied == waitTask ) ||
|
|
||||||
( 0 != next ) ||
|
|
||||||
( old_state != int(TASK_STATE_CONSTRUCTING) &&
|
|
||||||
old_state != int(TASK_STATE_WAITING) ) ) {
|
|
||||||
printf( "CudaTaskPolicyQueue::schedule_task(0x%lx) STATE ERROR: state(%d) wait(0x%lx) next(0x%lx)\n"
|
|
||||||
, (unsigned long) task
|
|
||||||
, old_state
|
|
||||||
, (unsigned long) waitTask
|
|
||||||
, (unsigned long) next );
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue::schedule" );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
if ( initial_spawn ) {
|
|
||||||
// The initial spawn of a task increments the reference count
|
|
||||||
// for the task's existence in either a waiting or ready queue
|
|
||||||
// until the task has completed.
|
|
||||||
// Completing the task's execution is the matching
|
|
||||||
// decrement of the reference count.
|
|
||||||
task_root_type::assign( 0 , task );
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// Insert this task into a dependence task that is not complete.
|
|
||||||
// Push on to that task's wait queue.
|
|
||||||
|
|
||||||
bool attempt_insert_in_queue = true ;
|
|
||||||
|
|
||||||
task_root_type * volatile * queue =
|
|
||||||
task->m_dep_size ? & task->m_dep[0]->m_wait : (task_root_type **) 0 ;
|
|
||||||
|
|
||||||
for ( int i = 0 ; attempt_insert_in_queue && ( 0 != queue ) ; ) {
|
|
||||||
|
|
||||||
task_root_type * const head_value_old = *queue ;
|
|
||||||
|
|
||||||
if ( q_denied == head_value_old ) {
|
|
||||||
// Wait queue is closed because task is complete,
|
|
||||||
// try again with the next dependence wait queue.
|
|
||||||
++i ;
|
|
||||||
queue = i < task->m_dep_size ? & task->m_dep[i]->m_wait
|
|
||||||
: (task_root_type **) 0 ;
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
|
|
||||||
// Wait queue is open and not denied.
|
|
||||||
// Have exclusive access to this task.
|
|
||||||
// Assign m_next assuming a successfull insertion into the queue.
|
|
||||||
// Fence the memory assignment before attempting the CAS.
|
|
||||||
|
|
||||||
*((task_root_type * volatile *) & task->m_next ) = head_value_old ;
|
|
||||||
|
|
||||||
memory_fence();
|
|
||||||
|
|
||||||
// Attempt to insert this task into the queue.
|
|
||||||
// If fails then continue the attempt.
|
|
||||||
|
|
||||||
attempt_insert_in_queue =
|
|
||||||
head_value_old != atomic_compare_exchange(queue,head_value_old,task);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// All dependences are complete, insert into the ready list
|
|
||||||
|
|
||||||
if ( attempt_insert_in_queue ) {
|
|
||||||
|
|
||||||
// Increment the count of ready tasks.
|
|
||||||
// Count will be decremented when task is complete.
|
|
||||||
|
|
||||||
atomic_fetch_add( & m_count_ready , 1 );
|
|
||||||
|
|
||||||
queue = task->m_queue ;
|
|
||||||
|
|
||||||
while ( attempt_insert_in_queue ) {
|
|
||||||
|
|
||||||
// A locked queue is being popped.
|
|
||||||
|
|
||||||
task_root_type * const head_value_old = *queue ;
|
|
||||||
|
|
||||||
if ( q_lock != head_value_old ) {
|
|
||||||
// Read the head of ready queue,
|
|
||||||
// if same as previous value then CAS locks the ready queue
|
|
||||||
|
|
||||||
// Have exclusive access to this task,
|
|
||||||
// assign to head of queue, assuming successful insert
|
|
||||||
// Fence assignment before attempting insert.
|
|
||||||
*((task_root_type * volatile *) & task->m_next ) = head_value_old ;
|
|
||||||
|
|
||||||
memory_fence();
|
|
||||||
|
|
||||||
attempt_insert_in_queue =
|
|
||||||
head_value_old != atomic_compare_exchange(queue,head_value_old,task);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
void CudaTaskPolicyQueue::deallocate_task
|
|
||||||
( CudaTaskPolicyQueue::task_root_type * const task )
|
|
||||||
{
|
|
||||||
m_space.deallocate( task , task->m_size_alloc );
|
|
||||||
}
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
CudaTaskPolicyQueue::task_root_type *
|
|
||||||
CudaTaskPolicyQueue::allocate_task
|
|
||||||
( const unsigned arg_sizeof_task
|
|
||||||
, const unsigned arg_dep_capacity
|
|
||||||
, const unsigned arg_team_shmem
|
|
||||||
)
|
|
||||||
{
|
|
||||||
const unsigned base_size = arg_sizeof_task +
|
|
||||||
( arg_sizeof_task % sizeof(task_root_type*)
|
|
||||||
? sizeof(task_root_type*) - arg_sizeof_task % sizeof(task_root_type*)
|
|
||||||
: 0 );
|
|
||||||
|
|
||||||
const unsigned dep_capacity
|
|
||||||
= ~0u == arg_dep_capacity
|
|
||||||
? m_default_dependence_capacity
|
|
||||||
: arg_dep_capacity ;
|
|
||||||
|
|
||||||
const unsigned size_alloc =
|
|
||||||
base_size + sizeof(task_root_type*) * dep_capacity ;
|
|
||||||
|
|
||||||
task_root_type * const task =
|
|
||||||
reinterpret_cast<task_root_type*>( m_space.allocate( size_alloc ) );
|
|
||||||
|
|
||||||
if ( task != 0 ) {
|
|
||||||
|
|
||||||
// Initialize task's root and value data structure
|
|
||||||
// Calling function must copy construct the functor.
|
|
||||||
|
|
||||||
new( (void*) task ) task_root_type();
|
|
||||||
|
|
||||||
task->m_policy = this ;
|
|
||||||
task->m_size_alloc = size_alloc ;
|
|
||||||
task->m_dep_capacity = dep_capacity ;
|
|
||||||
task->m_shmem_size = arg_team_shmem ;
|
|
||||||
|
|
||||||
if ( dep_capacity ) {
|
|
||||||
task->m_dep =
|
|
||||||
reinterpret_cast<task_root_type**>(
|
|
||||||
reinterpret_cast<unsigned char*>(task) + base_size );
|
|
||||||
|
|
||||||
for ( unsigned i = 0 ; i < dep_capacity ; ++i )
|
|
||||||
task->task_root_type::m_dep[i] = 0 ;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return task ;
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
void CudaTaskPolicyQueue::add_dependence
|
|
||||||
( CudaTaskPolicyQueue::task_root_type * const after
|
|
||||||
, CudaTaskPolicyQueue::task_root_type * const before
|
|
||||||
)
|
|
||||||
{
|
|
||||||
if ( ( after != 0 ) && ( before != 0 ) ) {
|
|
||||||
|
|
||||||
int const state = *((volatile const int *) & after->m_state );
|
|
||||||
|
|
||||||
// Only add dependence during construction or during execution.
|
|
||||||
// Both tasks must have the same policy.
|
|
||||||
// Dependence on non-full memory cannot be mixed with any other dependence.
|
|
||||||
|
|
||||||
const bool ok_state =
|
|
||||||
Kokkos::Experimental::TASK_STATE_CONSTRUCTING == state ||
|
|
||||||
Kokkos::Experimental::TASK_STATE_EXECUTING == state ;
|
|
||||||
|
|
||||||
const bool ok_capacity =
|
|
||||||
after->m_dep_size < after->m_dep_capacity ;
|
|
||||||
|
|
||||||
const bool ok_policy =
|
|
||||||
after->m_policy == this && before->m_policy == this ;
|
|
||||||
|
|
||||||
if ( ok_state && ok_capacity && ok_policy ) {
|
|
||||||
|
|
||||||
++after->m_dep_size ;
|
|
||||||
|
|
||||||
task_root_type::assign( after->m_dep + (after->m_dep_size-1) , before );
|
|
||||||
|
|
||||||
memory_fence();
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
|
|
||||||
printf( "CudaTaskPolicyQueue::add_dependence( 0x%lx , 0x%lx ) ERROR %s\n"
|
|
||||||
, (unsigned long) after
|
|
||||||
, (unsigned long) before
|
|
||||||
, ( ! ok_state ? "Task not constructing or executing" :
|
|
||||||
( ! ok_capacity ? "Task Exceeded dependence capacity"
|
|
||||||
: "Tasks from different policies" )) );
|
|
||||||
|
|
||||||
Kokkos::abort("CudaTaskPolicyQueue::add_dependence ERROR");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
} /* namespace Impl */
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
|
|
||||||
TaskPolicy< Kokkos::Cuda >::TaskPolicy
|
|
||||||
( const unsigned arg_task_max_count
|
|
||||||
, const unsigned arg_task_max_size
|
|
||||||
, const unsigned arg_task_default_dependence_capacity
|
|
||||||
, const unsigned arg_task_team_size
|
|
||||||
)
|
|
||||||
: m_track()
|
|
||||||
, m_policy(0)
|
|
||||||
{
|
|
||||||
// Allocate the queue data sructure in UVM space
|
|
||||||
|
|
||||||
typedef Kokkos::Experimental::Impl::SharedAllocationRecord
|
|
||||||
< Kokkos::CudaUVMSpace , Impl::CudaTaskPolicyQueue::Destroy > record_type ;
|
|
||||||
|
|
||||||
record_type * record =
|
|
||||||
record_type::allocate( Kokkos::CudaUVMSpace()
|
|
||||||
, "CudaUVM task queue"
|
|
||||||
, sizeof(Impl::CudaTaskPolicyQueue)
|
|
||||||
);
|
|
||||||
|
|
||||||
m_policy = reinterpret_cast< Impl::CudaTaskPolicyQueue * >( record->data() );
|
|
||||||
|
|
||||||
// Tasks are allocated with application's task size + sizeof(task_root_type)
|
|
||||||
|
|
||||||
const size_t full_task_size_estimate =
|
|
||||||
arg_task_max_size +
|
|
||||||
sizeof(task_root_type) +
|
|
||||||
sizeof(task_root_type*) * arg_task_default_dependence_capacity ;
|
|
||||||
|
|
||||||
new( m_policy )
|
|
||||||
Impl::CudaTaskPolicyQueue( arg_task_max_count
|
|
||||||
, full_task_size_estimate
|
|
||||||
, arg_task_default_dependence_capacity
|
|
||||||
, arg_task_team_size );
|
|
||||||
|
|
||||||
record->m_destroy.m_policy = m_policy ;
|
|
||||||
|
|
||||||
m_track.assign_allocated_record_to_uninitialized( record );
|
|
||||||
}
|
|
||||||
|
|
||||||
__global__
|
|
||||||
static void kokkos_cuda_task_policy_queue_driver
|
|
||||||
( Kokkos::Experimental::Impl::CudaTaskPolicyQueue * queue )
|
|
||||||
{
|
|
||||||
queue->driver();
|
|
||||||
}
|
|
||||||
|
|
||||||
void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Cuda > & policy )
|
|
||||||
{
|
|
||||||
const dim3 grid( Kokkos::Impl::cuda_internal_multiprocessor_count() , 1 , 1 );
|
|
||||||
const dim3 block( 1 , policy.m_policy->m_team_size , 1 );
|
|
||||||
|
|
||||||
const int shared = 0 ; // Kokkos::Impl::CudaTraits::SharedMemoryUsage / 2 ;
|
|
||||||
const cudaStream_t stream = 0 ;
|
|
||||||
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
printf("kokkos_cuda_task_policy_queue_driver grid(%d,%d,%d) block(%d,%d,%d) shared(%d) policy(0x%lx)\n"
|
|
||||||
, grid.x , grid.y , grid.z
|
|
||||||
, block.x , block.y , block.z
|
|
||||||
, shared
|
|
||||||
, (unsigned long)( policy.m_policy ) );
|
|
||||||
fflush(stdout);
|
|
||||||
#endif
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
|
|
||||||
/*
|
|
||||||
CUDA_SAFE_CALL(
|
|
||||||
cudaFuncSetCacheConfig( kokkos_cuda_task_policy_queue_driver
|
|
||||||
, cudaFuncCachePreferL1 ) );
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaGetLastError() );
|
|
||||||
*/
|
|
||||||
|
|
||||||
kokkos_cuda_task_policy_queue_driver<<< grid , block , shared , stream >>>
|
|
||||||
( policy.m_policy );
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaGetLastError() );
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
printf("kokkos_cuda_task_policy_queue_driver end\n");
|
|
||||||
fflush(stdout);
|
|
||||||
#endif
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
typedef TaskMember< Kokkos::Cuda , void , void > Task ;
|
|
||||||
|
|
||||||
__host__ __device__
|
|
||||||
Task::~TaskMember()
|
|
||||||
{
|
|
||||||
}
|
|
||||||
|
|
||||||
__host__ __device__
|
|
||||||
void Task::assign( Task ** const lhs_ptr , Task * rhs )
|
|
||||||
{
|
|
||||||
Task * const q_denied = reinterpret_cast<Task*>(QDENIED);
|
|
||||||
|
|
||||||
// Increment rhs reference count.
|
|
||||||
if ( rhs ) { atomic_fetch_add( & rhs->m_ref_count , 1 ); }
|
|
||||||
|
|
||||||
if ( 0 == lhs_ptr ) return ;
|
|
||||||
|
|
||||||
// Must have exclusive access to *lhs_ptr.
|
|
||||||
// Assign the pointer and retrieve the previous value.
|
|
||||||
// Cannot use atomic exchange since *lhs_ptr may be
|
|
||||||
// in Cuda register space.
|
|
||||||
|
|
||||||
#if 0
|
|
||||||
|
|
||||||
Task * const old_lhs = *((Task*volatile*)lhs_ptr);
|
|
||||||
|
|
||||||
*((Task*volatile*)lhs_ptr) = rhs ;
|
|
||||||
|
|
||||||
Kokkos::memory_fence();
|
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
Task * const old_lhs = *lhs_ptr ;
|
|
||||||
|
|
||||||
*lhs_ptr = rhs ;
|
|
||||||
|
|
||||||
#endif
|
|
||||||
|
|
||||||
if ( old_lhs && rhs && old_lhs->m_policy != rhs->m_policy ) {
|
|
||||||
Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Cuda>::assign ERROR different queues");
|
|
||||||
}
|
|
||||||
|
|
||||||
if ( old_lhs ) {
|
|
||||||
|
|
||||||
Kokkos::memory_fence();
|
|
||||||
|
|
||||||
// Decrement former lhs reference count.
|
|
||||||
// If reference count is zero task must be complete, then delete task.
|
|
||||||
// Task is ready for deletion when wait == q_denied
|
|
||||||
|
|
||||||
int const count = atomic_fetch_add( & (old_lhs->m_ref_count) , -1 ) - 1 ;
|
|
||||||
int const state = old_lhs->m_state ;
|
|
||||||
Task * const wait = *((Task * const volatile *) & old_lhs->m_wait );
|
|
||||||
|
|
||||||
const bool ok_count = 0 <= count ;
|
|
||||||
|
|
||||||
// If count == 0 then will be deleting
|
|
||||||
// and must either be constructing or complete.
|
|
||||||
const bool ok_state = 0 < count ? true :
|
|
||||||
( ( state == int(TASK_STATE_CONSTRUCTING) && wait == 0 ) ||
|
|
||||||
( state == int(TASK_STATE_COMPLETE) && wait == q_denied ) )
|
|
||||||
&&
|
|
||||||
old_lhs->m_next == 0 &&
|
|
||||||
old_lhs->m_dep_size == 0 ;
|
|
||||||
|
|
||||||
if ( ! ok_count || ! ok_state ) {
|
|
||||||
|
|
||||||
printf( "%s Kokkos::Impl::TaskManager<Kokkos::Cuda>::assign ERROR deleting task(0x%lx) m_ref_count(%d) m_state(%d) m_wait(0x%ld)\n"
|
|
||||||
#if defined( KOKKOS_ACTIVE_EXECUTION_SPACE_CUDA )
|
|
||||||
, "CUDA "
|
|
||||||
#else
|
|
||||||
, "HOST "
|
|
||||||
#endif
|
|
||||||
, (unsigned long) old_lhs
|
|
||||||
, count
|
|
||||||
, state
|
|
||||||
, (unsigned long) wait );
|
|
||||||
Kokkos::abort( "Kokkos::Impl::TaskMember<Kokkos::Cuda>::assign ERROR deleting");
|
|
||||||
}
|
|
||||||
|
|
||||||
if ( count == 0 ) {
|
|
||||||
// When 'count == 0' this thread has exclusive access to 'old_lhs'
|
|
||||||
|
|
||||||
#ifdef DETAILED_PRINT
|
|
||||||
printf( "Task::assign(...) old_lhs(0x%lx) deallocate\n"
|
|
||||||
, (unsigned long) old_lhs
|
|
||||||
);
|
|
||||||
#endif
|
|
||||||
|
|
||||||
old_lhs->m_policy->deallocate_task( old_lhs );
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
__device__
|
|
||||||
int Task::get_dependence() const
|
|
||||||
{
|
|
||||||
return m_dep_size ;
|
|
||||||
}
|
|
||||||
|
|
||||||
__device__
|
|
||||||
Task * Task::get_dependence( int i ) const
|
|
||||||
{
|
|
||||||
Task * const t = ((Task*volatile*)m_dep)[i] ;
|
|
||||||
|
|
||||||
if ( Kokkos::Experimental::TASK_STATE_EXECUTING != m_state || i < 0 || m_dep_size <= i || 0 == t ) {
|
|
||||||
|
|
||||||
printf( "TaskMember< Cuda >::get_dependence ERROR : task[%lx]{ state(%d) dep_size(%d) dep[%d] = %lx }\n"
|
|
||||||
, (unsigned long) this
|
|
||||||
, m_state
|
|
||||||
, m_dep_size
|
|
||||||
, i
|
|
||||||
, (unsigned long) t
|
|
||||||
);
|
|
||||||
|
|
||||||
Kokkos::abort("TaskMember< Cuda >::get_dependence ERROR");
|
|
||||||
}
|
|
||||||
|
|
||||||
return t ;
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
__device__ __host__
|
|
||||||
void Task::clear_dependence()
|
|
||||||
{
|
|
||||||
for ( int i = m_dep_size - 1 ; 0 <= i ; --i ) {
|
|
||||||
assign( m_dep + i , 0 );
|
|
||||||
}
|
|
||||||
|
|
||||||
*((volatile int *) & m_dep_size ) = 0 ;
|
|
||||||
|
|
||||||
memory_fence();
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
} /* namespace Impl */
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
|
||||||
|
|
||||||
@ -1,833 +0,0 @@
|
|||||||
/*
|
|
||||||
//@HEADER
|
|
||||||
// ************************************************************************
|
|
||||||
//
|
|
||||||
// Kokkos v. 2.0
|
|
||||||
// Copyright (2014) Sandia Corporation
|
|
||||||
//
|
|
||||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
|
||||||
// the U.S. Government retains certain rights in this software.
|
|
||||||
//
|
|
||||||
// Redistribution and use in source and binary forms, with or without
|
|
||||||
// modification, are permitted provided that the following conditions are
|
|
||||||
// met:
|
|
||||||
//
|
|
||||||
// 1. Redistributions of source code must retain the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer.
|
|
||||||
//
|
|
||||||
// 2. Redistributions in binary form must reproduce the above copyright
|
|
||||||
// notice, this list of conditions and the following disclaimer in the
|
|
||||||
// documentation and/or other materials provided with the distribution.
|
|
||||||
//
|
|
||||||
// 3. Neither the name of the Corporation nor the names of the
|
|
||||||
// contributors may be used to endorse or promote products derived from
|
|
||||||
// this software without specific prior written permission.
|
|
||||||
//
|
|
||||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
|
||||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
||||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
||||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
|
||||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
||||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
||||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
||||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
|
||||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
|
||||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
|
||||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
//
|
|
||||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
|
||||||
//
|
|
||||||
// ************************************************************************
|
|
||||||
//@HEADER
|
|
||||||
*/
|
|
||||||
|
|
||||||
// Experimental unified task-data parallel manycore LDRD
|
|
||||||
|
|
||||||
#ifndef KOKKOS_CUDA_TASKPOLICY_HPP
|
|
||||||
#define KOKKOS_CUDA_TASKPOLICY_HPP
|
|
||||||
|
|
||||||
#include <Kokkos_Core_fwd.hpp>
|
|
||||||
#include <Kokkos_Cuda.hpp>
|
|
||||||
#include <Kokkos_TaskPolicy.hpp>
|
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY )
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
struct CudaTaskPolicyQueue ;
|
|
||||||
|
|
||||||
/** \brief Base class for all Kokkos::Cuda tasks */
|
|
||||||
template<>
|
|
||||||
class TaskMember< Kokkos::Cuda , void , void > {
|
|
||||||
public:
|
|
||||||
|
|
||||||
template< class > friend class Kokkos::Experimental::TaskPolicy ;
|
|
||||||
friend struct CudaTaskPolicyQueue ;
|
|
||||||
|
|
||||||
typedef void (* function_single_type) ( TaskMember * );
|
|
||||||
typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::CudaTeamMember & );
|
|
||||||
|
|
||||||
private:
|
|
||||||
|
|
||||||
CudaTaskPolicyQueue * m_policy ;
|
|
||||||
TaskMember * volatile * m_queue ;
|
|
||||||
function_team_type m_team ; ///< Apply function on CUDA
|
|
||||||
function_single_type m_serial ; ///< Apply function on CUDA
|
|
||||||
TaskMember ** m_dep ; ///< Dependences
|
|
||||||
TaskMember * m_wait ; ///< Linked list of tasks waiting on this task
|
|
||||||
TaskMember * m_next ; ///< Linked list of tasks waiting on a different task
|
|
||||||
int m_dep_capacity ; ///< Capacity of dependences
|
|
||||||
int m_dep_size ; ///< Actual count of dependences
|
|
||||||
int m_size_alloc ;
|
|
||||||
int m_shmem_size ;
|
|
||||||
int m_ref_count ; ///< Reference count
|
|
||||||
int m_state ; ///< State of the task
|
|
||||||
|
|
||||||
|
|
||||||
TaskMember( TaskMember && ) = delete ;
|
|
||||||
TaskMember( const TaskMember & ) = delete ;
|
|
||||||
TaskMember & operator = ( TaskMember && ) = delete ;
|
|
||||||
TaskMember & operator = ( const TaskMember & ) = delete ;
|
|
||||||
|
|
||||||
protected:
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
TaskMember()
|
|
||||||
: m_policy(0)
|
|
||||||
, m_queue(0)
|
|
||||||
, m_team(0)
|
|
||||||
, m_serial(0)
|
|
||||||
, m_dep(0)
|
|
||||||
, m_wait(0)
|
|
||||||
, m_next(0)
|
|
||||||
, m_size_alloc(0)
|
|
||||||
, m_dep_capacity(0)
|
|
||||||
, m_dep_size(0)
|
|
||||||
, m_shmem_size(0)
|
|
||||||
, m_ref_count(0)
|
|
||||||
, m_state( TASK_STATE_CONSTRUCTING )
|
|
||||||
{}
|
|
||||||
|
|
||||||
public:
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
~TaskMember();
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
int reference_count() const
|
|
||||||
{ return *((volatile int *) & m_ref_count ); }
|
|
||||||
|
|
||||||
// Cannot use the function pointer to verify the type
|
|
||||||
// since the function pointer is not unique between
|
|
||||||
// Host and Cuda. Don't run verificaton for Cuda.
|
|
||||||
// Assume testing on Host-only back-end will catch such errors.
|
|
||||||
|
|
||||||
template< typename ResultType >
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
TaskMember * verify_type( TaskMember * t ) { return t ; }
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
/* Inheritence Requirements on task types:
|
|
||||||
*
|
|
||||||
* class DerivedTaskType
|
|
||||||
* : public TaskMember< Cuda , DerivedType::value_type , FunctorType >
|
|
||||||
* { ... };
|
|
||||||
*
|
|
||||||
* class TaskMember< Cuda , DerivedType::value_type , FunctorType >
|
|
||||||
* : public TaskMember< Cuda , DerivedType::value_type , void >
|
|
||||||
* , public Functor
|
|
||||||
* { ... };
|
|
||||||
*
|
|
||||||
* If value_type != void
|
|
||||||
* class TaskMember< Cuda , value_type , void >
|
|
||||||
* : public TaskMember< Cuda , void , void >
|
|
||||||
*
|
|
||||||
* Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
//----------------------------------------
|
|
||||||
// If after the 'apply' the task's state is waiting
|
|
||||||
// then it will be rescheduled and called again.
|
|
||||||
// Otherwise the functor must be destroyed.
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__ static
|
|
||||||
void apply_single(
|
|
||||||
typename std::enable_if
|
|
||||||
<( std::is_same< Tag , void >::value &&
|
|
||||||
std::is_same< typename DerivedTaskType::result_type , void >::value
|
|
||||||
), TaskMember * >::type t )
|
|
||||||
{
|
|
||||||
typedef typename DerivedTaskType::functor_type functor_type ;
|
|
||||||
|
|
||||||
functor_type * const f =
|
|
||||||
static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
|
|
||||||
|
|
||||||
f->apply();
|
|
||||||
|
|
||||||
if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
|
|
||||||
f->~functor_type();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__ static
|
|
||||||
void apply_single(
|
|
||||||
typename std::enable_if
|
|
||||||
<( std::is_same< Tag , void >::value &&
|
|
||||||
! std::is_same< typename DerivedTaskType::result_type , void >::value
|
|
||||||
), TaskMember * >::type t )
|
|
||||||
{
|
|
||||||
typedef typename DerivedTaskType::functor_type functor_type ;
|
|
||||||
|
|
||||||
DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
|
|
||||||
functor_type * const f = static_cast< functor_type * >( self );
|
|
||||||
|
|
||||||
f->apply( self->m_result );
|
|
||||||
|
|
||||||
if ( t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
|
|
||||||
f->~functor_type();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__
|
|
||||||
void set_apply_single()
|
|
||||||
{
|
|
||||||
m_serial = & TaskMember::template apply_single<DerivedTaskType,Tag> ;
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__ static
|
|
||||||
void apply_team(
|
|
||||||
typename std::enable_if
|
|
||||||
<( std::is_same<Tag,void>::value &&
|
|
||||||
std::is_same<typename DerivedTaskType::result_type,void>::value
|
|
||||||
), TaskMember * >::type t
|
|
||||||
, Kokkos::Impl::CudaTeamMember & member
|
|
||||||
)
|
|
||||||
{
|
|
||||||
typedef typename DerivedTaskType::functor_type functor_type ;
|
|
||||||
|
|
||||||
functor_type * const f =
|
|
||||||
static_cast< functor_type * >( static_cast< DerivedTaskType * >(t) );
|
|
||||||
|
|
||||||
f->apply( member );
|
|
||||||
|
|
||||||
__syncthreads(); // Wait for team to finish calling function
|
|
||||||
|
|
||||||
if ( threadIdx.x == 0 &&
|
|
||||||
threadIdx.y == 0 &&
|
|
||||||
t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
|
|
||||||
f->~functor_type();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__ static
|
|
||||||
void apply_team(
|
|
||||||
typename std::enable_if
|
|
||||||
<( std::is_same<Tag,void>::value &&
|
|
||||||
! std::is_same<typename DerivedTaskType::result_type,void>::value
|
|
||||||
), TaskMember * >::type t
|
|
||||||
, Kokkos::Impl::CudaTeamMember & member
|
|
||||||
)
|
|
||||||
{
|
|
||||||
typedef typename DerivedTaskType::functor_type functor_type ;
|
|
||||||
|
|
||||||
DerivedTaskType * const self = static_cast< DerivedTaskType * >(t);
|
|
||||||
functor_type * const f = static_cast< functor_type * >( self );
|
|
||||||
|
|
||||||
f->apply( member , self->m_result );
|
|
||||||
|
|
||||||
__syncthreads(); // Wait for team to finish calling function
|
|
||||||
|
|
||||||
if ( threadIdx.x == 0 &&
|
|
||||||
threadIdx.y == 0 &&
|
|
||||||
t->m_state == int(Kokkos::Experimental::TASK_STATE_EXECUTING) ) {
|
|
||||||
f->~functor_type();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__device__
|
|
||||||
void set_apply_team()
|
|
||||||
{
|
|
||||||
m_team = & TaskMember::template apply_team<DerivedTaskType,Tag> ;
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION static
|
|
||||||
void assign( TaskMember ** const lhs , TaskMember * const rhs );
|
|
||||||
|
|
||||||
__device__
|
|
||||||
TaskMember * get_dependence( int i ) const ;
|
|
||||||
|
|
||||||
__device__
|
|
||||||
int get_dependence() const ;
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION void clear_dependence();
|
|
||||||
|
|
||||||
__device__
|
|
||||||
void latch_add( const int k );
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
void construct_result( TaskMember * const ) {}
|
|
||||||
|
|
||||||
typedef FutureValueTypeIsVoidError get_result_type ;
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
get_result_type get() const { return get_result_type() ; }
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
|
|
||||||
|
|
||||||
};
|
|
||||||
|
|
||||||
/** \brief A Future< Kokkos::Cuda , ResultType > will cast
|
|
||||||
* from TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
* to TaskMember< Kokkos::Cuda , ResultType , void >
|
|
||||||
* to query the result.
|
|
||||||
*/
|
|
||||||
template< class ResultType >
|
|
||||||
class TaskMember< Kokkos::Cuda , ResultType , void >
|
|
||||||
: public TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
{
|
|
||||||
public:
|
|
||||||
|
|
||||||
typedef ResultType result_type ;
|
|
||||||
|
|
||||||
result_type m_result ;
|
|
||||||
|
|
||||||
typedef const result_type & get_result_type ;
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
get_result_type get() const { return m_result ; }
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
void construct_result( TaskMember * const ptr )
|
|
||||||
{
|
|
||||||
new((void*)(& ptr->m_result)) result_type();
|
|
||||||
}
|
|
||||||
|
|
||||||
TaskMember() = delete ;
|
|
||||||
TaskMember( TaskMember && ) = delete ;
|
|
||||||
TaskMember( const TaskMember & ) = delete ;
|
|
||||||
TaskMember & operator = ( TaskMember && ) = delete ;
|
|
||||||
TaskMember & operator = ( const TaskMember & ) = delete ;
|
|
||||||
};
|
|
||||||
|
|
||||||
/** \brief Callback functions will cast
|
|
||||||
* from TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
* to TaskMember< Kokkos::Cuda , ResultType , FunctorType >
|
|
||||||
* to execute work functions.
|
|
||||||
*/
|
|
||||||
template< class ResultType , class FunctorType >
|
|
||||||
class TaskMember< Kokkos::Cuda , ResultType , FunctorType >
|
|
||||||
: public TaskMember< Kokkos::Cuda , ResultType , void >
|
|
||||||
, public FunctorType
|
|
||||||
{
|
|
||||||
public:
|
|
||||||
typedef ResultType result_type ;
|
|
||||||
typedef FunctorType functor_type ;
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
void copy_construct( TaskMember * const ptr
|
|
||||||
, const functor_type & arg_functor )
|
|
||||||
{
|
|
||||||
typedef TaskMember< Kokkos::Cuda , ResultType , void > base_type ;
|
|
||||||
|
|
||||||
new((void*)static_cast<FunctorType*>(ptr)) functor_type( arg_functor );
|
|
||||||
|
|
||||||
base_type::construct_result( static_cast<base_type*>( ptr ) );
|
|
||||||
}
|
|
||||||
|
|
||||||
TaskMember() = delete ;
|
|
||||||
TaskMember( TaskMember && ) = delete ;
|
|
||||||
TaskMember( const TaskMember & ) = delete ;
|
|
||||||
TaskMember & operator = ( TaskMember && ) = delete ;
|
|
||||||
TaskMember & operator = ( const TaskMember & ) = delete ;
|
|
||||||
};
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace {
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__global__
|
|
||||||
void cuda_set_apply_single( DerivedTaskType * task )
|
|
||||||
{
|
|
||||||
typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
task_root_type ;
|
|
||||||
|
|
||||||
task->task_root_type::template set_apply_single< DerivedTaskType , Tag >();
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class DerivedTaskType , class Tag >
|
|
||||||
__global__
|
|
||||||
void cuda_set_apply_team( DerivedTaskType * task )
|
|
||||||
{
|
|
||||||
typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
task_root_type ;
|
|
||||||
|
|
||||||
task->task_root_type::template set_apply_team< DerivedTaskType , Tag >();
|
|
||||||
}
|
|
||||||
|
|
||||||
} /* namespace */
|
|
||||||
} /* namespace Impl */
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
|
||||||
|
|
||||||
struct CudaTaskPolicyQueue {
|
|
||||||
|
|
||||||
enum { NPRIORITY = 3 };
|
|
||||||
|
|
||||||
// Must use UVM so that tasks can be created in both
|
|
||||||
// Host and Cuda space.
|
|
||||||
|
|
||||||
typedef Kokkos::Experimental::MemoryPool< Kokkos::CudaUVMSpace >
|
|
||||||
memory_space ;
|
|
||||||
|
|
||||||
typedef Kokkos::Experimental::Impl::TaskMember< Kokkos::Cuda , void , void >
|
|
||||||
task_root_type ;
|
|
||||||
|
|
||||||
memory_space m_space ;
|
|
||||||
task_root_type * m_team[ NPRIORITY ] ;
|
|
||||||
task_root_type * m_serial[ NPRIORITY ];
|
|
||||||
int m_team_size ;
|
|
||||||
int m_default_dependence_capacity ;
|
|
||||||
int volatile m_count_ready ; ///< Ready plus executing tasks
|
|
||||||
|
|
||||||
// Execute tasks until all non-waiting tasks are complete
|
|
||||||
__device__
|
|
||||||
void driver();
|
|
||||||
|
|
||||||
__device__ static
|
|
||||||
task_root_type * pop_ready_task( task_root_type * volatile * const queue );
|
|
||||||
|
|
||||||
// When a task finishes executing.
|
|
||||||
__device__
|
|
||||||
void complete_executed_task( task_root_type * );
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION void schedule_task( task_root_type * const
|
|
||||||
, const bool initial_spawn = true );
|
|
||||||
KOKKOS_FUNCTION void reschedule_task( task_root_type * const );
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
void add_dependence( task_root_type * const after
|
|
||||||
, task_root_type * const before );
|
|
||||||
|
|
||||||
|
|
||||||
CudaTaskPolicyQueue() = delete ;
|
|
||||||
CudaTaskPolicyQueue( CudaTaskPolicyQueue && ) = delete ;
|
|
||||||
CudaTaskPolicyQueue( const CudaTaskPolicyQueue & ) = delete ;
|
|
||||||
CudaTaskPolicyQueue & operator = ( CudaTaskPolicyQueue && ) = delete ;
|
|
||||||
CudaTaskPolicyQueue & operator = ( const CudaTaskPolicyQueue & ) = delete ;
|
|
||||||
|
|
||||||
|
|
||||||
~CudaTaskPolicyQueue();
|
|
||||||
|
|
||||||
// Construct only on the Host
|
|
||||||
CudaTaskPolicyQueue
|
|
||||||
( const unsigned arg_task_max_count
|
|
||||||
, const unsigned arg_task_max_size
|
|
||||||
, const unsigned arg_task_default_dependence_capacity
|
|
||||||
, const unsigned arg_task_team_size
|
|
||||||
);
|
|
||||||
|
|
||||||
struct Destroy {
|
|
||||||
CudaTaskPolicyQueue * m_policy ;
|
|
||||||
void destroy_shared_allocation();
|
|
||||||
};
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
/** \brief Allocate and construct a task.
|
|
||||||
*
|
|
||||||
* Allocate space for DerivedTaskType followed
|
|
||||||
* by TaskMember*[ dependence_capacity ]
|
|
||||||
*/
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
task_root_type *
|
|
||||||
allocate_task( const unsigned arg_sizeof_task
|
|
||||||
, const unsigned arg_dep_capacity
|
|
||||||
, const unsigned arg_team_shmem = 0 );
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION void deallocate_task( task_root_type * const );
|
|
||||||
};
|
|
||||||
|
|
||||||
} /* namespace Impl */
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Experimental {
|
|
||||||
|
|
||||||
void wait( TaskPolicy< Kokkos::Cuda > & );
|
|
||||||
|
|
||||||
template<>
|
|
||||||
class TaskPolicy< Kokkos::Cuda >
|
|
||||||
{
|
|
||||||
public:
|
|
||||||
|
|
||||||
typedef Kokkos::Cuda execution_space ;
|
|
||||||
typedef TaskPolicy execution_policy ;
|
|
||||||
typedef Kokkos::Impl::CudaTeamMember member_type ;
|
|
||||||
|
|
||||||
private:
|
|
||||||
|
|
||||||
typedef Impl::TaskMember< Kokkos::Cuda , void , void > task_root_type ;
|
|
||||||
typedef Kokkos::Experimental::MemoryPool< Kokkos::CudaUVMSpace > memory_space ;
|
|
||||||
typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
|
|
||||||
|
|
||||||
track_type m_track ;
|
|
||||||
Impl::CudaTaskPolicyQueue * m_policy ;
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
const task_root_type * get_task_root( const FunctorType * f )
|
|
||||||
{
|
|
||||||
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
|
|
||||||
return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION static
|
|
||||||
task_root_type * get_task_root( FunctorType * f )
|
|
||||||
{
|
|
||||||
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
|
|
||||||
return static_cast< task_root_type * >( static_cast< task_type * >(f) );
|
|
||||||
}
|
|
||||||
|
|
||||||
public:
|
|
||||||
|
|
||||||
TaskPolicy
|
|
||||||
( const unsigned arg_task_max_count
|
|
||||||
, const unsigned arg_task_max_size
|
|
||||||
, const unsigned arg_task_default_dependence_capacity = 4
|
|
||||||
, const unsigned arg_task_team_size = 0 /* choose default */
|
|
||||||
);
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION TaskPolicy() = default ;
|
|
||||||
KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
|
|
||||||
KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
|
|
||||||
KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
|
|
||||||
KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
|
|
||||||
|
|
||||||
KOKKOS_FUNCTION
|
|
||||||
int allocated_task_count() const { return 0 ; }
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// Create serial-thread task
|
|
||||||
// Main process and tasks must use different functions
|
|
||||||
// to work around CUDA limitation where __host__ __device__
|
|
||||||
// functions are not allowed to invoke templated __global__ functions.
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
Future< typename FunctorType::value_type , execution_space >
|
|
||||||
proc_create( const FunctorType & arg_functor
|
|
||||||
, const unsigned arg_dep_capacity = ~0u ) const
|
|
||||||
{
|
|
||||||
typedef typename FunctorType::value_type value_type ;
|
|
||||||
|
|
||||||
typedef Impl::TaskMember< execution_space , value_type , FunctorType >
|
|
||||||
task_type ;
|
|
||||||
|
|
||||||
task_type * const task =
|
|
||||||
static_cast<task_type*>(
|
|
||||||
m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity ) );
|
|
||||||
|
|
||||||
if ( task ) {
|
|
||||||
// The root part of the class has been constructed.
|
|
||||||
// Must now construct the functor and result specific part.
|
|
||||||
|
|
||||||
task_type::copy_construct( task , arg_functor );
|
|
||||||
|
|
||||||
// Setting the apply pointer on the device requires code
|
|
||||||
// executing on the GPU. This function is called on the
|
|
||||||
// host process so a kernel must be run.
|
|
||||||
|
|
||||||
// Launching a kernel will cause the allocated task in
|
|
||||||
// UVM memory to be copied to the GPU.
|
|
||||||
// Synchronize to guarantee non-concurrent access
|
|
||||||
// between host and device.
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
|
|
||||||
Impl::cuda_set_apply_single<task_type,void><<<1,1>>>( task );
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaGetLastError() );
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
}
|
|
||||||
|
|
||||||
return Future< value_type , execution_space >( task );
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
__device__
|
|
||||||
Future< typename FunctorType::value_type , execution_space >
|
|
||||||
task_create( const FunctorType & arg_functor
|
|
||||||
, const unsigned arg_dep_capacity = ~0u ) const
|
|
||||||
{
|
|
||||||
typedef typename FunctorType::value_type value_type ;
|
|
||||||
|
|
||||||
typedef Impl::TaskMember< execution_space , value_type , FunctorType >
|
|
||||||
task_type ;
|
|
||||||
|
|
||||||
task_type * const task =
|
|
||||||
static_cast<task_type*>(
|
|
||||||
m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity ) );
|
|
||||||
|
|
||||||
if ( task ) {
|
|
||||||
// The root part of the class has been constructed.
|
|
||||||
// Must now construct the functor and result specific part.
|
|
||||||
|
|
||||||
task_type::copy_construct( task , arg_functor );
|
|
||||||
|
|
||||||
// Setting the apply pointer on the device requires code
|
|
||||||
// executing on the GPU. If this function is called on the
|
|
||||||
// Host then a kernel must be run.
|
|
||||||
|
|
||||||
task->task_root_type::template set_apply_single< task_type , void >();
|
|
||||||
}
|
|
||||||
|
|
||||||
return Future< value_type , execution_space >( task );
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// Create thread-team task
|
|
||||||
// Main process and tasks must use different functions
|
|
||||||
// to work around CUDA limitation where __host__ __device__
|
|
||||||
// functions are not allowed to invoke templated __global__ functions.
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
Future< typename FunctorType::value_type , execution_space >
|
|
||||||
proc_create_team( const FunctorType & arg_functor
|
|
||||||
, const unsigned arg_dep_capacity = ~0u ) const
|
|
||||||
{
|
|
||||||
typedef typename FunctorType::value_type value_type ;
|
|
||||||
|
|
||||||
typedef Impl::TaskMember< execution_space , value_type , FunctorType >
|
|
||||||
task_type ;
|
|
||||||
|
|
||||||
const unsigned team_shmem_size =
|
|
||||||
Kokkos::Impl::FunctorTeamShmemSize< FunctorType >::value
|
|
||||||
( arg_functor , m_policy->m_team_size );
|
|
||||||
|
|
||||||
task_type * const task =
|
|
||||||
static_cast<task_type*>(
|
|
||||||
m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity , team_shmem_size ) );
|
|
||||||
|
|
||||||
if ( task ) {
|
|
||||||
// The root part of the class has been constructed.
|
|
||||||
// Must now construct the functor and result specific part.
|
|
||||||
|
|
||||||
task_type::copy_construct( task , arg_functor );
|
|
||||||
|
|
||||||
// Setting the apply pointer on the device requires code
|
|
||||||
// executing on the GPU. This function is called on the
|
|
||||||
// host process so a kernel must be run.
|
|
||||||
|
|
||||||
// Launching a kernel will cause the allocated task in
|
|
||||||
// UVM memory to be copied to the GPU.
|
|
||||||
// Synchronize to guarantee non-concurrent access
|
|
||||||
// between host and device.
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
|
|
||||||
Impl::cuda_set_apply_team<task_type,void><<<1,1>>>( task );
|
|
||||||
|
|
||||||
CUDA_SAFE_CALL( cudaGetLastError() );
|
|
||||||
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
|
|
||||||
}
|
|
||||||
|
|
||||||
return Future< value_type , execution_space >( task );
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
__device__
|
|
||||||
Future< typename FunctorType::value_type , execution_space >
|
|
||||||
task_create_team( const FunctorType & arg_functor
|
|
||||||
, const unsigned arg_dep_capacity = ~0u ) const
|
|
||||||
{
|
|
||||||
typedef typename FunctorType::value_type value_type ;
|
|
||||||
|
|
||||||
typedef Impl::TaskMember< execution_space , value_type , FunctorType >
|
|
||||||
task_type ;
|
|
||||||
|
|
||||||
const unsigned team_shmem_size =
|
|
||||||
Kokkos::Impl::FunctorTeamShmemSize< FunctorType >::value
|
|
||||||
( arg_functor , m_policy->m_team_size );
|
|
||||||
|
|
||||||
task_type * const task =
|
|
||||||
static_cast<task_type*>(
|
|
||||||
m_policy->allocate_task( sizeof(task_type) , arg_dep_capacity , team_shmem_size ) );
|
|
||||||
|
|
||||||
if ( task ) {
|
|
||||||
// The root part of the class has been constructed.
|
|
||||||
// Must now construct the functor and result specific part.
|
|
||||||
|
|
||||||
task_type::copy_construct( task , arg_functor );
|
|
||||||
|
|
||||||
// Setting the apply pointer on the device requires code
|
|
||||||
// executing on the GPU. If this function is called on the
|
|
||||||
// Host then a kernel must be run.
|
|
||||||
|
|
||||||
task->task_root_type::template set_apply_team< task_type , void >();
|
|
||||||
}
|
|
||||||
|
|
||||||
return Future< value_type , execution_space >( task );
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
Future< Latch , execution_space >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
create_latch( const int N ) const
|
|
||||||
{
|
|
||||||
task_root_type * const task =
|
|
||||||
m_policy->allocate_task( sizeof(task_root_type) , 0 , 0 );
|
|
||||||
task->m_dep_size = N ; // Using m_dep_size for latch counter
|
|
||||||
task->m_state = TASK_STATE_WAITING ;
|
|
||||||
return Future< Latch , execution_space >( task );
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
template< class A1 , class A2 , class A3 , class A4 >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void add_dependence( const Future<A1,A2> & after
|
|
||||||
, const Future<A3,A4> & before
|
|
||||||
, typename std::enable_if
|
|
||||||
< std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
|
|
||||||
&&
|
|
||||||
std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
|
||||||
>::type * = 0
|
|
||||||
) const
|
|
||||||
{ m_policy->add_dependence( after.m_task , before.m_task ); }
|
|
||||||
|
|
||||||
template< class FunctorType , class A3 , class A4 >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void add_dependence( FunctorType * task_functor
|
|
||||||
, const Future<A3,A4> & before
|
|
||||||
, typename std::enable_if
|
|
||||||
< std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
|
||||||
>::type * = 0
|
|
||||||
) const
|
|
||||||
{ m_policy->add_dependence( get_task_root(task_functor) , before.m_task ); }
|
|
||||||
|
|
||||||
|
|
||||||
template< class ValueType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
const Future< ValueType , execution_space > &
|
|
||||||
spawn( const Future< ValueType , execution_space > & f
|
|
||||||
, const bool priority = false ) const
|
|
||||||
{
|
|
||||||
if ( f.m_task ) {
|
|
||||||
f.m_task->m_queue =
|
|
||||||
( f.m_task->m_team != 0
|
|
||||||
? & ( m_policy->m_team[ priority ? 0 : 1 ] )
|
|
||||||
: & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
|
|
||||||
m_policy->schedule_task( f.m_task );
|
|
||||||
}
|
|
||||||
return f ;
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void respawn( FunctorType * task_functor
|
|
||||||
, const bool priority = false ) const
|
|
||||||
{
|
|
||||||
task_root_type * const t = get_task_root(task_functor);
|
|
||||||
t->m_queue =
|
|
||||||
( t->m_team != 0 ? & ( m_policy->m_team[ priority ? 0 : 1 ] )
|
|
||||||
: & ( m_policy->m_serial[ priority ? 0 : 1 ] ) );
|
|
||||||
m_policy->reschedule_task( t );
|
|
||||||
}
|
|
||||||
|
|
||||||
// When a create method fails by returning a null Future
|
|
||||||
// the task that called the create method may respawn
|
|
||||||
// with a dependence on memory becoming available.
|
|
||||||
// This is a race as more than one task may be respawned
|
|
||||||
// with this need.
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void respawn_needing_memory( FunctorType * task_functor ) const
|
|
||||||
{
|
|
||||||
task_root_type * const t = get_task_root(task_functor);
|
|
||||||
t->m_queue =
|
|
||||||
( t->m_team != 0 ? & ( m_policy->m_team[ 2 ] )
|
|
||||||
: & ( m_policy->m_serial[ 2 ] ) );
|
|
||||||
m_policy->reschedule_task( t );
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
// Functions for an executing task functor to query dependences,
|
|
||||||
// set new dependences, and respawn itself.
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
Future< void , execution_space >
|
|
||||||
get_dependence( const FunctorType * task_functor , int i ) const
|
|
||||||
{
|
|
||||||
return Future<void,execution_space>(
|
|
||||||
get_task_root(task_functor)->get_dependence(i)
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
int get_dependence( const FunctorType * task_functor ) const
|
|
||||||
{ return get_task_root(task_functor)->get_dependence(); }
|
|
||||||
|
|
||||||
template< class FunctorType >
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void clear_dependence( FunctorType * task_functor ) const
|
|
||||||
{ get_task_root(task_functor)->clear_dependence(); }
|
|
||||||
|
|
||||||
//----------------------------------------
|
|
||||||
|
|
||||||
__device__
|
|
||||||
static member_type member_single()
|
|
||||||
{
|
|
||||||
return
|
|
||||||
member_type( 0 /* shared memory pointer */
|
|
||||||
, 0 /* shared memory begin offset */
|
|
||||||
, 0 /* shared memory end offset */
|
|
||||||
, 0 /* scratch level_1 pointer */
|
|
||||||
, 0 /* scratch level_1 size */
|
|
||||||
, 0 /* league rank */
|
|
||||||
, 1 /* league size */ );
|
|
||||||
}
|
|
||||||
|
|
||||||
friend void wait( TaskPolicy< Kokkos::Cuda > & );
|
|
||||||
};
|
|
||||||
|
|
||||||
} /* namespace Experimental */
|
|
||||||
} /* namespace Kokkos */
|
|
||||||
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_CUDA ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_TASKPOLICY_HPP */
|
|
||||||
|
|
||||||
|
|
||||||
@ -41,53 +41,266 @@
|
|||||||
//@HEADER
|
//@HEADER
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#ifndef KOKKOS_CUDA_VIEW_HPP
|
#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
||||||
#define KOKKOS_CUDA_VIEW_HPP
|
#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
|
||||||
|
|
||||||
/* only compile this file if CUDA is enabled for Kokkos */
|
/* only compile this file if CUDA is enabled for Kokkos */
|
||||||
#ifdef KOKKOS_HAVE_CUDA
|
#if defined( KOKKOS_HAVE_CUDA )
|
||||||
|
|
||||||
#include <cstring>
|
|
||||||
|
|
||||||
#include <Kokkos_HostSpace.hpp>
|
|
||||||
#include <Kokkos_CudaSpace.hpp>
|
|
||||||
#include <impl/Kokkos_Shape.hpp>
|
|
||||||
#include <Kokkos_View.hpp>
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
namespace Experimental {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template<>
|
// Cuda Texture fetches can be performed for 4, 8 and 16 byte objects (int,int2,int4)
|
||||||
struct AssertShapeBoundsAbort< CudaSpace >
|
// Via reinterpret_case this can be used to support all scalar types of those sizes.
|
||||||
{
|
// Any other scalar type falls back to either normal reads out of global memory,
|
||||||
KOKKOS_INLINE_FUNCTION
|
// or using the __ldg intrinsic on Kepler GPUs or newer (Compute Capability >= 3.0)
|
||||||
static void apply( const size_t /* rank */ ,
|
|
||||||
const size_t /* n0 */ , const size_t /* n1 */ ,
|
|
||||||
const size_t /* n2 */ , const size_t /* n3 */ ,
|
|
||||||
const size_t /* n4 */ , const size_t /* n5 */ ,
|
|
||||||
const size_t /* n6 */ , const size_t /* n7 */ ,
|
|
||||||
|
|
||||||
const size_t /* arg_rank */ ,
|
template< typename ValueType , typename AliasType >
|
||||||
const size_t /* i0 */ , const size_t /* i1 */ ,
|
struct CudaTextureFetch {
|
||||||
const size_t /* i2 */ , const size_t /* i3 */ ,
|
|
||||||
const size_t /* i4 */ , const size_t /* i5 */ ,
|
::cudaTextureObject_t m_obj ;
|
||||||
const size_t /* i6 */ , const size_t /* i7 */ )
|
const ValueType * m_ptr ;
|
||||||
|
int m_offset ;
|
||||||
|
|
||||||
|
// Deference operator pulls through texture object and returns by value
|
||||||
|
template< typename iType >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
ValueType operator[]( const iType & i ) const
|
||||||
{
|
{
|
||||||
Kokkos::abort("Kokkos::View array bounds violation");
|
#if defined( __CUDA_ARCH__ ) && ( 300 <= __CUDA_ARCH__ )
|
||||||
|
AliasType v = tex1Dfetch<AliasType>( m_obj , i + m_offset );
|
||||||
|
return *(reinterpret_cast<ValueType*> (&v));
|
||||||
|
#else
|
||||||
|
return m_ptr[ i ];
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pointer to referenced memory
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
operator const ValueType * () const { return m_ptr ; }
|
||||||
|
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch() : m_obj() , m_ptr() , m_offset() {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
~CudaTextureFetch() {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch( const CudaTextureFetch & rhs )
|
||||||
|
: m_obj( rhs.m_obj )
|
||||||
|
, m_ptr( rhs.m_ptr )
|
||||||
|
, m_offset( rhs.m_offset )
|
||||||
|
{}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch( CudaTextureFetch && rhs )
|
||||||
|
: m_obj( rhs.m_obj )
|
||||||
|
, m_ptr( rhs.m_ptr )
|
||||||
|
, m_offset( rhs.m_offset )
|
||||||
|
{}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch & operator = ( const CudaTextureFetch & rhs )
|
||||||
|
{
|
||||||
|
m_obj = rhs.m_obj ;
|
||||||
|
m_ptr = rhs.m_ptr ;
|
||||||
|
m_offset = rhs.m_offset ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch & operator = ( CudaTextureFetch && rhs )
|
||||||
|
{
|
||||||
|
m_obj = rhs.m_obj ;
|
||||||
|
m_ptr = rhs.m_ptr ;
|
||||||
|
m_offset = rhs.m_offset ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Texture object spans the entire allocation.
|
||||||
|
// This handle may view a subset of the allocation, so an offset is required.
|
||||||
|
template< class CudaMemorySpace >
|
||||||
|
inline explicit
|
||||||
|
CudaTextureFetch( const ValueType * const arg_ptr
|
||||||
|
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > & record
|
||||||
|
)
|
||||||
|
: m_obj( record.template attach_texture_object< AliasType >() )
|
||||||
|
, m_ptr( arg_ptr )
|
||||||
|
, m_offset( record.attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
|
||||||
|
{}
|
||||||
|
|
||||||
|
// Texture object spans the entire allocation.
|
||||||
|
// This handle may view a subset of the allocation, so an offset is required.
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaTextureFetch( const CudaTextureFetch & rhs , size_t offset )
|
||||||
|
: m_obj( rhs.m_obj )
|
||||||
|
, m_ptr( rhs.m_ptr + offset)
|
||||||
|
, m_offset( offset + rhs.m_offset )
|
||||||
|
{}
|
||||||
|
};
|
||||||
|
|
||||||
|
#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
|
||||||
|
|
||||||
|
template< typename ValueType , typename AliasType >
|
||||||
|
struct CudaLDGFetch {
|
||||||
|
|
||||||
|
const ValueType * m_ptr ;
|
||||||
|
|
||||||
|
template< typename iType >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
ValueType operator[]( const iType & i ) const
|
||||||
|
{
|
||||||
|
#ifdef __CUDA_ARCH__
|
||||||
|
AliasType v = __ldg(reinterpret_cast<const AliasType*>(&m_ptr[i]));
|
||||||
|
return *(reinterpret_cast<ValueType*> (&v));
|
||||||
|
#else
|
||||||
|
return m_ptr[i];
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
operator const ValueType * () const { return m_ptr ; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch() : m_ptr() {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
~CudaLDGFetch() {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch( const CudaLDGFetch & rhs )
|
||||||
|
: m_ptr( rhs.m_ptr )
|
||||||
|
{}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch( CudaLDGFetch && rhs )
|
||||||
|
: m_ptr( rhs.m_ptr )
|
||||||
|
{}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch & operator = ( const CudaLDGFetch & rhs )
|
||||||
|
{
|
||||||
|
m_ptr = rhs.m_ptr ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch & operator = ( CudaLDGFetch && rhs )
|
||||||
|
{
|
||||||
|
m_ptr = rhs.m_ptr ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
template< class CudaMemorySpace >
|
||||||
|
inline explicit
|
||||||
|
CudaLDGFetch( const ValueType * const arg_ptr
|
||||||
|
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > const &
|
||||||
|
)
|
||||||
|
: m_ptr( arg_ptr )
|
||||||
|
{}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
CudaLDGFetch( CudaLDGFetch const rhs ,size_t offset)
|
||||||
|
: m_ptr( rhs.m_ptr + offset )
|
||||||
|
{}
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} // namespace Impl
|
||||||
|
} // namespace Experimental
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Experimental {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
/** \brief Replace Default ViewDataHandle with Cuda texture fetch specialization
|
||||||
|
* if 'const' value type, CudaSpace and random access.
|
||||||
|
*/
|
||||||
|
template< class Traits >
|
||||||
|
class ViewDataHandle< Traits ,
|
||||||
|
typename std::enable_if<(
|
||||||
|
// Is Cuda memory space
|
||||||
|
( std::is_same< typename Traits::memory_space,Kokkos::CudaSpace>::value ||
|
||||||
|
std::is_same< typename Traits::memory_space,Kokkos::CudaUVMSpace>::value )
|
||||||
|
&&
|
||||||
|
// Is a trivial const value of 4, 8, or 16 bytes
|
||||||
|
std::is_trivial<typename Traits::const_value_type>::value
|
||||||
|
&&
|
||||||
|
std::is_same<typename Traits::const_value_type,typename Traits::value_type>::value
|
||||||
|
&&
|
||||||
|
( sizeof(typename Traits::const_value_type) == 4 ||
|
||||||
|
sizeof(typename Traits::const_value_type) == 8 ||
|
||||||
|
sizeof(typename Traits::const_value_type) == 16 )
|
||||||
|
&&
|
||||||
|
// Random access trait
|
||||||
|
( Traits::memory_traits::RandomAccess != 0 )
|
||||||
|
)>::type >
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
|
||||||
|
using track_type = Kokkos::Experimental::Impl::SharedAllocationTracker ;
|
||||||
|
|
||||||
|
using value_type = typename Traits::const_value_type ;
|
||||||
|
using return_type = typename Traits::const_value_type ; // NOT a reference
|
||||||
|
|
||||||
|
using alias_type = typename std::conditional< ( sizeof(value_type) == 4 ) , int ,
|
||||||
|
typename std::conditional< ( sizeof(value_type) == 8 ) , ::int2 ,
|
||||||
|
typename std::conditional< ( sizeof(value_type) == 16 ) , ::int4 , void
|
||||||
|
>::type
|
||||||
|
>::type
|
||||||
|
>::type ;
|
||||||
|
|
||||||
|
#if defined( KOKKOS_CUDA_USE_LDG_INTRINSIC )
|
||||||
|
using handle_type = Kokkos::Experimental::Impl::CudaLDGFetch< value_type , alias_type > ;
|
||||||
|
#else
|
||||||
|
using handle_type = Kokkos::Experimental::Impl::CudaTextureFetch< value_type , alias_type > ;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
static handle_type const & assign( handle_type const & arg_handle , track_type const & /* arg_tracker */ )
|
||||||
|
{
|
||||||
|
return arg_handle ;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
static handle_type const assign( handle_type const & arg_handle , size_t offset )
|
||||||
|
{
|
||||||
|
return handle_type(arg_handle,offset) ;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
static handle_type assign( value_type * arg_data_ptr, track_type const & arg_tracker )
|
||||||
|
{
|
||||||
|
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
|
||||||
|
// Assignment of texture = non-texture requires creation of a texture object
|
||||||
|
// which can only occur on the host. In addition, 'get_record' is only valid
|
||||||
|
// if called in a host execution space
|
||||||
|
return handle_type( arg_data_ptr , arg_tracker.template get_record< typename Traits::memory_space >() );
|
||||||
|
#else
|
||||||
|
Kokkos::Impl::cuda_abort("Cannot create Cuda texture object from within a Cuda kernel");
|
||||||
|
return handle_type();
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif // KOKKOS_HAVE_CUDA
|
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
|
||||||
#endif /* #ifndef KOKKOS_CUDA_VIEW_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_VIEW_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -47,18 +47,10 @@
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
#include "Kokkos_Macros.hpp"
|
#include "Kokkos_Macros.hpp"
|
||||||
#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
|
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
|
||||||
|
|
||||||
#include <cuda.h>
|
#include <cuda.h>
|
||||||
|
|
||||||
#if ! defined( CUDA_VERSION ) || ( CUDA_VERSION < 4010 )
|
|
||||||
#error "Cuda version 4.1 or greater required"
|
|
||||||
#endif
|
|
||||||
|
|
||||||
#if ( __CUDA_ARCH__ < 200 )
|
|
||||||
#error "Cuda device capability 2.0 or greater required"
|
|
||||||
#endif
|
|
||||||
|
|
||||||
extern "C" {
|
extern "C" {
|
||||||
/* Cuda runtime function, declared in <crt/device_runtime.h>
|
/* Cuda runtime function, declared in <crt/device_runtime.h>
|
||||||
* Requires capability 2.x or better.
|
* Requires capability 2.x or better.
|
||||||
@ -90,30 +82,6 @@ void cuda_abort( const char * const message )
|
|||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
#endif /* #if defined(__CUDACC__) && defined( KOKKOS_HAVE_CUDA ) */
|
||||||
#else
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
namespace Impl {
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
void cuda_abort( const char * const ) {}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#endif /* #if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
|
|
||||||
namespace Kokkos {
|
|
||||||
__device__ inline
|
|
||||||
void abort( const char * const message ) { Kokkos::Impl::cuda_abort(message); }
|
|
||||||
}
|
|
||||||
#endif /* defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) */
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
//----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
|
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -75,15 +75,16 @@
|
|||||||
#if defined(_WIN32)
|
#if defined(_WIN32)
|
||||||
#define KOKKOS_ATOMICS_USE_WINDOWS
|
#define KOKKOS_ATOMICS_USE_WINDOWS
|
||||||
#else
|
#else
|
||||||
#if defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
|
#if defined( KOKKOS_HAVE_CUDA )
|
||||||
|
|
||||||
// Compiling NVIDIA device code, must use Cuda atomics:
|
// Compiling NVIDIA device code, must use Cuda atomics:
|
||||||
|
|
||||||
#define KOKKOS_ATOMICS_USE_CUDA
|
#define KOKKOS_ATOMICS_USE_CUDA
|
||||||
|
#endif
|
||||||
|
|
||||||
#elif ! defined( KOKKOS_ATOMICS_USE_GCC ) && \
|
#if ! defined( KOKKOS_ATOMICS_USE_GCC ) && \
|
||||||
! defined( KOKKOS_ATOMICS_USE_INTEL ) && \
|
! defined( KOKKOS_ATOMICS_USE_INTEL ) && \
|
||||||
! defined( KOKKOS_ATOMICS_USE_OMP31 )
|
! defined( KOKKOS_ATOMICS_USE_OMP31 )
|
||||||
|
|
||||||
// Compiling for non-Cuda atomic implementation has not been pre-selected.
|
// Compiling for non-Cuda atomic implementation has not been pre-selected.
|
||||||
// Choose the best implementation for the detected compiler.
|
// Choose the best implementation for the detected compiler.
|
||||||
@ -91,7 +92,7 @@
|
|||||||
|
|
||||||
#if defined( KOKKOS_COMPILER_GNU ) || \
|
#if defined( KOKKOS_COMPILER_GNU ) || \
|
||||||
defined( KOKKOS_COMPILER_CLANG ) || \
|
defined( KOKKOS_COMPILER_CLANG ) || \
|
||||||
( defined ( KOKKOS_COMPILER_NVCC ) && defined ( __GNUC__ ) )
|
( defined ( KOKKOS_COMPILER_NVCC ) )
|
||||||
|
|
||||||
#define KOKKOS_ATOMICS_USE_GCC
|
#define KOKKOS_ATOMICS_USE_GCC
|
||||||
|
|
||||||
@ -126,6 +127,9 @@ namespace Impl {
|
|||||||
/// This function tries to aquire the lock for the hash value derived
|
/// This function tries to aquire the lock for the hash value derived
|
||||||
/// from the provided ptr. If the lock is successfully aquired the
|
/// from the provided ptr. If the lock is successfully aquired the
|
||||||
/// function returns true. Otherwise it returns false.
|
/// function returns true. Otherwise it returns false.
|
||||||
|
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
|
||||||
|
extern
|
||||||
|
#endif
|
||||||
__device__ inline
|
__device__ inline
|
||||||
bool lock_address_cuda_space(void* ptr);
|
bool lock_address_cuda_space(void* ptr);
|
||||||
|
|
||||||
@ -135,6 +139,9 @@ bool lock_address_cuda_space(void* ptr);
|
|||||||
/// from the provided ptr. This function should only be called
|
/// from the provided ptr. This function should only be called
|
||||||
/// after previously successfully aquiring a lock with
|
/// after previously successfully aquiring a lock with
|
||||||
/// lock_address.
|
/// lock_address.
|
||||||
|
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
|
||||||
|
extern
|
||||||
|
#endif
|
||||||
__device__ inline
|
__device__ inline
|
||||||
void unlock_address_cuda_space(void* ptr);
|
void unlock_address_cuda_space(void* ptr);
|
||||||
}
|
}
|
||||||
@ -287,7 +294,7 @@ const char * atomic_query_version()
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
// This atomic-style macro should be an inlined function, not a macro
|
// This atomic-style macro should be an inlined function, not a macro
|
||||||
|
|
||||||
#if defined( KOKKOS_COMPILER_GNU ) && !defined(__PGIC__)
|
#if defined( KOKKOS_COMPILER_GNU ) && !defined(__PGIC__) && !defined(__CUDA_ARCH__)
|
||||||
|
|
||||||
#define KOKKOS_NONTEMPORAL_PREFETCH_LOAD(addr) __builtin_prefetch(addr,0,0)
|
#define KOKKOS_NONTEMPORAL_PREFETCH_LOAD(addr) __builtin_prefetch(addr,0,0)
|
||||||
#define KOKKOS_NONTEMPORAL_PREFETCH_STORE(addr) __builtin_prefetch(addr,1,0)
|
#define KOKKOS_NONTEMPORAL_PREFETCH_STORE(addr) __builtin_prefetch(addr,1,0)
|
||||||
|
|||||||
@ -46,7 +46,14 @@
|
|||||||
|
|
||||||
#include <type_traits>
|
#include <type_traits>
|
||||||
|
|
||||||
|
// Needed for 'is_space<S>::host_mirror_space
|
||||||
|
#include <Kokkos_Core_fwd.hpp>
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|
||||||
//Schedules for Execution Policies
|
//Schedules for Execution Policies
|
||||||
struct Static {};
|
struct Static {};
|
||||||
struct Dynamic {};
|
struct Dynamic {};
|
||||||
@ -59,7 +66,7 @@ struct Schedule
|
|||||||
|| std::is_same<T,Dynamic>::value
|
|| std::is_same<T,Dynamic>::value
|
||||||
, "Kokkos: Invalid Schedule<> type."
|
, "Kokkos: Invalid Schedule<> type."
|
||||||
);
|
);
|
||||||
using schedule_type = Schedule<T>;
|
using schedule_type = Schedule ;
|
||||||
using type = T;
|
using type = T;
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -68,11 +75,268 @@ template<typename T>
|
|||||||
struct IndexType
|
struct IndexType
|
||||||
{
|
{
|
||||||
static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
|
static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
|
||||||
using index_type = IndexType<T>;
|
using index_type = IndexType ;
|
||||||
using type = T;
|
using type = T;
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
#define KOKKOS_IMPL_IS_CONCEPT( CONCEPT ) \
|
||||||
|
template< typename T > struct is_ ## CONCEPT { \
|
||||||
|
private: \
|
||||||
|
template< typename , typename = std::true_type > struct have : std::false_type {}; \
|
||||||
|
template< typename U > struct have<U,typename std::is_same<U,typename U:: CONCEPT >::type> : std::true_type {}; \
|
||||||
|
public: \
|
||||||
|
enum { value = is_ ## CONCEPT::template have<T>::value }; \
|
||||||
|
};
|
||||||
|
|
||||||
|
// Public concept:
|
||||||
|
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( memory_space )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( memory_traits )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( execution_space )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( execution_policy )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( array_layout )
|
||||||
|
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
// For backward compatibility:
|
||||||
|
|
||||||
|
using Kokkos::is_memory_space ;
|
||||||
|
using Kokkos::is_memory_traits ;
|
||||||
|
using Kokkos::is_execution_space ;
|
||||||
|
using Kokkos::is_execution_policy ;
|
||||||
|
using Kokkos::is_array_layout ;
|
||||||
|
|
||||||
|
// Implementation concept:
|
||||||
|
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( iteration_pattern )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( schedule_type )
|
||||||
|
KOKKOS_IMPL_IS_CONCEPT( index_type )
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef KOKKOS_IMPL_IS_CONCEPT
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
template< class ExecutionSpace , class MemorySpace >
|
||||||
|
struct Device {
|
||||||
|
static_assert( Kokkos::is_execution_space<ExecutionSpace>::value
|
||||||
|
, "Execution space is not valid" );
|
||||||
|
static_assert( Kokkos::is_memory_space<MemorySpace>::value
|
||||||
|
, "Memory space is not valid" );
|
||||||
|
typedef ExecutionSpace execution_space;
|
||||||
|
typedef MemorySpace memory_space;
|
||||||
|
typedef Device<execution_space,memory_space> device_type;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
template< typename T >
|
||||||
|
struct is_space {
|
||||||
|
private:
|
||||||
|
|
||||||
|
template< typename , typename = void >
|
||||||
|
struct exe : std::false_type { typedef void space ; };
|
||||||
|
|
||||||
|
template< typename , typename = void >
|
||||||
|
struct mem : std::false_type { typedef void space ; };
|
||||||
|
|
||||||
|
template< typename , typename = void >
|
||||||
|
struct dev : std::false_type { typedef void space ; };
|
||||||
|
|
||||||
|
template< typename U >
|
||||||
|
struct exe<U,typename std::conditional<true,void,typename U::execution_space>::type>
|
||||||
|
: std::is_same<U,typename U::execution_space>::type
|
||||||
|
{ typedef typename U::execution_space space ; };
|
||||||
|
|
||||||
|
template< typename U >
|
||||||
|
struct mem<U,typename std::conditional<true,void,typename U::memory_space>::type>
|
||||||
|
: std::is_same<U,typename U::memory_space>::type
|
||||||
|
{ typedef typename U::memory_space space ; };
|
||||||
|
|
||||||
|
template< typename U >
|
||||||
|
struct dev<U,typename std::conditional<true,void,typename U::device_type>::type>
|
||||||
|
: std::is_same<U,typename U::device_type>::type
|
||||||
|
{ typedef typename U::device_type space ; };
|
||||||
|
|
||||||
|
typedef typename is_space::template exe<T> is_exe ;
|
||||||
|
typedef typename is_space::template mem<T> is_mem ;
|
||||||
|
typedef typename is_space::template dev<T> is_dev ;
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
enum { value = is_exe::value || is_mem::value || is_dev::value };
|
||||||
|
|
||||||
|
typedef typename is_exe::space execution_space ;
|
||||||
|
typedef typename is_mem::space memory_space ;
|
||||||
|
|
||||||
|
// For backward compatibility, deprecated in favor of
|
||||||
|
// Kokkos::Impl::HostMirror<S>::host_mirror_space
|
||||||
|
|
||||||
|
typedef typename std::conditional
|
||||||
|
< std::is_same< memory_space , Kokkos::HostSpace >::value
|
||||||
|
#if defined( KOKKOS_HAVE_CUDA )
|
||||||
|
|| std::is_same< memory_space , Kokkos::CudaUVMSpace >::value
|
||||||
|
|| std::is_same< memory_space , Kokkos::CudaHostPinnedSpace >::value
|
||||||
|
#endif /* #if defined( KOKKOS_HAVE_CUDA ) */
|
||||||
|
, memory_space
|
||||||
|
, Kokkos::HostSpace
|
||||||
|
>::type host_memory_space ;
|
||||||
|
|
||||||
|
#if defined( KOKKOS_HAVE_CUDA )
|
||||||
|
typedef typename std::conditional
|
||||||
|
< std::is_same< execution_space , Kokkos::Cuda >::value
|
||||||
|
, Kokkos::DefaultHostExecutionSpace , execution_space
|
||||||
|
>::type host_execution_space ;
|
||||||
|
#else
|
||||||
|
typedef execution_space host_execution_space ;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
typedef typename std::conditional
|
||||||
|
< std::is_same< execution_space , host_execution_space >::value &&
|
||||||
|
std::is_same< memory_space , host_memory_space >::value
|
||||||
|
, T , Kokkos::Device< host_execution_space , host_memory_space >
|
||||||
|
>::type host_mirror_space ;
|
||||||
|
};
|
||||||
|
|
||||||
|
// For backward compatiblity
|
||||||
|
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
using Kokkos::is_space ;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
/**\brief Access relationship between DstMemorySpace and SrcMemorySpace
|
||||||
|
*
|
||||||
|
* The default case can assume accessibility for the same space.
|
||||||
|
* Specializations must be defined for different memory spaces.
|
||||||
|
*/
|
||||||
|
template< typename DstMemorySpace , typename SrcMemorySpace >
|
||||||
|
struct MemorySpaceAccess {
|
||||||
|
|
||||||
|
static_assert( Kokkos::is_memory_space< DstMemorySpace >::value &&
|
||||||
|
Kokkos::is_memory_space< SrcMemorySpace >::value
|
||||||
|
, "template arguments must be memory spaces" );
|
||||||
|
|
||||||
|
/**\brief Can a View (or pointer) to memory in SrcMemorySpace
|
||||||
|
* be assigned to a View (or pointer) to memory marked DstMemorySpace.
|
||||||
|
*
|
||||||
|
* 1. DstMemorySpace::execution_space == SrcMemorySpace::execution_space
|
||||||
|
* 2. All execution spaces that can access DstMemorySpace can also access
|
||||||
|
* SrcMemorySpace.
|
||||||
|
*/
|
||||||
|
enum { assignable = std::is_same<DstMemorySpace,SrcMemorySpace>::value };
|
||||||
|
|
||||||
|
/**\brief For all DstExecSpace::memory_space == DstMemorySpace
|
||||||
|
* DstExecSpace can access SrcMemorySpace.
|
||||||
|
*/
|
||||||
|
enum { accessible = assignable };
|
||||||
|
|
||||||
|
/**\brief Does a DeepCopy capability exist
|
||||||
|
* to DstMemorySpace from SrcMemorySpace
|
||||||
|
*/
|
||||||
|
enum { deepcopy = assignable };
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/**\brief Can AccessSpace access MemorySpace ?
|
||||||
|
*
|
||||||
|
* Requires:
|
||||||
|
* Kokkos::is_space< AccessSpace >::value
|
||||||
|
* Kokkos::is_memory_space< MemorySpace >::value
|
||||||
|
*
|
||||||
|
* Can AccessSpace::execution_space access MemorySpace ?
|
||||||
|
* enum : bool { accessible };
|
||||||
|
*
|
||||||
|
* Is View<AccessSpace::memory_space> assignable from View<MemorySpace> ?
|
||||||
|
* enum : bool { assignable };
|
||||||
|
*
|
||||||
|
* If ! accessible then through which intercessory memory space
|
||||||
|
* should a be used to deep copy memory for
|
||||||
|
* AccessSpace::execution_space
|
||||||
|
* to get access.
|
||||||
|
* When AccessSpace::memory_space == Kokkos::HostSpace
|
||||||
|
* then space is the View host mirror space.
|
||||||
|
*/
|
||||||
|
template< typename AccessSpace , typename MemorySpace >
|
||||||
|
struct SpaceAccessibility {
|
||||||
|
private:
|
||||||
|
|
||||||
|
static_assert( Kokkos::is_space< AccessSpace >::value
|
||||||
|
, "template argument #1 must be a Kokkos space" );
|
||||||
|
|
||||||
|
static_assert( Kokkos::is_memory_space< MemorySpace >::value
|
||||||
|
, "template argument #2 must be a Kokkos memory space" );
|
||||||
|
|
||||||
|
// The input AccessSpace may be a Device<ExecSpace,MemSpace>
|
||||||
|
// verify that it is a valid combination of spaces.
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess
|
||||||
|
< typename AccessSpace::execution_space::memory_space
|
||||||
|
, typename AccessSpace::memory_space
|
||||||
|
>::accessible
|
||||||
|
, "template argument #1 is an invalid space" );
|
||||||
|
|
||||||
|
typedef Kokkos::Impl::MemorySpaceAccess
|
||||||
|
< typename AccessSpace::execution_space::memory_space , MemorySpace >
|
||||||
|
exe_access ;
|
||||||
|
|
||||||
|
typedef Kokkos::Impl::MemorySpaceAccess
|
||||||
|
< typename AccessSpace::memory_space , MemorySpace >
|
||||||
|
mem_access ;
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
/**\brief Can AccessSpace::execution_space access MemorySpace ?
|
||||||
|
*
|
||||||
|
* Default based upon memory space accessibility.
|
||||||
|
* Specialization required for other relationships.
|
||||||
|
*/
|
||||||
|
enum { accessible = exe_access::accessible };
|
||||||
|
|
||||||
|
/**\brief Can assign to AccessSpace from MemorySpace ?
|
||||||
|
*
|
||||||
|
* Default based upon memory space accessibility.
|
||||||
|
* Specialization required for other relationships.
|
||||||
|
*/
|
||||||
|
enum { assignable =
|
||||||
|
is_memory_space< AccessSpace >::value && mem_access::assignable };
|
||||||
|
|
||||||
|
/**\brief Can deep copy to AccessSpace::memory_Space from MemorySpace ? */
|
||||||
|
enum { deepcopy = mem_access::deepcopy };
|
||||||
|
|
||||||
|
// What intercessory space for AccessSpace::execution_space
|
||||||
|
// to be able to access MemorySpace?
|
||||||
|
// If same memory space or not accessible use the AccessSpace
|
||||||
|
// else construct a device with execution space and memory space.
|
||||||
|
typedef typename std::conditional
|
||||||
|
< std::is_same<typename AccessSpace::memory_space,MemorySpace>::value ||
|
||||||
|
! exe_access::accessible
|
||||||
|
, AccessSpace
|
||||||
|
, Kokkos::Device< typename AccessSpace::execution_space , MemorySpace >
|
||||||
|
>::type space ;
|
||||||
|
};
|
||||||
|
|
||||||
|
}} // namespace Kokkos::Impl
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif // KOKKOS_CORE_CONCEPTS_HPP
|
#endif // KOKKOS_CORE_CONCEPTS_HPP
|
||||||
|
|
||||||
|
|||||||
@ -72,6 +72,7 @@
|
|||||||
#include <Kokkos_Vectorization.hpp>
|
#include <Kokkos_Vectorization.hpp>
|
||||||
#include <Kokkos_Atomic.hpp>
|
#include <Kokkos_Atomic.hpp>
|
||||||
#include <Kokkos_hwloc.hpp>
|
#include <Kokkos_hwloc.hpp>
|
||||||
|
#include <Kokkos_Timer.hpp>
|
||||||
|
|
||||||
#ifdef KOKKOS_HAVE_CXX11
|
#ifdef KOKKOS_HAVE_CXX11
|
||||||
#include <Kokkos_Complex.hpp>
|
#include <Kokkos_Complex.hpp>
|
||||||
@ -112,7 +113,6 @@ void fence();
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
|
||||||
|
|
||||||
/* Allocate memory from a memory space.
|
/* Allocate memory from a memory space.
|
||||||
* The allocation is tracked in Kokkos memory tracking system, so
|
* The allocation is tracked in Kokkos memory tracking system, so
|
||||||
@ -155,18 +155,8 @@ void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
|
|||||||
reallocate_tracked( arg_alloc , arg_alloc_size );
|
reallocate_tracked( arg_alloc , arg_alloc_size );
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
|
|
||||||
using Kokkos::Experimental::kokkos_malloc ;
|
|
||||||
using Kokkos::Experimental::kokkos_realloc ;
|
|
||||||
using Kokkos::Experimental::kokkos_free ;
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
@ -49,6 +49,7 @@
|
|||||||
// and compiler environment then sets a collection of #define macros.
|
// and compiler environment then sets a collection of #define macros.
|
||||||
|
|
||||||
#include <Kokkos_Macros.hpp>
|
#include <Kokkos_Macros.hpp>
|
||||||
|
#include <impl/Kokkos_Utilities.hpp>
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
// Have assumed a 64bit build (8byte pointers) throughout the code base.
|
// Have assumed a 64bit build (8byte pointers) throughout the code base.
|
||||||
|
|||||||
@ -56,7 +56,7 @@
|
|||||||
#include <Kokkos_CudaSpace.hpp>
|
#include <Kokkos_CudaSpace.hpp>
|
||||||
|
|
||||||
#include <Kokkos_Parallel.hpp>
|
#include <Kokkos_Parallel.hpp>
|
||||||
#include <Kokkos_TaskPolicy.hpp>
|
#include <Kokkos_TaskScheduler.hpp>
|
||||||
#include <Kokkos_Layout.hpp>
|
#include <Kokkos_Layout.hpp>
|
||||||
#include <Kokkos_ScratchSpace.hpp>
|
#include <Kokkos_ScratchSpace.hpp>
|
||||||
#include <Kokkos_MemoryTraits.hpp>
|
#include <Kokkos_MemoryTraits.hpp>
|
||||||
@ -229,6 +229,39 @@ private:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::CudaSpace
|
||||||
|
, Kokkos::Cuda::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
|
#if defined( KOKKOS_USE_CUDA_UVM )
|
||||||
|
|
||||||
|
// If forcing use of UVM everywhere
|
||||||
|
// then must assume that CudaUVMSpace
|
||||||
|
// can be a stand-in for CudaSpace.
|
||||||
|
// This will fail when a strange host-side execution space
|
||||||
|
// that defines CudaUVMSpace as its preferredmemory space.
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::CudaUVMSpace
|
||||||
|
, Kokkos::Cuda::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
struct VerifyExecutionCanAccessMemorySpace
|
struct VerifyExecutionCanAccessMemorySpace
|
||||||
< Kokkos::CudaSpace
|
< Kokkos::CudaSpace
|
||||||
@ -259,9 +292,6 @@ struct VerifyExecutionCanAccessMemorySpace
|
|||||||
|
|
||||||
#include <Cuda/Kokkos_CudaExec.hpp>
|
#include <Cuda/Kokkos_CudaExec.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_View.hpp>
|
#include <Cuda/Kokkos_Cuda_View.hpp>
|
||||||
|
|
||||||
#include <Cuda/KokkosExp_Cuda_View.hpp>
|
|
||||||
|
|
||||||
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
|
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
|
||||||
#include <Cuda/Kokkos_Cuda_Task.hpp>
|
#include <Cuda/Kokkos_Cuda_Task.hpp>
|
||||||
|
|
||||||
|
|||||||
@ -88,6 +88,9 @@ public:
|
|||||||
void deallocate( void * const arg_alloc_ptr
|
void deallocate( void * const arg_alloc_ptr
|
||||||
, const size_t arg_alloc_size ) const ;
|
, const size_t arg_alloc_size ) const ;
|
||||||
|
|
||||||
|
/**\brief Return Name of the MemorySpace */
|
||||||
|
static constexpr const char* name();
|
||||||
|
|
||||||
/*--------------------------------*/
|
/*--------------------------------*/
|
||||||
/** \brief Error reporting for HostSpace attempt to access CudaSpace */
|
/** \brief Error reporting for HostSpace attempt to access CudaSpace */
|
||||||
static void access_error();
|
static void access_error();
|
||||||
@ -97,7 +100,8 @@ private:
|
|||||||
|
|
||||||
int m_device ; ///< Which Cuda device
|
int m_device ; ///< Which Cuda device
|
||||||
|
|
||||||
// friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > ;
|
static constexpr const char* m_name = "Cuda";
|
||||||
|
friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > ;
|
||||||
};
|
};
|
||||||
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
@ -156,6 +160,14 @@ public:
|
|||||||
/** \brief If UVM capability is available */
|
/** \brief If UVM capability is available */
|
||||||
static bool available();
|
static bool available();
|
||||||
|
|
||||||
|
|
||||||
|
/*--------------------------------*/
|
||||||
|
/** \brief CudaUVMSpace specific routine */
|
||||||
|
static int number_of_allocations();
|
||||||
|
|
||||||
|
/*--------------------------------*/
|
||||||
|
|
||||||
|
|
||||||
/*--------------------------------*/
|
/*--------------------------------*/
|
||||||
|
|
||||||
CudaUVMSpace();
|
CudaUVMSpace();
|
||||||
@ -172,11 +184,16 @@ public:
|
|||||||
void deallocate( void * const arg_alloc_ptr
|
void deallocate( void * const arg_alloc_ptr
|
||||||
, const size_t arg_alloc_size ) const ;
|
, const size_t arg_alloc_size ) const ;
|
||||||
|
|
||||||
|
/**\brief Return Name of the MemorySpace */
|
||||||
|
static constexpr const char* name();
|
||||||
|
|
||||||
/*--------------------------------*/
|
/*--------------------------------*/
|
||||||
|
|
||||||
private:
|
private:
|
||||||
|
|
||||||
int m_device ; ///< Which Cuda device
|
int m_device ; ///< Which Cuda device
|
||||||
|
|
||||||
|
static constexpr const char* m_name = "CudaUVM";
|
||||||
|
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
@ -215,6 +232,13 @@ public:
|
|||||||
void deallocate( void * const arg_alloc_ptr
|
void deallocate( void * const arg_alloc_ptr
|
||||||
, const size_t arg_alloc_size ) const ;
|
, const size_t arg_alloc_size ) const ;
|
||||||
|
|
||||||
|
/**\brief Return Name of the MemorySpace */
|
||||||
|
static constexpr const char* name();
|
||||||
|
|
||||||
|
private:
|
||||||
|
|
||||||
|
static constexpr const char* m_name = "CudaHostPinned";
|
||||||
|
|
||||||
/*--------------------------------*/
|
/*--------------------------------*/
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -226,6 +250,126 @@ public:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaSpace >::assignable , "" );
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaUVMSpace >::assignable , "" );
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace > {
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = false };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace > {
|
||||||
|
// HostSpace::execution_space != CudaUVMSpace::execution_space
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace > {
|
||||||
|
// HostSpace::execution_space == CudaHostPinnedSpace::execution_space
|
||||||
|
enum { assignable = true };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace > {
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = false };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaUVMSpace > {
|
||||||
|
// CudaSpace::execution_space == CudaUVMSpace::execution_space
|
||||||
|
enum { assignable = true };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace > {
|
||||||
|
// CudaSpace::execution_space != CudaHostPinnedSpace::execution_space
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true }; // CudaSpace::execution_space
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
// CudaUVMSpace::execution_space == Cuda
|
||||||
|
// CudaUVMSpace accessible to both Cuda and Host
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace > {
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = false }; // Cuda cannot access HostSpace
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace > {
|
||||||
|
// CudaUVMSpace::execution_space == CudaSpace::execution_space
|
||||||
|
// Can access CudaUVMSpace from Host but cannot access CudaSpace from Host
|
||||||
|
enum { assignable = false };
|
||||||
|
|
||||||
|
// CudaUVMSpace::execution_space can access CudaSpace
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace > {
|
||||||
|
// CudaUVMSpace::execution_space != CudaHostPinnedSpace::execution_space
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true }; // CudaUVMSpace::execution_space
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
// CudaHostPinnedSpace::execution_space == HostSpace::execution_space
|
||||||
|
// CudaHostPinnedSpace accessible to both Cuda and Host
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace > {
|
||||||
|
enum { assignable = false }; // Cannot access from Cuda
|
||||||
|
enum { accessible = true }; // CudaHostPinnedSpace::execution_space
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace > {
|
||||||
|
enum { assignable = false }; // Cannot access from Host
|
||||||
|
enum { accessible = false };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace > {
|
||||||
|
enum { assignable = false }; // different execution_space
|
||||||
|
enum { accessible = true }; // same accessibility
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
}} // namespace Kokkos::Impl
|
||||||
|
|
||||||
|
/*--------------------------------------------------------------------------*/
|
||||||
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n);
|
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n);
|
||||||
|
|
||||||
template<> struct DeepCopy< CudaSpace , CudaSpace , Cuda>
|
template<> struct DeepCopy< CudaSpace , CudaSpace , Cuda>
|
||||||
@ -553,7 +697,6 @@ struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::CudaHost
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
@ -791,7 +934,6 @@ public:
|
|||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|||||||
@ -52,6 +52,7 @@
|
|||||||
#include <impl/Kokkos_AnalyzePolicy.hpp>
|
#include <impl/Kokkos_AnalyzePolicy.hpp>
|
||||||
#include <Kokkos_Concepts.hpp>
|
#include <Kokkos_Concepts.hpp>
|
||||||
#include <iostream>
|
#include <iostream>
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
@ -82,7 +83,6 @@ class RangePolicy
|
|||||||
: public Impl::PolicyTraits<Properties ... >
|
: public Impl::PolicyTraits<Properties ... >
|
||||||
{
|
{
|
||||||
private:
|
private:
|
||||||
|
|
||||||
typedef Impl::PolicyTraits<Properties ... > traits;
|
typedef Impl::PolicyTraits<Properties ... > traits;
|
||||||
|
|
||||||
typename traits::execution_space m_space ;
|
typename traits::execution_space m_space ;
|
||||||
@ -90,8 +90,8 @@ private:
|
|||||||
typename traits::index_type m_end ;
|
typename traits::index_type m_end ;
|
||||||
typename traits::index_type m_granularity ;
|
typename traits::index_type m_granularity ;
|
||||||
typename traits::index_type m_granularity_mask ;
|
typename traits::index_type m_granularity_mask ;
|
||||||
public:
|
|
||||||
|
|
||||||
|
public:
|
||||||
//! Tag this class as an execution policy
|
//! Tag this class as an execution policy
|
||||||
typedef RangePolicy execution_policy;
|
typedef RangePolicy execution_policy;
|
||||||
typedef typename traits::index_type member_type ;
|
typedef typename traits::index_type member_type ;
|
||||||
@ -100,7 +100,6 @@ public:
|
|||||||
KOKKOS_INLINE_FUNCTION member_type begin() const { return m_begin ; }
|
KOKKOS_INLINE_FUNCTION member_type begin() const { return m_begin ; }
|
||||||
KOKKOS_INLINE_FUNCTION member_type end() const { return m_end ; }
|
KOKKOS_INLINE_FUNCTION member_type end() const { return m_end ; }
|
||||||
|
|
||||||
|
|
||||||
//TODO: find a better workaround for Clangs weird instantiation order
|
//TODO: find a better workaround for Clangs weird instantiation order
|
||||||
// This thing is here because of an instantiation error, where the RangePolicy is inserted into FunctorValue Traits, which
|
// This thing is here because of an instantiation error, where the RangePolicy is inserted into FunctorValue Traits, which
|
||||||
// tries decltype on the operator. It tries to do this even though the first argument of parallel for clearly doesn't match.
|
// tries decltype on the operator. It tries to do this even though the first argument of parallel for clearly doesn't match.
|
||||||
@ -135,47 +134,45 @@ public:
|
|||||||
, work_begin , work_end )
|
, work_begin , work_end )
|
||||||
{}
|
{}
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
/** \brief return chunk_size */
|
||||||
|
inline member_type chunk_size() const {
|
||||||
|
return m_granularity;
|
||||||
|
}
|
||||||
|
|
||||||
/** \brief return chunk_size */
|
/** \brief set chunk_size to a discrete value*/
|
||||||
inline member_type chunk_size() const {
|
inline RangePolicy set_chunk_size(int chunk_size_) const {
|
||||||
return m_granularity;
|
RangePolicy p = *this;
|
||||||
}
|
p.m_granularity = chunk_size_;
|
||||||
|
p.m_granularity_mask = p.m_granularity - 1;
|
||||||
|
return p;
|
||||||
|
}
|
||||||
|
|
||||||
/** \brief set chunk_size to a discrete value*/
|
private:
|
||||||
inline RangePolicy set_chunk_size(int chunk_size_) const {
|
/** \brief finalize chunk_size if it was set to AUTO*/
|
||||||
RangePolicy p = *this;
|
inline void set_auto_chunk_size() {
|
||||||
p.m_granularity = chunk_size_;
|
|
||||||
p.m_granularity_mask = p.m_granularity - 1;
|
|
||||||
return p;
|
|
||||||
}
|
|
||||||
|
|
||||||
private:
|
typename traits::index_type concurrency = traits::execution_space::concurrency();
|
||||||
/** \brief finalize chunk_size if it was set to AUTO*/
|
if( concurrency==0 ) concurrency=1;
|
||||||
inline void set_auto_chunk_size() {
|
|
||||||
|
|
||||||
typename traits::index_type concurrency = traits::execution_space::concurrency();
|
if(m_granularity > 0) {
|
||||||
if( concurrency==0 ) concurrency=1;
|
if(!Impl::is_integral_power_of_two( m_granularity ))
|
||||||
|
Kokkos::abort("RangePolicy blocking granularity must be power of two" );
|
||||||
|
}
|
||||||
|
|
||||||
if(m_granularity > 0) {
|
member_type new_chunk_size = 1;
|
||||||
if(!Impl::is_integral_power_of_two( m_granularity ))
|
while(new_chunk_size*100*concurrency < m_end-m_begin)
|
||||||
Kokkos::abort("RangePolicy blocking granularity must be power of two" );
|
new_chunk_size *= 2;
|
||||||
}
|
if(new_chunk_size < 128) {
|
||||||
|
new_chunk_size = 1;
|
||||||
|
while( (new_chunk_size*40*concurrency < m_end-m_begin ) && (new_chunk_size<128) )
|
||||||
|
new_chunk_size*=2;
|
||||||
|
}
|
||||||
|
m_granularity = new_chunk_size;
|
||||||
|
m_granularity_mask = m_granularity - 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
public:
|
||||||
member_type new_chunk_size = 1;
|
|
||||||
while(new_chunk_size*100*concurrency < m_end-m_begin)
|
|
||||||
new_chunk_size *= 2;
|
|
||||||
if(new_chunk_size < 128) {
|
|
||||||
new_chunk_size = 1;
|
|
||||||
while( (new_chunk_size*40*concurrency < m_end-m_begin ) && (new_chunk_size<128) )
|
|
||||||
new_chunk_size*=2;
|
|
||||||
}
|
|
||||||
m_granularity = new_chunk_size;
|
|
||||||
m_granularity_mask = m_granularity - 1;
|
|
||||||
}
|
|
||||||
|
|
||||||
public:
|
|
||||||
/** \brief Subrange for a partition's rank and size.
|
/** \brief Subrange for a partition's rank and size.
|
||||||
*
|
*
|
||||||
* Typically used to partition a range over a group of threads.
|
* Typically used to partition a range over a group of threads.
|
||||||
@ -212,16 +209,15 @@ public:
|
|||||||
if ( range.end() < m_end ) m_end = range.end() ;
|
if ( range.end() < m_end ) m_end = range.end() ;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
private:
|
|
||||||
member_type m_begin ;
|
|
||||||
member_type m_end ;
|
|
||||||
WorkRange();
|
|
||||||
WorkRange & operator = ( const WorkRange & );
|
|
||||||
|
|
||||||
|
private:
|
||||||
|
member_type m_begin ;
|
||||||
|
member_type m_end ;
|
||||||
|
WorkRange();
|
||||||
|
WorkRange & operator = ( const WorkRange & );
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -231,7 +227,6 @@ namespace Kokkos {
|
|||||||
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
|
||||||
template< class ExecSpace, class ... Properties>
|
template< class ExecSpace, class ... Properties>
|
||||||
class TeamPolicyInternal: public Impl::PolicyTraits<Properties ... > {
|
class TeamPolicyInternal: public Impl::PolicyTraits<Properties ... > {
|
||||||
private:
|
private:
|
||||||
@ -245,6 +240,10 @@ public:
|
|||||||
* This size takes into account execution space concurrency limitations and
|
* This size takes into account execution space concurrency limitations and
|
||||||
* scratch memory space limitations for reductions, team reduce/scan, and
|
* scratch memory space limitations for reductions, team reduce/scan, and
|
||||||
* team shared memory.
|
* team shared memory.
|
||||||
|
*
|
||||||
|
* This function only works for single-operator functors.
|
||||||
|
* With multi-operator functors it cannot be determined
|
||||||
|
* which operator will be called.
|
||||||
*/
|
*/
|
||||||
template< class FunctorType >
|
template< class FunctorType >
|
||||||
static int team_size_max( const FunctorType & );
|
static int team_size_max( const FunctorType & );
|
||||||
@ -254,6 +253,10 @@ public:
|
|||||||
* This size takes into account execution space concurrency limitations and
|
* This size takes into account execution space concurrency limitations and
|
||||||
* scratch memory space limitations for reductions, team reduce/scan, and
|
* scratch memory space limitations for reductions, team reduce/scan, and
|
||||||
* team shared memory.
|
* team shared memory.
|
||||||
|
*
|
||||||
|
* This function only works for single-operator functors.
|
||||||
|
* With multi-operator functors it cannot be determined
|
||||||
|
* which operator will be called.
|
||||||
*/
|
*/
|
||||||
template< class FunctorType >
|
template< class FunctorType >
|
||||||
static int team_size_recommended( const FunctorType & );
|
static int team_size_recommended( const FunctorType & );
|
||||||
@ -344,9 +347,7 @@ public:
|
|||||||
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const ;
|
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const ;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
}
|
|
||||||
|
|
||||||
namespace Impl {
|
|
||||||
struct PerTeamValue {
|
struct PerTeamValue {
|
||||||
int value;
|
int value;
|
||||||
PerTeamValue(int arg);
|
PerTeamValue(int arg);
|
||||||
@ -356,12 +357,12 @@ namespace Impl {
|
|||||||
int value;
|
int value;
|
||||||
PerThreadValue(int arg);
|
PerThreadValue(int arg);
|
||||||
};
|
};
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
Impl::PerTeamValue PerTeam(const int& arg);
|
Impl::PerTeamValue PerTeam(const int& arg);
|
||||||
Impl::PerThreadValue PerThread(const int& arg);
|
Impl::PerThreadValue PerThread(const int& arg);
|
||||||
|
|
||||||
|
|
||||||
/** \brief Execution policy for parallel work over a league of teams of threads.
|
/** \brief Execution policy for parallel work over a league of teams of threads.
|
||||||
*
|
*
|
||||||
* The work functor is called for each thread of each team such that
|
* The work functor is called for each thread of each team such that
|
||||||
@ -443,10 +444,6 @@ public:
|
|||||||
|
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template<typename iType, class TeamMemberType>
|
template<typename iType, class TeamMemberType>
|
||||||
@ -484,8 +481,8 @@ public:
|
|||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
TeamThreadRangeBoundariesStruct( const TeamMemberType& arg_thread
|
TeamThreadRangeBoundariesStruct( const TeamMemberType& arg_thread
|
||||||
, const iType& arg_end
|
, const iType& arg_end
|
||||||
)
|
)
|
||||||
: start( ibegin( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
|
: start( ibegin( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
|
||||||
, end( iend( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
|
, end( iend( 0 , arg_end , arg_thread.team_rank() , arg_thread.team_size() ) )
|
||||||
, thread( arg_thread )
|
, thread( arg_thread )
|
||||||
@ -502,32 +499,33 @@ public:
|
|||||||
{}
|
{}
|
||||||
};
|
};
|
||||||
|
|
||||||
template<typename iType, class TeamMemberType>
|
template<typename iType, class TeamMemberType>
|
||||||
struct ThreadVectorRangeBoundariesStruct {
|
struct ThreadVectorRangeBoundariesStruct {
|
||||||
typedef iType index_type;
|
typedef iType index_type;
|
||||||
enum {start = 0};
|
enum {start = 0};
|
||||||
const iType end;
|
const iType end;
|
||||||
enum {increment = 1};
|
enum {increment = 1};
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
ThreadVectorRangeBoundariesStruct (const TeamMemberType& thread, const iType& count):
|
ThreadVectorRangeBoundariesStruct ( const TeamMemberType, const iType& count ) : end( count ) {}
|
||||||
end( count )
|
KOKKOS_INLINE_FUNCTION
|
||||||
{}
|
ThreadVectorRangeBoundariesStruct ( const iType& count ) : end( count ) {}
|
||||||
};
|
};
|
||||||
|
|
||||||
template<class TeamMemberType>
|
template<class TeamMemberType>
|
||||||
struct ThreadSingleStruct {
|
struct ThreadSingleStruct {
|
||||||
const TeamMemberType& team_member;
|
const TeamMemberType& team_member;
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
ThreadSingleStruct(const TeamMemberType& team_member_):team_member(team_member_){}
|
ThreadSingleStruct( const TeamMemberType& team_member_ ) : team_member( team_member_ ) {}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
template<class TeamMemberType>
|
||||||
|
struct VectorSingleStruct {
|
||||||
|
const TeamMemberType& team_member;
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
VectorSingleStruct( const TeamMemberType& team_member_ ) : team_member( team_member_ ) {}
|
||||||
|
};
|
||||||
|
|
||||||
template<class TeamMemberType>
|
|
||||||
struct VectorSingleStruct {
|
|
||||||
const TeamMemberType& team_member;
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
|
||||||
VectorSingleStruct(const TeamMemberType& team_member_):team_member(team_member_){}
|
|
||||||
};
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|
||||||
/** \brief Execution policy for parallel work over a threads within a team.
|
/** \brief Execution policy for parallel work over a threads within a team.
|
||||||
@ -538,7 +536,8 @@ public:
|
|||||||
*/
|
*/
|
||||||
template<typename iType, class TeamMemberType>
|
template<typename iType, class TeamMemberType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(const TeamMemberType&, const iType& count);
|
Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType>
|
||||||
|
TeamThreadRange( const TeamMemberType&, const iType& count );
|
||||||
|
|
||||||
/** \brief Execution policy for parallel work over a threads within a team.
|
/** \brief Execution policy for parallel work over a threads within a team.
|
||||||
*
|
*
|
||||||
@ -546,9 +545,10 @@ Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(cons
|
|||||||
* This policy is used together with a parallel pattern as a nested layer within a kernel launched
|
* This policy is used together with a parallel pattern as a nested layer within a kernel launched
|
||||||
* with the TeamPolicy. This variant expects a begin and end. So the range is (begin,end].
|
* with the TeamPolicy. This variant expects a begin and end. So the range is (begin,end].
|
||||||
*/
|
*/
|
||||||
template<typename iType, class TeamMemberType>
|
template<typename iType1, typename iType2, class TeamMemberType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(const TeamMemberType&, const iType& begin, const iType& end);
|
Impl::TeamThreadRangeBoundariesStruct<typename std::common_type<iType1, iType2>::type, TeamMemberType>
|
||||||
|
TeamThreadRange( const TeamMemberType&, const iType1& begin, const iType2& end );
|
||||||
|
|
||||||
/** \brief Execution policy for a vector parallel loop.
|
/** \brief Execution policy for a vector parallel loop.
|
||||||
*
|
*
|
||||||
@ -558,13 +558,12 @@ Impl::TeamThreadRangeBoundariesStruct<iType,TeamMemberType> TeamThreadRange(cons
|
|||||||
*/
|
*/
|
||||||
template<typename iType, class TeamMemberType>
|
template<typename iType, class TeamMemberType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::ThreadVectorRangeBoundariesStruct<iType,TeamMemberType> ThreadVectorRange(const TeamMemberType&, const iType& count);
|
Impl::ThreadVectorRangeBoundariesStruct<iType,TeamMemberType>
|
||||||
|
ThreadVectorRange( const TeamMemberType&, const iType& count );
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
|
||||||
#endif /* #define KOKKOS_EXECPOLICY_HPP */
|
#endif /* #define KOKKOS_EXECPOLICY_HPP */
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
@ -46,7 +46,6 @@
|
|||||||
|
|
||||||
|
|
||||||
#include <Kokkos_HostSpace.hpp>
|
#include <Kokkos_HostSpace.hpp>
|
||||||
#include <impl/Kokkos_HBWAllocators.hpp>
|
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
#ifdef KOKKOS_HAVE_HBWSPACE
|
#ifdef KOKKOS_HAVE_HBWSPACE
|
||||||
@ -148,11 +147,14 @@ public:
|
|||||||
void deallocate( void * const arg_alloc_ptr
|
void deallocate( void * const arg_alloc_ptr
|
||||||
, const size_t arg_alloc_size ) const ;
|
, const size_t arg_alloc_size ) const ;
|
||||||
|
|
||||||
|
/**\brief Return Name of the MemorySpace */
|
||||||
|
static constexpr const char* name();
|
||||||
|
|
||||||
private:
|
private:
|
||||||
|
|
||||||
AllocationMechanism m_alloc_mech ;
|
AllocationMechanism m_alloc_mech ;
|
||||||
|
static constexpr const char* m_name = "HBW";
|
||||||
friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
|
friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Experimental
|
} // namespace Experimental
|
||||||
@ -162,7 +164,6 @@ private:
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
@ -239,9 +240,33 @@ public:
|
|||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
namespace Impl {
|
||||||
|
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::Experimental::HBWSpace >::assignable , "" );
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace > {
|
||||||
|
enum { assignable = true };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace> {
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = true };
|
||||||
|
};
|
||||||
|
|
||||||
|
}}
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
@ -50,12 +50,12 @@
|
|||||||
#include <typeinfo>
|
#include <typeinfo>
|
||||||
|
|
||||||
#include <Kokkos_Core_fwd.hpp>
|
#include <Kokkos_Core_fwd.hpp>
|
||||||
|
#include <Kokkos_Concepts.hpp>
|
||||||
#include <Kokkos_MemoryTraits.hpp>
|
#include <Kokkos_MemoryTraits.hpp>
|
||||||
|
|
||||||
#include <impl/Kokkos_Traits.hpp>
|
#include <impl/Kokkos_Traits.hpp>
|
||||||
#include <impl/Kokkos_Error.hpp>
|
#include <impl/Kokkos_Error.hpp>
|
||||||
|
#include <impl/Kokkos_SharedAlloc.hpp>
|
||||||
#include <impl/KokkosExp_SharedAlloc.hpp>
|
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
|
|
||||||
@ -155,11 +155,14 @@ public:
|
|||||||
void deallocate( void * const arg_alloc_ptr
|
void deallocate( void * const arg_alloc_ptr
|
||||||
, const size_t arg_alloc_size ) const ;
|
, const size_t arg_alloc_size ) const ;
|
||||||
|
|
||||||
|
/**\brief Return Name of the MemorySpace */
|
||||||
|
static constexpr const char* name();
|
||||||
|
|
||||||
private:
|
private:
|
||||||
|
|
||||||
AllocationMechanism m_alloc_mech ;
|
AllocationMechanism m_alloc_mech ;
|
||||||
|
static constexpr const char* m_name = "Host";
|
||||||
friend class Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
|
friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
@ -168,7 +171,47 @@ private:
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Experimental {
|
namespace Impl {
|
||||||
|
|
||||||
|
static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
|
||||||
|
|
||||||
|
|
||||||
|
template< typename S >
|
||||||
|
struct HostMirror {
|
||||||
|
private:
|
||||||
|
|
||||||
|
// If input execution space can access HostSpace then keep it.
|
||||||
|
// Example: Kokkos::OpenMP can access, Kokkos::Cuda cannot
|
||||||
|
enum { keep_exe = Kokkos::Impl::MemorySpaceAccess
|
||||||
|
< typename S::execution_space::memory_space , Kokkos::HostSpace >
|
||||||
|
::accessible };
|
||||||
|
|
||||||
|
// If HostSpace can access memory space then keep it.
|
||||||
|
// Example: Cannot access Kokkos::CudaSpace, can access Kokkos::CudaUVMSpace
|
||||||
|
enum { keep_mem = Kokkos::Impl::MemorySpaceAccess
|
||||||
|
< Kokkos::HostSpace , typename S::memory_space >::accessible };
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
typedef typename std::conditional
|
||||||
|
< keep_exe && keep_mem /* Can keep whole space */
|
||||||
|
, S
|
||||||
|
, typename std::conditional
|
||||||
|
< keep_mem /* Can keep memory space, use default Host execution space */
|
||||||
|
, Kokkos::Device< Kokkos::HostSpace::execution_space
|
||||||
|
, typename S::memory_space >
|
||||||
|
, Kokkos::HostSpace
|
||||||
|
>::type
|
||||||
|
>::type Space ;
|
||||||
|
};
|
||||||
|
|
||||||
|
} // namespace Impl
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
@ -245,7 +288,6 @@ public:
|
|||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
} // namespace Experimental
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|||||||
@ -82,7 +82,7 @@ struct LayoutLeft {
|
|||||||
LayoutLeft & operator = ( LayoutLeft && ) = default ;
|
LayoutLeft & operator = ( LayoutLeft && ) = default ;
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
constexpr
|
explicit constexpr
|
||||||
LayoutLeft( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
|
LayoutLeft( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
|
||||||
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
|
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
|
||||||
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
|
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
|
||||||
@ -114,7 +114,7 @@ struct LayoutRight {
|
|||||||
LayoutRight & operator = ( LayoutRight && ) = default ;
|
LayoutRight & operator = ( LayoutRight && ) = default ;
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
constexpr
|
explicit constexpr
|
||||||
LayoutRight( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
|
LayoutRight( size_t N0 = 0 , size_t N1 = 0 , size_t N2 = 0 , size_t N3 = 0
|
||||||
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
|
, size_t N4 = 0 , size_t N5 = 0 , size_t N6 = 0 , size_t N7 = 0 )
|
||||||
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
|
: dimension { N0 , N1 , N2 , N3 , N4 , N5 , N6 , N7 } {}
|
||||||
@ -132,6 +132,11 @@ struct LayoutStride {
|
|||||||
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ] ;
|
size_t dimension[ ARRAY_LAYOUT_MAX_RANK ] ;
|
||||||
size_t stride[ ARRAY_LAYOUT_MAX_RANK ] ;
|
size_t stride[ ARRAY_LAYOUT_MAX_RANK ] ;
|
||||||
|
|
||||||
|
LayoutStride( LayoutStride const & ) = default ;
|
||||||
|
LayoutStride( LayoutStride && ) = default ;
|
||||||
|
LayoutStride & operator = ( LayoutStride const & ) = default ;
|
||||||
|
LayoutStride & operator = ( LayoutStride && ) = default ;
|
||||||
|
|
||||||
/** \brief Compute strides from ordered dimensions.
|
/** \brief Compute strides from ordered dimensions.
|
||||||
*
|
*
|
||||||
* Values of order uniquely form the set [0..rank)
|
* Values of order uniquely form the set [0..rank)
|
||||||
@ -164,7 +169,8 @@ struct LayoutStride {
|
|||||||
return tmp ;
|
return tmp ;
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION constexpr
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
explicit constexpr
|
||||||
LayoutStride( size_t N0 = 0 , size_t S0 = 0
|
LayoutStride( size_t N0 = 0 , size_t S0 = 0
|
||||||
, size_t N1 = 0 , size_t S1 = 0
|
, size_t N1 = 0 , size_t S1 = 0
|
||||||
, size_t N2 = 0 , size_t S2 = 0
|
, size_t N2 = 0 , size_t S2 = 0
|
||||||
@ -220,7 +226,7 @@ struct LayoutTileLeft {
|
|||||||
LayoutTileLeft & operator = ( LayoutTileLeft && ) = default ;
|
LayoutTileLeft & operator = ( LayoutTileLeft && ) = default ;
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
constexpr
|
explicit constexpr
|
||||||
LayoutTileLeft( size_t argN0 = 0 , size_t argN1 = 0 , size_t argN2 = 0 , size_t argN3 = 0
|
LayoutTileLeft( size_t argN0 = 0 , size_t argN1 = 0 , size_t argN2 = 0 , size_t argN3 = 0
|
||||||
, size_t argN4 = 0 , size_t argN5 = 0 , size_t argN6 = 0 , size_t argN7 = 0
|
, size_t argN4 = 0 , size_t argN5 = 0 , size_t argN6 = 0 , size_t argN7 = 0
|
||||||
)
|
)
|
||||||
|
|||||||
@ -114,11 +114,11 @@
|
|||||||
#error "#include <cuda.h> did not define CUDA_VERSION"
|
#error "#include <cuda.h> did not define CUDA_VERSION"
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#if ( CUDA_VERSION < 6050 )
|
#if ( CUDA_VERSION < 7000 )
|
||||||
// CUDA supports (inofficially) C++11 in device code starting with
|
// CUDA supports C++11 in device code starting with
|
||||||
// version 6.5. This includes auto type and device code internal
|
// version 7.0. This includes auto type and device code internal
|
||||||
// lambdas.
|
// lambdas.
|
||||||
#error "Cuda version 6.5 or greater required"
|
#error "Cuda version 7.0 or greater required"
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
|
#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
|
||||||
@ -127,16 +127,19 @@
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef KOKKOS_CUDA_USE_LAMBDA
|
#ifdef KOKKOS_CUDA_USE_LAMBDA
|
||||||
#if ( CUDA_VERSION < 7000 )
|
#if ( CUDA_VERSION < 7050 )
|
||||||
// CUDA supports C++11 lambdas generated in host code to be given
|
// CUDA supports C++11 lambdas generated in host code to be given
|
||||||
// to the device starting with version 7.5. But the release candidate (7.5.6)
|
// to the device starting with version 7.5. But the release candidate (7.5.6)
|
||||||
// still identifies as 7.0
|
// still identifies as 7.0
|
||||||
#error "Cuda version 7.5 or greater required for host-to-device Lambda support"
|
#error "Cuda version 7.5 or greater required for host-to-device Lambda support"
|
||||||
#endif
|
#endif
|
||||||
#if ( CUDA_VERSION < 8000 )
|
#if ( CUDA_VERSION < 8000 ) && defined(__NVCC__)
|
||||||
#define KOKKOS_LAMBDA [=]__device__
|
#define KOKKOS_LAMBDA [=]__device__
|
||||||
#else
|
#else
|
||||||
#define KOKKOS_LAMBDA [=]__host__ __device__
|
#define KOKKOS_LAMBDA [=]__host__ __device__
|
||||||
|
#if defined( KOKKOS_HAVE_CXX1Z )
|
||||||
|
#define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
|
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
|
||||||
#endif
|
#endif
|
||||||
@ -145,7 +148,7 @@
|
|||||||
|
|
||||||
#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
|
#if defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
|
||||||
// Cuda version 8.0 still needs the functor wrapper
|
// Cuda version 8.0 still needs the functor wrapper
|
||||||
#if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ )
|
#if (KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ ) && defined(__NVCC__)
|
||||||
#define KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
|
#define KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
|
||||||
#endif
|
#endif
|
||||||
#endif
|
#endif
|
||||||
@ -153,13 +156,12 @@
|
|||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
/* Language info: C++, CUDA, OPENMP */
|
/* Language info: C++, CUDA, OPENMP */
|
||||||
|
|
||||||
#if defined( __CUDA_ARCH__ ) && defined( KOKKOS_HAVE_CUDA )
|
#if defined( KOKKOS_HAVE_CUDA )
|
||||||
// Compiling Cuda code to 'ptx'
|
// Compiling Cuda code to 'ptx'
|
||||||
|
|
||||||
#define KOKKOS_FORCEINLINE_FUNCTION __device__ __host__ __forceinline__
|
#define KOKKOS_FORCEINLINE_FUNCTION __device__ __host__ __forceinline__
|
||||||
#define KOKKOS_INLINE_FUNCTION __device__ __host__ inline
|
#define KOKKOS_INLINE_FUNCTION __device__ __host__ inline
|
||||||
#define KOKKOS_FUNCTION __device__ __host__
|
#define KOKKOS_FUNCTION __device__ __host__
|
||||||
|
|
||||||
#endif /* #if defined( __CUDA_ARCH__ ) */
|
#endif /* #if defined( __CUDA_ARCH__ ) */
|
||||||
|
|
||||||
#if defined( _OPENMP )
|
#if defined( _OPENMP )
|
||||||
@ -184,10 +186,12 @@
|
|||||||
|
|
||||||
#else
|
#else
|
||||||
#if defined( KOKKOS_HAVE_CXX11 ) && ! defined( KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA )
|
#if defined( KOKKOS_HAVE_CXX11 ) && ! defined( KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA )
|
||||||
|
#if !defined (KOKKOS_HAVE_CUDA) // Compiling with clang for Cuda does not work with LAMBDAs either
|
||||||
// CUDA (including version 6.5) does not support giving lambdas as
|
// CUDA (including version 6.5) does not support giving lambdas as
|
||||||
// arguments to global functions. Thus its not currently possible
|
// arguments to global functions. Thus its not currently possible
|
||||||
// to dispatch lambdas from the host.
|
// to dispatch lambdas from the host.
|
||||||
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
|
#define KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
|
||||||
|
#endif
|
||||||
#endif
|
#endif
|
||||||
#endif /* #if defined( __NVCC__ ) */
|
#endif /* #if defined( __NVCC__ ) */
|
||||||
|
|
||||||
@ -195,7 +199,11 @@
|
|||||||
#define KOKKOS_LAMBDA [=]
|
#define KOKKOS_LAMBDA [=]
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
|
#if defined( KOKKOS_HAVE_CXX1Z ) && !defined (KOKKOS_CLASS_LAMBDA)
|
||||||
|
#define KOKKOS_CLASS_LAMBDA [=,*this]
|
||||||
|
#endif
|
||||||
|
|
||||||
|
//#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
|
||||||
|
|
||||||
/* Intel compiler for host code */
|
/* Intel compiler for host code */
|
||||||
|
|
||||||
@ -243,7 +251,7 @@
|
|||||||
#endif
|
#endif
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#endif /* #if ! defined( __CUDA_ARCH__ ) */
|
//#endif /* #if ! defined( __CUDA_ARCH__ ) */
|
||||||
|
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
/*--------------------------------------------------------------------------*/
|
/*--------------------------------------------------------------------------*/
|
||||||
@ -257,6 +265,20 @@
|
|||||||
#define KOKKOS_HAVE_PRAGMA_VECTOR 1
|
#define KOKKOS_HAVE_PRAGMA_VECTOR 1
|
||||||
#define KOKKOS_HAVE_PRAGMA_SIMD 1
|
#define KOKKOS_HAVE_PRAGMA_SIMD 1
|
||||||
|
|
||||||
|
#define KOKKOS_RESTRICT __restrict__
|
||||||
|
|
||||||
|
#ifndef KOKKOS_ALIGN
|
||||||
|
#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef KOKKOS_ALIGN_PTR
|
||||||
|
#define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef KOKKOS_ALIGN_SIZE
|
||||||
|
#define KOKKOS_ALIGN_SIZE 64
|
||||||
|
#endif
|
||||||
|
|
||||||
#if ( 1400 > KOKKOS_COMPILER_INTEL )
|
#if ( 1400 > KOKKOS_COMPILER_INTEL )
|
||||||
#if ( 1300 > KOKKOS_COMPILER_INTEL )
|
#if ( 1300 > KOKKOS_COMPILER_INTEL )
|
||||||
#error "Compiling with Intel version earlier than 13.0 is not supported. Official minimal version is 14.0."
|
#error "Compiling with Intel version earlier than 13.0 is not supported. Official minimal version is 14.0."
|
||||||
@ -264,11 +286,11 @@
|
|||||||
#warning "Compiling with Intel version 13.x probably works but is not officially supported. Official minimal version is 14.0."
|
#warning "Compiling with Intel version 13.x probably works but is not officially supported. Official minimal version is 14.0."
|
||||||
#endif
|
#endif
|
||||||
#endif
|
#endif
|
||||||
#if ( 1200 <= KOKKOS_COMPILER_INTEL ) && ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
|
#if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
|
||||||
#define KOKKOS_ENABLE_ASM 1
|
#define KOKKOS_ENABLE_ASM 1
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#if ( 1200 <= KOKKOS_COMPILER_INTEL ) && ! defined( KOKKOS_FORCEINLINE_FUNCTION )
|
#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
|
||||||
#if !defined (_WIN32)
|
#if !defined (_WIN32)
|
||||||
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
|
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
|
||||||
#else
|
#else
|
||||||
@ -335,14 +357,11 @@
|
|||||||
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
|
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#if ! defined( KOKKOS_ENABLE_ASM ) && \
|
#if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( __PGIC__ ) && \
|
||||||
! ( defined( __powerpc) || \
|
( defined( __amd64 ) || \
|
||||||
defined(__powerpc__) || \
|
defined( __amd64__ ) || \
|
||||||
defined(__powerpc64__) || \
|
defined( __x86_64 ) || \
|
||||||
defined(__POWERPC__) || \
|
defined( __x86_64__ ) )
|
||||||
defined(__ppc__) || \
|
|
||||||
defined(__ppc64__) || \
|
|
||||||
defined(__PGIC__) )
|
|
||||||
#define KOKKOS_ENABLE_ASM 1
|
#define KOKKOS_ENABLE_ASM 1
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
@ -385,10 +404,30 @@
|
|||||||
#define KOKKOS_FUNCTION /**/
|
#define KOKKOS_FUNCTION /**/
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
///** Define empty macro for restrict if necessary: */
|
||||||
|
|
||||||
|
#if ! defined(KOKKOS_RESTRICT)
|
||||||
|
#define KOKKOS_RESTRICT
|
||||||
|
#endif
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
/** Define Macro for alignment: */
|
/** Define Macro for alignment: */
|
||||||
|
#if ! defined KOKKOS_ALIGN_SIZE
|
||||||
|
#define KOKKOS_ALIGN_SIZE 16
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#if ! defined(KOKKOS_ALIGN)
|
||||||
|
#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#if ! defined(KOKKOS_ALIGN_PTR)
|
||||||
|
#define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
|
||||||
|
#endif
|
||||||
|
|
||||||
#if ! defined(KOKKOS_ALIGN_16)
|
#if ! defined(KOKKOS_ALIGN_16)
|
||||||
#define KOKKOS_ALIGN_16 __attribute__((aligned(16)))
|
#define KOKKOS_ALIGN_16 KOKKOS_ALIGN(16)
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -456,10 +495,6 @@
|
|||||||
* are no longer supported.
|
* are no longer supported.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#if defined( KOKKOS_USING_DEPRECATED_VIEW )
|
|
||||||
#error "Kokkos deprecated View has been removed"
|
|
||||||
#endif
|
|
||||||
|
|
||||||
#define KOKKOS_USING_EXP_VIEW 1
|
#define KOKKOS_USING_EXP_VIEW 1
|
||||||
#define KOKKOS_USING_EXPERIMENTAL_VIEW
|
#define KOKKOS_USING_EXPERIMENTAL_VIEW
|
||||||
|
|
||||||
|
|||||||
@ -49,7 +49,7 @@
|
|||||||
#include <Kokkos_Atomic.hpp>
|
#include <Kokkos_Atomic.hpp>
|
||||||
#include <impl/Kokkos_BitOps.hpp>
|
#include <impl/Kokkos_BitOps.hpp>
|
||||||
#include <impl/Kokkos_Error.hpp>
|
#include <impl/Kokkos_Error.hpp>
|
||||||
#include <impl/KokkosExp_SharedAlloc.hpp>
|
#include <impl/Kokkos_SharedAlloc.hpp>
|
||||||
|
|
||||||
#include <limits>
|
#include <limits>
|
||||||
#include <algorithm>
|
#include <algorithm>
|
||||||
@ -70,12 +70,6 @@
|
|||||||
//#define KOKKOS_MEMPOOL_PRINT_PAGE_INFO
|
//#define KOKKOS_MEMPOOL_PRINT_PAGE_INFO
|
||||||
//#define KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
|
//#define KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
|
||||||
|
|
||||||
// A superblock is considered full when this percentage of its pages are full.
|
|
||||||
#define KOKKOS_MEMPOOL_SB_FULL_FRACTION 0.80
|
|
||||||
|
|
||||||
// A page is considered full when this percentage of its blocks are full.
|
|
||||||
#define KOKKOS_MEMPOOL_PAGE_FULL_FRACTION 0.875 // 28 / 32
|
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
@ -128,7 +122,7 @@ struct bitset_count
|
|||||||
{ dst += src; }
|
{ dst += src; }
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void operator()( size_type i, value_type & count) const
|
void operator()( size_type i, value_type & count ) const
|
||||||
{
|
{
|
||||||
count += Kokkos::Impl::bit_count( m_words[i] );
|
count += Kokkos::Impl::bit_count( m_words[i] );
|
||||||
}
|
}
|
||||||
@ -183,7 +177,7 @@ public:
|
|||||||
|
|
||||||
size_type count() const
|
size_type count() const
|
||||||
{
|
{
|
||||||
size_type val;
|
size_type val = 0;
|
||||||
bitset_count< Bitset > bc( m_words, m_num_words, val );
|
bitset_count< Bitset > bc( m_words, m_num_words, val );
|
||||||
return val;
|
return val;
|
||||||
}
|
}
|
||||||
@ -232,6 +226,20 @@ public:
|
|||||||
return atomic_fetch_and( &m_words[ word_pos ], ~mask ) & mask;
|
return atomic_fetch_and( &m_words[ word_pos ], ~mask ) & mask;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
|
Kokkos::pair< bool, word_type >
|
||||||
|
fetch_word_set( size_type i ) const
|
||||||
|
{
|
||||||
|
size_type word_pos = i >> LG_WORD_SIZE;
|
||||||
|
word_type mask = word_type(1) << ( i & WORD_MASK );
|
||||||
|
|
||||||
|
Kokkos::pair<bool, word_type> result;
|
||||||
|
result.second = atomic_fetch_or( &m_words[ word_pos ], mask );
|
||||||
|
result.first = !( result.second & mask );
|
||||||
|
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
Kokkos::pair< bool, word_type >
|
Kokkos::pair< bool, word_type >
|
||||||
fetch_word_reset( size_type i ) const
|
fetch_word_reset( size_type i ) const
|
||||||
@ -247,12 +255,10 @@ public:
|
|||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
Kokkos::pair< bool, size_type >
|
Kokkos::pair< bool, word_type >
|
||||||
set_any_in_word( size_type i, word_type & prev_val ) const
|
set_any_in_word( size_type & pos ) const
|
||||||
{
|
{
|
||||||
prev_val = 0;
|
size_type word_pos = pos >> LG_WORD_SIZE;
|
||||||
|
|
||||||
size_type word_pos = i >> LG_WORD_SIZE;
|
|
||||||
word_type word = volatile_load( &m_words[ word_pos ] );
|
word_type word = volatile_load( &m_words[ word_pos ] );
|
||||||
|
|
||||||
// Loop until there are no more unset bits in the word.
|
// Loop until there are no more unset bits in the word.
|
||||||
@ -261,28 +267,26 @@ public:
|
|||||||
size_type bit = Kokkos::Impl::bit_scan_forward( ~word );
|
size_type bit = Kokkos::Impl::bit_scan_forward( ~word );
|
||||||
|
|
||||||
// Try to set the bit.
|
// Try to set the bit.
|
||||||
word_type mask = word_type(1) << bit;
|
word_type mask = word_type(1) << bit;
|
||||||
word = atomic_fetch_or( &m_words[ word_pos ], mask );
|
word = atomic_fetch_or( &m_words[ word_pos ], mask );
|
||||||
|
|
||||||
if ( !( word & mask ) ) {
|
if ( !( word & mask ) ) {
|
||||||
// Successfully set the bit.
|
// Successfully set the bit.
|
||||||
prev_val = word;
|
pos = ( word_pos << LG_WORD_SIZE ) + bit;
|
||||||
|
|
||||||
return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
|
return Kokkos::pair<bool, word_type>( true, word );
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Didn't find a free bit in this word.
|
// Didn't find a free bit in this word.
|
||||||
return Kokkos::pair<bool, size_type>( false, i );
|
return Kokkos::pair<bool, word_type>( false, word_type(0) );
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
Kokkos::pair< bool, size_type >
|
Kokkos::pair< bool, word_type >
|
||||||
set_any_in_word( size_type i, word_type & prev_val, word_type word_mask ) const
|
set_any_in_word( size_type & pos, word_type word_mask ) const
|
||||||
{
|
{
|
||||||
prev_val = 0;
|
size_type word_pos = pos >> LG_WORD_SIZE;
|
||||||
|
|
||||||
size_type word_pos = i >> LG_WORD_SIZE;
|
|
||||||
word_type word = volatile_load( &m_words[ word_pos ] );
|
word_type word = volatile_load( &m_words[ word_pos ] );
|
||||||
word = ( ~word ) & word_mask;
|
word = ( ~word ) & word_mask;
|
||||||
|
|
||||||
@ -292,30 +296,28 @@ public:
|
|||||||
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
||||||
|
|
||||||
// Try to set the bit.
|
// Try to set the bit.
|
||||||
word_type mask = word_type(1) << bit;
|
word_type mask = word_type(1) << bit;
|
||||||
word = atomic_fetch_or( &m_words[ word_pos ], mask );
|
word = atomic_fetch_or( &m_words[ word_pos ], mask );
|
||||||
|
|
||||||
if ( !( word & mask ) ) {
|
if ( !( word & mask ) ) {
|
||||||
// Successfully set the bit.
|
// Successfully set the bit.
|
||||||
prev_val = word;
|
pos = ( word_pos << LG_WORD_SIZE ) + bit;
|
||||||
|
|
||||||
return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
|
return Kokkos::pair<bool, word_type>( true, word );
|
||||||
}
|
}
|
||||||
|
|
||||||
word = ( ~word ) & word_mask;
|
word = ( ~word ) & word_mask;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Didn't find a free bit in this word.
|
// Didn't find a free bit in this word.
|
||||||
return Kokkos::pair<bool, size_type>( false, i );
|
return Kokkos::pair<bool, word_type>( false, word_type(0) );
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
Kokkos::pair< bool, size_type >
|
Kokkos::pair< bool, word_type >
|
||||||
reset_any_in_word( size_type i, word_type & prev_val ) const
|
reset_any_in_word( size_type & pos ) const
|
||||||
{
|
{
|
||||||
prev_val = 0;
|
size_type word_pos = pos >> LG_WORD_SIZE;
|
||||||
|
|
||||||
size_type word_pos = i >> LG_WORD_SIZE;
|
|
||||||
word_type word = volatile_load( &m_words[ word_pos ] );
|
word_type word = volatile_load( &m_words[ word_pos ] );
|
||||||
|
|
||||||
// Loop until there are no more set bits in the word.
|
// Loop until there are no more set bits in the word.
|
||||||
@ -324,28 +326,26 @@ public:
|
|||||||
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
||||||
|
|
||||||
// Try to reset the bit.
|
// Try to reset the bit.
|
||||||
word_type mask = word_type(1) << bit;
|
word_type mask = word_type(1) << bit;
|
||||||
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
|
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
|
||||||
|
|
||||||
if ( word & mask ) {
|
if ( word & mask ) {
|
||||||
// Successfully reset the bit.
|
// Successfully reset the bit.
|
||||||
prev_val = word;
|
pos = ( word_pos << LG_WORD_SIZE ) + bit;
|
||||||
|
|
||||||
return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
|
return Kokkos::pair<bool, word_type>( true, word );
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Didn't find a free bit in this word.
|
// Didn't find a free bit in this word.
|
||||||
return Kokkos::pair<bool, size_type>( false, i );
|
return Kokkos::pair<bool, word_type>( false, word_type(0) );
|
||||||
}
|
}
|
||||||
|
|
||||||
KOKKOS_FORCEINLINE_FUNCTION
|
KOKKOS_FORCEINLINE_FUNCTION
|
||||||
Kokkos::pair< bool, size_type >
|
Kokkos::pair< bool, word_type >
|
||||||
reset_any_in_word( size_type i, word_type & prev_val, word_type word_mask ) const
|
reset_any_in_word( size_type & pos, word_type word_mask ) const
|
||||||
{
|
{
|
||||||
prev_val = 0;
|
size_type word_pos = pos >> LG_WORD_SIZE;
|
||||||
|
|
||||||
size_type word_pos = i >> LG_WORD_SIZE;
|
|
||||||
word_type word = volatile_load( &m_words[ word_pos ] );
|
word_type word = volatile_load( &m_words[ word_pos ] );
|
||||||
word = word & word_mask;
|
word = word & word_mask;
|
||||||
|
|
||||||
@ -355,21 +355,21 @@ public:
|
|||||||
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
size_type bit = Kokkos::Impl::bit_scan_forward( word );
|
||||||
|
|
||||||
// Try to reset the bit.
|
// Try to reset the bit.
|
||||||
word_type mask = word_type(1) << bit;
|
word_type mask = word_type(1) << bit;
|
||||||
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
|
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
|
||||||
|
|
||||||
if ( word & mask ) {
|
if ( word & mask ) {
|
||||||
// Successfully reset the bit.
|
// Successfully reset the bit.
|
||||||
prev_val = word;
|
pos = ( word_pos << LG_WORD_SIZE ) + bit;
|
||||||
|
|
||||||
return Kokkos::pair<bool, size_type>( true, ( word_pos << LG_WORD_SIZE ) + bit );
|
return Kokkos::pair<bool, word_type>( true, word );
|
||||||
}
|
}
|
||||||
|
|
||||||
word = word & word_mask;
|
word = word & word_mask;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Didn't find a free bit in this word.
|
// Didn't find a free bit in this word.
|
||||||
return Kokkos::pair<bool, size_type>( false, i );
|
return Kokkos::pair<bool, word_type>( false, word_type(0) );
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -442,7 +442,7 @@ struct create_histogram {
|
|||||||
|
|
||||||
total_allocated_blocks += page_allocated_blocks;
|
total_allocated_blocks += page_allocated_blocks;
|
||||||
|
|
||||||
atomic_fetch_add( &m_page_histogram(page_allocated_blocks), 1 );
|
atomic_increment( &m_page_histogram(page_allocated_blocks) );
|
||||||
}
|
}
|
||||||
|
|
||||||
r.first += double(total_allocated_blocks) / blocks_per_sb;
|
r.first += double(total_allocated_blocks) / blocks_per_sb;
|
||||||
@ -609,7 +609,7 @@ public:
|
|||||||
};
|
};
|
||||||
|
|
||||||
private:
|
private:
|
||||||
typedef Impl::SharedAllocationTracker Tracker;
|
typedef Kokkos::Impl::SharedAllocationTracker Tracker;
|
||||||
typedef View< uint32_t *, device_type > UInt32View;
|
typedef View< uint32_t *, device_type > UInt32View;
|
||||||
typedef View< SuperblockHeader *, device_type > SBHeaderView;
|
typedef View< SuperblockHeader *, device_type > SBHeaderView;
|
||||||
|
|
||||||
@ -726,11 +726,11 @@ public:
|
|||||||
|
|
||||||
// Allocate memory for Views. This is done here instead of at construction
|
// Allocate memory for Views. This is done here instead of at construction
|
||||||
// so that the runtime checks can be performed before allocating memory.
|
// so that the runtime checks can be performed before allocating memory.
|
||||||
resize(m_active, m_num_block_size );
|
resize( m_active, m_num_block_size );
|
||||||
resize(m_sb_header, m_num_sb );
|
resize( m_sb_header, m_num_sb );
|
||||||
|
|
||||||
// Allocate superblock memory.
|
// Allocate superblock memory.
|
||||||
typedef Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
|
typedef Kokkos::Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
|
||||||
SharedRecord * rec =
|
SharedRecord * rec =
|
||||||
SharedRecord::allocate( memspace, "mempool", m_total_size );
|
SharedRecord::allocate( memspace, "mempool", m_total_size );
|
||||||
|
|
||||||
@ -751,10 +751,15 @@ public:
|
|||||||
m_ceil_num_sb * m_num_block_size );
|
m_ceil_num_sb * m_num_block_size );
|
||||||
|
|
||||||
// Initialize all active superblocks to be invalid.
|
// Initialize all active superblocks to be invalid.
|
||||||
typename UInt32View::HostMirror host_active = create_mirror_view(m_active);
|
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
|
||||||
for (size_t i = 0; i < m_num_block_size; ++i) host_active(i) = INVALID_SUPERBLOCK;
|
for ( size_t i = 0; i < m_num_block_size; ++i ) host_active(i) = INVALID_SUPERBLOCK;
|
||||||
|
deep_copy( m_active, host_active );
|
||||||
|
|
||||||
deep_copy(m_active, host_active);
|
// A superblock is considered full when this percentage of its pages are full.
|
||||||
|
const double superblock_full_fraction = .8;
|
||||||
|
|
||||||
|
// A page is considered full when this percentage of its blocks are full.
|
||||||
|
const double page_full_fraction = .875;
|
||||||
|
|
||||||
// Initialize the blocksize info.
|
// Initialize the blocksize info.
|
||||||
for ( size_t i = 0; i < m_num_block_size; ++i ) {
|
for ( size_t i = 0; i < m_num_block_size; ++i ) {
|
||||||
@ -767,7 +772,7 @@ public:
|
|||||||
|
|
||||||
// Set the full level for the superblock.
|
// Set the full level for the superblock.
|
||||||
m_blocksize_info[i].m_sb_full_level =
|
m_blocksize_info[i].m_sb_full_level =
|
||||||
static_cast<uint32_t>( pages_per_sb * KOKKOS_MEMPOOL_SB_FULL_FRACTION );
|
static_cast<uint32_t>( pages_per_sb * superblock_full_fraction );
|
||||||
|
|
||||||
if ( m_blocksize_info[i].m_sb_full_level == 0 ) {
|
if ( m_blocksize_info[i].m_sb_full_level == 0 ) {
|
||||||
m_blocksize_info[i].m_sb_full_level = 1;
|
m_blocksize_info[i].m_sb_full_level = 1;
|
||||||
@ -778,7 +783,7 @@ public:
|
|||||||
blocks_per_sb < BLOCKS_PER_PAGE ? blocks_per_sb : BLOCKS_PER_PAGE;
|
blocks_per_sb < BLOCKS_PER_PAGE ? blocks_per_sb : BLOCKS_PER_PAGE;
|
||||||
|
|
||||||
m_blocksize_info[i].m_page_full_level =
|
m_blocksize_info[i].m_page_full_level =
|
||||||
static_cast<uint32_t>( blocks_per_page * KOKKOS_MEMPOOL_PAGE_FULL_FRACTION );
|
static_cast<uint32_t>( blocks_per_page * page_full_fraction );
|
||||||
|
|
||||||
if ( m_blocksize_info[i].m_page_full_level == 0 ) {
|
if ( m_blocksize_info[i].m_page_full_level == 0 ) {
|
||||||
m_blocksize_info[i].m_page_full_level = 1;
|
m_blocksize_info[i].m_page_full_level = 1;
|
||||||
@ -820,7 +825,7 @@ public:
|
|||||||
/// \brief The actual block size allocated given alloc_size.
|
/// \brief The actual block size allocated given alloc_size.
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
size_t allocate_block_size( const size_t alloc_size ) const
|
size_t allocate_block_size( const size_t alloc_size ) const
|
||||||
{ return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE); }
|
{ return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE ); }
|
||||||
|
|
||||||
/// \brief Allocate a chunk of memory.
|
/// \brief Allocate a chunk of memory.
|
||||||
/// \param alloc_size Size of the requested allocated in number of bytes.
|
/// \param alloc_size Size of the requested allocated in number of bytes.
|
||||||
@ -834,27 +839,41 @@ public:
|
|||||||
|
|
||||||
// Only support allocations up to the superblock size. Just return 0
|
// Only support allocations up to the superblock size. Just return 0
|
||||||
// (failed allocation) for any size above this.
|
// (failed allocation) for any size above this.
|
||||||
if (alloc_size <= m_sb_size )
|
if ( alloc_size <= m_sb_size )
|
||||||
{
|
{
|
||||||
int block_size_id = get_block_size_index( alloc_size );
|
int block_size_id = get_block_size_index( alloc_size );
|
||||||
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
|
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
|
||||||
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
|
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
|
||||||
|
|
||||||
|
#ifdef KOKKOS_CUDA_CLANG_WORKAROUND
|
||||||
|
// Without this test it looks like pages_per_sb might come back wrong.
|
||||||
|
if ( pages_per_sb == 0 ) return NULL;
|
||||||
|
#endif
|
||||||
|
|
||||||
unsigned word_size = blocks_per_sb > 32 ? 32 : blocks_per_sb;
|
unsigned word_size = blocks_per_sb > 32 ? 32 : blocks_per_sb;
|
||||||
unsigned word_mask = ( uint64_t(1) << word_size ) - 1;
|
unsigned word_mask = ( uint64_t(1) << word_size ) - 1;
|
||||||
|
|
||||||
|
// Instead of forcing an atomic read to guarantee the updated value,
|
||||||
|
// reading the old value is actually beneficial because more threads will
|
||||||
|
// attempt allocations on the old active superblock instead of waiting on
|
||||||
|
// the new active superblock. This will help hide the latency of
|
||||||
|
// switching the active superblock.
|
||||||
uint32_t sb_id = volatile_load( &m_active(block_size_id) );
|
uint32_t sb_id = volatile_load( &m_active(block_size_id) );
|
||||||
|
|
||||||
// If the active is locked, keep reading it until the lock is released.
|
// If the active is locked, keep reading it atomically until the lock is
|
||||||
|
// released.
|
||||||
while ( sb_id == SUPERBLOCK_LOCK ) {
|
while ( sb_id == SUPERBLOCK_LOCK ) {
|
||||||
sb_id = volatile_load( &m_active(block_size_id) );
|
sb_id = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
|
||||||
}
|
}
|
||||||
|
|
||||||
|
load_fence();
|
||||||
|
|
||||||
bool allocation_done = false;
|
bool allocation_done = false;
|
||||||
|
|
||||||
while (!allocation_done) {
|
while ( !allocation_done ) {
|
||||||
bool need_new_sb = false;
|
bool need_new_sb = false;
|
||||||
|
|
||||||
if (sb_id != INVALID_SUPERBLOCK) {
|
if ( sb_id != INVALID_SUPERBLOCK ) {
|
||||||
// Use the value from the clock register as the hash value.
|
// Use the value from the clock register as the hash value.
|
||||||
uint64_t hash_val = get_clock_register();
|
uint64_t hash_val = get_clock_register();
|
||||||
|
|
||||||
@ -875,12 +894,11 @@ public:
|
|||||||
|
|
||||||
bool search_done = false;
|
bool search_done = false;
|
||||||
|
|
||||||
while (!search_done) {
|
while ( !search_done ) {
|
||||||
bool success;
|
bool success = false;
|
||||||
unsigned prev_val;
|
unsigned prev_val = 0;
|
||||||
|
|
||||||
Kokkos::tie( success, pos ) =
|
Kokkos::tie( success, prev_val ) = m_sb_blocks.set_any_in_word( pos, word_mask );
|
||||||
m_sb_blocks.set_any_in_word( pos, prev_val, word_mask );
|
|
||||||
|
|
||||||
if ( !success ) {
|
if ( !success ) {
|
||||||
if ( ++pages_searched >= pages_per_sb ) {
|
if ( ++pages_searched >= pages_per_sb ) {
|
||||||
@ -905,6 +923,8 @@ public:
|
|||||||
}
|
}
|
||||||
else {
|
else {
|
||||||
// Reserved a memory location to allocate.
|
// Reserved a memory location to allocate.
|
||||||
|
memory_fence();
|
||||||
|
|
||||||
search_done = true;
|
search_done = true;
|
||||||
allocation_done = true;
|
allocation_done = true;
|
||||||
|
|
||||||
@ -918,7 +938,7 @@ public:
|
|||||||
if ( used_bits == 0 ) {
|
if ( used_bits == 0 ) {
|
||||||
// This page was empty. Decrement the number of empty pages for
|
// This page was empty. Decrement the number of empty pages for
|
||||||
// the superblock.
|
// the superblock.
|
||||||
atomic_fetch_sub( &m_sb_header(sb_id).m_empty_pages, 1 );
|
atomic_decrement( &m_sb_header(sb_id).m_empty_pages );
|
||||||
}
|
}
|
||||||
else if ( used_bits == m_blocksize_info[block_size_id].m_page_full_level - 1 )
|
else if ( used_bits == m_blocksize_info[block_size_id].m_page_full_level - 1 )
|
||||||
{
|
{
|
||||||
@ -962,7 +982,7 @@ public:
|
|||||||
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
|
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
|
||||||
else {
|
else {
|
||||||
printf( "** Requested allocation size (%zu) larger than superblock size (%lu). **\n",
|
printf( "** Requested allocation size (%zu) larger than superblock size (%lu). **\n",
|
||||||
alloc_size, m_sb_size);
|
alloc_size, m_sb_size );
|
||||||
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||||
fflush( stdout );
|
fflush( stdout );
|
||||||
#endif
|
#endif
|
||||||
@ -997,8 +1017,10 @@ public:
|
|||||||
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
|
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
|
||||||
uint32_t pos_rel = offset >> lg_block_size;
|
uint32_t pos_rel = offset >> lg_block_size;
|
||||||
|
|
||||||
bool success;
|
bool success = false;
|
||||||
unsigned prev_val;
|
unsigned prev_val = 0;
|
||||||
|
|
||||||
|
memory_fence();
|
||||||
|
|
||||||
Kokkos::tie( success, prev_val ) = m_sb_blocks.fetch_word_reset( pos_base + pos_rel );
|
Kokkos::tie( success, prev_val ) = m_sb_blocks.fetch_word_reset( pos_base + pos_rel );
|
||||||
|
|
||||||
@ -1023,7 +1045,7 @@ public:
|
|||||||
volatile_store( &m_sb_header(sb_id).m_empty_pages, uint32_t(0) );
|
volatile_store( &m_sb_header(sb_id).m_empty_pages, uint32_t(0) );
|
||||||
volatile_store( &m_sb_header(sb_id).m_lg_block_size, uint32_t(0) );
|
volatile_store( &m_sb_header(sb_id).m_lg_block_size, uint32_t(0) );
|
||||||
|
|
||||||
memory_fence();
|
store_fence();
|
||||||
|
|
||||||
m_empty_sb.set( sb_id );
|
m_empty_sb.set( sb_id );
|
||||||
}
|
}
|
||||||
@ -1088,7 +1110,7 @@ public:
|
|||||||
printf( "\n" );
|
printf( "\n" );
|
||||||
|
|
||||||
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
|
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
|
||||||
typename SBHeaderView::HostMirror host_sb_header = create_mirror_view(m_sb_header);
|
typename SBHeaderView::HostMirror host_sb_header = create_mirror_view( m_sb_header );
|
||||||
deep_copy( host_sb_header, m_sb_header );
|
deep_copy( host_sb_header, m_sb_header );
|
||||||
|
|
||||||
UInt32View num_allocated_blocks( "Allocated Blocks", m_num_sb );
|
UInt32View num_allocated_blocks( "Allocated Blocks", m_num_sb );
|
||||||
@ -1101,7 +1123,7 @@ public:
|
|||||||
}
|
}
|
||||||
|
|
||||||
typename UInt32View::HostMirror host_num_allocated_blocks =
|
typename UInt32View::HostMirror host_num_allocated_blocks =
|
||||||
create_mirror_view(num_allocated_blocks);
|
create_mirror_view( num_allocated_blocks );
|
||||||
deep_copy( host_num_allocated_blocks, num_allocated_blocks );
|
deep_copy( host_num_allocated_blocks, num_allocated_blocks );
|
||||||
|
|
||||||
// Print header info of all superblocks.
|
// Print header info of all superblocks.
|
||||||
@ -1135,7 +1157,7 @@ public:
|
|||||||
m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
|
m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
|
||||||
}
|
}
|
||||||
|
|
||||||
typename UInt32View::HostMirror host_page_histogram = create_mirror_view(page_histogram);
|
typename UInt32View::HostMirror host_page_histogram = create_mirror_view( page_histogram );
|
||||||
deep_copy( host_page_histogram, page_histogram );
|
deep_copy( host_page_histogram, page_histogram );
|
||||||
|
|
||||||
// Find the used and total pages and blocks.
|
// Find the used and total pages and blocks.
|
||||||
@ -1158,8 +1180,8 @@ public:
|
|||||||
double percent_used_blocks = total_blocks == 0 ? 0.0 : double(used_blocks) / total_blocks;
|
double percent_used_blocks = total_blocks == 0 ? 0.0 : double(used_blocks) / total_blocks;
|
||||||
|
|
||||||
// Count active superblocks.
|
// Count active superblocks.
|
||||||
typename UInt32View::HostMirror host_active = create_mirror_view(m_active);
|
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
|
||||||
deep_copy(host_active, m_active);
|
deep_copy( host_active, m_active );
|
||||||
|
|
||||||
unsigned num_active_sb = 0;
|
unsigned num_active_sb = 0;
|
||||||
for ( size_t i = 0; i < m_num_block_size; ++i ) {
|
for ( size_t i = 0; i < m_num_block_size; ++i ) {
|
||||||
@ -1224,6 +1246,7 @@ public:
|
|||||||
// Print the blocks used for each page of a few individual superblocks.
|
// Print the blocks used for each page of a few individual superblocks.
|
||||||
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
|
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
|
||||||
uint32_t lg_block_size = host_sb_header(sb_id[i]).m_lg_block_size;
|
uint32_t lg_block_size = host_sb_header(sb_id[i]).m_lg_block_size;
|
||||||
|
|
||||||
if ( lg_block_size != 0 ) {
|
if ( lg_block_size != 0 ) {
|
||||||
printf( "SB_ID BLOCK ID USED_BLOCKS\n" );
|
printf( "SB_ID BLOCK ID USED_BLOCKS\n" );
|
||||||
|
|
||||||
@ -1249,16 +1272,16 @@ public:
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
printf( " Used blocks: %10u / %10u = %10.6lf\n", used_blocks, total_blocks,
|
printf( " Used blocks: %10u / %10u = %10.6lf\n", used_blocks, total_blocks,
|
||||||
percent_used_blocks );
|
percent_used_blocks );
|
||||||
printf( " Used pages: %10u / %10u = %10.6lf\n", used_pages, total_pages,
|
printf( " Used pages: %10u / %10u = %10.6lf\n", used_pages, total_pages,
|
||||||
percent_used_pages );
|
percent_used_pages );
|
||||||
printf( " Used SB: %10zu / %10zu = %10.6lf\n", m_num_sb - num_empty_sb, m_num_sb,
|
printf( " Used SB: %10zu / %10zu = %10.6lf\n", m_num_sb - num_empty_sb, m_num_sb,
|
||||||
percent_used_sb );
|
percent_used_sb );
|
||||||
printf( " Active SB: %10u\n", num_active_sb );
|
printf( " Active SB: %10u\n", num_active_sb );
|
||||||
printf( " Empty SB: %10u\n", num_empty_sb );
|
printf( " Empty SB: %10u\n", num_empty_sb );
|
||||||
printf( " Partfull SB: %10u\n", num_partfull_sb );
|
printf( " Partfull SB: %10u\n", num_partfull_sb );
|
||||||
printf( " Full SB: %10lu\n",
|
printf( " Full SB: %10lu\n",
|
||||||
m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
|
m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
|
||||||
printf( "Ave. SB Full %%: %10.6lf\n", ave_sb_full );
|
printf( "Ave. SB Full %%: %10.6lf\n", ave_sb_full );
|
||||||
printf( "\n" );
|
printf( "\n" );
|
||||||
fflush( stdout );
|
fflush( stdout );
|
||||||
@ -1316,6 +1339,8 @@ private:
|
|||||||
uint32_t lock_sb =
|
uint32_t lock_sb =
|
||||||
Kokkos::atomic_compare_exchange( &m_active(block_size_id), old_sb, SUPERBLOCK_LOCK );
|
Kokkos::atomic_compare_exchange( &m_active(block_size_id), old_sb, SUPERBLOCK_LOCK );
|
||||||
|
|
||||||
|
load_fence();
|
||||||
|
|
||||||
// Initialize the new superblock to be the previous one so the previous
|
// Initialize the new superblock to be the previous one so the previous
|
||||||
// superblock is returned if a new superblock can't be found.
|
// superblock is returned if a new superblock can't be found.
|
||||||
uint32_t new_sb = lock_sb;
|
uint32_t new_sb = lock_sb;
|
||||||
@ -1334,11 +1359,11 @@ private:
|
|||||||
// size's bitset.
|
// size's bitset.
|
||||||
unsigned pos = block_size_id * m_ceil_num_sb;
|
unsigned pos = block_size_id * m_ceil_num_sb;
|
||||||
|
|
||||||
while (!search_done) {
|
while ( !search_done ) {
|
||||||
bool success = false;
|
bool success = false;
|
||||||
unsigned prev_val;
|
unsigned prev_val = 0;
|
||||||
|
|
||||||
Kokkos::tie( success, pos ) = m_partfull_sb.reset_any_in_word( pos, prev_val );
|
Kokkos::tie( success, prev_val ) = m_partfull_sb.reset_any_in_word( pos );
|
||||||
|
|
||||||
if ( !success ) {
|
if ( !success ) {
|
||||||
if ( ++tries >= max_tries ) {
|
if ( ++tries >= max_tries ) {
|
||||||
@ -1351,22 +1376,21 @@ private:
|
|||||||
}
|
}
|
||||||
else {
|
else {
|
||||||
// Found a superblock.
|
// Found a superblock.
|
||||||
|
|
||||||
|
// It is possible that the newly found superblock is the same as the
|
||||||
|
// old superblock. In this case putting the old value back in yields
|
||||||
|
// correct behavior. This could happen as follows. This thread
|
||||||
|
// grabs the lock and transitions the superblock to the full state.
|
||||||
|
// Before it searches for a new superblock, other threads perform
|
||||||
|
// enough deallocations to transition the superblock to the partially
|
||||||
|
// full state. This thread then searches for a partially full
|
||||||
|
// superblock and finds the one it removed. There's potential for
|
||||||
|
// this to cause a performance issue if the same superblock keeps
|
||||||
|
// being removed and added due to the right mix and ordering of
|
||||||
|
// allocations and deallocations.
|
||||||
search_done = true;
|
search_done = true;
|
||||||
new_sb = pos - block_size_id * m_ceil_num_sb;
|
new_sb = pos - block_size_id * m_ceil_num_sb;
|
||||||
|
|
||||||
// Assertions:
|
|
||||||
// 1. A different superblock than the current should be found.
|
|
||||||
#ifdef KOKKOS_MEMPOOL_PRINTERR
|
|
||||||
if ( new_sb == lock_sb ) {
|
|
||||||
printf( "\n** MemoryPool::find_superblock() FOUND_SAME_SUPERBLOCK: %u **\n",
|
|
||||||
new_sb);
|
|
||||||
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
|
||||||
fflush( stdout );
|
|
||||||
#endif
|
|
||||||
Kokkos::abort( "" );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
// Set the head status for the superblock.
|
// Set the head status for the superblock.
|
||||||
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
|
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
|
||||||
|
|
||||||
@ -1376,7 +1400,7 @@ private:
|
|||||||
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
|
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
|
||||||
}
|
}
|
||||||
|
|
||||||
memory_fence();
|
store_fence();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -1389,11 +1413,11 @@ private:
|
|||||||
// size's bitset.
|
// size's bitset.
|
||||||
pos = 0;
|
pos = 0;
|
||||||
|
|
||||||
while (!search_done) {
|
while ( !search_done ) {
|
||||||
bool success = false;
|
bool success = false;
|
||||||
unsigned prev_val;
|
unsigned prev_val = 0;
|
||||||
|
|
||||||
Kokkos::tie( success, pos ) = m_empty_sb.reset_any_in_word( pos, prev_val );
|
Kokkos::tie( success, prev_val ) = m_empty_sb.reset_any_in_word( pos );
|
||||||
|
|
||||||
if ( !success ) {
|
if ( !success ) {
|
||||||
if ( ++tries >= max_tries ) {
|
if ( ++tries >= max_tries ) {
|
||||||
@ -1406,22 +1430,22 @@ private:
|
|||||||
}
|
}
|
||||||
else {
|
else {
|
||||||
// Found a superblock.
|
// Found a superblock.
|
||||||
|
|
||||||
|
// It is possible that the newly found superblock is the same as
|
||||||
|
// the old superblock. In this case putting the old value back in
|
||||||
|
// yields correct behavior. This could happen as follows. This
|
||||||
|
// thread grabs the lock and transitions the superblock to the full
|
||||||
|
// state. Before it searches for a new superblock, other threads
|
||||||
|
// perform enough deallocations to transition the superblock to the
|
||||||
|
// partially full state and then the empty state. This thread then
|
||||||
|
// searches for a partially full superblock and none exist. This
|
||||||
|
// thread then searches for an empty superblock and finds the one
|
||||||
|
// it removed. The likelihood of this happening is so remote that
|
||||||
|
// the potential for this to cause a performance issue is
|
||||||
|
// infinitesimal.
|
||||||
search_done = true;
|
search_done = true;
|
||||||
new_sb = pos;
|
new_sb = pos;
|
||||||
|
|
||||||
// Assertions:
|
|
||||||
// 1. A different superblock than the current should be found.
|
|
||||||
#ifdef KOKKOS_MEMPOOL_PRINTERR
|
|
||||||
if ( new_sb == lock_sb ) {
|
|
||||||
printf( "\n** MemoryPool::find_superblock() FOUND_SAME_SUPERBLOCK: %u **\n",
|
|
||||||
new_sb);
|
|
||||||
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
|
||||||
fflush( stdout );
|
|
||||||
#endif
|
|
||||||
Kokkos::abort( "" );
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
// Set the empty pages, block size, and head status for the
|
// Set the empty pages, block size, and head status for the
|
||||||
// superblock.
|
// superblock.
|
||||||
volatile_store( &m_sb_header(new_sb).m_empty_pages,
|
volatile_store( &m_sb_header(new_sb).m_empty_pages,
|
||||||
@ -1436,7 +1460,7 @@ private:
|
|||||||
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
|
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
|
||||||
}
|
}
|
||||||
|
|
||||||
memory_fence();
|
store_fence();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -1445,14 +1469,17 @@ private:
|
|||||||
atomic_exchange( &m_active(block_size_id), new_sb );
|
atomic_exchange( &m_active(block_size_id), new_sb );
|
||||||
}
|
}
|
||||||
else {
|
else {
|
||||||
// Either another thread has the lock and is switching the active superblock for
|
// Either another thread has the lock and is switching the active
|
||||||
// this block size or another thread has already changed the active superblock
|
// superblock for this block size or another thread has already changed
|
||||||
// since this thread read its value. Keep reading the active superblock until
|
// the active superblock since this thread read its value. Keep
|
||||||
// it isn't locked to get the new active superblock.
|
// atomically reading the active superblock until it isn't locked to get
|
||||||
|
// the new active superblock.
|
||||||
do {
|
do {
|
||||||
new_sb = volatile_load( &m_active(block_size_id) );
|
new_sb = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
|
||||||
} while ( new_sb == SUPERBLOCK_LOCK );
|
} while ( new_sb == SUPERBLOCK_LOCK );
|
||||||
|
|
||||||
|
load_fence();
|
||||||
|
|
||||||
// Assertions:
|
// Assertions:
|
||||||
// 1. An invalid superblock should never be found here.
|
// 1. An invalid superblock should never be found here.
|
||||||
// 2. If the new superblock is the same as the previous superblock, the
|
// 2. If the new superblock is the same as the previous superblock, the
|
||||||
@ -1477,14 +1504,25 @@ private:
|
|||||||
{
|
{
|
||||||
#if defined( __CUDA_ARCH__ )
|
#if defined( __CUDA_ARCH__ )
|
||||||
// Return value of 64-bit hi-res clock register.
|
// Return value of 64-bit hi-res clock register.
|
||||||
return clock64();
|
return clock64();
|
||||||
#elif defined( __i386__ ) || defined( __x86_64 )
|
#elif defined( __i386__ ) || defined( __x86_64 )
|
||||||
// Return value of 64-bit hi-res clock register.
|
// Return value of 64-bit hi-res clock register.
|
||||||
unsigned a, d;
|
unsigned a = 0, d = 0;
|
||||||
__asm__ volatile("rdtsc" : "=a" (a), "=d" (d));
|
|
||||||
return ( (uint64_t) a) | ( ( (uint64_t) d ) << 32 );
|
__asm__ volatile( "rdtsc" : "=a" (a), "=d" (d) );
|
||||||
|
|
||||||
|
return ( (uint64_t) a ) | ( ( (uint64_t) d ) << 32 );
|
||||||
|
#elif defined( __powerpc ) || defined( __powerpc__ ) || defined( __powerpc64__ ) || \
|
||||||
|
defined( __POWERPC__ ) || defined( __ppc__ ) || defined( __ppc64__ )
|
||||||
|
unsigned int cycles = 0;
|
||||||
|
|
||||||
|
asm volatile( "mftb %0" : "=r" (cycles) );
|
||||||
|
|
||||||
|
return (uint64_t) cycles;
|
||||||
#else
|
#else
|
||||||
const uint64_t ticks = std::chrono::high_resolution_clock::now().time_since_epoch().count();
|
const uint64_t ticks =
|
||||||
|
std::chrono::high_resolution_clock::now().time_since_epoch().count();
|
||||||
|
|
||||||
return ticks;
|
return ticks;
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
@ -1517,7 +1555,4 @@ private:
|
|||||||
#undef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
|
#undef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#undef KOKKOS_MEMPOOL_SB_FULL_FRACTION
|
|
||||||
#undef KOKKOS_MEMPOOL_PAGE_FULL_FRACTION
|
|
||||||
|
|
||||||
#endif // KOKKOS_MEMORYPOOL_HPP
|
#endif // KOKKOS_MEMORYPOOL_HPP
|
||||||
|
|||||||
@ -63,6 +63,8 @@ enum MemoryTraitsFlags
|
|||||||
{ Unmanaged = 0x01
|
{ Unmanaged = 0x01
|
||||||
, RandomAccess = 0x02
|
, RandomAccess = 0x02
|
||||||
, Atomic = 0x04
|
, Atomic = 0x04
|
||||||
|
, Restrict = 0x08
|
||||||
|
, Aligned = 0x10
|
||||||
};
|
};
|
||||||
|
|
||||||
template < unsigned T >
|
template < unsigned T >
|
||||||
@ -73,6 +75,8 @@ struct MemoryTraits {
|
|||||||
enum { Unmanaged = T & unsigned(Kokkos::Unmanaged) };
|
enum { Unmanaged = T & unsigned(Kokkos::Unmanaged) };
|
||||||
enum { RandomAccess = T & unsigned(Kokkos::RandomAccess) };
|
enum { RandomAccess = T & unsigned(Kokkos::RandomAccess) };
|
||||||
enum { Atomic = T & unsigned(Kokkos::Atomic) };
|
enum { Atomic = T & unsigned(Kokkos::Atomic) };
|
||||||
|
enum { Restrict = T & unsigned(Kokkos::Restrict) };
|
||||||
|
enum { Aligned = T & unsigned(Kokkos::Aligned) };
|
||||||
|
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
@ -58,7 +58,7 @@
|
|||||||
#endif
|
#endif
|
||||||
#include <Kokkos_ScratchSpace.hpp>
|
#include <Kokkos_ScratchSpace.hpp>
|
||||||
#include <Kokkos_Parallel.hpp>
|
#include <Kokkos_Parallel.hpp>
|
||||||
#include <Kokkos_TaskPolicy.hpp>
|
#include <Kokkos_TaskScheduler.hpp>
|
||||||
#include <Kokkos_Layout.hpp>
|
#include <Kokkos_Layout.hpp>
|
||||||
#include <impl/Kokkos_Tags.hpp>
|
#include <impl/Kokkos_Tags.hpp>
|
||||||
|
|
||||||
@ -160,6 +160,17 @@ public:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::OpenMP::memory_space
|
||||||
|
, Kokkos::OpenMP::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
struct VerifyExecutionCanAccessMemorySpace
|
struct VerifyExecutionCanAccessMemorySpace
|
||||||
< Kokkos::OpenMP::memory_space
|
< Kokkos::OpenMP::memory_space
|
||||||
|
|||||||
@ -53,7 +53,8 @@ struct is_reducer_type {
|
|||||||
|
|
||||||
template<class T>
|
template<class T>
|
||||||
struct is_reducer_type<T,typename std::enable_if<
|
struct is_reducer_type<T,typename std::enable_if<
|
||||||
std::is_same<T,typename T::reducer_type>::value
|
std::is_same<typename std::remove_cv<T>::type,
|
||||||
|
typename std::remove_cv<typename T::reducer_type>::type>::value
|
||||||
>::type> {
|
>::type> {
|
||||||
enum { value = 1 };
|
enum { value = 1 };
|
||||||
};
|
};
|
||||||
@ -726,6 +727,119 @@ public:
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
template<class Scalar>
|
||||||
|
struct MinMaxScalar {
|
||||||
|
Scalar min_val,max_val;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator = (const MinMaxScalar& rhs) {
|
||||||
|
min_val = rhs.min_val;
|
||||||
|
max_val = rhs.max_val;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void operator = (const volatile MinMaxScalar& rhs) volatile {
|
||||||
|
min_val = rhs.min_val;
|
||||||
|
max_val = rhs.max_val;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class Scalar, class Space = HostSpace>
|
||||||
|
struct MinMax {
|
||||||
|
private:
|
||||||
|
typedef typename std::remove_cv<Scalar>::type scalar_type;
|
||||||
|
|
||||||
|
public:
|
||||||
|
//Required
|
||||||
|
typedef MinMax reducer_type;
|
||||||
|
typedef MinMaxScalar<scalar_type> value_type;
|
||||||
|
|
||||||
|
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
|
||||||
|
|
||||||
|
scalar_type min_init_value;
|
||||||
|
scalar_type max_init_value;
|
||||||
|
|
||||||
|
private:
|
||||||
|
result_view_type result;
|
||||||
|
|
||||||
|
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
|
||||||
|
struct MinInitWrapper;
|
||||||
|
|
||||||
|
template<class ValueType >
|
||||||
|
struct MinInitWrapper<ValueType,true> {
|
||||||
|
static ValueType value() {
|
||||||
|
return std::numeric_limits<scalar_type>::max();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class ValueType >
|
||||||
|
struct MinInitWrapper<ValueType,false> {
|
||||||
|
static ValueType value() {
|
||||||
|
return scalar_type();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
|
||||||
|
struct MaxInitWrapper;
|
||||||
|
|
||||||
|
template<class ValueType >
|
||||||
|
struct MaxInitWrapper<ValueType,true> {
|
||||||
|
static ValueType value() {
|
||||||
|
return std::numeric_limits<scalar_type>::min();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class ValueType >
|
||||||
|
struct MaxInitWrapper<ValueType,false> {
|
||||||
|
static ValueType value() {
|
||||||
|
return scalar_type();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
MinMax(value_type& result_):
|
||||||
|
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
|
||||||
|
MinMax(const result_view_type& result_):
|
||||||
|
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
|
||||||
|
MinMax(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
|
||||||
|
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
|
||||||
|
MinMax(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
|
||||||
|
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
|
||||||
|
|
||||||
|
//Required
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void join(value_type& dest, const value_type& src) const {
|
||||||
|
if ( src.min_val < dest.min_val ) {
|
||||||
|
dest.min_val = src.min_val;
|
||||||
|
}
|
||||||
|
if ( src.max_val > dest.max_val ) {
|
||||||
|
dest.max_val = src.max_val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void join(volatile value_type& dest, const volatile value_type& src) const {
|
||||||
|
if ( src.min_val < dest.min_val ) {
|
||||||
|
dest.min_val = src.min_val;
|
||||||
|
}
|
||||||
|
if ( src.max_val > dest.max_val ) {
|
||||||
|
dest.max_val = src.max_val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
//Optional
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void init( value_type& val) const {
|
||||||
|
val.min_val = min_init_value;
|
||||||
|
val.max_val = max_init_value;
|
||||||
|
}
|
||||||
|
|
||||||
|
result_view_type result_view() const {
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
template<class Scalar, class Index>
|
template<class Scalar, class Index>
|
||||||
struct MinMaxLocScalar {
|
struct MinMaxLocScalar {
|
||||||
Scalar min_val,max_val;
|
Scalar min_val,max_val;
|
||||||
@ -1124,7 +1238,8 @@ void parallel_reduce(const PolicyType& policy,
|
|||||||
typename Impl::enable_if<
|
typename Impl::enable_if<
|
||||||
Kokkos::Impl::is_execution_policy<PolicyType>::value
|
Kokkos::Impl::is_execution_policy<PolicyType>::value
|
||||||
>::type * = 0) {
|
>::type * = 0) {
|
||||||
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,const ReturnType>::execute("",policy,functor,return_value);
|
ReturnType return_value_impl = return_value;
|
||||||
|
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value_impl);
|
||||||
}
|
}
|
||||||
|
|
||||||
template< class FunctorType, class ReturnType >
|
template< class FunctorType, class ReturnType >
|
||||||
@ -1133,8 +1248,8 @@ void parallel_reduce(const size_t& policy,
|
|||||||
const FunctorType& functor,
|
const FunctorType& functor,
|
||||||
const ReturnType& return_value) {
|
const ReturnType& return_value) {
|
||||||
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
|
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
|
||||||
|
ReturnType return_value_impl = return_value;
|
||||||
Impl::ParallelReduceAdaptor<policy_type,FunctorType,const ReturnType>::execute("",policy_type(0,policy),functor,return_value);
|
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value_impl);
|
||||||
}
|
}
|
||||||
|
|
||||||
template< class FunctorType, class ReturnType >
|
template< class FunctorType, class ReturnType >
|
||||||
@ -1144,7 +1259,8 @@ void parallel_reduce(const std::string& label,
|
|||||||
const FunctorType& functor,
|
const FunctorType& functor,
|
||||||
const ReturnType& return_value) {
|
const ReturnType& return_value) {
|
||||||
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
|
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
|
||||||
Impl::ParallelReduceAdaptor<policy_type,FunctorType,const ReturnType>::execute(label,policy_type(0,policy),functor,return_value);
|
ReturnType return_value_impl = return_value;
|
||||||
|
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value_impl);
|
||||||
}
|
}
|
||||||
|
|
||||||
// No Return Argument
|
// No Return Argument
|
||||||
|
|||||||
@ -144,6 +144,17 @@ public:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::Qthread::memory_space
|
||||||
|
, Kokkos::Qthread::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
struct VerifyExecutionCanAccessMemorySpace
|
struct VerifyExecutionCanAccessMemorySpace
|
||||||
< Kokkos::Qthread::memory_space
|
< Kokkos::Qthread::memory_space
|
||||||
|
|||||||
@ -50,7 +50,7 @@
|
|||||||
#include <cstddef>
|
#include <cstddef>
|
||||||
#include <iosfwd>
|
#include <iosfwd>
|
||||||
#include <Kokkos_Parallel.hpp>
|
#include <Kokkos_Parallel.hpp>
|
||||||
#include <Kokkos_TaskPolicy.hpp>
|
#include <Kokkos_TaskScheduler.hpp>
|
||||||
#include <Kokkos_Layout.hpp>
|
#include <Kokkos_Layout.hpp>
|
||||||
#include <Kokkos_HostSpace.hpp>
|
#include <Kokkos_HostSpace.hpp>
|
||||||
#include <Kokkos_ScratchSpace.hpp>
|
#include <Kokkos_ScratchSpace.hpp>
|
||||||
@ -59,7 +59,6 @@
|
|||||||
#include <impl/Kokkos_FunctorAdapter.hpp>
|
#include <impl/Kokkos_FunctorAdapter.hpp>
|
||||||
#include <impl/Kokkos_Profiling_Interface.hpp>
|
#include <impl/Kokkos_Profiling_Interface.hpp>
|
||||||
|
|
||||||
|
|
||||||
#include <KokkosExp_MDRangePolicy.hpp>
|
#include <KokkosExp_MDRangePolicy.hpp>
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_SERIAL )
|
#if defined( KOKKOS_HAVE_SERIAL )
|
||||||
@ -192,6 +191,17 @@ public:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::Serial::memory_space
|
||||||
|
, Kokkos::Serial::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
struct VerifyExecutionCanAccessMemorySpace
|
struct VerifyExecutionCanAccessMemorySpace
|
||||||
< Kokkos::Serial::memory_space
|
< Kokkos::Serial::memory_space
|
||||||
@ -250,7 +260,6 @@ public:
|
|||||||
const scratch_memory_space & thread_scratch(int) const
|
const scratch_memory_space & thread_scratch(int) const
|
||||||
{ return m_space ; }
|
{ return m_space ; }
|
||||||
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
|
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
|
||||||
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
|
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
|
||||||
KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
|
KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
|
||||||
@ -306,10 +315,9 @@ public:
|
|||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* < Kokkos::Serial , WorkArgTag >
|
* < Kokkos::Serial , WorkArgTag >
|
||||||
* < WorkArgTag , Impl::enable_if< Impl::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
|
* < WorkArgTag , Impl::enable_if< std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
@ -402,7 +410,6 @@ public:
|
|||||||
, m_chunk_size ( 32 )
|
, m_chunk_size ( 32 )
|
||||||
{}
|
{}
|
||||||
|
|
||||||
|
|
||||||
inline int chunk_size() const { return m_chunk_size ; }
|
inline int chunk_size() const { return m_chunk_size ; }
|
||||||
|
|
||||||
/** \brief set chunk_size to a discrete value*/
|
/** \brief set chunk_size to a discrete value*/
|
||||||
@ -525,7 +532,6 @@ private:
|
|||||||
const ReducerType m_reducer ;
|
const ReducerType m_reducer ;
|
||||||
const pointer_type m_result_ptr ;
|
const pointer_type m_result_ptr ;
|
||||||
|
|
||||||
|
|
||||||
template< class TagType >
|
template< class TagType >
|
||||||
inline
|
inline
|
||||||
typename std::enable_if< std::is_same< TagType , void >::value >::type
|
typename std::enable_if< std::is_same< TagType , void >::value >::type
|
||||||
@ -895,20 +901,22 @@ struct TeamThreadRangeBoundariesStruct<iType,SerialTeamMember> {
|
|||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
|
||||||
TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & count )
|
TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & count )
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType1, typename iType2 >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & begin , const iType & end )
|
Impl::SerialTeamMember >
|
||||||
|
TeamThreadRange( const Impl::SerialTeamMember& thread, const iType1 & begin, const iType2 & end )
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>(thread,begin,end);
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
@ -1113,4 +1121,3 @@ void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const Func
|
|||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
File diff suppressed because it is too large
Load Diff
700
lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
Normal file
700
lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
Normal file
@ -0,0 +1,700 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_TASKSCHEDULER_HPP
|
||||||
|
#define KOKKOS_TASKSCHEDULER_HPP
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#include <Kokkos_Core_fwd.hpp>
|
||||||
|
|
||||||
|
// If compiling with CUDA then must be using CUDA 8 or better
|
||||||
|
// and use relocateable device code to enable the task policy.
|
||||||
|
// nvcc relocatable device code option: --relocatable-device-code=true
|
||||||
|
|
||||||
|
#if ( defined( KOKKOS_HAVE_CUDA ) )
|
||||||
|
#if ( 8000 <= CUDA_VERSION ) && \
|
||||||
|
defined( KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE )
|
||||||
|
|
||||||
|
#define KOKKOS_ENABLE_TASKDAG
|
||||||
|
|
||||||
|
#endif
|
||||||
|
#else
|
||||||
|
#define KOKKOS_ENABLE_TASKDAG
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#include <Kokkos_MemoryPool.hpp>
|
||||||
|
#include <impl/Kokkos_Tags.hpp>
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
// Forward declarations used in Impl::TaskQueue
|
||||||
|
|
||||||
|
template< typename Arg1 = void , typename Arg2 = void >
|
||||||
|
class Future ;
|
||||||
|
|
||||||
|
template< typename Space >
|
||||||
|
class TaskScheduler ;
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
#include <impl/Kokkos_TaskQueue.hpp>
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
/**
|
||||||
|
*
|
||||||
|
* Future< space > // value_type == void
|
||||||
|
* Future< value > // space == Default
|
||||||
|
* Future< value , space >
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
template< typename Arg1 , typename Arg2 >
|
||||||
|
class Future {
|
||||||
|
private:
|
||||||
|
|
||||||
|
template< typename > friend class TaskScheduler ;
|
||||||
|
template< typename , typename > friend class Future ;
|
||||||
|
template< typename , typename , typename > friend class Impl::TaskBase ;
|
||||||
|
|
||||||
|
enum { Arg1_is_space = Kokkos::is_space< Arg1 >::value };
|
||||||
|
enum { Arg2_is_space = Kokkos::is_space< Arg2 >::value };
|
||||||
|
enum { Arg1_is_value = ! Arg1_is_space &&
|
||||||
|
! std::is_same< Arg1 , void >::value };
|
||||||
|
enum { Arg2_is_value = ! Arg2_is_space &&
|
||||||
|
! std::is_same< Arg2 , void >::value };
|
||||||
|
|
||||||
|
static_assert( ! ( Arg1_is_space && Arg2_is_space )
|
||||||
|
, "Future cannot be given two spaces" );
|
||||||
|
|
||||||
|
static_assert( ! ( Arg1_is_value && Arg2_is_value )
|
||||||
|
, "Future cannot be given two value types" );
|
||||||
|
|
||||||
|
using ValueType =
|
||||||
|
typename std::conditional< Arg1_is_value , Arg1 ,
|
||||||
|
typename std::conditional< Arg2_is_value , Arg2 , void
|
||||||
|
>::type >::type ;
|
||||||
|
|
||||||
|
using Space =
|
||||||
|
typename std::conditional< Arg1_is_space , Arg1 ,
|
||||||
|
typename std::conditional< Arg2_is_space , Arg2 , void
|
||||||
|
>::type >::type ;
|
||||||
|
|
||||||
|
using task_base = Impl::TaskBase< Space , ValueType , void > ;
|
||||||
|
using queue_type = Impl::TaskQueue< Space > ;
|
||||||
|
|
||||||
|
task_base * m_task ;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION explicit
|
||||||
|
Future( task_base * task ) : m_task(0)
|
||||||
|
{ if ( task ) queue_type::assign( & m_task , task ); }
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
using execution_space = typename Space::execution_space ;
|
||||||
|
using value_type = ValueType ;
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
bool is_null() const { return 0 == m_task ; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
int reference_count() const
|
||||||
|
{ return 0 != m_task ? m_task->reference_count() : 0 ; }
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
void clear()
|
||||||
|
{ if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
~Future() { clear(); }
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
constexpr Future() noexcept : m_task(0) {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future( Future && rhs )
|
||||||
|
: m_task( rhs.m_task ) { rhs.m_task = 0 ; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future( const Future & rhs )
|
||||||
|
: m_task(0)
|
||||||
|
{ if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task ); }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future & operator = ( Future && rhs )
|
||||||
|
{
|
||||||
|
clear();
|
||||||
|
m_task = rhs.m_task ;
|
||||||
|
rhs.m_task = 0 ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future & operator = ( const Future & rhs )
|
||||||
|
{
|
||||||
|
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
template< class A1 , class A2 >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future( Future<A1,A2> && rhs )
|
||||||
|
: m_task( rhs.m_task )
|
||||||
|
{
|
||||||
|
static_assert
|
||||||
|
( std::is_same< Space , void >::value ||
|
||||||
|
std::is_same< Space , typename Future<A1,A2>::Space >::value
|
||||||
|
, "Assigned Futures must have the same space" );
|
||||||
|
|
||||||
|
static_assert
|
||||||
|
( std::is_same< value_type , void >::value ||
|
||||||
|
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
|
||||||
|
, "Assigned Futures must have the same value_type" );
|
||||||
|
|
||||||
|
rhs.m_task = 0 ;
|
||||||
|
}
|
||||||
|
|
||||||
|
template< class A1 , class A2 >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future( const Future<A1,A2> & rhs )
|
||||||
|
: m_task(0)
|
||||||
|
{
|
||||||
|
static_assert
|
||||||
|
( std::is_same< Space , void >::value ||
|
||||||
|
std::is_same< Space , typename Future<A1,A2>::Space >::value
|
||||||
|
, "Assigned Futures must have the same space" );
|
||||||
|
|
||||||
|
static_assert
|
||||||
|
( std::is_same< value_type , void >::value ||
|
||||||
|
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
|
||||||
|
, "Assigned Futures must have the same value_type" );
|
||||||
|
|
||||||
|
if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
|
||||||
|
}
|
||||||
|
|
||||||
|
template< class A1 , class A2 >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future & operator = ( const Future<A1,A2> & rhs )
|
||||||
|
{
|
||||||
|
static_assert
|
||||||
|
( std::is_same< Space , void >::value ||
|
||||||
|
std::is_same< Space , typename Future<A1,A2>::Space >::value
|
||||||
|
, "Assigned Futures must have the same space" );
|
||||||
|
|
||||||
|
static_assert
|
||||||
|
( std::is_same< value_type , void >::value ||
|
||||||
|
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
|
||||||
|
, "Assigned Futures must have the same value_type" );
|
||||||
|
|
||||||
|
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
template< class A1 , class A2 >
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Future & operator = ( Future<A1,A2> && rhs )
|
||||||
|
{
|
||||||
|
static_assert
|
||||||
|
( std::is_same< Space , void >::value ||
|
||||||
|
std::is_same< Space , typename Future<A1,A2>::Space >::value
|
||||||
|
, "Assigned Futures must have the same space" );
|
||||||
|
|
||||||
|
static_assert
|
||||||
|
( std::is_same< value_type , void >::value ||
|
||||||
|
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
|
||||||
|
, "Assigned Futures must have the same value_type" );
|
||||||
|
|
||||||
|
clear();
|
||||||
|
m_task = rhs.m_task ;
|
||||||
|
rhs.m_task = 0 ;
|
||||||
|
return *this ;
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
typename task_base::get_return_type
|
||||||
|
get() const
|
||||||
|
{
|
||||||
|
if ( 0 == m_task ) {
|
||||||
|
Kokkos::abort( "Kokkos:::Future::get ERROR: is_null()");
|
||||||
|
}
|
||||||
|
return m_task->get();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
|
||||||
|
, TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
|
||||||
|
|
||||||
|
enum TaskPriority { TaskHighPriority = 0
|
||||||
|
, TaskRegularPriority = 1
|
||||||
|
, TaskLowPriority = 2 };
|
||||||
|
|
||||||
|
template< typename Space >
|
||||||
|
void wait( TaskScheduler< Space > const & );
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
template< typename ExecSpace >
|
||||||
|
class TaskScheduler
|
||||||
|
{
|
||||||
|
private:
|
||||||
|
|
||||||
|
using track_type = Kokkos::Impl::SharedAllocationTracker ;
|
||||||
|
using queue_type = Kokkos::Impl::TaskQueue< ExecSpace > ;
|
||||||
|
using task_base = Impl::TaskBase< ExecSpace , void , void > ;
|
||||||
|
|
||||||
|
track_type m_track ;
|
||||||
|
queue_type * m_queue ;
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
// Process optional arguments to spawn and respawn functions
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION static
|
||||||
|
void assign( task_base * const ) {}
|
||||||
|
|
||||||
|
// TaskTeam or TaskSingle
|
||||||
|
template< typename ... Options >
|
||||||
|
KOKKOS_INLINE_FUNCTION static
|
||||||
|
void assign( task_base * const task
|
||||||
|
, TaskType const & arg
|
||||||
|
, Options const & ... opts )
|
||||||
|
{
|
||||||
|
task->m_task_type = arg ;
|
||||||
|
assign( task , opts ... );
|
||||||
|
}
|
||||||
|
|
||||||
|
// TaskHighPriority or TaskRegularPriority or TaskLowPriority
|
||||||
|
template< typename ... Options >
|
||||||
|
KOKKOS_INLINE_FUNCTION static
|
||||||
|
void assign( task_base * const task
|
||||||
|
, TaskPriority const & arg
|
||||||
|
, Options const & ... opts )
|
||||||
|
{
|
||||||
|
task->m_priority = arg ;
|
||||||
|
assign( task , opts ... );
|
||||||
|
}
|
||||||
|
|
||||||
|
// Future for a dependence
|
||||||
|
template< typename A1 , typename A2 , typename ... Options >
|
||||||
|
KOKKOS_INLINE_FUNCTION static
|
||||||
|
void assign( task_base * const task
|
||||||
|
, Future< A1 , A2 > const & arg
|
||||||
|
, Options const & ... opts )
|
||||||
|
{
|
||||||
|
// Assign dependence to task->m_next
|
||||||
|
// which will be processed within subsequent call to schedule.
|
||||||
|
// Error if the dependence is reset.
|
||||||
|
|
||||||
|
if ( 0 != Kokkos::atomic_exchange(& task->m_next, arg.m_task) ) {
|
||||||
|
Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
|
||||||
|
}
|
||||||
|
|
||||||
|
if ( 0 != arg.m_task ) {
|
||||||
|
// The future may be destroyed upon returning from this call
|
||||||
|
// so increment reference count to track this assignment.
|
||||||
|
Kokkos::atomic_increment( &(arg.m_task->m_ref_count) );
|
||||||
|
}
|
||||||
|
|
||||||
|
assign( task , opts ... );
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
using execution_policy = TaskScheduler ;
|
||||||
|
using execution_space = ExecSpace ;
|
||||||
|
using memory_space = typename queue_type::memory_space ;
|
||||||
|
using member_type = Kokkos::Impl::TaskExec< ExecSpace > ;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
TaskScheduler() : m_track(), m_queue(0) {}
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
TaskScheduler( TaskScheduler && rhs ) = default ;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
TaskScheduler( TaskScheduler const & rhs ) = default ;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
TaskScheduler & operator = ( TaskScheduler && rhs ) = default ;
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
TaskScheduler & operator = ( TaskScheduler const & rhs ) = default ;
|
||||||
|
|
||||||
|
TaskScheduler( memory_space const & arg_memory_space
|
||||||
|
, unsigned const arg_memory_pool_capacity
|
||||||
|
, unsigned const arg_memory_pool_log2_superblock = 12 )
|
||||||
|
: m_track()
|
||||||
|
, m_queue(0)
|
||||||
|
{
|
||||||
|
typedef Kokkos::Impl::SharedAllocationRecord
|
||||||
|
< memory_space , typename queue_type::Destroy >
|
||||||
|
record_type ;
|
||||||
|
|
||||||
|
record_type * record =
|
||||||
|
record_type::allocate( arg_memory_space
|
||||||
|
, "TaskQueue"
|
||||||
|
, sizeof(queue_type)
|
||||||
|
);
|
||||||
|
|
||||||
|
m_queue = new( record->data() )
|
||||||
|
queue_type( arg_memory_space
|
||||||
|
, arg_memory_pool_capacity
|
||||||
|
, arg_memory_pool_log2_superblock );
|
||||||
|
|
||||||
|
record->m_destroy.m_queue = m_queue ;
|
||||||
|
|
||||||
|
m_track.assign_allocated_record_to_uninitialized( record );
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
/**\brief Allocation size for a spawned task */
|
||||||
|
template< typename FunctorType >
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
size_t spawn_allocation_size() const
|
||||||
|
{
|
||||||
|
using task_type = Impl::TaskBase< execution_space
|
||||||
|
, typename FunctorType::value_type
|
||||||
|
, FunctorType > ;
|
||||||
|
|
||||||
|
return m_queue->allocate_block_size( sizeof(task_type) );
|
||||||
|
}
|
||||||
|
|
||||||
|
/**\brief Allocation size for a when_all aggregate */
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
size_t when_all_allocation_size( int narg ) const
|
||||||
|
{
|
||||||
|
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
|
||||||
|
|
||||||
|
return m_queue->allocate_block_size( sizeof(task_base) + narg * sizeof(task_base*) );
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
/**\brief A task spawns a task with options
|
||||||
|
*
|
||||||
|
* 1) High, Normal, or Low priority
|
||||||
|
* 2) With or without dependence
|
||||||
|
* 3) Team or Serial
|
||||||
|
*/
|
||||||
|
template< typename FunctorType , typename ... Options >
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
Future< typename FunctorType::value_type , ExecSpace >
|
||||||
|
task_spawn( FunctorType const & arg_functor
|
||||||
|
, Options const & ... arg_options
|
||||||
|
) const
|
||||||
|
{
|
||||||
|
using value_type = typename FunctorType::value_type ;
|
||||||
|
using future_type = Future< value_type , execution_space > ;
|
||||||
|
using task_type = Impl::TaskBase< execution_space
|
||||||
|
, value_type
|
||||||
|
, FunctorType > ;
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
// Give single-thread back-ends an opportunity to clear
|
||||||
|
// queue of ready tasks before allocating a new task
|
||||||
|
|
||||||
|
m_queue->iff_single_thread_recursive_execute();
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
future_type f ;
|
||||||
|
|
||||||
|
// Allocate task from memory pool
|
||||||
|
f.m_task =
|
||||||
|
reinterpret_cast< task_type * >(m_queue->allocate(sizeof(task_type)));
|
||||||
|
|
||||||
|
if ( f.m_task ) {
|
||||||
|
|
||||||
|
// Placement new construction
|
||||||
|
new ( f.m_task ) task_type( arg_functor );
|
||||||
|
|
||||||
|
// Reference count starts at two
|
||||||
|
// +1 for matching decrement when task is complete
|
||||||
|
// +1 for future
|
||||||
|
f.m_task->m_queue = m_queue ;
|
||||||
|
f.m_task->m_ref_count = 2 ;
|
||||||
|
f.m_task->m_alloc_size = sizeof(task_type);
|
||||||
|
|
||||||
|
assign( f.m_task , arg_options... );
|
||||||
|
|
||||||
|
// Spawning from within the execution space so the
|
||||||
|
// apply function pointer is guaranteed to be valid
|
||||||
|
f.m_task->m_apply = task_type::apply ;
|
||||||
|
|
||||||
|
m_queue->schedule( f.m_task );
|
||||||
|
// this task may be updated or executed at any moment
|
||||||
|
}
|
||||||
|
|
||||||
|
return f ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**\brief The host process spawns a task with options
|
||||||
|
*
|
||||||
|
* 1) High, Normal, or Low priority
|
||||||
|
* 2) With or without dependence
|
||||||
|
* 3) Team or Serial
|
||||||
|
*/
|
||||||
|
template< typename FunctorType , typename ... Options >
|
||||||
|
inline
|
||||||
|
Future< typename FunctorType::value_type , ExecSpace >
|
||||||
|
host_spawn( FunctorType const & arg_functor
|
||||||
|
, Options const & ... arg_options
|
||||||
|
) const
|
||||||
|
{
|
||||||
|
using value_type = typename FunctorType::value_type ;
|
||||||
|
using future_type = Future< value_type , execution_space > ;
|
||||||
|
using task_type = Impl::TaskBase< execution_space
|
||||||
|
, value_type
|
||||||
|
, FunctorType > ;
|
||||||
|
|
||||||
|
if ( m_queue == 0 ) {
|
||||||
|
Kokkos::abort("Kokkos::TaskScheduler not initialized");
|
||||||
|
}
|
||||||
|
|
||||||
|
future_type f ;
|
||||||
|
|
||||||
|
// Allocate task from memory pool
|
||||||
|
f.m_task =
|
||||||
|
reinterpret_cast<task_type*>( m_queue->allocate(sizeof(task_type)) );
|
||||||
|
|
||||||
|
if ( f.m_task ) {
|
||||||
|
|
||||||
|
// Placement new construction
|
||||||
|
new( f.m_task ) task_type( arg_functor );
|
||||||
|
|
||||||
|
// Reference count starts at two:
|
||||||
|
// +1 to match decrement when task completes
|
||||||
|
// +1 for the future
|
||||||
|
f.m_task->m_queue = m_queue ;
|
||||||
|
f.m_task->m_ref_count = 2 ;
|
||||||
|
f.m_task->m_alloc_size = sizeof(task_type);
|
||||||
|
|
||||||
|
assign( f.m_task , arg_options... );
|
||||||
|
|
||||||
|
// Potentially spawning outside execution space so the
|
||||||
|
// apply function pointer must be obtained from execution space.
|
||||||
|
// Required for Cuda execution space function pointer.
|
||||||
|
queue_type::specialization::template
|
||||||
|
proc_set_apply< FunctorType >( & f.m_task->m_apply );
|
||||||
|
|
||||||
|
m_queue->schedule( f.m_task );
|
||||||
|
}
|
||||||
|
return f ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**\brief Return a future that is complete
|
||||||
|
* when all input futures are complete.
|
||||||
|
*/
|
||||||
|
template< typename A1 , typename A2 >
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
Future< ExecSpace >
|
||||||
|
when_all( int narg , Future< A1 , A2 > const * const arg ) const
|
||||||
|
{
|
||||||
|
static_assert
|
||||||
|
( std::is_same< execution_space
|
||||||
|
, typename Future< A1 , A2 >::execution_space
|
||||||
|
>::value
|
||||||
|
, "Future must have same execution space" );
|
||||||
|
|
||||||
|
using future_type = Future< ExecSpace > ;
|
||||||
|
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
|
||||||
|
|
||||||
|
future_type f ;
|
||||||
|
|
||||||
|
size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
|
||||||
|
|
||||||
|
f.m_task =
|
||||||
|
reinterpret_cast< task_base * >( m_queue->allocate( size ) );
|
||||||
|
|
||||||
|
if ( f.m_task ) {
|
||||||
|
|
||||||
|
new( f.m_task ) task_base();
|
||||||
|
|
||||||
|
// Reference count starts at two:
|
||||||
|
// +1 to match decrement when task completes
|
||||||
|
// +1 for the future
|
||||||
|
f.m_task->m_queue = m_queue ;
|
||||||
|
f.m_task->m_ref_count = 2 ;
|
||||||
|
f.m_task->m_alloc_size = size ;
|
||||||
|
f.m_task->m_dep_count = narg ;
|
||||||
|
f.m_task->m_task_type = task_base::Aggregate ;
|
||||||
|
|
||||||
|
task_base ** const dep = f.m_task->aggregate_dependences();
|
||||||
|
|
||||||
|
// Assign dependences to increment their reference count
|
||||||
|
// The futures may be destroyed upon returning from this call
|
||||||
|
// so increment reference count to track this assignment.
|
||||||
|
|
||||||
|
for ( int i = 0 ; i < narg ; ++i ) {
|
||||||
|
task_base * const t = dep[i] = arg[i].m_task ;
|
||||||
|
if ( 0 != t ) {
|
||||||
|
Kokkos::atomic_increment( &(t->m_ref_count) );
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
m_queue->schedule( f.m_task );
|
||||||
|
// this when_all may be processed at any moment
|
||||||
|
}
|
||||||
|
|
||||||
|
return f ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**\brief An executing task respawns itself with options
|
||||||
|
*
|
||||||
|
* 1) High, Normal, or Low priority
|
||||||
|
* 2) With or without dependence
|
||||||
|
*/
|
||||||
|
template< class FunctorType , typename ... Options >
|
||||||
|
KOKKOS_FUNCTION
|
||||||
|
void respawn( FunctorType * task_self
|
||||||
|
, Options const & ... arg_options ) const
|
||||||
|
{
|
||||||
|
using value_type = typename FunctorType::value_type ;
|
||||||
|
using task_type = Impl::TaskBase< execution_space
|
||||||
|
, value_type
|
||||||
|
, FunctorType > ;
|
||||||
|
|
||||||
|
task_base * const zero = (task_base *) 0 ;
|
||||||
|
task_base * const lock = (task_base *) task_base::LockTag ;
|
||||||
|
task_type * const task = static_cast< task_type * >( task_self );
|
||||||
|
|
||||||
|
// Precondition:
|
||||||
|
// task is in Executing state
|
||||||
|
// therefore m_next == LockTag
|
||||||
|
//
|
||||||
|
// Change to m_next == 0 for no dependence
|
||||||
|
|
||||||
|
if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
|
||||||
|
Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
|
||||||
|
}
|
||||||
|
|
||||||
|
assign( task , arg_options... );
|
||||||
|
|
||||||
|
// Postcondition:
|
||||||
|
// task is in Executing-Respawn state
|
||||||
|
// therefore m_next == dependece or 0
|
||||||
|
}
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
template< typename S >
|
||||||
|
friend
|
||||||
|
void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
|
||||||
|
|
||||||
|
//----------------------------------------
|
||||||
|
|
||||||
|
inline
|
||||||
|
int allocation_capacity() const noexcept
|
||||||
|
{ return m_queue->m_memory.get_mem_size(); }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
int allocated_task_count() const noexcept
|
||||||
|
{ return m_queue->m_count_alloc ; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
int allocated_task_count_max() const noexcept
|
||||||
|
{ return m_queue->m_max_alloc ; }
|
||||||
|
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
long allocated_task_count_accum() const noexcept
|
||||||
|
{ return m_queue->m_accum_alloc ; }
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
template< typename ExecSpace >
|
||||||
|
inline
|
||||||
|
void wait( TaskScheduler< ExecSpace > const & policy )
|
||||||
|
{ policy.m_queue->execute(); }
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
|
#endif /* #ifndef KOKKOS_TASKSCHEDULER_HPP */
|
||||||
|
|
||||||
@ -189,6 +189,17 @@ public:
|
|||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
namespace Impl {
|
namespace Impl {
|
||||||
|
|
||||||
|
template<>
|
||||||
|
struct MemorySpaceAccess
|
||||||
|
< Kokkos::Threads::memory_space
|
||||||
|
, Kokkos::Threads::scratch_memory_space
|
||||||
|
>
|
||||||
|
{
|
||||||
|
enum { assignable = false };
|
||||||
|
enum { accessible = true };
|
||||||
|
enum { deepcopy = false };
|
||||||
|
};
|
||||||
|
|
||||||
template<>
|
template<>
|
||||||
struct VerifyExecutionCanAccessMemorySpace
|
struct VerifyExecutionCanAccessMemorySpace
|
||||||
< Kokkos::Threads::memory_space
|
< Kokkos::Threads::memory_space
|
||||||
|
|||||||
112
lib/kokkos/core/src/Kokkos_Timer.hpp
Normal file
112
lib/kokkos/core/src/Kokkos_Timer.hpp
Normal file
@ -0,0 +1,112 @@
|
|||||||
|
/*
|
||||||
|
//@HEADER
|
||||||
|
// ************************************************************************
|
||||||
|
//
|
||||||
|
// Kokkos v. 2.0
|
||||||
|
// Copyright (2014) Sandia Corporation
|
||||||
|
//
|
||||||
|
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||||
|
// the U.S. Government retains certain rights in this software.
|
||||||
|
//
|
||||||
|
// Redistribution and use in source and binary forms, with or without
|
||||||
|
// modification, are permitted provided that the following conditions are
|
||||||
|
// met:
|
||||||
|
//
|
||||||
|
// 1. Redistributions of source code must retain the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer.
|
||||||
|
//
|
||||||
|
// 2. Redistributions in binary form must reproduce the above copyright
|
||||||
|
// notice, this list of conditions and the following disclaimer in the
|
||||||
|
// documentation and/or other materials provided with the distribution.
|
||||||
|
//
|
||||||
|
// 3. Neither the name of the Corporation nor the names of the
|
||||||
|
// contributors may be used to endorse or promote products derived from
|
||||||
|
// this software without specific prior written permission.
|
||||||
|
//
|
||||||
|
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||||
|
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||||
|
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
//
|
||||||
|
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||||
|
//
|
||||||
|
// ************************************************************************
|
||||||
|
//@HEADER
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef KOKKOS_TIMER_HPP
|
||||||
|
#define KOKKOS_TIMER_HPP
|
||||||
|
|
||||||
|
#include <stddef.h>
|
||||||
|
|
||||||
|
#ifdef _MSC_VER
|
||||||
|
#undef KOKKOS_USE_LIBRT
|
||||||
|
#include <gettimeofday.c>
|
||||||
|
#else
|
||||||
|
#ifdef KOKKOS_USE_LIBRT
|
||||||
|
#include <ctime>
|
||||||
|
#else
|
||||||
|
#include <sys/time.h>
|
||||||
|
#endif
|
||||||
|
#endif
|
||||||
|
|
||||||
|
namespace Kokkos {
|
||||||
|
|
||||||
|
/** \brief Time since construction */
|
||||||
|
|
||||||
|
class Timer {
|
||||||
|
private:
|
||||||
|
#ifdef KOKKOS_USE_LIBRT
|
||||||
|
struct timespec m_old;
|
||||||
|
#else
|
||||||
|
struct timeval m_old ;
|
||||||
|
#endif
|
||||||
|
Timer( const Timer & );
|
||||||
|
Timer & operator = ( const Timer & );
|
||||||
|
public:
|
||||||
|
|
||||||
|
inline
|
||||||
|
void reset() {
|
||||||
|
#ifdef KOKKOS_USE_LIBRT
|
||||||
|
clock_gettime(CLOCK_REALTIME, &m_old);
|
||||||
|
#else
|
||||||
|
gettimeofday( & m_old , ((struct timezone *) NULL ) );
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
inline
|
||||||
|
~Timer() {}
|
||||||
|
|
||||||
|
inline
|
||||||
|
Timer() { reset(); }
|
||||||
|
|
||||||
|
inline
|
||||||
|
double seconds() const
|
||||||
|
{
|
||||||
|
#ifdef KOKKOS_USE_LIBRT
|
||||||
|
struct timespec m_new;
|
||||||
|
clock_gettime(CLOCK_REALTIME, &m_new);
|
||||||
|
|
||||||
|
return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
|
||||||
|
( (double) ( m_new.tv_nsec - m_old.tv_nsec ) * 1.0e-9 );
|
||||||
|
#else
|
||||||
|
struct timeval m_new ;
|
||||||
|
|
||||||
|
gettimeofday( & m_new , ((struct timezone *) NULL ) );
|
||||||
|
|
||||||
|
return ( (double) ( m_new.tv_sec - m_old.tv_sec ) ) +
|
||||||
|
( (double) ( m_new.tv_usec - m_old.tv_usec ) * 1.0e-6 );
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
#endif /* #ifndef KOKKOS_TIMER_HPP */
|
||||||
File diff suppressed because it is too large
Load Diff
@ -1,24 +1,25 @@
|
|||||||
KOKKOS_PATH = ../..
|
ifndef KOKKOS_PATH
|
||||||
|
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
|
||||||
|
KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../..
|
||||||
|
endif
|
||||||
|
|
||||||
PREFIX ?= /usr/local/lib/kokkos
|
PREFIX ?= /usr/local/lib/kokkos
|
||||||
|
|
||||||
default: messages build-lib
|
default: messages build-lib
|
||||||
echo "End Build"
|
echo "End Build"
|
||||||
|
|
||||||
include $(KOKKOS_PATH)/Makefile.kokkos
|
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||||
|
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
|
||||||
CXX = $(NVCC_WRAPPER)
|
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK = $(NVCC_WRAPPER)
|
|
||||||
LINKFLAGS ?=
|
|
||||||
else
|
else
|
||||||
CXX ?= g++
|
CXX = g++
|
||||||
CXXFLAGS ?= -O3
|
|
||||||
LINK ?= g++
|
|
||||||
LINKFLAGS ?=
|
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
CXXFLAGS = -O3
|
||||||
|
LINK ?= $(CXX)
|
||||||
|
LDFLAGS ?=
|
||||||
|
|
||||||
|
include $(KOKKOS_PATH)/Makefile.kokkos
|
||||||
|
|
||||||
PWD = $(shell pwd)
|
PWD = $(shell pwd)
|
||||||
|
|
||||||
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
|
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
|
||||||
@ -49,6 +50,16 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
|||||||
CONDITIONAL_COPIES += copy-openmp
|
CONDITIONAL_COPIES += copy-openmp
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
ifeq ($(KOKKOS_OS),CYGWIN)
|
||||||
|
COPY_FLAG = -u
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_OS),Linux)
|
||||||
|
COPY_FLAG = -u
|
||||||
|
endif
|
||||||
|
ifeq ($(KOKKOS_OS),Darwin)
|
||||||
|
COPY_FLAG =
|
||||||
|
endif
|
||||||
|
|
||||||
messages:
|
messages:
|
||||||
echo "Start Build"
|
echo "Start Build"
|
||||||
|
|
||||||
@ -77,6 +88,15 @@ build-makefile-kokkos:
|
|||||||
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
|
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
|
||||||
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
|
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
|
||||||
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
|
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
|
||||||
|
echo "" >> Makefile.kokkos
|
||||||
|
echo "#Internal settings which need to propagated for Kokkos examples" >> Makefile.kokkos
|
||||||
|
echo "KOKKOS_INTERNAL_USE_CUDA = ${KOKKOS_INTERNAL_USE_CUDA}" >> Makefile.kokkos
|
||||||
|
echo "KOKKOS_INTERNAL_USE_OPENMP = ${KOKKOS_INTERNAL_USE_OPENMP}" >> Makefile.kokkos
|
||||||
|
echo "KOKKOS_INTERNAL_USE_PTHREADS = ${KOKKOS_INTERNAL_USE_PTHREADS}" >> Makefile.kokkos
|
||||||
|
echo "" >> Makefile.kokkos
|
||||||
|
echo "#Fake kokkos-clean target" >> Makefile.kokkos
|
||||||
|
echo "kokkos-clean:" >> Makefile.kokkos
|
||||||
|
echo "" >> Makefile.kokkos
|
||||||
sed \
|
sed \
|
||||||
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
|
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
|
||||||
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
|
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
|
||||||
@ -98,27 +118,27 @@ mkdir:
|
|||||||
|
|
||||||
copy-cuda: mkdir
|
copy-cuda: mkdir
|
||||||
mkdir -p $(PREFIX)/include/Cuda
|
mkdir -p $(PREFIX)/include/Cuda
|
||||||
cp $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
|
||||||
|
|
||||||
copy-threads: mkdir
|
copy-threads: mkdir
|
||||||
mkdir -p $(PREFIX)/include/Threads
|
mkdir -p $(PREFIX)/include/Threads
|
||||||
cp $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
|
||||||
|
|
||||||
copy-qthread: mkdir
|
copy-qthread: mkdir
|
||||||
mkdir -p $(PREFIX)/include/Qthread
|
mkdir -p $(PREFIX)/include/Qthread
|
||||||
cp $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
|
||||||
|
|
||||||
copy-openmp: mkdir
|
copy-openmp: mkdir
|
||||||
mkdir -p $(PREFIX)/include/OpenMP
|
mkdir -p $(PREFIX)/include/OpenMP
|
||||||
cp $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
|
||||||
|
|
||||||
install: mkdir $(CONDITIONAL_COPIES) build-lib
|
install: mkdir $(CONDITIONAL_COPIES) build-lib
|
||||||
cp $(NVCC_WRAPPER) $(PREFIX)/bin
|
cp $(COPY_FLAG) $(NVCC_WRAPPER) $(PREFIX)/bin
|
||||||
cp $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
|
||||||
cp $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
|
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
|
||||||
cp Makefile.kokkos $(PREFIX)
|
cp $(COPY_FLAG) Makefile.kokkos $(PREFIX)
|
||||||
cp libkokkos.a $(PREFIX)/lib
|
cp $(COPY_FLAG) libkokkos.a $(PREFIX)/lib
|
||||||
cp KokkosCore_config.h $(PREFIX)/include
|
cp $(COPY_FLAG) KokkosCore_config.h $(PREFIX)/include
|
||||||
|
|
||||||
clean: kokkos-clean
|
clean: kokkos-clean
|
||||||
rm -f Makefile.kokkos
|
rm -f Makefile.kokkos
|
||||||
|
|||||||
@ -43,7 +43,7 @@
|
|||||||
|
|
||||||
#include <Kokkos_Core.hpp>
|
#include <Kokkos_Core.hpp>
|
||||||
|
|
||||||
#if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
#include <impl/Kokkos_TaskQueue_impl.hpp>
|
#include <impl/Kokkos_TaskQueue_impl.hpp>
|
||||||
|
|
||||||
@ -324,6 +324,6 @@ void TaskQueueSpecialization< Kokkos::OpenMP >::
|
|||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_HAVE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -44,7 +44,7 @@
|
|||||||
#ifndef KOKKOS_IMPL_OPENMP_TASK_HPP
|
#ifndef KOKKOS_IMPL_OPENMP_TASK_HPP
|
||||||
#define KOKKOS_IMPL_OPENMP_TASK_HPP
|
#define KOKKOS_IMPL_OPENMP_TASK_HPP
|
||||||
|
|
||||||
#if defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -156,21 +156,30 @@ template<typename iType>
|
|||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
|
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
|
||||||
TeamThreadRange
|
TeamThreadRange
|
||||||
( Impl::TaskExec< Kokkos::OpenMP > & thread
|
( Impl::TaskExec< Kokkos::OpenMP > & thread, const iType & count )
|
||||||
, const iType & count )
|
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template<typename iType1, typename iType2>
|
||||||
|
KOKKOS_INLINE_FUNCTION
|
||||||
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
|
Impl::TaskExec< Kokkos::OpenMP > >
|
||||||
|
TeamThreadRange
|
||||||
|
( Impl:: TaskExec< Kokkos::OpenMP > & thread, const iType1 & begin, const iType2 & end )
|
||||||
|
{
|
||||||
|
typedef typename std::common_type<iType1, iType2>::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct<iType, Impl::TaskExec< Kokkos::OpenMP > >(thread, begin, end);
|
||||||
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >
|
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
|
||||||
TeamThreadRange
|
ThreadVectorRange
|
||||||
( Impl:: TaskExec< Kokkos::OpenMP > & thread
|
( Impl::TaskExec< Kokkos::OpenMP > & thread
|
||||||
, const iType & start
|
, const iType & count )
|
||||||
, const iType & end )
|
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >(thread,start,end);
|
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
||||||
@ -351,6 +360,6 @@ void parallel_scan
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
#endif /* #ifndef KOKKOS_IMPL_OPENMP_TASK_HPP */
|
#endif /* #ifndef KOKKOS_IMPL_OPENMP_TASK_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -300,12 +300,12 @@ void OpenMP::initialize( unsigned thread_count ,
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Check for over-subscription
|
// Check for over-subscription
|
||||||
if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
|
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
|
||||||
std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
|
// std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
|
||||||
std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
|
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
|
||||||
std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
|
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
|
||||||
std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
|
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
|
||||||
}
|
//}
|
||||||
// Init the array for used for arbitrarily sized atomics
|
// Init the array for used for arbitrarily sized atomics
|
||||||
Impl::init_lock_array_host_space();
|
Impl::init_lock_array_host_space();
|
||||||
|
|
||||||
|
|||||||
@ -393,12 +393,14 @@ public:
|
|||||||
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
|
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
|
||||||
, ValueType , void >::type type ;
|
, ValueType , void >::type type ;
|
||||||
|
|
||||||
type * const local_value = ((type*) m_exec.scratch_thread());
|
type volatile * const shared_value =
|
||||||
if(team_rank() == thread_id)
|
((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
|
||||||
*local_value = value;
|
|
||||||
|
if ( team_rank() == thread_id ) *shared_value = value;
|
||||||
memory_fence();
|
memory_fence();
|
||||||
team_barrier();
|
team_barrier(); // Wait for 'thread_id' to write
|
||||||
value = *local_value;
|
value = *shared_value ;
|
||||||
|
team_barrier(); // Wait for team members to read
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -655,8 +657,6 @@ public:
|
|||||||
static inline int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
|
static inline int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
template< class ... Properties >
|
template< class ... Properties >
|
||||||
class TeamPolicyInternal< Kokkos::OpenMP, Properties ... >: public PolicyTraits<Properties ...>
|
class TeamPolicyInternal< Kokkos::OpenMP, Properties ... >: public PolicyTraits<Properties ...>
|
||||||
{
|
{
|
||||||
@ -740,9 +740,9 @@ public:
|
|||||||
|
|
||||||
inline int team_size() const { return m_team_size ; }
|
inline int team_size() const { return m_team_size ; }
|
||||||
inline int league_size() const { return m_league_size ; }
|
inline int league_size() const { return m_league_size ; }
|
||||||
|
|
||||||
inline size_t scratch_size(const int& level, int team_size_ = -1) const {
|
inline size_t scratch_size(const int& level, int team_size_ = -1) const {
|
||||||
if(team_size_ < 0)
|
if(team_size_ < 0) team_size_ = m_team_size;
|
||||||
team_size_ = m_team_size;
|
|
||||||
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
|
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -840,7 +840,6 @@ public:
|
|||||||
};
|
};
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -864,29 +863,26 @@ int OpenMP::thread_pool_rank()
|
|||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
template< typename iType >
|
||||||
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
|
|
||||||
template<typename iType>
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >
|
||||||
TeamThreadRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
|
TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType& count ) {
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType1, typename iType2 >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
TeamThreadRange(const Impl::OpenMPexecTeamMember& thread, const iType& begin, const iType& end) {
|
Impl::OpenMPexecTeamMember >
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>(thread,begin,end);
|
TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType1& begin, const iType2& end ) {
|
||||||
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >
|
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >
|
||||||
ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
|
ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
|
||||||
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >(thread,count);
|
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >(thread,count);
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -899,6 +895,7 @@ KOKKOS_INLINE_FUNCTION
|
|||||||
Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember> PerThread(const Impl::OpenMPexecTeamMember& thread) {
|
Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember> PerThread(const Impl::OpenMPexecTeamMember& thread) {
|
||||||
return Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>(thread);
|
return Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>(thread);
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
@ -959,7 +956,6 @@ void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::Ope
|
|||||||
|
|
||||||
} //namespace Kokkos
|
} //namespace Kokkos
|
||||||
|
|
||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
||||||
*
|
*
|
||||||
@ -1080,4 +1076,3 @@ void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_s
|
|||||||
}
|
}
|
||||||
|
|
||||||
#endif /* #ifndef KOKKOS_OPENMPEXEC_HPP */
|
#endif /* #ifndef KOKKOS_OPENMPEXEC_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -511,6 +511,7 @@ public:
|
|||||||
};
|
};
|
||||||
|
|
||||||
} // namespace Impl
|
} // namespace Impl
|
||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
@ -518,26 +519,24 @@ public:
|
|||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>
|
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >
|
||||||
TeamThreadRange(const Impl::QthreadTeamPolicyMember& thread, const iType& count)
|
TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType& count )
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType1, typename iType2 >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread
|
Impl::QthreadTeamPolicyMember >
|
||||||
, const iType & begin
|
TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
|
||||||
, const iType & end
|
|
||||||
)
|
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>(thread,begin,end);
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
template<typename iType>
|
template<typename iType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
|
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
|
||||||
@ -545,7 +544,6 @@ Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
|
|||||||
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >(thread,count);
|
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >(thread,count);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember> PerTeam(const Impl::QthreadTeamPolicyMember& thread) {
|
Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember> PerTeam(const Impl::QthreadTeamPolicyMember& thread) {
|
||||||
return Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
|
return Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
|
||||||
@ -556,14 +554,10 @@ Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember> PerThread(const Impl::Qt
|
|||||||
return Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
|
return Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
||||||
|
*
|
||||||
namespace Kokkos {
|
* The range i=0..N-1 is mapped to all threads of the the calling thread team.
|
||||||
|
* This functionality requires C++11 support.*/
|
||||||
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
|
||||||
*
|
|
||||||
* The range i=0..N-1 is mapped to all threads of the the calling thread team.
|
|
||||||
* This functionality requires C++11 support.*/
|
|
||||||
template<typename iType, class Lambda>
|
template<typename iType, class Lambda>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
|
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
|
||||||
@ -618,9 +612,6 @@ void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::Qth
|
|||||||
|
|
||||||
#endif /* #if defined( KOKKOS_HAVE_CXX11 ) */
|
#endif /* #if defined( KOKKOS_HAVE_CXX11 ) */
|
||||||
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
|
||||||
*
|
*
|
||||||
* The range i=0..N-1 is mapped to all vector lanes of the the calling thread.
|
* The range i=0..N-1 is mapped to all vector lanes of the the calling thread.
|
||||||
@ -707,10 +698,6 @@ void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::Qth
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
} // namespace Kokkos
|
|
||||||
|
|
||||||
namespace Kokkos {
|
|
||||||
|
|
||||||
template<class FunctorType>
|
template<class FunctorType>
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
|
void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
|
||||||
@ -740,6 +727,4 @@ void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& singl
|
|||||||
|
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
|
|
||||||
#endif /* #define KOKKOS_QTHREAD_PARALLEL_HPP */
|
#endif /* #define KOKKOS_QTHREAD_PARALLEL_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -58,7 +58,7 @@
|
|||||||
#include <Kokkos_Atomic.hpp>
|
#include <Kokkos_Atomic.hpp>
|
||||||
#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
|
#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
|
||||||
|
|
||||||
#if defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
@ -196,7 +196,7 @@ void Task::assign( Task ** const lhs , Task * rhs , const bool no_throw )
|
|||||||
static const char msg_error_dependences[] = ": destroy task that has dependences" ;
|
static const char msg_error_dependences[] = ": destroy task that has dependences" ;
|
||||||
static const char msg_error_exception[] = ": caught internal exception" ;
|
static const char msg_error_exception[] = ": caught internal exception" ;
|
||||||
|
|
||||||
if ( rhs ) { Kokkos::atomic_fetch_add( & (*rhs).m_ref_count , 1 ); }
|
if ( rhs ) { Kokkos::atomic_increment( &(*rhs).m_ref_count ); }
|
||||||
|
|
||||||
Task * const lhs_val = Kokkos::atomic_exchange( lhs , rhs );
|
Task * const lhs_val = Kokkos::atomic_exchange( lhs , rhs );
|
||||||
|
|
||||||
@ -486,6 +486,6 @@ void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthread > & policy )
|
|||||||
} // namespace Experimental
|
} // namespace Experimental
|
||||||
} // namespace Kokkos
|
} // namespace Kokkos
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
#endif /* #if defined( KOKKOS_HAVE_QTHREAD ) */
|
#endif /* #if defined( KOKKOS_HAVE_QTHREAD ) */
|
||||||
|
|
||||||
|
|||||||
@ -43,8 +43,8 @@
|
|||||||
|
|
||||||
// Experimental unified task-data parallel manycore LDRD
|
// Experimental unified task-data parallel manycore LDRD
|
||||||
|
|
||||||
#ifndef KOKKOS_QTHREAD_TASKPOLICY_HPP
|
#ifndef KOKKOS_QTHREAD_TASKSCHEDULER_HPP
|
||||||
#define KOKKOS_QTHREAD_TASKPOLICY_HPP
|
#define KOKKOS_QTHREAD_TASKSCHEDULER_HPP
|
||||||
|
|
||||||
#include <string>
|
#include <string>
|
||||||
#include <typeinfo>
|
#include <typeinfo>
|
||||||
@ -64,12 +64,12 @@
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#include <Kokkos_Qthread.hpp>
|
#include <Kokkos_Qthread.hpp>
|
||||||
#include <Kokkos_TaskPolicy.hpp>
|
#include <Kokkos_TaskScheduler.hpp>
|
||||||
#include <Kokkos_View.hpp>
|
#include <Kokkos_View.hpp>
|
||||||
|
|
||||||
#include <impl/Kokkos_FunctorAdapter.hpp>
|
#include <impl/Kokkos_FunctorAdapter.hpp>
|
||||||
|
|
||||||
#if defined( KOKKOS_ENABLE_TASKPOLICY )
|
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||||
|
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
@ -154,7 +154,7 @@ public:
|
|||||||
KOKKOS_FUNCTION static
|
KOKKOS_FUNCTION static
|
||||||
TaskMember * verify_type( TaskMember * t )
|
TaskMember * verify_type( TaskMember * t )
|
||||||
{
|
{
|
||||||
enum { check_type = ! Kokkos::Impl::is_same< ResultType , void >::value };
|
enum { check_type = ! std::is_same< ResultType , void >::value };
|
||||||
|
|
||||||
if ( check_type && t != 0 ) {
|
if ( check_type && t != 0 ) {
|
||||||
|
|
||||||
@ -298,7 +298,7 @@ public:
|
|||||||
|
|
||||||
template< class FunctorType , class ResultType >
|
template< class FunctorType , class ResultType >
|
||||||
KOKKOS_INLINE_FUNCTION static
|
KOKKOS_INLINE_FUNCTION static
|
||||||
void apply_single( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
|
void apply_single( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t )
|
||||||
{
|
{
|
||||||
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
||||||
|
|
||||||
@ -314,7 +314,7 @@ public:
|
|||||||
|
|
||||||
template< class FunctorType , class ResultType >
|
template< class FunctorType , class ResultType >
|
||||||
KOKKOS_INLINE_FUNCTION static
|
KOKKOS_INLINE_FUNCTION static
|
||||||
void apply_single( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t )
|
void apply_single( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t )
|
||||||
{
|
{
|
||||||
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
||||||
|
|
||||||
@ -332,7 +332,7 @@ public:
|
|||||||
|
|
||||||
template< class FunctorType , class ResultType >
|
template< class FunctorType , class ResultType >
|
||||||
KOKKOS_INLINE_FUNCTION static
|
KOKKOS_INLINE_FUNCTION static
|
||||||
void apply_team( typename Kokkos::Impl::enable_if< ! Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t
|
void apply_team( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t
|
||||||
, Kokkos::Impl::QthreadTeamPolicyMember & member )
|
, Kokkos::Impl::QthreadTeamPolicyMember & member )
|
||||||
{
|
{
|
||||||
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
||||||
@ -344,7 +344,7 @@ public:
|
|||||||
|
|
||||||
template< class FunctorType , class ResultType >
|
template< class FunctorType , class ResultType >
|
||||||
KOKKOS_INLINE_FUNCTION static
|
KOKKOS_INLINE_FUNCTION static
|
||||||
void apply_team( typename Kokkos::Impl::enable_if< Kokkos::Impl::is_same< ResultType , void >::value , TaskMember * >::type t
|
void apply_team( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t
|
||||||
, Kokkos::Impl::QthreadTeamPolicyMember & member )
|
, Kokkos::Impl::QthreadTeamPolicyMember & member )
|
||||||
{
|
{
|
||||||
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
|
||||||
@ -575,10 +575,10 @@ public:
|
|||||||
template< class A1 , class A2 , class A3 , class A4 >
|
template< class A1 , class A2 , class A3 , class A4 >
|
||||||
void add_dependence( const Future<A1,A2> & after
|
void add_dependence( const Future<A1,A2> & after
|
||||||
, const Future<A3,A4> & before
|
, const Future<A3,A4> & before
|
||||||
, typename Kokkos::Impl::enable_if
|
, typename std::enable_if
|
||||||
< Kokkos::Impl::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
|
< std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
|
||||||
&&
|
&&
|
||||||
Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
||||||
>::type * = 0
|
>::type * = 0
|
||||||
)
|
)
|
||||||
{
|
{
|
||||||
@ -621,8 +621,8 @@ public:
|
|||||||
template< class FunctorType , class A3 , class A4 >
|
template< class FunctorType , class A3 , class A4 >
|
||||||
void add_dependence( FunctorType * task_functor
|
void add_dependence( FunctorType * task_functor
|
||||||
, const Future<A3,A4> & before
|
, const Future<A3,A4> & before
|
||||||
, typename Kokkos::Impl::enable_if
|
, typename std::enable_if
|
||||||
< Kokkos::Impl::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
< std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
|
||||||
>::type * = 0
|
>::type * = 0
|
||||||
)
|
)
|
||||||
{
|
{
|
||||||
@ -659,6 +659,6 @@ public:
|
|||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
//----------------------------------------------------------------------------
|
//----------------------------------------------------------------------------
|
||||||
|
|
||||||
#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
|
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||||
#endif /* #define KOKKOS_QTHREAD_TASK_HPP */
|
#endif /* #define KOKKOS_QTHREAD_TASK_HPP */
|
||||||
|
|
||||||
|
|||||||
@ -714,12 +714,12 @@ void ThreadsExec::initialize( unsigned thread_count ,
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Check for over-subscription
|
// Check for over-subscription
|
||||||
if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
|
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
|
||||||
std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
|
// std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
|
||||||
std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
|
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
|
||||||
std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
|
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
|
||||||
std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
|
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
|
||||||
}
|
//}
|
||||||
|
|
||||||
// Init the array for used for arbitrarily sized atomics
|
// Init the array for used for arbitrarily sized atomics
|
||||||
Impl::init_lock_array_host_space();
|
Impl::init_lock_array_host_space();
|
||||||
|
|||||||
@ -406,6 +406,8 @@ public:
|
|||||||
m_exec->barrier();
|
m_exec->barrier();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
else
|
||||||
|
{ m_invalid_thread = 1; }
|
||||||
}
|
}
|
||||||
|
|
||||||
ThreadsExecTeamMember()
|
ThreadsExecTeamMember()
|
||||||
@ -460,7 +462,7 @@ public:
|
|||||||
|
|
||||||
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
|
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
|
||||||
|
|
||||||
if(m_league_rank>=0)
|
if((m_league_rank>=0) && (m_league_rank < m_league_chunk_end))
|
||||||
return true;
|
return true;
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
@ -704,23 +706,22 @@ public:
|
|||||||
|
|
||||||
namespace Kokkos {
|
namespace Kokkos {
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >
|
||||||
TeamThreadRange(const Impl::ThreadsExecTeamMember& thread, const iType& count)
|
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType& count )
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>(thread,count);
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, count );
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename iType>
|
template< typename iType1, typename iType2 >
|
||||||
KOKKOS_INLINE_FUNCTION
|
KOKKOS_INLINE_FUNCTION
|
||||||
Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>
|
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||||
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread
|
Impl::ThreadsExecTeamMember>
|
||||||
, const iType & begin
|
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType1 & begin, const iType2 & end )
|
||||||
, const iType & end
|
|
||||||
)
|
|
||||||
{
|
{
|
||||||
return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>(thread,begin,end);
|
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||||
|
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, iType(begin), iType(end) );
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user