132 lines
3.7 KiB
Markdown
132 lines
3.7 KiB
Markdown
# Design Notes for Execution and Memory Space Instances
|
|
|
|
## Objective
|
|
|
|
* Enable Kokkos interoperability with coarse-grain tasking models
|
|
|
|
## Requirements
|
|
|
|
* Backwards compatable with existing Kokkos API
|
|
* Support existing Host execution spaces (Serial, Threads, OpenMP, maybe Qthreads)
|
|
* Support DARMA threading model (may require a new Host execution space)
|
|
* Support Uintah threading model, i.e. indepentant worker threadpools working of of shared task queues
|
|
|
|
|
|
## Execution Space
|
|
|
|
* Parallel work is *dispatched* on an execution space instance
|
|
|
|
* Execution space instances are conceptually disjoint/independant from each other
|
|
|
|
|
|
## Host Execution Space Instances
|
|
|
|
* A host-side *control* thread dispatches work to an instance
|
|
|
|
* `main` is the initial control thread
|
|
|
|
* A host execution space instance is an organized thread pool
|
|
|
|
* All instances are disjoint, i.e. hardware resources are not shared between instances
|
|
|
|
* Exactly one control thread is associated with
|
|
an instance and only that control thread may
|
|
dispatch work to to that instance
|
|
|
|
* The control thread is a member of the instance
|
|
|
|
* The pool of threads associated with an instances is not mutatable during that instance existance
|
|
|
|
* The pool of threads associated with an instance may be masked
|
|
|
|
- Allows work to be dispatched to a subset of the pool
|
|
|
|
- Example: only one hyperthread per core of the instance
|
|
|
|
- A mask can be applied during the policy creation of a parallel algorithm
|
|
|
|
- Masking is portable by defining it as ceiling of fraction between [0.0, 1.0]
|
|
of the available resources
|
|
|
|
```
|
|
class ExecutionSpace {
|
|
public:
|
|
using execution_space = ExecutionSpace;
|
|
using memory_space = ...;
|
|
using device_type = Kokkos::Device<execution_space, memory_space>;
|
|
using array_layout = ...;
|
|
using size_type = ...;
|
|
using scratch_memory_space = ...;
|
|
|
|
|
|
class Instance
|
|
{
|
|
int thread_pool_size( int depth = 0 );
|
|
...
|
|
};
|
|
|
|
class InstanceRequest
|
|
{
|
|
public:
|
|
using Control = std::function< void( Instance * )>;
|
|
|
|
InstanceRequest( Control control
|
|
, unsigned thread_count
|
|
, unsigned use_numa_count = 0
|
|
, unsigned use_cores_per_numa = 0
|
|
);
|
|
|
|
};
|
|
|
|
static bool in_parallel();
|
|
|
|
static bool sleep();
|
|
static bool wake();
|
|
|
|
static void fence();
|
|
|
|
static void print_configuration( std::ostream &, const bool detailed = false );
|
|
|
|
static void initialize( unsigned thread_count = 0
|
|
, unsigned use_numa_count = 0
|
|
, unsigned use_cores_per_numa = 0
|
|
);
|
|
|
|
// Partition the current instance into the requested instances
|
|
// and run the given functions on the cooresponding instances
|
|
// will block until all the partitioned instances complete and
|
|
// the original instance will be restored
|
|
//
|
|
// Requires that the space has already been initialized
|
|
// Requires that the request can be statisfied by the current instance
|
|
// i.e. the sum of number of requested threads must be less than the
|
|
// max_hardware_threads
|
|
//
|
|
// Each control functor will accept a handle to its new default instance
|
|
// Each instance must be independant of all other instances
|
|
// i.e. no assumption on scheduling between instances
|
|
// The user is responible for checking the return code for errors
|
|
static int run_instances( std::vector< InstanceRequest> const& requests );
|
|
|
|
static void finalize();
|
|
|
|
static int is_initialized();
|
|
|
|
static int concurrency();
|
|
|
|
static int thread_pool_size( int depth = 0 );
|
|
|
|
static int thread_pool_rank();
|
|
|
|
static int max_hardware_threads();
|
|
|
|
static int hardware_thread_id();
|
|
|
|
};
|
|
|
|
```
|
|
|
|
|
|
|
|
|