- in most cases can simply construct mapDistribute with the sendMap
and have it take care of communication and addressing for the
corresponding constructMap.
This removes code duplication, which in some cases was also using
much less efficient mechanisms (eg, combineReduce on list of
lists, or an allGatherList on the send sizes etc) and also
reduces the number of places where Pstream::exchange/exchangeSizes
is being called.
ENH: reduce communication in turbulentDFSEMInlet
- was doing an allGatherList to populate a mapDistribute.
Now simply use PstreamBuffers mechanisms directly.
- dynamic sparse data exchange using Map to hold data and sizes.
Still uses the personalised exchange paradigm, but with non-blocking
consensus exchange to obtain the sizes and regular point-to-point
for the data exchange itself. This avoids an all-to-all but still
keeps the point-to-point for overlapping communication, data
chunking etc.
- to service both List and Map exchanges with limited message sizes
(termed 'data chunking' here) add a PstreamDetail for walking and
dispatching. Like other Detail components, the API is subject
to (possibly breaking) changes in the future at any time.
The regular exchangeBuf detail has this type of signature:
PstreamDetail::exchangeBuf
(
const UList<std::pair<int, stdFoam::span<const Type>>>& sends,
const UList<std::pair<int, stdFoam::span<Type>>>& recvs,
...
)
Where [rank, span] is the tuple pack.
The basic idea is to pre-process the send/receive buffers and
marshall them into a flat list of [rank, span] tuples.
The originating buffers could be any type of container (List or Map)
which is then marshalled into this given sequence that can be
processed in source-agnostic fashion.
If data chunking is required (when UPstream::maxCommsSize > 0)
it is possible to make a cheap copy of the rank/address information
and then walk different slices or views.
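The marshalling and chunking idea above can be sketched in plain C++. This is a minimal illustration, not the PstreamDetail implementation: `SpanView`, `RankSpans`, `marshalSends`, `numChunkPasses` and `chunkSlice` are hypothetical names, `std::map`/`std::vector` stand in for Map/List, and no MPI calls are made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for stdFoam::span: a non-owning (pointer, size) view
template<class T>
struct SpanView
{
    T* ptr;
    std::size_t len;
};

// Flat list of [rank, span] tuples
template<class T>
using RankSpans = std::vector<std::pair<int, SpanView<const T>>>;

// Marshal a sparse (Map-like) set of send buffers into [rank, span] tuples
template<class T>
RankSpans<T> marshalSends(const std::map<int, std::vector<T>>& bufs)
{
    RankSpans<T> out;
    out.reserve(bufs.size());
    for (const auto& kv : bufs)
    {
        out.emplace_back
        (
            kv.first,
            SpanView<const T>{kv.second.data(), kv.second.size()}
        );
    }
    return out;
}

// Number of passes needed so that no message exceeds maxChunk elements.
// maxChunk == 0 means "no chunking" (a single pass).
template<class T>
std::size_t numChunkPasses(const RankSpans<T>& sends, std::size_t maxChunk)
{
    if (maxChunk == 0) return 1;
    std::size_t maxLen = 0;
    for (const auto& s : sends) maxLen = std::max(maxLen, s.second.len);
    return (maxLen + maxChunk - 1)/maxChunk;
}

// Views for one pass: copy the cheap rank/address information,
// then narrow every span to the window of that pass.
template<class T>
RankSpans<T> chunkSlice
(
    const RankSpans<T>& sends,
    std::size_t pass,
    std::size_t maxChunk
)
{
    assert(maxChunk > 0);
    RankSpans<T> slices(sends);
    const std::size_t start = pass*maxChunk;
    for (auto& s : slices)
    {
        SpanView<const T>& view = s.second;
        if (start >= view.len)
        {
            view.ptr += view.len;   // exhausted: empty view at the end
            view.len = 0;
        }
        else
        {
            view.ptr += start;
            view.len = std::min(maxChunk, view.len - start);
        }
    }
    return slices;
}
```

Each pass then hands its slices to the same point-to-point dispatch, regardless of whether the data originally lived in a List or a Map.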
ENH: replace private static methods with PstreamDetail functions
- simpler to update locally.
- since List is being used to manage the storage content for
DynamicList, it needs to free old memory for zero-sized lists first.
Consider this case (slightly exaggerated):
line 0: DynamicList<label> list;
line 1: list.reserve(100000);
line 2: list.reserve(200000);
After line 0:
- list has size=0, capacity=0 and data=nullptr
After line 1:
- list has size=0, capacity=1e+5 and data != nullptr
After line 2:
- list has size=0, capacity=2e+5 and data != nullptr
---
The internal resizing associated with line 1 corresponds to what the
List resize would naturally do. Namely allocate new storage, copy/move
any overlapping elements (in this case none) before freeing the old
storage and replacing with new storage.
Applying the same resizing logic for line 2 means, however, that the
old memory (1e5) and new memory (2e5) are temporarily both
accessible - leading to an unnecessary memory peak.
Now: if there is no overlap, just remove old memory first.
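A minimal sketch of the "free first when nothing overlaps" rule, assuming a hypothetical `IntBuffer` class rather than the actual List/DynamicList code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable buffer, for illustration only
class IntBuffer
{
    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;

public:

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    const int* data() const { return data_.get(); }

    void reserve(std::size_t n)
    {
        if (n <= capacity_) return;

        if (size_ == 0)
        {
            // No elements to preserve: release the old block first so
            // that old and new allocations never exist at the same time
            data_.reset();
            capacity_ = 0;
            data_.reset(new int[n]);
        }
        else
        {
            // Overlapping elements: allocate, copy, then free the old block
            std::unique_ptr<int[]> fresh(new int[n]);
            std::copy(data_.get(), data_.get() + size_, fresh.get());
            data_ = std::move(fresh);
        }
        capacity_ = n;
    }

    void push_back(int val)
    {
        if (size_ == capacity_) reserve(capacity_ ? 2*capacity_ : 8);
        assert(size_ < capacity_);
        data_[size_++] = val;
    }
};
```

With this rule, the second `reserve` in the example above never holds the 1e5 and 2e5 blocks simultaneously.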
- basic functionality similar to std::span (C++20).
Holds pointer and size: for lightweight handling of address ranges.
- implements cdata_bytes() and data_bytes() methods for similarity
with UList. For span, however, both container accesses are const
but the data_bytes() method is only available when the
underlying pointer is non-const.
No specializations of std::as_bytes() or std::as_writable_bytes()
as free functions, since std::byte etc are not available anyhow.
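The const behaviour described above could look roughly as follows. This is a sketch under assumptions: `ByteSpan` is a hypothetical name, and the `char*` return types are chosen for illustration rather than taken from the actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical span-like view holding a pointer and a size
template<class T>
class ByteSpan
{
    T* ptr_;
    std::size_t size_;

public:

    ByteSpan(T* p, std::size_t n) : ptr_(p), size_(n) {}

    T* data() const { return ptr_; }
    std::size_t size() const { return size_; }
    std::size_t size_bytes() const { return size_*sizeof(T); }

    // Read-only byte access: always available
    const char* cdata_bytes() const
    {
        return reinterpret_cast<const char*>(ptr_);
    }

    // Writable byte access: a const method on the view itself, but only
    // enabled when the underlying element type is non-const
    template
    <
        class U = T,
        class = typename std::enable_if<!std::is_const<U>::value>::type
    >
    char* data_bytes() const
    {
        U* p = ptr_;
        return reinterpret_cast<char*>(p);
    }
};
```

Calling `data_bytes()` on a `ByteSpan<const int>` fails to compile, which mirrors the restriction stated above.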
- name and functionality similar to std::unordered_map (C++17).
Formalizes what had previously been implemented in IOobjectList
but now manages without pointer deletion/creation.
- use persistent PstreamBuffers between iterations, restrict size
information exchange to the processor neighbours (which is what the
algorithm is handling there anyhow).
- attempted reduction in bookkeeping (commit: 068ab8ccc7) meant that
the worldComm didn't have a group from which sub-communicators could
be spun off.
- do not force reset of PstreamBuffers positions
STYLE: UPstream::globalComm instead of '0'
- functionality provided as 'found(key)' in OpenFOAM naming, since
there was no STL equivalent at the time. Now support contains(),
which is the equivalent for C++20 maps/sets.
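As an illustration of offering both spellings, a hypothetical `SimpleTable` wrapper (not the OpenFOAM HashTable) could forward `found()` to `contains()`:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical table exposing both the older found() and the newer contains()
template<class Key, class T>
class SimpleTable
{
    std::unordered_map<Key, T> data_;

public:

    // Returns true if the key was newly inserted
    bool insert(const Key& key, const T& val)
    {
        return data_.emplace(key, val).second;
    }

    // C++20-style naming
    bool contains(const Key& key) const
    {
        return data_.find(key) != data_.end();
    }

    // Traditional naming, same behaviour
    bool found(const Key& key) const
    {
        return contains(key);
    }
};
```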
STYLE: general contains() method for containers
STYLE: treat Enum and Switch similarly as hash-like objects
- waits for completion of any of the listed requests and returns the
corresponding index into the list.
This allows, for example, dispatching of data when the receive is
completed.
- make nProcs() independent of internal storage mechanism.
- reset receive positions with finished sends
- use size of received buffers to manage validity instead of
a separate additional gather operation.
- clearing the receive 'slots' is preferable to clearing out the map
itself since this can potentially preserve allocated space (eg
DynamicList entries) between calls.
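The difference between clearing the slots and clearing the map can be sketched with standard containers. Here `RecvMap` and `clearRecvSlots` are hypothetical names, with `std::map`/`std::vector` standing in for Map/DynamicList:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical receive storage: one buffer ('slot') per sending rank
using RecvMap = std::map<int, std::vector<char>>;

// Empty every slot but keep the slots themselves. A std::vector keeps
// its capacity on clear(), so the next call can reuse the allocation.
inline void clearRecvSlots(RecvMap& recv)
{
    for (auto& kv : recv)
    {
        kv.second.clear();
    }
}
```

Calling `recv.clear()` instead would destroy the buffers and force fresh allocations on the next exchange.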
BUG: remove stray MPI barrier in exchange code
- permits distinction between communicators/groups that were
user-created (eg, MPI_Comm_create) versus those queried from MPI.
Previously simply relied on non-null values, but that is too fragile.
ENH: support List<Request> version of UPstream::finishedRequests
- allows more independent algorithms
ENH: added UPstream::probeMessage(...). Blocking or non-blocking
- allows the possibility of using demand-driven internal buffers
and/or different storage mechanisms.
Changes:
* old: sendBuf_[proci] -> accessSendBuffer(proci)
* old: recvBuf_[proci] -> accessRecvBuffer(proci)
* old: recvBufPos_[proci] -> accessRecvPosition(proci)
only affects internals of UIPstreamBase and UOPstreamBase
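One way accessors make demand-driven storage possible is sketched below. Only the accessor names come from the change list above; the class name `DemandBuffers`, the signatures and the `std::unordered_map` storage are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical storage: buffers are created on first access instead of
// being pre-allocated as one entry per processor
class DemandBuffers
{
    std::unordered_map<int, std::vector<char>> sendBufs_;
    std::unordered_map<int, std::vector<char>> recvBufs_;
    std::unordered_map<int, std::size_t> recvPos_;

public:

    std::vector<char>& accessSendBuffer(int proci)
    {
        return sendBufs_[proci];
    }

    std::vector<char>& accessRecvBuffer(int proci)
    {
        return recvBufs_[proci];
    }

    // The position starts at 0 the first time a processor is accessed
    std::size_t& accessRecvPosition(int proci)
    {
        return recvPos_[proci];
    }

    // Number of send buffers that have actually been created
    std::size_t nSendBuffers() const
    {
        return sendBufs_.size();
    }
};
```

Callers that only use the accessors are unaffected if the storage later changes again.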
BUG: reduceOr in PstreamBuffers uses world communicator
- should respect the value of the communicator defined within
PstreamBuffers
- previously built the entire adjacency table (full communication!)
but this is only strictly needed when using 'scheduled' as the
default communication mode. For blocking/nonBlocking modes this
information is not necessary at that point.
The processorTopology::New now generally creates a smaller amount of
data at startup: the processor->patch mapping and the patchSchedule.
If the default communication mode is 'scheduled', the behaviour is
almost identical to previously.
- Use Map<label> for the processor->patch mapping for a smaller memory
footprint on large (ie, sparsely connected) cases. It also
simplifies coding and allows recovery of the list of procNeighbours
on demand.
- Setup the processor initEvaluate/evaluate states with fewer loops
over the patches.
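Recovering the neighbour list from a sparse processor->patch mapping can be sketched as follows. `ProcPatchMap` and `procNeighboursFrom` are hypothetical names, and the ordered `std::map` stands in for Map&lt;label&gt;, which is a hash table and would need an explicit sort of its keys.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical sparse mapping: neighbour processor -> local patch index
using ProcPatchMap = std::map<int, int>;

// The neighbour list is just the set of keys. std::map iterates in
// ascending key order, so the result is already sorted.
inline std::vector<int> procNeighboursFrom(const ProcPatchMap& procPatch)
{
    std::vector<int> neighbours;
    neighbours.reserve(procPatch.size());
    for (const auto& kv : procPatch)
    {
        neighbours.push_back(kv.first);
    }
    return neighbours;
}
```

Storage scales with the number of neighbours rather than with the total number of processors.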
========
BREAKING: procNeighbours() method changed definition
- this was previously the entire adjacency table, but is now only the
processor-local neighbours. Now use procAdjacency() to create or
recover the entire adjacency table.
The only known use is within Cloud<ParticleType>::move and there it
was only used to obtain processor-local information.
Old:
const labelList& neighbourProcs =
mesh.globalData().topology().procNeighbours()[Pstream::myProcNo()];
New:
const labelList& neighbourProcs =
mesh.globalData().topology().procNeighbours();
// If needed, the old definition (with communication!)
const labelListList& connectivity =
mesh.globalData().topology().procAdjacency();
transformation support in-place modifies the data (e.g. to
add a transform). This might cause the neighbour side patch
to pick up owner side information.
- wish to deprecate and remove exprFixedValue in the future since the
same functionality is possible using patch expressions with a
uniformFixedValue condition.
- skip loading of fields with -no-internal, -no-boundary
- suppress reporting fields with -no-internal, -no-boundary
- cache loaded volume field for reuse with point interpolation.
Trade off some memory overhead against reading twice.
NOTE: this issue will not be evident with foamToEnsight since there
it only handles cell data *or* point data (not both), so a field is
only ever loaded/processed once.
- This simplifies definition of 'lazier' (READ_IF_PRESENT)
construction or assignment.
For construction:
- For MUST_READ and key not found: FatalIOError.
- For LAZY_READ and key not found: initialise field with Zero.
- For NO_READ and key not found: simply size the field.
For assignment:
- If len == 0 : a no-op and return true.
- For NO_READ : a no-op and return false.
- For MUST_READ and key not found : FatalIOError
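The construction and assignment rules listed above can be sketched in plain C++. This is an illustration under assumptions: `ReadOption`, `Dict`, `assignField` and `constructField` are hypothetical names, FatalIOError is modelled as an exception, and returning false for LAZY_READ with a missing key is an assumed behaviour that the rules above do not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class ReadOption { NO_READ, MUST_READ, LAZY_READ };

// Hypothetical dictionary: key -> list of values
using Dict = std::map<std::string, std::vector<double>>;

// Assignment: returns true if the field is considered assigned
inline bool assignField
(
    std::vector<double>& field,
    const Dict& dict,
    const std::string& key,
    std::size_t len,
    ReadOption opt
)
{
    if (len == 0) return true;                      // no-op
    if (opt == ReadOption::NO_READ) return false;   // no-op

    const auto iter = dict.find(key);
    if (iter == dict.end())
    {
        if (opt == ReadOption::MUST_READ)
        {
            throw std::runtime_error("key not found: " + key);
        }
        return false;   // LAZY_READ: assumed behaviour
    }

    field = iter->second;   // no length check in this sketch
    return true;
}

// Construction built on top of the assignment rules
inline std::vector<double> constructField
(
    const Dict& dict,
    const std::string& key,
    std::size_t len,
    ReadOption opt
)
{
    std::vector<double> field;
    if (!assignField(field, dict, key, len, opt))
    {
        // LAZY_READ: initialise with zero. NO_READ: only size the field
        // (std::vector zero-fills here as well, which is stricter).
        field.assign(len, 0.0);
    }
    return field;
}
```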
- encompasses isReadOptional or isReadRequired check
STYLE: allow LAZY_READ as a shorter synonym for READ_IF_PRESENT
- add helper for downgrading MUST_READ... to LAZY_READ
- with geometryOrder=1, edge normal calculation is done directly from
the faces, whereas with geometryOrder=2 they are calculated based on the
point normals of each end.
In both cases, the geometry calculation uses processor communication
(with corresponding waitRequests etc).
Since the final correction and the halo face normals also need
collective communication, these routines must be triggered on all
processors or they will block. Thus also include edgeAreaNormals()
triggering in addition to pointAreaNormals() triggering.
- handle lower geometryOrder values directly within edgeAreaNormals()
and reuse the results within Le().
- direct nonBlocking recv/send of edge normals instead of using the
intermediate processorLduInterface buffers
- symmetrical evaluation for processor patches, eliminates
scalar/vector multiply followed by projection.
STYLE: use evaluateCoupled instead of local versions