- the internal data are contiguous, so the size and internals can be
broadcast directly without an intermediate stream.
ENH: split out broadcast time for profilingPstream information
STYLE: minor Pstream cleanup
- UPstream::commsType_ from protected to private, since it already has
inlined noexcept getters/setters that should be used.
- don't pass unused/unneeded tag into low-level MPI reduction templates.
Document where tags are not needed
- had Pstream::broadcast instead of UPstream::broadcast in internals
- used Pstream::maxCommsSize (bytes) as the lower limit when sending.
This would have sent more data on each iteration than expected based
on maxCommsSize and finished with a number of useless iterations.
Was generally not a serious bug since maxCommsSize (if used) was
likely still far away from the MPI limits and exchange() is primarily
used by PstreamBuffers, which sends character data
(ie, the number of elements and the number of bytes are identical).
- For v2112 and earlier: pre-assembled lists of particles
to be transferred and the target patch on a per-processor basis.
Apart from the memory overhead of assembling the lists, this adds
allocations/de-allocations when building the linked lists.
- Now stream particle transfer tuples directly into PstreamBuffers.
Use a local cache of UOPstream wrappers for the formatters
(since there are potentially many particles being shifted about).
On the receiving side, read out tuple-wise.
- Communication on transfers now restricted to the immediate
neighbours instead of using an all-to-all to exchange sizes.
Applied to Cloud::move and RecycleInteraction
- now largely encapsulated using PstreamBuffers methods,
which makes it simpler to centralize and maintain
- avoid building intermediate structures when sending data,
remove unused methods/data
TUT: parallel version of depthCharge2D
STYLE: minor update in ProcessorTopology
- PstreamBuffers nProcs() and allProcs() methods to recover the rank
information consistent with the communicator used for construction
- allowClearRecv() methods for more control over buffer reuse
For example,
pBufs.allowClearRecv(false);
forAll(particles, particlei)
{
pBufs.clear();
fill...
read via IPstream(..., pBufs);
}
This preserves the receive buffers' memory allocation between calls.
- finishedNeighbourSends() method as compact wrapper for
finishedSends() when the send/recv ranks are identical
(eg, neighbours)
- hasSendData()/hasRecvData() methods for PstreamBuffers.
Can be useful for some situations to skip reading entirely.
For example,
pBufs.finishedNeighbourSends(neighProcs);
if (!returnReduce(pBufs.hasRecvData(), orOp<bool>()))
{
// Nothing to do
continue;
}
...
On an individual basis:
for (const int proci : pBufs.allProcs())
{
if (pBufs.hasRecvData(proci))
{
...
}
}
Also conceivable to do the following instead (nonBlocking only):
if (!returnReduce(pBufs.hasSendData(), orOp<bool>()))
{
// Nothing to do
pBufs.clear();
continue;
}
pBufs.finishedNeighbourSends(neighProcs);
...
- a somewhat specialized use case, but can be useful when there are
many ranks with sparse communication and the access
pattern is established during inner loops.
PstreamBuffers pBufs(Pstream::commsTypes::nonBlocking);
pBufs.allowClearRecv(false);
PtrList<OPstream> output(Pstream::nProcs());
while (condition)
{
// Rewind existing streams
forAll(output, proci)
{
auto* osptr = output.get(proci);
if (osptr)
{
(*osptr).rewind();
}
}
for (Particle& p : myCloud)
{
label toProci = ...;
// Get or create output stream
auto* osptr = output.get(toProci);
if (!osptr)
{
osptr = new OPstream(toProci, pBufs);
output.set(toProci, osptr);
}
// Append more data...
(*osptr) << p;
}
pBufs.finishedSends();
... reads
}
- split off a Pstream::genericBroadcast() which uses UOPBstream during
serialization and UIPBstream during de-serialization.
This function will not normally be used directly by callers, but
provides a base layer for higher-level broadcast calls.
- low-level UPstream broadcast of string content.
Since std::string has length and contiguous content, it is possible
to handle directly by the following:
1. broadcast size
2. resize
3. broadcast content when size != 0
Although this is a similar amount of communication as the generic
streaming version (min 1, max 2 broadcasts) it is more efficient
by avoiding serialization/de-serialization overhead.
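A rough sketch of this pattern (assuming a low-level char-buffer
UPstream::broadcast(buf, count, comm) overload; 's' and 'comm' are
illustrative names, not the actual implementation):

    std::string::size_type len = s.size();

    // 1. broadcast size
    UPstream::broadcast(reinterpret_cast<char*>(&len), sizeof(len), comm);

    // 2. resize on the receiving ranks
    if (!UPstream::master(comm))
    {
        s.resize(len);
    }

    // 3. broadcast content when size != 0
    if (len)
    {
        UPstream::broadcast(&s[0], len, comm);
    }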
- handle broadcast of List content distinctly.
Allows an optimized path for contiguous data, similar to how
std::string is handled (broadcast size, resize container, broadcast
content when size != 0), but can revert to genericBroadcast (streamed)
for non-contiguous data.
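Illustrative dispatch only (not the actual code), assuming a List<T>&
'list' and communicator 'comm' in scope; the contiguous branch follows
the same size/resize/content steps as for std::string above:

    if (is_contiguous<T>::value)
    {
        // contiguous: size + resize + raw content
        label len = list.size();
        Pstream::broadcast(len, comm);
        list.resize_nocopy(len);
        if (len)
        {
            UPstream::broadcast(list.data_bytes(), list.size_bytes(), comm);
        }
    }
    else
    {
        // non-contiguous: streamed (generic) broadcast
        Pstream::genericBroadcast(list, comm);
    }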
- make various scatter variants simple aliases for broadcast, since
that is what they are doing behind the scenes anyhow:
* scatter()
* combineScatter()
* listCombineScatter()
* mapCombineScatter()
Except scatterList() which remains somewhat different.
Beyond the additional (size == nProcs) check, the only difference to
using broadcast(List<T>&) or a regular scatter(List<T>&) is that
processor-local data is skipped. So leave this variant as-is.
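For the aliased variants the net effect on the caller side is simply,
for example ('value' being any broadcastable value):

    // now equivalent
    Pstream::scatter(value);
    Pstream::broadcast(value);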
STYLE: rename/prefix implementation code with 'Pstream'
- better association with its purpose and provides a unique name
- reduces later surprises and simplifies effort for the caller
- more flexible globalIndex scatter with auto-sized return field.
- Avoid communication for scattering into zero-sized fields.
- the data front for isoAdvection can be particularly sparse and at
higher processor counts there is an advantage to avoiding all-to-all
communication for the PstreamBuffers exchange
Based on code changes from T.Aoyagi(RIST), A.Azami(RIST)
- use MPI_Bcast intrinsic instead of manual tree to reduce the overall
number of messages.
Old behaviour can be re-enabled with
`#define Foam_Pstream_scatter_nobroadcast`
- The idea of broadcast streams is to replace multiple master to
subProcs communications with a single MPI_Bcast.
if (Pstream::master())
{
OPBstream toAll(Pstream::masterNo());
toAll << data;
}
else
{
IPBstream fromMaster(Pstream::masterNo());
fromMaster >> data;
}
// vs.
if (Pstream::master())
{
for (const int proci : Pstream::subProcs())
{
OPstream os(Pstream::commsTypes::scheduled, proci);
os << data;
}
}
else
{
IPstream is(Pstream::commsTypes::scheduled, Pstream::masterNo());
is >> data;
}
Can simply use UPstream::broadcast() directly for contiguous data
with known lengths.
Based on ideas from T.Aoyagi(RIST), A.Azami(RIST)
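For contiguous data with known lengths, the direct call can look like
the following sketch ('values' being an illustrative contiguous field,
eg a scalarField, whose size is already known on all ranks):

    // all ranks know the size; broadcast raw content from the master
    UPstream::broadcast(values.data_bytes(), values.size_bytes(), comm);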
- native MPI min/max/sum reductions for float/double
irrespective of WM_PRECISION_OPTION
- native MPI min/max/sum reductions for (u)int32_t/(u)int64_t types,
irrespective of WM_LABEL_SIZE
- replace the rarely used vector2D sum reduction with FixedList as an
indicator of its intent; this also generalizes to different lengths.
OLD:
vector2D values; values.x() = ...; values.y() = ...;
reduce(values, sumOp<vector2D>());
NEW:
FixedList<scalar,2> values; values[0] = ...; values[1] = ...;
reduce(values, sumOp<scalar>());
- allow returnReduce() to use native reductions. Previous code (with
linear/tree selector) would have bypassed them inadvertently.
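For example, with illustrative local values:

    const scalar globalMin = returnReduce(localMin, minOp<scalar>());
    const label nTotal = returnReduce(nLocal, sumOp<label>());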
ENH: added support for MPI broadcast (for a memory span)
ENH: select communication schedule as a static method
- UPstream::whichCommunication(comm) to select linear/tree
communication instead of a ternary expression or an explicit
if (Pstream::nProcs() < Pstream::nProcsSimpleSum) ... check
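A sketch of the intended usage ('comm' being the communicator in use):

    // previously (explicit selection):
    const auto& comms =
    (
        UPstream::nProcs(comm) < UPstream::nProcsSimpleSum
      ? UPstream::linearCommunication(comm)
      : UPstream::treeCommunication(comm)
    );

    // vs. now:
    const auto& comms = UPstream::whichCommunication(comm);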
STYLE: align nProcsSimpleSum static value with etc/controlDict override
- refactor as an MPI-independent base class.
Add bufferIPC{send,recv} private methods for construct/destruct.
Eliminates code duplication from the two constructor forms and reduces
additional constructor definitions in the dummy library.
- add PstreamBuffers access methods, refactor common finish sends
code, tweak member packing
ENH: resize_nocopy for processorLduInterface buffers
- content is immediately overwritten
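A sketch of the intent ('receiveBuf_' and 'nBytes' are illustrative
names):

    // content is overwritten by the subsequent receive, so skip the copy
    receiveBuf_.resize_nocopy(nBytes);   // instead of resize(nBytes)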
STYLE: cull unneeded includes in processorFa*
- handled by processorLduInterface
- this can be used to specify a uniform level that is subtracted from
a sampled field. For example,
fieldLevel
{
"p.*" 1e5; // Absolute -> gauge [Pa]
T 273.15; // [K] -> [C]
U #eval{ 10/sqrt(3) }; // Uniform mag(U)=10
}
After the fieldLevel has been removed, any fieldScale is applied.
For example
fieldScale
{
"p.*" 0.01; // [Pa] -> [mbar]
}
The fieldLevel for vector and tensor fields may still need some
further refinement.
The runTimeControl function object can activate further function objects using
triggers. Previously the trigger index could only advance; this change set
allows users to set smaller values to enable function object recycling, e.g.
Repeat for N cycles:
1. average the pressure at a point in space
2. when the average stabilises, run for a further 100 iterations
3. set a new patch inlet velocity
- back to (1)
- Removes old default behaviour that only permitted an increase in the
trigger level. This type of 'ratcheting' mechanism (if required) is
now the responsibility of the derived function object.
- notably affects writing contiguous data in binary. If generating a
compound token (eg, List<label>), the size prefix needs to be added,
otherwise it cannot actually be parsed properly as a List.
BUG: bad fallthrough for compound reading (FixedList)
- the branch was likely never reached, but would have attempted to
read twice due to a bad fall-through condition.
GIT: relocate globalIndex (is independent of mesh)
STYLE: include label/scalar Fwd in contiguous.H
STYLE: unneeded commSchedule include in GeometricField
- as a side-effect of changes to probes, the file pointers are no longer
automatically created when reading the dictionary, but are delayed
until prepare(WRITE_ACTION) is called.
This nuance was missed in thermoCoupleProbes.