- bundles frequently used 'gather/scatter' patterns more consistently.
- combineAllGather -> combineGather + broadcast
- listCombineAllGather -> listCombineGather + broadcast
- mapCombineAllGather -> mapCombineGather + broadcast
- allGatherList -> gatherList + scatterList
- reduce -> gather + broadcast (ie, allreduce)
- The allGatherList currently wraps gatherList/scatterList, but may be
replaced with a different algorithm in the future.
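For illustration, a rough sketch of the equivalence for the first of these
(maxOp<scalar> is just an arbitrary combine operation and 'value' is a
placeholder, not taken from the notes above):

    scalar value = ...;   // some rank-local value

    // bundled form
    Pstream::combineAllGather(value, maxOp<scalar>());

    // ie, behaves like the two-step form
    Pstream::combineGather(value, maxOp<scalar>());
    Pstream::broadcast(value);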
STYLE: PstreamCombineReduceOps.H is mostly unneeded now
STYLE: LduInterfaceFieldPtrsList as alias instead of a class
STYLE: define patch lists typedefs when defining the base patch
- eg, polyPatchList typedef within polyPatch.H
INT: relocate GeometricField::Boundary -> GeometricBoundaryField
- was internal to GeometricField but moving it outside simplifies
forward declarations etc. Code adapted from openfoam.org
- when writing surface formats (eg, vtk, ensight etc) the sampled
surfaces merge the faces/points originating from different
processors into a single surface (ie, patch gatherAndMerge).
Previous versions of mergePoints simply merged all points possible,
which proves to be rather slow for larger meshes. This has now been
modified to only consider boundary points, which reduces the number
of points to consider. As part of this change, the reference point
is now always equivalent to the min of the bounding box, which
reduces the number of search loops. The merged points retain their
original order.
- inplaceMergePoints version to simplify use and improve code
robustness and efficiency.
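A rough usage sketch (the exact signature is an assumption here; 'points'
and 'mergeTol' are placeholders for the caller's data):

    labelList pointMap;    // old-to-new point addressing
    const label nChanged =
        Foam::inplaceMergePoints(points, mergeTol, false, pointMap);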
ENH: make PrimitivePatch::boundaryPoints() less costly
- if edge addressing does not already exist, it will now simply walk
the local face edges directly to define the boundary points.
This avoids a rather large overhead of the full faceFaces,
edgeFaces, faceEdges addressing.
This operation is now more important since it is used in the revised
patch gatherAndMerge.
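A rough sketch of the underlying idea (not the actual implementation;
'patch' is any PrimitivePatch): local face edges used by exactly one face
are boundary edges, and their end points are the boundary points.

    // Count how often each local edge occurs among the patch faces
    EdgeMap<label> edgeCount;
    for (const face& f : patch.localFaces())
    {
        forAll(f, fp)
        {
            ++edgeCount(f.faceEdge(fp));
        }
    }

    // Edges seen only once lie on the boundary
    labelHashSet boundaryPointSet;
    forAllConstIters(edgeCount, iter)
    {
        if (iter.val() == 1)
        {
            boundaryPointSet.insert(iter.key().start());
            boundaryPointSet.insert(iter.key().end());
        }
    }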
ENH: topological merge for mesh-based surfaces in surfaceFieldValue
- also disables PointData if manifold cells are detected.
This is a partial workaround for volPointInterpolation problems
with handling manifold cells.
- additional verbosity option for conversions
- ignore old `-finite-area` option and always convert available
finiteArea mesh/fields unless `-no-finite-area` is specified (#2374)
ENH: simplify point offset handling for ensight output
- extend writing to include compact face/cell lists
- a try/catch approach is not really robust enough (or even possible)
since read failures likely do not occur on all ranks simultaneously.
This leads to situations where the master has thrown an exception
(and thus exiting the current routine) while other ranks are still
waiting to receive data and the program blocks completely.
Since this primarily affects data conversion routines such as
foamToEnsight etc, treat similarly to lagrangian: check for the
existence of essential files before proceeding or not. This is
wrapped into a TryNew factory method:
    autoPtr<faMesh> faMeshPtr(faMesh::TryNew(mesh));
    if (faMeshPtr) ...
- gather/scatter types of operations can avoid AllToAll communication
and use simple MPI gather (or scatter) to establish the receive sizes.
New methods: finishedGathers() / finishedScatters()
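For example, a rough one-to-master gather sketch (the no-argument
finishedGathers() form and 'localData' are assumptions for illustration):

    PstreamBuffers pBufs(Pstream::commsTypes::nonBlocking);

    if (!Pstream::master())
    {
        UOPstream toMaster(Pstream::masterNo(), pBufs);
        toMaster << localData;
    }

    // gather-only communication pattern: a simple MPI gather is enough
    // to establish the receive sizes (no all-to-all)
    pBufs.finishedGathers();

    if (Pstream::master())
    {
        for (const int proci : pBufs.allProcs())
        {
            if (pBufs.hasRecvData(proci))
            {
                UIPstream fromProc(proci, pBufs);
                ...
            }
        }
    }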
- do not need to construct or move assign from SortableList.
Rarely (never) used; such a list can simply be treated like a normal list
by applying shrink beforehand.
- make append() methods return void instead of returning self, which
makes it easier to derive from. Having them return self was a bit of
an original design mistake.
Chained appends do not actually occur anywhere. Even if they were
used, we would not want to rely on them (risk of slicing with
derived classes).
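For illustration, the (never used) chained style that this removes, versus
the plain form ('list', 'a', 'b' are placeholders):

    // previously possible, now a compile error:
    //     list.append(a).append(b);

    // plain form
    list.append(a);
    list.append(b);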
BUG: IndirectList iterator comparison loses constness
- eliminate redundant size_ accounting
- drop extra 'Container' template parameter and replace functionality
with more flexible pack/unpack methods.
There is also a pack() method that handles indirect lists of lists
that can be used, for example, to pack a patch slice of faces.
Drop the 'operator()' method in favour of unpack to expose and properly
document the conversion. Should revisit the corresponding code in
some places for optimization potential.
- align some method names with globalIndex:
totalSize(), maxSize() etc
- less communication than gatherList/scatterList
ENH: refine send granularity in Pstream::exchange
STYLE: ensure PstreamBuffers and defaultCommsType agree
- simpler loops for lduSchedule
- the very old 'writer' class was fully stateless and always templated
on a particular output type.
This is now replaced with a 'coordSetWriter' with similar concepts
as previously introduced for surface writers (#1206).
- writers change from being a generic state-less set of routines to
more properly conforming to the normal notion of a writer.
- Parallel data handling is done *outside* of the writers, since they are
used in a wide variety of contexts and the caller is currently still in
a better position for deciding how to combine parallel data.
ENH: update sampleSets to sample on per-field basis (#2347)
- sample/write a field in a single step.
- support for 'sampleOnExecute' to obtain values at execution
intervals without writing.
- support 'sets' input as a dictionary entry (as well as a list),
which is similar to the changes for sampled-surface and permits use
of changeDictionary to modify content.
- globalIndex for gather to reduce parallel communication, less code
- qualify the sampleSet results (properties) with the name of the set.
The sample results were previously without a qualifier, which meant
that only the last property value was actually saved (previous ones
overwritten).
For example,
```
sample1
{
    scalar
    {
        average(line,T)         349.96521;
        min(line,T)             349.9544281;
        max(line,T)             350;
        average(cells,T)        349.9854619;
        min(cells,T)            349.6589286;
        max(cells,T)            350.4967271;
        average(line,epsilon)   0.04947733869;
        min(line,epsilon)       0.04449639927;
        max(line,epsilon)       0.06452856475;
    }
    label
    {
        size(line,T)            79;
        size(cells,T)           1720;
        size(line,epsilon)      79;
    }
}
```
ENH: update particleTracks application
- use globalIndex to manage original parcel addressing and
for gathering. Simplify code by introducing a helper class,
storing intermediate fields in hash tables instead of
separate lists.
ADDITIONAL NOTES:
- the regionSizeDistribution largely retains separate writers since
the utility of placing sum/dev/count for all fields into a single file
is questionable.
- the streamline writing remains a "soft" upgrade, which means that
scalar and vector fields are still collected a priori and not
on-the-fly. This is due to how the streamline infrastructure is
currently handled (should be upgraded in the future).
Automatic hole closure:
- introduces 'holeToFace' topoSet source
- used when detecting a 'leak-path'
- creates additional baffles to close the leak
Multi-stage layer addition:
- Can add layers in multiple passes
See issues: #2403, #2404
- for metis-like graphs there is no guarantee that a zero-sized graph
has an offsets list with size 1 or size 0, so always use
numCells = max(0, xadj.size()-1).
This was already done in most places, but was missed in the
decomposeGeneral method
STYLE: use sumOp<label>() instead of plusOp<label>()
- the internal data are contiguous so can broadcast size and internals
directly without an intermediate stream.
ENH: split out broadcast time for profilingPstream information
STYLE: minor Pstream cleanup
- UPstream::commsType_ from protected to private, since it already has
inlined noexcept getters/setters that should be used.
- don't pass an unused/unneeded tag into low-level MPI reduction templates.
Document where tags are not needed
- had Pstream::broadcast instead of UPstream::broadcast in internals
- used Pstream::maxCommsSize (bytes) for the lower limit when sending.
This would have sent more data on each iteration than expected based
on maxCommsSize and finished with a number of useless iterations.
Was generally not a serious bug since maxCommsSize (if used) was
likely still far away from the MPI limits and exchange() is primarily
harnessed by PstreamBuffers, which is sending character data
(ie, number of elements and number of bytes is identical).
- PstreamBuffers nProcs() and allProcs() methods to recover the rank
information consistent with the communicator used for construction
- allowClearRecv() methods for more control over buffer reuse
For example,
    pBufs.allowClearRecv(false);
    forAll(particles, particlei)
    {
        pBufs.clear();
        // fill ...
        // read via IPstream(..., pBufs);
    }
This preserves the receive buffers memory allocation between calls.
- finishedNeighbourSends() method as compact wrapper for
finishedSends() when the send/recv ranks are identical
(eg, neighbours)
- hasSendData()/hasRecvData() methods for PstreamBuffers.
Can be useful for some situations to skip reading entirely.
For example,
    pBufs.finishedNeighbourSends(neighProcs);
    if (!returnReduce(pBufs.hasRecvData(), orOp<bool>()))
    {
        // Nothing to do
        continue;
    }
    ...
On an individual basis:
    for (const int proci : pBufs.allProcs())
    {
        if (pBufs.hasRecvData(proci))
        {
            ...
        }
    }
Also conceivable to do the following instead (nonBlocking only):
    if (!returnReduce(pBufs.hasSendData(), orOp<bool>()))
    {
        // Nothing to do
        pBufs.clear();
        continue;
    }
    pBufs.finishedNeighbourSends(neighProcs);
    ...
- split off a Pstream::genericBroadcast() which uses UOPBstream during
serialization and UIPBstream during de-serialization.
This function will not normally be used directly by callers, but
provides a base layer for higher-level broadcast calls.
- low-level UPstream broadcast of string content.
Since std::string has length and contiguous content, it is possible
to handle directly by the following:
1. broadcast size
2. resize
3. broadcast content when size != 0
Although this is a similar amount of communication as the generic
streaming version (min 1, max 2 broadcasts) it is more efficient
by avoiding serialization/de-serialization overhead.
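A minimal sketch of the three steps, assuming a low-level
UPstream::broadcast(char*, size, comm) overload for contiguous memory
(cf. the memory-span broadcast noted below):

    void broadcastString(std::string& s, const label comm)
    {
        // 1. broadcast size
        label len(s.size());
        UPstream::broadcast
        (
            reinterpret_cast<char*>(&len),
            sizeof(label),
            comm
        );

        // 2. resize (no-op on the root rank)
        s.resize(len);

        // 3. broadcast content when size != 0
        if (len)
        {
            UPstream::broadcast(&s[0], len, comm);
        }
    }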
- handle broadcast of List content distinctly.
Allows an optimized path for contiguous data, similar to how
std::string is handled (broadcast size, resize container, broadcast
content when size != 0), but can revert to genericBroadcast (streamed)
for non-contiguous data.
- make various scatter variants simple aliases for broadcast, since
that is what they are doing behind the scenes anyhow:
* scatter()
* combineScatter()
* listCombineScatter()
* mapCombineScatter()
Except scatterList() which remains somewhat different.
Beyond the additional (size == nProcs) check, the only difference to
using broadcast(List<T>&) or a regular scatter(List<T>&) is that
processor-local data is skipped. So leave this variant as-is.
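For example, after this change the following two calls do the same thing
('value' is illustrative):

    Pstream::scatter(value);
    // ie, simply
    Pstream::broadcast(value);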
STYLE: rename/prefix implementation code with 'Pstream'
- better association with its purpose and provides a unique name
- The idea of broadcast streams is to replace multiple master to
subProcs communications with a single MPI_Bcast.
    if (Pstream::master())
    {
        OPBstream toAll(Pstream::masterNo());
        toAll << data;
    }
    else
    {
        IPBstream fromMaster(Pstream::masterNo());
        fromMaster >> data;
    }

    // vs.

    if (Pstream::master())
    {
        for (const int proci : Pstream::subProcs())
        {
            OPstream os(Pstream::commsTypes::scheduled, proci);
            os << data;
        }
    }
    else
    {
        IPstream is(Pstream::commsTypes::scheduled, Pstream::masterNo());
        is >> data;
    }
Can simply use UPstream::broadcast() directly for contiguous data
with known lengths.
Based on ideas from T.Aoyagi(RIST), A.Azami(RIST)
- native MPI min/max/sum reductions for float/double
irrespective of WM_PRECISION_OPTION
- native MPI min/max/sum reductions for (u)int32_t/(u)int64_t types,
irrespective of WM_LABEL_SIZE
- replace the rarely used vector2D sum reduction with FixedList as an
indicator of its intent; this also generalizes to different lengths.
OLD:
    vector2D values; values.x() = ...; values.y() = ...;
    reduce(values, sumOp<vector2D>());
NEW:
    FixedList<scalar,2> values; values[0] = ...; values[1] = ...;
    reduce(values, sumOp<scalar>());
- allow returnReduce() to use native reductions. Previous code (with
linear/tree selector) would have bypassed them inadvertently.
ENH: added support for MPI broadcast (for a memory span)
ENH: select communication schedule as a static method
- UPstream::whichCommunication(comm) to select linear/tree
communication, instead of an explicit ternary or an
    if (Pstream::nProcs() < Pstream::nProcsSimpleSum) ...
test at each call site
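A rough before/after sketch (the linear/treeCommunication calls reflect the
usual selection pattern and are assumptions here):

    // now:
    const auto& comms = UPstream::whichCommunication(comm);

    // instead of spelling out the selection at each call site:
    const auto& comms =
    (
        (UPstream::nProcs(comm) < UPstream::nProcsSimpleSum)
      ? UPstream::linearCommunication(comm)
      : UPstream::treeCommunication(comm)
    );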
STYLE: align nProcsSimpleSum static value with etc/controlDict override
GIT: relocate globalIndex (is independent of mesh)
STYLE: include label/scalar Fwd in contiguous.H
STYLE: remove unneeded commSchedule include in GeometricField
ENH: provide fieldTypes::surface names (as per fieldTypes::volume)
ENH: reduce number of files for surface fields
- combine face and point field declarations/definitions,
simplify typeName definitions
- used low-level MPI gather, but the wrapping routine contains an
additional safety check for is_contiguous, which is not defined for
various std::pair<..> combinations.
So std::pair<label,vector> (which is actually contiguous, but not
declared as is_contiguous) would falsely trip the check.
Avoid by simply gathering unbundled values instead.
- do not need STRINGIFY macros in ragel code
- remove wordPairHashTable.H and use equivalent wordPairHashes.H instead
STYLE: replace addDictOption with explicit option
- the usage text is otherwise misleading
GIT: combine Pair/Tuple2 directories
- unused in regular OpenFOAM code
- POSIX version uses deprecated gethostbyname()
- Windows version never worked
COMP: localize, noexcept on internal OSspecific methods
STYLE: support fileName::Type SYMLINK and LINK as synonyms
- for contiguous data, added mpiGatherOp() to complement the
gatherOp() static method
- the gather ops (static methods) populate the globalIndex on the
master only (not needed on other procs) for reduced communication
- rename inplace gather methods to include 'inplace' in their name.
Regular gather methods return the gathered data directly, which
allows the following:
    const scalarField mergedWeights(globalFaces().gather(wghtSum));
vs.
    scalarField mergedWeights;
    globalFaces().gather(wghtSum, mergedWeights);
or even:
    scalarField mergedWeights;
    List<scalarField> allWeights(Pstream::nProcs());
    allWeights[Pstream::myProcNo()] = wghtSum;
    Pstream::gatherList(allWeights);
    if (Pstream::master())
    {
        mergedWeights = ListListOps::combine<scalarField>
        (
            allWeights, accessOp<scalarField>()
        );
    }
- add parRun guards on various globalIndex gather methods
(simple copies or no-ops in serial) to simplify the effort for callers.
ENH: reduce code effort for clearing linked-lists
ENH: adjust linked-list method name
- complement the linked-list append() method with a prepend() method,
rather than 'insert', which is not very descriptive