- input: added support (similar to DynamicList)
- output: redirect through UList output, which enables binary etc
- these changes enable support for broadcast and other parallel IO
for std::vector
ENH: support SubList of std::vector (entire length)
- allows a 'glue' layer for re-casting std::vector to UList etc
- when running with "errors warn;" it will reset the warnings counter
(if any) when it returns to a good state.
This re-enables un-suppressed warnings for the next cycle.
Also reset the warnings counter on end().
ENH: add short-circuit in HashTable::erase(key)
- skip and return false if the table is empty or the key is not found.
This makes for faster no-op behaviour.
- for workflows with appearing/disappearing patches (for example)
can specify that empty surfaces should be ignored or warned about
instead of raising a FatalError.
Note that this handling is additional to the regular top-level
"errors" specification. So specifying 'strict' will only actually
result in a FatalError if the "errors" does not trap errors.
- "ignore" : any empty surfaces are simply ignored and no
file output (besides the header).
- "warn" : empty surfaces are warned about a few times (10)
and the file output contains a NaN entry
- "strict" : corresponds to the default behaviour.
Throws a FatalError if the surface is empty.
This error may still be caught by the top-level "errors" handling.
- support UList shallowCopy with pointer/size
(eg, for slicing into or from span)
ENH: add SubList::reset() functionality
- allows modification of a SubList after construction.
Previously a SubList had an immutable location after construction
and there was no way to shift or change its location.
BUG: missed special handling for DynamicList<char>::readList (fixes#2974)
- equivalent to List<char>::readList, in which the stream is
temporarily toggled from ASCII to BINARY when reading in a List of
char data.
This specialization was missed when DynamicList<T>::readList() was
fully implemented.
- defines values for EMPTY, UNIFORM, NONUNIFORM and MIXED
that allow bitwise OR reduction and provide an algorithm
for calculating uniformity
ENH: consolidate more efficient uniformity checks in PackedList
ENH: improve linebreak handling when outputting small matrices
- use ignore instead of seekg/tellg to swallow input (robuster)
- check for bad gcount() values
- wrap Foam::fileSize() compressed/uncompressed handling into IFstream.
- improve handling of compressed files in masterUncollatedFileOperation.
Previously read into a string via stream iterators.
Now read chunk-wise into a List of char for fewer reallocations.
- soft renames (ie, old names still available via typedefs) for more
reasonable names and more coverage with std stream variants.
The old names could be a bit cryptic.
For example, uiliststream (== an unallocated/external list storage),
which is written as std::ispanstream for C++23.
Could similarly argue that IListStream is better named as
ICharStream, since it is an input stream of characters and the
internal storage mechanism (List or something else) is mostly
irrelevant.
Extending the coverage to include all std stream variants, and
simply rewrap them for OpenFOAM IOstream types. This simplifies the
inheritance patterns and allows reuse of icharstream/ocharstream as
a drop-in replace for istringstream/ostringstream in other wrappers.
Classes:
* icharstream / ICharStream [old: none / IListStream]
* ocharstream / OCharStream [old: none / OListStream]
* ispanstream / ISpanStream [old: uiliststream / UIListStream]
* ospanstream / OSpanStream [old: none / UOListStream]
Possible new uses : read file contents into a buffer, broadcast
buffer contents to other ranks and then transfer into an icharstream
to be read from. This avoid the multiple intermediate copies that
would be associated when using an istringstream.
- Use size doubling instead of block-wise incremental for ocharstream
(OCharStream). This corresponds to the sizing behaviour as per
std::stringstream (according to gcc-11 includes)
STYLE: drop Foam_IOstream_extras constructors for memory streams
- transitional/legacy constructors but not used in any code
- nonBlocking: receive before send
- nonBlocking: wait for receive requests, process and then wait for
other requests. Can be extended to use polling...
- the 'fake' send (to self) now send copies into recv buffers instead
of send buffers.
This provides a clear separation of send and receive fields
- avoid unnecessary reallocations with PtrList of send/recv buffers
- remove outer looping for accessAndFlip and pass target field
as parameter for inner looping instead
ENH: refine mapDistribute send/recv requests handling
- separate send/recv requests for finer control
- receive does not need access to PtrList of sendBuffers, since the
send-to-self now uses the recvBuffers
- totalSize() returns retrieve the linear (total) size
(naming as per globalIndex)
- operator[] retrieves the referenced value (linear indexing)
- labels() returns a flattened labelList (as per labelRange itself)
- broadcasts list contiguous content as a two-step process:
1. broadcast the size, and resize for receiver list
2. broadcast contiguous contents (if non-empty)
This avoids serialization/de-serialization memory overhead but at
the expense of an additional broadcast call.
The trade-off of the extra broadcast of the size will be less
important than avoiding a memory peak for large contiguous mesh data.
REVERT: unstable MPI_Mprobe/MPI_Mrecv on intelmpi + PMI-2 (#2796)
- partial revert of commit c6f528588b, for NBX implementation.
Not yet flagged as causing errors here, but eliminated for
consistency.
- simplifies use with other allocators (eg, memory pools).
Can also be used with other containers.
vectorField fld = ...;
sigFpe::fillNan(fld.data_bytes(), fld.size_bytes());
COMP: inline sigFpe::ignore helper class
- now unused (may be removed in the future), but can avoid compiling
code for it
COMP: missing sigStopAtWriteNow() definition for MSwindows
- simplifies internal handling (like a fileName) and allows the
dictionary name to be used with unambiguous addressing.
The previous dot (.) separator is ambiguous (ie, as dictionary
separator or as part of a keyword).
ENH: foamDictionary report -add/-set to stderr
- selected with '+strict' in WM_COMPILE_CONTROL or 'wmake -strict', it
enables the FOAM_DEPRECATED_STRICT() macro, which can be used to
mark methods that are implicitly deprecated, but are not yet marked
as full deprecated (eg, API modification is too recent, generates
too many warnings). Can be considered a developer option.
- since the Apple SIP (System Integrity Protection) clears environment
variables such as DYLD_LIBRARY_PATH, a number of workarounds have
been used to provide shadow values. However, for a more robust
installation using -rpath at compilation time appears to be the
better solution.
In addition to the usual -rpath specification with absolute file
paths, MacOS supports (@loader_path, @executable_path) as well.
Now default to link with rpath information for MacOS, which can be
disabled by adding `~rpath` in WM_COMPILE_CONTROL
Explicit library paths handled:
- FOAM_FOAM_EXT_LIBBIN, FOAM_EXT_LIBBIN/FOAM_MPI
The executable rpaths are handled assuming a structure of
install-path/bin
install-path/lib/$(FOAM_MPI)
install-path/lib
Absolute compile-time paths for FOAM_USER_LIBBIN, FOAM_SITE_LIBBIN
and FOAM_LIBBIN are not handled since these are either too fragile
(FOAM_USER_LIBBIN and FOAM_SITE_LIBBIN values) or covered via
@loader_path anyhow (FOAM_LIBBIN).
Since the value of FOAM_MPI is a compile-time value, this rpath
treatment makes the installation less suitable for runtime changes
to the MPI vendor/version.
Note: no rpath added for c-only compilations since there are
currently no c-only libraries or executables with dynamic loading
- eliminate ClassName in favour of simple debug
- include Apple-specific FPE handling after local definition
to allow for more redefinitions
COMP: remove stray <csignal> includes
- naming like std::map::try_emplace(), it behaves like emplace_set()
if there is no element at the given location otherwise a no-op
ENH: reuse existing HashPtrTable 'slot' when setting pointers
- avoids extra HashTable operations
- the construction of compound tokens is now split into two stages:
- default construct
- read contents
This permits a larger variety of handling.
- the new token::readCompoundToken(..) method allows for simpler
more failsafe invocations.
- forward resize(), read() methods for compound tokens to support
separate read and population.
Top-level refCompoundToken() method for modify access.
ENH: split off a private readCompoundToken() method within ISstream
- this allows overloading and alternative tokenisation handling for
derived classes
- simplifies iteration of ITstream using nRemainingTokens() and skip()
methods or directly as a list of tokens.
The currentToken() method returns const or non-const access to
the token at the current tokenIndex.
The peekToken(label) method provides failsafe read access to tokens
at given locations.
ENH: add primitiveEntry construct with moving a single token
Increase usage of std algoritms within the OpenFOAM List classes. Remove reliance on linked-list during reading
See merge request Development/openfoam!620
- drop unnecessary Foam::Swap specializations when MoveConstructible
and MoveAssignable already apply. The explicit redirect to swap
member functions was needed before proper move semantics where
added.
Removed specializations: autoPtr, refPtr, tmp, UList.
Retained specialization: DynamicList, FixedList.
Special handling for DynamicList is only to accommodate dissimilar
sizing template parameters (which probably doesn't occur in
practice).
Special handling for FixedList to apply element-wise swapping.
- use std::swap for primitives. No need to mask with Foam::Swap wrapper
- fully implement DynamicList::readList() instead of simply
redirecting to List::readList(). This also benefits DynamicField.
Leverage DynamicList reading to simplify and improve CircularBuffer
reading.
- bracket lists are now read chunk-wise instead of using a
singly-linked list. For integral and vector-space types
(eg, scalar, vector, etc) this avoids intermediate allocations
for each element.
ENH: add CircularBuffer emplace_front/emplace_back
STYLE: isolate to-be-deprecated construct/assign forms
- still have construct/assign FixedList from a C-array.
This is not really needed, can use std::initializer_list
- still have construct/assign List from SLList.
Prefer to avoid these in the future.
DEFEATURE: remove construct/assign FixedList from SLList
- never used
DEFEATURE: remove move construct/assign List from SLList
- now unused. Retain copy construct/assign from SLList for transition
purposes.
- test for existing globalData() or perhaps use DIY globalIndex instead
STYLE: check for non-ASCII instead of BINARY with compression
- allows for other non-ASCII formats
- is_vectorspace :
test existence and non-zero value of the Type 'rank' static variable
- pTraits_rank :
value of 'rank' static variable (if it exists), 0 otherwise
- pTraits_nComponents :
value of 'nComponents' static variable (if it exists), 1 otherwise
- pTraits_has_zero :
test for pTraits<T>::zero member, which probably means that it also
has one, min, max members as well
Note that these traits are usable with any classes. For example,
- is_vectorspace<std::string>::value ==> false
- pTraits_nComponents<std::string>::value ==> 1
- pTraits<std::string>::nComponents ==> fails to compile
Thus also allows testing pTraits_rank<...>::value with items
for which pTraits<...>::rank fails to compile.
Eg, cyclicAMIPolyPatch::interpolate called by FaceCellWave with a
wallPoint.
pTraits<wallPoint>::rank ==> fails to compile
is_vectorspace<wallPoint>::value ==> false
GIT: relocate ListLoopM.H to src/OpenFOAM/fields/Fields (future isolation)
- in most cases a parallel-consistent order is required.
Even when the order is not important, it will generally require
fewer allocations to create a UPtrList of entries instead of a
HashTable or even a wordList.
- prefer csorted() method for const access since it ensures that the
return values are also const pointers (for example) even if
the object itself can be accessed as a non-const.
- the csorted() method already existed for HashTable and
objectRegistry, but now added to IOobjectList for method name
consistency (even although the IOobjectList only has a const-access
version)
ENH: objectRegistry with templated strict lookup
- for lookupClass and csorted/sorted. Allows isType restriction as a
compile-time specification.
* resize_null() methods for PtrList variants
- for cases where an existing PtrList needs a specific size and
but not retain any existing entries.
Eg,
ptrs.resize_null(100);
vs. ptrs.free(); ptr.resize(100);
or ptr.resize(100); ptrs.free();
* remove stored pointer before emplacing PtrList elements
- may reduce memory peaks
* STYLE: static_cast of (nullptr) instead of reinterpret_cast of (0)
* COMP: implement emplace_set() for PtrDynList
- previously missing, which meant it would have leaked through to the
underlying PtrList definition
* emplace methods for autoPtr, refPtr, tmp
- applies reset() with forwarding arguments.
For example,
tmp<GeoField> tfld = ...;
later...
tfld.emplace(io, mesh);
vs.
tfld.reset(new GeoField(io, mesh));
or
tfld.reset(tmp<GeoField>::New(io, mesh));
The emplace() obviously has reduced typing, but also allows the
existing stored pointer to be deleted *before* creating its
replacement (reduces memory peaks).
- this simplifies polling receives and allows separation from
the sends
ENH: add UPstream::removeRequests(pos, len)
- cancel/free of outstanding requests and remove segment from the
internal list of outstanding requests
- removed gatherv control.
The globalIndex information is cached on the merged surface
and thus not triggered often.
- strip out debug mergeField method which was a precursor to
what is now within surfaceWriter itself.
- add 'merge' true/false handling to allow testing without
parallel merging (implies no writing)
- continue to support spherical by default (for compatibility)
but add the 'spherical' switch to disable that and use a cubic
distribution instead.
STYLE: reduce number of inline files
Co-authored-by: Mark Olesen <>
- can be used, for example, to track global states:
// Encode as 0:empty, 1:uniform, 2:nonuniform, 3:mixed
PackedList<2> uniformity(fields.size());
forAll(fields, i)
{
uniformity.set(i, fields[i].whichUniformity());
}
reduce
(
uniformity.data(),
uniformity.size_data(),
bitOrOp<unsigned>()
);
- can reduce communication by only sending non-zero data (especially
when using NBX for size exchanges), but proper synchronisation with
multiply-connected processor/processor patches (eg, processorCyclic)
may still require speculative sends.
Can now setup for PstreamBuffers 'registered' sends to avoid
ad hoc bookkeeping within the caller.
- simplifies code by avoiding code duplication:
* parLagrangianDistributor
* meshToMesh (processorLOD and AABBTree methods)
BUG: inconsistent mapping when using processorLOD boxes (fixes#2932)
- internally the processorLODs createMap() method used a 'localFirst'
layout whereas a 'linear' order is what is actually expected for the
meshToMesh mapping. This will cause of incorrect behaviour
if using processorLOD instead of AABBTree.
A dormant bug since processorLOD is not currently selectable.
- when constructing from a sendMap, can now also specify a linear
receive layout instead of a localFirst layout
This will make it easier to reduce some code (#2932)
- add missing interface for simple distribute of List/DynamicList
with a specified commsType. Was previously restricted to
defaultCommsType only.
ENH: mapDistribute distribute/reverseDistribute with specified commsType
STYLE: prefer UPstream vs Pstream within mapDistribute
- replaces previous (similar) union but leverages the type tag for
handling logic
STYLE: remove unneeded refCount from exprResult
COMP: operator!= as member operator (exprResultDelayed, exprResultStored)
- the operator!= as a free function failed to resolve after removing
the refCount inheritance
- primarily for handling expression results,
but can also be used as a universal value holder.
Has some characteristics suitable for type-less IO:
eg, is_integral(), nComponents()
ENH: add is_pointer() check for expression scanToken
- handle existence/non-existence of a FoamFile header automatically
- support an upper limit when getting the number of blocks and
use that for a hasBlock(...) method, which will stop reading sooner.
- Time is normally constructed with READ_MODIFIED for its controlDict
and objectRegistry, but for certain applications (eg, redistributePar)
it can be useful to construct without file monitoring and specifying
MUST_READ instead.
Example,
Info<< "Create time\n" << Foam::endl;
Time runTime
(
Time::controlDictName,
args,
false, // Disallow functionObjects
true, // Allow controlDict "libs"
IOobjectOption::MUST_READ // Instead of READ_MODIFIED
);
- update TimeState access methods
- use writeTime() instead of old method name outputTime()
- use deltaTValue() instead of deltaT().value()
to avoids pointless construct of intermediate
- no change in behaviour except to emit a warning when called with the
a non-reading readOption
STYLE: remove redundant size check
- size checking is already done by Field::assign() within the
DimensionedField::readField
- commit fb69a54bc3 accidentally changed the constructMap compact
order from linear ordering to local elements first order. Seems to
interact poorly with other bookkeeping so doing a partial revert,
but still replacing the old allGatherList with exchangeSizes.
Note:
the processorLOD method does actually use a constructMap with local
elements first ordering, so some inconsistency may still exist
there
Corrects turbulence viscosity field (e.g. nut) within a specified
region by applying a maximum limit, set according to a coefficient
multiplied by the laminar viscosity:
\nu_{t,max} = c \nu
Corrections applied to:
nut | Turbulence vicosity [m2/s2]
Usage
Minimal example by using \c constant/fvOptions:
\verbatim
limitTurbulenceViscosity1
{
// Mandatory entries (unmodifiable)
type limitTurbulenceViscosity;
// Optional entries (runtime modifiable)
nut nut;
c 1e5;
// Mandatory/Optional (inherited) entries
...
}
The optional areaNormalisationMode entry determines how the area normalisation
is performed. Options are:
- `project`: tri face area dotted with patch face normal; same as v2212 (default)
- `mag`: tri face area magnitude (v2206 and earlier)
Example usage:
AMI1
{
type cyclicAMI;
...
areaNormalisationMode mag;
//areaNormalisationMode project;
}
- the special MacOS dlopen handling (commit f584ec97d0)
did not fully solve the problem with SIP clearing.
Eg, sourcing the RunFunctions (for runParallel) triggers SIP and
clears DYLD_LIBRARY_PATH. With the cleared path it finds the dummy
libraries: the dummy Pstream::init() fails.
- for simulations where the yPlus is needed for other purposes or
just for obtaining information on the patches it can be useful
to disable field writing and save disk space.
The 'writeFields' flag (as per some other function objects)
has been added control writing the yPlus volume field.
If unspecified, the default value is 'true' so that the yPlus
function object continues to work as before.
However, this default may change to 'false' in the future to align
with other function objects.
ENH: wallShearStress: support disable of field writing
- similar to yPlus, the write() method combines writing information
and writing the fields. The 'writeFields' flag allows some
separation of that logic.
- replace Map with a List or DynamicList to reduce the number of
operations and allocations within the loops.
Use polyBoundaryMesh::nProcessorPatches() for initial capacity
to avoid reallocations.
- returns the number of processorPolyPatch patches (finiteVolume)
or else the number of processorFaPatch patches (finiteArea).
These can be useful when sizing lists etc.
- the changes introduced in f215ad15d1 aim to reduce unnecessary
point-to-point communication. However, if there are also
processorCyclic boundaries involved, there are multiple connections
between any two processors, so simply skipping empty sends will cause
synchronization problems.
Eg,
On the send side:
patch0to1_a is zero (doesn't send) and patch0to1_b does send
(to the same processor).
On the receive side:
patch1to0_a receives the data intended for patch1to0_b !
Remedy
======
Simply stream all of send data into PstreamBuffers
(regardless if empty or non-empty) but track the sends
as a bit operation: empty (0) or non-empty (1)
Reset the buffer slots that were only sent empty data.
This adds an additional local overhead but avoids communication
as much as possible.
- files might have been set during token reading so only on
known on master processor.
Broadcast names to all processors (even alhough they are only
checked on master) so that the watched states remain synchronised
- freeCommmunicatorComponents needs an additional bounds check.
When MPI is initialized outside of OpenFOAM, there are no
UPstream communicator equivalents
- for boundary conditions such as uniformFixed, uniformMixed etc the
optional 'value' entry (optional) is used for the initial values and
restarts. Otherwise the various Function1 or PatchFunction1 entries
are evaluated and used determine the boundary condition values.
In most cases this is OK, but in some case such coded or expression
entries with references to other fields it can be problematic since
they may reference fields (eg, phi) that have not yet been created.
For these cases the 'value' entry will be needed: documentation
updated accordingly.
STYLE: eliminate some unneeded/unused declaration headers
- provides a more succinct way of writing
{fa,fv}PatchField<Type>::patchInternalField(*this)
as well as a consistent naming that can be used for patches derived
from valuePointPatchField
ENH: readGradientEntry helper method for fixedGradient conditions
- simplifies coding and logic.
- support different read construct modes for fixedGradient
- individual processor Time databases are purely for internal logistics
and should not be introducing any new library symbols: these will
already have been loaded in the outer loop.
- MPI_THREAD_MULTIPLE is usually undesirable for performance reasons,
but in some cases may be necessary if a linked library expects it.
Provide a '-mpi-threads' option to explicitly request it.
ENH: consolidate some looping logic within argList
- can be broadly categorised as 'unthreaded'
or 'collated' (threading requirement depends on buffering)
without other opaque inheritances.
CONFIG: add hostUncollated to bash completion prompt
- The Apple SIP (System Integrity Protection) clears environment
variables, which affects the behaviour of dynamic library loading
(the DYLD_LIBRARY_PATH env variable).
OpenFOAM shadows this variable as FOAM_LD_LIBRARY_PATH, which has
been used to restore DYLD_LIBRARY_PATH (eg, in RunFunctions script).
However, this solution is not quite complete, as it
(a) requires sourcing of RunFunctions file,
(b) additional errors appear depending on a user workflow.
This changeset alleviates the problem by also iterating through
paths stored in the shadow variable when loading dynamic libraries
(if the DYLD_LIBRARY_PATH is empty).
- with C++11, static constexpr variables apparently also require
definition in a translation unit and not just as inlined quantities.
Mostly not an issue, however gcc with -O0 does not do the inlining
and thus actually requires them to be defined in a translation unit
as well.
These variables were provided for symmetry with worldComm, but only
used in low-level internal code. Changing to inlined functions
solves the linkage issue and also aligns with the commWorld()
function naming.
Mnemonics:
MPI_COMM_SELF => UPstream::commSelf()
overall MPI_COMM_WORLD => UPstream::commGlobal(), sometimes commWorld()
local COMM_WORLD => UPstream::commWorld()
- useful when speculative receives have been initiated but are no
longer required.
Combines MPI_Cancel() + MPI_Request_free() for consistent resource
management. Currently no feedback provided if the request was
satisfied by a completed send/recv or by cancellation (can be added
later if required).
- primarily relevant for finite-area meshes, in which case they can be
considered to be an additional, detailed diagnositic that is
normally not needed (clutters the file system).
Writing can be enabled with the `-write-edges` option.
- enable 'faceZones' support.
- enable 'writeFile' support to better control file output.
- rename 'PatchPostProcessing' as 'ParticlePostProcessing' for better clarity.
- fix#1808
- enable 'faceZone' support.
- introduce 'cloudFunctionObjectTools' to simplify collection of particle info
on patches or face zones.
- enable 'writeFile' support to better control file output.
- rename 'PatchParticleHistogram' as 'ParticleHistogram' for better clarity.
- extend the loadOrCreateMesh functionality to work in conjunction
with file handlers. This allows selective loading of the mesh parts
without the ugly workaround of writing zero-sized meshes to disk and
then reading them back.
Co-authored-by: Mark Olesen <>
- fatten the interface to continue allowing write control with a bool
or with a dedicated file handler. This may slim down in the future.
Co-authored-by: mattijs <mattijs>
- use local function for the decision making, whether worldComm or a
dedicated communicator is needed (and which sibling ranks are
involved)
Co-authored-by: mattijs <mattijs>
- accept plain lists (space or comma separated) as well as the
traditional OpenFOAM lists. This simplifies argument handling
with job scripts.
For example,
simpleFoam -ioRanks 0,4,8 ...
vs
simpleFoam -ioRanks '(0 4 8)' ...
It is also possible to select the IO ranks on a per-host basis:
simpleFoam -ioRanks host ...
- expose rank/subrank handling as static fileOperation methods
- previously checked on destruction, but it is robuster to check for a
locally defined communicator during construction
- add InfoProxy output for fileOperation
ENH: add fileOperation::storeComm()
- transfers management of the communicator from external to internal.
Use with caution
- for special cases it can simplify sharing of processor communication
patterns, but no visible change for most code.
- make fileHandler communicator modifiable (mutable), for special
cases. The changes from 9711b7f1b9 now make this safer to do.
Continue to support legacy global function using an autoPtr:
autoPtr<fileOperation> Foam::fileHandler(autoPtr<fileOperation>&&);
However, new code using refPtr uses the following static method since
swapping out file handlers is an infrequent operation that should
also stand out a bit more.
fileOperation::fileHandler(...);
- consolidate file synchronization checks in dynamicCode
STYLE: report missing library on master only (not every rank)
- avoid flooding the output with messages
Co-authored-by: mattijs <mattijs>
- avoid explicit isFile() check in favour of a lazy-read.
With redistributePar + fileHandler, for example, it is possible that
the master processor finds file but not the subprocs
ENH: lazy reading of tetBasePtIs
- delay reading until needed
Co-authored-by: Mark Olesen <>
- added UPstream::allGatherValues() with a direct call to MPI_Allgather.
This enables possible benefit from a variety of internal algorithms
and simplifies the caller
Old:
labelList nPerProc
(
UPstream::listGatherValues<label>(patch_.size(), myComm)
);
Pstream::broadcast(nPerProc, myComm);
New:
const labelList nPerProc
(
UPstream::allGatherValues<label>(patch_.size(), myComm)
);
- Pstream::allGatherList uses MPI_Allgather for contiguous values
instead of the hand-rolled tree walking involved with
gatherList/scatterList.
-
- simplified the calling parameters for mpiGather/mpiScatter.
Since send/recv data types are identical, the send/recv count
is also always identical. Eliminates the possibility of any
discrepancies.
Since this is a low-level call, it does not affect much code.
Currently just Foam::profilingPstream and a UPstream internal.
BUG: call to MPI_Allgather had hard-coded MPI_BYTE (not the data type)
- a latent bug since it is currently only passed char data anyhow
UPstream::allocateCommunicator
- with contiguous sub-procs. Simpler, more compact handling, ranks
are guaranteed to be monotonic
UPstream::commWorld(label)
- ignore placeholder values, prevents accidental negative values
- make communicator non-optional for UPstream::broadcast(), which
means it has three mandatory parameters and thus always fully
disambiguated from Pstream::broadcast().
ENH: relax size checking on gatherList/scatterList
- only fatal if the List size is less than nProcs.
Can silent ignore any trailing elements: they will be untouched.
- calling the mixed BC dictionary construct with NO_READ leaves the
fields properly sized, but not initialised.
ENH: add mixed BC constructor zero initialise
- sometimes the last commit is not enough information about
the tested state (especially with extensive rebasing).
Also provide the short context of some previous commits.
- this refinement of commit 81807646ca makes these methods
consistent with other objects/containers.
The 'unsigned char' access is still available via cdata()
- extend toc/sortedToc wrappers to bitSet and labelHashSet to allow
use of BitOps::toc(...) in templated code
- size_data() method to return the number of addressed integer blocks
similar to size_bytes() does, but for int instead of char.
- use typeHeaderOk<regIOobject>(false) for some generic file existence
checks. Often had something like labelIOField as a placeholder, but
that may be construed to have a particular something.
- ensures that read failures can be properly detected
COMP: include refPtr.H instead of autoPtr.H in IOobject.H
- ensures inclusion of autoPtr/refPtr/tmp/stdFoam
ENH: add IOobject::resetHeader() method
- when re-using an IOobject for repeated read operations it enforces
resetting of headerClassName, scalar/label sizes etc prior to
reading. Permits convenient resetting of the name too (optional).
Example,
IOobject rio("none", ..., IOobject::LAZY_READ);
rio.resetHeader("U")
if (returnReduceOr(rio.typeHeaderOk<volVectorField>(false)))
...
io.resetHeader("p")
if (returnReduceOr(rio.typeHeaderOk<volScalarField>(false)))
...
- coupled patches are treated distinctly and independently of
internalFacesOnly, it makes little sense to report them with a
warning about turning off boundary faces (which would not work
anyhow).
STYLE: update code style for createBaffles
- permitting a cast to a non-const pointer adds uncertainty of
ownership.
- adjust PtrDynList transfer. Remove the unused 'PtrDynList::remove()'
method, which is better handled with pop().
- initialise with Switch::INVALID and then test if good() to
trigger the initial update.
This avoids some overhead, but primarily avoids ambiguity with
implicit casting to a 'bool' that autoPtr<bool> has.
- Allows clearing or freeing pointers without touching the underlying
list size. Was previously only for PtrDynList, but now available on
UPtrList, PtrList as well.
- add transfer() method to PtrDynList to avoid potential slicing.
- provide uniformMixed conditions for finite-area and finite-volume.
These are intended to replace the exprMixed condition but allow
the full range of different PatchFunction1 and Function1 types.
- add uniformFixedGradient to finite-area for completeness.
Note:
- still some possible difficulties with the order of evaluation.
- eg, using an expression within the 'U' field that depends
of the surface 'phi' field before that is constructed.
In this case, the 'value' entry is really needed.
- multiply-connected edges can arise at the centre of a "star"
connection or because the patch faces are actually baffles.
- In the serial case these internal edges are also rather dubious in
terms of modelling. However, when they are split across multiple
processors there can only be a single processor-to-processor
connectivity.
We don't necessary have enough information to know how things should
be connected, so connect pair-wise as the first remedial solution
- Any extra dangle edges are relegated to an 'ignore' faPatch
to tag as needing different handling.
- this is a placeholder boundary BC for using with bad or illegal
edges. It is currently functionally identical to zero-gradient.
Naming and definition still subject to change.
- this complements the whichPatch(meshFacei) method [binary search]
and the list of patchID() by adding internal range checks.
eg,
Before
~~~~~~
if (facei >= mesh.nInternalFaces() && facei < mesh.nFaces())
{
patchi = pbm.patchID()[facei - mesh.nInternalFaces()];
...
}
After
~~~~~
patchi = pbm.patchID(facei);
if (patchi >= 0)
{
...
}
- functionality introduced by openfoam.org to support selective
caching of temporary fields. The purpose is two-fold: to enable
diagnostics and to allow more places to use unregistered fields by
default.
For example to cache the grad(k) field in
cacheTemporaryObjects
(
grad(k)
);
If the name of a field which in never constructed is added to the
cacheTemporaryObjects list a waning message is generated which
includes a useful list of ALL the temporary fields constructed
during the time step
Multiple regions are also supported by specifying individual region
names in a cacheTemporaryObjects dictionary.
cacheTemporaryObjects
{
porous
(
porosityBlockage:UNbr
);
}
functions
{
writePorousObjects
{
type writeObjects;
libs (utilityFunctionObjects);
region porous;
writeControl writeTime;
writeOption anyWrite;
objects (porosityBlockage:UNbr);
}
}
- for interface polling previously required that both send and recv
requests were completed before evaluating (values or matrix update).
However, only the recv needs to be complete, which helps disentangle
the inter-rank waiting.
NB: this change is possible following (1f5cf3958b) that replaced
UPstream::resetRequests() call in favour of UPstream::waitRequests()
- UPstream exit with a non-zero return code is raised by things like
exit(FatalError) which means there is no reason to believe that
any/all of the buffered sends, requests etc have completed.
Thus avoid detaching buffers, freeing communicators etc in this
situation. This makes exit(1) behave much more like abort(), but
without any stack trace. Should presumably help with avoiding
deadlocks on exit.
ENH: support transfer from a wrapped MPI request to global list
- allows coding with a list UPstream::Request and subsequently either
retain that list or transfer into the global list.
- can use traits to distinguish label vs scalar types and
setComponents to properly index into single or multi-component
types without needing template specialisations for the task.
This avoids the need for a concrete translation unit and the
reported problem of multiply-defined specialisations when the header
is included in different places.
- the default (uninitialised) value for edge connections of -1
could be confused with a tagged finiteArea patch, which used
(-patchid-1) encoding. This would lead to messages about erroneous
processor-processor addressing, but is in fact an mismatched edge
connection.
Now tag the finiteArea patch as (-patchid-2) to avoid this ambiguity
and correctly generate an "Undefined connection:" message instead.
Properly flush the VTP writers before raising a FatalError
to ensure that they are not prematurely truncated.
Open Point:
The base problem of "Undefined connection:" is largely related to
multiply-connected face edges (ie, from the underlying volume mesh).
Not easily remedied in the finiteArea generation.
TUT: basic finiteArea setup on motorBike
- have read(nullptr, count) and readRaw(nullptr, count) act like a
forward seek instead of failing.
This lets it be used to advance through a file without needing to
allocate (and discard) storage space etc.
- construct from components, or use word::null to ensure
consistent avoid naming between IOobject vs dimensioned type.
- support construct with parameter ordering as per DimensionedField
ENH: instantiate a uniformDimensionedLabelField
- eg, for registering standalone integer counters
- directory discovery originally designed for a sub-dir location
(eg, etc/openfoam) but failed if called from within the sub-dir
itself.
Now simply assume it is located in the project directory or the etc/
sub-dir, so that it can also be relocated into the project directory
in the future (pending changes to RPM and debian packaging)
- for querying all outstanding requests:
if (UPstream::finishedRequests(startRequest)) ...
if (UPstream::finishedRequests(startRequest, -1)) ...
- for querying slice of outstanding requests:
if (UPstream::finishedRequests(startRequest, 10)) ...
- simplifies communication structuring with intra-host communication.
Can be used for IO only, or for specialised communication.
Demand-driven construction. Gathers the SHA1 of host names when
determining the connectivity. Internally uses an MPI_Gather of the
digests and a MPI_Bcast of the unique host indices.
NOTE:
does not use MPI_Comm_splt or MPI_Comm_splt_type since these
return MPI_COMM_NULL on non-participating process which does not
easily fit into the OpenFOAM framework.
Additionally, if using the caching version of
UPstream::commInterHost() and UPstream::commIntraHost()
the topology is determined simultaneously
(ie, equivalent or potentially lower communication).
- make sizing of commsStruct List demand-driven as well
for more robustness, fewer unneeded allocations.
- fix potential latent bug with allBelow/allNotBelow for proc 0
(linear communication).
ENH: remove unused/unusable UPstream::communicator optional parameter
- had constructor option to avoid constructing the MPI backend,
but this is not useful and inconsistent with what the reset or
destructor expect.
STYLE: local use of UPstream::communicator
- automatically frees communicator when it leaves scope
- these are primarily when encountering sparse (eg, inter-host)
communicators. Additional UPstream convenience methods:
is_rank(comm)
=> True if process corresponds to a rank in the communicators.
Can be a master rank or a sub-rank.
is_parallel(comm)
=> True if parallel algorithm or exchange is used on the process.
same as
(parRun() && (nProcs(comm) > 1) && is_rank(comm))
- for robustness with small edges (which can occur with snappy meshes),
the Le() and magLe() are limited to SMALL (commit a0f1e98d24).
Now use factor sqrt(1/3) in the components to maintain magnitude of 1.
ENH: add fvMesh::unitSf() and faMesh::unitLe() methods
- simple wrappers around Sf()/magSf() and Le()/magLe() but with
the potential for additional/alternative corrections.
STYLE: thisDb() in faMesh code to simplify future changes in storage
ENH: do not register finite-area geometric fields
- consistent with finite-volume treatment
- replace the "one-size-fits-all" approach of tensor field inv()
with individual 'failsafe' inverts.
The inv() field function historically just checked the first entry
to detect 2D cases and adjusted/readjusted *all* tensors accordingly
(to avoid singularity tensors and/or noisy inversions).
This seems to have worked reasonably well with 3D volume meshes, but
breaks down for 2D area meshes, which can be axis-aligned
differently on different sections of the mesh.
- with (nPollProcInterfaces < 0) it does the following:
- loop, waiting for some requests to finish
- for each out-of-date interface, check if its associated
requests have now finished (ie, the ready() check).
- if ready() -> call updateInterfaceMatrix()
In contrast to (nPollProcInterfaces > 0) which loops a specified
number of times with several calls to MPI_Test each time, the
(nPollProcInterfaces < 0) variant relies on internal MPI looping
within MPI_Waitsome to progress communication.
The actual dispatch still remains non-deterministic (ie, waiting for
some requests to finish does not mean that any particular interface
is eligible for update, or in any particular order). However, using
Waitsome places the tight looping into the MPI layer, which results
in few calls and eliminates behaviour dependent on the value of
nPollProcInterfaces.
TUT: add polling to windAroundBuildings case (for testing purposes)
- fewer calls, potentially more consistent
ENH: update sendRequest state after recvRequest wait
- previously had this type of code:
// Treat send as finished when recv is done
UPstream::waitRequest(recvRequest_);
recvRequest_ = -1;
sendRequest_ = -1;
Now refined as follows:
// Require receive data. Update the send request state.
UPstream::waitRequest(recvRequest_);
recvRequest_ = -1;
if (UPstream::finishedRequest(sendRequest_)) sendRequest_ = -1;
Can potentially investigate with requiring both,
but this may be over-contrained.
Example,
// Require receive data, but also wait for sends too
UPstream::waitRequestPair(recvRequest_, sendRequest_);
- checks requests from completion, returning true when some requests
have completed and false when there are no active requests.
This allows it to be used in a polling loop to progress MPI
and then respond when as requests become satisfied.
When using as part of a dispatch loop, waitSomeRequests() is
probably more efficient than calling waitAnyRequest() and can help
avoid biasing which client requests are serviced.
Takes an optional return parameter, to retrieve the indices,
but more importantly to avoid inner-loop reallocations.
Example,
DynamicList<int> indices;
while (UPstream::waitSomeRequests(startRequest, &indices))
{
// Dispatch something ....
}
// Reset list of outstanding requests with 'Waitall' for safety
UPstream::waitRequests(startRequest);
---
If only dealing with single items and an index is required for
dispatching, it can be better to use a list of UPstream::Request
instead.
Example,
List<UPstream::Request> requests = ...;
label index = -1;
while ((index = UPstream::waitAnyRequest(requests)) >= 0)
{
// Do something at index
}
ENH: pair-wise wrappers for MPI_Test or MPI_Wait
- for send/recv pairs of requests, can bundle both together and use a
single MPI_Testsome and MPI_Waitall instead of two individual
calls.
- previously had an additional stack for freedRequests_,
which were used to 'remember' locations into the list of
outstandingRequests_ that were handled by 'waitRequest()'.
This was principally done for sanity checks on shutdown,
but we now just test for any outstanding requests that
are *not* MPI_REQUEST_NULL instead (much simpler).
The framework with freedRequests_ also had a provision to 'recycle'
them by popping from that stack, but this is rather fragile since it
would only triggered by some collectives
(MPI_Iallreduce, MPI_Ialltoall, MPI_Igather, MPI_Iscatter)
with no guarantee that these will all be properly removed again.
There was also no pruning of extraneous indices.
ENH: consolidate internal reset/push of requests
- replace duplicate code with inline functions
reset_request(), push_request()
ENH: null out trailing requests
- extra safety (paranoia) for the UPstream::Request versions
of finishedRequests(), waitAnyRequest()
CONFIG: document nPollProcInterfaces in etc/controlDict
- still experimental, but at least make the keyword known
- mechanism has been unused for at least a decade or more
(or was never used). Message tags are assigned on an ad hoc basis
locally when collision avoidance is necessary.
- not currently used, but it is possible that communicator allocation
modifies the list of sub-ranks. Ensure that the correct size is used
when (re)initialising the linear/tree structures.
STYLE: adjust MPI test applications
- remove some clutter and unneeded grouping.
Some ideas for host-only communicators
- allow reporting even when profiling is suspended
- consolidate reporting into profilingPstream itself
(avoids code scatter).
Example of possible advanced use for timing only one section of
code:
====
// Profile local operations
profilingPstream::enable();
... do something
// Don't profile elsewhere
profilingPstream::suspend();
====
- separate broadcast times from reduce/gather/scatter time
- separate wait times from all-to-all time
- support invocation counts, split off requests time/count
from others to avoid flooding the counts
- support 'detail' switch to increase the output information.
Format may change in the future
- attributes such as assignable(), coupled() etc
- common patchField types: calculatedType(), zeroGradientType() etc.
This simplifies reference to these types without actually needing a
typed patchField version.
ENH: add some basic patchField types to fieldTypes namespace
- allows more general use of the names
ENH: set extrapolated/calculated from patchInternalField directly
- avoids intermediate tmp
- with the current handling of small edges (finite-area), the LSQ
vectors can result in singular/2D tensors. However, the regular
2D handling in field inv() only detects based on the first element.
Provide a 'failsafe' inv() method for symmTensor and tensor that
follows a similar logic for avoiding zero determinates, but it is
applied on a per element basis, instead of deciding based on the
first field element.
The symmTensor::inv(bool) and tensor::inv(bool) methods have a
fairly modest additional overhead.
- unroll the field inv() function to avoid creating an intermediate
field. Reduce the number of operations when adjusting/re-adjusting
the diagonal.
- for cases where a 3D tensor is being used to represent 2D content,
the determinant is zero. Can use inv2D(excludeDirection) to compensate
and invert as if it were only 2D.
ENH: consistent definitions for magSqr of symmTensors, diagSqr() norm
COMP: return scalar not component type for magSqr
- had inconsistent definitions with SymmTensor returning the component
type and Tensor returning scalar. Only evident with complex.
- when only a partial stacktrace is desirable.
ENH: add stack trace decorators
- the 0-th frame is always printStack(), so skip that and emit
some headers/footers instead. Eg,
[stack trace]
=============
#1 Foam::SymmTensor<double> Foam::inv<double>(...)
#2 Foam::inv(Foam::UList<Foam::SymmTensor<double>> const&) ...
...
=============
- data_bytes(), size_bytes() methods to support broadcasting or
gather/scatter content. Additional construct from raw bytes
to support transmitting content.
- missed consistency in a few places.
- return nullptr (with automatic conversion to tmp) on failures
instead of tmp<....>(nullptr), for cleaner coding.
INT: add support for an 'immovable' tmp pointer
- this idea is from openfoam.org, to allow creation of a tmp that is
protected from having its memory reclaimed in field operations
ENH: tmp NewImmovable factory method, forwards as immovable/movable
@ -124,6 +124,9 @@ int main(int argc, char *argv[])
argList::addDryRunOption("Just for testing");
argList::addVerboseOption("Increase verbosity");
// Check -verbose before initialisation
UPstream::debug=argList::verbose(argc,argv);
#include"setRootCase.H"
Pout<<"command-line ("
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.