ThirdParty-6/scotch_6.0.6/doc/src/ptscotch/p_l.tex

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                                  %
% Titre  : p_l.tex                 %
% Sujet  : Manuel de l'utilisateur %
%          du projet 'PT-Scotch'   %
%          Bibliotheque            %
% Auteur : Francois Pellegrini     %
%                                  %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Library}
\label{sec-lib}

All of the features provided by the programs of the
\ptscotch\ distribution may be directly accessed by calling
the appropriate functions of the \libscotch\ library, archived
in files {\tt libptscotch.a} and {\tt libptscotcherr.a}.
All of the existing parallel routines belong to four distinct classes:
\begin{itemize}
\item
distributed source graph handling routines, which serve to declare,
build, load, save, and check the consistency of distributed source
graphs;
%, along with their geometry data;
\item
strategy handling routines, which allow the user to declare and build
parallel mapping and ordering strategies;
\item
parallel graph partitioning and static mapping routines, which allow
the user to declare, compute, and save distributed static mappings of
distributed source graphs;
\item
parallel ordering routines, which allow the user to declare, compute,
and save distributed orderings of distributed source graphs.
\end{itemize}
Error handling is performed using the existing sequential routines of
the \scotch\ distribution, which are described in the
{\it\scotch\ User's Guide}~\scotchcitesuser. Their use is recalled in
Section~\ref{sec-lib-error}.

A \parmetis\ compatibility library, called {\tt lib\lbo ptscotch\lbo
parmetis.a}, is also available. It allows users who were previously
using \parmetis\ in their software to take advantage of the efficieny
of \ptscotch\ without having to modify their code. The services
provided by this library are described in
Section~\ref{sec-lib-parmetis}.

\subsection{Running at proper thread level}
\label{sec-lib-thread}

Since \ptscotch\ is based on the MPI API, all processes must call some
flavor of \texttt{MPI\_\lbt Init} before using any routine of the
library that performs communication. The thread support level of MPI
has to be set in accordance to the level required by the library.

If \ptscotch\ has been compiled without the \texttt{-DSCOTCH\_\lbt
PTHREAD} flag, a call to the simple \texttt{MPI\_\lbt Init} routine
will suffice, because no concurrent MPI calls will be performed by
library routines. Else, the extended \texttt{MPI\_\lbt Init\_\lbt
thread} initialization routine has to be used, to request the
\texttt{MPI\_\lbt THREAD\_\lbt MULTIPLE} level, and the provided
thread support level value returned by the routine must be checked
carefully. If your MPI implementation does not provide the
\texttt{MPI\_\lbt THREAD\_\lbt MULTIPLE} level, you will have to
recompile \ptscotch\ without the \texttt{-DSCOTCH\_\lbt PTHREAD}
flag. Else, library calls may cause random bugs in the communication
subsystem, resulting in program crashes.

\subsection{Calling the routines of {\sc libScotch}}

\subsubsection{Calling from C}

All of the C routines of the \libscotch\ library are prefixed with
``{\tt SCOTCH\_}''. The remainder of the function names is made of the
name of the type of object to which the functions apply (e\@.g\@.
``{\tt dgraph}'', ``{\tt dorder}'', etc.),
followed by the type of action performed on this object: ``{\tt
Init}'' for the initialization of the object, ``{\tt Exit}'' for the
freeing of its internal structures, ``{\tt Load}'' for loading the
object from one or several streams, and so on.

Typically, functions that return an error code return zero if the
function succeeds, and a non-zero value in case of error.

For instance, the {\tt SCOTCH\_\lbt dgraph\lbt Init} and {\tt
SCOTCH\_\lbt dgraph\lbt Load} routines, described in
section~\ref{sec-lib-dgraph}, can be called from C by using the
following code.
{\tt
\begin{verbatim}
#include <stdio.h>
#include <mpi.h>
#include "ptscotch.h"
  ...
  SCOTCH_Dgraph     grafdat;
  FILE *            fileptr;

  if (SCOTCH_dgraphInit (&grafdat) != 0) {
    ... /* Error handling */
  }
  if ((fileptr = fopen ("brol.grf", "r")) == NULL) {
  ... /* Error handling */
  }
  if (SCOTCH_dgraphLoad (&grafdat, fileptr, -1, 0) != 0) {
    ... /* Error handling */
  }
  ...
\end{verbatim}
}

Since ``{\tt ptscotch.h}'' uses several system and communication
objects which are declared in ``{\tt stdio.h}'' and ``{\tt mpi.h}'',
respectively, these latter files must be included beforehand
in your application code.

Although the ``{\tt scotch.h}'' and ``{\tt ptscotch.h}'' files may
look very similar on your system, never mistake them, and always use
the ``{\tt ptscotch.h}'' file as the right include file for compiling
a program which uses the parallel routines of the \libscotch\ library,
whether it also calls sequential routines or not.

\subsubsection{Calling from Fortran}

The routines of the \libscotch\ library can also be called from
Fortran. For any C function named {\tt SCOTCH\_\lbt {\it type\lbt
Action}()} which is documented in this manual, there exists a {\tt
SCOTCHF\lbt {\it TYPE\lbt ACTION\/}()} Fortran counterpart, in which
the separating underscore character is replaced by an ``{\tt
F}''. In most cases, the Fortran routines have exactly the same
parameters as the C functions, save for an added trailing {\tt
INTEGER} argument to store the return value yielded by the function
when the return type of the C function is not {\tt void}.
\\

Since all the data structures used in \libscotch\ are
opaque, equivalent declarations for these structures must
be provided in Fortran. These structures must therefore
be defined as arrays of {\tt DOUBLEPRECISION}s, of sizes
given in file {\tt ptscotchf.h}, which must be included whenever
necessary.

For routines that read or write data using a {\tt FILE~*} stream
in C, the Fortran counterpart uses an {\tt INTEGER} parameter which
is the numer of the Unix file descriptor corresponding to the logical
unit from which to read or write. In most Unix implementations of
Fortran, standard descriptors 0 for standard input (logical unit 5),
1 for standard output (logical unit 6) and 2 for standard error are
opened by default. However, for files that are opened using
{\tt OPEN} statements, an additional function must be used to obtain
the number of the Unix file descriptor from the number of the logical
unit. This function is called \texttt{PXFFILENO} in the normalized
POSIX Fortran API, and files which use it should include the
\texttt{USE IFPOSIX} directive whenever necessary. An alternate, non
normalized, function also exists in most Unix implementations of
Fortran, and is called {\tt FNUM}.

For instance, the {\tt SCOTCH\_\lbt dgraph\lbt Init} and
{\tt SCOTCH\_\lbt dgraph\lbt Load} routines, described in
sections~\ref{sec-lib-dgraphinit} and~\ref{sec-lib-dgraphload},
respectively, can be called from Fortran by using the following code.
{\tt
\begin{verbatim}
        INCLUDE "ptscotchf.h"
        DOUBLEPRECISION GRAFDAT(SCOTCH_DGRAPHDIM)
        INTEGER RETVAL
        ...
        CALL SCOTCHFDGRAPHINIT (GRAFDAT (1), RETVAL)
        IF (RETVAL .NE. 0) THEN
        ...
        OPEN (10, FILE='brol.grf')
        CALL SCOTCHFDGRAPHLOAD (GRAFDAT (1), PXFFILENO (10), 1, 0, RETVAL)
        CLOSE (10)
        IF (RETVAL .NE. 0) THEN
        ...
\end{verbatim}
}

Although the ``{\tt scotchf.h}'' and ``{\tt ptscotchf.h}'' files may
look very similar on your system, never mistake them, and always use
the ``{\tt ptscotchf.h}'' file as the include file for compiling a
Fortran program that uses the parallel routines of the
\libscotch\ library, whether it also calls sequential routines or not.

All of the Fortran routines of the \libscotch\ library are stubs which
call their C counterpart. While this poses no problem for the usual
integer and double precision data types, some conflicts may occur at
compile or run time if your MPI implementation does not represent the
{\tt MPI\_Comm} type in the same way in C and in Fortran. Please check
on your platform to see in the {\tt mpi.h} include file if the {\tt
MPI\_Comm} data type is represented as an {\tt int}. If it is the
case, there should be no problem in using the Fortran routines of
the \ptscotch\ library.

\subsubsection{Compiling and linking}

The compilation of C or Fortran routines which use parallel
routines of the \libscotch\ library requires that either {\tt
ptscotch.h} or {\tt ptscotchf.h} be included, respectively. Since some
of the parallel routines of the \libscotch\ library must be passed MPI
communicators, it is necessary to include MPI files {\tt mpi.h} or
{\tt mpif.h}, respectively, before the relevant \ptscotch\ include
files, such that prototypes of the parallel \libscotch\ routines are
properly defined.

The parallel routines of the \libscotch\ library, along with
taylored versions of the sequential routines, are grouped in a
library file called {\tt libptscotch.a}. Default error routines that
print an error message and exit are provided in the classical
\scotch\ library file {\tt libptscotcherr.a}.

Therefore, the linking of applications that make use of the
\libscotch\ library with standard error handling is carried out by
using the following options: ``{\tt -lptscotch -lptscotcherr -lmpi -lm}''.
The ``{\tt -lmpi}'' option is most often not necessary, as the MPI
library is automatically considered when compiling with commands such
as {\tt mpicc}.

If you want to handle errors by yourself, you should not link with
library file {\tt libptscotcherr.a}, but rather provide a
{\tt SCOTCH\_\lbt error\lbt Print()} routine. Please refer to
Section~\ref{sec-lib-error} for more information on error handling.
Section~\ref{sec-lib-error} for more information on error handling.

Programs that use both sequential and parallel routines of
\scotch\ need only be linked against the parallel version of the
library, as it also contains an adapted version of the sequential
routines. The reason why the sequential routines are duplicated in the
parallel \ptscotch\ library is because they are slightly modified so
as to keep track of the parallel environment. This allows one to
create ``multi-sequential'' routines that can exchange data with other
processes, for instance. Because the \libscotch\ data structures
contain extra parameters, never mix the \texttt{scotch.h} sequential
include file and the \texttt{libptscotch.a} parallel library, as the
latter expects \scotch\ data structures to be larger than the ones
defined in the sequential include file. Consequently, when using only
sequential routines in a sequential program, include the
\texttt{scotch.h} file only and link the program against the sequential
\texttt{libscotch.a} library only. When using only parallel routines,
or when using a mix of sequential and parallel routines, include the
\texttt{ptscotch.h} file only and link the program against the parallel
\texttt{libptscotch.a} library only. When using only sequential
routines in a parallel program, both options can be used.

\subsubsection{Machine word size issues}
\label{sec-lib-inttypesize}

Graph indices are represented in \scotch\ as integer values of type
{\tt SCOTCH\_\lbt Num}. By default, this type equates to the {\tt int}
C type, that is, an integer type of size equal to the one of
the machine word. However, it can represent any other integer
type. Indeed, the size of the {\tt SCOTCH\_\lbt Num} integer type can
be coerced to 32 or 64 bits by using the ``{\tt -DINTSIZE32}'' or
``{\tt -DINTSIZE64}'' compilation flags, respectively, or else by
using the ``{\tt -DINT=}'' definition (see
Section~\ref{sec-install-inttypesize} for more information on the
setting of these compilation flags).

This may, however, pose a problem with MPI, the interface of which is
based on the regular {\tt int} type. \ptscotch\ has been coded so
as to avoid typecast bugs, but overflow errors may result from the
conversion of values of a larger integer type into {\tt int}s when
handling communication buffer indices.

Consequently, the C interface of \scotch\ uses two types of integers.
Graph-related quantities are passed as {\tt SCOTCH\_\lbt Num}s,
while system-related values such as file handles, as well as
return values of \libscotch\ routines, are always passed as
{\tt int}s.

Because of the variability of library integer type sizes, one must be
careful when using the Fortran interface of \scotch, as it does not
provide any prototyping information, and consequently cannot produce
any warning at link time. In the manual pages of the
\libscotch\ routines, Fortran prototypes are written using three types
of {\tt INTEGER}s. As for the C interface, the regular {\tt INTEGER}
type is used for system-based values, such as file handles and MPI
communicators, as well as for return values of the
\libscotch\ routines, while the {\tt INTEGER*}{\it num} type
should be used for all graph-related values, in accordance to the size
of the {\tt SCOTCH\_\lbt Num} type, as set by the
``{\tt -DINTSIZE}{\it x}'' compilation flags. Also, the
{\tt INTEGER*}{\it idx} type represents an integer type of a size
equivalent to the one of a {\tt SCOTCH\_\lbt Idx}, as set by the
``{\tt -DIDXSIZE}{\it x}'' compilation flags. Values of this type are
used in the Fortran interface to represent arbitrary array indices
which can span across the whole address space, and consequently
deserve special treatment.

In practice, when \scotch\ is compiled on a 32-bit architecture so as
to use 64-bit {\tt SCOTCH\_\lbt Num}s, graph indices should be
declared as {\tt INTEGER*8}, while error return values
should still be declared as plain {\tt INTEGER} (that is,
{\tt INTEGER*4}) values. On a 32\_64-bit architecture, irrespective of
whether {\tt SCOTCH\_\lbt Num}s are defined as {\tt INTEGER*4}
or {\tt INTEGER*8} quantities, the {\tt SCOTCH\_\lbt Idx} type
should always be defined as a 64-bit quantity, that is, an
{\tt INTEGER*8}, because it stores differences between memory
addresses, which are represented by 64-bit values.
The above is no longer a problem if \scotch\ is compiled such that
{\tt int}s equate 64-bit integers. In this case, there is no need to
use any type coercing definition.
\\

The \metis\ v3 compatibility library provided by \scotch\ can also
run on a 64-bit architecture. Yet, if you are willing to use it this
way, you will have to replace all {\tt int}'s that are passed to the
\metis\ routines by 64-bit integer {\tt SCOTCH\_\lbt Num} values (even
the option configuration values). However, in this case, you will no
longer be able to link against the service routines of the genuine
\metis/\parmetis\ v3 library, as they are only available as a 32-bit
implementation.

\subsection{Data formats}

All of the data used in the \libscotch\ interface are of integer type
{\tt SCOTCH\_Num}. To hide the internals of \ptscotch\ to callers, all
of the data structures are opaque, that is, declared within {\tt
ptscotch.h} as dummy arrays of double precision values, for the sake of
data alignment. Accessor routines, the names of which end in ``{\tt
Size}'' and ``{\tt Data}'', allow callers to retrieve information from
opaque structures.
\\

In all of the following, whenever arrays are defined, passed, and
accessed, it is assumed that the first element of these arrays is
always labeled as {\tt baseval}, whether {\tt baseval} is set to $0$
(for C-style arrays) or $1$ (for Fortran-style arrays). \ptscotch\
internally manages with base values and array pointers so as to
process these arrays accordingly.

\subsubsection{Distributed graph format}
\label{sec-lib-format-dgraph}

In \ptscotch, distributed source graphs are represented so as to
distribute graph data without any information duplication which could
hinder scalability. The only data which are replicated on every
process are of a size linear in the number of processes and
small. Apart from these, the sum across all processes of all of the
vertex data is in $O(v+p)$, where $v$ is the overall number of
vertices in the distributed graph and $p$ the number of processes, and
the sum of all of the edge data is in $O(e)$, where $e$ is the overall
number of arcs (that is, twice the number of edges) in the distributed
graph. When graphs are ill-distributed, the overall halo vertex
information may also be in $o(e)$ at worst, which makes the distributed
graph structure fully scalable.

Distributed source graphs are described by means of adjacency
lists. The description of a distributed graph requires several {\tt
SCOTCH\_Num} scalars and arrays, as shown for instance in
Figures~\ref{fig-lib-dgraf-one} and~\ref{fig-lib-dgraf-two}.
Some of these data are said to be global, and are duplicated on every
process that holds part of the distributed graph; their names contain
the ``{\tt glb}'' infix. Others are local, that is, their value may
differ for each process; their names contain the ``{\tt loc}'' or
``{\tt gst}'' infix. Global data have the following meaning:
\begin{itemize}
\iteme[{\tt baseval}]
Base value for all array indexings.
\iteme[{\tt vertglbnbr}]
Overall number of vertices in the distributed graph.
\iteme[{\tt edgeglbnbr}]
Overall number of arcs in the distributed graph. Since edges are
represented by both of their ends, the number of edge data in
the graph is twice the number of edges.
\iteme[{\tt procglbnbr}]
Overall number of processes that share distributed graph data.
\iteme[{\tt proccnttab}]
Array holding the current number of local vertices borne by every process.
\iteme[{\tt procvrttab}]
Array holding the global indices from which the vertices of every
process are numbered. For optimization purposes, this array has an
extra slot which stores a number which must be greater than all of the
assigned global indices. For each process $p$, it must be ensured that
$\mbox{\tt proc\lbt vrt\lbt tab[}p + 1\mbox{\tt]} \geq
(\mbox{\tt proc\lbt vrt\lbt tab[}p\mbox{\tt]} +
\mbox{\tt proc\lbt cnt\lbt tab[}p\mbox{\tt]})$, that is, that no process
can have more local vertices than allowed by its range of global indices.
When the global numbering of vertices is continuous, for each process $p$,
$\mbox{\tt proc\lbt vrt\lbt tab[}p + 1\mbox{\tt]} =
(\mbox{\tt proc\lbt vrt\lbt tab[}p\mbox{\tt]} +
\mbox{\tt proc\lbt cnt\lbt tab[}p\mbox{\tt]})$.
\end{itemize}
Local data have the following meaning:
\begin{itemize}
\iteme[{\tt vertlocnbr}]
Number of local vertices borne by the given process. In
fact, on every process $p$, {\tt vert\lbt loc\lbt nbr} is equal to
${\tt proc\lbt cnt\lbt tab}\mbox{\tt [}p\mbox{\tt]}$.
\iteme[{\tt vertgstnbr}]
Number of both local and ghost vertices borne by the given process.
Ghost vertices are local images of neighboring vertices located on
distant processes.
\iteme[{\tt vertloctab}]
Array of start indices in {\tt edgeloctab} and {\tt edgegsttab} of
vertex adjacency sub-arrays.
\iteme[{\tt vendloctab}]
Array of after-last indices in {\tt edgeloctab} and {\tt edgegsttab}
of vertex adjacency sub-arrays.
For any local vertex $i$, with $\mbox{\tt baseval} \leq i < (\mbox{\tt
baseval} + \mbox{\tt vertlocnbr})$,
$\mbox{\tt vend\lbt loc\lbt tab[}i\mbox{\tt ]} -
\mbox{\tt vert\lbt loc\lbt tab[}i\mbox{\tt ]}$
is the degree of vertex $i$.

When all vertex adjacency lists are stored in order in
{\tt edge\lbt loc\lbt tab} without any empty space between them,
it is possible to save memory by not allocating the physical memory
for {\tt vend\lbt loc\lbt tab}. In this case, illustrated in
Figure~\ref{fig-lib-dgraf-one}, {\tt vert\lbt loc\lbt tab} is of size
$\mbox{\tt vert\lbt loc\lbt nbr} + 1$ and {\tt vend\lbt loc\lbt tab}
points to $\mbox{\tt vert\lbt loc\lbt tab} + 1$.
For these graphs , called ``compact edge array graphs'', or ``compact
graphs'' for short, {\tt vert\lbt loc\lbt tab} is sorted in
ascending order, $\mbox{\tt vert\lbt loc\lbt tab[}\lbt\mbox{\tt
baseval]} = \mbox{\tt baseval}$ and $\mbox{\tt vert\lbt loc\lbt
tab[}\lbt\mbox{\tt baseval} + \mbox{\tt vert\lbt loc\lbt nbr]} =
(\mbox{\tt baseval} + \mbox{\tt edge\lbt loc\lbt nbr})$.

Since {\tt vertloctab} and {\tt vendloctab} only account for local
vertices and not for ghost vertices, the sum across all processes of the
sizes of these arrays does not depend on the number of ghost vertices;
it is equal to $(v+p)$ for compact graphs and to $2v$ else.
\iteme[{\tt veloloctab}]
Optional array, of size {\tt vert\lbt loc\lbt nbr}, holding the
integer load associated with every vertex.
\iteme[{\tt edgeloctab}]
Array, of a size equal at least to $\left(\max_{i}(\mbox{\tt vend\lbt
loc\lbt tab[} i \mbox{\tt ]}) - \mbox{\tt baseval}\right)$, holding
the adjacency array of every local vertex.
For any local vertex $i$, with $\mbox{\tt baseval} \leq i < (\mbox{\tt
baseval} + \mbox{\tt vertlocnbr})$, the global indices of the neighbors
of $i$ are stored in {\tt edge\lbt loc\lbt tab}
from {\tt edge\lbt loc\lbt tab\lbt [vert\lbt loc\lbt tab[$i$]]} to
$\mbox{\tt edge\lbt loc\lbt tab[vend\lbt loc\lbt tab[}i\mbox{\tt ]} -
1\mbox{\tt ]}$, inclusive.

Since ghost vertices do not have adjacency arrays, because only arcs
from local vertices to ghost vertices are recorded and not the
opposite, the overall sum of the sizes of all {\tt edge\lbt loc\lbt
tab} arrays is $e$.
\iteme[{\tt edgegsttab}]
Optional array holding the local and ghost indices of neighbors of
local vertices.
For any local vertex $i$, with $\mbox{\tt baseval} \leq i < (\mbox{\tt
baseval} + \mbox{\tt vertlocnbr})$, the local and ghost indices
of the neighbors of $i$ are stored in {\tt edge\lbt gst\lbt tab}
from {\tt edge\lbt gst\lbt tab\lbt [vert\lbt loc\lbt tab[$i$]]} to
{\tt edge\lbt gst\lbt tab[vend\lbt loc\lbt tab[$i$]\lbt $- 1$]},
inclusive.

Local vertices are numbered in global vertex order,
starting from {\tt baseval} to $(\mbox{\tt baseval} +
\mbox{\tt vertlocnbr} - 1)$, inclusive. Ghost vertices are also
numbered in global vertex order, from $(\mbox{\tt baseval} +
\mbox{\tt vertlocnbr})$ to $(\mbox{\tt baseval} +
\mbox{\tt vertgstnbr} - 1)$, inclusive.

Only {\tt edgeloctab} has to be provided by the user. {\tt edge\lbt
gst\lbt tab} is internally computed by \ptscotch\ whenever needed,
or can be explicitey asked for by the user by calling function
{\tt SCOTCH\_\lbt dgraph\lbt Ghst}.
This array can serve to index user-defined arrays of quantities borne
by graph vertices, which can be exchanged between neighboring
processes thanks to the {\tt SCOTCH\_\lbt dgraph\lbt Halo}
routine documented in Section~\ref{sec-lib-dgraphhalo}.
\iteme[{\tt edloloctab}]
Optional array, of a size equal at least to $\left(\max_{i}(\mbox{\tt
vend\lbt loc\lbt tab[} i \mbox{\tt ]}) - \mbox{\tt baseval}\right)$,
holding the integer load associated with every arc. Matching arcs
should always have identical loads.
\end{itemize}

\begin{figure}
\centering\includegraphics[scale=0.47]{p_f_gr1.eps}
\caption{Sample distributed graph and its description by
\libscotch\ arrays using a continuous numbering and compact edge
arrays. Numbers within vertices are vertex indices. Top graph is a
global view of the distributed graph, labeled with global, continuous,
indices. Bottom graphs are local views labeled with local and ghost
indices, where ghost vertices are drawn in black.  Since the edge
array is compact, all {\tt vertloctab} arrays are of size $\mbox{\tt
vertlocnbr} + 1$, and {\tt vendloctab} points to $\mbox{\tt
vertloctab} + 1$. {\tt edgeloctab} edge arrays hold global indices
of end vertices, while optional {\tt edgegsttab} edge arrays hold
local and ghost indices. {\tt edgelocnbr} is the local number of arcs
(that is, twice the number of edges), including arcs to local vertices
as well as to ghost vertices {\tt veloloctab} and {\tt edloloctab} are
not represented.}
\label{fig-lib-dgraf-one}
\end{figure}

\begin{figure}
\centering\includegraphics[scale=0.47]{p_f_gr2.eps}
\caption{Adjacency structure of the sample graph of
Figure~\protect\ref{fig-lib-dgraf-one} with a disjoint edge array and
a discontinuous ordering. Both {\tt vertloctab} and {\tt vendloctab}
are of size {\tt vertlocnbr}. This allows for the handling of dynamic
graphs, the structure of which can evolve with time.}
\label{fig-lib-dgraf-two}
\end{figure}

Dynamic graphs can be handled elegantly by using the
{\tt vend\lbt loc\lbt tab} and {\tt proc\lbt vrt\lbt tab} arrays.
In order to dynamically manage distributed graphs, one just has to
reserve index ranges large enough to create new vertices on each
process, and to allocate
{\tt vert\lbt loc\lbt tab}, {\tt vend\lbt loc\lbt tab} and
{\tt edge\lbt loc\lbt tab} arrays that are large
enough to contain all of the expected new vertex and edge
data. This can be done by passing {\tt SCOTCH\_\lbt graph\lbo Build} a
maximum number of local vertices, {\tt vert\lbt loc\lbt max}, greater
than the current number of local vertices, {\tt vert\lbt loc\lbt nbr}.

On every process $p$, vertices are globally labeled starting from
${\tt proc\lbt vrt\lbt tab}\mbox{\tt [}p\mbox{\tt ]}$, and locally
labeled from {\tt baseval}, leaving free space at the end of the local
arrays. To remove some vertex of local index $i$, one just has to
replace $\mbox{\tt vert\lbt loc\lbt tab}\mbox{\tt [}i\mbox{\tt ]}$ and
${\tt vend\lbt loc\lbt tab}\mbox{\tt [}i\mbox{\tt ]}$ with the values
of ${\tt vert\lbt loc\lbt tab}\lbt \mbox{\tt [vert\lbt loc\lbt
nbr}-1\mbox{\tt ]}$ and ${\tt vend\lbt loc\lbt tab}\lbt \mbox{\tt
[vert\lbt loc\lbt nbr}-1\mbox{\tt ]}$, respectively, and browse the
adjacencies of all neighbors of former vertex $(\mbox{\tt
vert}\lbt\mbox{\tt loc}\lbt\mbox{\tt nbr}-1)$ such that all
$(\mbox{\tt vert}\lbt\mbox{\tt loc}\lbt\mbox{\tt nbr}-1)$ indices are
turned into $i$s. Then, {\tt vert\lbt loc\lbt nbr} must be
decremented, and {\tt SCOTCH\_\lbt dgraph\lbt Build()} must be called
to account for the change of topology. If a graph building routine
such as {\tt SCOTCH\_\lbt dgraph\lbt Load()} or {\tt SCOTCH\_\lbt
dgraph\lbt Build()} had already been called on the {\tt SCOTCH\_\lbt
Dgraph} structure, {\tt SCOTCH\_\lbt dgraph\lbt Free()} has to be
called first in order to free the internal structures associated with
the older version of the graph, else these data would be lost, which
would result in memory leakage.

To add a new vertex, one has to fill {\tt vert\lbt loc\lbt tab\lbt
[vertnbr\lbt -1]} and {\tt vend\lbt loc\lbt tab\lbt [vertnbr\lbt -1]}
with the starting and end indices of the adjacency sub-array of the
new vertex. Then, the adjacencies of its neighbor vertices must also
be updated to account for it. If free space had been reserved at the
end of each of the neighbors, one just has to increment the
${\tt vend\lbt loc\lbt tab}\mbox{\tt [}i\mbox{\tt ]}$ values of every
neighbor $i$, and add the index of the new vertex at the end of the
adjacency sub-array. If the sub-array cannot be extended, then it has
to be copied elsewhere in the edge array, and both ${\tt vert\lbt
loc\lbt tab}\mbox{\tt [}i\mbox{\tt ]}$ and ${\tt vend\lbt loc\lbt
tab}\mbox{\tt [}i\mbox{\tt ]}$ must be updated accordingly. With
simple housekeeping of free areas of the edge array, dynamic arrays
can be updated with as little data movement as possible.

\subsubsection{Block ordering format}
\label{sec-lib-format-order}

Block orderings associated with distributed graphs are described by
means of block and permutation arrays, made of {\tt SCOTCH\_Num}s.  In
order for all orderings to have the same structure, irrespective of
whether they are centralized or distributed, or whether they are
created from graphs or meshes, all ordering data indices start from
{\tt baseval}. Consequently, row indices are related to vertex
indices in memory in the following way: row $i$ is associated with
vertex $i$ of the {\tt SCOTCH\_\lbt Dgraph} structure as if the vertex
numbering used for the graph was continuous.

Block orderings are made of the following data:
\begin{itemize}
\iteme[{\tt permtab}]
Array holding the permutation of the reordered matrix. Thus, if $k =
\mbox{\tt permtab}\mbox{\tt [}i\mbox{\tt ]}$, then row $i$ of the
original matrix is now row $k$ of the reordered matrix, that is, row
$i$ is the $k^{\mbox{th}}$ pivot.
\iteme[{\tt peritab}]
Inverse permutation of the reordered matrix. Thus, if $i = \mbox{\tt
peritab[$k$]}$, then row $k$ of the reordered matrix was row $i$ of
the original matrix.
\iteme[{\tt cblknbr}]
Number of column blocks (that is, supervariables) in the block ordering.
\iteme[{\tt rangtab}]
Array of ranges for the column blocks. Column block $c$, with
$\mbox{\tt baseval} \leq c < (\mbox{\tt cblknbr} + \mbox{\tt
baseval})$, contains columns with indices ranging from {\tt
rangtab[$i$]} to {\tt rangtab[$i + 1$]}, exclusive, in the reordered
matrix. Therefore,
{\tt rangtab[baseval]} is always equal to {\tt baseval}, and
{\tt rangtab[cblknbr + baseval]} is always equal to
$\mbox{\tt vert\lbt glb\lbt nbr} + \mbox{\tt baseval}$.
% for graphs
% and to $\mbox{\tt vnod\lbt glb\lbt nbr} + \mbox{\tt baseval}$ for meshes.
In order to avoid memory errors when column blocks are all single
columns, the size of {\tt rangtab} must always be one more than the
number of columns, that is, $\mbox{\tt vert\lbt glb\lbt nbr} + 1$.
% for graphs and $\mbox{\tt vnod\lbt glb\lbt nbr} + 1$ for meshes.
\iteme[{\tt treetab}]
Array of ascendants of permuted column blocks in the separators tree.
{\tt treetab[i]} is the index of the father of column block $i$ in the
separators tree, or $-1$ if column block $i$ is the root of the
separators tree. Whenever separators or leaves of the separators tree
are split into subblocks, as the block splitting, minimum fill or minimum
degree methods do, all subblocks of the same level are linked to the
column block of higher index belonging to the closest separator
ancestor. Indices in {\tt treetab} are based, in the same way as for
the other blocking structures. See Figure~\ref{fig-lib-ord-block} for
a complete example.
\end{itemize}

\begin{figure}
\centering\includegraphics[scale=0.47]{p_f_orb.eps}
\caption{Arrays resulting from the ordering by complete nested
dissection of a 4 by 3 grid based from $1$. Leftmost grid is the
original grid, and righmost grid is the reordered grid, with
separators shown and column block indices written in bold.}
% using strategy  n{sep=hf{bal=0},ole=s,ose=s}
\label{fig-lib-ord-block}
\end{figure}

\subsection{Strategy strings}

The behavior of the static mapping and block ordering routines of the
\libscotch\ library is parametrized by means of strategy strings,
which describe how and when given partitioning or ordering methods
should be applied to graphs and subgraphs
% , or to meshes and submeshes.

\subsubsection{Using default strategy strings}
\label{sec-lib-format-strat-default}

While strategy strings can be built by hand, according to the syntax
given in the next sections, users who do not have specific needs can
take advantage of default strategies already implemented in the
\libscotch, which will yield very good results in most cases. By
doing so, they will spare themselves the hassle of updating their
strategies to comply to subsequent syntactic changes, and they will
benefit from the availability of new partitioning or ordering methods
as soon as they are made available.

The simplest way to use default strategy strings is to avoid
specifying any. By initializing a strategy object, by means of the
{\tt SCOTCH\_\lbt stratInit} routine, and by using the initialized
strategy object as is, without further parametrization, this object
will be filled with a default strategy when passing it as a parameter
to the next partitioning or ordering routine to be called. On return,
the strategy object will contain a fully specified strategy, tailored
for the type of operation which has been requested. Consequently, a
fresh strategy object that was used to partition a graph cannot be
used afterward as a default strategy for calling an ordering routine,
for instance, as partitioning and ordering strategies are incompatible.

The \libscotch\ also provides helper routines which allow users to
express their preferences on the kind of strategy that they
need. These helper routines, which are of the form
{\tt SCOTCH\_\lbt strat\lbt *\lbt Build}, tune default strategy strings
according to parameters provided by the user, such as the requested
number of parts (used as a hint to select the most efficient
partitioning routines), the desired maximum load imbalance ratio,
and a set of preference flags. While some of these flags are
antagonistic, most of them can be combined, by means of addition or
``binary or'' operators. These flags are the following.
They are grouped by application class.

\paragraph{Global flags}

\begin{itemize}
\iteme[{\tt SCOTCH\_STRATDEFAULT}]
Default behavior. No flags are set.
\iteme[{\tt SCOTCH\_STRATBALANCE}]
Enforce load balance as much as possible.
\iteme[{\tt SCOTCH\_STRATQUALITY}]
Privilege quality over speed.
\iteme[{\tt SCOTCH\_STRATSAFETY}]
Do not use methods that can lead to the occurrence of problematic
events, such as floating point exceptions, which could not be properly
handled by the calling software.
\iteme[{\tt SCOTCH\_STRATSPEED}]
Privilege speed over quality.
\end{itemize}

%% \paragraph{Mapping and partitioning flags}

%% \begin{itemize}
%% \iteme[{\tt SCOTCH\_STRATRECURSIVE}]
%% Use only recursive bipartitioning methods, and not direct k-way
%% methods. When this flag is not set, any combination of methods can be
%% used, so as to achieve the best result according to other user
%% preferences.
%% \iteme[{\tt SCOTCH\_STRATREMAP}]
%% Use the strategy for remapping an existing partition.
%% \end{itemize}

\paragraph{Ordering flags}

\begin{itemize}
\iteme[{\tt SCOTCH\_STRATLEVELMAX}]
Create at most the prescribed levels of nested dissection
separators. If the number of levels is less than the logarithm of the
number of processing elements used, distributed pieces of the
separated subgraph may have to be centralized so that the leaves can
be ordered, which may result in memory shortage.
\iteme[{\tt SCOTCH\_STRATLEVELMIN}]
Create at least the prescribed levels of nested dissection separators.
When used in conjunction with {\tt SCOTCH\_\lbt STRAT\lbt LEVEL\lbt
MAX}, the exact number of nested dissection levels will be performed,
unless the graph to order is too small.
\iteme[{\tt SCOTCH\_STRATLEAFSIMPLE}]
Order nested dissection leaves as cheaply as possible.
\iteme[{\tt SCOTCH\_STRATSEPASIMPLE}]
Order nested dissection separators as cheaply as possible.
\end{itemize}

\subsubsection{Parallel mapping strategy strings}
\label{sec-lib-format-pmap}

A parallel mapping strategy is made of one or several parallel
mapping methods, which can be combined by means of strategy
operators. The strategy operators that can be used in mapping
strategies are listed below, by increasing precedence.
\begin{itemize}
\iteme[{\tt (}{\it strat\/}{\tt )}]
Grouping operator.
The strategy enclosed within the parentheses is treated as a single
mapping method.
\iteme[{\tt /}{\it cond\/}{\tt ?}{\it strat1\/}{[{\tt :}{\it strat2}]{\tt ;}}]
Condition operator. According to the result of the evaluation of
condition {\it cond}, either {\it strat1\/} or {\it strat2\/} (if it
is present) is applied. The condition applies to the characteristics
of the current mapping task, and can be built from logical and
relational operators. Conditional operators are listed below, by
increasing precedence.
\begin{itemize}
\iteme[{\it cond1\/}{\tt |}{\it cond2}]
Logical or operator. The result of the condition is true if {\it cond1\/}
or {\it cond2\/} are true, or both.
\iteme[{\it cond1\/}{\tt \&}{\it cond2}]
Logical and operator. The result of the condition is true only if both
{\it cond1\/} and {\it cond2\/} are true.
\iteme[{\tt !}{\it cond}]
Logical not operator. The result of the condition is true only if
{\it cond\/} is false.
\iteme[{\it var} {\it relop} {\it val}]
Relational operator, where {\it var\/} is a node variable, {\it val\/} is
either a node variable or a constant of the type of variable {\it var}, and
{\it relop\/} is one of '{\tt\verb+<+}', '{\tt\verb+=+}', and '{\tt\verb+>+}'.
The node variables are listed below, along with their types.
\begin{itemize}
\iteme[{\tt edge}]
The global number of arcs of the current subgraph.
Integer.
\iteme[{\tt levl}]
The level of the subgraph in the recursion tree, starting from zero
for the initial graph at the root of the tree.
Integer.
\iteme[{\tt load}]
The overall sum of the vertex loads of the subgraph. It is equal to
{\tt vert} if the graph has no vertex loads.
Integer.
\iteme[{\tt mdeg}]
The maximum degree of the subgraph.
Integer.
\iteme[{\tt proc}]
The number of processes on which the current subgraph is distributed
at this level of the separators tree.
Integer.
\iteme[{\tt rank}]
The rank of the current process among the group of processes on
which the current subgraph is distributed at this level of the
separators tree.
Integer.
\iteme[{\tt vert}]
The global number of vertices of the current subgraph.
Integer.
\end{itemize}
\end{itemize}
\iteme[{\it method\/}{[{\tt \{}{\it parameters\/}{\tt \}}]}]
Parallel graph mapping method. Available parallel mapping methods
are listed below.
\end{itemize}
The currently available parallel mapping methods are the following.
\begin{itemize}
\iteme[{\tt r}]
Dual recursive bipartitioning method. The parameters of the dual recursive
bipartitioning method are given below.
\begin{itemize}
\iteme[{\tt seq=}{\it strat}]
Set the sequential mapping strategy that is used on every centralized
subgraph of the recursion tree, once the dual recursive bipartitioning
process has gone far enough such that the number of processes handling
some subgraph is restricted to one.
\iteme[{\tt sep=}{\it strat}]
Set the parallel graph bipartitioning strategy that is used on every
current job of the recursion tree. Parallel graph bipartitioning
strategies are described below, in section~\ref{sec-lib-format-pbipart}.
\end{itemize}
\end{itemize}

\subsubsection{Parallel graph bipartitioning strategy strings}
\label{sec-lib-format-pbipart}

A parallel graph bipartitioning strategy is made of one or several
parallel graph bipartitioning methods, which can be combined by means
of strategy operators. Strategy operators are listed below, by
increasing precedence.

\begin{itemize}
\iteme[{\it strat1\/}{\tt |}{\it strat2}]
Selection operator. The result of the selection is the best bipartition of
the two that are obtained by the distinct application of {\it strat1\/} and
{\it strat2\/} to the current bipartition.
\iteme[{\it strat1$\:$}{\it strat2}]
Combination operator. Strategy {\it strat2\/} is applied to the bipartition
resulting from the application of strategy {\it strat1\/} to the current
bipartition. Typically, the first method used should compute an initial
bipartition from scratch, and every following method should use the
result of the previous one at its starting point.
\iteme[{\tt (}{\it strat\/}{\tt )}]
Grouping operator.
The strategy enclosed within the parentheses is treated as a single
bipartitioning method.
\iteme[{\tt /}{\it cond\/}{\tt ?}{\it strat1\/}{[{\tt :}{\it strat2}]{\tt ;}}]
Condition operator. According to the result of the evaluation of condition
{\it cond}, either {\it strat1\/} or {\it strat2\/} (if it is present) is
applied. The condition applies to the characteristics of the current active
graph, and can be built from logical and relational operators. Conditional
operators are listed below, by increasing precedence.
\begin{itemize}
\iteme[{\it cond1\/}{\tt |}{\it cond2}]
Logical or operator. The result of the condition is true if {\it cond1\/}
or {\it cond2\/} are true, or both.
\iteme[{\it cond1\/}{\tt \&}{\it cond2}]
Logical and operator. The result of the condition is true only if both
{\it cond1\/} and {\it cond2\/} are true.
\iteme[{\tt !}{\it cond}]
Logical not operator. The result of the condition is true only if
{\it cond\/} is false.
\iteme[{\it var} {\it relop} {\it val}]
Relational operator, where {\it var\/} is a graph or node variable,
{\it val\/} is either a graph or node variable or a constant of the type of
variable {\it var\/}, and {\it relop\/} is one of
'{\tt\verb+<+}', '{\tt\verb+=+}', and '{\tt\verb+>+}'.
The graph and node variables are listed below, along with their types.
\begin{itemize}
\iteme[{\tt edge}]
The global number of edges of the current subgraph.
Integer.
\iteme[{\tt levl}]
The level of the subgraph in the bipartition or multi-level tree, starting from zero
for the initial graph at the root of the tree.
Integer.
\iteme[{\tt load}]
The overall sum of the vertex loads of the subgraph. It is equal to
{\tt vert} if the graph has no vertex loads.
Integer.
\iteme[{\tt load0}]
The vertex load of the first subset of the current bipartition of the current
graph.
Integer.
\iteme[{\tt proc}]
The number of processes on which the current subgraph is distributed
at this level of the nested dissection process.
Integer.
\iteme[{\tt rank}]
The rank of the current process among the group of processes on
which the current subgraph is distributed at this level of the
nested dissection process.
Integer.
\iteme[{\tt vert}]
The number of vertices of the current subgraph.
Integer.
\end{itemize}
\end{itemize}
The currently available parallel vertex separation methods are the
following.
\begin{itemize}
\iteme[{\tt b}]
Band method. Basing on the current distributed graph and on its
partition, this method creates a new distributed graph reduced to the
vertices which are at most at a given distance from the current
frontier, runs a parallel graph bipartitioning strategy on this graph,
and prolongs back the new bipartition to the current graph. This method
is primarily used to run bipartition refinement methods during the
uncoarsening phase of the multi-level parallel graph bipartitioning
method. The parameters of the band method are listed below.
\begin{itemize}
\iteme[{\tt bnd=}{\it strat}]
Set the parallel graph bipartitioning strategy to be applied to the band
graph.
\iteme[{\tt org=}{\it strat}]
Set the parallel graph bipartitioning strategy to be applied to the
full distributed graph if the band graph could not be extracted.
\iteme[{\tt width=}{\it val}]
Set the maximum distance from the current frontier of vertices to be
kept in the band graph. $0$ means that only frontier vertices
themselves are kept, $1$ that immediate neighboring vertices are kept
too, and so on.
\end{itemize}
\iteme[{\tt d}]
Parallel diffusion method. This method, presented in its sequential
formulation in~\cite{pell07b}, flows two kinds of antagonistic
liquids, scotch and anti-scotch, from two source vertices, and sets
the new frontier as the limit between vertices which contain scotch
and the ones which contain anti-scotch. Because selecting the source
vertices is essential to the obtainment of useful results, this method
has been hard-coded so that the two source vertices are the two
vertices of highest indices, since in the band method these are the
anchor vertices which represent all of the removed vertices of each
part. Therefore, this method must be used on band graphs only, or on
specifically crafted graphs. Applying it to any other graphs is very
likely to lead to extremely poor results.  The parameters of the
diffusion bipartitioning method are listed below.
\begin{itemize}
\iteme[{\tt dif=}{\it rat}]
Fraction of liquid which is diffused to neighbor vertices at each
pass. To achieve convergence, the sum of the {\tt dif} and {\tt rem}
parameters must be equal to $1$, but in order to speed-up the diffusion
process, other combinations of higher sum can be tried. In this case,
the number of passes must be kept low, to avoid numerical overflows
which would make the results useless.
\iteme[{\tt pass=}{\it nbr}]
Set the number of diffusion sweeps performed by the algorithm. This
number depends on the width of the band graph to which the diffusion
method is applied. Useful values range from $30$ to $500$ according
to chosen {\tt dif} and {\tt rem} coefficients.
\iteme[{\tt rem=}{\it rat}]
Fraction of liquid which remains on vertices at each pass. See above.
\end{itemize}
\iteme[{\tt m}]
Parallel multi-level method. The parameters of the multi-level method
are listed below.
\begin{itemize}
\iteme[{\tt asc=}{\it strat}]
Set the strategy that is used to refine the distributed bipartition
obtained at ascending levels of the uncoarsening phase by
prolongation of the bipartition computed for coarser graphs.
% or meshes.
This strategy is not applied to the coarsest graph,
% or mesh
for which only the {\tt low} strategy is used.
\iteme[{\tt dlevl=}{\it nbr}]
Set the minimum level after which duplication is allowed in the
folding process. A value of $-1$ results in duplication being always
performed when folding.
\iteme[{\tt dvert=}{\it nbr}]
Set the average number of vertices per process under which
the folding process is performed during the coarsening phase.
\iteme[{\tt low=}{\it strat}]
Set the strategy that is used to compute the bipartition of the
coarsest distributed graph,
% or mesh
at the lowest level of the coarsening process.
\iteme[{\tt rat=}{\it rat}]
Set the threshold maximum coarsening ratio over which graphs
% or meshes
are no longer coarsened. The ratio of any given coarsening cannot be
less that $0.5$ (case of a perfect matching), and cannot be greater
than $1.0$. Coarsening stops when either the coarsening ratio is
above the maximum coarsening ratio, or the graph
% or mesh
has fewer node vertices than the minimum number of vertices allowed.
\iteme[{\tt vert=}{\it nbr}]
Set the threshold minimum size under which graphs
% or meshes
are no longer coarsened. Coarsening stops when either the coarsening
ratio is above the maximum coarsening ratio, or the graph
% or mesh
has fewer node vertices than the minimum number of vertices allowed.
\end{itemize}
\iteme[{\tt q}]
Multi-sequential method. The current distributed graph and its
separator are centralized on every process that holds a part of it, and
a sequential graph bipartitioning method is applied independently to each
of them. Then, the best bipartition found is prolonged back to the
distributed graph. This method is primarily designed to operate on
band graphs, which are orders of magnitude smaller than their parent
graph. Else, memory bottlenecks are very likely to occur.
The parameters of the multi-sequential method are listed below.
\begin{itemize}
\iteme[{\tt strat=}{\it strat}]
Set the sequential edge separation strategy that is used to refine
the bipartition of the centralized graph. For a description of all of
the available sequential bipartitioning methods, please refer to the
{\it\scotch\ User's Guide}~\scotchcitesuser.
\end{itemize}
\iteme[{\tt x}]
Load balance enforcement method. This method moves as many vertices
from the heaviest part to the lightest one so as to reduce load
imbalance as much as possible, without impacting communication load
too negatively. The only parameter of this method is listed below.
\begin{itemize}
\iteme[{\tt sbbt=}{\it nbr}]
Number of sub-buckets to sort communication gains. $5$ is a common
value.
\end{itemize}
\iteme[{\tt z}]
Zero method. This method moves all of the vertices to the first
part, resulting in an empty frontier. Its main use is to stop the
bipartitioning process whenever some condition is true.
\end{itemize}
\end{itemize}

\subsubsection{Parallel ordering strategy strings}
\label{sec-lib-format-pord}

A parallel ordering strategy is made of one or several parallel
ordering methods, which can be combined by means of strategy
operators. The strategy operators that can be used in ordering
strategies are listed below, by increasing precedence.
\begin{itemize}
\iteme[{\tt (}{\it strat\/}{\tt )}]
Grouping operator.
The strategy enclosed within the parentheses is treated as a single
ordering method.
\iteme[{\tt /}{\it cond\/}{\tt ?}{\it strat1\/}{[{\tt :}{\it strat2}]{\tt ;}}]
Condition operator. According to the result of the evaluation of
condition {\it cond}, either {\it strat1\/} or {\it strat2\/} (if it
is present) is applied. The condition applies to the characteristics
of the current node of the separators tree, and can be built from
logical and relational operators. Conditional operators are listed
below, by increasing precedence.
\begin{itemize}
\iteme[{\it cond1\/}{\tt |}{\it cond2}]
Logical or operator. The result of the condition is true if {\it cond1\/}
or {\it cond2\/} are true, or both.
\iteme[{\it cond1\/}{\tt \&}{\it cond2}]
Logical and operator. The result of the condition is true only if both
{\it cond1\/} and {\it cond2\/} are true.
\iteme[{\tt !}{\it cond}]
Logical not operator. The result of the condition is true only if
{\it cond\/} is false.
\iteme[{\it var} {\it relop} {\it val}]
Relational operator, where {\it var\/} is a node variable, {\it val\/} is
either a node variable or a constant of the type of variable {\it var}, and
{\it relop\/} is one of '{\tt\verb+<+}', '{\tt\verb+=+}', and '{\tt\verb+>+}'.
The node variables are listed below, along with their types.
\begin{itemize}
\iteme[{\tt edge}]
The global number of arcs of the current subgraph.
Integer.
\iteme[{\tt levl}]
The level of the subgraph in the separators tree, starting from zero
for the initial graph at the root of the tree.
Integer.
\iteme[{\tt load}]
The overall sum of the vertex loads of the subgraph. It is equal to
{\tt vert} if the graph has no vertex loads.
Integer.
\iteme[{\tt mdeg}]
The maximum degree of the subgraph.
Integer.
\iteme[{\tt proc}]
The number of processes on which the current subgraph is distributed
at this level of the separators tree.
Integer.
\iteme[{\tt rank}]
The rank of the current process among the group of processes on
which the current subgraph is distributed at this level of the
separators tree.
Integer.
\iteme[{\tt vert}]
The global number of vertices of the current subgraph.
Integer.
\end{itemize}
\end{itemize}
\iteme[{\it method\/}{[{\tt \{}{\it parameters\/}{\tt \}}]}]
Parallel graph ordering method. Available parallel ordering methods
are listed below.
\end{itemize}
The currently available parallel ordering methods are the following.
\begin{itemize}
\iteme[{\tt n}]
Nested dissection method. The parameters of the nested dissection
method are given below.
\begin{itemize}
\iteme[{\tt ole=}{\it strat}]
Set the parallel ordering strategy that is used on every distributed
leaf of the parallel separators tree if the node separation strategy
{\tt sep} has failed to separate it further.
\iteme[{\tt ose=}{\it strat}]
Set the parallel ordering strategy that is used on every distributed
separator of the separators tree.
\iteme[{\tt osq=}{\it strat}]
Set the sequential ordering strategy that is used on every centralized
subgraph of the separators tree, once the nested dissection process has
gone far enough such that the number of processes handling some
subgraph is restricted to one.
\iteme[{\tt sep=}{\it strat}]
Set the parallel node separation strategy that is used on every
current leaf of the separators tree to make it grow. Parallel
node separation strategies are described below, in
section~\ref{sec-lib-format-pnsep}.
\end{itemize}
\iteme[{\tt q}]
Sequential ordering method. The distributed graph is gathered onto a
single process which runs a sequential ordering strategy. The only
parameter of the sequential method is given below.
\begin{itemize}
\iteme[{\tt strat=}{\it strat}]
Set the sequential ordering strategy that is applied to the
centralized graph. For a description of all of the available
sequential ordering methods, please refer to the
{\it\scotch\ User's Guide}~\scotchcitesuser.
\end{itemize}
\iteme[{\tt s}]
Simple method. Vertices are ordered in their natural order. This
method is fast, and should be used to order separators if the number
of extra-diagonal blocks is not relevant
\end{itemize}

\subsubsection{Parallel node separation strategy strings}
\label{sec-lib-format-pnsep}

A parallel node separation strategy is made of one or several parallel
node separation methods, which can be combined by means of strategy
operators. Strategy operators are listed below, by increasing
precedence.

\begin{itemize}
\iteme[{\it strat1\/}{\tt |}{\it strat2}]
Selection operator. The result of the selection is the best vertex separator of
the two that are obtained by the distinct application of {\it strat1\/} and
{\it strat2\/} to the current separator.
\iteme[{\it strat1$\:$}{\it strat2}]
Combination operator. Strategy {\it strat2\/} is applied to the vertex
separator resulting from the application of strategy {\it strat1\/} to the
current separator. Typically, the first method used should compute an initial
separation from scratch, and every following method should use the
result of the previous one as a starting point.
\iteme[{\tt (}{\it strat\/}{\tt )}]
Grouping operator.
The strategy enclosed within the parentheses is treated as a single
separation method.
\iteme[{\tt /}{\it cond\/}{\tt ?}{\it strat1\/}{[{\tt :}{\it strat2}]{\tt ;}}]
Condition operator. According to the result of the evaluation of condition
{\it cond}, either {\it strat1\/} or {\it strat2\/} (if it is present) is
applied. The condition applies to the characteristics of the current subgraph,
and can be built from logical and relational operators. Conditional
operators are listed below, by increasing precedence.
\begin{itemize}
\iteme[{\it cond1\/}{\tt |}{\it cond2}]
Logical or operator. The result of the condition is true if {\it cond1\/}
or {\it cond2\/} are true, or both.
\iteme[{\it cond1\/}{\tt \&}{\it cond2}]
Logical and operator. The result of the condition is true only if both
{\it cond1\/} and {\it cond2\/} are true.
\iteme[{\tt !}{\it cond}]
Logical not operator. The result of the condition is true only if
{\it cond\/} is false.
\iteme[{\it var} {\it relop} {\it val}]
Relational operator, where {\it var\/} is a graph or node variable,
{\it val\/} is either a graph or node variable or a constant of the type of
variable {\it var\/}, and {\it relop\/} is one of
'{\tt\verb+<+}', '{\tt\verb+=+}', and '{\tt\verb+>+}'.
The graph and node variables are listed below, along with their types.
\begin{itemize}
\iteme[{\tt edge}]
The global number of edges of the current subgraph.
Integer.
\iteme[{\tt levl}]
The level of the subgraph in the separators tree, starting from zero
for the initial graph at the root of the tree.
Integer.
\iteme[{\tt load}]
The overall sum of the vertex loads of the subgraph. It is equal to
{\tt vert} if the graph has no vertex loads.
Integer.
\iteme[{\tt proc}]
The number of processes on which the current subgraph is distributed
at this level of the nested dissection process.
Integer.
\iteme[{\tt rank}]
The rank of the current process among the group of processes on
which the current subgraph is distributed at this level of the
nested dissection process.
Integer.
\iteme[{\tt vert}]
The number of vertices of the current subgraph.
Integer.
\end{itemize}
\end{itemize}
The currently available parallel vertex separation methods are the
following.
\begin{itemize}
\iteme[{\tt b}]
Band method. Basing on the current distributed graph and on its
partition, this method creates a new distributed graph reduced to the
vertices which are at most at a given distance from the current
separator, runs a parallel vertex separation strategy on this graph,
and prolongs back the new separator to the current graph. This method
is primarily used to run separator refinement methods during the
uncoarsening phase of the multi-level parallel graph separation
method. The parameters of the band method are listed below.
\begin{itemize}
\iteme[{\tt strat=}{\it strat}]
Set the parallel vertex separation strategy to be applied to the band
graph.
\iteme[{\tt width=}{\it val}]
Set the maximum distance from the current separator of vertices to be
kept in the band graph. $0$ means that only separator vertices
themselves are kept, $1$ that immediate neighboring vertices are kept
too, and so on.
\end{itemize}
\iteme[{\tt m}]
Parallel vertex multi-level method. The parameters of the vertex
multi-level method are listed below.
\begin{itemize}
\iteme[{\tt asc=}{\it strat}]
Set the strategy that is used to refine the distributed vertex
separators obtained at ascending levels of the uncoarsening phase by
prolongation of the separators computed for coarser graphs.
% or meshes.
This strategy is not applied to the coarsest graph,
% or mesh
for which only the {\tt low} strategy is used.
\iteme[{\tt dlevl=}{\it nbr}]
Set the minimum level after which duplication is allowed in the
folding process. A value of $-1$ results in duplication being always
performed when folding.
\iteme[{\tt dvert=}{\it nbr}]
Set the average number of vertices per process under which
the folding process is performed during the coarsening phase.
\iteme[{\tt low=}{\it strat}]
Set the strategy that is used to compute the vertex separator of the
coarsest distributed graph,
% or mesh
at the lowest level of the coarsening process.
\iteme[{\tt rat=}{\it rat}]
Set the threshold maximum coarsening ratio over which graphs
% or meshes
are no longer coarsened. The ratio of any given coarsening cannot be
less that $0.5$ (case of a perfect matching), and cannot be greater
than $1.0$. Coarsening stops when either the coarsening ratio is
above the maximum coarsening ratio, or the graph
% or mesh
has fewer node vertices than the minimum number of vertices allowed.
\iteme[{\tt vert=}{\it nbr}]
Set the threshold minimum size under which graphs
% or meshes
are no longer coarsened. Coarsening stops when either the coarsening
ratio is above the maximum coarsening ratio, or the graph
% or mesh
has fewer node vertices than the minimum number of vertices allowed.
\end{itemize}
\iteme[{\tt q}]
Multi-sequential method. The current distributed graph and its
separator are centralized on every process that holds a part of it, and
a sequential vertex separation method is applied independently to each
of them. Then, the best separator found is prolonged back to the
distributed graph. This method is primarily designed to operate on
band graphs, which are orders of magnitude smaller than their parent
graph. Else, memory bottlenecks are very likely to occur.
The parameters of the multi-sequential method are listed below.
\begin{itemize}
\iteme[{\tt strat=}{\it strat}]
Set the sequential vertex separation strategy that is used to refine
the separator of the centralized graph. For a description of all of
the available sequential methods, please refer to the
{\it\scotch\ User's Guide}~\scotchcitesuser.
\end{itemize}
\iteme[{\tt z}]
Zero method. This method moves all of the node vertices to the first
part, resulting in an empty separator. Its main use is to stop the
separation process whenever some condition is true.
\end{itemize}
\end{itemize}

\subsection{Distributed graph handling routines}
\label{sec-lib-dgraph}

\subsubsection{{\tt SCOTCH\_dgraphAlloc}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
SCOTCH\_Dgraph * SCOTCH\_dgraphAlloc ( & void)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphAlloc} function allocates a memory area of a
size sufficient to store a {\tt SCOTCH\_\lbt Dgraph} structure. It is
the user's responsibility to free this memory when it is no longer
needed. The allocated space must be initialized before use, by means
of the {\tt SCOTCH\_\lbt dgraph\lbt Init} routine.

\progret

{\tt SCOTCH\_dgraphAlloc} returns the pointer to the memory area if it
has been successfully allocated, and {\tt NULL} else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphBand}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphBand ( & SCOTCH\_Dgraph * const    & orggrafptr, \\
                         & const SCOTCH\_Num         & fronlocnbr, \\
                         & const SCOTCH\_Num * const & fronloctab, \\
                         & const SCOTCH\_Num         & distval,    \\
                         & SCOTCH\_Dgraph * const    & bndgrafptr) \\
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphband ( & doubleprecision (*)   & orggrafdat, \\
                    & integer*{\it num}     & seedlocnbr, \\
                    & integer*{\it num} (*) & seedloctab, \\
                    & integer*{\it num}     & distval,    \\
                    & doubleprecision (*)   & bndgrafdat, \\
                    & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphBand} routine creates in the {\tt SCOTCH\_\lbt
Dgraph} structure pointed to by {\tt bndgrafptr} a distributed band
graph induced from the {\tt SCOTCH\_\lbt Dgraph} pointed to by {\tt
orggrafptr}. The distributed band graph will contain all the vertices
of the original graph located at a distance smaller than or equal to
{\tt distval} from any vertex provided in the {\tt seedloctab} lists
of seed vertices.

On each process, the {\tt seedloctab} array should contain the local
indices of the local vertices that will serve as seeds. The number of
such local vertices is passed to {\tt SCOTCH\_\lbt dgraph\lbt Band} in
the {\tt seedlocnbr} value. The size of the {\tt seedloctab} array
should be at least equal to the number of local vertices of the
original graph, as it is internally used as a queue array. Hence, no
user data should be placed immediately after the {\tt seedlocnbr}
values in the array, as they are most likely to be overwritten by
{\tt SCOTCH\_\lbt dgraph\lbt Band}.

{\tt bndgrafptr} must have been initialized with the
{\tt SCOTCH\_\lbt dgraph\lbt Init} routine before
{\tt SCOTCH\_dgraph\lbt Band} is called. The communicator that is
passed to it can either be the communicator used by the original graph
{\tt org\lbt graf\lbt ptr}, or any congruent communicator created by
using {\tt MPI\_\lbt Comm\_\lbt dup} on this communicator. Using a
distinct communicator for the induced band graph allows subsequent
library routines to be called in parallel on the two graphs after the
band graph is created.

Induced band graphs have vertex labels attached to each of their
vertices, in the {\tt vlbl\lbt loc\lbt tab} array. If the original
graph had vertex labels attached to it, band graph vertex labels are
the labels of the corresponding vertices in the original graph. Else,
band graph vertex labels are the global indices of corresponding
vertices in the original graph.

Depending on original graph vertex and seed distributions, the
distribution of induced band graph vertices may be highly
imbalanced. In order for further computations on this distributed
graph to scale well, a redistribution of its data may be necessary,
using the {\tt SCOTCH\_dgraph\lbt Redist} routine.

\progret

{\tt SCOTCH\_dgraphBand} returns $0$ if the band graph structure has
been successfully created, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphBuild}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphBuild ( & SCOTCH\_Dgraph *    & grafptr,    \\
                          & const SCOTCH\_Num   & baseval,    \\
                          & const SCOTCH\_Num   & vertlocnbr, \\
                          & const SCOTCH\_Num   & vertlocmax, \\
                          & const SCOTCH\_Num * & vertloctab, \\
                          & const SCOTCH\_Num * & vendloctab, \\
                          & const SCOTCH\_Num * & veloloctab, \\
                          & const SCOTCH\_Num * & vlblocltab, \\
                          & const SCOTCH\_Num   & edgelocnbr, \\
                          & const SCOTCH\_Num   & edgelocsiz, \\
                          & const SCOTCH\_Num * & edgeloctab, \\
                          & const SCOTCH\_Num * & edgegsttab, \\
                          & const SCOTCH\_Num * & edloloctab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphbuild ( & doubleprecision (*)   & grafdat,    \\
                     & integer*{\it num}     & baseval,    \\
                     & integer*{\it num}     & vertlocnbr, \\
                     & integer*{\it num}     & vertlocmax, \\
                     & integer*{\it num} (*) & vertloctab, \\
                     & integer*{\it num} (*) & vendloctab, \\
                     & integer*{\it num} (*) & veloloctab, \\
                     & integer*{\it num} (*) & vlblloctab, \\
                     & integer*{\it num}     & edgelocnbr, \\
                     & integer*{\it num}     & edgelocsiz, \\
                     & integer*{\it num} (*) & edgeloctab, \\
                     & integer*{\it num} (*) & edgegsttab, \\
                     & integer*{\it num} (*) & edloloctab, \\
                     & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphBuild} routine fills the distributed source
graph structure pointed to by {\tt grafptr} with all of the data that
are passed to it.

{\tt baseval} is the graph base value for index arrays (typically $0$ for
structures built from C and $1$ for structures built from Fortran).
{\tt vertlocnbr} is the number of local vertices on the calling
process, used to create the {\tt proccnttab} array.
{\tt vertlocmax} is the maximum number of local vertices to be created
on the calling process, used to create the {\tt proc\lbt vrt\lbt tab}
array of global indices, and which must be set to {\tt vert\lbt
loc\lbt nbr} for graphs wihout holes in their global numbering.
{\tt vertloctab} is the local adjacency index array, of size $({\tt
vertlocnbr} + 1)$ if the edge array is compact (that is, if {\tt
vendloctab} equals $\mbox{\tt vertloctab}+1$ or {\tt NULL}), or of
size {\tt vertlocnbr} else.
{\tt vendloctab} is the adjacency end index array, of size {\tt
vertlocnbr} if it is disjoint from {\tt vertloctab}.
{\tt veloloctab} is the local vertex load array, of size
{\tt vertlocnbr} if it exists.
{\tt vlblloctab} is the local vertex label array, of size
{\tt vertlocnbr} if it exists.
{\tt edgelocnbr} is the local number of arcs (that is, twice the
number of edges), including arcs to local vertices as well as to
ghost vertices.
{\tt edgelocsiz} is lower-bounded by the minimum size of the edge
array required to encompass all used adjacency values; it is therefore
at least equal to the maximum of the {\tt vendloctab} entries, over
all local vertices, minus {\tt baseval}; it can be set to {\tt
edgelocnbr} if the edge array is compact.
{\tt edgeloctab} is the local adjacency array, of size at least
{\tt edgelocsiz}, which stores the global indices of end vertices.
{\tt edgegsttab} is the adjacency array, of size at least
{\tt edgelocsiz}, if it exists; if {\tt edgegsttab} is given, it is
assumed to be a pointer to an empty array to be filled with ghost
vertex data computed by {\tt SCOTCH\_dgraph\lbt Ghst} whenever
needed by communication routines such as
{\tt SCOTCH\_dgraph\lbt Halo}.
{\tt edloloctab} is the arc load array, of size {\tt edgelocsiz}
if it exists.

The {\tt vendloctab}, {\tt veloloctab}, {\tt vlblloctab},
{\tt edloloctab} and {\tt edgegsttab} arrays are optional,
and a null pointer can be passed as argument whenever
they are not defined.

Note that, for \ptscotch\ to operate properly, either all the arrays
of a kind must be set to null on all processes, or else all of them
must be non null. This is mandatory because some algorithms require that
collective communication operations be performed when some kind of
data is present. If some processes considered that the arrays are
present, and start such communications, while others did not, a
deadlock would occur. In most cases, this situation will be
anticipated and an error message will be issued, stating that graph
data are inconsistent.

The situation above may accidentally arise when some processes don't
own any edge or vertex. In that case, depending on the implementation,
a user \texttt{malloc} of size zero may return a null pointer rather
than a non null pointer to an area of size zero, leading to the
aforementioned inconsistencies. In order to avoid this problem, it is
necessary to ensure that no null pointer will be returned, even in the
case when zero bytes are requested. A workaround can be to call
\texttt{malloc (\textit{x} | 4)} instead of \texttt{malloc
(\textit{x})}. The ``\texttt{| 4}'' will consume only $4$ extra bytes
at most, depending on the value of \texttt{\textit{x}}.

Since, in Fortran, there is no null reference, passing the
{\tt scotchf\lbt dgraph\lbt build} routine a reference equal to
{\tt vertloctab} in the {\tt veloloctab} or {\tt vlblloctab} fields
makes them be considered as missing arrays. The same holds for
{\tt edloloctab} and {\tt edgegsttab} when they are passed a
reference equal to {\tt edgeloctab}. Setting {\tt vendloctab}
to refer to one cell after {\tt vertloctab} yields the same result,
as it is the exact semantics of a compact vertex array.

To limit memory consumption, {\tt SCOTCH\_\lbt dgraph\lbo Build} does
not copy array data, but instead references them in the {\tt
SCOTCH\_\lbt Dgraph} structure. Therefore, great care should be taken
not to modify the contents of the arrays passed to {\tt SCOTCH\_\lbt
dgraph\lbo Build} as long as the graph structure is in use. Every
update of the arrays should be preceded by a call to {\tt SCOTCH\_\lbt
dgraph\lbo Free}, to free internal graph structures, and eventually
followed by a new call to {\tt SCOTCH\_\lbt dgraph\lbo Build} to
re-build these internal structures so as to be able to use the new
distributed graph.

To ensure that inconsistencies in user data do not result in an
erroneous behavior of the \libscotch\ routines, it is recommended, at
least in the development stage of your application code, to call the
{\tt SCOTCH\_\lbt dgraph\lbt Check} routine on the newly created
{\tt SCOTCH\_\lbt Dgraph} structure before calling any other
\libscotch\ routine.

\progret

{\tt SCOTCH\_dgraphBuild} returns $0$ if the graph structure has been
successfully set with all of the input data, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphCheck}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphCheck ( & const SCOTCH\_Dgraph * & grafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphcheck ( & doubleprecision (*) & grafdat, \\
                     & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphCheck} routine checks the consistency of the
given {\tt SCOTCH\_\lbt Dgraph} structure. It can be used in client
applications to determine if a graph which has been created from
user-generated data by means of the {\tt SCOTCH\_\lbt dgraph\lbt Build}
routine is consistent, prior to calling any other routines of the
\libscotch\ library which would otherwise return internal error
messages or crash the program.

\progret

{\tt SCOTCH\_dgraphCheck} returns $0$ if graph data are consistent, and
$1$ else.

\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphCoarsen}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphCoarsen ( & SCOTCH\_Dgraph * const & finegrafptr, \\
                            & const SCOTCH\_Num      & coarnbr,     \\
                            & const double           & coarrat,     \\
                            & const SCOTCH\_Num      & flagval,     \\
                            & SCOTCH\_Dgraph * const & coargrafptr, \\
                            & SCOTCH\_Num * const    & multloctab)  \\
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphcoarsen ( & doubleprecision (*)   & finegrafdat, \\
                       & integer*{\it num}     & coarnbr,     \\
                       & doubleprecision       & coarrat,     \\
                       & integer*{\it num}     & flagval,     \\
                       & doubleprecision (*)   & coargrafdat, \\
                       & integer*{\it num} (*) & multloctab,  \\
                       & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphCoarsen} routine creates in the
{\tt SCOTCH\_\lbt Dgraph} structure pointed to by {\tt coargrafptr}
a distributed coarsened graph from the {\tt SCOTCH\_\lbt Dgraph}
pointed to by {\tt finegrafptr}. The coarsened graph is created only
if it is comprises more than {\tt coarnbr} vertices, or if the
coarsening ratio is lower than {\tt coarrat}. Valid coarsening ratio
values range from $0.5$ (in the case of a perfect matching) to $1.0$
(if no vertex could be coarsened). Classical threshold values range
from $0.7$ to $0.8$.

The {\tt flagval} flag specifies the type of coarsening. Several
groups of flags can be combined, by means of addition or
``binary or'' operators. When {\tt SCOTCH\_\lbt COARSEN\lbt
NO\lbt MERGE} is set, isolated vertices are never merged with other
vertices. This preserves the topology of the graph, at the expense of
a higher coarsening ratio. When {\tt SCOTCH\_\lbt COARSEN\lbt
FOLD} or {\tt SCOTCH\_\lbt COARSEN\lbt FOLD\lbt DUP} are set, if a
coarsened graph is created, it is folded onto half of the processes of
the initial communicator. In the case of {\tt SCOTCH\_\lbt COARSEN\lbt
FOLD\lbt DUP}, a second copy is created (duplicated) onto the other
half. The two copies may not be identical, if the number of processors
of the finer graph is odd.

The {\tt multloctab} array must be of a size that is big enough to
store multinode data for the resulting coarsened graph. This array
will contain pairs of consecutive {\tt SCOTCH\_\lbt Num} values,
representing the global indices of the two fine vertices that have
been coarsened into each of the local coarse vertices. In case of
plain coarsening, the size of the array must be at least twice the
maximum expected number of local coarse vertices, that is, on each
processor, twice the value of {\tt vert\lbt loc\lbt nbr} of the finer
graph, because in the worst case no coarsening may happen on some
processor. In case of folding, a redistribution of vertices is
performed. Hence, the maximum number of coarse vertices on some
processor is upper-bounded by the expected maximum global number of
coarse vertices, divided by the resulting number of processors, that
is, the integer floor value of half of the number of processors of the
finer graph.

{\tt coargrafptr} must have been initialized with the
{\tt SCOTCH\_\lbt dgraph\lbt Init} routine before
{\tt SCOTCH\_dgraph\lbt Coarsen} is called. The communicator that is
passed to it can either be the communicator used by the fine graph
{\tt fine\lbt graf\lbt ptr}, or any congruent communicator created by
using {\tt MPI\_\lbt Comm\_\lbt dup} on this communicator. Using a
distinct communicator for the coarsened subgraph allows subsequent
library routines to be called in parallel on the two graphs after the
coarse graph is created.

Depending on the way vertex mating is performed, the distribution of
coarsened graph vertices may be imbalanced. In order for further
computations on this distributed graph to scale well, a redistribution
of its data might be necessary, using the
{\tt SCOTCH\_dgraph\lbt Redist} routine.

\progret

{\tt SCOTCH\_dgraphCoarsen} returns $0$ if the coarse graph
structure has been successfully created, $1$ if the coarse graph was
not created because it did not enforce the threshold parameters, and
$2$ on error.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphData}}
\label{sec-lib-func-scotchdgraphdata}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphData ( & const SCOTCH\_Graph * & grafptr,    \\
                          & SCOTCH\_Num *         & baseptr,    \\
                          & SCOTCH\_Num *         & vertglbptr, \\
                          & SCOTCH\_Num *         & vertlocptr, \\
                          & SCOTCH\_Num *         & vertlocptz, \\
                          & SCOTCH\_Num *         & vertgstptr, \\
                          & SCOTCH\_Num **        & vertloctab, \\
                          & SCOTCH\_Num **        & vendloctab, \\
                          & SCOTCH\_Num **        & veloloctab, \\
                          & SCOTCH\_Num **        & vlblloctab, \\
                          & SCOTCH\_Num *         & edgeglbptr, \\
                          & SCOTCH\_Num *         & edgelocptr, \\
                          & SCOTCH\_Num *         & edgelocptz, \\
                          & SCOTCH\_Num **        & edgeloctab, \\
                          & SCOTCH\_Num **        & edgegsttab, \\
                          & SCOTCH\_Num **        & edloloctab, \\
                          & MPI\_Comm *           & comm)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphdata ( & doubleprecision (*)   & grafdat,    \\
                    & integer*{\it num} (*) & indxtab,    \\
                    & integer*{\it num}     & baseval,    \\
                    & integer*{\it num}     & vertglbnbr, \\
                    & integer*{\it num}     & vertlocnbr, \\
                    & integer*{\it num}     & vertlocmax, \\
                    & integer*{\it num}     & vertgstnbr, \\
                    & integer*{\it idx}     & vertlocidx, \\
                    & integer*{\it idx}     & vendlocidx, \\
                    & integer*{\it idx}     & velolocidx, \\
                    & integer*{\it idx}     & vlbllocidx, \\
                    & integer*{\it num}     & edgeglbnbr, \\
                    & integer*{\it num}     & edgelocnbr, \\
                    & integer*{\it num}     & edgelocsiz, \\
                    & integer*{\it idx}     & edgelocidx, \\
                    & integer*{\it idx}     & edgegstidx, \\
                    & integer*{\it idx}     & edlolocidx, \\
                    & integer               & comm)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphData} routine is the dual of the
{\tt SCOTCH\_\lbt dgraph\lbo Build} routine. It is a multiple
accessor that returns scalar values and array references.

{\tt baseptr} is the pointer to a location that will hold the graph base
value for index arrays (typically $0$ for
structures built from C and $1$ for structures built from Fortran).
{\tt vertglbptr} is the pointer to a location that will hold the
global number of vertices.
{\tt vertlocptr} is the pointer to a location that will hold the
number of local vertices.
{\tt vertlocptz} is the pointer to a location that will hold the
maximum allowed number of local vertices, that is,
$(\mbox{\tt proc\lbt vrt\lbt tab[}p + 1\mbox{\tt]} -
\mbox{\tt proc\lbt vrt\lbt tab[}p\mbox{\tt]})$, where $p$ is the
rank of the local process.
{\tt vertgstptr} is the pointer to a location that will hold the
number of local and ghost vertices if it has already been computed
by a prior call to {\tt SCOTCH\_\lbt dgraph\lbo Ghst}, and $-1$ else.
{\tt vertloctab} is the pointer to a location that will hold the
reference to the adjacency index array, of size
$\mbox{\tt *vertlocptr} + 1$ if the adjacency array is compact,
or of size {\tt *vertlocptr} else.
{\tt vendloctab} is the pointer to a location that will hold the
reference to the adjacency end index array, and is equal to
$\mbox{\tt vertloctab} + 1$ if the adjacency array is compact.
{\tt veloloctab} is the pointer to a location that will hold the
reference to the vertex load array, of size {\tt *vertlocptr}.
{\tt vlblloctab} is the pointer to a location that will hold the
reference to the vertex label array, of size {\tt vertlocnbr}.
{\tt edgeglbptr} is the pointer to a location that will hold the
global number of arcs (that is, twice the number of global edges).
{\tt edgelocptr} is the pointer to a location that will hold the
number of local arcs (that is, twice the number of local edges).
{\tt edgelocptz} is the pointer to a location that will hold the
declared size of the local edge array, which must encompass all
used adjacency values; it is at least equal to {\tt *edgelocptr}.
{\tt edgeloctab} is the pointer to a location that will hold the
reference to the local adjacency array of global indices, of size
at least {\tt *edgelocptz}.
{\tt edgegsttab} is the pointer to a location that will hold the
reference to the ghost adjacency array, of size at least
{\tt *edgelocptz}; if it is non null, its data are valid if
{\tt vertgstnbr} is non-negative.
{\tt edloloctab} is the pointer to a location that will hold the
reference to the arc load array, of size {\tt *edgelocptz}.
{\tt comm} is the pointer to a location that will hold the MPI
communicator of the distributed graph.

Any of these pointers can be set to {\tt NULL} on input if the
corresponding information is not needed. Else, the reference to a
dummy area can be provided, where all unwanted data will be written.

Since there are no pointers in Fortran, a specific mechanism is used
to allow users to access graph arrays. The {\tt scotchf\lbt dgraph\lbt
data} routine is passed an integer array, the first element of which
is used as a base address from which all other array indices are
computed. Therefore, instead of returning references, the routine
returns integers, which represent the starting index of each of the
relevant arrays with respect to the base input array, or {\tt vert\lbt
loc\lbt idx}, the index of {\tt vert\lbt loc\lbt tab}, if they do not
exist. For instance, if some base array {\tt myarray\lbt (1)} is
passed as parameter {\tt indxtab}, then the first cell of array {\tt
vert\lbt loc\lbt tab} will be accessible as {\tt myarray\lbt
(vert\lbt loc\lbt idx)}.
In order for this feature to behave properly, the {\tt indxtab}
array must be word-aligned with the graph arrays. This is
automatically enforced on most systems, but some care should be
taken on systems that allow to access data that is not
word-aligned. On such systems, declaring the array after a
dummy {\tt double\lbt precision} array can coerce the compiler
into enforcing the proper alignment. The integer value returned in
{\tt comm} is the communicator itself, not its index with respect to
{\tt indxtab}. Also, on 32\_64 architectures,
such indices can be larger than the size of a regular
{\tt INTEGER}. This is why the indices to be returned are defined by
means of a specific integer type. See
Section~\ref{sec-lib-inttypesize} for more information on this
issue.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphExit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphExit ( & SCOTCH\_Dgraph * & grafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphexit ( & doubleprecision (*) & grafdat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphExit} function frees the contents of a
{\tt SCOTCH\_\lbt Dgraph} structure previously initialized by
{\tt SCOTCH\_\lbt dgraphInit}. All subsequent calls to
{\tt SCOTCH\_\lbt dgraph} routines other than {\tt SCOTCH\_\lbt
dgraphInit}, using this structure as parameter, may yield
unpredictable results.

If {\tt SCOTCH\_\lbt dgraph\lbt Init} was called with a
communicator that is not a predefined MPI communicator, it is
the user's responsibility to free this communicator after
all graphs that use it have been freed by means of the
{\tt SCOTCH\_\lbt dgraph\lbt Exit} routine.

\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphFree}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphFree ( & SCOTCH\_Dgraph * & grafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphfree ( & doubleprecision (*) & grafdat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphFree} function frees the graph data of a
{\tt SCOTCH\_\lbt Dgraph} structure previously initialized by
{\tt SCOTCH\_\lbt dgraph\lbt Init}, but preserves its internal
communication data structures. This call is equivalent to
a call to {\tt SCOTCH\_\lbt dgraph\lbt Exit} immediately
followed by a call to {\tt SCOTCH\_\lbt dgraph\lbt Init} with the
same communicator as in the previous {\tt SCOTCH\_\lbt dgraph\lbt
Init} call. Consequently, the given {\tt SCOTCH\_\lbt Dgraph}
structure remains ready for subsequent calls to any distributed
graph handling routine of the \libscotch\ library.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphGather}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphGather ( & SCOTCH\_Dgraph * const      & dgrfptr, \\
                           & const SCOTCH\_Graph * const & cgrfptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphgather ( & doubleprecision (*) & dgrfdat, \\
                      & doubleprecision (*) & cgrfdat, \\
                      & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphGather} routine gathers the contents of the
distributed {\tt SCOTCH\_\lbt Dgraph} structure pointed to by {\tt
dgrfptr} to the centralized {\tt SCOTCH\_\lbt Graph} structure(s)
pointed to by {\tt cgrfptr}.

If only one of the processes has a non-null {\tt cgrfptr}
pointer, it is considered as the root process to which distributed
graph data is sent. Else, all of the processes must provide a valid
{\tt cgrfptr} pointer, and each of them will receive a copy of
the centralized graph.

\progret

{\tt SCOTCH\_dgraphGather} returns $0$ if the graph structure has
been successfully gathered, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphInducePart}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphInducePart ( & SCOTCH\_Dgraph * const    & orggrafptr,    \\
                               & const SCOTCH\_Num * const & orgpartloctab, \\
                               & const SCOTCH\_Num         & indpartval,    \\
                               & const SCOTCH\_Num         & indvertlocnbr, \\
                               & SCOTCH\_Dgraph * const    & indgrafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphinducepart ( & doubleprecision (*)   & orggrafdat,    \\
                          & integer*{\it num} (*) & orgpartloctab, \\
                          & integer*{\it num}     & indpartval,    \\
                          & integer*{\it num}     & indvertlocnbr, \\
                          & doubleprecision (*)   & indgrafdat,    \\
                          & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphInducePart} routine creates in the
{\tt SCOTCH\_\lbt Dgraph} structure pointed to by {\tt indgrafptr}
a distributed induced subgraph of the {\tt SCOTCH\_\lbt Dgraph}
pointed to by {\tt orggrafptr}. The local vertices of every processor
that are kept in the induced subgraph are the ones for which the
values contained in the {\tt orgpart\lbt loctab} array cells are equal
to {\tt indpartval}.

The {\tt orgpartloctab} array must be of a size at least equal to
the number of local vertices of the original graph. It may be larger,
e.g. equal to the number of local plus ghost vertices, if needed by
the user, but only the area corresponding to the local vertices will
be used by {\tt SCOTCH\_\lbt dgraph\lbt Induce\lbt Part}.

{\tt indvertlocnbr} is the number of local vertices in the induced
subgraph. It must therefore be equal to the number of local vertices
that have their associated {\tt org\lbt part\lbt loc\lbt tab} cell
value equal to {\tt indpartval}. This value is necessary to internal
array allocations. While it could have been easily computed by
\scotch, by traversing the {\tt orgpart\lbt gsttab} array, making it
used-provided spares such a traversal if the user already knows the
value. If it is not the case, setting this value to {\tt -1} will make
\scotch\ compute it automatically.

{\tt indgrafptr} must have been initialized with the {\tt SCOTCH\_\lbt
dgraph\lbt Init} routine before {\tt SCOTCH\_dgraph\lbt Induce\lbt
Part} is called. The communicator that is passed to it can either be
the communicator used by the original graph {\tt org\lbt graf\lbt
ptr}, or any congruent communicator created by using {\tt MPI\_\lbt
Comm\_\lbt dup} on this communicator. Using a distinct communicator
for the induced subgraph allows subsequent library routines to be
called in parallel on the two graphs after the induced graph is
created.

Induced band graphs have vertex labels attached to each of their
vertices, in the {\tt vlbl\lbt loc\lbt tab} array. If the original
graph had vertex labels attached to it, induced graph vertex labels
are the labels of the corresponding vertices in the original
graph. Else, induced graph vertex labels are the global indices of
corresponding vertices in the original graph.

Depending on the partition array, the distribution of induced graph
vertices may be highly imbalanced. In order for further computations
on this distributed graph to scale well, a redistribution of its data
may be necessary, using the {\tt SCOTCH\_dgraph\lbt Redist} routine.

\progret

{\tt SCOTCH\_dgraphInducePart} returns $0$ if the induced graph
structure has been successfully created, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphInit}}
\label{sec-lib-dgraphinit}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphInit ( & SCOTCH\_Dgraph * & grafptr, \\
                         & MPI\_Comm        & comm)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphinit ( & doubleprecision (*) & grafdat, \\
                    & integer             & comm, \\
                    & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphInit} function initializes a {\tt SCOTCH\_\lbt
Dgraph} structure so as to make it suitable for future parallel
operations. It should be the first function to be called upon a {\tt
SCOTCH\_\lbt Dgraph} structure. By accessing the communicator handle
which is passed to it, {\tt SCOTCH\_dgraphInit} can know how many
processes will be used to manage the distributed graph and can allocate
its private structures accordingly.

{\tt SCOTCH\_dgraphInit} does not make a duplicate of the communicator
which is passed to it, but instead keeps a reference to it, so that
all future communications needed by \libscotch\ to process this graph
will be performed using this communicator. Therefore, it is the user's
responsibility, whenever several \libscotch\ routines might be called
in parallel, to create appropriate duplicates of communicators so as
to avoid any potential interferences between concurrent
communications.

When the distributed graph is no longer of use, the {\tt
SCOTCH\_\lbt dgraph\lbt Exit} function must be called, to free
its internal data arrays.

If {\tt SCOTCH\_\lbt dgraph\lbt Init} was called with a
communicator that is not a predefined MPI communicator (such as
{\tt MPI\_\tt COMM\_\lbt WORLD} or {\tt MPI\_\tt COMM\_\lbt SELF}), it
is the user's responsibility to free this communicator after
all graphs that use it have been freed by means of the
{\tt SCOTCH\_\lbt dgraph\lbt Exit} routine.

\progret

{\tt SCOTCH\_dgraphInit} returns $0$ if the graph structure has been
successfully initialized, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphRedist}}
\label{sec-lib-dgraphredist}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}

int SCOTCH\_dgraphRedist ( & SCOTCH\_Dgraph * const    & orggrafptr, \\
                           & const SCOTCH\_Num * const & partloctab, \\
                           & const SCOTCH\_Num * const & permgsttab, \\
                           & const SCOTCH\_Num         & vertlocdlt, \\
                           & const SCOTCH\_Num         & edgelocdlt, \\
                           & SCOTCH\_Dgraph * const    & redgrafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphredist ( & doubleprecision (*)   & orggrafdat, \\
                      & integer*{\it num} (*) & partloctab, \\
                      & integer*{\it num} (*) & permgsttab, \\
                      & integer*{\it num}     & vertlocdlt, \\
                      & integer*{\it num}     & edgelocdlt, \\
                      & doubleprecision (*)   & redgrafptr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphRedist} routine initializes and fills the
redistributed graph structure pointed to by
{\tt red\lbt graf\lbt ptr} with a new distributed graph
made from data redistributed from the original
graph pointed to by {\tt org\lbt graf\lbt ptr}.

The partition array, {\tt part\lbt loc\lbt tab}, must always be
provided. It holds the part number associated with each local
vertex. Part indices are {\em not\/} based: target vertices are
numbered from $0$ to the number of parts minus $1$.

Whenever provided, the permutation array {\tt perm\lbt gst\lbt tab}
must be of a size equal to the number of local and ghost vertices of
the source graph (that is, {\tt vert\lbt gst\lbt nbr}, see
Section~\ref{sec-lib-format-dgraph}). Its contents must be based, that
is, permutation global indices start at {\tt baseval}. Both its local
and ghost contents must be valid. Consequently, it is the user's
responsibility to call {\tt SCOTCH\_dgraph\lbt Halo} whenever
necessary so as to propagate part values of the local vertices to
their ghost counterparts on other processes. {\tt SCOTCH\_\lbt
dgraph\lbt Redist} does not perform this halo exchange itself
because users may already have computed these values by themselves
when computing the new partition. If {\tt perm\lbt gst\lbt tab} is not
provided by the user, vertices in each part are reordered according to
their global indices in the source graph.

{\tt redgrafptr} must have been initialized with the {\tt SCOTCH\_\lbt
dgraph\lbt Init} routine before {\tt SCOTCH\_dgraph\lbt Redist} is
called. The communicator that is passed to it can either be the
communicator used by the original graph {\tt org\lbt graf\lbt ptr},
or any congruent communicator created by using {\tt MPI\_\lbt
Comm\_\lbt dup} on this communicator. Using a distinct communicator
for the redistributed graph allows subsequent library routines to be
called in parallel on the two graphs after the redistributed graph is
created.

Redistributed graphs have vertex labels attached to each of their
vertices, in the {\tt vlbl\lbt loc\lbt tab} array. If the original
graph had vertex labels attached to it, redistributed graph vertex
labels are the labels of the corresponding vertices in the original
graph. Else, redistributed graph vertex labels are the global indices
of corresponding vertices in the original graph.

\progret

{\tt SCOTCH\_dgraphRedist} returns $0$ if the redistributed graph has
been successfully created, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphScatter}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphScatter ( & SCOTCH\_Dgraph * const      & dgrfptr, \\
                            & const SCOTCH\_Graph * const & cgrfptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphscatter ( & doubleprecision (*) & dgrfdat, \\
                       & doubleprecision (*) & cgrfdat, \\
                       & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphScatter} routine scatters the contents of the
centralized {\tt SCOTCH\_\lbt Graph} structure pointed to by
{\tt cgrfptr} across the processes of the distributed
{\tt SCOTCH\_\lbt Dgraph} structure pointed to by {\tt dgrfptr}.

Only one of the processes should provide a non-null {\tt cgrfptr}
parameter. This process is considered the root process for the
scattering operation. Since, in Fortran, there is no null reference,
processes which are not the root must indicate it by passing a pointer
to the distributed graph structure equal to the pointer to their
centralized graph structure.

The scattering is performed such that graph vertices are evenly
spread across the processes of the communicator associated with
the distributed graph, in ascending order. Every process receives
either
$\left\lceil\frac{\mbox{vertglbnbr}}{\mbox{procglbnbr}}\right\rceil$
or
$\left\lfloor\frac{\mbox{vertglbnbr}}{\mbox{procglbnbr}}\right\rfloor$
vertices, according to its rank: processes of lower ranks are filled
first, eventually with one more vertex than processes of higher ranks.

\progret

{\tt SCOTCH\_dgraphScatter} returns $0$ if the graph structure has
been successfully scattered, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphSize}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphSize ( & const SCOTCH\_Dgraph * & grafptr,    \\
                          & SCOTCH\_Num *          & vertglbptr, \\
                          & SCOTCH\_Num *          & vertlocptr, \\
                          & SCOTCH\_Num *          & edgeglbptr, \\
                          & SCOTCH\_Num *          & edgelocptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphsize ( & doubleprecision (*) & grafdat,    \\
                    & integer*{\it num}   & vertglbnbr, \\
                    & integer*{\it num}   & vertlocnbr, \\
                    & integer*{\it num}   & edgeglbnbr, \\
                    & integer*{\it num}   & edgelocnbr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphSize} routine fills the four areas of type
{\tt SCOTCH\_\lbt Num} pointed to by {\tt vertglbptr},
{\tt vertlocptr}, {\tt edgeglbptr} and {\tt edgelocptr}
with the number of global vertices and arcs (that is, twice the number
of edges) of the given graph pointed to by {\tt grafptr}, as well as
with the number of local vertices and arcs borne by each of the
calling processes.

Any of these pointers can be set to {\tt NULL} on input if the
corresponding information is not needed. Else, the reference to a
dummy area can be provided, where all unwanted data will be written.

This routine is useful to get the size of a graph read by means
of the {\tt SCOTCH\_\lbt dgraph\lbo Load} routine, in order to allocate
auxiliary arrays of proper sizes. If the whole structure of the
graph is wanted, function {\tt SCOTCH\_dgraph\lbo Data} should be
preferred.

\end{itemize}

\subsection{Distributed graph I/O routines}

\subsubsection{{\tt SCOTCH\_dgraphLoad}}
\label{sec-lib-dgraphload}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphLoad ( & SCOTCH\_Dgraph * & grafptr, \\
                         & FILE *           & stream, \\
                         & SCOTCH\_Num      & baseval, \\
                         & SCOTCH\_Num      & flagval)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphload ( & doubleprecision (*) & grafdat, \\
                    & integer             & fildes, \\
                    & integer*{\it num}   & baseval, \\
                    & integer*{\it num}   & flagval, \\
                    & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphLoad} routine fills the {\tt
SCOTCH\_\lbt Dgraph} structure pointed to by {\tt grafptr} with the
centralized or distributed source graph description available from
one or several streams {\tt stream} in the \scotch\ graph formats
(please refer to section~\ref{sec-file-dsgraph} for a description
of the distributed graph format, and to the {\it\scotch\ User's
Guide}~\scotchcitesuser\ for the centralized graph format).

When only one stream pointer is not null, the associated source graph
file must be a centralized one, the contents of which are spread
across all of the processes. When all stream pointers are non null,
they can either refer to multiple instances of the same centralized
graph, or to the distinct fragments of a distributed graph. In the
first case, all processes read all of the contents of the centralized
graph files but keep only the relevant part. In the second case, every
process reads its fragment in parallel.

To ease the handling of source graph files by programs written in C as
well as in Fortran, the base value of the graph to read can be set
to {\tt 0} or {\tt 1}, by setting the {\tt baseval} parameter to the
proper value. A value of {\tt -1} indicates that the graph base should
be the same as the one provided in the graph description that is read
from {\tt stream}.

The {\tt flagval} value is a combination of the following integer values,
that may be added or bitwise-ored:
\begin{itemize}
\iteme[{\tt 0}]
Keep vertex and edge weights if they are present in the {\tt stream} data.
\iteme[{\tt 1}]
Remove vertex weights. The graph read will have all of its vertex weights
set to one, regardless of what is specified in the {\tt stream} data.
\iteme[{\tt 2}]
Remove edge weights. The graph read will have all of its edge weights
set to one, regardless of what is specified in the {\tt stream} data.
\end{itemize}

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the graph file. Processes which would pass a
{\tt NULL} stream pointer in C must pass descriptor number {\tt -1} in
Fortran.

\progret

{\tt SCOTCH\_dgraphLoad} returns $0$ if the distributed graph
structure has been successfully allocated and filled with the data
read, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphSave}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphSave ( & const SCOTCH\_Dgraph * & grafptr, \\
                         & FILE *                 & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphsave ( & doubleprecision (*) & grafdat, \\
                    & integer             & fildes, \\
                    & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphSave} routine saves the contents of the {\tt
SCOTCH\_\lbt Dgraph} structure pointed to by {\tt grafptr} to streams
{\tt stream}, in the \scotch\ distributed graph format (see
section~\ref{sec-file-dsgraph}).

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the graph file.

\progret

{\tt SCOTCH\_dgraphSave} returns $0$ if the graph structure has been
successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsection{Data handling and exchange routines}

\subsubsection{{\tt SCOTCH\_dgraphGhst}}
\label{sec-lib-dgraphghst}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}

int SCOTCH\_dgraphGhst ( & SCOTCH\_Dgraph * const & grafptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphghst ( & doubleprecision (*) & grafdat, \\
                    & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphGhst} routine fills the {\tt edge\lbt gst\lbt
tab} arrays of the distributed graph structure pointed to by {\tt
grafptr} with the local and ghost vertex indices corresponding to the
global vertex indices contained in its {\tt edge\lbt loc\lbt tab}
arrays, according to the semantics described in
Section~\ref{sec-lib-format-dgraph}.

If memory areas had not been previously reserved by the user for the
{\tt edge\lbt gst\lbt tab} arrays and linked to the distributed graph
structure through a call to {\tt SCOTCH\_\lbt dgraph\lbt Build}, they
are allocated. Their references can be retrieved on every process by
means of a call to {\tt SCOTCH\_\lbt dgraph\lbt Data}, which will also
return the number of local and ghost vertices, suitable for allocating
vertex data arrays for {\tt SCOTCH\_\lbt dgraph\lbt Halo}.

\progret

{\tt SCOTCH\_dgraphGhst} returns $0$ if ghost vertex data has been
successfully computed, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphHalo}}
\label{sec-lib-dgraphhalo}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}

int SCOTCH\_dgraphHalo ( & SCOTCH\_Dgraph * const & grafptr, \\
                         & void *                 & datatab, \\
                         & MPI\_Datatype          & typeval)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphhalo ( & doubleprecision (*) & grafdat, \\
                    & doubleprecision (*) & datatab, \\
                    & integer             & typeval, \\
                    & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphHalo} routine propagates the data borne by
local vertices to all of the corresponding halo vertices located on
neighboring processes, in a synchronous way. On every process,
{\tt datatab} should point to a data array of a size sufficient to
hold {\tt vert\lbt gst\lbt nbr} elements of the data type to be
exchanged, the first {\tt vertlocnbr} slots of which must already be
filled with the information associated with the local vertices. On
completion, the $({\tt vert\lbt gst\lbt nbr} - {\tt vert\lbt loc\lbt
nbr})$ remaining slots are filled with copies of the corresponding
remote data obtained from the local parts of the data arrays of
neighboring processes.

When the MPI data type to be used is not a collection of contiguous
entries, great care should be taken in the definition of the upper
bound of the type (by using the {\tt MPI\_\lbo UB} pseudo-datatype),
such that when asking MPI to send a certain number of elements of the
said type located at some address, contiguous areas in memory will be
considered. Please refer to the MPI documentation regarding the
creation of derived datatypes~\cite[Section 3.12.3]{mpi11} for more
information.

To perform its data exchanges, the {\tt SCOTCH\_dgraph\lbt Halo}
routine requires ghost vertex management data provided by the {\tt
SCOTCH\_\lbt dgraph\lbt Ghst} routine. Therefore, the {\tt edge\lbt
gst\lbt tab} array returned by the {\tt SCOTCH\_dgraph\lbt Data}
routine will always be valid after a call to {\tt SCOTCH\_dgraph\lbt
Halo}, if it was not already.

In case useful computation can be carried out during the halo
exchange, an asynchronous version of this routine is available, called
{\tt SCOTCH\_\lbt dgraph\lbt Halo\lbt Async}.

\progret

{\tt SCOTCH\_dgraphHalo} returns $0$ if halo data has been
successfully exchanged, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphHaloAsync}}
\label{sec-lib-dgraphhaloasync}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}

int SCOTCH\_dgraphHaloAsync ( & SCOTCH\_Dgraph * const        & grafptr, \\
                              & void *                        & datatab, \\
                              & MPI\_Datatype                 & typeval, \\
                              & SCOTCH\_DgraphHaloReq * const & requptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphhaloasync ( & doubleprecision (*) & grafdat, \\
                         & doubleprecision (*) & datatab, \\
                         & integer             & typeval, \\
                         & doubleprecision (*) & requptr, \\
                         & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphHaloAsync} routine propagates the data borne by
local vertices to all of the corresponding halo vertices located on
neighboring processes, in an asynchronous way. On every process, {\tt
datatab} should point to a data array of a size sufficient to hold
{\tt vert\lbt gst\lbt nbr} elements of the data type to be exchanged,
the first {\tt vertlocnbr} slots of which must already be filled with
the information associated with the local vertices. On completion, the
$({\tt vert\lbt gst\lbt nbr} - {\tt vert\lbt loc\lbt nbr})$ remaining
slots are filled with copies of the corresponding remote data obtained
from the local parts of the data arrays of neighboring processes.

The semantics of {\tt SCOTCH\_dgraphHaloAsync} is similar to the one
of {\tt SCOTCH\_dgraph\lbt Halo}, except that it returns as soon as
possible, while effective communication may not have started nor
completed. Also, it possesses an additional parameter, {\tt requptr},
which must point to a {\tt SCOTCH\_\lbt Dgraph\lbt Halo\lbt Req} data
structure. Similarly to asynchronous MPI calls, users can wait for the
completion of a {\tt SCOTCH\_dgraph\lbt Halo\lbt Async} routine by
calling the {\tt SCOTCH\_dgraph\lbt Halo\lbt Wait} routine, passing it
a pointer to this request structure. In Fortran, the request structure
must be defined as an array of {\tt DOUBLEPRECISION}s, of size
{\tt SCOTCH\_\lbt DGRAPH\lbt HALO\lbt REQDIM}. This constant is
defined in file {\tt ptscotchf.h}, which must be included whenever
necessary.

The effective means for {\tt SCOTCH\_dgraph\lbt Halo\lbt Async} to
perform its task may vary at compile time, depending on the presence
of a thread-safe MPI library or on the existence of asynchronous
collective communication routines such as {\tt MPE\_\lbt Ialltoallv}.
In case no method for performing asynchronous collective communication
is available, {\tt SCOTCH\_dgraph\lbt Halo\lbt Async} will internally
call {\tt SCOTCH\_dgraph\lbt Halo} to perform synchronous communication.

Because of possible limitations in the implementation of third-party
communication routines, it is not recommended to perform simultaneous
{\tt SCOTCH\_dgraph\lbt Halo\lbt Async} calls on the same communicator.

\progret

{\tt SCOTCH\_dgraphHaloAsync} returns $0$ if the halo data exchange
has been successfully started, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphHaloWait}}
\label{sec-lib-dgraphhalowait}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}

int SCOTCH\_dgraphHaloWait ( & SCOTCH\_DgraphHaloReq * const & requptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphhalowait ( & doubleprecision (*) & requptr, \\
                        & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphHaloWait} routine waits for the termination of
an asynchronous halo exchange process, started by a call to {\tt
SCOTCH\_dgraph\lbt Halo\lbt Async}, and represented by its request,
pointed to by {\tt requptr}.

In Fortran, the request structure must be defined as an array of
{\tt DOUBLEPRECISION}s, of size {\tt SCOTCH\_\lbt DGRAPH\lbt HALO\lbt
REQDIM}. This constant is defined in file {\tt ptscotchf.h}, which
must be included whenever necessary.

\progret

{\tt SCOTCH\_dgraphHaloWait} returns $0$ if halo data has been
successfully exchanged, and $1$ else.
\end{itemize}

\subsection{Distributed graph mapping and partitioning routines}

{\tt SCOTCH\_dgraphMap} and {\tt SCOTCH\_dgraphPart} provide
high-level functionalities and free the user from the burden of
calling in sequence several of the low-level routines also described
in this section.

\subsubsection{{\tt SCOTCH\_dgraphMap}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphMap ( & const SCOTCH\_Dgraph * & grafptr, \\
                        & const SCOTCH\_Arch *   & archptr, \\
                        & const SCOTCH\_Strat *  & straptr, \\
                        & SCOTCH\_Num *          & partloctab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphmap ( & doubleprecision (*)   & grafdat,    \\
                   & doubleprecision (*)   & archdat,    \\
                   & doubleprecision (*)   & stradat,    \\
                   & integer*{\it num} (*) & partloctab, \\
                   & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphMap} routine computes a mapping of the
distributed source graph structure pointed to by {\tt grafptr} onto
the target architecture pointed to by {\tt archptr}, using the mapping
strategy pointed to by {\tt straptr}, and returns distributed
fragments of the partition data in the array pointed to by {\tt
partloctab}.

The {\tt partloctab} array should have been previously allocated, of a
size sufficient to hold as many {\tt SCOTCH\_\lbt Num} integers as
there are local vertices of the source graph on each of the processes.

On return, every cell of the mapping array holds the number of the
target vertex to which the corresponding source vertex is mapped.
The numbering of target values is {\em not\/} based: target vertices
are numbered from $0$ to the number of target vertices minus $1$.

{\bf Attention}: version {\sc 6.0} of \scotch\ does not allow yet to
map distributed graphs onto target architectures which are not
complete graphs. This restriction will be removed in the next release.

\progret

{\tt SCOTCH\_dgraphMap} returns $0$ if the partition of the graph has
been successfully computed, and $1$ else. In this last case, the
{\tt partloctab} arrays may however have been partially or completely
filled, but their contents is not significant.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphMapCompute}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphMapCompute ( & const SCOTCH\_Dgraph * & grafptr, \\
                               & SCOTCH\_Dmapping *     & mappptr, \\
                               & const SCOTCH\_Strat *  & straptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphmapcompute ( & doubleprecision (*) & grafdat, \\
                          & doubleprecision (*) & mappdat, \\
                          & doubleprecision (*) & stradat, \\
                          & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphMapCompute} routine computes a mapping
on the given {\tt SCOTCH\_\lbt Dmapping} structure pointed
to by {\tt mappptr} using the parallel mapping strategy pointed to
by {\tt stratptr}.

On return, every cell of the distributed mapping array (see
section~\ref{sec-lib-dgraph-map-init}) holds the number of the target
vertex to which the corresponding source vertex is mapped. The
numbering of target values is {\em not\/} based: target vertices are
numbered from $0$ to the number of target vertices, minus $1$.

{\bf Attention}: version {\sc 6.0} of \scotch\ does not allow yet to
map distributed graphs onto target architectures which are not
complete graphs. This restriction will be removed in the next release.

\progret

{\tt SCOTCH\_dgraphMapCompute} returns $0$ if the mapping has been
successfully computed, and $1$ else. In this latter case, the local
mapping arrays may however have been partially or completely filled,
but their contents is not significant.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphMapExit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphMapExit ( & const SCOTCH\_Dgraph * & grafptr, \\
                             & SCOTCH\_Dmapping *     & mappptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphmapexit ( & doubleprecision (*) & grafdat, \\
                       & doubleprecision (*) & mappdat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphMapExit} function frees the contents of a
{\tt SCOTCH\_\lbt Dmapping} structure previously initialized by
{\tt SCOTCH\_\lbt dgraph\lbt Map\lbt Init}. All subsequent calls to
{\tt SCOTCH\_\lbt dgraph\lbt Map*} routines other than
{\tt SCOTCH\_\lbt dgraph\lbt Map\lbt Init}, using this structure
as parameter, may yield unpredictable results.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphMapInit}}
\label{sec-lib-dgraph-map-init}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphMapInit ( & const SCOTCH\_Dgraph * & grafptr, \\
                            & SCOTCH\_Dmapping *     & mappptr, \\
                            & const SCOTCH\_Arch *   & archptr, \\
                            & SCOTCH\_Num *          & partloctab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphmapinit ( & doubleprecision (*)   & grafdat,    \\
                       & doubleprecision (*)   & mappdat,    \\
                       & doubleprecision (*)   & archdat,    \\
                       & integer*{\it num} (*) & partloctab, \\
                       & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphMapInit} routine fills the distributed mapping
structure pointed to by {\tt mappptr} with all of the data that is
passed to it. Thus, all subsequent calls to ordering routines such as
{\tt SCOTCH\_\lbt dgraph\lbt Map\lbt Compute}, using this mapping
structure as parameter, will place mapping results in field {\tt
part\lbt loc\lbt tab}.

{\tt partloctab} is the pointer to an array of as many {\tt
SCOTCH\_\lbt Num}s as there are local vertices in each local fragment
of the distributed graph pointed to by {\tt grafptr}, and which will
receive the indices of the vertices of the target architecture pointed
to by {\tt archptr}.

It should be the first function to be called upon a {\tt SCOTCH\_\lbt
Dmapping} structure. When the distributed mapping structure is no
longer of use, call function {\tt SCOTCH\_dgraph\lbt \lbt Map\lbt
Exit} to free its internal structures.

\progret

{\tt SCOTCH\_dgraphMapInit} returns $0$ if the distributed mapping
structure has been successfully initialized, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphMapSave}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphMapSave ( & const SCOTCH\_Dgraph *   & grafptr, \\
                            & const SCOTCH\_Dmapping * & mappptr, \\
                            & FILE *                   & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphmapsave ( & doubleprecision (*) & grafdat, \\
                       & doubleprecision (*) & mappdat, \\
                       & integer             & fildes,  \\
                       & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphMapSave} routine saves the contents of the {\tt
SCOTCH\_\lbt Dmapping} structure pointed to by {\tt mappptr} to stream
{\tt stream}, in the \scotch\ mapping format. Please refer to the
{\it\scotch\ User's Guide}~\scotchcitesuser\ for more information about
this format.

Since the mapping format is centralized, only one process should
provide a valid output stream; other processes must pass a null
pointer.

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the mapping file.

\progret

{\tt SCOTCH\_dgraphMapSave} returns $0$ if the mapping structure
has been successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphPart}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphPart ( & const SCOTCH\_Dgraph * & grafptr, \\
                         & const SCOTCH\_Num      & partnbr, \\
                         & const SCOTCH\_Strat *  & straptr, \\
                         & SCOTCH\_Num *          & partloctab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphpart ( & doubleprecision (*)   & grafdat,    \\
                    & integer*{\it num}     & partnbr,    \\
                    & doubleprecision (*)   & stradat,    \\
                    & integer*{\it num} (*) & partloctab, \\
                    & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphPart} routine computes a partition into {\tt
partnbr} parts of the distributed source graph structure pointed to
by {\tt grafptr}, using the graph partitioning strategy pointed to by
{\tt stratptr}, and returns distributed fragments of the partition
data in the array pointed to by {\tt partloctab}.

The {\tt partloctab} array should have been previously allocated, of a
size sufficient to hold as many {\tt SCOTCH\_\lbt Num} integers as
there are local vertices of the source graph on each of the processes.

On return, every array cell holds the number of the part to which the
corresponding vertex is mapped. Parts are numbered from $0$ to
$\mbox{\tt partnbr} - 1$.

\progret

{\tt SCOTCH\_dgraphPart} returns $0$ if the partition of the graph has
been successfully computed, and $1$ else. In this latter case, the
{\tt partloctab} array may however have been partially or completely
filled, but its content is not significant.
\end{itemize}

\subsection{Distributed graph ordering routines}

\subsubsection{{\tt SCOTCH\_dgraphOrderCblkDist}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
SCOTCH\_Num SCOTCH\_dgraphOrderCblkDist ( & const SCOTCH\_Dgraph * & grafptr, \\
                                          & SCOTCH\_Dordering *    & ordeptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordercblkdist ( & doubleprecision (*) & grafdat, \\
                             & doubleprecision (*) & ordedat, \\
                             & integer*{\it num}   & cblkglbnbr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderCblkDist} routine returns on all processes
the global number of distributed elimination tree (super-)nodes
possessed by the given distributed ordering. Distributed elimination
tree nodes are produced for instance by parallel nested dissection,
before the ordering process goes sequential. Subsequent sequential
nodes generated locally afterwards on individual processes are not
accounted for in this figure.

This routine is used to allocate space for the tree structure arrays
to be filled by the {\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Tree\lbt
Dist} routine.

\progret

{\tt SCOTCH\_dgraphOrderCblkDist} returns a positive number if the
number of distributed elimination tree nodes has been successfully
computed, and a negative value else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderCompute}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderCompute ( & const SCOTCH\_Dgraph * & grafptr, \\
                                 & SCOTCH\_Dordering *    & ordeptr, \\
                                 & const SCOTCH\_Strat *  & straptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordercompute ( & doubleprecision (*) & grafdat, \\
                            & doubleprecision (*) & ordedat, \\
                            & doubleprecision (*) & stradat, \\
                            & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderCompute} routine computes in parallel a
distributed block ordering of the distributed graph structure pointed
to by {\tt grafptr}, using the distributed ordering strategy pointed to
by {\tt stratptr}, and stores its result in the distributed ordering
structure pointed to by {\tt ordeptr}.

\progret

{\tt SCOTCH\_dgraphOrderCompute} returns $0$ if the ordering has been
successfully computed, and $1$ else. In this latter case, the ordering
arrays may however have been partially or completely filled, but their
contents are not significant.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderExit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphOrderExit ( & const SCOTCH\_Dgraph * & grafptr, \\
                               & SCOTCH\_Dordering *    & ordeptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfgraphdorderexit ( & doubleprecision (*) & grafdat, \\
                         & doubleprecision (*) & ordedat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderExit} function frees the contents of a
{\tt SCOTCH\_\lbt Dordering} structure previously initialized by
{\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Init}. All subsequent calls to
{\tt SCOTCH\_\lbt dgraph\lbt Order*} routines other than
{\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Init}, using this structure
as parameter, may yield unpredictable results.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderInit}}
\label{sec-lib-dgraphorderinit}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderInit ( & const SCOTCH\_Dgraph * & grafptr, \\
                              & SCOTCH\_Dordering *    & ordeptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphorderinit ( & doubleprecision (*) & grafdat, \\
                         & doubleprecision (*) & ordedat, \\
                         & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraph\lbt Order\lbt Init} routine initializes the
distributed ordering structure pointed to by {\tt ordeptr} so that it
can be used to store the results of the parallel ordering of the
associated distributed graph, to be computed by means of the
{\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Compute} routine.

The {\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Init} routine should be the
first function to be called upon a {\tt SCOTCH\_\lbt Dordering}
structure for ordering distributed graphs. When the ordering structure
is no longer of use, the {\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Exit}
function must be called, in order to to free its internal structures.

\progret

{\tt SCOTCH\_dgraphOrderInit} returns $0$ if the distributed ordering
structure has been successfully initialized, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderSave}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderSave ( & const SCOTCH\_Dgraph *    & grafptr, \\
                              & const SCOTCH\_Dordering * & ordeptr, \\
                              & FILE *                    & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordersave ( & doubleprecision (*) & grafdat, \\
                         & doubleprecision (*) & ordedat, \\
                         & integer             & fildes,  \\
                         & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderSave} routine saves the contents of the {\tt
SCOTCH\_\lbt Dordering} structure pointed to by {\tt ordeptr} to stream
{\tt stream}, in the \scotch\ ordering format. Please refer to the
{\it\scotch\ User's Guide}~\scotchcitesuser\ for more information about
this format.

Since the ordering format is centralized, only one process should
provide a valid output stream; other processes must pass a null
pointer.

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the ordering file. Processes which would pass
a {\tt NULL} stream pointer in C must pass descriptor number {\tt -1}
in Fortran.

\progret

{\tt SCOTCH\_dgraphOrderSave} returns $0$ if the ordering structure
has been successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderSaveMap}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderSaveMap ( & const SCOTCH\_Dgraph *    & grafptr, \\
                                 & const SCOTCH\_Dordering * & ordeptr, \\
                                 & FILE *                    & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfgraphdordersavemap ( & doubleprecision (*) & grafdat, \\
                            & doubleprecision (*) & ordedat, \\
                            & integer             & fildes,  \\
                            & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderSaveMap} routine saves the block
partitioning data associated with the {\tt SCOTCH\_\lbt Dordering}
structure pointed to by {\tt ordeptr} to stream {\tt stream}, in the
\scotch\ mapping format. A target domain number is associated with
every block, such that all node vertices belonging to the same block
are shown as belonging to the same target vertex.
The resulting mapping file can be used by the {\tt gout} program
to produce pictures showing the different separators and blocks.
Please refer to the {\it\scotch\ User's Guide} for more information
on the \scotch\ mapping format and on {\tt gout}.

Since the block partitioning format is centralized, only one process
should provide a valid output stream; other processes must pass a
null pointer.

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the ordering file. Processes which would pass
a {\tt NULL} stream pointer in C must pass descriptor number {\tt -1}
in Fortran.

\progret

{\tt SCOTCH\_dgraphOrderSaveMap} returns $0$ if the ordering structure
has been successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderSaveTree}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderSaveTree ( & const SCOTCH\_Dgraph *    & grafptr, \\
                                  & const SCOTCH\_Dordering * & ordeptr, \\
                                  & FILE *                    & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordersavetree ( & doubleprecision (*) & grafdat, \\
                             & doubleprecision (*) & ordedat, \\
                             & integer             & fildes,  \\
                             & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderSaveTree} routine saves the tree
hierarchy information associated with the {\tt SCOTCH\_\lbt Dordering}
structure pointed to by {\tt ordeptr} to stream {\tt stream}.

The format of the tree output file resembles the one of a mapping or
ordering file: it is made up of as many lines as there are vertices in
the ordering. Each of these lines holds two integer numbers. The first
one is the index or the label of the vertex, and the second one is the
index of its parent node in the separators tree, or $-1$ if the vertex
belongs to a root node.

Since the tree hierarchy format is centralized, only one process
should provide a valid output stream; other processes must pass a
null pointer.

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the ordering file. Processes which would pass
a {\tt NULL} stream pointer in C must pass descriptor number {\tt -1}
in Fortran.

\progret

{\tt SCOTCH\_dgraphOrderSaveTree} returns $0$ if the ordering structure
has been successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderPerm}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderPerm ( & const SCOTCH\_Dgraph * & grafptr, \\
                              & SCOTCH\_Dordering *    & ordeptr, \\
                              & SCOTCH\_Num *          & permloctab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphorderperm ( & doubleprecision (*)   & grafdat,    \\
                         & doubleprecision (*)   & ordedat,    \\
                         & integer*{\it num} (*) & permloctab, \\
                         & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderPerm} routine fills the distributed direct
permutation array {\tt permloctab} according to the ordering provided
by the given distributed ordering pointed to by {\tt ordeptr}. Each
{\tt permloctab} local array must be of size {\tt vertlocnbr}.

\progret

{\tt SCOTCH\_dgraphOrderPerm} returns $0$ if the distributed
permutation has been successfully computed, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderTreeDist}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderTreeDist ( & const SCOTCH\_Dgraph * & grafptr,   \\
                                  & SCOTCH\_Dordering *    & ordeptr,   \\
                                  & SCOTCH\_Num *          & treeglbtab \\
                                  & SCOTCH\_Num *          & sizeglbtab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordertreedist ( & doubleprecision (*)   & grafdat,    \\
                             & doubleprecision (*)   & ordedat,    \\
                             & integer*{\it num} (*) & treeglbtab, \\
                             & integer*{\it num} (*) & sizeglbtab, \\
                             & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphOrderTreeDist} routine fills on all processes
the arrays representing the distributed part of the elimination tree
structure associated with the given distributed ordering. This structure
describes the sizes and relations between all distributed elimination
tree (super-)nodes. These nodes are mainly the result of parallel
nested dissection, before the ordering process goes sequential.
Sequential nodes generated locally on individual processes are not
represented in this structure.

A node can either be a leaf column block, which has no
descendants, or a nested dissection node, which has most often
three sons: its two separated sub-parts and the separator. A
nested dissection node may have two sons only if the separator is
empty; it cannot have only one son. Sons are indexed such that the
separator of a block, if any, is always the son of highest index.
Hence, the order of the indices of the two sub-parts matches the one
of the direct permutation of the unknowns.

For any column block $i$, {\tt treeglbtab[}$i${\tt ]} holds the index
of the father of node $i$ in the elimination tree, or $-1$ if $i$ is
the root of the tree. All node indices start from
{\tt baseval}. {\tt size\lbt glb\lbt tab[}$i${\tt ]} holds the number
of graph vertices possessed by node $i$, plus the ones of all
of its descendants if it is not a leaf of the tree. Therefore, the
{\tt size\lbt glb\lbt tab} value of the root vertex is always equal to
the number of vertices in the distributed graph.

Each of the {\tt treeglbtab} and {\tt size\lbt glb\lbt tab} arrays
must be large enough to hold a number of {\tt SCOTCH\_\lbt Num}s equal
to the number of distributed elimination tree nodes and column blocks,
as returned by the {\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Cblk\lbt Dist}
routine.

\progret

{\tt SCOTCH\_dgraphOrderTreeDist} returns $0$ if the arrays describing
the distributed part of the distributed tree structure have been
successfully filled, and $1$ else.
\end{itemize}

\subsection{Centralized ordering handling routines}
\label{sec-lib-corder}

Since distributed ordering structures maintain scattered information
which cannot be easily collated, the only practical way to access this
information is to centralize it in a sequential {\tt SCOTCH\_\lbt
Ordering} structure. Several routines are provided to
create and destroy sequential orderings attached to a distributed
graph, and to gather the information contained in a distributed
ordering on such a sequential ordering structure.

Since the arrays which represent centralized ordering must be of a
size equal to the global number of vertices, these routines are not
scalable and may require much memory for very large graphs.

\subsubsection{{\tt SCOTCH\_dgraphCorderExit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_dgraphCorderExit ( & const SCOTCH\_Dgraph * & grafptr, \\
                                & SCOTCH\_Ordering *     & cordptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphcorderexit ( & doubleprecision (*) & grafdat, \\
                          & doubleprecision (*) & corddat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraphCorderExit} function frees the contents of a
centralized {\tt SCOTCH\_\lbt Ordering} structure previously
initialized by {\tt SCOTCH\_\lbt dgraph\lbt Corder\lbt Init}.
\end{itemize}


\subsubsection{{\tt SCOTCH\_dgraphCorderInit}}
\label{sec-lib-graph-corder-init}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphCorderInit ( & const SCOTCH\_Dgraph * & grafptr, \\
                               & SCOTCH\_Ordering *     & cordptr, \\
                               & SCOTCH\_Num *          & permtab, \\
                               & SCOTCH\_Num *          & peritab, \\
                               & SCOTCH\_Num *          & cblkptr, \\
                               & SCOTCH\_Num *          & rangtab, \\
                               & SCOTCH\_Num *          & treetab)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphcorderinit ( & doubleprecision (*)   & grafdat, \\
                          & doubleprecision (*)   & corddat, \\
                          & integer*{\it num} (*) & permtab, \\
                          & integer*{\it num} (*) & peritab, \\
                          & integer*{\it num}     & cblknbr, \\
                          & integer*{\it num} (*) & rangtab, \\
                          & integer*{\it num} (*) & treetab, \\
                          & integer               & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraph\lbt Corder\lbt Init} routine fills the
centralized ordering structure pointed to by {\tt cordptr} with all of
the data that are passed to it. This routine is the equivalent of the
{\tt SCOTCH\_\lbt graph\lbt Order\lbt Init} routine of the
\scotch\ sequential library, except that it takes a distributed graph
as input. It is used to initialize a centralized ordering structure on
which a distributed ordering will be centralized by means of the
{\tt SCOTCH\_\lbt dgraph\lbt Order\lbt Gather} routine. Only the
process on which distributed ordering data is to be centralized has
to handle a centralized ordering structure.

{\tt permtab} is the ordering permutation array, of size ${\tt
vert\lbt glb\lbt nbr}$, {\tt peritab} is the inverse ordering permutation array,
of size ${\tt vert\lbt glb\lbt nbr}$, {\tt cblkptr} is the pointer to a
{\tt SCOTCH\_\lbt Num} that will receive the number of produced
column blocks, {\tt rangtab} is the array that holds the column block
span information, of size $\mbox{\tt vert\lbt glb\lbt nbr} + 1$, and {\tt treetab}
is the array holding the structure of the separators tree, of size
${\tt vert\lbt glb\lbt nbr}$. Please refer to
Section~\ref{sec-lib-format-order} for an explanation of their semantics.
Any of these five output fields can be set to {\tt NULL} if the
corresponding information is not needed. Since, in Fortran, there is
no null reference, passing a reference to {\tt grafptr} will have the
same effect.

The {\tt SCOTCH\_\lbt dgraph\lbt Corder\lbt Init} routine should be
the first function to be called upon a {\tt SCOTCH\_\lbt Ordering}
structure to be used for gathering distributed ordering data. When the
centralized ordering structure is no longer of use, the {\tt
SCOTCH\_\lbt dgraph\lbt Corder\lbt Exit} function must be called, in
order to to free its internal structures.

\progret

{\tt SCOTCH\_dgraphCorderInit} returns $0$ if the ordering structure has
been successfully initialized, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dgraphOrderGather}}
\label{sec-lib-dgraph-order-gather}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_dgraphOrderGather ( & const SCOTCH\_Dgraph * & grafptr, \\
                                & SCOTCH\_Dordering *    & cordptr, \\
                                & SCOTCH\_Ordering *     & cordptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfdgraphordergather ( & doubleprecision (*) & grafdat, \\
                           & doubleprecision (*) & dorddat, \\
                           & doubleprecision (*) & corddat, \\
                           & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dgraph\lbt Order\lbt Gather} routine gathers the
distributed ordering data borne by {\tt dordptr} to the
centralized ordering structure pointed to by {\tt cordptr}.

\progret

{\tt SCOTCH\_dgraphOrderGather} returns $0$ if the centralized
ordering structure has been successfully updated, and $1$ else.
\end{itemize}

\subsection{Strategy handling routines}
\label{sec-lib-strat}

This section presents basic strategy handling routines which are also
described in the {\it\scotch\ User's Guide} but which are duplicated
here for the sake of readability, as well as a strategy declaration
routine which is specific to the \ptscotch\ library.

\subsubsection{{\tt SCOTCH\_stratExit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_stratExit ( & SCOTCH\_Strat * & archptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratexit ( & doubleprecision (*) & stradat)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratExit} function frees the contents of a
{\tt SCOTCH\_\lbt Strat} structure previously initialized by
{\tt SCOTCH\_\lbt strat\lbt Init}. All subsequent calls to
{\tt SCOTCH\_\lbt strat} routines other than {\tt SCOTCH\_\lbt
strat\lbt Init}, using this structure as parameter, may yield
unpredictable results.
\end{itemize}

\subsubsection{{\tt SCOTCH\_stratInit}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratInit ( & SCOTCH\_Strat * & straptr)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratinit ( & doubleprecision (*) & stradat, \\
                   & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratInit} function initializes a {\tt SCOTCH\_\lbt
Strat} structure so as to make it suitable for future operations. It
should be the first function to be called upon a {\tt SCOTCH\_\lbt
Strat} structure. When the strategy data is no longer of use, call
function {\tt SCOTCH\_\lbt strat\lbt Exit} to free its internal
structures.

\progret

{\tt SCOTCH\_stratInit} returns $0$ if the strategy structure has been
successfully initialized, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_stratSave}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratSave ( & const SCOTCH\_Strat * & straptr, \\
                        & FILE *                & stream)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratsave ( & doubleprecision (*) & stradat, \\
                   & integer             & fildes,  \\
                   & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratSave} routine saves the contents of the {\tt
SCOTCH\_\lbt Strat} structure pointed to by {\tt straptr} to stream
{\tt stream}, in the form of a text string. The methods and
parameters of the strategy string depend on the type of the strategy,
that is, whether it is a bipartitioning, mapping, or ordering
strategy, and to which structure it applies, that is, graphs or
meshes.

Fortran users must use the {\tt PXFFILENO} or {\tt FNUM} functions to
obtain the number of the Unix file descriptor {\tt fildes} associated
with the logical unit of the output file.

\progret

{\tt SCOTCH\_stratSave} returns $0$ if the strategy string has been
successfully written to {\tt stream}, and $1$ else.
\end{itemize}

\subsection{Strategy creation routines}
\label{sec-lib-strat-creation}

Strategy creation routines parse the user-provided strategy string and
populate the given opaque strategy object with a tree-shaped structure
that represents the parsed expression. It is this structure that will
be later traversed by the generic routines for partitioning, mapping or
ordering, so as to determine which specific partitioning, mapping or
ordering method to be called on a subgraph being considered.

Because strategy creation routines call third-party lexical analyzers
that may have been implemented in a non-reentrant way, no guarantee is
given on the reentrance of these routines. Consequently, strategy
creation routines that might be called simultaneously by multiple
threads should be protected by a mutex.

\subsubsection{{\tt SCOTCH\_stratDgraphMap}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratDgraphMap ( & SCOTCH\_Strat * & straptr, \\
                             & const char *    & string)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratdgraphmap ( & doubleprecision (*) & stradat, \\
                        & character (*)       & string,  \\
                        & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratDgraphMap} routine fills the strategy
structure pointed to by {\tt straptr} with the distributed graph
mapping strategy string pointed to by {\tt string}. The format of this
strategy string is described in Section~\ref{sec-lib-format-pmap}.
From this point, strategy {\tt strat} can only be used as a
distributed graph mapping strategy, to be used by functions {\tt
SCOTCH\_\lbt dgraph\lbt Part}, {\tt SCOTCH\_\lbt dgraph\lbt Map} or
{\tt SCOTCH\_\lbt dgraph\lbt Map\lbt Compute}. This routine must be
called on every process with the same strategy string.

When using the C interface, the array of characters pointed to by
{\tt string} must be null-terminated.

\progret

{\tt SCOTCH\_stratDgraphMap} returns $0$ if the strategy string
has been successfully set, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_stratDgraphMapBuild}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratDgraphMapBuild ( & SCOTCH\_Strat *   & straptr, \\
                                  & const SCOTCH\_Num & flagval, \\
                                  & const SCOTCH\_Num & procnbr, \\
                                  & const SCOTCH\_Num & partnbr, \\
                                  & const double      & balrat)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratdgraphmapbuild ( & doubleprecision (*) & stradat, \\
                             & integer*{\it num}   & flagval, \\
                             & integer*{\it num}   & procnbr, \\
                             & integer*{\it num}   & partnbr, \\
                             & doubleprecision     & balrat,  \\
                             & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratDgraphMapBuild} routine fills the strategy
structure pointed to by {\tt straptr} with a default mapping strategy
tuned according to the preference flags passed as {\tt flagval} and to
the desired number of parts {\tt partnbr} and imbalance ratio {\tt
balrat}, to be used on {\tt procnbr} processes. From this point, the
strategy structure can only be used as a parallel mapping strategy, to
be used by function {\tt SCOTCH\_\lbt dgraph\lbt Map}, for
instance. See Section~\ref{sec-lib-format-strat-default} for a
description of the available flags.

\progret

{\tt SCOTCH\_stratDgraphMapBuild} returns $0$ if the strategy string
has been successfully set, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_stratDgraphOrder}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratDgraphOrder ( & SCOTCH\_Strat * & straptr, \\
                               & const char *    & string)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratdgraphorder ( & doubleprecision (*) & stradat, \\
                          & character (*)       & string,  \\
                          & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratDgraphOrder} routine fills the strategy
structure pointed to by {\tt straptr} with the distributed graph
ordering strategy string pointed to by {\tt string}. The format of
this strategy string is described in Section~\ref{sec-lib-format-pord}.
From this point, strategy {\tt strat} can only be used as a
distributed graph ordering strategy, to be used by function {\tt
SCOTCH\_\lbt dgraph\lbt Order\lbt Compute}. This routine must be
called on every process with the same strategy string.

When using the C interface, the array of characters pointed to by
{\tt string} must be null-terminated.

\progret

{\tt SCOTCH\_stratDgraphOrder} returns $0$ if the strategy string
has been successfully set, and $1$ else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_stratDgraphOrderBuild}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
int SCOTCH\_stratDgraphOrderBuild ( & SCOTCH\_Strat *   & straptr, \\
                                    & const SCOTCH\_Num & flagval, \\
                                    & const SCOTCH\_Num & procnbr, \\
                                    & const SCOTCH\_Num & levlnbr, \\
                                    & const double      & balrat)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfstratdgraphorderbuild ( & doubleprecision (*) & stradat, \\
                               & integer*{\it num}   & flagval, \\
                               & integer*{\it num}   & procnbr, \\
                               & integer*{\it num}   & levlnbr, \\
                               & doubleprecision     & balrat,  \\
                               & integer             & ierr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_stratDgraphOrderBuild} routine fills the strategy
structure pointed to by {\tt straptr} with a default parallel ordering
strategy tuned according to the preference flags passed as {\tt
flagval} and to the desired nested dissection imbalance ratio {\tt
balrat}, to be used on {\tt procnbr} processes. From this point, the
strategy structure can only be used as a parallel ordering strategy, to
be used by function {\tt SCOTCH\_\lbt dgraph\lbt Order}, for
instance.

See Section~\ref{sec-lib-format-strat-default} for a description of
the available flags. When any of the {\tt SCOTCH\_\lbt STRAT\lbt
LEVEL\lbt MIN} or {\tt SCOTCH\_\lbt STRAT\lbt LEVEL\lbt MAX} flags is
set, the {\tt levlnbr} parameter is taken into account.

\progret

{\tt SCOTCH\_stratDgraphOrderBuild} returns $0$ if the strategy string
has been successfully set, and $1$ else.
\end{itemize}

\subsection{Other data structure routines}
\label{sec-lib-other}

\subsubsection{{\tt SCOTCH\_dmapAlloc}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
SCOTCH\_Dmapping * SCOTCH\_dmapAlloc ( & void)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dmapAlloc} function allocates a memory area of a
size sufficient to store a {\tt SCOTCH\_\lbt Dmapping} structure. It is
the user's responsibility to free this memory when it is no longer
needed.

\progret

{\tt SCOTCH\_dmapAlloc} returns the pointer to the memory area if it
has been successfully allocated, and {\tt NULL} else.
\end{itemize}

\subsubsection{{\tt SCOTCH\_dorderAlloc}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
SCOTCH\_Dordering * SCOTCH\_dorderAlloc ( & void)
\end{tabular}}

\progdes

The {\tt SCOTCH\_dorderAlloc} function allocates a memory area of a
size sufficient to store a {\tt SCOTCH\_\lbt Dordering} structure. It is
the user's responsibility to free this memory when it is no longer
needed.

\progret

{\tt SCOTCH\_dorderAlloc} returns the pointer to the memory area if it
has been successfully allocated, and {\tt NULL} else.
\end{itemize}

\subsection{Error handling routines}
\label{sec-lib-error}

The handling of errors that occur within library routines is
often difficult, because library routines should be able to issue
error messages that help the application programmer to find the
error, while being compatible with the way the application handles
its own errors.

To match these two requirements, all the error and warning messages
produced by the routines of the \libscotch\ library are issued using
the user-definable variable-length argument routines {\tt SCOTCH\_\lbt
error\lbt Print} and {\tt SCOTCH\_\lbt error\lbt PrintW}. Thus, one
can redirect these error messages to his own error handling routines,
and can choose if he wants his program to terminate on error or to
resume execution after the erroneous function has returned.

In order to free the user from the burden of writing a basic error
handler from scratch, the {\tt libptscotcherr.a} library provides error
routines that print error messages on the standard error stream
{\tt stderr} and return control to the application. Application
programmers who want to take advantage of them have to add
{\tt -lptscotcherr} to the list of arguments of the linker, after
the {\tt -lptscotch} argument.

\subsubsection{{\tt SCOTCH\_errorPrint}}
\label{sec-lib-errorprint}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_errorPrint ( & const char * const & errstr, ... )
\end{tabular}}

\progdes

The {\tt SCOTCH\_errorPrint} function is designed to output a
variable-length argument error string to some stream.
\end{itemize}

\subsubsection{{\tt SCOTCH\_errorPrintW}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_errorPrintW ( & const char * const & errstr, ...)
\end{tabular}}

\progdes

The {\tt SCOTCH\_errorPrintW} function is designed to output a
variable-length argument warning string to some stream.
\end{itemize}

\subsubsection{{\tt SCOTCH\_errorProg}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_errorProg ( & const char * & progstr)
\end{tabular}}

\progdes

The {\tt SCOTCH\_errorProg} function is designed to be called at
the beginning of a program or of a portion of code to identify the
place where subsequent errors take place.
This routine is not reentrant, as it is only a minor help function. It
is defined in {\tt lib\lbt scotch\lbt err.a} and is used by the
standalone programs of the \scotch\ distribution.
\end{itemize}

\subsection{Miscellaneous routines}
\label{sec-lib-misc}

\subsubsection{{\tt SCOTCH\_memCur}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
SCOTCH\_Idx SCOTCH\_memCur ( & void)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfmemcur ( & integer*{\it idx} & memcur) \\

\end{tabular}}

\progdes

When \scotch\ is compiled with the {\tt COMMON\_\lbt MEMORY\_\lbt
TRACE} flag set, the {\tt SCOTCH\_memCur} routine returns the amount
of memory, in bytes, that is currently allocated by \scotch\ on the
current processing element, either by itself or on the behalf of the
user. Else, the routine returns {\tt -1}.

The returned figure does not account for the memory that has been
allocated by the user and made visible to \scotch\ by means of
routines such as {\tt SCOTCH\_\lbt dgraph\lbt Build} calls. This
memory is not under the control of \scotch, and it is the user's
responsibility to free it after calling the relevant
{\tt SCOTCH\_\lbt *\lbt Exit} routines.

Some third-party software used by \scotch, such as the strategy string
parser, may allocate some memory for internal use and never free it.
Consequently, there may be small discrepancies between memory
occupation figures returned by \scotch\ and those returned by
third-party tools. However, these discrepancies should not exceed a
few kilobytes.

While memory occupation is internally recorded in a variable of type
{\tt intptr\_\lbt t}, it is output as a {\tt SCOTCH\_\lbt Idx} for the
sake of interface homogeneity, especially for Fortran. It is therefore
the installer's responsibility to make sure that the support integer
type of {\tt SCOTCH\_\lbt Idx} is large enough to not overflow. See
section~\ref{sec-lib-inttypesize} for more information.
\end{itemize}

\subsubsection{{\tt SCOTCH\_memMax}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
SCOTCH\_Idx SCOTCH\_memMax ( & void)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfmemmax ( & integer*{\it idx} & memcur) \\

\end{tabular}}

\progdes

When \scotch\ is compiled with the {\tt COMMON\_\lbt MEMORY\_\lbt
TRACE} flag set, the {\tt SCOTCH\_memMax} routine returns the maximum
amount of memory, in bytes, ever allocated by \scotch\ on the current
processing element, either by itself or on the behalf of the
user. Else, the routine returns {\tt -1}.

The returned figure does not account for the memory that has been
allocated by the user and made visible to \scotch\ by means of
routines such as {\tt SCOTCH\_\lbt dgraph\lbt Build} calls. This
memory is not under the control of \scotch, and it is the user's
responsibility to free it after calling the relevant
{\tt SCOTCH\_\lbt *\lbt Exit} routines.

Some third-party software used by \scotch, such as the strategy string
parser, may allocate some memory for internal use and never free it.
Consequently, there may be small discrepancies between memory
occupation figures returned by \scotch\ and those returned by
third-party tools. However, these discrepancies should not exceed a
few kilobytes.

While memory occupation is internally recorded in a variable of type
{\tt intptr\_\lbt t}, it is output as a {\tt SCOTCH\_\lbt Idx} for the
sake of interface homogeneity, especially for Fortran. It is therefore
the installer's responsibility to make sure that the support integer
type of {\tt SCOTCH\_\lbt Idx} is large enough to not overflow. See
section~\ref{sec-lib-inttypesize} for more information.
\end{itemize}

\subsubsection{{\tt SCOTCH\_randomProc}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_randomProc ( & int & procnum)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfrandomproc ( & integer & procnum )
\end{tabular}}

\progdes

The {\tt SCOTCH\_randomProc} routine records internally the provided
number, which contributes to the initialization of the pseudo-random
generator. Hence, providing different values for each process,
e.g. process rank, will result in different random seeds across
processors. This allows processes to compute concurrently different
initial partitions, in the course of the parallel multilevel framework
with folding and duplication.

In order for the provided number to be taken into account,
{\tt SCOTCH\_\lbt random\lbt Proc} must be called before any other
routine of the \libscotch\ that is likely to use (and consequently
to initialize) the pseudo-random number generator. By default, it is
set to zero.
\end{itemize}

\subsubsection{{\tt SCOTCH\_randomReset}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}l}
void SCOTCH\_randomReset ( & void)
\end{tabular}}

{\tt\begin{tabular}{l@{}l}
scotchfrandomreset ( & )
\end{tabular}}

\progdes

The {\tt SCOTCH\_randomReset} routine resets the seed of the
pseudo-random generator used by the graph partitioning routines
of the \libscotch\ library. Two consecutive calls to
the same \libscotch\ partitioning or ordering routines, separated
by a call to {\tt SCOTCH\_\lbt random\lbt Reset}, will always yield
the same results.
\end{itemize}

\subsubsection{{\tt SCOTCH\_randomSeed}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void SCOTCH\_randomSeed ( & SCOTCH\_Num & seedval)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
scotchfrandomseed ( & integer*{\it num} & seedval )
\end{tabular}}

\progdes

The {\tt SCOTCH\_randomSeed} routine sets to {\tt seedval} the
seed of the pseudo-random generator used internally by several
algorithms of \scotch. All subsequent calls to {\tt SCOTCH\_\lbt
random\lbt Reset} will use this value to reset the pseudo-random
generator.

This routine needs only to be used by users willing to evaluate
the robustness and quality of partitioning algorithms with respect to
the variability of random seeds. Else, depending whether \scotch\ has
been compiled with any of the flags {\tt COMMON\_\lbt RANDOM\_\lbt
FIXED\_\lbt SEED} or {\tt SCOTCH\_\lbt DETERMINISTIC} set or not,
either the same pseudo-random seed will be always used, or a
process-dependent seed will be used, respectively.
\end{itemize}

\subsection{\parmetis\ compatibility library}
\label{sec-lib-parmetis}

The \parmetis\ compatibility library provides stubs which redirect some
calls to \parmetis\ routines to the corresponding \ptscotch\ counterparts.
In order to use this feature, the only thing to do is to re-link the
existing software with the {\tt lib\lbo ptscotch\lbo parmetis} library, and
eventually with the original \parmetis\ library if the software uses
\parmetis\ routines which do not need to have \ptscotch\ equivalents, such
as graph transformation routines.
In that latter case, the ``{\tt -lptscotch\lbt parmetis}'' argument must be
placed before the ``{\tt -lparmetis}'' one (and of course before the
``{\tt -lptscotch}'' one too), so that routines that are redefined by
\ptscotch\ are chosen instead of their \parmetis\ counterpart. Routines
of \parmetis\ which are not redefined by \ptscotch\ may also require
that the sequential \metis\ library be linked too. When no other
\parmetis\ routines than the ones redefined by \ptscotch\ are used,
the ``{\tt -lparmetis}'' argument can be omitted. See
Section~\ref{sec-examples} for an example.

\subsubsection{{\tt ParMETIS\_V3\_NodeND}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void ParMETIS\_V3\_NodeND ( & const SCOTCH\_Num * const & vtxdist, \\
                            & const SCOTCH\_Num * const & xadj, \\
                            & const SCOTCH\_Num * const & adjncy, \\
                            & const SCOTCH\_Num * const & numflag, \\
                            & const SCOTCH\_Num * const & options, \\
                            & SCOTCH\_Num * const       & order, \\
                            & SCOTCH\_Num * const       & sizes, \\
                            & MPI\_Comm *               & comm)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
parmetis\_v3\_nodend ( & integer*{\it num} (*) & vtxdist, \\
                       & integer*{\it num} (*) & xadj, \\
                       & integer*{\it num} (*) & adjncy, \\
                       & integer*{\it num}     & numflag, \\
                       & integer*{\it num} (*) & options, \\
                       & integer*{\it num} (*) & order, \\
                       & integer*{\it num} (*) & sizes, \\
                       & integer               & comm)
\end{tabular}}

\progdes

The {\tt ParMETIS\_V3\_NodeND} function performs a nested dissection
ordering of the distributed graph passed as arrays {\tt vtxdist},
{\tt xadj} and {\tt adjncy}, using the default \ptscotch\ ordering
strategy. Unlike for \parmetis, this routine will compute an ordering
even when the number of processors on which it is run is not a power
of two. The {\tt options} array is not used. When the number of
processors is a power of two, the contents of the {\tt sizes} array is
equivalent to the one returned by the original {\tt ParMETIS\_V3\_NodeND}
routine, else it is filled with $-1$ values.

Users willing to get the tree structure of orderings computed on
numbers of processors which are not power of two should use the native
\ptscotch\ ordering routine, and extract the relevant information from
the distributed ordering with the {\tt SCOTCH\_\lbt dgraph\lbt
Order\lbt Cblk\lbt Dist} and {\tt SCOTCH\_\lbt dgraph\lbt Order\lbt
Tree\lbt Dist} routines.

Similarly, as there is no {\tt ParMETIS\_V3\_NodeWND} routine in
\parmetis, users willing to order distributed graphs with node weights
should directly call the \ptscotch\ routines.
\end{itemize}

\subsubsection{{\tt ParMETIS\_V3\_PartGeomKway}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void ParMETIS\_V3\_PartGeomKway ( & const SCOTCH\_Num * const & vtxdist, \\
                                  & const SCOTCH\_Num * const & xadj, \\
                                  & const SCOTCH\_Num * const & adjncy, \\
                                  & const SCOTCH\_Num * const & vwgt, \\
                                  & const SCOTCH\_Num * const & adjwgt, \\
                                  & const SCOTCH\_Num * const & wgtflag, \\
                                  & const SCOTCH\_Num * const & numflag, \\
                                  & const SCOTCH\_Num * const & ndims, \\
                                  & const float * const       & xyz, \\
                                  & const SCOTCH\_Num * const & ncon, \\
                                  & const SCOTCH\_Num * const & nparts, \\
                                  & const float * const       & tpwgts, \\
                                  & const float * const       & ubvec, \\
                                  & const SCOTCH\_Num * const & options, \\
                                  & SCOTCH\_Num * const       & edgecut, \\
                                  & SCOTCH\_Num * const       & part, \\
                                  & MPI\_Comm *               & comm)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
parmetis\_v3\_partgeomkway ( & integer*{\it num} (*) & vtxdist, \\
                             & integer*{\it num} (*) & xadj, \\
                             & integer*{\it num} (*) & adjncy, \\
                             & integer*{\it num} (*) & vwgt, \\
                             & integer*{\it num} (*) & adjwgt, \\
                             & integer*{\it num}     & wgtflag, \\
                             & integer*{\it num}     & numflag, \\
                             & integer*{\it num}     & ndims, \\
                             & float (*)             & xyz, \\
                             & integer*{\it num}     & ncon, \\
                             & integer*{\it num}     & nparts, \\
                             & float (*)             & tpwgts, \\
                             & float (*)             & ubvec, \\
                             & integer*{\it num} (*) & options, \\
                             & integer*{\it num}     & edgecut, \\
                             & integer*{\it num} (*) & part, \\
                             & integer               & comm)
\end{tabular}}

\progdes

The {\tt ParMETIS\_V3\_PartGeomKway} function computes a partition into
{\tt nparts} parts of the distributed graph passed as arrays {\tt
vtxdist}, {\tt xadj} and {\tt adjncy}, using the default
\ptscotch\ mapping strategy. The partition is returned in the form of
the distributed vector {\tt part}, which holds the indices of the
parts to which every vertex belongs, from $0$ to $(\mbox{\tt nparts} -
1)$.

Since \scotch\ does not handle geometry, the {\tt ndims} and {\tt xyz}
arrays are not used, and this routine directly calls the
{\tt ParMETIS\_\lbt V3\_\lbt Part\lbt Kway} stub.
\end{itemize}

\subsubsection{{\tt ParMETIS\_V3\_PartKway}}

\begin{itemize}
\progsyn

{\tt\begin{tabular}{l@{}ll}
void ParMETIS\_V3\_PartKway ( & const SCOTCH\_Num * const & vtxdist, \\
                              & const SCOTCH\_Num * const & xadj, \\
                              & const SCOTCH\_Num * const & adjncy, \\
                              & const SCOTCH\_Num * const & vwgt, \\
                              & const SCOTCH\_Num * const & adjwgt, \\
                              & const SCOTCH\_Num * const & wgtflag, \\
                              & const SCOTCH\_Num * const & numflag, \\
                              & const SCOTCH\_Num * const & ncon, \\
                              & const SCOTCH\_Num * const & nparts, \\
                              & const float * const       & tpwgts, \\
                              & const float * const       & ubvec, \\
                              & const SCOTCH\_Num * const & options, \\
                              & SCOTCH\_Num * const       & edgecut, \\
                              & SCOTCH\_Num * const       & part, \\
                              & MPI\_Comm *               & comm)
\end{tabular}}

{\tt\begin{tabular}{l@{}ll}
parmetis\_v3\_partkway ( & integer*{\it num} (*) & vtxdist, \\
                         & integer*{\it num} (*) & xadj, \\
                         & integer*{\it num} (*) & adjncy, \\
                         & integer*{\it num} (*) & vwgt, \\
                         & integer*{\it num} (*) & adjwgt, \\
                         & integer*{\it num}     & wgtflag, \\
                         & integer*{\it num}     & numflag, \\
                         & integer*{\it num}     & ncon, \\
                         & integer*{\it num}     & nparts, \\
                         & float (*)             & tpwgts, \\
                         & float (*)             & ubvec, \\
                         & integer*{\it num} (*) & options, \\
                         & integer*{\it num}     & edgecut, \\
                         & integer*{\it num} (*) & part, \\
                         & integer               & comm)
\end{tabular}}

\progdes

The {\tt ParMETIS\_V3\_PartKway} function computes a partition into
{\tt nparts} parts of the distributed graph passed as arrays {\tt
vtxdist}, {\tt xadj} and {\tt adjncy}, using the default
\ptscotch\ mapping strategy. The partition is returned in the form of
the distributed vector {\tt part}, which holds the indices of the
parts to which every vertex belongs, from $0$ to $(\mbox{\tt nparts} -
1)$.

Since \scotch\ does not handle multiple constraints, only the first
constraint is taken into account to define the respective weights of
the parts. Consequently, only the first {\tt nparts} cells of the
{\tt tpwgts} array are considered. The {\tt ncon}, {\tt ubvec}
and {\tt options} parameters are not used.
\end{itemize}