commit 4d13f3d33d
parent f945c6d79a
Author: sjplimp
Date:   2015-08-29 00:13:36 +00:00

    git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13961 f3b2605a-c512-4ea7-a41b-209d697bcdaa


@@ -1753,9 +1753,10 @@ thermodynamic state and a total run time for the simulation. It then
 appends statistics about the CPU time and storage requirements for the
 simulation. An example set of statistics is shown here:
 </P>
-<PRE>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+<P>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+</P>
+<PRE>Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
 97.0% CPU use with 4 MPI tasks x no OpenMP threads
-Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
 </PRE>
 <PRE>MPI task timings breakdown:
 Section | min time | avg time | max time |%varavg| %total
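
The three Performance figures on the added line are mutually consistent; as a quick worked check (assuming the run used a 2 fs timestep, which the output itself does not state):

<PRE>timesteps/s: 300 steps / 2.81192 s                        = 106.689
ns/day:      106.689 steps/s * 0.002 ps/step * 86400 s/day = 18436 ps/day = 18.436 ns/day
hours/ns:    24 h/day / 18.436 ns/day                      = 1.302
</PRE>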
@@ -1783,16 +1784,16 @@ Neighbor list builds = 26
 Dangerous builds = 0
 </PRE>
 <P>The first section provides a global loop timing summary. The loop time
-is the total wall time for the section. The second line provides the
-CPU utilzation per MPI task; it should be close to 100% times the number
-of OpenMP threads (or 1). Lower numbers correspond to delays due to
-file i/o or unsufficient thread utilization. The <I>Performance</I> line is
-provided for convenience to help predicting the number of loop
-continuations required and for comparing performance with other similar
-MD codes.
+is the total wall time for the section. The <I>Performance</I> line is
+provided for convenience to help predicting the number of loop
+continuations required and for comparing performance with other
+similar MD codes. The CPU use line provides the CPU utilzation per
+MPI task; it should be close to 100% times the number of OpenMP
+threads (or 1). Lower numbers correspond to delays due to file I/O or
+insufficient thread utilization.
 </P>
-<P>The second section gives the breakdown of the CPU run time (in seconds)
-into major categories:
+<P>The MPI task section gives the breakdown of the CPU run time (in
+seconds) into major categories:
 </P>
 <UL><LI><I>Pair</I> stands for all non-bonded force computation
 <LI><I>Bond</I> stands for bonded interactions: bonds, angles, dihedrals, impropers
@@ -1811,17 +1812,17 @@ the difference between minimum, maximum and average is small and thus
 the variation from the average close to zero. The final column shows
 the percentage of the total loop time is spent in this section.
 </P>
-<P>When using the <A HREF = "timers.html">timers full</A> setting, and additional column
-is present that also prints the CPU utilization in percent. In addition,
-when using <I>timers full</I> and the <A HREF = "package.html">package omp</A> command are
-active, a similar timing summary of time spent in threaded regions to
-monitor thread utilization and load balance is provided. A new enrty is
-the <I>Reduce</I> section, which lists the time spend in reducing the per-thread
-data elements to the storage for non-threaded computation. These thread
-timings are taking from the first MPI rank only and and thus, as the
-breakdown for MPI tasks can change from MPI rank to MPI rank, this
-breakdown can be very different for individual ranks. Here is an example
-output for this optional output section:
+<P>When using the <A HREF = "timers.html">timers full</A> setting, an additional column
+is present that also prints the CPU utilization in percent. In
+addition, when using <I>timers full</I> and the <A HREF = "package.html">package omp</A>
+command are active, a similar timing summary of time spent in threaded
+regions to monitor thread utilization and load balance is provided. A
+new entry is the <I>Reduce</I> section, which lists the time spend in
+reducing the per-thread data elements to the storage for non-threaded
+computation. These thread timings are taking from the first MPI rank
+only and and thus, as the breakdown for MPI tasks can change from MPI
+rank to MPI rank, this breakdown can be very different for individual
+ranks. Here is an example output for this section:
 </P>
 <P>Thread timings breakdown (MPI rank 0):
 Total threaded time 0.6846 / 90.6%
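
As a usage sketch (not part of this commit), the thread-timing summary described above is typically obtained by enabling the full timer level together with the OpenMP package; the thread and task counts, binary name, and input file below are illustrative assumptions, and the exact command spellings should be checked against the timer and package doc pages linked in the text:

<PRE># in the input script: 2 OpenMP threads per MPI task, full timing output
# (command names assumed from the timer/package doc pages referenced above)
package omp 2
timer   full
</PRE>
<P>or equivalently from the command line, with the binary and input names as placeholders:
</P>
<PRE>mpirun -np 4 lmp_omp -sf omp -pk omp 2 -in in.peptide
</PRE>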