From 4d13f3d33da97368a2983fff79a1e33c4a75ef45 Mon Sep 17 00:00:00 2001
From: sjplimp
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms -97.0% CPU use with 4 MPI tasks x no OpenMP threads -Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s +Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms +
+Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s +97.0% CPU use with 4 MPI tasks x no OpenMP threads MPI task timings breakdown: Section | min time | avg time | max time |%varavg| %total @@ -1783,16 +1784,16 @@ Neighbor list builds = 26 Dangerous builds = 0 The first section provides a global loop timing summary. The loop time -is the total wall time for the section. The second line provides the -CPU utilzation per MPI task; it should be close to 100% times the number -of OpenMP threads (or 1). Lower numbers correspond to delays due to -file i/o or unsufficient thread utilization. The Performance line is +is the total wall time for the section. The Performance line is provided for convenience to help predicting the number of loop -continuations required and for comparing performance with other similar -MD codes. +continuations required and for comparing performance with other +similar MD codes. The CPU use line provides the CPU utilization per +MPI task; it should be close to 100% times the number of OpenMP +threads (or 1). Lower numbers correspond to delays due to file I/O or +insufficient thread utilization.
-The second section gives the breakdown of the CPU run time (in seconds) -into major categories: +
The MPI task section gives the breakdown of the CPU run time (in +seconds) into major categories:
When using the timers full setting, and additional column -is present that also prints the CPU utilization in percent. In addition, -when using timers full and the package omp command are -active, a similar timing summary of time spent in threaded regions to -monitor thread utilization and load balance is provided. A new enrty is -the Reduce section, which lists the time spend in reducing the per-thread -data elements to the storage for non-threaded computation. These thread -timings are taking from the first MPI rank only and and thus, as the -breakdown for MPI tasks can change from MPI rank to MPI rank, this -breakdown can be very different for individual ranks. Here is an example -output for this optional output section: +
When using the timers full setting, an additional column +is present that also prints the CPU utilization in percent. In +addition, when timers full and the package omp +commands are active, a similar timing summary of time spent in threaded +regions to monitor thread utilization and load balance is provided. A +new entry is the Reduce section, which lists the time spent in +reducing the per-thread data elements to the storage for non-threaded +computation. These thread timings are taken from the first MPI rank +only and thus, as the breakdown for MPI tasks can change from MPI +rank to MPI rank, this breakdown can be very different for individual +ranks. Here is an example output for this section:
Thread timings breakdown (MPI rank 0): Total threaded time 0.6846 / 90.6%