Arcane  4.1.12.0
User documentation
Loading...
Searching...
No Matches
Performance analysis by sampling

Warning
Currently, profiling only works on Linux platforms.
Currently, profiling DOES NOT WORK when multi-threading (whether using the task mechanism or message passing) is active.

Profiling in Arcane works on a sampling principle: at regular intervals, the code is interrupted and we check which method we are in. If the sampling interval is small and the code is executed for a sufficient amount of time, we obtain a good statistical representation of the proportion of time spent in the most costly methods. Sampling methods do not require code instrumentation and therefore slow down execution very little. However, they do not allow us to know, for example, how many times a method is called, nor do they easily allow us to know the call graph. For this, it is necessary to use mechanisms like gprof with compiler support.

To activate profiling, you must set the ARCANE_PROFILING environment variable to one of the following values:

  • Prof . This allows for imprecise profiling but works on all Linux machines. The sampling frequency is around 20Hz, so you must run the code for at least 1 minute to get significant results
  • Papi. This allows for very precise profiling, using the processor's hardware counters. For this, the free PAPI library is used. This only works with recent Linux kernels (2.6.32 or later) or patched kernels, and Arcane must be compiled with this support. In this mode, sampling is given in the number of processor clock cycles. The default value is 500,000 cycles. For a 3GHz processor, this means 6000 samples per second. It is possible to change the number of cycles via the ARCANE_PROFILING_PERIOD environment variable. It is preferable not to go below the default value.

In order to maintain reasonable performance during execution, it is preferable not to exceed 100,000 samples. Therefore, you must adjust the test execution duration or the sampling frequency accordingly.

When active, profiling starts at the first iteration and stops at the last iteration of the execution. In parallel, profiling is performed for each processor, and Arcane then displays the profiling information for each processor in the following manner:

*I-Internal PROCESS_ID = 17977
*I-Internal NB ADDRESS MAP = 737
*I-Internal NB FUNC MAP = 53
*I-Internal NB STACK MAP = 0
*I-Internal TOTAL STACK = 0
*I-Internal FUNC EVENT=2798
*I-Internal FUNC FP=0
*I-Internal TOTAL EVENT = 2799
*I-Internal TOTAL FP = 1 (nb_giga_flip=0.0005)
*I-Internal RATIO FP/CYC = 0.000357270453733476
*I-Internal event % function
*I-Internal 1113 39.7 1113 0 0 0 SimpleHydro::ModuleSimpleHydro::computeCQs(Arcane::Real3*, Arcane::Real3*, Arcane::Cell const&)
*I-Internal 613 21.9 613 0 0 0 SimpleHydro::ModuleSimpleHydro::_computeGeometricValues(Arcane::ItemVectorViewT<Arcane::Cell>)
*I-Internal 396 14.1 396 0 0 0 SimpleHydro::ModuleSimpleHydro::_computePressureAndCellPseudoViscosityForces()
*I-Internal 217 7.7 217 0 0 0 SimpleHydro::ModuleSimpleHydro::_applyEquationOfState(Arcane::ItemVectorViewT<Arcane::Cell>)
*I-Internal 87 3.1 87 0 0 0 SimpleHydro::ModuleSimpleHydro::applyBoundaryCondition()
*I-Internal 85 3.0 85 0 0 0 SimpleHydro::ModuleSimpleHydro::_computeViscosityWork(Arcane::ItemVectorViewT<Arcane::Cell>)
*I-Internal 77 2.7 77 0 0 0 SimpleHydro::ModuleSimpleHydro::updateDensity()
*I-Internal 70 2.5 70 0 0 0 SimpleHydro::ModuleSimpleHydro::computeVelocity()
*I-Internal 50 1.7 50 0 0 0 SimpleHydro::ModuleSimpleHydro::computeDeltaT()
*I-Internal 28 1.0 28 0 0 0 SimpleHydro::ModuleSimpleHydro::moveNodes()
*I-Internal 13 0.4 13 0 0 0 Arcane::VariableArrayT<Arcane::Real3>::fill(Arcane::Real3 const&, Arcane::ItemGroup const&)
*I-Internal 4 0.1 4 0 0 0 _IO_vfprintf
*I-Internal 2 0.0 2 0 0 0 Arcane::Timer::Sentry::~Sentry()
*I-Internal 1 0.0 1 0 0 0 SimpleHydro::ModuleSimpleHydro::applyEquationOfState()

The first lines are internal information for Arcane. Then, each method is displayed in descending order of time spent. In the previous example, we see that 39.7% of the time is spent in the ModuleSimpleHydro::computeCQs() method.

For the results to be relevant, the code must be compiled in optimized mode with inlining enabled.