==============
OSNOISE Tracer
==============

In the context of high-performance computing (HPC), the Operating System
Noise (*osnoise*) refers to the interference experienced by an application
due to activities inside the operating system. In the context of Linux,
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
system. Moreover, hardware-related jobs can also cause noise, for example,
via SMIs.

hwlat_detector is one of the tools used to identify the most complex
source of noise: *hardware noise*.

In a nutshell, the hwlat_detector creates a thread that runs
periodically for a given period. At the beginning of a period, the thread
disables interrupts and starts sampling. While running, the hwlatd
thread reads the time in a loop. As interrupts are disabled, threads,
IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the
cause of any gap between two different reads of the time lies either in
an NMI or in the hardware itself. At the end of the period, hwlatd enables
interrupts and reports the max observed gap between the reads. It also
prints an NMI occurrence counter. If the output does not report NMI
executions, the user can conclude that the hardware is the culprit for
the latency. hwlat detects the NMI execution by observing
the entry and exit of an NMI.

The osnoise tracer leverages the hwlat_detector by running a
similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing
all the sources of *osnoise* during its execution. Using the same approach
as hwlat, osnoise takes note of the entry and exit point of any
source of interference, increasing a per-cpu interference counter. The
osnoise tracer also saves an interference counter for each source of
interference. The interference counter for NMIs, IRQs, SoftIRQs, and
threads is increased anytime the tool observes the entry event of one of
these interferences. When a noise happens without any interference from
the operating system level, the hardware noise counter increases, pointing
to a hardware-related noise. In this way, osnoise can account for any
source of interference. At the end of the period, the osnoise tracer
prints the sum of all noise, the max single noise, the percentage of CPU
available for the thread, and the counters for the noise sources.

Usage
-----

Write the ASCII text "osnoise" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo osnoise > current_tracer

It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: osnoise
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth                            MAX
  #                              || /                                             SINGLE    Interference counters:
  #                              ||||               RUNTIME      NOISE  % OF CPU   NOISE    +-----------------------------+
  #           TASK-PID     CPU#  ||||    TIMESTAMP    IN US      IN US AVAILABLE   IN US     HW    NMI    IRQ   SIRQ THREAD
  #              | |         |   ||||        |          |          |       |         |        |      |      |      |      |
             <...>-859     [000] ....    81.637220: 1000000        190  99.98100       9     18      0   1007     18      1
             <...>-860     [001] ....    81.638154: 1000000        656  99.93440      74     23      0   1006     16      3
             <...>-861     [002] ....    81.638193: 1000000       5675  99.43250     202      6      0   1013     25     21
             <...>-862     [003] ....    81.638242: 1000000        125  99.98750      45      1      0   1011     23      0
             <...>-863     [004] ....    81.638260: 1000000       1721  99.82790     168      7      0   1002     49     41
             <...>-864     [005] ....    81.638286: 1000000        263  99.97370      57      6      0   1006     26      2
             <...>-865     [006] ....    81.638302: 1000000        109  99.98910      21      3      0   1006     18      1
             <...>-866     [007] ....    81.638326: 1000000       7816  99.21840     107      8      0   1016     39     19

In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
tracer prints a message at the end of each period for each CPU that is
running an osnoise/ thread. The osnoise specific fields report:

 - The RUNTIME IN US reports the amount of time in microseconds that
   the osnoise thread kept looping reading the time.
 - The NOISE IN US reports the sum of noise in microseconds observed
   by the osnoise tracer during the associated runtime.
 - The % OF CPU AVAILABLE reports the percentage of CPU available for
   the osnoise thread during the runtime window.
 - The MAX SINGLE NOISE IN US reports the maximum single noise observed
   during the runtime window.
 - The Interference counters display how many times each of the
   respective interferences happened during the runtime window.

Note that the example above shows a high number of HW noise samples.
The reason is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.
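
The relation between the RUNTIME, NOISE, and % OF CPU AVAILABLE fields can
be checked by hand. The shell sketch below assumes that the percentage is
(runtime - noise) / runtime, which is consistent with the sample rows above
but is not stated explicitly in this document. The cpu_available helper is
a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Sketch: recompute "% OF CPU AVAILABLE" from RUNTIME and NOISE (both in us).
# Assumption: available = (runtime - noise) * 100 / runtime.
# cpu_available is a hypothetical helper, not part of the tracer.
cpu_available() {
    runtime_us=$1
    noise_us=$2
    awk -v r="$runtime_us" -v n="$noise_us" \
        'BEGIN { printf "%.5f\n", (r - n) * 100 / r }'
}

cpu_available 1000000 190     # values from the CPU 0 sample row
cpu_available 1000000 5675    # values from the CPU 2 sample row
```

With the values taken from the sample output, the two calls print 99.98100
and 99.43250, the same figures shown in the AVAILABLE column for CPU 0 and
CPU 2.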

Tracer Configuration
--------------------

The tracer has a set of options inside the osnoise directory. They are:

 - osnoise/cpus: CPUs on which an osnoise thread will execute.
 - osnoise/period_us: the period of the osnoise thread.
 - osnoise/runtime_us: how long an osnoise thread will look for noise.
 - osnoise/stop_tracing_us: stop the system tracing if a single noise
   higher than the configured value happens. Writing 0 disables this
   option.
 - osnoise/stop_tracing_total_us: stop the system tracing if total noise
   higher than the configured value happens. Writing 0 disables this
   option.
 - tracing_threshold: the minimum delta between two time() reads to be
   considered as noise, in us. When set to 0, the default value will
   be used, which is currently 1 us.
 - osnoise/options: a set of on/off options that can be enabled by
   writing the option name to the file or disabled by writing the option
   name preceded with the 'NO\_' prefix. For example, writing
   NO_OSNOISE_WORKLOAD disables the OSNOISE_WORKLOAD option. The
   special DEFAULTS option resets all options to the default value.
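
As an illustration of how these files could be used together, the sketch
below writes a configuration in one step. TRACING_DIR and osnoise_configure
are names assumed for this example, the values are illustrative rather than
recommended, and the "0-3" CPU list format is an assumption about what
osnoise/cpus accepts.

```shell
#!/bin/sh
# Sketch: write osnoise configuration values under a tracing directory.
# TRACING_DIR and osnoise_configure are hypothetical names for this example.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/tracing}"

osnoise_configure() {
    cpus=$1          # CPUs on which osnoise threads run (list format assumed)
    period_us=$2     # period of the osnoise thread
    runtime_us=$3    # how long the thread looks for noise in each period
    stop_us=$4       # stop tracing on a single noise above this; 0 disables
    echo "$cpus"       > "$TRACING_DIR/osnoise/cpus"
    echo "$period_us"  > "$TRACING_DIR/osnoise/period_us"
    echo "$runtime_us" > "$TRACING_DIR/osnoise/runtime_us"
    echo "$stop_us"    > "$TRACING_DIR/osnoise/stop_tracing_us"
}

# Example (as root): sample for 900 ms of every second on CPUs 0-3 and
# stop tracing if a single noise exceeds 100 us:
#   osnoise_configure 0-3 1000000 900000 100
```

The helper only defines the writes; nothing is changed until it is called
explicitly.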

Tracer Options
--------------

The osnoise/options file exposes a set of on/off configuration options for
the osnoise tracer. These options are:

 - DEFAULTS: reset the options to the default value.
 - OSNOISE_WORKLOAD: dispatch the osnoise workload. Writing
   NO_OSNOISE_WORKLOAD makes the tracer not dispatch the workload (see
   dedicated section below).
 - PANIC_ON_STOP: call panic() if the tracer stops. This option serves to
   capture a vmcore.
 - OSNOISE_PREEMPT_DISABLE: disable preemption while running the osnoise
   workload, allowing only IRQ and hardware-related noise.
 - OSNOISE_IRQ_DISABLE: disable IRQs while running the osnoise workload,
   allowing only NMIs and hardware-related noise, like the hwlat tracer.
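
The enable/disable convention of the options file (write the option name
to enable it, or the name with the NO\_ prefix to disable it) can be wrapped
in a small helper. This is a minimal sketch; TRACING_DIR and osnoise_option
are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Sketch: toggle an osnoise option by writing NAME or NO_NAME to the
# options file. TRACING_DIR and osnoise_option are hypothetical names.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/tracing}"

osnoise_option() {
    name=$1
    case $2 in
        on)  echo "$name"    > "$TRACING_DIR/osnoise/options" ;;
        off) echo "NO_$name" > "$TRACING_DIR/osnoise/options" ;;
        *)   echo "usage: osnoise_option NAME on|off" >&2
             return 1 ;;
    esac
}

# Examples (as root):
#   osnoise_option OSNOISE_IRQ_DISABLE on    # writes OSNOISE_IRQ_DISABLE
#   osnoise_option OSNOISE_WORKLOAD off      # writes NO_OSNOISE_WORKLOAD
```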

Additional Tracing
------------------

In addition to the tracer, a set of tracepoints was added to
facilitate the identification of the osnoise source.

 - osnoise:sample_threshold: printed anytime a noise is higher than
   the configurable tolerance_ns.
 - osnoise:nmi_noise: noise from NMI, including the duration.
 - osnoise:irq_noise: noise from an IRQ, including the duration.
 - osnoise:softirq_noise: noise from a SoftIRQ, including the
   duration.
 - osnoise:thread_noise: noise from a thread, including the duration.

Note that all the values are *net values*. For example, if another thread
preempts the osnoise thread while osnoise is running, a thread_noise
duration starts at that point. If an IRQ then takes place, preempting
that thread, an irq_noise starts. When the IRQ ends its execution,
it computes its duration, and this duration is subtracted from
the thread_noise, so as to avoid double accounting of the
IRQ execution. This logic is valid for all sources of noise.
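
The net accounting described above can be illustrated with a small
arithmetic sketch. The durations below are hypothetical numbers chosen for
the example; they do not come from a real trace.

```shell
#!/bin/sh
# Sketch of net-value accounting with hypothetical durations (in ns).
thread_gross_ns=5000   # wall time the preempting thread occupied the CPU
irq_ns=2000            # IRQ that ran while that thread was on the CPU

# The nested IRQ time is subtracted from the thread it interrupted, so the
# same nanoseconds are not counted twice.
thread_net_ns=$((thread_gross_ns - irq_ns))

echo "irq_noise duration:    $irq_ns ns"
echo "thread_noise duration: $thread_net_ns ns"
echo "total accounted:       $((irq_ns + thread_net_ns)) ns"
```

In this sketch the thread_noise is reported as 3000 ns rather than 5000 ns,
and the two net durations add up to the 5000 ns during which the osnoise
thread was not running.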

Here is one example of the usage of these tracepoints::

       osnoise/8-961     [008] d.h.  5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
       osnoise/8-961     [008] dNh.  5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
     migration/8-54      [008] d...  5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
       osnoise/8-961     [008] ....  5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2

In this example, a noise sample of 8 microseconds was reported in the last
line, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after a
timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.

It is worth noticing that the sum of the durations reported in the
tracepoints is smaller than the eight us reported in the sample_threshold.
The reason lies in the overhead of the entry and exit code that happens
before and after any interference execution. This justifies the dual
approach: measuring thread and tracing.
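
The gap between the sample and the sum of its interferences can be computed
from the trace lines themselves. The sketch below is one possible way to do
it with awk, assuming the line layout of the example above; noise_breakdown
is a hypothetical helper, and it simply sums every \*_noise event whose
start is not earlier than the start of the sample_threshold event.

```shell
#!/bin/sh
# Sketch: compare a sample_threshold duration with the durations of the
# interferences that started inside it. noise_breakdown is a hypothetical
# helper that reads trace lines on standard input.
noise_breakdown() {
    awk '
    {
        # Pick the values that follow the "start" and "duration" keywords.
        s = d = ""
        for (i = 1; i < NF; i++) {
            if ($i == "start")    s = $(i + 1)
            if ($i == "duration") d = $(i + 1)
        }
    }
    /_noise:/           { n++; start[n] = s + 0; dur[n] = d + 0 }
    /sample_threshold:/ { sample_start = s + 0; sample_dur = d + 0 }
    END {
        for (i = 1; i <= n; i++)
            if (start[i] >= sample_start)
                sum += dur[i]
        printf "reported by tracepoints: %d ns\n", sum
        printf "sample_threshold:        %d ns\n", sample_dur
        printf "unattributed:            %d ns\n", sample_dur - sum
    }'
}

noise_breakdown <<'EOF'
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2
EOF
```

For the example trace, the two interferences add up to 2848 + 3068 = 5916 ns,
leaving 2896 ns of the 8812 ns sample that the tracepoints do not cover.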

Running osnoise tracer without workload
---------------------------------------

By enabling the osnoise tracer with the NO_OSNOISE_WORKLOAD option set,
the osnoise: tracepoints serve to measure the execution time of
any type of Linux task, free from the interference of other tasks.
  151. any type of Linux task, free from the interference of other tasks.