###############
Timerlat tracer
###############

The timerlat tracer aims to help preemptive kernel developers find
sources of wakeup latency in real-time threads. Like cyclictest, the
tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* at which the timer was set to expire. The
main goal of timerlat is to trace in a way that helps kernel developers
find where that latency comes from.


Usage
-----

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer


It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: timerlat
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              || /
  #                              ||||            ACTIVATION
  #           TASK-PID      CPU# ||||   TIMESTAMP    ID        CONTEXT                    LATENCY
  #              | |         |   ||||      |         |            |                          |
            <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
             <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
            <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
             <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
            <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
             <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
            <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
             <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns


The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field serves to relate the *irq* execution to its respective *thread*
execution.


The *irq*/*thread* split is important to clarify which context an
unexpectedly high value comes from. The *irq* context can be delayed by
hardware-related actions, such as SMIs, NMIs, and IRQs, or by a thread
masking interrupts. Once the timer fires, the delay can also be
influenced by blocking caused by threads, for example, by postponing the
scheduler execution via preempt_disable(), by the scheduler execution
itself, or by masking interrupts. Threads can also be delayed by
interference from other threads and IRQs.


Tracer options
--------------

The timerlat tracer is built on top of the osnoise tracer, so its
configuration is also done in the osnoise/ config directory. The
timerlat configs are:

 - cpus: CPUs on which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread, in microseconds.
 - stop_tracing_us: stop the system tracing if a timer latency at the
   *irq* context is higher than the configured value. Writing 0 disables
   this option.
 - stop_tracing_total_us: stop the system tracing if a timer latency at
   the *thread* context is higher than the configured value. Writing 0
   disables this option.
 - print_stack: save the stack of the IRQ occurrence. The stack is printed
   after the *thread context* event, or at the IRQ handler if
   *stop_tracing_us* is hit.


timerlat and osnoise
--------------------

The timerlat tracer can also take advantage of the osnoise: trace events.
For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer
  [root@f32 tracing]# echo 1 > events/osnoise/enable
  [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
  [root@f32 tracing]# tail -10 trace
         cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
         cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
         cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
         cc1-87882   [005] d...3..   548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
  timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency     39960 ns


In this case, the timer latency does not have a single root cause but
several. First, the timer IRQ was delayed for 13 us, which may point to a
long IRQ-disabled section (see the IRQ stacktrace section). Then the timer
interrupt that wakes up the timerlat thread took 7597 ns, and the qxl:21
device IRQ took 7139 ns. Finally, the cc1 thread noise took 9909 ns before
the context switch. Such pieces of evidence help the developer choose
other tracing methods to debug and optimize the system.

It is worth mentioning that the *duration* values reported by the
osnoise: events are *net* values. For example, the thread_noise does not
include the duration of the overhead caused by the IRQ execution (which
accounted for 14736 ns, the sum of the two irq_noise durations). The
values reported by the timerlat tracer (timer_latency), however, are
*gross* values.


The art below illustrates a CPU timeline and how the timerlat tracer
observes it at the top and the osnoise: events at the bottom. Each "-"
in the timelines means circa 1 us, and the time moves ==>::

     External      timer irq                  thread
       clock        latency                   latency
       event       13585 ns                  39960 ns
         |             ^                         ^
         v             |                         |
         |-------------|                         |
         |-------------+-------------------------|
                       ^                         ^
  ========================================================================
                    [tmr irq]  [dev irq]
  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
  ========================================================================
                    |-------|  |-------|
                            |--^       v-------|
                            |          |       |
                            |          |       + thread_noise: 9909 ns
                            |          +-> irq_noise: 7139 ns
                            +-> irq_noise: 7597 ns


IRQ stacktrace
--------------

The osnoise/print_stack option is helpful for the cases in which thread
noise is the major factor in the timer latency, because preemption or
IRQs are disabled. For example::

  [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
  [root@f32 tracing]# echo 500 > osnoise/print_stack
  [root@f32 tracing]# echo timerlat > current_tracer
  [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
      insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
      insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
      insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
      insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
      insmod-1026    [007] d...3..   200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns
  timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
  timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
   => timerlat_irq
   => __hrtimer_run_queues
   => hrtimer_interrupt
   => __sysvec_apic_timer_interrupt
   => asm_call_irq_on_stack
   => sysvec_apic_timer_interrupt
   => asm_sysvec_apic_timer_interrupt
   => delay_tsc
   => dummy_load_1ms_pd_init
   => do_one_initcall
   => do_init_module
   => __do_sys_finit_module
   => do_syscall_64
   => entry_SYSCALL_64_after_hwframe


In this case, it is possible to see that the thread added the highest
contribution to the *timer latency*, and the stack trace, saved during
the timerlat IRQ handler, points to a function named
dummy_load_1ms_pd_init, which had the following code (on purpose)::

  static int __init dummy_load_1ms_pd_init(void)
  {
          preempt_disable();
          mdelay(1);
          preempt_enable();
          return 0;
  }


User-space interface
--------------------

Timerlat allows user-space threads to use the timerlat infrastructure to
measure scheduling latency. This interface is accessible via a per-CPU
file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.

This interface is accessible under the following conditions:

 - The timerlat tracer is enabled
 - The osnoise workload option is set to NO_OSNOISE_WORKLOAD
 - The user-space thread is affined to a single processor
 - The thread opens the file associated with its single processor
 - Only one thread can access the file at a time

The open() syscall will fail if any of these conditions is not met.


After opening the file descriptor, the user-space thread can read from
it. The read() system call runs timerlat code that arms the timer in the
future and waits for it, as the regular kernel thread does.

When the timer IRQ fires, the timerlat IRQ handler executes, reports the
IRQ latency, and wakes up the thread waiting in read(). The thread is
then scheduled and reports the thread latency via the tracer, as the
kernel thread does.

The difference from the in-kernel timerlat is that, instead of re-arming
the timer, timerlat returns from the read() system call. At this point,
the user can run any code.

If the application reads the timerlat file descriptor again, the tracer
reports the return-from-user-space latency, which is the total latency.
If this is the end of the work, it can be interpreted as the response
time for the request.

After reporting the total latency, timerlat restarts the cycle, arms a
timer, and goes to sleep until the following activation.

If at any time one of the conditions is broken, e.g., the thread migrates
while in user space or the timerlat tracer is disabled, the SIGKILL
signal is sent to the user-space thread.


Here is a basic example of user-space code for timerlat::

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          char buffer[1024];
          int timerlat_fd;
          int retval;
          long cpu = 0;   /* place in CPU 0 */
          cpu_set_t set;

          /* timerlat requires the thread to be affined to a single CPU */
          CPU_ZERO(&set);
          CPU_SET(cpu, &set);

          if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
                  return 1;

          snprintf(buffer, sizeof(buffer),
                   "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
                   cpu);

          timerlat_fd = open(buffer, O_RDONLY);
          if (timerlat_fd < 0) {
                  printf("error opening %s: %s\n", buffer, strerror(errno));
                  exit(1);
          }

          /* each read() arms the timer and waits for the next activation */
          for (;;) {
                  retval = read(timerlat_fd, buffer, sizeof(buffer));
                  if (retval < 0)
                          break;
          }

          close(timerlat_fd);
          exit(0);
  }