| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260 |
- ###############
- Timerlat tracer
- ###############
- The timerlat tracer aims to help the preemptive kernel developers to
- find sources of wakeup latencies of real-time threads. Like cyclictest,
- the tracer sets a periodic timer that wakes up a thread. The thread then
- computes a *wakeup latency* value as the difference between the *current
- time* and the *absolute time* that the timer was set to expire. The main
- goal of timerlat is tracing in such a way to help kernel developers.
- Usage
- -----
- Write the ASCII text "timerlat" into the current_tracer file of the
- tracing system (generally mounted at /sys/kernel/tracing).
- For example::
- [root@f32 ~]# cd /sys/kernel/tracing/
- [root@f32 tracing]# echo timerlat > current_tracer
- It is possible to follow the trace by reading the trace file::
- [root@f32 tracing]# cat trace
- # tracer: timerlat
- #
- # _-----=> irqs-off
- # / _----=> need-resched
- # | / _---=> hardirq/softirq
- # || / _--=> preempt-depth
- # || /
- # |||| ACTIVATION
- # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
- # | | | |||| | | | |
- <idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
- <...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
- <idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
- <...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
- <idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
- <...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
- <idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
- <...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
- The tracer creates a per-cpu kernel thread with real-time priority that
- prints two lines at every activation. The first is the *timer latency*
- observed at the *hardirq* context before the activation of the thread.
- The second is the *timer latency* observed by the thread. The ACTIVATION
- ID field serves to relate the *irq* execution to its respective *thread*
- execution.
- The *irq*/*thread* splitting is important to clarify in which context
- the unexpected high value is coming from. The *irq* context can be
- delayed by hardware-related actions, such as SMIs, NMIs, IRQs,
- or by thread masking interrupts. Once the timer happens, the delay
- can also be influenced by blocking caused by threads. For example, by
- postponing the scheduler execution via preempt_disable(), scheduler
- execution, or masking interrupts. Threads can also be delayed by the
- interference from other threads and IRQs.
- Tracer options
- ---------------------
- The timerlat tracer is built on top of osnoise tracer.
- So its configuration is also done in the osnoise/ config
- directory. The timerlat configs are:
- - cpus: CPUs at which a timerlat thread will execute.
- - timerlat_period_us: the period of the timerlat thread.
- - stop_tracing_us: stop the system tracing if a
- timer latency at the *irq* context higher than the configured
- value happens. Writing 0 disables this option.
- - stop_tracing_total_us: stop the system tracing if a
- timer latency at the *thread* context is higher than the configured
- value happens. Writing 0 disables this option.
- - print_stack: save the stack of the IRQ occurrence. The stack is printed
- after the *thread context* event, or at the IRQ handler if *stop_tracing_us*
- is hit.
- timerlat and osnoise
- ----------------------------
- The timerlat can also take advantage of the osnoise: traceevents.
- For example::
- [root@f32 ~]# cd /sys/kernel/tracing/
- [root@f32 tracing]# echo timerlat > current_tracer
- [root@f32 tracing]# echo 1 > events/osnoise/enable
- [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
- [root@f32 tracing]# tail -10 trace
- cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 13585 ns
- cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
- cc1-87882 [005] dNLh2.. 548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
- cc1-87882 [005] d...3.. 548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
- timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns
- In this case, the root cause of the timer latency does not point to a
- single cause but to multiple ones. Firstly, the timer IRQ was delayed
- for 13 us, which may point to a long IRQ disabled section (see IRQ
- stacktrace section). Then the timer interrupt that wakes up the timerlat
- thread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally,
- the cc1 thread noise took 9909 ns of time before the context switch.
- Such pieces of evidence are useful for the developer to use other
- tracing methods to figure out how to debug and optimize the system.
- It is worth mentioning that the *duration* values reported
- by the osnoise: events are *net* values. For example, the
- thread_noise does not include the duration of the overhead caused
- by the IRQ execution (which indeed accounted for 12736 ns). But
- the values reported by the timerlat tracer (timerlat_latency)
- are *gross* values.
- The art below illustrates a CPU timeline and how the timerlat tracer
- observes it at the top and the osnoise: events at the bottom. Each "-"
- in the timelines means circa 1 us, and the time moves ==>::
- External timer irq thread
- clock latency latency
- event 13585 ns 39960 ns
- | ^ ^
- v | |
- |-------------| |
- |-------------+-------------------------|
- ^ ^
- ========================================================================
- [tmr irq] [dev irq]
- [another thread...^ v..^ v.......][timerlat/ thread] <-- CPU timeline
- =========================================================================
- |-------| |-------|
- |--^ v-------|
- | | |
- | | + thread_noise: 9909 ns
- | +-> irq_noise: 6139 ns
- +-> irq_noise: 7597 ns
- IRQ stacktrace
- ---------------------------
- The osnoise/print_stack option is helpful for the cases in which a thread
- noise causes the major factor for the timer latency, because of preempt or
- irq disabled. For example::
- [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
- [root@f32 tracing]# echo 500 > osnoise/print_stack
- [root@f32 tracing]# echo timerlat > current_tracer
- [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
- insmod-1026 [007] dN.h1.. 200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
- insmod-1026 [007] d..h1.. 200.202587: #29800 context irq timer_latency 1616 ns
- insmod-1026 [007] dN.h2.. 200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
- insmod-1026 [007] dN.h3.. 200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
- insmod-1026 [007] d...3.. 200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns
- timerlat/7-1001 [007] ....... 200.203445: #29800 context thread timer_latency 859978 ns
- timerlat/7-1001 [007] ....1.. 200.203446: <stack trace>
- => timerlat_irq
- => __hrtimer_run_queues
- => hrtimer_interrupt
- => __sysvec_apic_timer_interrupt
- => asm_call_irq_on_stack
- => sysvec_apic_timer_interrupt
- => asm_sysvec_apic_timer_interrupt
- => delay_tsc
- => dummy_load_1ms_pd_init
- => do_one_initcall
- => do_init_module
- => __do_sys_finit_module
- => do_syscall_64
- => entry_SYSCALL_64_after_hwframe
- In this case, it is possible to see that the thread added the highest
- contribution to the *timer latency* and the stack trace, saved during
- the timerlat IRQ handler, points to a function named
- dummy_load_1ms_pd_init, which had the following code (on purpose)::
- static int __init dummy_load_1ms_pd_init(void)
- {
- preempt_disable();
- mdelay(1);
- preempt_enable();
- return 0;
- }
- User-space interface
- ---------------------------
- Timerlat allows user-space threads to use timerlat infra-structure to
- measure scheduling latency. This interface is accessible via a per-CPU
- file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.
- This interface is accessible under the following conditions:
- - timerlat tracer is enable
- - osnoise workload option is set to NO_OSNOISE_WORKLOAD
- - The user-space thread is affined to a single processor
- - The thread opens the file associated with its single processor
- - Only one thread can access the file at a time
- The open() syscall will fail if any of these conditions are not met.
- After opening the file descriptor, the user space can read from it.
- The read() system call will run a timerlat code that will arm the
- timer in the future and wait for it as the regular kernel thread does.
- When the timer IRQ fires, the timerlat IRQ will execute, report the
- IRQ latency and wake up the thread waiting in the read. The thread will be
- scheduled and report the thread latency via tracer - as for the kernel
- thread.
- The difference from the in-kernel timerlat is that, instead of re-arming
- the timer, timerlat will return to the read() system call. At this point,
- the user can run any code.
- If the application rereads the file timerlat file descriptor, the tracer
- will report the return from user-space latency, which is the total
- latency. If this is the end of the work, it can be interpreted as the
- response time for the request.
- After reporting the total latency, timerlat will restart the cycle, arm
- a timer, and go to sleep for the following activation.
- If at any time one of the conditions is broken, e.g., the thread migrates
- while in user space, or the timerlat tracer is disabled, the SIG_KILL
- signal will be sent to the user-space thread.
- Here is an basic example of user-space code for timerlat::
- int main(void)
- {
- char buffer[1024];
- int timerlat_fd;
- int retval;
- long cpu = 0; /* place in CPU 0 */
- cpu_set_t set;
- CPU_ZERO(&set);
- CPU_SET(cpu, &set);
- if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
- return 1;
- snprintf(buffer, sizeof(buffer),
- "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
- cpu);
- timerlat_fd = open(buffer, O_RDONLY);
- if (timerlat_fd < 0) {
- printf("error opening %s: %s\n", buffer, strerror(errno));
- exit(1);
- }
- for (;;) {
- retval = read(timerlat_fd, buffer, 1024);
- if (retval < 0)
- break;
- }
- close(timerlat_fd);
- exit(0);
- }
|